In machine learning, a hyper-parameter is a parameter that is set before the learning process starts. Different model training algorithms require different hyper-parameters (also called meta-parameters). The time required to train and test a model can depend on the choice of hyper-parameters. A hyper-parameter is usually a real number or an integer. Unlike ordinary model parameters, hyper-parameters are not learned from the data but are specified externally.
Many approaches can be applied to search for a suitable set of hyper-parameters. Cross-validation is usually used along with hyper-parameter optimisation.
3.6.1 Cross-Validation
There is a need to validate the stability of machine learning models: a model should generalise correctly from most of the patterns in the training dataset without picking up too much noise. Validation is the process of deciding whether results are acceptable as descriptions of the training data. Cross-validation is a statistical method for evaluating and comparing learning algorithms.
Cross-validation divides a training dataset into two segments: one segment is used to learn or train a model, and the other is used to validate the model, Refaeilzadeh et al. (2016). In a standard cross-validation process, the training and validation sets crossover in successive rounds such that each labelled data point has a chance of being validated against, Arlot and Celisse (2010), Fushiki (2011), Kan et al. (2018), Kim (2009), Molinaro et al. (2005). The most common form of cross-validation is k-fold cross-validation; most other forms are special cases of k-fold cross-validation.
Figure 3.7: K-fold cross-validation (k = 5). In each of the five iterations, a different segment of the dataset is held out for validation while the remaining segments are used for training.
In k-fold cross-validation, a dataset is first divided into k equally sized partitions (folds). Afterwards, k iterations of training and validation are performed, where each iteration uses a different fold of the data for validation while the remaining k-1 folds are used for training. Figure 3.7 illustrates 5-fold cross-validation.
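As a concrete sketch of the splitting scheme described above, the following Python snippet partitions sample indices into k folds and uses each fold once for validation (the function name `k_fold_indices` is illustrative, not from this thesis):

```python
# Sketch of k-fold partitioning: split sample indices into k folds,
# then use each fold once for validation and the rest for training.
def k_fold_indices(n_samples, k=5):
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds)
                    if j != i for idx in fold]
        splits.append((training, validation))
    return splits

# Each of the k iterations validates on a different, disjoint fold.
for training, validation in k_fold_indices(10, k=5):
    assert set(training).isdisjoint(validation)
    assert sorted(training + validation) == list(range(10))
```

Every labelled data point appears in exactly one validation fold, which is the "each point is validated against once" property of standard k-fold cross-validation.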
3.6.2 Hyper-parameter optimisation
Hyper-parameter optimisation is the problem of choosing a set of optimal hyper-parameters for a learning algorithm. The same machine learning technique applied to different problems usually requires different parameters, such as constraints, weights or learning rates, to generalise different models. The set of such parameters is called the hyper-parameters, and they have to be specified for a specific problem so that the machine learning technique can solve it optimally.
Hyper-parameter optimisation is the process of finding a set of hyper-parameters that yields an optimal model, i.e. a model that minimises a predefined loss function on given independent data. The objective function takes a set of hyper-parameters and returns the associated loss (error), Claesen and Moor (2015). Cross-validation is often used to estimate the generalisation performance, Bergstra and Bengio (2012).
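In code terms, the objective function described here simply maps a hyper-parameter setting to its estimated loss; a minimal sketch, where a toy quadratic stands in for the real cross-validated model error:

```python
# Objective function: hyper-parameter in, associated loss (error) out.
# The quadratic below is a stand-in; in practice one would train a model
# with the given setting and return its cross-validated loss.
def objective(theta):
    return (theta - 3.0) ** 2  # assume (for illustration) the optimum is 3

candidates = [1.0, 2.0, 3.0, 4.0]
best = min(candidates, key=objective)
```

Optimisation then amounts to searching for the candidate with the smallest returned loss, which is exactly what grid search does exhaustively.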
Hyper-parameter optimisation is conventionally performed by a grid search algorithm. Grid search is an exhaustive search through a predefined subset of the hyper-parameter space of a learning algorithm, guided by a performance metric such as cross-validation on the training set, Hsu et al. (2016). The grid search algorithm suffers from the curse of dimensionality, but it is often embarrassingly parallel because the hyper-parameters are typically independent, Bergstra and Bengio (2012), Zhang et al. (2009).
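The "predefined subset of the hyper-parameter space" is just the Cartesian product of per-parameter candidate lists, and each grid point can be evaluated independently of the others. A brief sketch (the parameter names `C` and `gamma` and their values are illustrative SVM-style examples, not settings from this research):

```python
from itertools import product

# Build the search grid as the Cartesian product of candidate lists.
# Each combination can be evaluated independently, which is why grid
# search is embarrassingly parallel.
param_space = {
    "C": [0.1, 1.0, 10.0],   # illustrative regularisation candidates
    "gamma": [0.01, 0.1],    # illustrative kernel-width candidates
}
names = sorted(param_space)
grid = [dict(zip(names, values))
        for values in product(*(param_space[n] for n in names))]
```

The grid grows multiplicatively with every added hyper-parameter (here 3 × 2 = 6 combinations), which is the curse of dimensionality mentioned above.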
In this research, the grid search algorithm is used to optimise the hyper-parameters of the ANN and SVM models. Although several more advanced optimisation methods exist, Burges et al. (1999), Shawe-Taylor and Cristianini (2004), they are iterative processes that are not easy to parallelise.
Hyper-parameters can also be selected by experience, expertise and a priori knowledge of the problem, Cherkassky and Mulier (2007), Scholkopf and Smola (2001), Vapnik (2000). However, manually selecting hyper-parameters may lead to many repeated trial-and-error attempts before an optimum is reached, and such methods are only suitable for experts. Since the hyper-parameters of the ANN and the SVM are independent, the grid search algorithm can search the hyper-parameters of both models simultaneously.
Algorithm 3.1 Pseudo-code of grid search for hyper-parameters, where Tin, Tout, n and Θ are the input matrix, the output matrix, the number of labelled data used for the search and the set of candidate hyper-parameters, respectively

1: function GRIDSEARCH(Tin, Tout, n, Θ)
2:   if SizeOf(Tin) > n then
3:     S ← randomly select n samples from [Tin, Tout]
4:   else
5:     S ← [Tin, Tout]
6:   end if
7:   S = {(tin_1, tout_1), (tin_2, tout_2), ..., (tin_m, tout_m)}   ▷ tin_i ∈ Tin, tout_i ∈ Tout
8:   k ← 5
9:   partition S into S_1, S_2, ..., S_k
10:  A ← machine learning algorithm
11:  for each θ ∈ Θ do
12:    for i = 1 to k do
13:      h_{i,θ} = A(S \ S_i; θ)   ▷ limits on under-fitting and over-fitting are applied
14:    end for
15:    error(θ) ← (1/k) Σ_{i=1}^{k} L_{S_i}(h_{i,θ})
16:  end for
17:  θ_best ← argmin_θ error(θ)
18:  h_{θ_best} ← A(S; θ_best)
19: end function
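The control flow of Algorithm 3.1 can be sketched as runnable Python. The learning algorithm A and the loss L here are toy stand-ins (a constant-prediction "model" scored by squared error), chosen only so the subsampling, 5-fold loop and argmin step can be executed; they are not the ANN or SVM models of this research:

```python
import random

def grid_search(samples, n, thetas, k=5):
    # Steps 2-6: use at most n randomly chosen labelled data points.
    if len(samples) > n:
        s = random.sample(samples, n)
    else:
        s = list(samples)
    # Step 9: partition S into k folds.
    folds = [s[i::k] for i in range(k)]
    # Steps 11-16: score each hyper-parameter set by k-fold CV error.
    errors = {}
    for theta in thetas:
        losses = []
        for i in range(k):
            train = [x for j in range(k) if j != i for x in folds[j]]
            model = fit(train, theta)       # h_{i,theta} = A(S \ S_i; theta)
            losses.append(loss(model, folds[i]))
        errors[theta] = sum(losses) / k     # error(theta)
    # Steps 17-18: pick the best theta, then retrain on all of S.
    theta_best = min(errors, key=errors.get)
    return fit(s, theta_best), theta_best

# Toy stand-ins for A and L (assumptions, not the thesis models):
def fit(train, theta):
    mean = sum(y for _, y in train) / len(train)
    return lambda x: theta * mean           # predict a scaled constant

def loss(model, fold):
    return sum((y - model(x)) ** 2 for x, y in fold) / len(fold)
```

For a dataset whose targets are all 2.0, `grid_search(samples, 1000, (0.5, 1.0))` selects θ = 1.0, since the unscaled mean gives zero validation error.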
The number of data samples in the traffic link models varies from hundreds to thousands. Running the grid search algorithm on the complete dataset is time-consuming when the number of labelled data is significantly large. In this research, a maximum of 1000 labelled data points are randomly chosen from the dataset for the hyper-parameter optimisation process. The hyper-parameters found by the grid search algorithm are subsequently used to train the ANN and SVM models on the complete dataset. The mean square error (MSE) is used to assess the performance of the ANN and SVM models. The grid search algorithm used in this research is described in Algorithm 3.1.
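The MSE criterion used to assess the trained models is the mean of the squared residuals; a minimal sketch (variable names are illustrative):

```python
# MSE = (1/n) * sum_i (y_i - yhat_i)^2
def mean_squared_error(y_true, y_pred):
    assert len(y_true) == len(y_pred), "need one prediction per target"
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Example: residuals of 1 and 3 give MSE (1 + 9) / 2 = 5.0
mse = mean_squared_error([1.0, 2.0], [2.0, 5.0])
```

A lower MSE on held-out data indicates a better-performing model, which is how the ANN and SVM candidates are compared here.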