Prediction of Marshall design parameters of asphalt mixtures via machine learning algorithms based on literature data


Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=trmp20

Road Materials and Pavement Design

ISSN: (Print) (Online) Journal homepage: www.tandfonline.com/journals/trmp20


Mert Atakan & Kürşat Yıldız

To cite this article: Mert Atakan & Kürşat Yıldız (2024) Prediction of Marshall design parameters of asphalt mixtures via machine learning algorithms based on literature data, Road Materials and Pavement Design, 25:3, 454-473, DOI: 10.1080/14680629.2023.2213774

To link to this article: https://doi.org/10.1080/14680629.2023.2213774

Published online: 23 May 2023



Department of Civil Engineering, Faculty of Technology, Gazi University, Ankara, Turkey

ABSTRACT

Previous studies have achieved accurate predictions for Marshall design parameters (MDPs), but their limited data and input variables might restrict generalisation. In this study, machine learning (ML) was used to predict MDPs with more generalised models. To achieve this, a dataset was collected from six different papers. Inputs were material properties and their ratios in the mixture, while target features were six MDPs used in mixture design. Four ML algorithms were used: linear regression, polynomial regression, k nearest neighbour (KNN) and support vector regression (SVR). Also, the cross-validation (CV) method was used to assess the generalisation capability of the models. The accuracy of the SVR was the highest; however, its performance dropped considerably in nested CV. Therefore, KNN was recommended owing to its second-highest performance. The results demonstrated that predicting MDPs from material properties alone is possible and promising for use in mixture design.

Abbreviations: ANN: artificial neural network; BC: bitumen content; BP: bitumen penetration (1/10 mm); CV: cross-validation; DEM: discrete element method; GA: genetic algorithm; Gmb: bulk specific gravity of mixture; Gmm: maximum specific gravity of mixture; Gsb: bulk specific gravity of aggregate; KNN: k nearest neighbour; LA: Los Angeles abrasion; LR: linear regression; MARS: multivariate adaptive regression spline; MDP: Marshall design parameter; MF: Marshall flow; MQ: Marshall quotient (kN/mm); MS: Marshall stability; NMAS: nominal maximum aggregate size; NoB: number of blows; PI: penetration index; PR: polynomial regression; R²: coefficient of determination; SP: softening point (°C); SVR: support vector regression; UPVT: ultrasonic pulse velocity–time; Va: air voids percentage; VFA: voids filled with asphalt; VMA: voids in mineral aggregate; WA: water absorption

ARTICLE HISTORY

Received 21 November 2021 Accepted 9 May 2023

KEYWORDS

Asphalt mixture design; machine learning; Marshall design; prediction model; virtual design

1 Introduction

The durability and performance of asphalt pavement are highly affected by the mechanical and volumetric properties of the mixture (Sebaaly et al., 2018). Asphalt mixture design is carried out to provide the required mechanical and volumetric characteristics. In other words, mixture design is the most significant factor affecting road performance.

Today, the Marshall, Hveem and Superpave methods are the most commonly used design methods in the world (Jiang et al., 2018). These methods differ in aspects such as compaction style, specimen size and the mechanical tests applied to the specimen. Basically, however, mixture design starts by producing various asphalt mixture specimens with different binder contents and gradations. Then, it is determined which of the specimens meet the necessary performance criteria. These methods are very time-consuming, demanding and expensive. For example, a Superpave mixture design might take approximately 7.5 working days (Ozturk & Emin Kutay, 2014). That is why a prediction-based approach in mixture design is vitally important. Accordingly, predicting the physical and mechanical properties of the asphalt mixture from its material characteristics without excessive laboratory work is essential. In this regard, researchers have employed two basic approaches: numerical simulations (e.g. the discrete element method (DEM), the finite element method and user-defined algorithms) and soft computing methods such as machine learning (ML) (Liu et al., 2022).

CONTACT Mert Atakan mertatakan@gazi.edu.tr

There have been many studies using numerical methods to predict characteristics of asphalt concrete such as air voids, density and rutting. Li and Wang (2017) performed a Marshall design by predicting the Marshall characteristics of virtual specimens produced in a DEM simulation. Similarly, Shen and Yu (2011) used DEM to predict the voids in mineral aggregate (VMA) of asphalt concrete. Jin et al. (2022) used a user-defined algorithm to produce the internal structure of asphalt concrete based on aggregate contacts. Also, a large and growing body of literature has investigated physics engine simulation to produce virtual asphalt specimens. Garcia-Hernandez et al. (2021) produced virtual Marshall specimens via a physics engine called Nvidia PhysX to predict the air void content of asphalt specimens. Likewise, Komaragiri et al. (2021) used the Bullet physics engine to simulate gyratory compaction and predict the density of asphalt specimens.

ML basically means extracting knowledge from data (Müller & Guido, 2016). To achieve this, the data are arranged as a table in which columns represent features and each row represents a single observation or case (Theobald, 2017). Then, input and target variables are selected. After that, the data are split into training and test sets. Once the data are split, a prediction model is trained on the training data via various learning algorithms. This model can predict target values from the input values. Finally, the performance of the model is measured on the test data by comparing the predicted values with the real values.

Many studies have been done to predict the mechanical and physical properties of asphalt mixture using either ML algorithms or soft computing techniques such as artificial neural network (ANN), genetic algorithm (GA) and fuzzy logic. Majidifard et al. (2019, 2020) built a gene expression and a deep learning ML model to predict the rut depth and the fracture energy. They were therefore able to make asphalt mixture designs based on these predictions. Miani et al. (2021) created an ANN model to predict basic characteristics of the asphalt such as stiffness modulus, Marshall stability (MS), Marshall flow (MF) and air voids percentage (Va). Some other previous studies are listed in Table 1.

Volumetric characteristics of asphalt specimens (e.g. Va, VMA, voids filled with asphalt (VFA)) have been used to predict mechanical properties such as MS, MF and stiffness in a considerable amount of literature. For instance, Aksoy et al. (2012) used some Marshall design parameters (MDPs) such as Va, density and VMA as inputs to predict MS, MF and Marshall quotient (MQ). These parameters are well correlated with Marshall test results; however, they are obtained through experiments. Unless Marshall specimens are produced, MS, MF and MQ cannot be predicted with this type of model. In other words, although these kinds of models have made successful predictions, they are not sufficient to reduce laboratory labour and time. Thus, it is necessary to build a prediction model that does not require any experimental input variable such as Va, VMA or VFA. More specifically, a better prediction model should use material properties, such as bitumen type and aggregate gradation, as inputs. In this way, it is possible to predict all MDPs without producing any specimen. For example, Azarhoosh and Pouresmaeil (2020), Nguyen et al. (2019), Sebaaly et al. (2018), Khuntia et al. (2014) and Ozgan (2009) did not use volumetric parameters as inputs. Therefore, their models could be used in the future to predict design parameters with high accuracy and without any laboratory work. However, these studies might not generalise with high accuracy because of limited input features, limited feature ranges (e.g. only a couple of bitumen or aggregate types) or small dataset size. To sum up, some of the previous studies used MDPs as input variables, which cannot reduce laboratory labour. Others created high-performance prediction models, but their models might not generalise. In other words, they may not work properly with other bitumen and aggregate types or in another laboratory environment.


[Table 1: table body not recoverable. Surviving feature labels: filler/bitumen ratio; binder type; binder ratio; Marshall test temperature; exposure time to test temperature; ultrasonic pulse velocity–time of sample; sample volume; sample height; sample production method; number of blows in compaction; saturated surface dry specific gravity; Gmb; MS; MF; MQ; Va; VMA; VFA; other additives type/ratio; repeated creep test properties]


In this study, one of the aims is to establish a prediction model whose inputs are composed merely of material properties, in order to produce virtual Marshall specimens without any laboratory effort in the future. Therefore, more input variables are considered simultaneously than in previous studies, and some of them, such as Los Angeles abrasion (LA), penetration index (PI), softening point (SP) and bitumen penetration (BP), are used for the first time. The other aim is to obtain a more generalised model. To achieve this, datasets from different studies were combined. In this way, the constructed dataset contains data from different laboratories and is more representative.

of the MS and MF values. Once these two columns were added, missing values were imputed using the DataWig library, which is explained in detail in Section 2.1.2. After that, another column named VFA was added to the dataset. VFA was calculated from the VMA and Va values. One point should be noted here: because some of the Va and VMA values were missing at the beginning, the VFA column could not be added before the missing values were imputed. Instead, it was added after imputation.
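The derived columns mentioned here can be sketched as follows. This is a minimal illustration that assumes the conventional definitions MQ = MS / MF and VFA = 100 × (VMA − Va) / VMA; the function names and the sample values are hypothetical and are not taken from the dataset.

```python
def marshall_quotient(ms_kn: float, mf_mm: float) -> float:
    """Marshall quotient (kN/mm): stability divided by flow."""
    return ms_kn / mf_mm


def voids_filled_with_asphalt(vma_pct: float, va_pct: float) -> float:
    """VFA (%): share of the VMA that is occupied by binder."""
    return 100.0 * (vma_pct - va_pct) / vma_pct


# Hypothetical specimen, for illustration only.
mq = marshall_quotient(ms_kn=12.0, mf_mm=3.0)
vfa = voids_filled_with_asphalt(vma_pct=16.0, va_pct=4.0)
print(mq, vfa)  # 4.0 75.0
```

Because VFA depends on both VMA and Va, a missing value in either column would propagate into VFA, which is consistent with computing it only after imputation.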

Once the dataset was completed, input and target features were determined. Next, the data were divided into train and test sets without any scaling. Then, linear regression (LR) was applied to the training data. For the other models, however, the data were scaled using the standard scaler function in the scikit-learn library before training. In the training process, two different approaches were employed. In the first, the dataset was randomly divided into train and test sets with the train–test split function, and the models were trained with the train set. In the second, the dataset was divided into more than one train and test group using k-fold cross-validation (k-fold CV), and coefficient of determination (R²) values were calculated for each model. The differences between these approaches are explained further in the related sections.
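The first approach might look roughly like the sketch below. It uses synthetic data rather than the study's dataset, and it assumes that the scaler is fitted on the training set only and then reused on the test set; the text does not state this detail, so it is an assumption of the sketch. KNN stands in for "the other models".

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 100 specimens, 4 input features, 1 target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear regression is trained on the unscaled data.
lr = LinearRegression().fit(X_train, y_train)
r2_lr = lr.score(X_test, y_test)

# Other models: fit the scaler on the training set, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
knn = KNeighborsRegressor().fit(scaler.transform(X_train), y_train)
r2_knn = knn.score(scaler.transform(X_test), y_test)

print(f"LR R2 = {r2_lr:.3f}, KNN R2 = {r2_knn:.3f}")
```

Ordinary least squares gives the same predictions whether or not the inputs are standardised, whereas distance-based and kernel-based models such as KNN and SVR are sensitive to feature scale.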

2 Getting data from each study for the chosen features

3 Joining the data from all studies into one dataset

First, we determined 14 specimen features in total, considering previous studies and important features that might affect the mixture design results. In other words, basic material characteristics that can change MDPs such as Va, MS or MF were used as input features. Also, when choosing input features, we took particular care not to choose an MDP as an input. Although MDPs like Va might correlate highly with MS and MF, the real aim of this study was to predict all MDPs without producing any Marshall specimens. That is why all six MDPs were chosen as target features. Once all features were determined, three additional features (i.e. PI, VFA and MQ) were generated from the existing column values. These features are shown in Table 3.
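As an illustration of how a feature such as PI can be generated from existing columns, the sketch below uses the widely cited Pfeiffer and Van Doormaal expression based on penetration at 25 °C and softening point. The text here does not state which PI formula was used, so this choice is an assumption, as are the function name and the input values.

```python
import math


def penetration_index(bp_dmm: float, sp_celsius: float) -> float:
    """Penetration index from penetration (1/10 mm) and softening point (deg C).

    Assumed formula (Pfeiffer and Van Doormaal):
    PI = (1952 - 500 log10(P) - 20 SP) / (50 log10(P) - SP - 120)
    """
    log_pen = math.log10(bp_dmm)
    return (1952.0 - 500.0 * log_pen - 20.0 * sp_celsius) / (
        50.0 * log_pen - sp_celsius - 120.0
    )


# Hypothetical binder: penetration 100 dmm and softening point 47.6 deg C
# give a PI that is approximately zero.
pi_value = penetration_index(bp_dmm=100.0, sp_celsius=47.6)
```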


Figure 1. Workflow diagram of the current study.

The challenges we encountered when creating the data set are given below:

• Some of the researchers shared the gradation of aggregates as a graph. We used a software package named GetData Graph Digitizer to obtain the percentages of coarse and fine aggregates. This software can scale the image and redraw a readable graph over it. In this way, the necessary values can be read from the graphs.

• While some researchers defined coarse aggregate as larger than 4.75 mm, others defined it as larger than 2.36 mm. We assumed coarse aggregate to be larger than 2.36 mm. That is why, when creating the dataset, all data obtained from the studies were modified according to this assumption.


Table 2. Number of specimens of the studies that comprise this study’s dataset.

Reference                              Number of specimens
Mirzahosseini et al. (2011)            118
Azarhoosh and Pouresmaeil (2020)       90
Nguyen et al. (2019)                   60
Baldo et al. (2018)                    60
Aksoy et al. (2012)                    63
Tapkin et al. (2010)                   16

Table 3. Features of the created dataset.

No.   Feature
1     Nominal maximum aggregate size (NMAS) (mm)
6     BP (1/10 mm)
10    BC (%)
17    Gsb (g/cm³)

a Features generated from other columns.

• In some studies, bitumen content (BC) was presented by weight of the mixture (e.g. Baldo et al., 2018). These values were converted to BC by weight of the aggregate to provide equivalency among the different studies.

• In Baldo et al. (2018), the number of blows used in compaction was not presented. It was stated only that the specimens were prepared according to EN 12697-30. This standard states that the number of blows should be between 25 and 100, but it also states that 50 blows are generally used. Therefore, the number of blows for compaction was assumed to be 50 for the data from Baldo et al. (2018).

• Units were converted to the same unit for each feature.
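Two of the harmonisation steps in the list above reduce to simple arithmetic, sketched below. The function names and sample values are hypothetical. The conversion assumes that BC by weight of mixture is binder mass over total mixture mass, and that the coarse fraction is the material retained on the 2.36 mm sieve.

```python
def bc_mixture_to_aggregate(bc_mix_pct: float) -> float:
    """Convert bitumen content from % by weight of mixture
    to % by weight of aggregate."""
    return 100.0 * bc_mix_pct / (100.0 - bc_mix_pct)


def coarse_aggregate_pct(passing_2_36_mm_pct: float) -> float:
    """Coarse aggregate (%) as the share retained on the 2.36 mm sieve."""
    return 100.0 - passing_2_36_mm_pct


# Hypothetical values, for illustration only.
bc_agg = bc_mixture_to_aggregate(5.0)
coarse = coarse_aggregate_pct(42.0)
print(round(bc_agg, 2), coarse)  # 5.26 58.0
```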

2.1.2 Handling missing values

There were some missing values in the dataset because some features were not presented in the studies we used. The DataWig library was used to impute missing values. This approach employs automatic hyperparameter tuning in deep learning feature extraction. Therefore, even users who do not have a deep learning background can benefit from the library (Biessmann et al., 2019).

We used one of the DataWig functions, named ‘SimpleImputer.complete’, to impute missing values. This function fits an imputation model for each column by choosing all other columns as inputs. The statistical descriptions of the data with missing values and after imputation are presented in Tables 4 and 5.
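The idea of imputing each column from all the others can be illustrated without DataWig. The sketch below swaps in scikit-learn's IterativeImputer, which also models every column with missing values as a function of the remaining columns but uses a regression estimator rather than deep learning. It is a stand-in for illustration and not the procedure used in this study, and the data are synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Synthetic table: the third column depends on the first two.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))
data[:, 2] = data[:, 0] + data[:, 1]

# Hide every tenth value of the third column.
data[::10, 2] = np.nan
n_missing = int(np.isnan(data).sum())

# Each column with gaps is predicted from the other columns.
imputer = IterativeImputer(random_state=0)
completed = imputer.fit_transform(data)

print(n_missing, int(np.isnan(completed).sum()))
```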

Once missing values were imputed, the dataset was complete. However, some features, such as water absorption (WA) and bulk specific gravity of aggregate (Gsb), were not used in the models since they had a high number of missing values in the first place. In other words, using these features might have led to high bias in the models.

2.1.3 Splitting the data as train and test sets

The ‘train_test_split’ function in the scikit-learn library was used to split the data into train and test sets. Train and test sizes were chosen as 0.67 and 0.33. The random state number, which provides the same split every time the code runs, was selected as 0.
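A minimal sketch of this split is given below, using placeholder arrays of 100 hypothetical specimens instead of the study's data. Only the split proportions and the random state come from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 hypothetical specimens with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# test_size=0.33 leaves the remaining 0.67 of the rows for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# The same random_state reproduces the same partition on every run.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.33, random_state=0)

print(len(X_train), len(X_test))  # 67 33
```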


Table 4. Statistical description of the data with missing values.

NMAS (mm) | Coarse agg (%) | Filler (%) | BC (%) | Va (%) | VMA (%) | MS (kN) | MF (mm) | LA (%) | WA (%) | Gsb (g/cm³) | BP (1/10 mm) | SP (°C) | NoB | MQ | PI

Table 5. Statistical description of the data with imputed values.

NMAS (mm) | Coarse agg (%) | Filler (%) | BC (%) | Va (%) | VMA (%) | MS (kN) | MF (mm) | LA (%) | WA (%) | Gsb (g/cm³) | BP (1/10 mm) | SP (°C) | NoB | MQ | PI


2.1.4 Scaling the data

The data were scaled before training for all models except LR. We used the standard scaler in the scikit-learn library, which standardises each feature so that it has a mean of 0 and a standard deviation of 1. The standard scaled value $z$ of a sample $x$ was calculated as in Equation (1):

$$z = \frac{x - \bar{x}}{\sigma} \tag{1}$$

where $\bar{x}$ is the mean of the samples and $\sigma$ is the standard deviation of the samples.
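The standardisation can be reproduced by hand in a few lines. The sketch below uses only the Python standard library and hypothetical values. It uses the population standard deviation, which is also what scikit-learn's StandardScaler uses.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = fmean(values)
    sigma = pstdev(values)
    return [(v - mean) / sigma for v in values]


# Hypothetical feature column, for illustration only.
scaled = standard_scale([2.0, 4.0, 6.0, 8.0])
print([round(v, 3) for v in scaled])  # [-1.342, -0.447, 0.447, 1.342]
```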

2.2 Model performance assessment

2.2.1 Prediction performance

To assess the prediction performance of the models, we used the score function in the scikit-learn library. This function returns the coefficient of determination, namely the R² value. It was calculated as in Equation (4), where RSS is the residual sum of squares and TSS is the total sum of squares. In Equation (2), $y_i$ is the ith value to be predicted, $f(X_i)$ is the predicted value of $y_i$, and $n$ is the upper limit of summation. In Equation (3), $y_i$ is the ith value in the sample, $\bar{y}$ is the mean value of the sample and $n$ is the upper limit of summation.

$$\mathrm{RSS} = \sum_{i=1}^{n} \left(y_i - f(X_i)\right)^2 \tag{2}$$

$$\mathrm{TSS} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2 \tag{3}$$

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \tag{4}$$

The value of R² can be at most 1.00, and it can be negative when the prediction performance is poor. The closer this value is to 1.00, the higher the prediction performance.
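The behaviour of R² at its extremes can be checked with a short hand-written implementation of 1 − RSS/TSS. The function name and the sample values are illustrative only.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination computed as 1 - RSS / TSS."""
    y_mean = sum(y_true) / len(y_true)
    rss = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    tss = sum((y - y_mean) ** 2 for y in y_true)
    return 1.0 - rss / tss


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_manual(y_true, y_true)              # exact predictions
mean_only = r2_manual(y_true, [2.5] * 4)         # always predicts the mean
poor = r2_manual(y_true, [4.0, 3.0, 2.0, 1.0])   # reversed predictions

print(perfect, mean_only, poor)  # 1.0 0.0 -3.0
```

A model that always predicts the sample mean scores exactly 0, so a negative value means that the model performs worse than that baseline.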

CV is used to assess the generalisation performance of the prediction model. It is more stable and comprehensive than the basic train/test split method. One of the most common CV methods is k-fold CV. The number k is chosen by the user and is commonly set to 5 or 10 (Müller & Guido, 2016). We used the cross_val_score function in the scikit-learn library to perform CV. The number k was selected as 5 except for the support vector regression (SVR) model. In addition, the random state was 1 and the shuffle option was true. Once the parameters were decided, this function divided the data into five parts, which are called folds. Then, five separate training runs, which are called splits, were carried out (Figure 2). For instance, in split 1, the training data were composed of folds 2–5 and the test data were composed of fold 1. Finally, the CV score was calculated as the average of the accuracy scores of the five models established in the splits.
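A five-fold CV run of this kind might be set up as in the sketch below. The data are synthetic and KNN is used only as an example estimator. Passing the shuffle and random state settings through a KFold object is an assumption about how the options described above were supplied.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data: 100 specimens, 3 input features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Five shuffled folds with a fixed random state.
cv = KFold(n_splits=5, shuffle=True, random_state=1)

# One score per split; for regressors the default score is R2.
scores = cross_val_score(KNeighborsRegressor(), X, y, cv=cv)

# The CV score is the average over the five splits.
cv_score = scores.mean()
fold_sizes = [len(test_idx) for _, test_idx in cv.split(X)]
print(fold_sizes, round(cv_score, 3))
```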

algorithm, it made up 6 × 6 = 36 combinations. Since five-fold CV was also used, 36 × 5 = 180 models were built to choose the best parameters. Then, the model accuracy is calculated in every model and
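The model count quoted above follows directly from the size of the search grid. The sketch below enumerates a hypothetical grid of two hyperparameters with six candidate values each; the parameter names and values are placeholders and are not the ones used in the study.

```python
from itertools import product

# Hypothetical grids: two hyperparameters, six candidate values each.
grid_a = [0.01, 0.1, 1, 10, 100, 1000]
grid_b = [0.001, 0.01, 0.1, 1, 10, 100]
n_folds = 5

# Every pairing of the two grids is one candidate configuration.
combinations = list(product(grid_a, grid_b))

# Each configuration is trained once per CV fold.
n_fits = len(combinations) * n_folds

print(len(combinations), n_fits)  # 36 180
```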
