1 Introduction

The world is facing an undeniable plastic crisis. In 2015, mismanaged plastic waste was estimated at between 66 and 90 million metric tons (Mt) [1], corresponding to 17–23% of the plastic waste generated worldwide [2]. The demand for plastics is expected to continue rising because of their low cost and functional properties, such as water resistance, flexibility, and durability. Nevertheless, these very properties are the cause of their high persistence in the environment.

To cope with plastic waste, many management alternatives have been developed, such as physical recycling, energy recovery, and resource recovery, to name a few. Prior to 2018, most recovered plastics were shipped to China for processing. In 2017, however, China imposed an unprecedented ban on importing most plastic waste. Thus, finding new alternatives to deal with the mounting plastic waste at the source has become imperative. This urgency has prompted the scientific community to look for innovative methods, such as incorporating plastic waste into construction and civil works.

One treatment alternative that has gained interest in the last few years is adding plastic waste to flexible pavements, as evidenced by the growing number of publications on the topic: in the ScienceDirect database, such studies increased sharply from 81 articles in 2011 to 621 in 2021. Researchers have reported two prevalent techniques for adding plastic to flexible pavements: the wet and dry techniques. The wet technique combines the plastic waste with the hot bitumen, while the dry technique adds plastic waste to the hot aggregates. The principal characteristic of the wet technique is that it modifies the bitumen, which in turn affects the final pavement performance. The advantages of the dry technique are that it permits larger quantities of plastic additives and, as some studies have concluded, produces better results than the wet mixing technique [3, 4]. Table 1 summarizes the advantages and disadvantages of both processes.

Table 1 Advantages and disadvantages of wet and dry processes

Regardless of the addition technique, plastic waste in pavement offers some potential benefits. First, it reduces the plastic amount in landfills and the environmental impact of road development [5]. Second, it could reduce the cost of pavement construction because the requirement for bitumen or aggregates is reduced [6]. And last, the engineering properties of the pavement can be enhanced, especially those related to rutting and cracking resistance [7, 8]. Nevertheless, the extent of these benefits varies depending on the plastic type employed, the mixing parameters, and the asphalt mix. Thus, developing a model that predicts the potential effects on plastic asphalt mixes (PAM), modified through the wet or dry technique, would be valuable because it can provide confidence in the plastic addition benefits.

Several articles have proposed models for predicting asphalt mixture properties [9–15], but only a few have been proposed for mixtures modified with plastic, as it is a relatively new research topic. A reliable PAM model could not only close the gap between research and application but also increase the technical reliability of this technology.

To the authors’ knowledge, only four articles have proposed PAM predictive models. Azarhoosh et al. [16] developed an artificial neural network (ANN) model on asphalt mixtures modified through the wet technique with HDPE and plastic bottles in a range of addition between 2 and 10%. The variable predicted was the accumulated strain. Tapkin et al. [17] also attempted to predict the accumulated strain using ANN for samples modified with PP through the wet mixing technique at 3–6% of addition. The other two articles came from the same authors, Tapkin and Çevik, and predicted the Marshall flow (MF), Marshall stability (MS), and Marshall quotient (MQ) [27, 28]. Table 2 summarizes the mentioned models. The range of the training data for all four models was limited to PP/PE/PET, wet mixing technique, and bitumen type 50/70.

Table 2 ML models proposed for PAM

Although the studies reviewed in Table 2 extend the modeling and knowledge of plastic asphalt mixtures, they present some issues that limit their application in real scenarios. The models can only predict reliably when the features to evaluate are within the range of the training set, which makes them too specific for general use. Also, the data were limited to each study; therefore, they do not account for inter-laboratory variability and restrict the sample size. Small datasets increase the risk of overfitting, and their validation tends to be unreliable when the hold-out method is used. The hold-out validation method reserves a small percentage of the data to test the models, while the remaining data are used to train them. This method is especially useful when the dataset is large and the validation process must be computationally cheap. However, for small datasets, other alternatives, such as bootstrap and cross-validation, are more appropriate [29].

With these limitations in mind, the present article attempts to propose a more generalized model that can be applied to different plastic types, mixing techniques, and aggregate gradations. Yet this process is not straightforward because of the extreme complexity of, and interactions among, the variables. For instance, a first attempt to examine the relationship of the plastic type and mixing process with the air voids property is depicted in Fig. 1. Air voids refer to the air pockets between coated aggregates. As Fig. 1 shows, a clear relation is difficult to deduce due to the high variability among observations. Therefore, other variables must be considered to suggest an appropriate model. At the plastic level alone, these variables could be mixing technique, plastic type, plastic size, and plastic pretreatment. Other pavement properties might also influence the final PAM properties, so factors such as aggregate type, mixture gradation, bitumen content, and bitumen type might be considered too.

Fig. 1 Air voids effect after one percentage addition of plastic waste

Nevertheless, collecting these data in laboratory or field experiments is a daunting task due to the numerous measurements and tests. Moreover, its complexity is further compounded due to the possible combinations that must be tested. As an alternative, data mining techniques can be used to scrape data from trusted published studies. The significant advantages of this approach are the ample data range, the reliable nature of the data, and the account of inter-variability errors; however, data size might not be enough for a generalized ML model. Thus, instead of proposing a generalized final model, the present article will aim to measure the impact of each feature on the most accurate model trained with the data available and identify the optimal set of features to predict the target properties of the PAM. This understanding could help future studies to improve their data collection and results reporting, support the formulation of a generalized model, and in the end, facilitate the adoption of this technology in real scenarios.

The article focuses on the most basic properties of asphalt mixtures: air voids, Marshall stability (MS), Marshall flow (MF), indirect tensile strength (ITS), and tensile strength ratio (TSR). These properties were selected because they are the most frequently measured by researchers, so the sample size is large enough for reliable inferences. Although these properties are considered basic, they are still relevant in the field, as shown by the number of articles that have repeatedly measured them and their validity in the Marshall mix design [30]. A low air void value improves the rutting, cracking, and water damage resistance of pavements, while a high air void value increases the permeability of the mixture [31]. Regarding the mix design, air voids should be within a limit, usually between 2 and 4%, depending on the type of asphalt mix and intended application [32]. The MS and MF are simple properties measured with inexpensive tests on the Marshall apparatus [33]. This equipment applies a diametrical load to a specimen at a constant rate (50 mm/min) [34] and measures the load at which the sample breaks. This load is the MS, while the deformation at the breaking point is the MF. The stability indicates the pavement resistance to stress, so it reflects the stiffness of the sample. The MF, in turn, reveals how plastic or brittle the mixture is. The ITS and TSR, as stated by ASTM D6931, determine the asphalt-mix resistance to cracking and moisture, respectively. In the ITS test, a vertical load is applied to a core sample at a 50 mm/min deformation rate. When the sample breaks, the load is recorded and used for calculating the ITS. Higher ITS correlates with higher cracking resistance [35]. The TSR is the ratio of the ITS of a wet (saturated) sample to the ITS of a dry sample. This measure reflects how well the mixture maintains its cohesion in the presence of water. The higher the TSR, the better the mix will resist water damage.
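As a compact restatement of the TSR definition, and assuming the usual convention that the strength of the water-conditioned sample is the numerator, the ratio can be written as follows. The symbols \({\mathrm{ITS}}_{\mathrm{wet}}\) and \({\mathrm{ITS}}_{\mathrm{dry}}\) are introduced here only for illustration.

$$\mathrm{TSR}=\frac{{\mathrm{ITS}}_{\mathrm{wet}}}{{\mathrm{ITS}}_{\mathrm{dry}}}\times 100\%$$

For instance, a hypothetical mix with \({\mathrm{ITS}}_{\mathrm{dry}}=0.80\) MPa and \({\mathrm{ITS}}_{\mathrm{wet}}=0.68\) MPa would give a TSR of 85%.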

2 Methods

2.1 Data Collection

To collect data, the method described by Pickering and Byrne [36] for the Systematic Quantitative Literature Review (SQLR) was followed. SQLR surveys all the relevant literature, facilitates selecting articles with reproducible and valid results, and supports quantitative and qualitative data. Besides, it generates a database that can be updated with new studies so that the initial models generated from the database can be updated or retrained.

An SQLR consists of two phases: article selection and data curation. In the first phase of the SQLR, the keywords pertaining to the search were defined. The keywords searched were 'plastic waste', 'bitumen', 'asphalt', and 'pavement'. The databases searched were Web of Science and Scopus. Other conditions were set to limit the scope to articles published between 2009 and 2020, written in English, and not conference proceedings. In total, 90 articles complied with these requirements. The filter process was summarized in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [37], included in Fig. 10.

In the second phase of the SQLR, the data were tabulated for the model’s training. A distinction between dependent variables (predicted or target variables) and independent variables (features) was made within these data. The predicted variables are the effect of plastic addition on specific asphalt properties (air voids, MS, MF, ITS, and TSR). The features are the independent variables employed to train the ML estimators. The features were either numerical or categorical. Numerical features were aggregates gradation, aggregates absorption, bitumen content, bitumen penetration, plastic content, and plastic size. Categorical features included plastic resin, mixing type (binary: wet or dry), and material replacement. Material replacement refers to the samples where the plastic addition replaced a percentage of aggregates or bitumen. Among these variables, the aggregates gradation is included to account for the asphalt mix type, while the bitumen penetration is used to account for the bitumen type. The collected and tabulated data are available in Online Resource 1. An additional data exploration analysis was included to summarize these tabulated data.

2.2 Data Cleaning

After tabulation, the data underwent a pre-processing or cleaning phase. This stage aimed to prepare the data for model training. The cleaning phase is essential because some models, for example, those that employ gradient descent or distance algorithms, require data scaling and categorical variables to be transformed into dummy variables.

Feature engineering was performed on the plastic addition variable. This process was needed because some articles reported it as a percentage of the bitumen content, while others reported it as a percentage of the aggregate content. This feature was expressed as a ratio over bitumen weight. To transform the observations reported over aggregates weight, they were divided by the bitumen content. Although this transformation is not exact, it provides a close approximation because the aggregates correspond to at least 94% of the asphalt mix. Additional feature engineering was applied to the plastic content and plastic type features. These variables were multiplied and combined into one feature, reducing the number of categorical features to three.
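The unit conversion described above can be illustrated with a short sketch. The function name, argument names, and the convention that the bitumen content is given as a percentage of the total mix weight are assumptions made for this example; they are not taken from the tabulated data.

```python
def plastic_pct_of_bitumen(plastic_pct, basis, bitumen_pct_of_mix):
    """Express plastic content as a percentage of bitumen weight.

    plastic_pct        -- reported plastic content (%)
    basis              -- "bitumen" or "aggregates": what the percentage refers to
    bitumen_pct_of_mix -- bitumen content as % of total mix weight (assumed convention)
    """
    if basis == "bitumen":
        return plastic_pct  # already on the target basis
    if basis == "aggregates":
        # Approximation from the text: treat aggregate weight as roughly the mix
        # weight and divide by the bitumen content expressed as a fraction.
        return plastic_pct / (bitumen_pct_of_mix / 100.0)
    raise ValueError("basis must be 'bitumen' or 'aggregates'")


# Hypothetical observation: 0.5% plastic by aggregate weight, 5% bitumen content.
converted = plastic_pct_of_bitumen(0.5, "aggregates", 5.0)
print(round(converted, 2))
```

Under these hypothetical values the approximate result is about 10% of the bitumen weight. If the mix consisted only of aggregates and bitumen, the exact figure would be slightly lower, which is the small error the approximation accepts.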

Some particular features, which were not included in the data collection, could cause outliers in the predicted variable. Some examples are the type of aggregate or filler, sample dimensions, source of bitumen, and aggregates nature. Although including these features may produce more accurate models, it was not possible because they were not commonly reported in the articles. So, it was necessary to identify outlier data points and discard them from the training data. The technique employed was the interquartile range. In this method, the interquartile range is defined as the difference between the first and third quartile. Any observation that is more than one and a half interquartile ranges higher or lower than the upper/lower quartile is considered an outlier and removed from the dataset. Other unusual values caused by data entry errors were also identified and corrected.
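A minimal sketch of the interquartile-range rule is given below. It relies only on the Python standard library; the quartile interpolation method (`"inclusive"`) is an assumption, since the text does not state which quantile definition was used, and other definitions can move the fences slightly.

```python
import statistics


def remove_iqr_outliers(values, k=1.5):
    """Keep only values within [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical measurements with one obvious outlier.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(remove_iqr_outliers(sample))
```

In practice the rule would be applied per predicted variable, so that an observation is discarded only from the dataset in which it is extreme.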

Missing data were replaced by applying an iterative imputer. The iterative imputer predicts the missing data as a function of the other variables in a round-robin fashion. The advantage of this imputer is that it restores the variability of the missing data and presents robust results regardless of the number of missing values [38]. Due to the nature of the data, the estimator selected was ExtraTreeRegressor. This estimator was the most appropriate because it did not impute negative values, and its estimations were within a reasonable range.

Last, the categorical variables were transformed into numeric variables through one-hot encoding, and all the features were scaled. The min–max scaling function from the sklearn.preprocessing module was applied. This scaling method was the most suitable because most of the features did not follow a Gaussian distribution. The transformation made by this method follows Eq. (1).

$${x}_{\mathrm{new}}=\frac{x-{x}_{\mathrm{min}}}{{x}_{\mathrm{max}}-{x}_{\mathrm{min}}},$$
(1)

where \({x}_{\mathrm{new}}\) is the scaled value, \(x\) is the value to scale, \({x}_{\mathrm{min}}\) is the minimum value of the feature, and \({x}_{\mathrm{max}}\) is the maximum value of the feature.
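Eq. (1) can be reproduced in a few lines of plain Python, shown here as an illustrative sketch rather than the library routine used in the study. The handling of a constant column (mapped to zero) is an assumption added to avoid a division by zero.

```python
def min_max_scale(column):
    """Scale a list of numbers to the [0, 1] range following Eq. (1)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical bitumen contents (% of mix weight).
print(min_max_scale([2.0, 4.0, 6.0]))
```

The minimum and maximum should be taken from the training data only and then reused for any new observation, so that information from the test folds does not leak into the scaling.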

2.3 Feature Selection and Model Training

To reduce complexity and improve the model’s applicability, only the minimum number of features needed should be included. Models with many features can underperform due to irrelevant or redundant data [39]. Reducing the number of features included in the model also reduces the training time, which is crucial during the training of complex models. Thus, the importance of each feature and its ability to increase the accuracy of the models should be assessed. The model used for this evaluation was the extreme gradient boosting tree (XGBoost), a fast ensemble ML model that reduces the risk of overfitting. A stepwise forward selection was employed to select the optimal set of features that minimizes the mean squared error (MSE, Eq. (2)). The gradual addition of features is based on the correlation coefficient between each feature and the target variable; thus, features with higher correlation are added first to the XGBoost model. The MSE is an evaluation metric that heavily penalizes large prediction errors. A lower MSE is preferred because it means that the predicted values are close to the actual values. The optimal set of features was identified as the one that minimizes the MSE with the fewest variables. Then, this set was used to train and evaluate other ML models.

$$\mathrm{MSE}=\frac{\sum_{i=1}^{n}{\left({y}_{i}-\widetilde{{y}_{i}}\right)}^{2}}{n},$$
(2)

where \(\widetilde{{y}_{i}}\) is the predicted value, \({y}_{i}\) is the observed value, and \(n\) is the number of observations.
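The selection procedure can be sketched as follows. The scoring callback `score_fn` is a hypothetical placeholder: in the study it would correspond to training an XGBoost model on the given features and returning its MSE, whereas here a toy lookup table stands in for that step.

```python
def mse(y_true, y_pred):
    """Mean squared error, Eq. (2)."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


def forward_selection(ranked_features, score_fn):
    """Add features one at a time and keep the smallest set with the lowest MSE.

    ranked_features -- features ordered by decreasing correlation with the target
    score_fn        -- hypothetical callback: list of features -> MSE of a model
    """
    best_subset, best_mse = [], float("inf")
    selected = []
    for feature in ranked_features:
        selected.append(feature)
        score = score_fn(list(selected))
        if score < best_mse:  # strict comparison: ties keep the smaller set
            best_subset, best_mse = list(selected), score
    return best_subset, best_mse


# Toy scores indexed by the number of features included (illustrative only).
toy_scores = {1: 0.9, 2: 0.5, 3: 0.5, 4: 0.6}
features = ["gradation", "plastic_type", "bitumen_content", "plastic_size"]
print(forward_selection(features, lambda f: toy_scores[len(f)]))
```

The strict comparison implements the "fewest variables" tie-break: a third feature that leaves the MSE unchanged is not added to the optimal set.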

2.4 Model Training and Screening

Given the nature of the present research, the ML task is supervised regression. This study used prevailing ML algorithms, including linear regression, polynomial regression, support vector regression (SVR), decision trees, and ensemble methods based on decision trees. The ensemble models employed were Random Forest, Extra Trees, and XGBoost. Ensemble models tend to perform better than single weak learners because they average the bias error of different estimators and reduce the variability of the predicted values [40]. For this reason, ensemble methods have been applied successfully in diverse fields, such as face recognition, computer security, object tracking, intrusion detection, and early disease diagnosis [40].

The decision tree estimator is an algorithm that formulates rules to classify or predict a variable [41]. In a decision tree, the main elements are the root node, the leaf nodes, and the branches. The root node is the starting point of the algorithm; based on a feature threshold, it is split into two branches leading to two child nodes, which are subsequently split in the same way. The splitting criterion considers all the features in the model and aims to minimize the cost function shown in Eq. (3) [42]. The algorithm keeps dividing its nodes until it reaches a predefined maximum depth, which is commonly adjusted to reduce the risk of overfitting. Other common hyperparameters are the minimum number of samples in a node before splitting and the minimum number of samples required to create a leaf. Although the decision tree is a good estimator, it is usually combined in ensemble methods to produce more accurate results.

$${J}_{k, {t}_{k}}=\frac{{m}_{\mathrm{left}}}{m}*{\mathrm{MSE}}_{\mathrm{left}}+\frac{{m}_{\mathrm{right}}}{m}*{\mathrm{MSE}}_{\mathrm{right}},$$
(3)

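where \(k\) is the feature, \({t}_{k}\) is the proposed threshold, \(m\) is the number of instances in the node being split, and \({m}_{\mathrm{left}}\) and \({m}_{\mathrm{right}}\) are the numbers of instances in the left and right branches, respectively. The error and the prediction of each node are given by

$${\mathrm{MSE}}_{\mathrm{node}}=\sum_{i\in \mathrm{node}}{\left({\widehat{y}}_{\mathrm{node}}-{y}^{\left(i\right)}\right)}^{2},$$
$${\widehat{y}}_{\mathrm{node}}=\frac{1}{{m}_{\mathrm{node}}}\sum_{i\in \mathrm{node}}{y}^{\left(i\right)}.$$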
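To make the split search concrete, the sketch below evaluates Eq. (3) for a single numerical feature and returns the threshold with the lowest cost. It follows the node error as written above (a sum of squared deviations from the node mean). The choice of candidate thresholds as midpoints between consecutive distinct values is an assumption of this example, and the feature is assumed to have at least two distinct values.

```python
def node_error(ys):
    """Node error as defined above: squared deviations from the node mean."""
    mean = sum(ys) / len(ys)
    return sum((mean - y) ** 2 for y in ys)


def split_cost(xs, ys, threshold):
    """Cost of splitting one feature at a threshold, following Eq. (3)."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    m = len(ys)
    cost = 0.0
    if left:
        cost += len(left) / m * node_error(left)
    if right:
        cost += len(right) / m * node_error(right)
    return cost


def best_threshold(xs, ys):
    """Try midpoints between consecutive distinct values; return the cheapest."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: split_cost(xs, ys, t))


# Hypothetical data: plastic content (%) versus a response with a clear jump.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(best_threshold(xs, ys), split_cost(xs, ys, best_threshold(xs, ys)))
```

A full tree repeats this search over every feature at every node; a Random Forest restricts it to a random subset of features, and an extra tree replaces the search with a randomly drawn threshold.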

Ensemble methods are estimators that combine different predictors to improve the model’s final effectiveness. Ensemble methods are divided into two paradigms: sequential ensemble methods and parallel ensemble methods. The sequential paradigm trains each model in a sequential form, aiming to improve the predecessor model. The final predictor will be the prediction of the last trained model. Alternatively, the parallel paradigm trains different models simultaneously, and the final prediction is usually the arithmetic mean of the predictions of every single model [40]. Random Forest is one of the most used ensemble methods built on the decision tree under the parallel paradigm.

In a Random Forest, each tree adds randomness to the model by selecting the feature to split on from a random subset of features. In other words, it also attempts to minimize Eq. (3), but instead of evaluating all the features, as happens in a single decision tree, it searches over a random subset of features. In addition, each tree is trained on a subsample of the train set. The subsampling can be done with replacement (bagging) or without replacement (pasting). After the model has been trained, the final prediction is the average of the trees’ predictions. In addition to the hyperparameters previously mentioned for the decision tree, the Random Forest also considers the number of trees, the maximum number of features to consider in each split, and the subsampling method (bagging or pasting). The extra tree regressor, in turn, adds further randomness to the Random Forest. Instead of looking for the best feature threshold to split a node, it selects a random threshold. For this reason, extra tree regressors are computationally faster than the Random Forest, and they tend to decrease the variance while increasing the bias error [43]. The hyperparameters of the extra tree regressor are the same as those of the Random Forest.

The XGB regressor (XGBoost model) is a gradient boosting method built under the sequential paradigm. In a gradient boosting algorithm, the training aims to minimize the model’s loss function gradually. The process starts by training a weak learner and calculating its error, which is commonly the MSE in the case of regression. Then, the error gradient is calculated as the first partial derivative of the loss function. This gradient provides the direction in which the parameters of the following model must be adjusted to reduce the error [44]. Although the XGB regressor uses the same principles as a standard gradient boosting method, it presents three significant differences: it adds two regularization parameters to the loss function, it uses the second partial derivative of the loss function in addition to the first, and the tree construction is parallelized. Thus, this algorithm is usually faster than traditional gradient boosting methods, and it reduces the risk of overfitting [45].
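The sequential idea can be illustrated with a deliberately simplified sketch of plain gradient boosting under a squared-error loss, where the negative gradient reduces to the residual. This is not XGBoost: it has no regularization terms, no second-order information, and no parallel tree construction. The weak learner (a one-split stump on a single feature), the learning rate, and the data are all assumptions chosen for illustration.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump on one feature (needs >= 2 distinct xs)."""
    best = None
    values = sorted(set(xs))
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Each round fits a stump to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)          # initial prediction: the mean
    preds = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_rounds):
        # For the loss 1/2 * (y - p)^2, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, history


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model, training_mse = gradient_boost(xs, ys)
print(training_mse[0], training_mse[-1])
```

The training MSE recorded in `training_mse` does not increase from one round to the next, which is the gradual loss reduction described above; on held-out data the error can rise again, which is why the number of rounds and the learning rate are usually tuned.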

Other models that add regularization, such as lasso, ridge, and elastic net, were also included and applied to linear and polynomial regression models. For the initial evaluation, all the models were trained, and the Akaike information criterion (AIC), Eq. (4), and AIC corrected (AICc), Eq. (5), were calculated for evaluating their performance. The AICc is used when the sample size is not large enough, and its value tends to be similar to the AIC when the sample size increases. The optimal models for each predictive variable were those with the lowest AICc.

$$\mathrm{AIC}=N\ln\left(\mathrm{MSE}\right)+2K,$$
(4)
$${\mathrm{AIC}}_{\mathrm{c}}=\mathrm{AIC}+\frac{2K\left(K+1\right)}{N-K-1},$$
(5)

where K is the number of features and N is the sample size.
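Eqs. (4) and (5) translate directly into code. The sketch below uses hypothetical values for the MSE, sample size, and number of features, and the guard against a non-positive denominator is an addition of this example.

```python
import math


def aic(mse, n, k):
    """Akaike information criterion, Eq. (4)."""
    return n * math.log(mse) + 2 * k


def aicc(mse, n, k):
    """Small-sample corrected AIC, Eq. (5)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when N - K - 1 <= 0")
    return aic(mse, n, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical model: MSE = 1.0, five features, two different sample sizes.
for n in (100, 10000):
    print(n, aic(1.0, n, 5), aicc(1.0, n, 5))
```

The correction term shrinks as the sample size grows, so the two criteria converge for large datasets, as noted in the text.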

Although the AIC is adequate for deciding among estimators, it does not indicate how well the models fit the overall data. Therefore, the coefficient of determination (\({R}^{2}\), Eq. (6)) was also reported to complement the AICc. The coefficient of determination reflects the degree to which the models were able to replicate the variability of the observed data. The advantage of the \({R}^{2}\) is that, regardless of the measurement unit, it returns a value between zero and one. Values close to one mean that the model is a good fit.

$${R}^{2}=1-\frac{\sum_{i=1}^{n}{\left({y}_{i}-\widetilde{{y}_{i}}\right)}^{2}}{\sum_{i=1}^{n}{\left({y}_{i}-\overline{y}\right)}^{2}},$$
(6)

where \(\overline{y}\) is the mean of the observed variable, \(\widetilde{{y}_{i}}\) is the predicted value, and \({y}_{i}\) is the observed value.
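A short sketch of Eq. (6) is given below with hypothetical observations; it assumes that the observed values are not all identical, since the denominator would otherwise be zero.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, Eq. (6)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0]
print(r_squared(observed, [1.0, 2.0, 3.0]))  # perfect predictions
print(r_squared(observed, [2.0, 2.0, 2.0]))  # always predicting the mean
```

The two calls bracket the usual range: perfect predictions reproduce all the observed variability, while a model that always returns the mean reproduces none of it.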

The AICc was determined under repeated cross-validation with ten folds and ten repetitions. This method was used to overcome sample bias issues that may arise when the dataset is not large enough. During this process, the data are randomly divided into ten groups of equal size. One of these subgroups is used as the test set, while the remaining nine form the training set. After training the model, the AICc is measured, and another subgroup is used as the test set. This loop continues until all ten groups have been employed as the test set, and the final performance metric is the average AICc over all iterations. This procedure was repeated ten times with different random subgroups so that the final performance metric was more reliable than that of a single cross-validation.
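The splitting scheme can be sketched with the standard library alone. This is an illustration of the index bookkeeping, not the routine used in the study; the function name and the seed are assumptions, and when the sample size is not divisible by ten the folds differ in size by at most one observation.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # new random partition in every repetition
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for i, test_idx in enumerate(folds):
            # Every fold except the current one goes into the training set.
            train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
            yield train_idx, test_idx


# 177 is the size of the air voids dataset reported in this article.
splits = list(repeated_kfold(177))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each of the resulting train/test pairs yields one value of the metric, and the reported figure is the average over all of them.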

The grid search was implemented in conjunction with the cross-validation to select the optimal set of hyperparameters. The grid search identifies the best hyperparameters within a predefined range of values. For simple models, the function applied from sklearn was GridSearchCV(), while for ensemble models, RandomizedSearchCV() was used instead. RandomizedSearchCV() performs a randomized search across a set of predefined hyperparameters; for ensemble models, it was required because the regular GridSearchCV() would increase the training time exponentially. The hyperparameters employed for GridSearchCV() and RandomizedSearchCV() are described in Table 5. After evaluating all the proposed models with the best hyperparameters, the model with the lowest AICc was selected for the final study stage, model interpretability.

2.5 Model Interpretability

The interpretability of the models will be evaluated with the SHAP method. The Shapley values are based on game theory; in the ML context, each feature is considered a player and the prediction is the payout. The Shapley value is the fair reward assigned to each feature, given its contribution to the final prediction [46]. This contribution is the difference between the model’s predictions that include the feature and those that do not. The SHAP method is based on these Shapley values, but it also proposes an alternative calculation of the Shapley values using a particular kernel (Kernel SHAP). This method permits global and local interpretability and is efficient in tree-based models thanks to the TreeSHAP method [47]. The library employed was shap [47, 48]. SHAP values were preferred for their ability to provide a clear understanding of the features’ interaction and importance. LIME (local interpretable model-agnostic explanations) and XGBoost feature importance are other popular alternatives to SHAP values; however, they do not guarantee consistency as SHAP values do, and the XGBoost feature importance does not clarify whether the effect is positive or negative.
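The game-theoretic idea can be illustrated with a brute-force computation of Shapley values for a toy model. This sketch is neither Kernel SHAP nor TreeSHAP and does not use the shap library: it enumerates every coalition, so its cost grows exponentially with the number of features. Replacing absent features with a fixed baseline value is a simplifying assumption, and the toy model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of one prediction by enumerating all coalitions."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their actual value, others the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when joining coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(z):
    """Toy model with two additive terms and one interaction."""
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 2.0], baseline=[0.0, 0.0, 0.0])
print(phi, sum(phi))
```

The contributions add up to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is shared equally between the two features involved. The sign of each value indicates whether the feature pushes the prediction up or down, which is the property that plain feature-importance scores lack.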

To validate the SHAP values, some resemblance with the theory and previous researchers’ deductions will be mentioned. This type of validation is not entirely correct, for the SHAP values assess correlation and not causation; nevertheless, it is still a valid option, given the limited literature about the topic. Thus, some intuitive explanations will be proposed after the interpretability analysis to corroborate and comprehend the resultant SHAP values.

3 Results and Discussion

3.1 Data Exploration

The sample size of the air voids dataset was 177 observations. As Fig. 2a shows, the air voids reasonably followed a normal distribution with a mean of 3.71% and a standard deviation of 0.96 (coefficient of variation [CV] = 0.25). The Marshall and Superpave mix design methodologies prescribe an optimal air void value of 4.0% for asphalt mixes [49], so an average around this value is expected. The lowest value measured was 1.95%, indicating a higher demand for bitumen and thus a higher price for the pavement. This observation was reported by Khimta and Arora [50], and after inspection, it was confirmed that the cause of this low value was the large bitumen amount (6.8% of the asphalt mix weight). The highest observed value was 8.04% [24], which is undesirable because it facilitates rutting and water damage [31]. Nevertheless, 40% of the observations were between 3.5 and 4.5%, which is reasonable for the air voids.

Fig. 2 Predictive variables distribution in the train set for (a) air voids, (b) MS, (c) MF, (d) ITS, and (e) TSR. Limits criteria were based on the Marshall mix design for medium traffic [57]

Figure 2b shows the MS distribution plot for the dataset, which consisted of 242 observations. The data do not appear to follow a normal distribution and have higher variability than the air voids (CV = 0.31). The standard deviation of the sample was 4.71 kN with a mean of 15.13 kN. All observations fulfill the recommendations of the Asphalt Institute for surface mixes under light, medium, and heavy traffic (Asphalt Mix Design MS-2) [49]. Even under the Australian standard (AS 2150:2020, Asphalt: A guide to excellent practice), where the minimum limit is 8 kN, most articles complied [51]. The minimum MS value was 6.05 kN, whereas the largest was 27.1 kN. The study with the highest MS value was Murugan [52], and although it is difficult to propose a definitive explanation due to the large number of variables and their interactions, the substitution of aggregates with plastic waste could have been the cause. When the e-plastic waste substituted 12% of the coarse aggregates (particle size of 6.7 mm), a potent glue effect was created that bound the aggregates further, increasing the cohesion and the MS value.

The total number of observations for the MF was 198. Figure 2c shows that the data did not follow a Gaussian curve, and positive skewness can be perceived. Nevertheless, it presented more kurtosis than the MS. The mean of the MF was 3.9 mm. Given that the permissible range of the MF for the Marshall mix design in medium traffic is between 2 and 4 mm, more than 60% of the observations complied with this range. Moreover, 31% of the samples were above the upper limit stipulated by the Marshall mix design, including those of Angelone et al. [53], who reported the highest MF value (11.1 mm). Meanwhile, only one observation was below 2 mm, with a value of 1.76 mm [54].

The total ITS sample size was 100 observations. There is no stipulated upper or lower limit for the ITS; however, higher values are usually preferred because they typically indicate higher fatigue life and lower rutting risk [55]. In Fig. 2d, although the mean of the ITS was 0.78 MPa, many data points fell within the lowest bin range (0.15–0.3 MPa). This unusual density of observations increased the sample variability (σ = 0.48) and made the ITS the variable with the highest variability (CV = 0.61). After further inspection of the data, it was found that most of these observations came from the same source, Mohamed et al. [56]. In their article, the authors found that 50% of the samples with plastic addition presented ITS values lower than the unmodified asphalt mix, meaning that the plastic deteriorated the engineering properties of the mixture. The possible reasons were the high quantity of air voids in the mixtures, caused by the PET addition, and the bitumen substitution. Because PET has a high melting point, when it is added through the dry technique at a temperature that is not high enough, it does not melt completely to fill the voids. Similarly, replacing some bitumen with plastic reduced the binder available to fill the voids. The lowest ITS value obtained in that article, which was also the lowest in the total sample, was 0.15 MPa.

For the TSR, 106 observations were obtained. The average was 85%, and the TSR presented the lowest CV among all the predictive variables (\({\mathrm{CV}}_{\mathrm{TSR}}=0.10\), σ = 8.99). Figure 2e shows that 5% of the samples were below the 70% limit recommended by the standard [49]. Like the ITS, the TSR is highly affected by air voids [31]. If the mixture has high air voids, the water has a higher chance of displacing the bitumen situated on the surface of the aggregates, leading to a reduction in cohesion. Two data points were below 70%, reported by Tiwari and Rao [57] and Ranieri et al. [22]. Both articles presented lower TSR than the samples without plastic addition. The possible reasons were the use of hard bitumen and the dry mixing technique. Nevertheless, both authors increased the TSR after adding plastic through the wet technique. Other articles also observed a deterioration of the TSR due to dry mixing. For instance, Almeida et al. [58], who obtained the highest TSR in the sample (109%), observed a TSR reduction of 17% after adding plastic through the dry technique. The melted plastic’s inability to fill the air voids might explain this TSR decrease.

3.2 Feature Selection and Model Evaluation

After data preparation, a correlation matrix (Fig. 11) was constructed to evaluate the correlation and potential collinearities among features. Then, each feature was considered to see its effect on the MSE of an XGBoost model. Figure 3 depicts this MSE evolution. Each model starts with the feature with the highest correlation with the predicted variable: aggregates gradation or plastic addition. Then, other features are gradually added to the model according to their correlation value. This order of addition is represented on the X-axis, whereas the Y-axis represents the resultant MSE for the corresponding set of features up to and including that feature. The optimal set of features is marked in Fig. 3 by the vertical dashed line.

Fig. 3 Optimal feature selection for (a) air voids, (b) MS, (c) MF, (d) ITS, and (e) TSR

The aggregates gradation and plastic type are the two most essential features for the air voids (Fig. 3a). We posit that plastic has a high impact on air voids, and researchers’ findings suggest that this influence is exhibited in both the dry and wet mixing techniques. For the wet technique, Bagampadde et al. [59] explained that the air voids decrease because the bitumen expands after plastic addition. Conversely, plastic addition could increase the air voids because the bitumen’s viscosity rises after an excessive addition of plastic, as Ranieri et al. [22] demonstrated. Thus, although the plastic can increase the bitumen’s volume, it can also increase the viscosity to the point that it reduces the bitumen’s capability to flow and fill air voids. In the dry mixing technique, the effect on the air voids is also noticeable. A reasonable explanation is that melted plastic covers the aggregates and fills the air voids between them. The air voids MSE increased slightly after adding bitumen penetration, bitumen replacement, and mixing process, and then decreased rapidly after including bitumen content and plastic size. In the wet technique, the plastic size alters the viscosity of the bitumen [60], which in turn changes the air voids. In the dry technique, fine plastics can fill more air voids than coarse plastics because they can infiltrate the smaller void spaces and melt completely. After this sharp reduction, the MSE changes little, and the final set of features excluded aggregates absorption and replacement.

As with air voids, MS is highly affected by the plastic type (Fig. 3b). This strong effect is confirmed in articles that have studied the wet and dry mixing techniques. In the wet technique, the modified binder becomes harder, which increases its bond strength with the aggregates and results in a stiffer PAM [61, 62]. In the dry technique, the effect is attributed to an enhancement in the mix’s adhesion: melted plastic further bonds the aggregates through its glue effect [63, 64]. After adding the gradation feature, the MSE is relatively constant up to bitumen content, which was selected as the last feature in the optimal set. Notably, the final set of features did not contain the mixing process, as might have been expected, meaning that the choice between the dry and wet techniques does not exert a profound effect on the MS prediction.

The primary reduction of the Marshall flow MSE occurred after including the gradation feature (Fig. 3c). Again, the plastic type is one of the most relevant properties influencing the final MF modeling. As explained by Akinpelu et al. [65], plastic addition reinforces the internal friction of the aggregates; thus, it alters the cohesion of the asphalt mixture and thereby the MF. After aggregates gradation, the MSE stays constant, and its last slight decrease occurs with the addition of the property with the lowest correlation, bitumen content. In contrast to the other properties, MF is the only target variable that includes all the properties in its final set of features.

The shapes of the ITS and TSR plots are very similar (Fig. 3d, e). For both properties, plastic type, gradation, and aggregate absorption are relevant features in the final modeling. When plastic is added through the wet technique, the ITS is altered by the increased binder viscosity and adhesion ability [66]. In the dry mixing process, the ITS varies because the adhesion between aggregates increases. As with the MS, melted plastic on the aggregates’ surface acts as a glue that increases the ITS. In other words, the plastics act as a reinforcement binder for the plastic asphalt mix [67]. Because the TSR is based on the ITS, it is not surprising to see similar results in the MSE curve.
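The dependence of the TSR on the ITS follows from its definition: the TSR is the ratio of the ITS of moisture-conditioned specimens to the ITS of unconditioned specimens, expressed as a percentage. The sketch below illustrates this with hypothetical ITS values; the 70% threshold is the limit cited earlier in the text, and the function and variable names are illustrative.

```python
def tensile_strength_ratio(its_conditioned: float, its_unconditioned: float) -> float:
    """TSR (%) = ITS of conditioned specimens / ITS of unconditioned specimens * 100."""
    if its_unconditioned <= 0:
        raise ValueError("unconditioned ITS must be positive")
    return 100.0 * its_conditioned / its_unconditioned


MIN_TSR = 70.0  # limit cited in the text; the governing standard may differ

# Hypothetical ITS values in MPa, for illustration only.
tsr = tensile_strength_ratio(its_conditioned=0.66, its_unconditioned=0.78)
passes = tsr >= MIN_TSR
print(f"TSR = {tsr:.1f}% ({'meets' if passes else 'below'} the {MIN_TSR:.0f}% limit)")
```

Since both terms of the ratio are ITS measurements, any feature that shifts the ITS is likely to appear in the TSR model as well.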

Interestingly, bitumen content (for the ITS) and bitumen penetration (for the ITS and TSR) were not included in the final selected model. This result was unexpected because, in theory, bitumen content and grading substantially affect the ITS and TSR [31]. It may be that, although these properties strongly affect the net value of the ITS and TSR, their impact is limited when modeling the effect of plastic addition. ITS, in fact, was the only predicted variable for which both bitumen content and bitumen penetration were excluded from the optimal set of features.

It was unexpected to see that the type of mixing, wet or dry, does not significantly affect the predicted variables; for the MS and TSR, it was not even included in the optimal set of features. This observation is counterintuitive, as one would anticipate that the final PAM characteristics would be decided by the plastic's interaction with hot bitumen or hot aggregates. Few articles have compared the dry and wet techniques, so it is not possible to conclude which technique is superior. Mishra and Gupta [3] determined that the addition of PE by the dry technique yields superior results to the wet technique. In contrast, Prahara et al. [4] concluded that samples prepared using the wet technique yielded better results than those prepared using the dry technique, based on MS, MF, and MQ values. Mishra and Gupta [3], however, drew their conclusions considering ITS and TSR in addition to the Marshall characteristics. Two additional studies concluded that the wet and dry techniques produce similar volumetric qualities and, hence, comparable performance [22, 68]. These latter studies found that the choice of asphalt mixture formulation process should be based on its practicability rather than on the projected performance of the resultant mixture. This result is consistent with what is observed in Fig. 3, where the mixing procedure does not appear to be of much importance when predicting PAM properties.

Different ML models were evaluated based on the optimal features selected in the previous step. Figure 4 summarizes their AICc for each of the target variables. The results are ordered from the lowest to the highest AICc value, so the best estimators are located at the top. In general, ensemble models (Random Forest Regressor, Extra Trees Regressor, and XGBoost) perform better than single estimators. For the air voids, Random Forest returned the lowest AICc (−659.46), followed by the other ensemble models, and reached an R² of 0.743.
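For readers unfamiliar with the criterion, the sketch below ranks candidate models by AICc. It assumes the common least-squares form, AICc = n ln(RSS/n) + 2k + 2k(k+1)/(n − k − 1), where n is the number of observations and k the number of parameters; the exact definition of k used for the tree ensembles in this study is not restated here, so treating it as the number of input features is an assumption. The residual sums of squares are hypothetical.

```python
import math


def aicc(rss: float, n: int, k: int) -> float:
    """Corrected Akaike information criterion for a least-squares fit."""
    if rss <= 0:
        raise ValueError("RSS must be positive")
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for the small-sample correction")
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical residual sums of squares for three candidate models,
# all fitted on the same n observations with the same k (assumed).
n_obs = 100
candidates = {
    "linear_regression": {"rss": 4.0, "k": 6},
    "extra_trees": {"rss": 2.0, "k": 6},
    "random_forest": {"rss": 1.8, "k": 6},
}

scores = {name: aicc(c["rss"], n_obs, c["k"]) for name, c in candidates.items()}
ranking = sorted(scores, key=scores.get)  # lowest AICc first

for name in ranking:
    print(f"{name:18s} AICc = {scores[name]:.2f}")
```

Strongly negative values, such as those in Fig. 4, arise naturally in this form when the mean squared residual RSS/n is well below 1, because its logarithm is then negative.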

Fig. 4 Model evaluation for (a) air voids, (b) MS, (c) MF, (d) ITS, and (e) TSR

For MS, MF, and ITS, the Extra Trees Regressor was the best estimator. This estimator presented a relatively good R² for these variables (R² = 0.7 for MS, 0.76 for MF, and 0.77 for ITS). The most accurate model for the TSR was XGBoost, with an R² of 0.8. More detail on how these best models predicted the target variables is shown in Fig. 12.
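As a reminder of what the two reported metrics measure, the following sketch computes the MSE and R² from paired observed and predicted values. The numbers are hypothetical relative effects of plastic addition, not data from this study.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted effects (fractional change).
y_true = [0.10, 0.25, -0.05, 0.40]
y_pred = [0.12, 0.20, 0.00, 0.35]

print(f"MSE = {mse(y_true, y_pred):.6f}")
print(f"R2  = {r2_score(y_true, y_pred):.4f}")
```

The MSE depends on the scale of the target, whereas R² is normalized by the variance of the observations, which is why R² is the more convenient quantity when comparing against models that predict a different target.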

Two previous articles have also attempted to predict MF and MS after plastic addition [27, 28]. Although the authors reported the RMSE, it was not feasible to compare their results with those of the ensemble models because their models predicted the net value of the MS and MF instead of the effect of plastic addition. Nevertheless, they also reported the R², which can serve as an approximate basis for validation and comparison. Tapkin et al. [28] reported an R² of 0.87 for MS and 0.86 for MF. This better performance might be due to the low inter-lab variability and the use of artificial neural networks (ANN), a machine learning algorithm that performs well when the data set is large enough. However, because their data size was limited, the ANN might have overfitted, which would also be reflected in the high R² value. Although none of the proposed models performed as well as those reported by these authors, their R² values were close enough to infer that the models are still appropriate. Given that the obtained R² values are adequate, the models selected for the next phase involving SHAP values were Random Forest for the air voids, Extra Trees Regressor for MS, MF, and ITS, and XGBoost for TSR.

3.3 Model Interpretability

To better understand the SHAP values, plots were employed to summarize them. The first plot (labeled a in Figs. 5, 6, 7, 8, and 9) depicts the global interpretation of the model, where the width of each bar represents the importance of the corresponding feature to the model. The features are displayed from the most important at the top to the least important at the bottom.

Fig. 5 Random Forest interpretation for air voids. (a) Global interpretation and (b) local explanation

Fig. 6 Extra Trees Regressor interpretation for MS. (a) Global interpretation and (b) local explanation

Fig. 7 XGBoost interpretation for MF. (a) Global interpretation and (b) local explanation

Fig. 8 Extra Trees Regressor interpretation for ITS. (a) Global interpretation and (b) local explanation

Fig. 9 XGBoost interpretation for TSR. (a) Global interpretation and (b) local explanation

The second plot (labeled b in Figs. 5, 6, 7, 8, and 9), commonly called the summary plot, presents the local explanation of each feature. It is important to note that the features on the Y-axis are also ordered by importance and that the plastic type and gradation have been ungrouped. Thus, instead of being represented as one feature, gradation is separated according to the typical pavement gradation, and the same applies to the plastic type. The X-axis represents the SHAP values: a negative value means that the feature, for that observation, lowers the prediction, and a positive value means that it raises it. Lastly, the color of each data point indicates whether the value of the feature was high (red), medium (purple), or low (blue). For binary features, the label on the Y-axis indicates what the color means.
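To make the sign convention of the summary plot concrete, the sketch below computes exact Shapley values for a tiny model by enumerating every feature coalition. It does not reproduce the explainer used in this study; the toy linear model, the feature order (gradation, plastic content, wet-mixing flag), and the background sample are all hypothetical. Features outside a coalition are filled in from background rows and the predictions are averaged.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values of predict(x) relative to a background sample."""
    n = len(x)

    def value(coalition):
        # Average prediction when only the coalition's features come from x.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return phi


def toy_model(features):
    """Hypothetical model: the mixing flag has no effect on the output."""
    gradation, plastic_content, wet_mixing = features
    return 0.8 * gradation - 0.5 * plastic_content + 0.0 * wet_mixing


background = [[0.2, 1.0, 0], [0.4, 2.0, 1], [0.6, 3.0, 0], [0.8, 4.0, 1]]
x = [0.9, 4.5, 1]

phi = shapley_values(toy_model, x, background)
baseline = sum(toy_model(row) for row in background) / len(background)

print("local SHAP values:", [round(p, 3) for p in phi])
print("baseline + sum(phi):", round(baseline + sum(phi), 3))
print("model prediction:   ", round(toy_model(x), 3))
```

A positive value pushes the prediction above the baseline and a negative value pushes it below, which is what the X-axis of the summary plot shows for each observation. Averaging the absolute values over many observations gives the global importance shown in the bar plots. Exhaustive enumeration grows exponentially with the number of features, so practical tools rely on model-specific shortcuts.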

3.3.1 Air Voids

Figure 5a confirms the importance of aggregates gradation and plastic-type during the modeling. Then, the bitumen characteristics (bitumen content and bitumen penetration) seem to present secondary importance in predicting air voids. According to these SHAP values, the plastic effect on air voids is driven by the plastic-type and plastic content and not by how it was added to the mixture (wet or dry technique), as one might expect. Also, this graph corroborates what was observed in the MSE plot (Fig. 3a), where the bitumen penetration, bitumen replacement, and mixing process did not reduce the MSE after being added. When the SHAP values are studied for the local explanation, Fig. 5b, they display relative consistency with the global plot.

The local explanation reveals that PE and the bitumen content are more important than the specific gradations. Regarding the bitumen, high bitumen content and penetration decrease the air voids prediction. Although this inference can only be applied within the modeling context, it is an effect that researchers have also observed. When the bitumen content is high, it allows the plastic to further fill the empty spaces within the PAM. The same occurs with penetration: the higher the penetration, the more easily the bitumen flows through the plastic and aggregates [31]. Regarding the plastic, a difference between the responses of PE and PET can be noticed. While PE decreases the air voids prediction, PET raises it. This remark is consistent with the different melting points and air voids effects mentioned in previous articles [69, 70]. The mixing technique also shows a clear differentiation. Even though this feature does not rank among the top parameters, its effect in the local explanation is evident: the wet technique increases the predictions, whereas the dry technique decreases them.

The initial R² of the model was 0.743, and after eliminating the features considered less critical by the global interpretation (bitumen replacement and mixing process), the model’s performance remained constant. If the model is then trained without the plastic size, its performance drops (R² = 0.7131), which means that the plastic size is still relevant for predicting accurate results. Thus, the final features, ordered by importance, are gradation, plastic type, bitumen content, bitumen penetration, and plastic size.

3.3.2 Marshall Stability

In addition to plastic type and aggregates gradation, the global interpretation of the MS reveals that the aggregate absorption and bitumen content are also relevant factors (Fig. 6a). This was also observed in Fig. 3b, where the MSE was further reduced after adding bitumen content and aggregate absorption to the feature set. Other properties, such as bitumen penetration and plastic size, also present a certain degree of importance in the MS prediction; nevertheless, their global importance is minor compared to plastic type and gradation. When plastic type and gradation are divided in the local explanation plot (Fig. 6b), aggregate absorption becomes the feature with the highest impact on the MS prediction.

According to the SHAP values, high absorption in the aggregates produces a high MS prediction in the PAM. Similarly, the impact of PET is clear: high PET addition contributes to an increase in the MS estimation. This is contrary to PE addition, which displays an apparent negative effect on MS, as most of its red data points are in the negative range. Moreover, regardless of the plastic type, plastic size makes a positive contribution to the MS prediction. Regarding the bitumen, the correlation of bitumen penetration and content with the MS is also noticeable: high penetration and high bitumen content increase the predicted effect on the MS of pavements modified with plastic waste.

Although all these remarks represent correlation and not causality, some partly agree with the literature. Zulkati et al. [71] explain that high absorption in the aggregates can reduce the thickness of the bitumen layer on the aggregates, increasing the shear friction between aggregates and resulting in a higher MS. When plastic is added, the shear friction might increase further due to aggregate–plastic and plastic–plastic interactions, raising the final MS value. The same hypothesis can be applied to the remark about plastic size: larger plastic components permit better shear friction and cohesion within the PAM. This shear friction might also be affected by the plastic type. PET, which has a high melting point, tends to preserve its shape after being mixed, so its final interaction and contact with aggregates are greater than those of other plastics that melt completely during the PAM formulation. These considerations reaffirm the importance of plastic type in MS modeling.

According to the literature review conducted by Heydari et al. [69], the association between the amount of PE added and MS has a concave shape. The plastic content thus tends to behave similarly to the bitumen content, which also exhibits a concave shape when plotted versus MS during the optimal bitumen content analysis. This means that the MS tends to increase with the plastic content until it reaches a maximum, after which it gradually decreases and, in some cases, may fall below the value of the conventional asphalt mix. This trend is visible under the PE addition label in Fig. 6b: MS grows with a low PE content, but after an excessive addition of plastic, denoted by the red hue, it falls below the level of the unaltered asphalt mixture. The same behavior has been reported for PET by Taherkhani and Arshadi [72] and Ahmadinia et al. [73]. Aghayan and Khafajeh [70] suggest that the abrupt fall in MS with PET is a result of PET’s lower stiffness compared to natural aggregates. Fig. 6b, however, shows no evidence of a decreasing tendency in the MS of the PET-modified mix, contrary to PE. Although the MS may still decrease at a higher PET content, it is unlikely to fall below the value of the conventional asphalt mix; therefore, neither red nor purple points are noticed in the negative region of the SHAP value plot.

The model that includes all the features depicted in Fig. 6a returned an MSE of 0.0165. Since the aggregate replacement and plastic size did not substantially impact the final MS prediction, they were excluded from the final model, which slightly reduced the MSE (−2.4%). Thus, the optimal model only included aggregates gradation, plastic type, bitumen content, and bitumen penetration and presented an R² of 0.7047 and an MSE of 0.016.

3.3.3 Marshall Flow

It is remarkable again to observe that the aggregates gradation and plastic type are the two most essential features in estimating the MF (Fig. 7). As in the case of MS, MF is affected by two main forces within the asphalt mixture: cohesion and internal friction resistance [74]. The internal friction resistance is expected to be more significant when the aggregate texture is high [75] or when the proportion of large aggregate is prominent. This effect of large aggregate size can be confirmed in the local explanation plot (Fig. 7b), where the percentage of large aggregates, passing the 9.5 mm sieve and retained on the 4.75 mm sieve, positively affects the MF prediction. Plastic addition also reinforces this interlocking effect if the plastic preserves its shape after being heated and mixed. This result is mainly expected among plastics with a high melting point, and the local explanation plot corroborates it through the positive impact of PET versus the negative effect of PE.

The third and fourth most important features were the mixing technique and plastic size. The effect of the mixing process is evident in Fig. 7b: wet mixing has a negative impact on the prediction, whereas dry mixing has a positive one. This could be because the dry technique increases the interlocking strength among aggregates and plastic material, which is not observed with the wet technique because it acts directly on the binder and not on the aggregate. The effect of the plastic size aligns with what was discussed in the previous paragraph: larger particles in the mixture allow greater interlocking.

The second most decisive factor in the Marshall flow is cohesion, that is, the ability of the bitumen to hold the aggregates together. The impact of the bitumen penetration on the Marshall flow does not agree with the theoretical effect previously mentioned. Bitumen with high penetration can flow further through the aggregates, which should result in better cohesion than with less fluid binders; however, the local explanation plot shows something different. Nonetheless, it is essential to recognize that these plots show the influence of the initial bitumen penetration and of plastic addition, and that the effects of bitumen penetration and mixing technique are better understood when examined simultaneously. When bitumen is modified using the wet method, its viscosity increases, making it less available for enhancing aggregate cohesiveness. The unfavorable effect of the wet mixing approach can be recognized in Fig. 7. This drop in the Marshall flow among wet-processed samples is typical, as established in the literature review by Heydari et al. [69]. Only one of the publications reviewed by Heydari et al. [69] that utilized the wet mixing technique observed an increase in flow value compared to the standard asphalt mix [76]. In contrast, among the studies using dry mixing, only one reported a lower Marshall flow value [73]. Similarly, the impact of the aggregate absorption is unforeseen, as one might expect that less available binder would deteriorate the flow value through the loss of cohesion. Still, a plausible explanation of this effect could be the generation of a thin bitumen layer on the aggregate and its repercussion on the internal friction strength.

The initial model displays an R² of 0.766 and an MSE of 0.0213. The results in Fig. 7a suggest that the inclusion of bitumen replacement is redundant and might deteriorate the final accuracy. After testing the model again without bitumen replacement, an R² of 0.763 and an MSE of 0.0216 were obtained. Thus, although the model did not improve after excluding this feature, the R² did not drop sharply, which confirms that the bitumen replacement feature is unnecessary. With this in mind, the optimal set of features for the MF is aggregates gradation, plastic type, mixing process, plastic size, aggregate replacement, aggregate absorption, bitumen penetration, and bitumen content.
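The size of this change can be put in relative terms with a one-line calculation. The sketch below uses the R² and MSE values quoted above for the MF model with and without bitumen replacement; the helper name is illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Signed relative change in percent: (after - before) / before * 100."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return 100.0 * (after - before) / before


# MF model with and without bitumen replacement (values quoted in the text).
r2_change = relative_change(0.766, 0.763)
mse_change = relative_change(0.0213, 0.0216)

print(f"R2 change:  {r2_change:+.2f}%")
print(f"MSE change: {mse_change:+.2f}%")
```

Both changes are small in relative terms, which supports treating the removed feature as redundant.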

3.3.4 ITS

Although aggregates gradation and plastic type display the highest impact in the global interpretation plot (Fig. 8), none of their subdivisions is among the top three features in the local explanation plot. Nevertheless, their impact as critical features for predicting the ITS is still noticeable. As mentioned before, the ITS measures how resistant the asphalt mix is to cracking, which depends on how well the mix can prevent cracks from appearing and spreading. In theory, mixtures with smaller nominal aggregate sizes resist cracking better [31]. When plastic is added to the mix, these fine aggregates could be strengthened by the plastic glue effect, reducing crack spread. This interpretation is corroborated by the effect observed among fine aggregates (aggregates passing 0.075 mm and 0.3 mm) in Fig. 8b. Positive impacts were also expected among the plastic types with a low melting point. PE generally presents a positive variation in the ITS prediction, and this trend could also be observed among less represented plastic types, such as PS. In contrast, the positive effect of PET was unexpected, as it is at odds with the remark that PET increases the air voids: since high air voids increase the risk of cracking [31], one might anticipate that this plastic would reduce the ITS, which was not the case. Modarres and Hamedi [77] remarked that the addition of PET up to 4% increases the resistance to cracking, but beyond this point the material deteriorates. Contrary to Modarres and Hamedi [77], the local interpretation plot (Fig. 8b) shows observations with more than 4% PET that continue to exhibit a positive impact on the ITS. This could be due to a particle size effect, as small PET particles have demonstrated superior ITS performance compared to coarse PET particles [72].

Another unforeseen finding that cannot be validated was the importance of bitumen replacement in the local explanation plot (Fig. 8b). Theoretically, if a portion of the bitumen is replaced by plastic, less binder will be available within the mixture, making it more prone to cracking. The same applies to the result observed for the aggregate absorption feature, where higher absorption increases the ITS prediction. It may be that the glue and interlocking effects of the plastic on the mixture are ultimately more potent than the bitumen binding strength.

Studies comparing the dry and wet mixing techniques have consistently found the wet technique to be superior in terms of ITS [57, 68, 78]. These investigations compared the two techniques at the same plastic content using HDPE, LDPE, and PET. As was the case with high PET content, however, excessive addition of plastic waste by the wet technique begins to degrade the ITS [57]; thus, prudence is advised. This may explain why some wet mix data points negatively impact the ITS in the local explanation plot. Notably, although White and Hall [68] noticed that the wet technique performed better in the ITS, after evaluating and comparing other properties of the asphalt mixture, they concluded that there were no significant differences between the two mixing techniques and that the final selection of the mixing technique should be based on practical considerations rather than anticipated mixture improvement.

Lastly, although the mixing technique does not seem important in the global interpretation, it is a critical feature that contributes to the model performance. Initially, the model presented an R² of 0.776. After being trained without the mixing technique, the variable with the least impact according to the SHAP values, its R² was reduced by 27%, meaning that this feature is still relevant during prediction, which is also confirmed in the local explanation plot. Therefore, no additional exclusion should be made for the final model, and the optimal set of parameters remains the same: aggregates gradation, plastic type, bitumen replacement, aggregate absorption, and mixing process.

3.3.5 TSR

One factor that strongly influences the TSR is the gradation of the aggregates. Dense-graded asphalt, for instance, presents better moisture resistance than a mixture with a high content of coarse aggregates [31]. This explanation is consistent with the observed SHAP values in the TSR prediction. Figure 9b reveals that when many aggregates pass the smallest sieve, 0.075 mm, the predicted effect of plastic tends to be positive. On the contrary, when the aggregates are in the medium range, passing the 0.6 mm sieve, there is an apparent inclination towards negative TSR predictions. This finding is consistent with Habeeb et al. [79], who reported that gradations with a high proportion of fine particles are more resistant to water damage and explained that the high permeability of coarse mixes is the source of this trend. When plastic is added to the asphalt mix, it can further improve the resistance to water damage and stripping because fine particles can be coated by modified bitumen or melted plastic. In the wet technique, plastic added to the bitumen increases the bitumen volume and facilitates the coating of the fine aggregates. However, in some cases, this addition also increases the bitumen viscosity, making it difficult for the binder to spread completely among medium and large minerals. In the dry technique, the explanation is similar: melted plastic can disperse over and cover fine aggregates more easily than coarse minerals. It is also worth noting that the wet mixing process often results in less loss of binder coating on aggregates when exposed to water than the dry technique, as demonstrated by Haider et al. [66].

The second critical factor that alters the TSR is the air voids [31]. A high air void content facilitates water access and the displacement of the bitumen layer on the aggregates’ surfaces. After reviewing the PET and PE effects on air voids (Fig. 5b), one might expect the impact of these plastics on the TSR to be negative and positive, respectively. Nevertheless, a different effect is perceived in Fig. 9b: when the PET and PE contents are low or medium, they increase the TSR, but after an excessive addition, a reduction in the TSR is observed. This observation was confirmed by Ameri and Nasr [80] and Tiwari and Rao [57], and the inference that an excessive addition of plastic tends to hamper the moisture resistance of the PAM was also mentioned by Aghayan and Khafajeh [70] in their review of PET addition in asphalt mixtures.

The thickness and continuity of the bitumen layer on the aggregates also affect the PAM’s resistance to water [81]. Although the global interpretation reaffirms that the bitumen content is an influential factor, the local interpretation plot does not give a clear picture of its effect. When the bitumen content is medium (purple in the plot), its impact can be positive or negative, whereas high values (red) reduce the TSR prediction. These results were unexpected, as medium and high bitumen contents would be expected to increase the bitumen thickness on the aggregates and guarantee the continuity of the binder on the aggregates’ surface.

The aggregate replacement feature is not essential for the model and can be removed. The proposed model, which includes all the features displayed in Fig. 9, returned an R² of 0.80 and an MSE of 0.0046. Based on Fig. 3e and Fig. 9a, the bitumen replacement property might be redundant and might not be contributing to the final model performance. Testing the model without this property confirmed this remark, as the new R² was 0.808 and the MSE was 0.0044. It could also be assumed that aggregate absorption and bitumen content are irrelevant; however, testing the model without these features produced a relatively sharp deterioration, so they are still pertinent for the TSR prediction. Accordingly, the optimal and final set of features for the TSR is gradation, plastic type, bitumen content, and aggregate absorption.

4 Implications and Limitations

4.1 Implication

This work highlighted the most important features that affect the performance of PAM as shown in Table 3. The final optimal models and corresponding interpretations have provided a better understanding of the effect of plastic addition to asphalt mixtures, and they can be used as a baseline for future machine learning models. The most critical features for predicting basic PAM properties are aggregates gradation, plastic type, and plastic content. Although the influence of other features is less significant, they can still be relevant in some cases as shown in Table 3.

Table 3 Summary feature importance for each predictive model

Based on the results obtained, a PAM that does not comply with standards could be modified so that it meets pavement standard requirements. For example, if an original PAM requires an increase in the air voids, increasing the proportion of larger aggregates or using stiffer binders would be appropriate. On the contrary, when the air voids need to be reduced, the best set of actions would be reducing coarse aggregates, adding PE instead of high melting point plastics, and using softer binders. Table 4 provides an overview of these recommendations for each of the investigated attributes. As previously stated, SHAP values do not always imply causation; therefore, any decision on which factors to alter must be accompanied by adequate expertise. The effect of combining multiple recommendations may not be easily predicted by an expert; nevertheless, the proposed models can provide a quantifiable estimate of the mix properties.

Table 4 General recommendations for improving PAM properties

4.2 Limitations

One of the challenges of employing ML modeling is the availability of data. If the amount of data is insufficient, machine learning models may not achieve high accuracy or may end up overfitting, which is undesirable for the development of generalized predictive models [82]. This work improved the training range of prior ML models that have attempted to predict the performance of PAM; however, the models developed in this study would benefit from a larger and more diverse training set. It should be emphasized that the current models can only be applied to data within the range of the training set and not to outlier values. ML models are also a black-box approach. As explained by Roscher et al. [83], this disadvantage is especially pertinent when stakeholders seek to interpret the model's predictions within the scientific field. To counter this criticism, SHAP value analysis has been employed in this work to gain a better understanding of the effect of each feature on the final prediction.
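The restriction to the training range can be enforced with a simple guard before a model is queried. The sketch below flags features of a candidate mix that fall outside the per-feature ranges of the training data; the feature names and ranges are hypothetical and are not taken from the data set of this study.

```python
def out_of_range_features(sample: dict, training_ranges: dict) -> list:
    """Return the names of features whose value lies outside the training range."""
    flagged = []
    for name, value in sample.items():
        low, high = training_ranges[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged


# Hypothetical (min, max) ranges observed in a training set.
training_ranges = {
    "bitumen_content_pct": (4.0, 7.0),
    "plastic_content_pct": (0.0, 12.0),
    "bitumen_penetration_dmm": (40.0, 100.0),
}

# Hypothetical candidate mix design.
new_mix = {
    "bitumen_content_pct": 5.5,
    "plastic_content_pct": 15.0,
    "bitumen_penetration_dmm": 65.0,
}

flagged = out_of_range_features(new_mix, training_ranges)
if flagged:
    print("Prediction would be an extrapolation for:", flagged)
else:
    print("All features are within the training range.")
```

A per-feature check of this kind is necessary but not sufficient, since a combination of individually valid values may still lie far from any training observation.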

However, SHAP values also have their limitations. It is essential to recognize that SHAP values indicate the significance of model features but may not necessarily reflect reality [46], because SHAP values represent correlation and not causation [48]. When searching for causality, more sophisticated causality tests should be conducted. Therefore, any plausible inference must be supported by common sense and expertise [83]. With this in mind, the majority of the SHAP values reported are in agreement with existing literature, and as long as they are analyzed within the range of the training data, they could serve as a reliable indicator of feature-parameter correlation.
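The point that SHAP values describe the model rather than reality can be illustrated with a toy calculation. The sketch below computes exact Shapley values by enumerating all feature orderings; it is not the routine used in this work, and the toy model, the feature names (`a`, `b`, `c`), and the all-zero baseline are assumptions chosen purely for illustration.

```python
from itertools import permutations
from typing import Callable, Dict


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    sample: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values by brute-force enumeration of feature orderings.

    Each feature is switched from its baseline value to its sample value,
    and its marginal effect on the model output is averaged over all orders.
    Only practical for a handful of features.
    """
    names = list(sample)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = sample[name]
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orders) for name, total in totals.items()}


def toy_model(x: Dict[str, float]) -> float:
    """Hypothetical model that uses `a` and `b` but ignores `c` entirely."""
    return 2.0 * x["a"] + x["a"] * x["b"]


sample = {"a": 1.0, "b": 1.0, "c": 5.0}
baseline = {"a": 0.0, "b": 0.0, "c": 0.0}
phi = shapley_values(toy_model, sample, baseline)
print(phi)
```

In this toy case the attributions sum to the difference between the prediction for the sample and the prediction for the baseline, and `c` receives zero attribution because the model ignores it. Whether `c` matters in the physical system is a separate question that the attribution cannot answer.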

5 Conclusion

The present article has identified which variables have the highest impact on predicting basic PAM properties (air voids, MS, MF, ITS, and TSR). Data were gathered from previous articles using a systematic quantitative literature review. These data were cleaned and prepared for training machine learning models. The Pearson correlation of each feature with the target properties was employed as a selection tool to evaluate how each feature contributes to the MSE of a general XGBoost model. This stage returned an initial set of features. These features were then used to train single and ensemble machine learning models. The model with the lowest AICc was selected for the next stage of model interpretability. The ML model was interpreted through SHAP values, which provide a general understanding of feature importance and indicate whether a feature has a positive or negative impact on the final prediction of the target variable.

The obtained results serve as a reference for developing or improving a PAM to maximize the beneficial effects of plastic on the mix. The asphalt mix type and the gradation curve of aggregates are the most influential design parameters, followed by plastic type and the proportion of plastic added to the mix. If stiffer mixtures are desired, PET should be used rather than PE. Other factors, such as aggregate type and bitumen penetration, exert a lesser effect on the final PAM properties. In general, an appropriate combination of these conditions would be essential for building an effective PAM and persuading stakeholders to utilize plastic waste in real-world circumstances. Based on our models' analysis, mixing type does not have a substantial impact on the final PAM properties. Therefore, the dry mixing strategy would be favored because it yields PAM of similar quality to wet mixing but has the advantage of ease of formulation in the field.

Further research could attempt to retrain the models with updated data, so that the new models are more accurate and relevant to stakeholders. The models can be further extended to include more advanced rheological properties of the bitumen, such as viscosity, complex modulus, and phase angle. In addition, future studies could focus on the formulation of models that can predict the optimal bitumen content after plastic addition. This type of model is relevant in the field because it supports the potential for bitumen savings after the inclusion of plastic waste and improves the asphalt mix quality.