Comparative Study of Hybrid Artificial Intelligence Approaches for Predicting Peak Shear Strength Along Soil-Geocomposite Drainage Layer Interfaces

Peak shear strength of soil-Geocomposite Drain Layer (GDL) interfaces is an important parameter in the designing and operating related engineering structures. In this paper, a database compiled from 316 large direct shear tests on soil-GDL interfaces has been established. Based on this database, five different machine learning models: Back Propagation Artificial Neural Network (BPANN) and Support Vector Machine (SVM), with hyperparameters optimised by Particle Swarm Optimisation Algorithm (PSO) and Genetic Algorithm (GA), respectively, and Extreme Learning Machine (ELM) optimised by Exhaustive Method, were adopt to assess the peak shear strength of soil-GDL interfaces. Then, a comprehensive investigation and comparison of the predictive performance for the models was conducted. Also, based on the selected optimal machine learning model, sensitivity analysis was conducted, and an empirical equation developed based on it. The research indicated that GA and PSO could significantly increase forecasting precision in a small number of iterations. The BPANN model optimised by PSO has the highest forecasting precision based on the statistics criteria: Root-Mean-Square Error, Correlation Coefficient, Coefficient of Determination, Wilmot’s Index of Agreement, and Mean Absolute Percentage Error. The normal stress has the biggest impact on the peak shear strength, followed by drainage core type, moisture saturation of the soil layer, shearing surface, soil type, consolidation condition, geotextile specification, soil density and drainage core thickness, and the ranking is affected partly by the data distribution of input parameters in the database based on mechanism analysis. An empirical equation developed from the optimal model was proposed to estimate the peak shear strength, which provides convenience for geotechnical engineering personnel with limited knowledge of machine learning technique.


Introduction
Geocomposite Drainage Layers (GDL) are increasingly applied in extensive geotechnical and geoenvironmental applications [1][2][3]. GDLs can replace the need for graded sand and gravel to effectively drain excess water and reduce pore water pressure, improving the stability of engineering projects [4]. Particularly for GDLs utilised in the capping and lining systems of landfills, they can also provide separation and reinforcement functions and perform as a capillary break to prevent the migration of contaminated water and gas produced from the waste [5]. In practical engineering, GDLs are commonly installed underneath cover soil above containment facilities, and the interface shear strength between GDL and cover soil governs the stability of the system [6].
Previously, many series of laboratory tests have been conducted to determine the shear strength along soil-GDL interfaces [7][8][9][10][11]. However, soil-GDL interface testing is expensive and time-consuming. Also significantly, in real engineering projects, the specific materials to be used on site are usually selected well after the design stage. This allows a better pre-estimate of interface shear strength before the specific materials are determined.
Accurate forecasting models that can evaluate the shear strength of soil-GDL interfaces can substantially overcome many of the existing challenges. Due to the complex mechanism of soil-GDL interaction and the multiple influence variables of the shear strength along soil-GDL interfaces, it is difficult for the simplified empirical models established by adopting traditional statistical methods to adequately present the complicated non-linear relationship between the variables that influence the interface shear strength. This has driven the search for reliable methods with high accuracy to forecast the shear strength of soil-GDL interfaces.
With the advance of computer science, the machine learning-based approach has attracted considerable scientific attention and has been extensively adopted in geotechnical engineering to model complex non-linear relationship between multi-inputs and outputs [12][13][14][15][16]. Pham et al. [17] applied Particle Swarm Optimisation (PSO) -Adaptive Network based Fuzzy Inference System (PANFIS), Genetic Algorithm (GA)-Adaptive Network based Fuzzy Inference System (GANFIS), Support Vector Regression (SVR), and Artificial Neural Networks (ANN), to predict the shear strength of soft soil. Their results show that the forecasting performance of the machine learning algorithms is satisfactory. Qi et al. [18] put forward five machine learning models including logistic regression (LR), multilayer perceptron neural networks (MLPNN), decision tree (DT), gradient boosting machine (GBM), and SVR, optimised by adopting firefly algorithm (FA), to predict the stability of hanging walls, and the research denotes that the estimating accuracy of the models is appreciated. Ceryan et al. [14] compared the performance of ANN, and SVR with different kernel functions in assessing the tensile strength of rock, and concluded that SVR with the least squares kernel function is more powerful in predicting the tensile strength. Overall, based on the aforementioned analysis, machine learning techniques are efficient for describing the non-linear relationship in multivariable problems.
Compared to other topics in geotechnical engineering, the application of machine learning approaches in evaluating the peak shear strength along soil-geosynthetics interfaces is rare. Debnath and Dey [19] proposed an ANN model to predict the shear strength of cohesive soil-geosynthetics interfaces based on the input parameters: dry density and moisture content of soil, normal stress, adhesion and frictional angle of the soil-geosynthetics interfaces. It is necessary to optimise the algorithms to increase the forecasting precision of machine learning models and conduct comprehensive investigations and compare the applicability for different machine learning algorithms to assess the peak shear strength of soil-GDL interfaces.
In this paper, a comprehensive investigation and comparison of the applicability for five different machine learning models including, Backpropagation Artificial Neural Network (BPANN) and Support Vector Machine (SVM), with hyperparameters optimised by Particle Swarm Optimisation Algorithm (PSO) and Genetic Algorithm (GA), respectively, and Extreme Learning Machine (ELM) optimised by Exhaustive Method, in estimating the peak shear strength of soil-GDL interfaces was conducted. Also, the relative significance of influence factors to the peak shear strength was analysed. After that, an empirical equation for assessing the peak shear strength was proposed to facilitate the peak shear strength prediction for geotechnical engineering personnel with limited knowledge of machine learning technique.

Machine Learning Algorithms and Optimisation Algorithms
This paper employs three types of machine learning algorithms, namely, BPANN, SVM and ELM. For optimizing the algorithms, PSO and GA have been adopted. Among the various positive aspects of employing these algorithms, three key advantages can be highlighted.
1. They are mature and have standard procedures for the application [20]. 2. Their widespread applicability in solving the issues of geotechnical engineering [21,22]. 3. They can accurately model the complex non-linear relationship between multiple independent variables and dependent variables [23].
A brief introduction and basic specifications of the employed machine learning and optimisation algorithms are presented below.

BPANN
Backpropagation Algorithm was the most popular algorithm adopted to build ANN models [24][25][26]. The BPANN model developed in the research is composed by an input layer for nine input parameters, a hidden layer for nine joints determined by the exhaustive method and an output layer, which outputs one output parameter (Fig. 1). Hyperbolic Tangent Sigmoid Transfer Function was utilised as the activation function in the proposed BPANN models with Levenberg-Marquardt Backpropagation Algorithm as the network training algorithm. Also, the initial weights and thresholds of each joint in the constructed BPANN model were optimised by GA and PSO, respectively.

SVM
SVM is based on the structural risk minimization principle and the maximum margin principle to conduct regression operations [27] that employ limited specimen data to establish the optimal regression models [28]. Another advantage of SVM is that it can use the kernel function to project the specimen data in a low-dimensional space to a high-dimensional space to convert non-linear issues to linear issues, reducing computational cost and difficulty [29]. In this research, the penalty parameter g and c of kernel function were optimised by PSO and GA.

ELM
ELM has the similar structure with feedforward ANN, which is an efficient and time-saving tool to establish the complex relationship between multiple input and output parameters [30]. Compared to traditional ANN algorithms, ELM has better generalisation capability and high predictive precision. It is also able to avoid the drawbacks of traditional machine learning algorithms, such as local minima, slow regression speed, etc. [31]. For ELM, the number of hidden layer joints greatly impacts on its predictive performance [32]. Hence, in this research, the hidden layer joint number for the ELM model was determined as 53 by the exhaustive method, with input layer joint number 9 and output layer joint number 1. Additionally, Logarithmic Sigmoid Function was adopted as the activation function.

GA and PSO
GA is a heuristic population optimisation algorithm that adopts the principle of natural evolution in the algorithm [33]. GA selects individuals using the operations, including selection, cross and mutation, to retain individuals with a large fitness value and eliminate those with a small fitness value. The new generation has a higher fitness value compared to that of the previous one.
PSO is another heuristic population optimization algorithm, originating from the predation behaviour of birds. In PSO, every particle among the population stands for a possible solution to the targeted issue. The particle's velocity controls their motion, which is regulated by the particle's and other particles' motion experience to achieve the optimum individual solutions in the solution space, respectively, and through continuous iterations, finally achieving the optimal solution for the targeted issue [34].

Hyperparameters Optimization
All machine learning algorithms have several crucial hyperparameters that can influence their predictive performance significantly [35]. Thus, optimising the hyper-parameters of machine learning algorithms before conducting training operation is needful. In this research, GA and PSO were employed to optimise the hyperparameters for the established BPANN and SVM models, using Root-Mean-Square Error (RMSE) (Eq. 1) as the fitness function. Previous researchers have demonstrated that GA and PSO are more capable of enhancing the machine learning models' forecasting accuracy than other intelligence algorithms [36][37][38].
where, n is the specimen data number, y i is measured data, f i is forecasting value.
The detailed procedure of employing GA and PSO to optimise the machine learning models is in the following contents: (1) stochastically produce individuals/particles composed of diverse hyperparameter values (2) calculate the fitness value of the individuals according to fitness function (RMSE) through calling the counter machine learning model (3) conduct corresponding operations on the individuals/ particles (4) compute the individual/particle's fitness value again (5) rerun Step 3 and Step 4 until reaching the predetermined ending conditions (6) take the individual/particle having the smallest RMSE value as the optimal individual/ particle, and take the optimal individual/particle's hyperparameter values as the initial hyperparameter values of the built machine learning models (7) train the machine learning models, and carry out prediction, the specific optimising procedure being presented in Fig. 2, the detailed parameter specifications of GA and PSO as tabulated in Table 1.
In this case, the individuals/particles' fitness value in GA and PSO was attained by adopting k-fold cross-validation method (k-CV) in hyperparameter optimisation. k-CV Fig. 1 The structure of BPANN developed to predict is an extensively adopted method to validate the machine learning models, In k-CV, the original data are divided into k equal groups. The training of machine learning models is based on k − 1 groups, while the validating is conducted on the remaining group. The training and validating process is repeated k times with different groups as the testing dataset. The average value of the k times forecasting precision is finally adopted as the performance index [39]. In the paper, the training dataset of the established database was utilised as the original data to conduct the k-CV operation on the machine learning models to obtain the indicator value of predicted accuracy (RMSE), and k was taken as 10 considering the database size and the recommendation in literatures [40]. The optimum values of hyperparameters for the machine learning models and corresponding optimising ranges were listed in Table 2.

Establishment of Database
The database adopted in this paper for the intelligent modelling was obtained from 316 large direct shear tests on soil-GDL interface conducted by the authors, the manufacturer [41] and Othman [6,42]. All of the GDLs comprising the interfaces within the database were produced by the same manufacturer. The GDL comprises a single cuspate HDPE (High Density Polyethylene) drainage core with a medium weight non-woven needle-punched and heat-treated staple fibre polypropylene geotextile filter thermally bonded on the dimple side and a lighter geotextile on the flat side or only a medium weight geotextile on the dimple side, as shown in Fig. 3. There are two types of different drainage cores: continuous drainage core and drainage core with cut-outs, which are adopted for the GDL, as shown in Fig. 4, with the thickness ranging from 4 to 7 mm. For the bonded geotextiles,   on the dimple side of the drainage core, the thickness of the geotextile is 1.75 mm, and on the flat side, the thickness is 1.2 mm. Additionally, some GDLs are only bonded with geotextiles on the dimple side. The geotextiles were produced from the same manufacturer, the typical properties of the geotextiles as tabulated in Table 3.
The tests employed broadly categorised soils of three types: clay, sand and gravel. The density and moisture saturation of the soils range from 1.36 to 2.01 g/cm 3 and 0% to 100%, respectively.
The interface peak shear strength of soil-GDL interfaces, under different normal stress ranging from 5 to 400 kPa, was measured utilising the large (~ 300 mm) direct shear apparatus. Most tests adopt the dimple side of GDL as the shearing surface during the experiments, while some tests used the flat side of GDL as the shearing surface. In several tests, the soils were consolidated for 24 h before shearing, while in the remaining tests, the soils were sheared directly after applying normal stress. The other operational procedures were the same, including: fixation method of GDL, loading method and external environmental factors such as test temperature and humidity. In this case, most tests were implemented in dry condition. For the tests conducted in    17 32 submerged condition, the influence of submerged condition is reflected on the value of moisture saturation for soil, taking their moisture saturation as 100%. Among the 316 tests, 107 tests were excluded since they lack a complete set of input parameters. The information of the remaining 209 tests was compiled and arranged to establish the database with 209 data groups. Each data group consists of nine input parameters: soil type (T), soil density (D), moisture saturation of soil (W), normal stress ( n ), shearing surface (dimple side or flat side) (F), thickness of drainage core (TH), type of drainage core (continuous drainage core or drainage core with cut-outs) (TY), geotextile specification (bonded on both sides or only dimple side) (GE), and consolidation condition (C). A single output parameter: peak shear strength ( ) was adopted. The parameters were selected following the literature to have high impacts on the peak shear strength of soil-geosynthetic interfaces [20,43,44]. The statistics parameters of the selected input and output parameters, value range of input parameters, and data type are tabulated in Table 4. Figure 5 presents the data distribution of the compiled database. In Fig. 5, x axis represents input parameter values, and y axis represents the corresponding data group numbers in the database. In this study, the machine learning models were established using Matlab software. In the Matlab modelling script, the nominal variables were digitalised and categorized for training and testing the machine learning models. The corresponding digitalised values of the nominal variables have been specified in the brackets after the nominal variables, respectively, as presented in Table 4.

Data Pre-Processing
In supervised learning, the database needs to be divided into two sub-datasets: the training dataset for model training, hyperparameter optimising, and model validation, and the testing dataset for verifying the predictive performance of models. The dividing ratio of the database influences the forecasting performance of models. The dividing ratio was determined after conducting optimisation analysis based on the recommendations in the existing literature to improve generalisation capability and avoid overfitting [19]. As such, the training and testing datasets comprise 80% (168 groups) and 20% (41 groups) of the whole database, respectively. Since the dimension of input parameters is various, the training duration and the forecasting precision may be influenced. To enhance the machine learning models' predictive precision and efficacy, normalisation is conducted adopting Eq. (2).
where, x Normalised and x is normalized value and original value, respectively,x min and x max is the minimum and maximum value, respectively.

Performance Evaluation Methods
During the machine learning modelling, the k-CV was initially adopted to validate the goodness of fit for the machine learning models based on the training dataset, with k set as 10. Then, testing dataset was used to assess their forecasting precision.
The performance of the proposed machine learning models was evaluated by three indexes: (i) RMSE: RMSE can represent the standard errors between forecasting values and measured values. The where, cov(, ) is covariance, var[] is variance. (iv) Coefficient of Determination (R 2 ): R 2 reflects how well the predicted value be close to the real value. R 2 ranges 0 to 1. An R 2 of 1 means the perfect fitting between the predicted value and real value. The definition of R 2 is shown in Eq. (5) [46].

Results of Hyperparameter Optimisation
The optimum BPANN and ELM hidden layer joint numbers were ascertained by the exhaustive method, with RMSE as the accuracy indicator, being presented in Fig. 6. According to Fig. 6, the RMSE of the BPANN models and ELM models with diverse joint numbers has relatively large difference, ranging from 42.99 to 11.04 and 913.51 to 4.46, respectively. For the BPANN algorithm, the model with nine hidden layer joints has the least RMSE. In terms of the ELM algorithm, 53 is the optimal number of hidden layer joint.
The optimisation processes of BPANN model adopting GA and PSO, respectively, are presented in Fig. 7.
Based on Fig. 7, for both of the BPANN model and the SVM model, their RMSE decreases considerably with the rise in iteration number utilising GA and PSO algorithms, respectively. It indicates that GA and PSO are powerful in optimising the hyperparameters of the established BPANN and SVM models, which is able to improve the forecasting precision of the established machine learning models markedly with satisfactory efficacy. More specifically, the optimisation effects on the machine learning models adopting PSO is larger than that of GA, with higher optimising magnitude and efficiency. For example, when PSO was used, the RMSE of BPANN and SVM models becomes stable at the 13rd and 16th iteration number, with value 3.68 and 5.23, respectively, while for GA, the RMSE stabilises at the 19th and 19th iteration number, with value 3.98 and 6.99, respectively. When the predetermined maximum iteration number was reached, the hyperparameters of the optimal individual/ particle that has the smallest RMSE value in the population were taken as the initial parameters of the BPANN model and SVM model, respectively, as shown in Table 1.

Comparing the Forecasting Performance
The performance on predicting the training and testing datasets is shown in Figs. 8, 9, 10, 11, and 12.
Based on Figs. 8, 9, 10, and 11, PSOBPANN has the highest forecasting precision among the established models on predicting the training dataset on the basis of the statistics indexes. More specifically, PSOBPANN achieved the lowest RMSE (3.69) and MAPE (8.61%) and the highest R 2 (0.95), WI (0.99) and R (0.96). Additionally, overall, the forecasting accuracy of BPANN models is better than the SVM models. For example, the GABPANN model has 0.56  Fig. 6 Optimisation processes by the exhaustive method lower RMSE, 7.39% lower MAPE and 0.06 higher R than those of GASVM. Moreover, the machine learning algorithms optimised by PSO have better performance than those optimised by GA. For instance, the PSOSVM model has 0.11 lower RMSE, 1.07% lower MAPE and 0.04 higher R that those of GASVM. In terms of the predictive capability for the testing dataset, based on Figs. 9, 10, 11, and 12, it can be seen that the BPANN model optimised by PSO has the highest accuracy in forecasting the peak shear strength of soil-GDL interfaces, with RMSE of 4.13, MAPE of 11.10%, R 2 of 0.93, WI of 0.98, and R of 0.94, and the performance of GABPANN is poorer than that of PSOBPANN, with RMSE of 6.37, MAPE of 19.19% R 2 of 0.87, WI of 0.96 and R of 0.93. Additionally, as with the estimating results in training dataset, the assessing accuracy of the BPANN models is higher than those of SVM models, and the optimisation performance of PSO is superior to GA.

Sensitivity Analysis of the Influence Variables
Sensitivity analysis was conducted to investigate the relative significance of the input variables to the peak shear strength of soil-GDL interfaces. Since the BPANN model optimised by PSO was considered as the most successful model in forecasting the peak shear strength, the PSOBPANN model was adopted to conduct sensitivity analysis in this paper. The relative importance of the input parameters for the established PSOBPANN model was evaluated adopting Garson's algorithm that has been widely applied in geotechnical engineering to assess the variable contribution [49,50]. Garson [51] proposed Garson's Algorithm, later modified by Goh [52],  7). The relative importance of the input parameters is plotted in Fig. 13.
where R ik is the relative importance of input parameters, W ij ,W jk are the connection weights of the input layer-hidden layer and the hidden layer-output layer, i = 1, 2, …, N, k = 1, 2, …, M (N, M are the numbers of the input parameters and output parameters).
Based on Fig. 13, normal stress is the most important variable to affect the peak shear strength of soil-GDL interfaces, with proportion of 17.91%, followed by drainage core type, moisture saturation, soil type and shearing surface, with percentage of 14.68%, 13.78%, 10.91%, and 10.86%, respectively. In comparison, consolidation condition, geotextile specification, soil density and drainage core thickness have slighter influences on the peak shear strength compared with the aforementioned factors. The detailed mechanism analysis of the relative importance for the input parameters to the peak shear strength is conducted in Sect. 8.

Development of an Empirical Equation for Peak Shear Strength Prediction
In this research, the BPANN model optimised by PSO has been proved as an efficient tool for predicting the peak shear strength of soil-GDL interfaces. However, the application of the developed PSOBPANN model for geotechnical engineering personnel with limited or no knowledge of machine learning techniques is of little use. To solve the problem, an empirical equation for forecasting the peak shear strength of soil-GDL interfaces was proposed based on the connection weights and biases of the PSOBPANN models, as shown in Eq. (8) [53]. The connection weights and biases of the PSOB-PANN model are tabulated in Table 5.
where, Y n is the normalised predictive values, ranging from − 1 to 1; b 0 is the biases of output layer joint; w k is the connection weights between the kth hidden layer joint and the output layer joint; b hk is the biases of the kth hidden layer   Fig. 12 The R values of the models for testing dataset joint; h is the hidden layer joint number; w ik is the connection weights between the ith input layer joint and kth hidden layer joint; X i is the ith normalised input parameter, ranging from − 1 to 1; f sig is Hyperbolic Tangent Sigmoid Transfer Function.

(d) PSOBPANN
After calculation, the empirical equation for assessing the peak shear strength of soil-GDL interfaces was gained, as expressed in Eq. (9).
where, Y max and Y min are the maximum and minimum values of the peak shear strength in the database, respectively, Y max = 20.14 kPa and Y min = 0.5 kPa.
Among Eq. (9): where, g i is the connection weight between the ith hidden layer joint and the output layer joint for the established PSOBPANN model, as listed in Table 5.
Among Eq. (11): where, h i is the bias of the jth hidden layer joint for the established PSOBPANN model, as listed in Table 5; p j is the connection weight between the jth input layer joint and ith hidden layer joint for the established PSOBPANN model, as listed in Table 5; N j is the ith normalised input parameter.

Case Study
To facilitate the practitioners and future researchers to utilise the developed empirical equation (Eq. 8) to predict the peak shear strength of soil-GDL interfaces, a case study is conducted in this session to use Eq. (8) to assess the peak shear strength of clayey soil-GDL interfaces based on a numerical example with real values, and then the predictive peak shear strength of interfaces is compared with the peak shear strength that is measured in laboratory tests to validate the predictive accuracy of the developed empirical equation. The basic properties of the adopted soil-GDL interface and the corresponding input parameter value of the machine learning models are presented in Table 6.  Fig. 13 Relative importance of the input parameters The detailed procedure of using the developed empirical equation (Eq. 8) to predict the peak shear strength of soil-GDL interfaces is following: firstly, the corresponding input parameter values of the interface in Table 6 are normalised by Eq. (2). Secondly, the normalised input parameter values are substituted into Eq. (12) to calculate the values of Ai (i = 1, …, 9), with combining the values of h i (i = 1, …, 9) and p j (j = 1, …, 9) in Table 5, respectively. Thirdly, the obtained values of A i (i = 1, …, 9) are substituted into Eq. (11) to calculate the value of C 1 , with combining the values of g i (i = 1, …, 9) in Table 5. Fourthly, the obtained value of C 1 is substituted into Eq. (10) to calculate the value of Y n . Finally, the obtained value of Y n is substituted into Eq. (9) to be renormalised to obtain the value of that is the predictive peak shear strength of the interface. The predictive peak shear strength of the interface using the developed empirical equation and the measured peak shear strength in laboratory tests is drawn in Fig. 14.
Based on Fig. 14, the predicted peak shear strength of the interface is close to the measured peak shear strength by laboratory tests. This indicates that the developed empirical equation (Eq. 8) has high accuracy to predict the peak shear strength of soil-GDL interfaces. It provides convenience for geotechnical engineering personnel with limited knowledge of machine learning technique to forecast the peak shear strength of soil-GDL interfaces.

Discussion
According to the sensitivity analysis results, normal stress has the largest influence on the peak shear strength of soil-GDL interfaces. This fact conforms to the findings highlighted by the previous studies [54][55][56]. Furthermore, drainage core type (continuous drainage core or drainage core with cut-outs) and moisture saturation affect the peak shear strength as the second and third most influencing factors. As shown in Fig. 15, the interlocking effects between the grooves in geogrid core and soil, leading in superior frictional performance, would have resulted in the higher peak shear strength between soil-GDL with drainage core   The interlocking mechanism between geogrid drainage core and soil with cut-outs than that of GDL with continuous drainage core. In terms of the moisture saturation, the large impact of moisture saturation on the peak shear strength of soil-GDL interfaces agrees with the findings from Othman [6].
Shearing surface (flat side or dimple side), soil type (clay, sand and gravel), and consolidation condition are demonstrated to have moderate influence on the peak shear strength. The difference of shear strength for the interfaces between geosynthetics and different types of soil has been reported by many scholars due to the diverse basic properties of soil, including particle shape/angularity, mean grain size, hardness, etc. [44,57,58]. In terms of the shearing surface, when the dimple side of GDL is adopted as the shearing surface, the interlocking effects between soil and the dimple drainage core can provide larger shear resistance, leading to higher peak shear strength along the interfaces than that of flat side. The magnitude of interlocking effects enhances with the rise of compaction effort during soil placement, normal stress and moisture saturation, the detailed interaction mechanism between soil and GDL as shown in Fig. 16. In this research, a majority of tests were conducted on soil with low to medium moisture saturation (0%-50%) under low normal stress (less than 60 kPa), and the compaction efforts during soil placement was light, as shown in Fig. 5, with a medium interlocking effect between soil and drainage core. Therefore, the relative importance of shearing surface to the shear strength is manifested to be moderate. In the aspect of consolidation condition, previous studies have indicated that consolidation can increase the shear strength of cohesive soil-geosynthetics interfaces evidently [59,60]. However, in this research, the tests on cohesive soil-GDL interfaces only account for about 40% of the total tests, as shown in Fig. 5, and other tests were performed on sand and gravel-GDL interfaces. The influence of consolidation condition on the shear strength of sand/gravel-GDL interfaces is not very high. Hence, the relative importance of consolidation condition to the shear strength is indicated as being not very significant.
The variations of input parameters including: geotextile specification (bonding geotextile on both sides or only on the dimple side), soil density, and drainage core thickness are not shown to be vital to the peak shear strength. For the geotextile specification, in the compiled tests, the GDL was clamped firmly on the lower shear box of the direct shear apparatus. When the dimple side was adopted as the shearing surface, the bonded geotextile on the flat side has negligible influence on the shear strength, while when the flat side was chosen as the shearing surface, the bonded geotextile can provide larger friction force, resulting in the higher peak shear strength than that of GDL only bonded geotextile on the dimple side. However, in this research, a large proportion of compiled tests (80%) were conducted on the interfaces between soil and the dimple side of GDL, as shown in Fig. 5, thus, the relative importance of geotextile specification to the shear strength is demonstrated to be low. In terms of soil density, according to the existing research, the peak shear strength of the interfaces is mobilised from two components: the skin friction and the interlocking effects between soil and geosynthetics [44]. The effects of soil density on the interlocking effects are relatively large but on the skin friction is non-obvious [61]. In this research, based on Fig. 5, most of tests were carried out under low normal Fig. 16 The interaction mechanism between soil and GDL (after Chao and Fowmes [67]) stress, and the interlocking effects between soil and GDL are not the dominant factor for generating shear resistance, as illustrated in Fig. 16. Thus, the variation of soil density is manifested to have less influence on the peak shear strength. In the aspect of drainage core thickness, the influence of its changes on the peak shear strength is the least among the input parameters. This is because, in this case, the variation in the thickness of drainage core does not change the dimension of the cuspate elements on the drainage core, which has marginal influences on the interlocking effects and skin friction between soil and GDL, contributing less to the change of the peak shear strength.

Limitations
The predictive precision and reliability of the established machine learning models can be improved further when a larger database is available, whilst some larger databases of geosynthetics tests exist the data is often incomplete making the application of machine learning impossible [62][63][64][65][66]. Secondly, there may some other influence variables for the peak shear strength of soil-GDL interfaces which were ignored during the compilation of database for modelling, such as mean grain size of soil, etc. In the future, it is worthy to adopt more influencing variables to conduct machine learning modelling, which may enhance the predictive ability of the machine learning models. Thirdly, the value ranges of some adopted input parameters, such as normal stress, moisture saturation of soil, etc., are small, thus, an attempt to expand the value ranges of some input parameters can further increase the generalisation ability of the established machine learning models and have a better understanding about the relative importance of the influence variables to the peak shear strength.

Conclusions
In this paper, based on the compiled database, an investigation and comparison of the applicability for five different machine learning models including, BPANN and SVM, with hyperparameters optimised by PSO and GA, respectively, and ELM optimised by Exhaustive Method, in predicting the peak shear strength of soil-GDL interfaces was conducted. Also, the sensitivity analysis was conducted to assess the relative significance of influence variables based on the PSOBPANN model with the optimal estimating performance among the forecasting models, and corresponding mechanism analysis was performed. Moreover, to facilitate the forecasting of the peak shear strength for geotechnical engineering personnel, an empirical equation for assessing the peak shear strength based on the PSOBPANN model was proposed. Based on the results and discussion presented earlier, the following major conclusions can be made: 1. GA and PSO can improve the forecasting precision of the built machine learning models markedly in a small number of iterations. 2. Among the established models, the PSOBPANN model has the best predictive precision, on the basis of the statistics indexes: RMSE, R, R 2 , WI, and MAPE. 3. Overall, the BPANN models have better performance in predicting the peak shear strength of soil-GDL interfaces than that of the SVM models, and the optimisation performance adopting PSO is superior to that of GA. 4. The sensitivity analysis indicates that that normal stress is the biggest influence factor to peak shear strength of soil-GDL interfaces, followed by drainage core type, moisture saturation, shearing surface, soil type, consolidation condition, geotextile specification, soil density and drainage core thickness. 5. Based on the mechanism analysis, the ranking of the relative importance for input parameters is affected partly by the data distribution of input parameters in the database, thus, it is worthy to construct a database with more even data distribution of input parameters. 6. An empirical equation developed from the PSOBPANN model was presented to assess the peak shear strength of soil-GDL interfaces, which provides convenience for geotechnical engineering personnel with limited knowledge of machine learning technique. The high predictive accuracy of the developed empirical equation has also been validated by comparing the predictive peak shear strength with the measured value in laboratory tests.
included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.