An advanced meta-learner based on artificial electric field algorithm optimized stacking ensemble techniques for enhancing prediction accuracy of soil shear strength

Shear strength is a crucial property of soils regarded as its intrinsic capacity to resist failure when forces act on the soil mass. This study proposes an advanced meta-leaner to discern the shear strength property and generate a reliable estimation of the ultimate shear strength of the soil. The proposed model is named as metaheuristic-optimized meta-ensemble learning model (MOMEM) and aims at helping geotechnical engineers accurately predict the parameter of interest. The MOMEM was established with the integration of the artificial electric field algorithm (AEFA) to dynamically blend the radial basis function neural network (RBFNN) and multivariate adaptive regression splines (MARS). In the framework of forming MOMEM, the AEFA consistently monitor the learning phases of the RBFNN and MARS in mining soil shear strength property through optimizing their controlling parameters, including neuron number, Gaussian spread, regularization coefficient, and kernel function parameter. Simultaneously, RBFNN and MARS are stacked via a linear combination method with dynamic weights optimized by the AEFA metaheuristic. The one-tail t test on 20 running times affirmed that with the greatest mean and standard deviation of RMSE (mean = 0.035 kg/cm2; Std. = 0.005 kg/cm2), MAE (mean = 0.026 kg/cm2; Std. = 0.004 kg/cm2), MAPE (mean = 7.9%; Std. = 1.72%), and R2 (mean = 0.826; Std. = 0.055), the MOMEM is significantly superior to other artificial intelligence-based methods. These analytical results indicate that MOMEM is an innovative tool for accurate calculating soil shear strength; thus, it provides geotechnical engineers with reliable figures to significantly increase soil-related engineering design.


Introduction
The soil shear strength can be defined as the magnitude of shear stress that soil is capable of withstanding [1]. In civil engineering, the shear strength of soils is fundamental for describing their susceptibility to applied pressures from building loads and construction machines/equipment. Geotechnical engineers use shear strength of soil as an essential factor to evaluate the stability of structure on or embedded in the ground, such as retaining walls, embankments, airfield pavements, and foundations of a high-rise building [2]. Therefore, obtaining an accurate estimation of the shear strength of soil is a highly important task in various geotechnical designs [3][4][5].
In practice, computing this parameter of interest faces various difficulties. It is because the mapping function between the shear strength and soil properties has been proved to be complicated [6][7][8][9][10][11][12]. Freitag [13] found that 1 3 there is a complicated dependency between soil strength and properties of moisture content and bulk density; thus, estimation of soil strength based on such factors is by no means an easy task. Conventional estimations of shear strength are dependent of in the values of the cohesion (c) and the angle of internal friction (φ). However, there are inconsistencies in the values of the cohesion and the angle of internal friction for a particular soil [14].
Fredlund et al. [15] attempted to capture the variation of the soil shear via a soil-water characteristic curve and the saturated shear parameters of soil [15]. However, the closed-form predictions are not applicable to all types of soils. Gao et al. [3] pointed out that the shear strength of soil varies significantly with different soil types. Moreover, performing laboratory tests (e.g., applied triaxial equipment) to estimate the shear strength of soils is also found to be both time-consuming and costly due to the employments of instruments as well as highly trained technicians [16].
Due to such challenges, alternative methods based on advanced machine learning models have been recently proposed to deal with the task of soil shear strength prediction. Based on existing samples collected from laboratory tests, intelligent models can be developed and train to perform shear strength calculations automatically and instantly. Based on a direct comparison between actual testing outcomes and results predicted by those intelligent models, the reliability of machine learning models can be objectively judged.
Adaptive neuro-fuzzy inference system (ANFIS) has been employed to predict the shear strength of unsaturated soils [17]. This fuzzy neural network model is shown to be a capable tool for dealing with the problem of interest. A hybrid model consisting of artificial neural networks and support vector machines with various ensemble strategies (i.e., voting, bagging, and stacking) has been proposed by Chou and Ngo [18] to compute the strength of fiber-reinforced soil. Chen et al. [19] proposed a modified linear regression method for modeling the variation of shear strength based on soil properties. Mbarak et al. [20] predict the undrained shear strength of soil with the utilizations of random forest, gradient boosting and stacked models.
Recently, metaheuristic approaches have been increasingly harnessed to create intelligent models to copy with the task at hand. Pham et al. [21] replaced the conventional gradient descent based algorithm used for training ANFIS models by well-established metaheuristic algorithms, including particle swarm optimization and genetic algorithm. Tien Bui et al. [22] recently combines the least square support vector machine and metaheuristic method of cuckoo search optimization for predicting the shear strength parameter of soft soil. Moayedi et al. [23] investigated the feasibility of spotted hyena optimizer and ant lion optimization for training neural networks used for estimating the soil parameter of interest. Nhu et al. [24] establish an integrated model of support vector regression and particle swarm optimization relying on various soil features such as moisture, density, components, void ratio, and water content. The dragonfly algorithm, whale optimization algorithm, invasive weed optimization, elephant herding optimization, shuffled frog leaping algorithm, salp swarm algorithm, and wind-driven optimization models have been employed to construct capable neural network-based prediction models [25,26]; Metaheuristic has shown their capability of improving predictive accuracy of neural network models by annihilating the drawback of conventional backpropagation and gradient descent methods [27][28][29].
Besides, in the fields of data science and civil engineering, there is an increasing trend of applying ensemble learning methods to solve complex data modeling tasks [18,[30][31][32][33]. This novel approach of machine learning employs multiple learning paradigms to construct combined models featuring excellent predictive capability. Ensemble models have been shown to attain better prediction accuracy and alleviate the overfitting phenomenon [34][35][36][37]. A machine learning ensemble is generally constructed by aggregating results produced by several alternative models; therefore, the trained ensemble model often has good flexibility in data modeling and strong resilience to noise [38][39][40].
Following these two trends of research, this study proposes a novel metaheuristic optimized machine learning ensemble approach for predicting the soil shear strength. The ensemble includes the two capable individual machine learning methods of the radial basis function neural network (RBFNN) [41,42] and multivariate adaptive regression splines (MARS) [43,44] which are powerful tools for nonlinear and multivariate data modeling. The two techniques also present different types of machine learning; thus, they are likely to contribute their reciprocal strengths and cover weakness of each other in building a meta-ensemble model with an outstanding efficiency [45]. In this integrated framework, individual RBFNN and MARS models are trained and their prediction results are combined via the stacking aggregation method [35].
In addition to simultaneously tuning hyper-parameters for constituent models in the ensemble model, the weights of each individual model's output must be determined adaptively. The training phase of the proposed meta-ensemble model is further enhanced by the employment of the artificial electric field algorithm (AEFA)-a state-of-the-art metaheuristic approach. The AEFA [46] is inspired by the concept of the electric field, charged particles, and the Coulomb's law of electrostatic force. This algorithm has achieved competitive performances against other optimization methods in terms of both solution quality and convergence speed. Hence, the AEFA surely optimizes the performance of the stacking method-based ensemble model by simultaneously searching for the most suitable set of hyperparameter values and weights of constituent models.
A dataset of 249 data samples collected from the geotechnical investigation process in Hanoi (Vietnam) is used to train and test the proposed approach. The variables of the depth of the sample, sand percentage, loam percentage, clay percentage, moisture content percentage, wet density, dry density, void ratio, liquid limit, plastic limit, plastic index, and liquidity index are employed as influencing factors.
All in all, the goal of this research is to construct and verify a hybrid Metaheuristic-optimized meta-ensemble learning model, denoted as MOMEM, to enhance the prediction accuracy of the soil shear strength. The contribution of this study is multifold: (1) a novel machine learning ensemble for predicting soil shear strength is proposed; (2) the model is adaptively constructed with the employment of the AEFA metaheuristic; (3) a superior predictive performance is achieved compared to other benchmark machine learning models. The rest of the study is organized as follows: the collected dataset is described in the second section. The next section reviews the research method, followed by a section that presents the newly developed machine learning framework. Experimental results and comparisons are reported in the fifth section. The final section provides several concluding remarks of the study.

The collected dataset
The dataset used in the present study was collected at the geotechnical investigation phase of the Le Trong Tan Geleximco Project, located in the west of Hanoi, Vietnam (Fig.1). The site investigation was conducted in April 2009. This project covers an area of approximately 135 ha, which was used for the construction of low-rise housing, high-rise housing, public infrastructures, and entertainment centers. In order to gather information on the soil conditions, the boring-based soil sampling is utilized. The boreholes are drilled by means of slurry (a mixture of bentonite and water), and thin-walled metal tubes to ward off soil collapses. The soil samples with a diameter of 91 mm are gathered by the method of piston samplers. The sample collection process complies with the Vietnamese national standards of the TCXDVN-194-2006 (High Rise Building-Guide for Geotechnical Investigation), the TCN-259-2000 (the procedure for soil sampling by boring methods).

Radial basis function neural network (RBFNN)
A typical structure of a RBFNN model [41,42] used for nonlinear function approximation consists of three layers and is graphically presented in Fig. 3. The first layer has a number of neurons equal to the number of influencing factors, which is equal to 12 in this study. The next layer comprises a set of radial basis function (RBF) neurons, which are basic units of information processing. This neuron employs the RBF, which is an attenuation, central-radial symmetry, non-negative, non-linear function. It is noted that all functions that are dependent only on the distance from a center vector are radially symmetric about that vector. The last layer, which is the output layer, yields the network output by performing a linear combination of outputs produced by units in the second layer.
The RBF used in the second layer is mathematically presented as follows [47]: where v and denote the parameter of position and width, respectively, of the RBF nodes.
It is proper to note that the number of output neurons in this study is 1, which is the shear strength of the soil. Accordingly, the soil shear string can be computed via the RBFNN model as follows [48][49][50]: x − ν j is Euclidean distance between the center of the jth hidden node and a data point; w j represents the connecting weight from the jth hidden node to output layer; b is the bias term; N n denotes the number of hidden neurons; j (•) is the radial basis function of the jth hidden note.
It is noted that the model construction phase of the RBFNN model requires a proper determination of the number of neurons (N n ) and the RBF width (σ). An appropriate set of N n and σ ensures the success of establishing a   Corr. = 0.072 C orr. = 0.034 C orr. = -0.028 Fig. 2 Distributions of the soil parameters used in this research robust RBFNN model used for soil shear strength estimation. The parameter N n dictates the architecture of the inference model, resulting to either the over-fitted (in case of too many RBF nodes assigned) or under-fitted model with low accuracy (in case of having a small number of RBF nodes) [51,52]. Additionally, the parameter σ affects the influence of RBF nodes on each data point; therefore, it determines the generalization of the prediction model.

Multivariate adaptive regression splines
MARS was developed by Friedman [43], which is formed by fitting basic function (term) to distinct intervals of input variables. In general, splines (also called piecewise polynomials) have pieces smoothly connected with joining points called knots (k). MARS uses two-sided truncated power functions as spline basis functions, described as Eqs. (3) and (4): where (q ≥ 0) is the degree the resultant function estimate; [] + indicates to take positive values.
An interaction term is yielded by multiplying an existing term with a truncated linear function having a new attribute. Accordingly, both the existing term and the newly created interaction term are involved in the construction MARS model. The search for new terms is limited in a maximum order of the user. The formula of interaction term and the general MARS function can be referred to Eqs. (5) and (6), respectively: where K m is the number of truncated linear functions multiplied in the mth term. K m is less than maximum interaction among variables I max that is pre-defined by the user. x v(m,j) is the input variable corresponding to the jth truncated linear function in the mth term; k m,j is the knot value of x v(m,j) ; s m,j is the selected sign + 1 or − 1; ŷ is the dependent variable predicted by the MARS model; c 0 is a constant; T m (x) is the mth term, which may be a single spline basis functions; and c m is the coefficient of the mth term.
A MARS model is constructed through a two-stage process including a forward phase and backward phase. The forward phase adds terms until the maximum number of terms is reached which is pre-assigned by users (T max ). This phase aims at reducing the sum-of-squares residual error and tends to result in the complex and over-fitted model. Hence, it needs to have a backward phase to remove the redundant terms with less impact on the model's generalization. The generalized cross-validation (GCV) was introduced as a criterion to recognize those redundant terms as shown in Eq. (7): where n is the number of data patterns. C(M) is a dynamic complexity penalty defined as Eq. (8): where M is the number of terms in Eq. (8); the parameter d is a penalty for each term added into the model and preassigned by the user. Notably, a suitable value of d can lead to an under-fitted or over-fitted model with poor generalizability [53][54][55]. In general, T max , I max , and d are regarded as essential factors to control the MARS model's performance.

Artificial electric field algorithm
Artificial electric field algorithm was developed by [46] which is inspired by the Coulomb's law of electrostatic force, stated that an electrostatic force between two charged particles is directly proportional to the product of their charges and inversely proportional to the square of the distance between their positions. In the AEFA, an agent is regarded as a charged particle and its strength is measured by electric charge. A particle can either attract or repel others by an electrostatic force which is used as a means of communication channel among particles. A particle with better fitness will possess a greater electrostatic force. In the AEFA, the attractive electrostatic force is regulated that a charged particle with the greatest charge attract all other particles of lower charge and move slowly in the search space. Figure 4 presents the flowchart of the AEFA.
The AEFA first randomly locates of N particles in d-dimensional search space with the position of ith particle presented as . The position of the particle ith at time t is given by the following equation: The location of the best particle with the greatest fitness is denoted by P best = X best . The force acting on the charge ith from charge jth is calculated as follows: where Q i (t) and Q j (t) are the charges of ith and jth particle at any time t, respectively; K(t) is the Coulomb's constant at any time t; ϵ is a positive constant; and R ij (t) is the Euclidian distance between the ith particles and the jth particle. The Coulomb's constant K(t) is given as follows: where α and K0 are a constant value and initial value, respectively; iter is the current iteration and maxiter is the maximum number of iterations. .
The total electric force acts on the ith particle by all the other particles is given by the following equation: where rand() is an uniform random number in the range of The electric field of the ith particle at any time t is given by the following equation: The acceleration of the ith particle at any time t is given by the following equation: where M i (t) is the unit mass of the ith particle at any time t. The velocity and position of the ith particle are updated as follows: where rand() is a uniform random number in the interval [0, 1]. The charge of a particle is calculated based on the fitness functions with an assumption of equal charge for all particles as shown in Eq. (10). Fundamentally, the charge of the best particle should have the greatest value while other particles have charge values in the range of [0 1]. Hence, the particle with a larger charge has a greater force for the best fitness value: where fit i is the fitness value of ith particle at any time t; best(t) = min(fit i (t)) and worst(t) = max(fit i (t)).

The proposed metaheuristic-optimized meta-ensemble learning model for soil-shear-strength estimation
This section describes the overall structure of the metaheuristic optimized machine learning ensemble, named as MOMEM, used for predicting the soil shear strength. The proposed model consists of the RBFNN and MARS machine learning models. The construct a robust prediction approach, the AEFA metaheuristic is employed to optimize the hyperparameters of the RBFNN and MARS. Besides, the dynamic weight values for model members involved in the ensemble MOMEM are also automatically determined by the AEFA method. The general workflow of the proposed MOMEM for estimating the shear of soil is presented in Fig. 5.

Initiative phase
As presented in Fig. 5, the initialization phase uses purely the training data for the construction. Since greater numeric ranges tend to result in undesirable bias in the construction of a machine learning model; therefore, all attribute values are converted into the same range of [0, 1] so that these attributes are fed to the construction process of a machine learning with the equal weight. Herein, the convertion uses the normalization method with the equation presented in Eq. (20). It is worth noticing that the study employed 10-cross validation method to divide the whole dataset. Hence, the training dataset is the result of gathering 9 of 10-crossing folds. The remaining fold is used for testing the trained machine learning model. The training dataset is then partitioned into two sets; one set with 70% of data patterns is employed to trained model; another set with the remaining 30% of data patterns is used to validate the newly trained model: where x i,j is a data value of the attribute jth, x j,min is the lowest value for the attribute jth, x j,max is the highest value for the attribute jth, and x i,j tran is the transformed value of the attribute jth.
Parameters in the MOMEM are composed of a number of member inference models integrated into and their controlling parameters. Hence, the framework of blending machine learners leads to an increase of parameter numbers along with the increase of involved inference models. In this study, reciprocal merits of the MARS and RBFNN are blended to establish the MOMEM, there are seven parameters (N n , σ, M max , d, I max , α, and β) that need to be simultaneously fine-tuned to attain the optimal configuration of MOMEM with ultimate generalizability. In this phase, each of all seven parameters initially was assigned many random values in an extensive range to have an evaluation basis for the searching loop at the next phase. The searching boundary for each parameter is set to be sufficiently large to cover all potentially suitable values of concerned parameters and shown in Table 2.

Searching phase
At steps 1.1 and 1.2, member models will receive values of parameter generated by the AEFA searching engine to proceed with the construction of inference models by using partitioned training data that is processed at the previous phase. This is the construction process of many model candidates and thus inevitably consumes time most compared to other steps. Further, more inference models blended in this meta-ensemble learning framework will entail more computational time. In the MOMEM, particularly, the RBFNN model and MARS model are established independently with AEFA-assigned parameter values (N n and σ) and (M max , I max and d), respectively. Eventually, the trained models generate the prediction values of all data points for training and testing sets. It should be emphasized that these parameter values purposely attain the greatest prediction accuracy for the proposed meta-learning model, the MOMEM, rather than individual models. It is important to note that many new models are created to potentially replace the old ones at each iteration. As mentioned previously, the final prediction values of the MOMEM is calculated by summing up the predictive values of blended member models multiplied with their dynamic weight of the corresponding models. Notably, values of dynamic weights (α and β) express the influencing level of the corresponding constituent models on the final prediction values of the MOMEM which are tuned by the AEFA searching engine in the standardized range of [0, 1] as shown in Table 3. Formulas of merging predictive values are expressed in Eq. (21). It is noted that the traditional stacking ensemble method; all member models are assumed to have an equal role, which is expressed by the average weight assigned [56]. These subjective average values may diminish the generalizability of the ensemble model. Further, the AEFA searching engine can automatically discard any member models by assigning a dynamic weight of 0 if it recognizes those models undermine the accuracy of the MOMEM. At that time, the MOMEM becomes a single hybrid machine learner: where p MNVIM , p RBFNN , and p LSVR indicate the prediction values of the MOMEM, RBFNN, and MARS, respectively; α and β are the dynamic weight of the RBFNN and MARS, respectively.
Setting an appropriate objective function for the AEFA searching engineer is crucial to obtain a robust meta-learner successfully. Many joined individual models along with the support of the AEFA are likely to purely minimize the training error that causes the final model to be trapped at the over-fitting problem. It is worth noticing that this well-fitting model has poor generalizability since it is over-trained with a high-complex degree to fit the training data only [57]. Inspired by successes of other works [58][59][60][61], this study put forward an objective function expressed by a sum of validation error and training error, as shown in Fig. 5. The present study prefers to employ the root mean square error (RMSE) for the objective functions because this evaluation criterion has an additional feature of reducing the number of large biases between actual values predictive values. The AEFA searching engine thus needs to reduce the sum of RMSE of training and validation concurrently as shown in Eq. (22). The offered objective function is thus expected to alleviate the effect of over-fitting, increase the generalizability of the MOMEM, and reduce the number of large undesirable biases. Hence the concerned issued may be stroked efficiently: The set of parameter values (N n , σ, M max , I max , d, α, and β) will be synchronously adjusted by the transformation procedure of AEA. A bad set of values is replaced by better ones based on the obtained values of the objective function. Therefore, the values of tuning parameters will be progressively improved after each loop. Accordingly, the learnability and generalizability of the MOMEM are progressively improved after each searching loop. The AEFA will select the best tuning parameters, i.e., for each searching loop, those that provide the lowest value of fitness function. The AEFA shall memorize the suitable sets of parameters and extend the searching loop to meeting the stopping criterion. This study used the iteration number of 50 as the stopping criterion.

Optimal meta-learner and application phase
The searching process halts when the stopping condition is met, indicating that the optimal set of tuning parameters (N n , σ, M max , I max , d, α, and β) is available to train MOMEM with the entire training dataset. The trained MOMEM is then saved to perform prediction tasks for the testing phase or generate prediction value for new data input. It is recommended to update and re-train when having enough number of new data. More data of soil-shear-strength enable MOMEM to further discover more underline features of soil-shear-strength thus reaching closer to the underlying function of soil's shear strength. where SST= total of sum square errors; SSR = sum of square residual errors; p i = predicted value; y i = actual value; ȳ = average of actual value; and n = number of data patterns.

Experimental results and discussion
In order to avoid the bias in the data partition process, a tenfold cross-validation approach is applied for the splitting dataset [62]. Thus, all data patterns will be in turn assigned in both training and testing sets. Since this study compares AI-inference models based on statistical methods with mean and standard deviation, all models will perform the prediction task on 20 run times. Hence, the ten crossing folds are used twice. The models were run in MATLAB environment version 2018a [63]. Table 3 displays the performance of the MOMEM on the soil-shear-strength for 20 run times. As seen in the table, the MOMEM attains a very high accuracy for inferring values of soil-shear-strength which is expressed by the relatively low average values of RMSE (0.035 kg/cm 2 ), MAPE (7.94%), and MAE (0.026 kg/cm 2 ) in the testing phase. Further, the low values of the standard deviation of those criteria strongly confirm that the high performance of MOMEM is retained stable and exiguously affected by the data partition. Interestingly, there are a slight difference between the values of evaluation criteria between the testing phase and the training phase. Particularly, the absolute deviations of MAPE and RMSE at the two phases are only 0.25% and 0.003 (kg/cm 2 ), respectively. These facts indicate that the over-fitting issue is addressed efficiently in the MOMEM by summarizing the training and validation errors in the objective function.
Sets of parameter values found on 20 run times are shown in Table 4. Surprisingly, the values of all parameters vary in large ranges. Especially, the optimal spread of Gaussian function in RBFNN lies from 0.018 to 4.979 while the maximum number of neurons is 60 that is as many as ten times of minimum number, respectively. Those changes are obviously interpreted as to grasp the different characters of each training and validation dataset. Apparently, the chance of the trial-and-error method or experience-based parameter setting to successfully determine the optimal configuration of MOMEM is limited.
As exhibited in Fig. 6, there is a change of the member model's role in the MOMEM which is identified by the values of α and β found in a range of [0 0.997]. Especially, the RBFNN absolutely dominated the MARS in the construction of MOMEM in the 16th run when α and β are found as 0.997 and 0, respectively. Inversely, the RBFNN was discarded in the final MOMEM in the 12th run and the 17th run. The values of dynamic weights assigned to be 0 dedicated that the corresponding constituent model (RBFNN or MARS) was recognized to impair the accuracy of the MOMEM model; thus, the final prediction values of the MOMEM were produced by the only one of two constituent models (RBFNN or MARS). Accordingly, the MOMEM becomes a pure hybrid of the AEFA with the RBFNN or with the MARS, abbreviated as AEFA-RBFNN and AEFA-MARS, respectively.
It is worth noticing that found values of parameters lie in the middle of pre-defined ranges, indicating the boundary of each range was sufficient to cover all the potential values of parameters. Figure 7 exhibits the convergence curves of the particular run, which demonstrated the AEFA to have a rapid convergence. In detail, the optimal values were able to be found at approximately iteration 35 out of 50, which affirmed that the fittest values are always determined before the stopping condition is met.

Result comparison and discussion
It is necessary to compare the performance of the developed models against that of other AI-based inference models so that the role of techniques in MOMEM is clearly clarified. RBFNN, MARS, generalized regression neural network (GRNN), and support vector regression with Gaussian function (kernel function) (SVR gau ) are selected because they present a different type of machine learner and highly perceived in solving engineering-related issues in the literature. AEFA-MARS and AEFA-RBFNN, and traditional MARS + RBFNN ensemble models are selected in the comparison to quantify the contribution of each technique integrated into MOMEM. All models were run on MATLAB environment 2018a [63] with the trial-and-error method to find the potentially suitable configuration. The interface platforms are presented in Fig. 8.
The statistical results shown in Table 5 have demonstrated the proposed model, MOMEM, as the greatest model by achieving the greatest values in terms of RMSE (0.035 kg/ cm 2 ), MAPE (7.94%), MAE (0.026 kg/cm 2 ). Those values are much lower than that of the second-best model, the    A cause-and-effect relationship between the soil's shear strength and its attributes will lead to a high value of R 2 , meaning that the MOMEM mapped a closer underlying function of soil's shear strength by gaining the greatest value of R 2 (0.864 and 0.826 for training and testing phase, respectively). In other words, 82.6% of data patterns can be inferred by MOMEM-mapped function while it is only 77.7% for the second-best model, the AEFA-RBFNN. Figure 9 presents the actual-versus-prediction value of all data patterns in the testing phase. Generally, the obtained results have firmly proved a possibility of boosting the accuracy of soil-shear-strength estimate by integrating the AEFA in the fusion of the MARS and the RBFNN.

One-tail t test method for examining the mean difference
This study further conducted a one-tailed t test to test for the significant differences of the MOMEM's performance in estimating shear strength of soil against that of other AI models. The one-tailed t test was calculated on the root mean absolute error (RMSE) values with an equal size of 20 samples (run times) and unknown variances. It is worth noting that the RMSE sample is assumed to have a normal distribution. The procedure for implementing one-tailed t test is as follows: where n is the number of samples (n = 20); ν is the degree of freedom; s 1 2 and s 2 2 are the unbiased estimators of the variances of the two samples; the denominator of t is the standard error of the difference between two means x 1 and x 2 (average). Calculated results with a confidence level of 95% (α = 0.05) are presented in Table 6. For all cases, t statistic < − t-critical one-tailed (− 1.69), indicating MOMEM significantly outperformed other models in reducing RMSE values of soil-shear-strength. This conclusion may be manifested in Fig. 10. In conjunction with a robust performance demonstrated as analyzed above, the MOMEM has no request for setting up its configuration in the soil-shearstrength estimate process. In general, Analysis results support the MOMEM as the best choice for civil engineers; this model is able to yield reliable estimate values of soil's shear strength.
In order to achieve efficiency, the construction of the MOMEM using the hybrid of several AI techniques inevitably increases the complexity of the model structure. Thus, the MOMEM needs more computation time than other models in the model construction process. Considering the many obtained benefits, including significantly improved prediction accuracy of soil-shear-strength and user-friendliness, the additional computation time is justified and acceptable.

Concluding remarks
The shear strength is the maximum resistance or stress that a particular soil can offer against failure over its improper surface loading. The shear strength of soils is essential for any stability analysis. Therefore, it is crucial to determine the reliable values of soil-shear-strength. A novel meta-learner was created based on the framework, called Metaheuristic-optimized meta-ensemble learning model (MOMEM) which was a combination of the RBFNN, the MARS, and (27) Degree of freedom: v = (n − 1)(s 2 1 + s 2 2 ) 2 ∕(s 4 1 + s 4 2 ) the AEFA search engine. In the MOMEM, AEFA applies its optimization procedure to organize suitable configuration of the inference model member, The findings demonstrated that the AEFA searching engine can autonomously determine the best set of parameter values to adaptively exploit features of each training dataset. The AEFA is able to recognize and discard a model member that damages the accuracy of the MEMOM. At that time, the MEMOM becomes a purely single hybrid model of a metaheuristic algorithm and a machine learner. In summary, the AEFA plays an integral role in selecting the merits of RBFNN and MARS to make them the best fit in the MOMEM.
This study is the first to put forward a novel stacking techniques-based ensemble model of hybridizing MARS and RBFNN that represent different types of learners in the field of machine learning for sharing reciprocal merits in the meta-ensemble model. Additionally, the present study lifted the operation of ensemble model to a higher level by integrating a robust metaheuristic optimization algorithm, AEFA, to ascertain automatically attaining the maximum Despite possessing many advantages, the MOMEM also has two weaknesses: First, the MOMEM performs the inference process as a black box since the mapped functions of soil-shear-strength is not visually formulated. Second, the model construction needs a long time due to the optimization process and two inference models involved in the trained process. However, the computational time problem can be sharply shortened by using modern high-speed computers. Further, once the optimal model is found, it can quickly infer the outcome of new data patterns and saved for use in a long time. The saved model should be updated with new data points collected which is also a work for authors to implement in the future.
Funding Open Access funding provided by University Of South-Eastern Norway.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.