Predicting critical shear stress using multivariate adaptive regression splines and genetic expression programming for cohesive soils on the Island of Oahu, Hawaii

Rahimnejad, Reza; Vosoughifar, Hamid Reza; Bateni, Sayed M.; Ooi, Phillip S. K.; Rezaie, Fatemeh

doi:10.1007/s42452-024-05965-4


Research
Open access
Published: 11 June 2024

Volume 6, article number 321, (2024)
Discover Applied Sciences

Reza Rahimnejad¹,
Hamid Reza Vosoughifar²,
Sayed M. Bateni³,
Phillip S. K. Ooi⁴ &
…
Fatemeh Rezaie³


Abstract

Thirty-one undisturbed cohesive silts with plasticity indices ranging from 3 to 55% were tested in an erosion function apparatus to obtain their erodibility curves. The critical shear stress (${\tau }_{cr}$) was estimated by fitting a hyperbolic function to the erodibility curves. Tests were also conducted to obtain the index properties of each sample. Existing regression-based approaches cannot accurately capture the complex and highly nonlinear relationship between ${\tau }_{cr}$ and other common parameters of cohesive soils. Hence, two robust approaches, namely multivariate adaptive regression splines (MARS) and genetic expression programming (GEP), are utilized to estimate ${\tau }_{cr}$ from easily measurable index soil properties. These soil properties are selected based on a literature review and correlation analysis between ${\tau }_{cr}$ and other parameters [e.g., water content, plastic limit, liquid limit, plasticity index, liquidity index, activity, median grain size, percent fines (particles smaller than 0.075 mm), percent clay (particles smaller than 2 μm), undrained shear strength, compression and recompression indices, soil unit weight, consolidation pressure, pre-consolidation pressure and void ratio]. Three statistical metrics, namely coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE), were used to evaluate the performance of the models. Results indicate that the MARS approach outperformed GEP for two reasons: (1) estimates from MARS (R² = 0.992, MAE = 0.483 N/m², and RMSE = 0.641 N/m²) were better than those from GEP (R² = 0.906, MAE = 1.445 N/m² and RMSE = 1.686 N/m²); and (2) the MARS approach was able to detect changes in the rate of variation of ${\tau }_{cr}$ (i.e., points where the trend showed a change in slope) when samples from different locations were compared. A sensitivity analysis was also performed to investigate the importance of each selected model parameter on ${\tau }_{cr}$.

Article Highlights

1.
Critical shear stress of cohesive soils is related to common soil parameters with a high degree of accuracy.
2.
MARS outperforms GEP in estimating critical shear stress in cohesive soils.
3.
The study provides insights into the complex relationship between soil properties and critical shear stress.


1 Introduction

Critical shear stress (${\tau }_{cr}$) is defined as the shear stress threshold below which soil scour does not happen. The accurate estimation of local scour depth at the bridge piers is vital for safe and economical design of bridges [1]. Underestimation of scour depth can cause costly bridge failures, while its overprediction increases the cost of construction [2]. Currently, soil scour depth in rivers and streams is estimated using the Federal Highway Administration (FHWA) Hydraulic Engineering Circular-18 (HEC-18) [3], which has been found to be overconservative when predicting scour depth in cohesive stream beds. This is because the HEC-18 equation was developed based on experiments on cohesionless soils where the median grain size diameter (D₅₀) is the only soil parameter used for scour prediction. Unlike cohesionless soils, scour of cohesive soils occurs more slowly since cohesive soils in general are more scour resistant primarily due to the physico-chemical forces that exist between particles. Underprediction of ${\tau }_{cr}$ will result in a greater scour depth prediction and can unnecessarily increase a bridge foundation cost.

Many studies have examined the effect of a soil’s chemical and physical properties on ${\tau }_{cr}$. Mitchener and Torfs [4], Owen [5], and Thorn and Parsons [6] reported that ${\tau }_{cr}$ is correlated with the dry density of soil. Lal [7] showed that ${\tau }_{cr}$ increases with the activity (A). Ivarson [8] formulated ${\tau }_{cr}$ as a function of the unconfined compressive strength of clay and the mean velocity of flow.

Briaud et al. [9] found no relationship between ${\tau }_{cr}$ and other soil parameters and recommended the erosion function apparatus (EFA) test as a means of determining the ${\tau }_{cr}$. Julian and Torres [10] showed that the fraction of clay and silt in cohesive soils affects ${\tau }_{cr}$. Straub and Over [11] found a relationship between ${\tau }_{cr}$ and unconfined compressive strength. Kimiaghalam et al. [12] investigated the effect of different physical and chemical soil parameters on ${\tau }_{cr}$. They found a strong correlation between soil cohesion from the direct shear test and ${\tau }_{cr}$. Because cohesive soils are more scour resistant due to the physico-chemical forces that exist between particles, Rahimnejad and Ooi [13] developed an empirical regression-based equation to estimate ${\tau }_{cr}$ from the soil’s plasticity index (PI), liquid limit (LL), A, and water content (w). Wilson et al. [14] investigated variations in ${\tau }_{cr}$ in response to changes in saturated hydraulic conductivity, soil bulk density, w, soil penetration strength and surface shear strength. Li et al. [15] studied the mathematical correlation between ${\tau }_{cr}$ and varying antecedent soil moisture conditions in soils with different textures. Yagisawa et al. [16] conducted a large-scale levee erosion experiment to develop an equation to predict ${\tau }_{cr}$ based on three soil parameters: fine content, dry density, and moisture content. They found an error of ± 30% when comparing the direct measurements of ${\tau }_{cr}$ with values estimated from the empirical equations. Briaud et al. [17] developed models to quantify ${\tau }_{cr}$ on the basis of soil properties. They used indicators such as the coefficient of determination (R²) and the cross-validation score to determine the robustness of the models. The models all had D₅₀ in common. Other soil parameters included unit weight, w, undrained shear strength, percent fines, activity, soil uniformity, and plasticity.

Predicting ${\tau }_{cr}$ from soil parameters has been a challenge for researchers. The relationships vary among researchers mainly because they worked on soils from different regions, did not perform the same tests on the soils, or simply used different types of regression or equations. The literature review suggests that traditional regression-based techniques struggle to capture the highly nonlinear and complex relationship between ${\tau }_{cr}$ and common soil parameters accurately. This motivates the present study, which uses machine learning methods, namely multivariate adaptive regression splines (MARS) and gene expression programming (GEP), to estimate the ${\tau }_{cr}$ of cohesive soils.

MARS is a non-parametric model that was first introduced by Friedman [18]. GEP is a population based evolutionary optimization technique that was proposed by Ferreira [19, 20]. Emamgolizadeh et al. [21] developed a GEP- and MARS-based model to estimate the soil cation exchange capacity from percent clay, percent organic matter and pH. They found that GEP can accurately predict soil cation exchange capacity in terms of percent clay, percent organic matter and pH. Haghiabi [22] applied MARS to predict the scour depth around pipelines. Zhang et al. [23] used MARS to estimate the soil-wall relative stiffness in braced excavations. Yu [24] modeled shear strength of clayey soil using MARS based on five variables named sleeve friction, plastic limit (PL), cone tip resistance, overburden weight and LL. Abed et al. [25] estimated soil compaction parameters (i.e., maximum dry density and optimum w) by MARS. Goharzay et al. [26] utilized GEP to evaluate the probability of occurrence of soil liquefaction. Nawaz et al. [27] proposed a GEP-based prediction model for evaluation of soil’s PL based on soil properties. The findings indicated that the PL was influenced by clay and silt content along with sand particles. Using GEP, Jalal et al. [28] estimated the swelling pressure of expansive soils.

To the best of our knowledge, no other study has used MARS and GEP to estimate ${\tau }_{cr}$. The objectives of this study are to (1) use robust MARS and GEP to accurately estimate the ${\tau }_{cr}$ of cohesive soils from index soil properties such as LL, PI, liquidity index (LI), A, and D₅₀, (2) implement a sensitivity analysis to determine the relative importance of each input variable on the ${\tau }_{cr}$, and (3) compare the results of MARS and GEP with those of Rahimnejad and Ooi [35]. MARS and GEP have several advantages: (1) they are able to capture the complex relationship between the ${\tau }_{cr}$ and soil characteristics more robustly than the traditional regression methods, (2) they do not need a pre-defined specific function to describe the relationship between inputs and output(s), (3) in contrast to “black-box” models, such as deep learning models that offer limited insights into the underlying relationships in the data, MARS and GEP can provide an explicit expression and show how each variable contributes to the model’s predictions, and (4) they are robust against outliers and noise in the data, and are renowned for their simplicity. These methods can offer accurate predictions with fewer parameters and a more intuitive structure.

This paper is organized as follows. Section 2 describes the source of data, methodology, evaluation metrics, and parameter selection. Section 3 shows the results of MARS, GEP, and sensitivity analysis. Finally, conclusions are given in Sect. 4.

2 Methodology

2.1 Data

31 Shelby tube samples were collected from the vicinity of 5 water channels on the Island of Oahu, Hawaii (Fig. 1). The five locations selected include Waiahole Stream Bridge, Honouliuli Stream Bridge at West Loch Golf Course, the upstream most bridge at Keehi interchange on Moanalua Stream, Pier 26 of the Honolulu High-Capacity Transit Corridor Project founded in the Kaloi Drainage Channel along North–South Road and Halawa Stream Bridge on Kamehameha Highway. Laboratory tests were conducted on each sample to obtain the erodibility curve from an EFA (which yields the ${\tau }_{cr}$) as well as common soil parameters such as w, Atterberg limits, grain size distribution, undrained shear strength (S_u) from unconsolidated undrained triaxial tests, and consolidation test parameters. Results of laboratory tests for all samples are summarized in Table 1.

Table 1 Soil samples and parameters


Borings were drilled to obtain Shelby tube samples in the softer soils and Pitcher samples in the stiffer soils. Sampling was continuous, alternating between Shelby tubes and split-spoon samplers over the depth of each boring. Because drilling in the vicinity of the various bridge piers within the streams was costly, the borings were drilled from land near the stream banks.

2.2 Methods

2.2.1 MARS

MARS is a non-parametric modeling technique that can be used to detect underlying nonlinear patterns that are concealed in a complex data set. MARS combines spline regression, recursive partitioning and stepwise model fitting [29].

MARS divides the population into smaller sections by finding hinge points (i.e., knots), which are points at which the slope of the trend line (spline) changes sign or magnitude. These piecewise linear functions are called basis functions (BFs). The number of BFs in a model depends on the degree of nonlinearity of the data. In general, MARS uses a forward pass to generate several knots at random locations of the data set and defines a pair of BFs at these knots. MARS moves the knots until the BFs result in the smallest RMSE, and this process continues until the maximum number of BFs is reached. At this stage, the model may be overfitted, and thus MARS uses a backward pass to eliminate the unnecessary BFs, which do not have a significant effect in reducing the residual error.

Generally, a BF at a random knot, t, can be written as:

$$ \max \left( {0,x - t} \right) = \begin{cases} x - t, & x \ge t \\ 0, & \text{otherwise} \end{cases} $$

(1)

The MARS model is written as a linear combination of BFs and their interactions:

$$ y = \beta _{0} + \sum\nolimits_{{i = 1}}^{M} {\beta _{i} B_{i} \left( X \right)} $$

(2)

where ${\beta }_{i}$ is the regression coefficient estimated by the least squares method, B_i(X) is a basis function or a product of two or more basis functions, and M is the number of terms in the final model, which is found in a forward–backward stepwise process.
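As a rough illustration of Eqs. 1 and 2, the following Python sketch fits a one-hinge model y = β₀ + β₁·max(0, x − t) and searches for the knot t that gives the smallest RMSE. It is a simplified stand-in for the forward pass, not the implementation used in this study: the function names (`hinge`, `fit_line`, `best_knot`), the choice of observed x values as candidate knots, and the synthetic data are all assumptions made for illustration. A full MARS forward pass would add mirrored pairs of hinges and interaction terms.

```python
import math

def hinge(x, t):
    """Basis function of Eq. 1: max(0, x - t)."""
    return max(0.0, x - t)

def fit_line(u, y):
    """Closed-form least squares for y = b0 + b1 * u."""
    n = len(u)
    mean_u, mean_y = sum(u) / n, sum(y) / n
    sxx = sum((ui - mean_u) ** 2 for ui in u)
    if sxx == 0.0:  # constant feature: fall back to the mean of y
        return mean_y, 0.0
    b1 = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y)) / sxx
    return mean_y - b1 * mean_u, b1

def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def best_knot(x, y):
    """Try each observed x as a knot (an assumption) and keep the lowest RMSE."""
    best = None
    for t in sorted(set(x)):
        u = [hinge(xi, t) for xi in x]
        b0, b1 = fit_line(u, y)
        err = rmse(y, [b0 + b1 * ui for ui in u])
        if best is None or err < best[0]:
            best = (err, t, b0, b1)
    return best

# Synthetic data (not from the paper): flat at 2 up to x = 3, then slope 1.5
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 + 1.5 * hinge(xi, 3.0) for xi in x]
err, knot, b0, b1 = best_knot(x, y)
print(f"knot={knot}, b0={b0:.3f}, b1={b1:.3f}, rmse={err:.3g}")
```

On this synthetic series the search recovers the knot at x = 3 because only that hinge reproduces the change in slope exactly.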

In the backward process, the basis functions are selectively removed one by one from the model generated in the forward pass. This process results in J models with J-1, J-2, …, 1, 0 BFs. Finally, the backward pass selects the model with the lowest generalized cross validation (GCV) value as the best model [30]. GCV for the jth model is given by

$$ GCV_{j} = \frac{SSE_{j}}{\left( 1 - \nu m_{j}/n \right)^{2}} $$

(3)

where SSE_j is the sum of squared residuals from a population with n samples and m_j is the number of basis functions for the jth model in the backward step. The smoothing parameter (ν) is typically selected between 2 and 4 [21]. A lower GCV indicates a model with better performance.
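The backward-pass selection by Eq. 3 can be sketched in a few lines of Python. The SSE values and basis-function counts below are hypothetical numbers chosen only to show how the penalty trades residual error against model size; they are not results from this study, and `gcv` is an illustrative helper name.

```python
def gcv(sse, m, n, nu=3.0):
    """Eq. 3: GCV_j = SSE_j / (1 - nu * m_j / n)^2."""
    denom = 1.0 - nu * m / n
    if denom <= 0.0:
        raise ValueError("nu * m / n must be smaller than 1")
    return sse / denom ** 2

# Hypothetical (SSE, number of basis functions) pairs from a backward pass
candidates = [(40.0, 0), (12.0, 1), (6.0, 2), (5.5, 3), (5.4, 4)]
n = 21  # size of the training set used in this study

scores = [(gcv(sse, m, n, nu=2.0), m) for sse, m in candidates]
best_gcv, best_m = min(scores)
print(f"selected model has {best_m} basis functions (GCV = {best_gcv:.3f})")
```

With these numbers, the models with three and four basis functions have a lower SSE than the two-term model, yet the penalty in the denominator makes their GCV higher, so the two-term model is selected.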

2.2.2 GEP

Ferreira [19, 20] combined the genetic algorithm and genetic programming approaches to develop GEP. It captures the nonlinearity and complexity of problems by expression trees (ETs) with different shape and size.

GEP uses fixed length strings (genomes or chromosomes). GEP modifies chromosomes by operators such as mutation, inversion, transposition, crossover/recombination and gene crossover [19]. The chromosome consists of one or more genes of equal length. Each gene consists of two parts: 1) head which is composed of terminals (variables and constants) and functions (e.g., + , −, × , /, sqrt), and 2) tail composed of only terminals.

The mutation operator is used in GEP to modify the characteristics of the initial population of chromosomes. In the head of a gene, functions or terminals can change into one another, while in the tail of a gene, terminals can only change into other terminals, which allows the structural organization of the chromosome to remain intact. The inversion operator acts only on the heads of the genes and helps to generate new individuals through modification and random selection. During transposition, the transposable elements are activated and migrate to another place in the chromosome. GEP performs crossover and recombination in three ways: one-point, two-point, and gene recombination. In one-point recombination, one point is chosen and the pair of parent chromosomes exchange their material at this point to generate offspring. In the two-point method, two points are selected as crossover points where parent chromosomes exchange material to produce offspring. In gene recombination, entire genes are exchanged between two parent chromosomes, resulting in two offspring chromosomes that contain genes from both parents.
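The head/tail constraint on mutation and the one-point recombination described above can be illustrated with a short Python sketch. This is a minimal illustration, not the GEP software used in this study: the reduced function set, the mutation rate, and the helper names are assumptions. The tail length follows Ferreira's rule t = h(n − 1) + 1, where h is the head length and n is the largest number of arguments taken by any function.

```python
import random

FUNCTIONS = ["+", "-", "*", "/"]            # reduced function set (assumption)
TERMINALS = ["LI", "LL", "PI", "A", "D50"]  # input variables of this study
HEAD = 7                                    # head length
MAX_ARITY = 2                               # all functions above are binary
TAIL = HEAD * (MAX_ARITY - 1) + 1           # Ferreira's tail-length rule

def random_gene(rng):
    """Head may hold functions or terminals; tail holds terminals only."""
    head = [rng.choice(FUNCTIONS + TERMINALS) for _ in range(HEAD)]
    tail = [rng.choice(TERMINALS) for _ in range(TAIL)]
    return head + tail

def mutate(gene, rng, rate=0.1):
    """Point mutation that keeps the head/tail organization intact."""
    mutated = []
    for position, symbol in enumerate(gene):
        if rng.random() < rate:
            pool = FUNCTIONS + TERMINALS if position < HEAD else TERMINALS
            symbol = rng.choice(pool)
        mutated.append(symbol)
    return mutated

def one_point_recombination(parent_a, parent_b, rng):
    """Swap everything after a random cut point between two parents."""
    cut = rng.randrange(1, len(parent_a))
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

rng = random.Random(0)
parent_a, parent_b = random_gene(rng), random_gene(rng)
child_a, child_b = one_point_recombination(parent_a, parent_b, rng)
print("".join(f"[{s}]" for s in mutate(child_a, rng)))
```

Because both operators preserve gene length and never place a function in the tail, every offspring still decodes to a valid expression tree.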

Generally, after GEP randomly generates chromosomes from the population, five steps are implemented to build GEP-based models:

1.
Fitness function selection: In this paper, the root relative squared error (RRSE) fitness function of program i is used for individual selection:
$$RRSE_{i}=\sqrt{\frac{{\sum }_{j=1}^{n}{\left({P}_{ij}-{T}_{j}\right)}^{2}}{{\sum }_{j=1}^{n}{\left({T}_{j}-\overline{T}\right)}^{2}}}$$
(4)
where ${P}_{ij}$ is the value predicted by an individual chromosome $i$ for fitness case j, ${T}_{j}$ is the target value for fitness case j and $\overline{T}$ is the average of ${T}_{j}$. Knowing the RRSE for each individual program, the fitness of the individual chromosome i (f_i) is:
$${f}_{i}=1000\times \frac{1}{1+{RRSE}_{i}}$$
(5)
where f_i ranges from 0 to 1000. A higher f_i corresponds to better chromosome fitness [31].
2.
In this stage, the genes of each chromosome are generated using a set of terminals (T) and functions (F). For this study, the terminal set consists of the independent variables that are used in the MARS method (i.e., LI, LL, PI, A and D₅₀). Arithmetic, trigonometric, and other elementary functions such as + , −, × , /, √, Sin, Tan, Arctan, Exp, and power form the function set in this study.

3.
The length of the head and the number of genes are chosen in this step. The number of expression trees is determined by the number of genes per chromosome. The success rate of GEP increases as the number of genes increases from 1 to 3 [20]. In this study, 3 genes per chromosome are used. It should be noted that the performance of the GEP model did not improve when the number of genes and the head length exceeded 3 and 7, respectively. The best individuals were obtained when the number of chromosomes was set to 30.
4.
The genetic operators and their rates are selected. Genetic operators such as mutation, inversion, transposition, and recombination are used for this study.

5.
Finally, the sub-functions are linked using math operators (i.e., + , −, × , /). It has been reported that the addition operator ( +) gives better results than the other operators [21].
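The fitness evaluation in step 1 (Eqs. 4 and 5) can be sketched in Python as follows. The target and prediction values are made-up numbers used only to show the two limiting cases, and the function names are illustrative.

```python
import math

def rrse(predicted, target):
    """Eq. 4: root relative squared error of one candidate program."""
    mean_target = sum(target) / len(target)
    numerator = sum((p - t) ** 2 for p, t in zip(predicted, target))
    denominator = sum((t - mean_target) ** 2 for t in target)
    return math.sqrt(numerator / denominator)

def fitness(predicted, target):
    """Eq. 5: fitness grows towards 1000 as RRSE approaches zero."""
    return 1000.0 / (1.0 + rrse(predicted, target))

target = [1.0, 2.0, 3.0, 4.0]        # made-up target values
perfect = list(target)                # a program that reproduces the targets
mean_only = [2.5, 2.5, 2.5, 2.5]      # a program that always returns the mean

print(fitness(perfect, target))       # RRSE = 0
print(fitness(mean_only, target))     # RRSE = 1
```

A program that reproduces every target reaches the maximum fitness of 1000, while a program that only predicts the mean of the targets has RRSE = 1 and therefore a fitness of 500.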

2.3 Performance evaluation of the models

Three statistical metrics, namely coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE) are used to evaluate the accuracy of the MARS and GEP models:

$$ R^{2} = \left[ \frac{\sum\nolimits_{i = 1}^{n} \left( y_{i} - \overline{y} \right)\left( \hat{y}_{i} - \overline{\hat{y}} \right)}{\sqrt{\sum\nolimits_{i = 1}^{n} \left( y_{i} - \overline{y} \right)^{2} \sum\nolimits_{i = 1}^{n} \left( \hat{y}_{i} - \overline{\hat{y}} \right)^{2}}} \right]^{2} $$

(6)

$$ MAE = \frac{1}{n}\sum\nolimits_{i = 1}^{n} \left| y_{i} - \hat{y}_{i} \right| $$

(7)

$$ RMSE = \sqrt{\frac{1}{n}\sum\nolimits_{i = 1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}} $$

(8)

where $y_{i}$ and $\hat{y}_{i}$ are the ith measured and predicted ${\tau }_{cr}$, respectively, n denotes the number of samples, and $\overline{y}$ and $\overline{\hat{y}}$ represent the means of the measured and predicted values, respectively.
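For readers who wish to reproduce the three metrics, a minimal Python sketch is given below. It assumes that R² is meant as the squared Pearson correlation between measured and predicted values; the numbers in the example are made up and are not data from this study.

```python
import math

def r_squared(y, y_hat):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(y)
    mean_y, mean_p = sum(y) / n, sum(y_hat) / n
    cov = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, y_hat))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_p = sum((b - mean_p) ** 2 for b in y_hat)
    return cov * cov / (var_y * var_p)

def mae(y, y_hat):
    """Mean absolute error (Eq. 7)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean square error (Eq. 8)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

measured = [1.0, 2.0, 3.0, 4.0]    # made-up values in N/m^2
predicted = [1.5, 2.0, 2.5, 4.0]

print(r_squared(measured, predicted), mae(measured, predicted), rmse(measured, predicted))
```

For any set of predictions RMSE is at least as large as MAE, which is a quick consistency check on reported values.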

2.4 Parameter selection

Shear strength in fine-grained soils is in part the result of physico-chemical forces between clay particles. These forces govern the spacing, arrangements, and orientation of the particles. In addition, external forces such as overburden pressure can cause interparticle movements that result in a decrease in interparticle spacing and an increase in interparticle contacts which in turn leads to an increase in soil shear strength [32]. The literature also suggests that the size of fine particles has a direct impact on the shear strength of the soil [33]. Therefore, the shear strength in soil is dependent on the physico-chemical forces, particle size and stress history of the soil. From a soil mechanics standpoint, the physico-chemical forces can conceivably be related to a soil’s PI, while the stress history can be represented by its w, void ratio (e), unit weight, and LI. D₅₀ is used in this study to account for the particles size effect.

Figure 2 shows the correlation between ${\tau }_{cr}$ and LL (%), PI (%), LI (%), A, and D₅₀ with coefficients of determination of 0.086, 0.132, 0.078, 0.194, and 0.293, respectively. A wide range of PI suggests a wide range of soils with different external energy needed to expand the double layer far enough to detach particles from the soil surfaces [29]. PI itself is related to LL and PL since PI = LL − PL. For a given soil, an increase in LL will result in an increase in PI, which increases the ${\tau }_{cr}$ [34].

A is defined as the ratio of PI to the clay fraction of the soil. Thus, an increasing PI will result in increased A and ${\tau }_{cr}$ [7, 33]. A smaller grain size leads to a larger specific surface area, and hence less inter-particle attractive forces among particles, ultimately reducing the ${\tau }_{cr}$.

LI is a parameter that captures the effect of soil stress history, and is related to the soil moisture content, PL and PI as follows:

$$LI=\frac{w-PL}{PI}$$

(9)

A negative LI in a cohesive soil indicates a low w and a drier, stiffer soil that is most likely over-consolidated. For a given soil, as w increases (i.e., LI increases), the soil becomes more normally consolidated and more erodible.
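A short Python sketch of Eq. 9 makes the sign convention concrete. The water content and Atterberg limits used below are hypothetical values, not samples from Table 1.

```python
def liquidity_index(w, plastic_limit, liquid_limit):
    """Eq. 9: LI = (w - PL) / PI, with PI = LL - PL (all in percent)."""
    plasticity_index = liquid_limit - plastic_limit
    if plasticity_index <= 0:
        raise ValueError("LL must exceed PL for a plastic soil")
    return (w - plastic_limit) / plasticity_index

# Hypothetical soil with PL = 30% and LL = 60%, so PI = 30%
print(liquidity_index(25.0, 30.0, 60.0))  # w below PL gives a negative LI
print(liquidity_index(45.0, 30.0, 60.0))  # w midway between PL and LL
print(liquidity_index(60.0, 30.0, 60.0))  # w at the liquid limit
```

The first case corresponds to the drier, stiffer condition described above, and LI rises towards 1 as the water content approaches the liquid limit.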

The ${\tau }_{cr}$ was also plotted versus soil w, PL, S_u, percent finer than 2 μm (i.e., percent clay), percent finer than 0.075 mm (i.e., percent fines), recompression ratio (RR), compression ratio (CR), e, pre-consolidation pressure (σ_vm'), and overconsolidation ratio (OCR), but no correlation was found (not shown herein). Hence, in this study, only five parameters, namely LL, PI, LI, A, and D₅₀, were used as inputs in the GEP and MARS methods to estimate the ${\tau }_{cr}$. The whole dataset (31 samples) was split into randomly selected training (21 data points) and testing (10 data points, 2 from each location) datasets to train and test the MARS and GEP models (Table 1).
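One way to draw such a split, with two test samples taken at random from each location, is sketched below in Python. The number of samples per location, the short location labels, and the random seed are assumptions made for illustration; only the totals (31 samples, 5 locations, 21 training and 10 testing points) come from the text.

```python
import random

def split_by_location(samples, n_test_per_location=2, seed=0):
    """Randomly hold out a fixed number of samples from every location."""
    rng = random.Random(seed)
    groups = {}
    for sample in samples:
        groups.setdefault(sample["location"], []).append(sample)
    train, test = [], []
    for location in sorted(groups):
        members = groups[location][:]
        rng.shuffle(members)
        test.extend(members[:n_test_per_location])
        train.extend(members[n_test_per_location:])
    return train, test

# Hypothetical sample counts per location (they only need to total 31)
counts = {"Waiahole": 7, "Honouliuli": 6, "Moanalua": 6, "Kaloi": 6, "Halawa": 6}
samples = [
    {"id": f"{location}-{k}", "location": location}
    for location, count in counts.items()
    for k in range(count)
]

train, test = split_by_location(samples)
print(len(train), len(test))
```

Holding out the same number of samples at every site keeps each location represented in the testing set, which matters when the rate of variation of ${\tau }_{cr}$ differs between locations.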

3 Results and discussion

3.1 MARS

There are three key parameters in MARS: the number of basis functions in the forward step (nk), the degree of interaction (deg), and the smoothing parameter ν in GCV. These values were varied from 10 to 30, 1 to 2, and 2 to 4, respectively, to create MARS models, and the MAE, RMSE, and R² of each model were calculated. The best model was selected based on the minimum MAE and RMSE and the maximum R² in both the training and testing phases. The best model had nk, deg and ν of 23, 2, and 2, respectively. The estimated coefficients and basis functions of the final MARS model are summarized in Table 2.

Table 2 Basis functions and coefficients of the MARS model


Figure 3 shows ${\tau }_{cr}$ estimates from MARS versus observations for both training and testing datasets. As shown in Fig. 3, the estimated and measured values of ${\tau }_{cr}$ are closely scattered around the 45° agreement line in both training (with MAE = 0.483 N/m², RMSE = 0.641 N/m², and R² = 0.992) and testing (with MAE = 0.998 N/m², RMSE = 1.611 N/m², and R² = 0.928).

Figure 4 shows the variation of RMSE for MARS and GEP with the number of basis functions and iterations. Based on Fig. 4, the performance of the models improves as the number of basis functions and iterations increases. Once nk reaches 23, adding another basis function does not lead to further model improvement.

As shown in Table 2, A is the only parameter that does not appear in the MARS model individually. The rate of variation of D₅₀ changes at 0.008 mm. A sharp decrease in ${\tau }_{cr}$ is observed for D₅₀ values smaller than 0.008 mm (with β_i = − 4562.5) followed by a slight increase for greater values of D₅₀ (with β_i = 142.59). This observation is in agreement with Fig. 2e. PI appears both individually and interactively (i.e., in combination with other parameters) in the model. The hinge values for PI are 33% and 16%. PI has a negative impact on the ${\tau }_{cr}$ (with coefficients of − 87.4 and − 168.9) when it appears individually in the MARS model. However, through an interaction with LL values smaller than 57.7%, the impact becomes positive and much stronger, with coefficients of 3726.5 and 6608.4. Similarly, LL has a negative impact on ${\tau }_{cr}$ when it appears individually in the MARS model. It is worth noting that LL appeared in ten BFs, implying its significant impact on the ${\tau }_{cr}$. When LI is smaller than 0.7, it shows a positive impact on ${\tau }_{cr}$. In contrast, for higher LI values, the effect resembles the trend shown in Fig. 2c. It can be concluded that for LI values smaller than 0.7, ${\tau }_{cr}$ increases as LI increases. A appears three times in the model. A was the only parameter that could not individually improve the accuracy of the model; however, it did so in combination with other parameters such as LI and LL.

Rahimnejad and Ooi [13] showed a relationship between ${\tau }_{cr}$ and common soil parameters with a coefficient of determination of 0.7. A higher coefficient of determination could not be obtained due to the limited number of regression parameters and the challenge inherent in using soils from multiple sources whereby it is difficult to develop a relationship that accounts for the different rates of variation at each location. From the observations explained above, the MARS approach can capture the variations within the data set.

3.2 GEP

From the 31 samples, 21 randomly selected samples were used for model development and the remaining 10 samples (2 from each location) were used to validate the GEP model. The optimum GEP formulation is obtained by evolving the genes towards the formulation with minimum error with respect to the observed values. Training of GEP was terminated after 100,000 generations, where the RRSE between two subsequent runs was less than 0.01. The final model is given by:

$$ \begin{aligned} \tau _{{cr}} =\, & exp\left\{ {Arctan\left( {(Sin\left( {LI - PI} \right))^{{ - 1}} - tan\left( {A^{4} } \right)} \right)} \right\}^{3} + tan\,tan\left( { - 4.63678 + 2LL + LI} \right) \\ & - \left( {LI - \left( {\frac{{D_{{50}} }}{{LI}}} \right)} \right) + \left( {4.60163 + LL} \right) + tan\left\{ {( - 7.02627 - LI)/LL} \right\}^{3} \\ \end{aligned} $$

(10)

The measured and predicted values of ${\tau }_{cr}$ are compared in Fig. 5. The results indicate that the GEP model performs better in training (MAE = 1.445 N/m², RMSE = 1.686 N/m² and R² = 0.906) than testing (MAE = 1.523 N/m², RMSE = 1.997 N/m² and R² = 0.875).

Figure 6 shows the ETs for Eq. 10. Unlike the MARS model, the GEP model cannot detect the points where the rate of variation changes. PI appears in the first ET and has a positive impact on the ${\tau }_{cr}$. LL appears in the second and third ETs, and ${\tau }_{cr}$ increases as LL increases. LI appears in the first ET, where it has a negative impact on the ${\tau }_{cr}$, and is repeated in all ETs. The overall effect of A and LI on the ${\tau }_{cr}$ cannot be easily detected and requires further sensitivity analysis. D₅₀ shows a negative impact, which agrees with Fig. 2e.

3.3 Sensitivity analysis

The generated equation was used to conduct the sensitivity analysis. This analysis was applied to find the relative importance of each input variable in the ${\tau }_{cr}$. In each test, only one input variable was changed at a fixed rate and the effect of the modified variable on ${\tau }_{cr}$ was determined. Fixed rates of 10%, 20%, 30% and 40% were used in this study. The sensitivity of ${\tau }_{cr}$ to changes in each input variable is calculated by Eq. 11.

$$ \text{Sensitivity of } \tau _{cr}\ (\% ) = \frac{1}{N}\sum\nolimits_{i = 1}^{N} \left( \frac{\% \ \text{change in } \tau _{cr}}{\% \ \text{change in the input variable}} \right) \times 100 $$

(11)

where N (= 31) is the number of samples used in this study. Figure 7 illustrates the sensitivity of ${\tau }_{cr}$ to different input variables at rates of 10%, 20%, 30%, and 40%, respectively.
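The one-at-a-time procedure of Eq. 11 can be sketched in Python as follows. The `toy_model` function and the three samples are hypothetical stand-ins: they are not Eq. 10, the MARS model, or data from Table 1, and serve only to show how the averaged ratio is computed.

```python
def sensitivity(model, samples, variable, rate):
    """Eq. 11: mean ratio of % change in output to % change in one input, times 100."""
    ratios = []
    for sample in samples:
        base = model(sample)
        perturbed = dict(sample)
        perturbed[variable] = sample[variable] * (1.0 + rate)  # change one input only
        pct_change_output = (model(perturbed) - base) / base * 100.0
        pct_change_input = rate * 100.0
        ratios.append(pct_change_output / pct_change_input)
    return sum(ratios) / len(ratios) * 100.0

def toy_model(sample):
    """Hypothetical model: output rises with A and falls with D50."""
    return 3.0 * sample["A"] / sample["D50"]

samples = [
    {"A": 0.5, "D50": 0.004},
    {"A": 1.2, "D50": 0.010},
    {"A": 0.8, "D50": 0.020},
]

for rate in (0.10, 0.20, 0.30, 0.40):
    print(rate, sensitivity(toy_model, samples, "A", rate),
          sensitivity(toy_model, samples, "D50", rate))
```

In this toy case the sensitivity to A is 100% at every rate because the model is linear in A, whereas the sensitivity to D₅₀ is negative and depends on the rate, which is why several rates are examined.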

The results of sensitivity analysis indicate that ${\tau }_{cr}$ is generally more sensitive to variations in A, LI, and D₅₀. Figure 7a–d show that changes in LL, PI, LI, A, and D₅₀ yield respectively an average change of 11.6%, 14.8%, − 17.7%, 25.7%, and − 13.6% in ${\tau }_{cr}$.

3.4 Comparison of results from this study with those of Rahimnejad and Ooi [35]

The MARS and GEP models are compared to the model presented by Rahimnejad and Ooi [35] in Fig. 8 with related information summarized in Table 3. The results show that the MARS model can predict the ${\tau }_{cr}$ more accurately. Again, the trend in error is clear for the predicted values from GEP.

Table 3 Statistical metrics of different approaches to estimate critical shear stress (${\tau }_{cr}$)


4 Conclusion

MARS and GEP are two approaches that have been used in engineering problems to overcome complexity and non-linearity. In this study, these methods are used to predict the ${\tau }_{cr}$ based on common soil properties. Based on correlation analysis and previous studies, five common soil parameters, namely LL, PI, LI, A, D₅₀ are selected for model development. Selected parameters represent both stress history and physico-chemical interparticle forces in a cohesive soil.

Data was collected from five different locations. Previous work on this data showed a coefficient of determination of 0.7 (using classical non-linear regression). This suggests that a more complex statistical approach should be utilized to capture the main sources of variations in samples collected from different locations.

From the data set, 21 samples were picked randomly for the training data and the remainder were used to validate the models (two samples from each location). Both MARS and GEP yield accurate results. However, the testing estimates from MARS (MAE = 0.998 N/m², RMSE = 1.611 N/m² and R² = 0.928) are better than those from GEP (MAE = 1.523 N/m², RMSE = 1.997 N/m² and R² = 0.875). Moreover, MARS was able to detect hinge points. For instance, the impact of LI on ${\tau }_{cr}$ was different for values of LI greater and less than 0.7. Finally, the GEP model yields data points that oscillate sinusoidally about the agreement line (45° line), whereas in the MARS model the data points are more randomly distributed around the 45° line.

Using the testing dataset, the outcomes from the MARS and GEP models are compared to those of Rahimnejad and Ooi [35]. The MAE and RMSE of ${\tau }_{cr}$ estimates from MARS are 0.998 N/m² and 1.611 N/m², which are 74.2% and 70.7% smaller than the MAE and RMSE of 3.867 N/m² and 5.498 N/m² from Rahimnejad and Ooi [35]. The effect of each input variable on the ${\tau }_{cr}$ is investigated by conducting the sensitivity analysis. The results of the sensitivity analysis show that ${\tau }_{cr}$ changes on average by 11.6%, 14.8%, − 17.7%, 25.7%, and − 13.6% as LL, PI, LI, A, and D₅₀ are increased by 10 to 40%, respectively. Hence, A and LL have the highest and lowest effect on ${\tau }_{cr}$, respectively. Future studies should be directed towards using other machine learning methods to estimate ${\tau }_{cr}$ and comparing their performance with that of MARS and GEP.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Eini N, Bateni SM, Jun C, Heggy E, Band SS. Estimation and interpretation of equilibrium scour depth around circular bridge piers by using optimized XGBoost and SHAP. Eng Appl Comput Fluid Mech. 2023. https://doi.org/10.1080/19942060.2023.2244558.
Bateni SM, Borghei SM, Jeng D-S. Neural network and neuro-fuzzy assessments for scour depth around bridge piers. Eng Appl Artif Intell. 2007;20:401–14. https://doi.org/10.1016/j.engappai.2006.06.012.
Richardson EV, Davis SR. Evaluating scour at bridges. Hydraulic Engineering Circular No. 18 (HEC-18), Rep. No. FHWA-NHI 01-001. Washington, D.C. 2001.
Mitchener H, Torfs H. Erosion of mud/sand mixtures. Coast Eng. 1996;29:1–25. https://doi.org/10.1016/S0378-3839(96)00002-6.
Owen MW. Erosion of Avonmouth Mud, Report [Hydraulics Research Station (Great Britain)]. 1975.
Thorn MFC, Parsons JG. Erosion of cohesive sediments in estuaries: an engineering guide. In: Proc Int Symp Dredg Technol. 1980. p. 349–58.
Lal R. Soil erosion and sediment transport research in tropical Africa. Hydrol Sci J. 1985;30:239–56. https://doi.org/10.1080/02626668509490987.
Ivarson WR. Scour and erosion in clay soils. In: Stream Stability and Scour at Highway Bridges: Compendium of Stream Stability and Scour Papers Presented at Conferences Sponsored by the Water Resources Engineering Division of the American Society of Civil Engineers. ASCE; 1998. p. 104–19.
Briaud J-L, Chen H-C, Li Y, Nurtjahyo P, Wang J. Pier and contraction scour in cohesive soils. NCHRP Report 516. 2004.
Julian JP, Torres R. Hydraulic erosion of cohesive riverbanks. Geomorphology. 2006;76:193–206. https://doi.org/10.1016/j.geomorph.2005.11.003.
Straub TD, Over TM. A pier and contraction scour prediction in cohesive soils at selected bridges in Illinois. 2010.
Kimiaghalam N, Clark SP, Ahmari H. An experimental study on the effects of physical, mechanical, and electrochemical properties of natural cohesive soils on critical shear stress and erosion rate. Int J Sediment Res. 2016;31:1–15. https://doi.org/10.1016/j.ijsrc.2015.01.001.
Rahimnejad R, Ooi PSK. Factors affecting critical shear stress of scour of cohesive soil beds. Transp Res Rec J Transp Res Board. 2016;2578:72–80. https://doi.org/10.3141/2578-08.
Wilson GV, Zhang T, Wells RR, Liu B. Consolidation effects on relationships among soil erosion properties and soil physical quality indicators. Soil Tillage Res. 2020;198:104550. https://doi.org/10.1016/j.still.2019.104550.
Li M, Liu Q, Zhang H, Wells RR, Wang L, Geng J. Effects of antecedent soil moisture on rill erodibility and critical shear stress. CATENA. 2022;216:106356. https://doi.org/10.1016/j.catena.2022.106356.
Yagisawa J, van Damme M, Pol JC, Bricker J. Verification of a predictive formula for critical shear stress with large scale levee erosion experiment. In: Proc 11th ICOLD European Club Symp; 2–4 October 2019; Chania, Crete.
Briaud J-L, Shafii I, Chen H-C, Medina-Cetina Z. Relationship between erodibility and properties of soils. Washington DC: Transportation Research Board; 2019.
Friedman JH. Multivariate adaptive regression splines. Ann Stat. 1991. https://doi.org/10.1214/aos/1176347963.
Ferreira C. Gene expression programming in problem solving. In: Roy R, Köppen M, Ovaska S, Furuhashi T, Hoffmann F, editors. Soft computing and industry. London: Springer; 2002. p. 635–53. https://doi.org/10.1007/978-1-4471-0123-9_54.
Ferreira C. Gene expression programming: a new adaptive algorithm for solving problems. 2001.
Emamgolizadeh S, Bateni SM, Shahsavani D, Ashrafi T, Ghorbani H. Estimation of soil cation exchange capacity using genetic expression programming (GEP) and multivariate adaptive regression splines (MARS). J Hydrol. 2015;529:1590–600. https://doi.org/10.1016/j.jhydrol.2015.08.025.
Haghiabi AH. Prediction of river pipeline scour depth using multivariate adaptive regression splines. J Pipeline Syst Eng Pract. 2017. https://doi.org/10.1061/(ASCE)PS.1949-1204.0000248.
Zhang W, Zhang Y, Goh ATC. Multivariate adaptive regression splines for inverse analysis of soil and wall properties in braced excavation. Tunn Undergr Sp Technol. 2017;64:24–33. https://doi.org/10.1016/j.tust.2017.01.009.
Yu D. Developing multivariate adaptive regression splines model for predicting the undrained shear strength of clayey soil from cone penetration test data. Multiscale Multidiscip Model Exp Des. 2022;5:215–24. https://doi.org/10.1007/s41939-021-00113-6.
Abed MS, Kadhim FJ, Almusawi JK, Imran H, Bernardo LFA, Henedy SN. Utilizing multivariate adaptive regression splines (MARS) for precise estimation of soil compaction parameters. Appl Sci. 2023;13:11634. https://doi.org/10.3390/app132111634.
Goharzay M, Noorzad A, Ardakani AM, Jalal M. A worldwide SPT-based soil liquefaction triggering analysis utilizing gene expression programming and Bayesian probabilistic method. J Rock Mech Geotech Eng. 2017;9:683–93. https://doi.org/10.1016/j.jrmge.2017.03.011.
Nawaz MN, Qamar SU, Alshameri B, Nawaz MM, Hassan W, Awan TA. A robust prediction model for evaluation of plastic limit based on sieve # 200 passing material using gene expression programming. PLoS ONE. 2022;17:e0275524. https://doi.org/10.1371/journal.pone.0275524.
Jalal FE, Iqbal M, Ali Khan M, Salami BA, Ullah S, Khan H, et al. Indirect estimation of swelling pressure of expansive soil: GEP versus MEP modelling. Adv Mater Sci Eng. 2023;2023:1–25. https://doi.org/10.1155/2023/1827117.
Storlie CB, Swiler LP, Helton JC, Sallaberry CJ. Implementation and evaluation of nonparametric regression procedures for sensitivity analysis of computationally demanding models. Reliab Eng Syst Saf. 2009;94:1735–63. https://doi.org/10.1016/j.ress.2009.05.007.
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction, 2nd Edn. Springer Science & Business Media; 2009.
Ferreira C. Gene expression programming, vol. 21. Berlin Heidelberg: Springer; 2006. https://doi.org/10.1007/3-540-32849-1.
Ladd CC. Strength parameters and stress-strain behavior of saturated clays. Soil Mechanics Division, Massachusetts Institute of Technology; 1971.
Kouakou NM, Cuisinier O, Masrouri F. Estimation of the shear strength of coarse-grained soils with fine particles. Transp Geotech. 2020;25:100407. https://doi.org/10.1016/j.trgeo.2020.100407.
Kelly WE, Gularte RC. Erosion resistance of cohesive soils. J Hydraul Div. 1981;107:1211–24. https://doi.org/10.1061/JYCEAJ.0005743.
Rahimnejad R, Ooi PSK. Model for the erosion rate curve of cohesive soils. Transp Res Rec J Transp Res Board. 2017;2657:19–28. https://doi.org/10.3141/2657-03.

Acknowledgements

The financial support of the State of Hawaii Department of Transportation (HDOT) in cooperation with the Federal Highway Administration (FHWA) is greatly appreciated and acknowledged. The contents of this paper reflect the view of the authors, who are responsible for the facts and accuracy of the data presented. The contents do not necessarily reflect the official views or policies of HDOT or FHWA. The contents contained herein do not constitute a standard, specification or regulation.

Funding

This study has been made possible by the Hawaii Department of Transportation (HDOT) and Federal Highway Administration (FHWA) grants DOT-10-030 and DOT-08-004 to the University of Hawaii at Manoa.

Author information

Authors and Affiliations

Fugro USA Land, Inc, 1777 Botelho Drive, Suite 262, Walnut Creek, CA, 94621, USA
Reza Rahimnejad
University of Hawaii at Manoa, Holmes Hall 342, 2540 Dole Street, Honolulu, HI, 96822, USA
Hamid Reza Vosoughifar
Department of Civil, Environmental and Construction Engineering and Water Resources Research Center, University of Hawaii at Manoa, Holmes Hall 342, 2540 Dole Street, Honolulu, HI, 96822, USA
Sayed M. Bateni & Fatemeh Rezaie
Department of Civil, Environmental and Construction Engineering, University of Hawaii at Manoa, Holmes Hall 383, 2540 Dole Street, Honolulu, HI, 96822, USA
Phillip S. K. Ooi

Contributions

Reza Rahimnejad: writing (original draft), editing, methodology; Hamid Reza Vosoughifar: review and editing; Sayed M. Bateni: supervision, funding acquisition, project administration; Phillip S.K. Ooi: writing and editing; Fatemeh Rezaie: writing (review and editing).

Corresponding authors

Correspondence to Sayed M. Bateni or Fatemeh Rezaie.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Rahimnejad, R., Vosoughifar, H.R., Bateni, S.M. et al. Predicting critical shear stress using multivariate adaptive regression splines and genetic expression programming for cohesive soils on the Island of Oahu, Hawaii. Discov Appl Sci 6, 321 (2024). https://doi.org/10.1007/s42452-024-05965-4

Received: 06 September 2023
Accepted: 15 May 2024
Published: 11 June 2024
DOI: https://doi.org/10.1007/s42452-024-05965-4
