Introduction

The dynamic shear modulus (G) of soil is a critical parameter in seismic site response analysis and dynamic foundation design. Its value decreases as the shear strain amplitude (\(\gamma _\text{a}\)) increases. When \(\gamma _\text{a}\) is less than \(10^{-6}\), G is referred to as the small-strain shear modulus (\(G_0\)). Many researchers have studied the \(G_0\) of sandy soils using cyclic triaxial tests, bender element tests, and resonant column tests, analyzing the influencing factors and proposing predictive models.

Resonant column experiments have been employed to investigate the factors influencing the \(G_0\) of pure silica sand. The results indicate that \(G_0\) grows with the confining stress (\(\sigma '\)) following a power law, and that at constant \(\sigma '\), \(G_0\) decreases as the void ratio (e) increases1,2,3. Similar findings have been reported by multiple scholars4,5,6,7,8,9. The Hardin model is the most commonly used \(G_0\) prediction model1,3:

$$\begin{aligned} G_0 = Af(e){\left( \frac{\sigma '_0}{P_a}\right) }^n \end{aligned}$$
(1)

where A and n are fitting parameters, f(e) is a function of e, and \(P_\text{a}\) is a reference stress, usually atmospheric pressure (i.e., 100 kPa).
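As a worked illustration (not taken from the cited studies), the short Python sketch below evaluates Eq. (1), assuming Hardin and Richart's round-grained void-ratio function \(f(e) = (2.17-e)^2/(1+e)\) and illustrative values A = 70 and n = 0.5; in practice A and n are fitted to test data for each soil:

```python
def hardin_g0(e, sigma, A=70.0, n=0.5, pa=100.0):
    """Eq. (1): small-strain shear modulus G0 (MPa).

    e     : void ratio
    sigma : effective confining stress, kPa
    A, n  : fitting parameters (illustrative values; fitted per soil in practice)
    pa    : reference (atmospheric) pressure, kPa
    """
    f_e = (2.17 - e)**2 / (1 + e)   # assumed void-ratio function (Hardin & Richart)
    return A * f_e * (sigma / pa)**n

print(hardin_g0(e=0.6, sigma=200.0))  # ~152 MPa for these assumed parameters
```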

Particle size distribution is one of the most important factors influencing the \(G_0\) of sandy soils. Existing research studies often consider the uniformity coefficient (\(C_\text{u}\)) and the median particle size (\(d_\text{50}\)) as characteristic parameters of the particle size distribution when conducting a comprehensive analysis of the effect of particle size distribution on \(G_0\).

The impact of particle gradation on the \(G_0\) of sandy soils has been investigated through resonant column experiments10. The findings reveal that \(G_0\) increases with rising \(C_\text{u}\) and \(d_{50}\) when the relative density (\(D_\text{r}\)) is similar, with \(d_{50}\) exerting a more prominent influence than \(C_\text{u}\); furthermore, in the Hardin model, the parameter n increases with \(C_\text{u}\). In resonant column frequency-sweep experiments on uniformly graded natural sand, \(G_0\) gradually increases with \(d_{50}\) when \(d_{50}\) is less than 1.8 mm11. Experimental investigations using the same methodology were conducted on silica sands with varying gradations12. The results reveal that, at constant e and \(\sigma '\), variation of \(d_{50}\) does not significantly affect \(G_0\), whereas an increase in \(C_\text{u}\) results in a decrease of \(G_0\). Similar results have been observed in resonant column and bender element experiments with uniformly sized glass bead powders of different particle sizes7: at constant e, \(G_0\) decreases with increasing \(d_{50}\), but the decrease can be considered negligible.

The diversity of soil types and experimental methods has led to variations in predictive models for \(G_0\); as a result, predictions for different types of sandy soils often deviate significantly from measured data. Machine learning algorithms can model large amounts of experimental data and adaptively adjust model parameters to different soil characteristics. This allows a more comprehensive consideration of the various influencing factors, ultimately improving the comprehensiveness and accuracy of predictive models. Several researchers have already used machine learning algorithms to build predictive models in this regard.

Probabilistic machine learning has been integrated into predictions of desiccation cracks with uncertain inputs13. Six different machine learning algorithms were analyzed, and the enhanced XGBT model was found to have the greatest predictive capacity. Additionally, multivariate robust regression analysis and exponential smoothing time-series analysis were applied to zeolite data, resulting in a novel prediction process that correlates the influential variables of zeolite–alkali activated sand14. The ANFIS, MLP, and MRM machine learning algorithms were used to develop predictions of \(G_0\) and the minimum damping ratio (\(D_\text{min}\)) of sand containing rubber particles15; the input parameters for these models were \(\sigma '\), fiber content, and rubber content. Neural networks with 3-3-1 and 3-2-1 architectures showed good performance in predicting the damping ratio (D) and G of mica-sand mixtures16. When a Backpropagation Neural Network (BPNN) model was used to predict the G and D of marine clay17, the established model showed good predictive performance at different strains and depths. In addition, a novel genetic expression programming model was used to predict the normalized shear modulus and damping ratio of sandy soils18.

Currently, traditional empirical formulas for predicting \(G_0\) are constrained by the soil types and experimental methods represented in the underlying data, resulting in limited generalization performance. In contrast, machine learning algorithms have demonstrated excellent performance in handling large data sets and predicting soil performance indicators.

In this study, \(G_0\) data from several published papers are used to build a \(G_0\) database for sandy soils. Three machine learning algorithms, BPNN, the Genetic Algorithm-enhanced BP Neural Network (GA-BP), and Extreme Gradient Boosting (XGBoost), are selected for their strong performance in soil-property prediction. The input features for these models include soil grading characteristics, such as \(C_\text{u}\) and \(d_\text{50}\), and state parameters, such as \(\sigma '\) and e. The goal is to obtain accurate predictions of \(G_0\) for different types of sandy soils. A comparative analysis evaluates the predictive performance of the three machine learning models for \(G_0\) in sandy soils, and a comparison with traditional empirical relationship models validates the effectiveness and accuracy of the machine learning models.

Data analysis and computational methods

Data collection and data analysis

In this study, a dataset of 1966 sets of \(G_0\) values was obtained from 13 published studies using various experimental methods, such as resonant column tests, bender element tests, and torsional shear tests6,12,19,20,21,22,23,24,25,26,27,28,29. The dataset covers a wide range of soil types and particle size distributions, including 481 sets of coral sand, 675 sets of silica sand, and 96 sets of sandy gravel obtained using three different sampling techniques.

Based on the factors affecting the \(G_0\) of soil discussed above, \(C_\text{u}\) and \(d_{50}\), which reflect the grading characteristics, e, which reflects the density state, and \({\sigma '}\), which reflects the stress state, are selected as input parameters in this paper. Supplementary Table S1 provides essential information about the experiments conducted in each referenced source. Soil classification was performed according to the ASTM D2487 standard to achieve uniformity. Specifically, poorly graded sands were designated SP, and poorly graded sands containing silt were designated SP-SM. Well-graded sands with silt content were labelled SW-SM. Pure silt was classified as ML. Well-graded gravels were identified as GW, while poorly graded gravels were identified as GP, based on their particle size distribution and characteristics.

The database contains numerous data points spread across many rows and columns, which makes it challenging to comprehend at a glance; descriptive statistics are therefore generated for it. In the present research, the descriptive statistical parameters mean, SE mean, StDev, variance, coefficient of variation, minimum, Q1, median, Q3, maximum, IQR, skewness, and kurtosis have been calculated for the overall, training, and testing databases, as given in Table 130,31,32. Table 1 shows that \(C_\text{u}\), \(d_{50}\), e, \({\sigma '}\), and \(G_0\) in the overall database lie in the ranges 1.2–65.49, 0.02–10 mm, 0.25–5.19, 20–700 kPa, and 12.36–460.79 MPa, respectively. Figure 1 depicts the frequency distribution of the database's \(C_\text{u}\), \(d_{50}\), e, \({\sigma '}\), and \(G_0\) variables. Before the database was used for training and testing, it was preprocessed: missing data and outliers were removed, and the data were normalized with the min–max function \(x_\text{norm} = (x - x_\text{min})/(x_\text{max} - x_\text{min})\), where x is the actual value.
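As an illustration of this preprocessing step, a minimal Python sketch is given below; the example array of confining stresses is hypothetical:

```python
import numpy as np

def min_max_normalize(x):
    """Min-max normalization: scale values to [0, 1] via (x - min)/(max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical example: confining stresses (kPa) spanning the database range
sigma = np.array([20.0, 100.0, 400.0, 700.0])
print(min_max_normalize(sigma))  # [0.     0.1176 0.5588 1.    ]
```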

Table 1 Descriptive statistics of the overall, training, and testing databases.
Figure 1

Frequency distributions of the parameters: (a) \(C_\text{u}\), (b) \(d_{50}\), (c) e, (d) \({\sigma '}\), (e) \(G_0\).

Applied soft computing approaches

Backpropagation neural network (BPNN) model

Several authors have employed the BPNN modelling approach33,34. BPNN is a highly efficient and widely used artificial neural network model consisting of three main layers: the input layer, the hidden layer, and the output layer. Neurons in the input layer receive and pass on the input data, while the output layer produces the results. The neurons in the hidden layer, although not directly visible, play a critical role in processing and transforming the information. Figure 2 shows a schematic of a three-layer BPNN.

The neural network consists of three layers: the input layer, whose neurons are denoted \(a_\text{m}\); the hidden layer, denoted \(b_\text{u}\); and the output layer, denoted \(c_\text{n}\). Let \(w_\text{mu}\) denote the connection weight between the \(m\)th neuron in the input layer and the \(u\)th neuron in the hidden layer, and let \(v_\text{un}\) denote the connection weight between the \(u\)th neuron in the hidden layer and the \(n\)th neuron in the output layer. The expressions for the neurons in the hidden layer and the output layer are then given in Eqs. (2) and (3):

$$\begin{aligned} b_\text{u}= & {} f\left(\sum _\text{m}{w_\text{mu}}{a_\text{m}}+k_\text{u}\right) \end{aligned}$$
(2)
$$\begin{aligned} c_\text{n}= & {} f\left(\sum _\text{u}{v_\text{un}}{b_\text{u}}+p_\text{n}\right) \end{aligned}$$
(3)

where f is the activation function, here the sigmoid function; \(k_\text{u}\) is the threshold of the \(u\)th hidden-layer neuron; and \(p_\text{n}\) is the threshold of the \(n\)th output-layer neuron. After each forward pass, the network output is compared with the desired output; if the mean square error does not meet the predetermined requirements, backpropagation is carried out and the error is returned in the form of a gradient to update the weights and thresholds of each layer. The process is repeated until the mean square error converges.

Figure 2

Three-layer BPNN model structure.
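To make Eqs. (2) and (3) concrete, the following minimal NumPy sketch performs one forward pass of such a three-layer network; the weights are random placeholders rather than trained values, and the 12 hidden neurons mirror the configuration used later in this study:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(a, W, k, V, p):
    """One forward pass of a three-layer network.

    a : input vector (m,)        -- normalized Cu, d50, e, sigma'
    W : input-to-hidden weights (m, u)
    k : hidden-layer thresholds (u,)
    V : hidden-to-output weights (u, n)
    p : output-layer thresholds (n,)
    """
    b = sigmoid(a @ W + k)   # Eq. (2): hidden-layer activations
    c = sigmoid(b @ V + p)   # Eq. (3): output-layer activations
    return c

rng = np.random.default_rng(0)
a = rng.random(4)                                    # four normalized input features
W, k = rng.standard_normal((4, 12)), np.zeros(12)    # 12 hidden neurons, as in this study
V, p = rng.standard_normal((12, 1)), np.zeros(1)
print(forward(a, W, k, V, p))
```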

Genetic algorithm-enhanced backpropagation neural network (GA-BP) model

This approach integrates a genetic algorithm, inspired by the natural selection processes found in the biological world, with the BPNN. The combination turns the selection of weights and thresholds into a global optimisation process, extending the BPNN with the global search capability of genetic algorithms. In addition, the genetic algorithm itself is improved by introducing two mechanisms, "Genetic monitoring" and "Life-death individual alternation", which refine its performance.

The "Genetic monitoring" mechanism determines whether genetic degradation has occurred by comparing the maximum individual fitness values of the \((n+1)\)th and nth generations of a population. Let the maximum individual fitness value of the nth generation be denoted \({Infit_\text{max}}{(n)}\). If the population fitness satisfies Eq. (4), genetic degradation is identified. In such cases, the "Life-death individual alternation" mechanism is activated to increase the diversity of individuals within the population, thereby reducing the probability of genetic degradation. If genetic degradation is not effectively controlled throughout the process, the individual with the highest fitness in the current genetic process is selected as the final outcome.

The formula for calculating the individual fitness \(Infit{(i)}\) is given in Eq. (5).

$$\begin{aligned}{} & {} {Infit_\text{max}}{(n+1)} < {Infit_\text{max}}{(n)} \end{aligned}$$
(4)
$$\begin{aligned}{} & {} \quad Infit{(i)} = 1\Big /\sum _\mathrm {i=1}^{\text{p}}\left( {y_\text{i}}-{o_\text{i}}\right) ^2 \end{aligned}$$
(5)

where p is the number of output neurons, \(y_\text{i}\) is the desired output of the \(i\)th output neuron, and \(o_\text{i}\) is the network output of the \(i\)th output neuron.

The “Life-death individual alternation” mechanism involves identifying individuals in the offspring population whose fitness is lower than the average fitness of that population as “deceased individuals”. These “deceased individuals” are then replaced by an equal number of randomly generated “newborn individuals”. The purpose of this mechanism is to preserve the best performing individuals in the current population, while introducing new individuals to prevent the population from becoming less diverse. It helps the algorithm to break out of the current genetic degradation deadlock and prevents it from getting stuck in a “local optimum”. This mechanism is designed to maintain a balance between preserving promising individuals and introducing new genetic material to increase the genetic diversity of the population.

The GA-BP model uses a genetic algorithm during training, applying selection, crossover, and mutation operations to pass on genetic information. It repeatedly invokes the BP neural network to calculate fitness values for each generation of the population. Throughout this process, the "Genetic monitoring" and "Life-death individual alternation" mechanisms come into play until the genetic termination criteria are met. The individual obtained through genetic evolution is then compared with the individual with the highest fitness value recorded throughout the genetic process; the fitter of the two is decoded and assigned to the BP neural network for local optimisation. This continues until the network's output meets the required error limits and other criteria, at which point the algorithm terminates.
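A compact Python sketch of this loop is given below; it assumes hypothetical helpers `fitness_fn` (evaluating Eq. (5) via the BP network) and `newborn_fn` (generating a random individual), and elides the selection, crossover, and mutation operators:

```python
import numpy as np

def evolve(population, fitness_fn, newborn_fn, generations=50):
    """Sketch of the GA loop with 'Genetic monitoring' (Eq. 4) and
    'Life-death individual alternation'; genetic operators are elided."""
    best = max(population, key=fitness_fn)          # all-time best individual
    prev_max = fitness_fn(best)
    for _ in range(generations):
        # ... selection, crossover and mutation would update `population` here ...
        fits = np.array([fitness_fn(ind) for ind in population])
        if fits.max() < prev_max:                   # Eq. (4): genetic degradation detected
            mean_fit = fits.mean()
            for i, f in enumerate(fits):            # life-death individual alternation:
                if f < mean_fit:                    # below-average individuals "die" and are
                    population[i] = newborn_fn()    # replaced by random "newborns"
        else:
            prev_max = fits.max()
        gen_best = max(population, key=fitness_fn)
        if fitness_fn(gen_best) > fitness_fn(best):
            best = gen_best                         # retain the fittest individual seen so far
    return best
```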

Extreme gradient boosting (XGBoost) model

The Extreme Gradient Boosting algorithm (XGBoost) is an ensemble machine learning algorithm based on decision trees within the gradient boosting framework. It combines several base learners to create a powerful model. XGBoost is an efficient implementation of the Gradient Boosting Decision Trees (GBDT) algorithm and optimises various aspects of GBDT, including the objective function, the optimisation method, the handling of missing values, and the prevention of overfitting35,36.

When XGBoost is running, it first trains a basic learner on the training data set. The results produced by this learner are then adjusted based on the training samples, and the next learner is trained using these adjusted samples. This iterative process continues for several rounds until the number of base learners reaches a predefined value. Finally, all the base learners are combined and the computation proceeds as follows:

(1) Constructing Boosting Models:

$$\begin{aligned} {f({X_\text{i}})} = \sum _\mathrm {k=1}^{\text{K}}{f_\text{k}}({X_\text{i}}),\quad {f_\text{k}} \in F \end{aligned}$$
(6)

where F is the set of all regression trees and K is the total number of base learners.

(2) Training objective function:

$$\begin{aligned} {L^{(\text{t})}} = \sum _\mathrm {i=1}^{\text{n}}l\left( {y_\text{i}},{{\hat{y}}_\text{i}^{(\mathrm {t-1})}}+{f_\text{t}}({X_\text{i}})\right) +\Omega {(f_\text{t})} \end{aligned}$$
(7)

where the first term is the loss function, evaluating each target \(y_\text{i}\) against the prediction \({\hat{y}}_\text{i}^{(\mathrm {t-1})}\) of the previous \(t-1\) rounds plus the new tree \({f_\text{t}}({X_\text{i}})\), and \(\Omega {(f_\text{t})}\) is the sum of all regularisation terms.

(3) The loss function is subjected to a second-order Taylor expansion, and the final objective function is obtained:

$$\begin{aligned} Obj^{(\text{t})} = \sum _\mathrm {j=1}^{\text{T}}\left[ \left( {\sum _\mathrm {i\in {I_\text{j}}}}g_\text{i}\right) {w_\text{j}}+{\frac{1}{2}}\left( {\sum _\mathrm {i\in {I_\text{j}}}}h_\text{i}+\lambda \right) {{w_\text{j}}^\text{2}}\right] +{\gamma }T \end{aligned}$$
(8)
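Here \(g_\text{i}\) and \(h_\text{i}\) are the first- and second-order gradients of the loss with respect to the previous prediction, \(I_\text{j}\) is the set of samples assigned to leaf j, \(w_\text{j}\) is the weight of leaf j, T is the number of leaves, and \(\lambda\) and \(\gamma\) are regularisation parameters. Although the derivation above stops at Eq. (8), setting its derivative with respect to each \(w_\text{j}\) to zero yields the standard closed-form leaf weight used by XGBoost:

$$\begin{aligned} w_\text{j}^{*} = -\frac{\sum _\mathrm {i\in {I_\text{j}}}g_\text{i}}{\sum _\mathrm {i\in {I_\text{j}}}h_\text{i}+\lambda } \end{aligned}$$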

Performance evaluation

The performance of the machine learning models was assessed using several metrics. The mathematical formulation of the performance metrics is as follows37,38,39:

Coefficient of determination (\(R^2\))

$$\begin{aligned} R^2 = 1 - \frac{{\sum \limits _{\mathrm {i = 1}}^{\text{n}} {{{\left( {{y_\text{i}} - {{{\hat{y}}}_\text{i}}} \right) }^2}} }}{{\sum \limits _{\mathrm {i = 1}}^{\text{n}} {{{\left( {{y_\text{i}} - {{{\bar{y}}}_\text{i}}} \right) }^2}} }} \end{aligned}$$
(9)

Absolute relative error (ARE)

$$\begin{aligned} \text{ARE}= \left| {\frac{{\left( {{y_\text{i}} - {{{\hat{y}}}_\text{i}}} \right) }}{{{y_\text{i}}}}} \right| \times 100\% \end{aligned}$$
(10)

Root Mean Square Error (RMSE)

$$\begin{aligned} \text{RMSE} = \sqrt{\frac{1}{\text{n}} \sum \limits _{\mathrm {i = 1}}^{\text{n}} {{\left( {{y_\text{i}} - {{{\hat{y}}}_\text{i}}} \right) }^2}} \end{aligned}$$
(11)

Mean Absolute Error (MAE)

$$\begin{aligned} \text{MAE} = \frac{1}{\text{n}} \times \sum \limits _{i = 1}^n {\left| {{y_\text{i}} - {{{\hat{y}}}_\text{i}}} \right| } \end{aligned}$$
(12)

Variance Accounted For (VAF)

$$\begin{aligned} \text{VAF} = \left[ {1 - \frac{{\text{var}\left( {{y_\text{i}} - {{{\hat{y}}}_\text{i}}} \right) }}{{\text{var}\left( {{y_\text{i}}} \right) }}} \right] \times 100 \end{aligned}$$
(13)

a-20 index

$$\begin{aligned} \mathrm {a-20 index} = \frac{m20}{H} \end{aligned}$$
(14)

Index of Agreement (IOA)

$$\begin{aligned} IOA = 1 - \frac{{\sum \limits _{\mathrm {i = 1}}^{\text{n}} {{{\left| {{{\hat{y}}}_\text{i}}-{{y_\text{i}}} \right| }}} }}{2{\sum \limits _{\mathrm {i = 1}}^{\text{n}} {{{\left| {{y_\text{i}} - {{{\bar{y}}}_\text{i}}} \right| }}} }} \end{aligned}$$
(15)

Index of Scatter (IOS)

$$\begin{aligned} IOS = \frac{\text{RMSE}}{{{\bar{y}}}_\text{i}} \end{aligned}$$
(16)

Performance Index (PI)

$$\begin{aligned} PI = R^2+ (VAF/100)-RMSE \end{aligned}$$
(17)

where \({y_\text{i}}\) is the measured value, \({{\bar{y}}}_\text{i}\) is the mean of the measured values, \({{\hat{y}}}_\text{i}\) is the predicted value of \(y_\text{i}\), m20 is the number of samples whose ratio of experimental to predicted value lies between 0.8 and 1.2, H is the total number of data samples, and n is the number of predicted samples. A perfect predictive model has performance equal to the ideal values given in Table 2.

Table 2 Ideal value of the different performance indicators.
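As a self-contained reference, the Python sketch below computes Eqs. (9)–(17); `y` and `y_hat` are placeholder arrays of measured and predicted \(G_0\) values:

```python
import numpy as np

def evaluate(y, y_hat):
    """Compute the performance metrics of Eqs. (9)-(17); y holds measured G0
    values and y_hat the model predictions (both 1-D arrays)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    resid = y - y_hat
    r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)                 # Eq. (9)
    rmse = np.sqrt(np.mean(resid**2))                                     # Eq. (11)
    mae = np.mean(np.abs(resid))                                          # Eq. (12)
    vaf = (1 - resid.var() / y.var()) * 100                               # Eq. (13)
    ratio = y / y_hat                                                     # experimental / predicted
    a20 = np.mean((ratio >= 0.8) & (ratio <= 1.2))                        # Eq. (14): m20 / H
    ioa = 1 - np.sum(np.abs(resid)) / (2 * np.sum(np.abs(y - y.mean())))  # Eq. (15)
    ios = rmse / y.mean()                                                 # Eq. (16)
    pi = r2 + vaf / 100 - rmse                                            # Eq. (17), RMSE as-is
    return dict(R2=r2, RMSE=rmse, MAE=mae, VAF=vaf, a20=a20, IOA=ioa, IOS=ios, PI=pi)
```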

Sensitivity analysis

Sensitivity analysis identifies variables that will most significantly impact predictions. There are global and local forms of sensitivity analysis. Various techniques are used to conduct sensitivity analysis, including the cosine amplitude technique applied in this study. The mathematical expression for the cosine amplitude method is40,41:

$$\begin{aligned} SS = \frac{{\sum \limits _{c = 1}^n {({X_{ic}}\,{X_{jc}})} }}{{\sqrt{\sum \limits _{c = 1}^n {{X_{ic}}^2}\,{\sum \limits _{c = 1}^n {{X_{jc}}^2} } } }} \end{aligned}$$
(18)

where \(X_{ic}\) denotes the input parameters \(C_\text{u}\), \(d_{50}\), e, and \(\sigma '\), and \(X_{jc}\) denotes the output parameter \(G_0\). A strongly influencing input variable always has an SS value near one. In this study, the analysis was performed on the 666 data points of the training set. The sensitivity analysis result is depicted in Fig. 3.
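A minimal sketch of Eq. (18) in Python; the usage lines assume a hypothetical feature matrix `X` with columns \(C_\text{u}\), \(d_{50}\), e, \(\sigma '\) and an output vector `g0`:

```python
import numpy as np

def cosine_amplitude(x, y):
    """Strength of relation SS between an input series x and the output y, Eq. (18)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

# Hypothetical usage: one SS value per input column
# ss = [cosine_amplitude(X[:, j], g0) for j in range(X.shape[1])]
```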

Figure 3

Illustrations of sensitivity analysis for \(G_0\).

Figure 3 illustrates that all input parameters, \(C_\text{u}\), \(d_{50}\), e, and \(\sigma '\), strongly influence the \(G_0\) prediction. Of all input variables, \(C_\text{u}\) (SS = 0.59) influences the \(G_0\) prediction the least.

Results and discussion

Modeling

Three methods, BPNN, GA-BP, and XGBoost, were used to model the prediction of \(G_0\). The input parameters for these models included four soil-related factors: \(C_\text{u}\), \(d_{50}\), e, and \(\sigma '\); the output parameter was \(G_0\). Around a third of the data, 666 data points in total, were chosen from the database as the training set, while the remaining 1300 data points were designated as the test set. The minimum and maximum values and other descriptive statistics of the training set align well with the full database and represent it accurately.

Both the BPNN and GA-BP models used 12 hidden-layer neurons; the training function was trainlm, and the transfer functions between the input layer and the hidden layer, and between the hidden layer and the output layer, were sigmoid functions. For the XGBoost model, a regression model using XGBRegressor was adopted, configured with a maximum tree depth of 6, 100 base learners, and a learning rate of 0.1; the remaining parameters were left at their default values.
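A minimal sketch of the XGBoost configuration described above; the random arrays are placeholders standing in for the normalized feature matrix and \(G_0\) targets:

```python
import numpy as np
from xgboost import XGBRegressor

# Placeholder data standing in for the normalized [Cu, d50, e, sigma'] features and G0 targets
rng = np.random.default_rng(0)
X_train, y_train = rng.random((666, 4)), rng.random(666)
X_test = rng.random((1300, 4))

model = XGBRegressor(
    max_depth=6,        # maximum tree depth, as stated above
    n_estimators=100,   # number of base learners
    learning_rate=0.1,  # shrinkage rate; remaining parameters left at defaults
)
model.fit(X_train, y_train)
g0_pred = model.predict(X_test)
```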

Figure 4 illustrates the comparison between the predicted values of \(G_0\) by different prediction models and the actual measured values on the training set. It can be seen that each model can accurately predict the dynamic shear modulus of the soil during training.

Figure 4

Comparison of the results predicted by different models on the training set with the actual results: (a) BPNN model (b) GA-BP model (c) XGBoost model.

Furthermore, the data points for all three models are evenly distributed on both sides of the 45-degree line, indicating reasonably good predictions. Of the models, the XGBoost model shows the best training performance, with an \(R^2\) of 0.99; its ARE is consistently below 10%, with 98% of data points having ARE values below 5%. The GA-BP model comes next, with an \(R^2\) of 0.97 and approximately 67% of data points having ARE values below 25%. The BPNN model has the worst training performance, with an \(R^2\) of 0.94; only 55% of the data points in the training set have ARE values below 25%. Consequently, the predictive performance of the BPNN model is inferior to that of the other two models. Overall, the XGBoost model shows the highest accuracy in predicting \(G_0\) on the training set.

Model performance analysis

Figure 5 shows a comparison of the \(G_0\) predictions made by the different models on the test dataset with the actual measured values, and Table 3 lists the performance metrics for the different models. In the figure, the scatter points represent the model predictions for the samples in the test dataset. The scatter points for all three models are evenly distributed around the diagonal line (Y = X), indicating that the models have achieved good predictive performance for \(G_0\).

Figure 5

Comparison of the results predicted by different models on the testing dataset with the actual results: (a) BPNN model (b) GA-BP model (c) XGBoost model.

Table 3 Different model performance metrics.

The model with the highest prediction accuracy is the XGBoost model, closely followed by the GA-BP model, which has slightly lower prediction performance. The BPNN model has the worst prediction performance: as can be seen in Fig. 5a, the scatter points of its predictions are more spread out, indicating less accurate predictions.

Figure 6 shows the distribution of the ARE for the predictions made by the different models for each dataset in the database compared to the actual measured values. It can be observed that the ARE for all three models is mainly concentrated below 40%.

For the BPNN model, the proportions of data points with ARE less than 10%, 20%, 30%, and 40% are 36.0%, 64.9%, 82.8%, and 88.0%, respectively. For the GA-BP model, the corresponding proportions are 47.9%, 74.1%, 89.3%, and 93.5%. The XGBoost model has the highest accuracy, with proportions of 75.8%, 96.4%, 100%, and 100%, respectively.

Figure 6

Statistical analysis of predicted and actual ARE values of different forecasting models: (a) BPNN model (b) GA-BP model (c) XGBoost model.

The XGBoost model has relatively low ARE values, indicating excellent predictive performance, and the GA-BP model slightly outperforms the BPNN model. As shown in Table 3, the evaluation metrics for the XGBoost model are consistently superior to those of the BPNN and GA-BP models. In summary, the XGBoost model performs best in predicting \(G_0\), the GA-BP model also provides good predictions, and the BPNN model has the weakest prediction performance.

Model performance evaluation

\(G_0\) is primarily determined by experiment and empirical equations. Experimental measurements of \(G_0\) are subject to many influences, making it difficult to determine its value effectively. Empirical formulae have the advantage of being simple to calculate and easy to use.

To evaluate the performance of the computational models, the \(G_0\) prediction models proposed by Wichtmann and Triantafyllidis12 and by Liu et al.42 were selected for comparison. Liu et al.42 provided an empirical relationship for \(G_0\) in sandy soils that accounts for the influence of particle size distribution, based on experimental results for quartz sand and volcanic sand:

$$\begin{aligned} G_0 = 108.56{C_\text{u}}^{ - 0.42}\frac{{{{(2.17 - e)}^2}}}{{1 + e}} {\left( \frac{\sigma '_0}{P_\text{a}}\right) } ^{0.36{C_\text{u}}^{0.32}} \end{aligned}$$
(19)

Based on the results of \(G_0\) tests on silica sand, Wichtmann and Triantafyllidis12 proposed a similar \(G_0\) prediction equation that takes into account the effect of grading:

$$\begin{aligned} G_0 = (108.56 + 0.313 {C_\text{u}}^{2.98})\frac{{{{(1.94e^{ -0.066C_\text{u}} - e)}^2}}}{{1 + e}} {\left( \frac{\sigma '_0}{P_a}\right) } ^{0.36{C_\text{u}}^{0.32}} \end{aligned}$$
(20)

where \(C_\text{u}\) is the uniformity coefficient, e is the void ratio, \(\sigma '\) is the confining pressure, and \(P_\text{a}\) is a reference stress, usually atmospheric pressure (i.e., 100 kPa).
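The two formulas translate directly into code; a sketch is given below, assuming \(\sigma '\) in kPa and \(G_0\) in MPa, consistent with \(P_\text{a}\) = 100 kPa:

```python
import numpy as np

PA = 100.0  # reference (atmospheric) pressure, kPa

def g0_liu(cu, e, sigma):
    """Eq. (19), Liu et al.: G0 (MPa) from Cu, void ratio e, and sigma' (kPa)."""
    return 108.56 * cu**-0.42 * (2.17 - e)**2 / (1 + e) * (sigma / PA)**(0.36 * cu**0.32)

def g0_wichtmann(cu, e, sigma):
    """Eq. (20), Wichtmann and Triantafyllidis: grading-dependent G0 (MPa)."""
    a = 1.94 * np.exp(-0.066 * cu)   # grading-dependent void-ratio constant
    return (108.56 + 0.313 * cu**2.98) * (a - e)**2 / (1 + e) * (sigma / PA)**(0.36 * cu**0.32)
```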

The two empirical formulas above were used to predict \(G_0\) for the different soil types in the database; the comparison between the predicted results and the actual values is shown in Fig. 7. The performance metrics of the traditional empirical formula models and the computational models in predicting the various soil types in the database are summarised in Table 4.

Table 4 Performance metrics of predictive models for predicting data in databases.
Figure 7

Comparison of predicted and actual values of two traditional empirical formulas: (a) Liu et al.'s model (b) Wichtmann and Triantafyllidis's model.

Figure 7 shows that the predictions of both Liu et al.'s and Wichtmann and Triantafyllidis's models deviate significantly from the actual \(G_0\) values in the database. For Liu et al.'s model, 84.09% of the predicted results have an ARE of less than 50%, while for Wichtmann and Triantafyllidis's model the figure is 63.55%. The predicted data are widely scattered, indicating lower prediction accuracy.

Table 4 shows that the BPNN, GA-BP, and XGBoost models all have \(R^2\) greater than 0.95, RMSE less than 30, MAE less than 25, and VAF greater than 80%. In contrast, the traditional empirical formulas yield worse values across all performance indicators than the three machine learning algorithms, indicating poorer predictive performance.

In summary, the XGBoost model has the highest prediction accuracy for predicting soil \(G_0\), followed by the GA-BP model and the BPNN model, while the empirical formula models have the lowest prediction accuracy. Therefore, when performing \(G_0\) calculations for soil, it is advisable to select the XGBoost model for estimation.

Conclusions

The present study introduces accurate and reliable computational models for forecasting the \(G_0\) of sandy soil. A database of 1966 \(G_0\) data sets is established, and 666 of them are extracted as a training set. Three computational models, namely BPNN, GA-BP, and XGBoost, are introduced, and their predictions are compared with those of traditional empirical formulas. The research verifies the superiority of machine learning models for forecasting \(G_0\). The following conclusions are drawn:

(1) Comparing the performance of the three machine learning models, the XGBoost model exhibits the strongest predictive effect for \(G_0\), while the GA-BP model, optimized using genetic algorithms, generates more precise prediction results than the BPNN model.

(2) The analysis of sensitivity towards the four input variables (\(C_\text{u}\), \(d_{50}\), e, and \(\sigma '\)) using the cosine amplitude method indicated that \(\sigma '\) had the most significant impact on \(G_0\) prediction. After that, the variable e showed a noticeable impact, while \(d_{50}\) and \(C_\text{u}\) demonstrated lower sensitivities.

(3) Comparing the prediction outcomes of the machine learning models and the traditional empirical formulas, performance metrics such as IOS and IOA demonstrate the former's superior generalization ability and more accurate prediction performance. Among the machine learning models evaluated, the XGBoost model achieves the best performance in predicting \(G_0\).

In summary, this study presents three machine learning models for predicting the \(G_0\) of sandy soils. The results show that the XGBoost model performs exceptionally well, exceeding the predictive power of conventional empirical formulas. However, the models are limited by the relatively small dataset in the database, which may restrict prediction accuracy when they are applied at a larger scale. Hence, it is advisable to integrate Generative Adversarial Network (GAN) algorithms to expand the database, enhance the models' ability to generalize, and improve predictive accuracy. An efficient and accurate prediction model for \(G_0\) is of great significance to engineers working in research areas including seismic engineering.