1 Introduction

Soft clay accounts for a high proportion of land in coastal areas and river deltas across the world [1,2,3]. Soft clay is characterized by high plasticity, high natural water content (sometimes exceeding the liquid limit), low shear strength (< 40 kPa), and high compressibility [4]. Such ground cannot support heavy loads; it undergoes large settlement and compromises the stability of infrastructure [5]. Hence, dealing with soft clay has been challenging for geotechnical engineers [1].

Soil stabilization is one of the most popular and effective soil improvement techniques [2]. Cement, lime, fly ash, and blast-furnace slag are the common additives used to mix with soft clay [6] to improve the workability and compaction characteristics, increase the shear strength of the soil, and reduce the settlement of the ground [7]. Due to such advantages, soil stabilization with chemical binders has been applied widely in many countries [8,9,10].

It has been reported that some factors could affect the characteristics of soil stabilization, including the soil characteristics, the type of binders, the binder content, the water content, the mixing method, the curing time, and others. Soil characteristics, such as soil type, organic content, grain size distribution, pH, and natural water content, affect the ultimate strength of the soil mixing column [10]. For soft soils, the natural water content is also an important factor. If the water content is higher than the liquid limit, the strength of the stabilized soil will decrease [10]. Moreover, the amount of binder added plays an essential role in developing the final strength of stabilized soil [10]. By raising the amount of the stabilization agent, the UCS of the stabilized soil increases, and the permeability decreases [10,11,12]. Besides, alternative pozzolans, such as fly ash, lime, and blast-furnace slag, have been used in soil stabilization. These pozzolans can increase the strength and reduce the permeability of treated soil [13, 14]. Furthermore, the UCS of soil–cement material is affected by the mixing method and the curing time [10].

Previously, the binder type and the required amount of binder were chosen by preparing and testing a large number of trial specimens. This step demands considerable effort, cost, and time. Moreover, the process has to be conducted separately for each project with different input parameters. Therefore, some studies have suggested predictive models for the UCS of stabilized soil based on common input variables, such as the water to binder ratio, the binder content, and the curing time [15,16,17,18,19,20].

The normalized empirical models were used to develop the predictive equations based on the experimental data. Abrams' law, applied in concrete technology, shows that the strength of hardened concrete can be predicted from the water to cement ratio. Liu [21], Horpibulsuk [2], and Cong [3] applied Abrams' law to predict the strength of soil–cement material based on the ratio of clay water content (including natural water in clay and water in cement slurry) to cement content. Horpibulsuk [2], Tsuchida [22], and Yao [1] also suggested new forms of the predictive formulas using the empirical models.

However, according to Narendra [5], the empirical models have some limitations. Firstly, these models are usually developed based on several assumptions, simplifications, and approximations. The normalized empirical models are developed from a small number of experimental data points, so they could be less accurate and may not be valid for other clay conditions. Furthermore, the empirical formulas can produce significant errors even when applied to soils with similar properties. In addition, most predictive formulas consider only a few variables, such as the cement content, the water to cement ratio, and the curing time, while the effects of soil characteristics and binder types have not been examined.

It can be seen that a predictive model that overcomes these limitations has not been found yet. Furthermore, there are limited published studies considering the effect of several types of binder additives on the strength of stabilized clay. As a result, it is beneficial to develop a reliable model that applies to a wide range of clay conditions and considers the effects of common binders, using reliable modeling techniques such as artificial neural networks and genetic programming.

An artificial neural network (ANN) is a problem-solving algorithm that simulates the structure of the human brain. In terms of chemical soil stabilization, some studies have used ANN models for predicting the UCS value, such as Das [15], Tinoco [23], Sunitsakul [17], Abbey [18], Ghorbani [19], and Saadat [20]. The predictive formulas developed from ANN models perform well and are more accurate than those from nonlinear multivariable regression or multiple regression analyses [24]. However, ANN is considered to be a “black box” program. The predictive equations are developed based on complex transfer functions, such as logistic sigmoid and hyperbolic tangent sigmoid functions. As a result, the ANN-based predictive functions are limited in their application, as they cannot be used conveniently to calculate the output from the input values [24].

Genetic programming (GP) is a kind of supervised machine learning technique that applies the principles of Darwin's evolution theory [25]. It is another alternative approach to behavior modeling. Gene-expression programming (GEP) is a branch of GP that develops a solution to a problem using a computer program [26], and it is the method that has been used commonly in geotechnical engineering [27]. In GEP, populations are also selected based on fitness function and presented with a gene through several operators [28]. GEP is able to make strong predictive functions without a preliminary assumption about the possible structure of functional connection [29]. The GEP model is a robust, powerful, and accurate predictive tool. In addition, the GEP-based formulas are transparent and more practical than the ANN-based equations. Hence, the proposed predictive equations formulated from the GEP model could be ready to apply in practice. As a result, the GEP technique was used to develop a model for predicting the UCS of soil stabilization based on a comprehensive database gathered from the literature.

The study focuses on clay stabilization with different cementitious binders, including ordinary Portland cement (CEM I), quick lime, fly ash Types C and F, and blast-furnace slag. Both wet and dry mixing methods were considered. The results apply only to the UCS determined under laboratory conditions. The GEP technique was applied to generate a predictive model. A parametric study was also conducted to examine the effects of each variable, the effects of binder types, the combination of binders, and the total water content on the UCS of stabilized clay.

Within the scope of this study, the effect of the chemical composition of clay, including the content of CaO and SO₄²⁻, organic content, and pH of the soil, was excluded. The specimen preparation processes, such as the size and shape of the mold and the sample making and curing methods, were assumed to be similar. These parameters could be analyzed in subsequent studies.

2 Data preparation

The database used in this study was gathered from several experimental studies in the literature. The experimental data related to the UCS of clay stabilization were chosen consistently based on the following criteria. The study examined the chemical-treated clayey soil only, while sandy soil was not considered in this research. Common types of chemical binders used to stabilize soil, including cement (CEM I), quick lime, fly ash (Type F and Type C), and blast-furnace slag, were all selected. The UCS results from the studies with similar sample making and testing standards were chosen.

Table 1 shows the selected data sources available from reliable published journal articles, including Bolton [30], Ge [31], JGS [32], Xiao [33], Naveena [6], Asgari [34], Correia [35], Oh [36], Tastan [37], Kwan [38], Consoli [39], Kassim [40], and Abbey [18]. The database includes the experimental data for soil stabilization applied for some types of clay in different countries, such as Japan, Singapore, Thailand, Malaysia, India, Australia, the UK, the USA, Portugal, Iran, Brazil, and Taiwan. Finally, approximately 1183 data points were selected for developing a GEP-based formula for estimating the strength of soil stabilization.

Table 1 Sources of experiment data from the literature

The number of independent variables was chosen based on the literature review and several trials. In this research, eleven independent variables considered in the predictive model are:

  • Group 1 represents the soil characteristic: the plastic index (PI), the percentage of clay (Clay), the percentage of silt (Silt), the percentage of sand (Sand);

  • Group 2 represents the mixing method and curing time: the total water content (Total water), the curing period (Age). The total water content includes the natural water content in the clay and the water used to mix with binders.

  • Group 3 represents the binder types and binder content: the lime content (Lime), the cement content (Cement), the fly ash Type F content (FA F), the fly ash Type C content (FA C), and the slag content (Slag).

Table 2(a) and (b) present the range of variables from the dataset. Table 3(a) and (b) show the statistical analysis, including the maximum, minimum, range, mean, standard deviation (SD), and coefficient of variation (CoV) of all variables. The maximum value of the total water content is 265%, while the longest curing period is 360 days. The highest percentages of lime, cement, fly ash Type F, fly ash Type C, and slag used are 20%, 100%, 34.5%, 30%, and 42.5%, respectively. The maximum strength in the database is approximately 6000 kPa. This indicates that the database represents a wide range of input variables. Besides, the database (1183 data points) is larger than those used in previous studies.

Table 2 Range of variables from input and output data
Table 3 Statistical analysis of input and output data

3 Model development

3.1 Gene-expression programming

Gene-expression programming (GEP) was developed by Ferreira [41]. It consists of two parts: a linear chromosome of fixed length and parse trees of different sizes and shapes (expression trees, ETs) [41, 42]. It also contains a terminal set, function set, fitness functions, and termination functions [43]. GEP evolves several genes (sub-ETs) represented as tree-like structures, and they are connected by a linking function.

The main elements of GEP are expression trees (ETs), genes, and chromosomes. The GEP model is expressed by ET, which includes some sub-ETs (genes). Each gene contains several chromosomes, while each chromosome could be an input variable, constant value, or function [44]. Each GEP gene is composed of a head and a tail. The head of the gene contains mathematical functions and terminal symbols, while the tail contains terminal symbols like constant values or variables [45]. Constant values are used to adjust the equations in the model.

The main operators of GEP are selection, mutation, transposition, and crossover, which are similar to those of traditional GP. They allow the program to produce the next generation with better fitness scores [46]. Detailed explanations of the GEP structure and its operators can be found in the studies of Soleimani [47], Gandomi [24], and Shahmansouri [28].

For setting up the GEP modeling, it is necessary to define the function set, terminal set, fitness function, control variables, and termination condition for obtaining a solution. GEP then randomly creates an initial population. Chromosomes in that population are converted into an expression tree (ET) by combining terminal and function sets. Next, the fitness function is applied to evaluate each predicted output. If that value does not meet the desired output, chromosomes or genes are evolved through genetic operators (selection, crossover, and mutation) to create new mutagenic generations [48]. That process is stopped when the predicted output meets the desired quality output.

Figure 1 illustrates an example of the GEP solution with two sub-ETs. Each sub-ET contains nine chromosomes, and the head length is five. It expresses the mathematical equation as:

$$y = \sqrt {d_{1} *\left( {7 + \ln \left( {\frac{{d_{3} }}{{d_{2} }}} \right)} \right)} + \cos \left( {d_{3} + \frac{{\sin (d_{0} - d_{4} )}}{{d_{4} }}} \right)$$
(1)
where y: dependent variable (output); d0, d1, d2, d3, and d4: independent variables (inputs).

Fig. 1 The GEP solution with two sub-ETs
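As a minimal sketch of how Eq. (1) is assembled, the two sub-ETs can be written as separate functions and joined by the addition linking function. The function names are hypothetical, and which term corresponds to which sub-ET in Fig. 1 is an assumption.

```python
import math


def sub_et_1(d1, d2, d3):
    # First term of Eq. (1): sqrt(d1 * (7 + ln(d3 / d2))).
    return math.sqrt(d1 * (7 + math.log(d3 / d2)))


def sub_et_2(d0, d3, d4):
    # Second term of Eq. (1): cos(d3 + sin(d0 - d4) / d4).
    return math.cos(d3 + math.sin(d0 - d4) / d4)


def gep_example(d0, d1, d2, d3, d4):
    # The linking function is addition, so the output is the sum of the sub-ETs.
    return sub_et_1(d1, d2, d3) + sub_et_2(d0, d3, d4)
```

Changing the linking function (for example, to multiplication) would only change the last line, which is why the linking function is treated as a separate modeling choice.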

Recently, many studies have shown that the GEP model is more efficient than GP and comparable with the black-box ANN models [24]. Besides, Mousavi [49] developed the GEP model to predict the compressive strength of concrete with more accuracy than traditional models. Leong [44] applied the GEP technique to predict the UCS of fly ash-based geopolymers and evaluate the effect of each parameter on the UCS of geopolymers. Leong [50] then applied the ANN and GEP models to identify the contribution of input variables on the UCS of soil–fly ash geopolymer. In the research of Mohammadzadeh [43], the GEP-based model was developed to predict the coefficient of consolidation for the compression index of fine-grained soils based on the input variables such as the liquid limit, plastic limit, and initial void ratio. The GEP model was applied to estimate the UCS of geopolymer concrete based on ground granulated blast-furnace slag [28]. The GEP technique was also applied to generate predictive models to investigate the soil properties [45, 51, 52]. Abdi [53] developed a GEP-based model for predicting enhanced interaction coefficient based on large-scale direct shear tests conducted on soil–anchored geogrid samples. Johari [54] applied the GEP technique to investigate the collapsible soils treatment using nano-silica in the Sivand Dam region, Iran. Oulapour [55] generated a GEP model for predicting the cracking zones in earthfill dams. Furthermore, the GEP model was used to solve many geotechnical engineering problems with high accuracy [26, 56,57,58,59,60,61,62,63]. Due to such advantages, the GEP technique was applied to generate a reliable predictive model for the UCS of clay stabilization.

3.2 GEP modeling procedure

Prior to GEP modeling, the data were divided into training, testing, and verifying subsets. The training data were used to train and select the optimal candidate programs. The performance of the selected models was then measured using the testing dataset. Finally, the proposed GEP model was verified on an independent subset (unseen data). In this study, K-fold cross-validation (CV) was applied to split the data. The model for UCS prediction, including the eleven independent variables, was developed based on 1183 data points. The datasets containing 789 (66.7%) and 197 (16.7%) data points were used for training and testing, respectively, while the remaining 197 (16.7%) unseen data points were used for verifying purposes.
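The subset sizes above can be illustrated with a short sketch. It uses a plain shuffled hold-out split rather than the K-fold procedure used in the study, and the function name and seed are hypothetical.

```python
import random


def split_dataset(n_points, n_train, n_test, seed=0):
    """Shuffle the indices 0..n_points-1 and cut them into three disjoint subsets."""
    if n_train + n_test > n_points:
        raise ValueError("training and testing subsets exceed the dataset size")
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    verify = indices[n_train + n_test:]  # remaining unseen data
    return train, test, verify


# Sizes taken from the text: 1183 points, 789 for training, 197 for testing.
train_idx, test_idx, verify_idx = split_dataset(1183, 789, 197)
```

Keeping the verifying subset out of both training and model selection is what allows it to act as unseen data.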

GeneXpro Tools 5.0 software [64] was used to build the GEP model for predicting the UCS of chemically stabilized clay. It is a powerful and flexible modeling tool designed for regression. It can also handle a large number of variables with high accuracy and generalizability.

In GeneXpro Tools, the user needs to define the number of chromosomes, the number of genes (sub-ETs), the head length of the gene, the fitness function, and the genetic operators. The optimal parameters were determined starting from the values suggested by Uysal [29] and refined through a trial-and-error approach to ensure sufficient robustness and generalization of the model. In each trial, the value of one parameter was changed while the others were held constant. The fitness function was used to evaluate the training and testing subsets at the end of each trial. The parameter value was chosen when the errors of these subsets were small and as close to each other as possible. The most important parameters, which affect the complexity and accuracy of the GEP model, are the number of genes (sub-ETs) and chromosomes. The head size of the gene is another essential parameter, as it decides the number of branches of each sub-ET. The linking function was chosen based on trials to obtain the desired accuracy. Four basic functions (addition, subtraction, multiplication, and division) were tried. The optimal models were chosen when the training and testing processes provided similar and high accuracies.

Finally, the GEP-based model was developed with seven sub-ETs and a maximum of 200 chromosomes for each gene. They were linked by the addition function. The maximum head length of 10 was chosen, and the root mean square error (RMSE) was set as the fitness function. Fourteen different mathematical operators, including addition ( +), subtraction ( −), multiplication (*), division (/), square root (√), natural logarithm (ln), power (x2), inverse (1/x), exponential (exp), addition with three inputs (x1 + x2 + x3), subtraction with three inputs (x1 − x2 − x3), multiplication with three inputs (x1 * x2 * x3), and cube root (3√), were used for GEP modeling. The maximum number of generations was 30,000. In the model, d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, and d10 represent the plastic index, the percentage of clay, the percentage of silt, the percentage of sand, the total water content, the curing time, the percentage of lime, the percentage of cement, the percentage of fly ash Type F, the percentage of fly ash Type C, and the percentage of slag, respectively.

3.3 GEP-based model result

Figure 2 presents the tree-like structure of the selected optimal GEP model with seven sub-ETs linked by the addition linking function. The mathematical formula evolved from the GEP model is presented as Eq. (2). The GEP technique yields a practical and straightforward predictive formula.

$$\begin{aligned} {\text{UCS}} & = d_{0} \cdot (d_{10} - 2) + d_{3} + 3 \cdot (d_{5} - d_{4} - d_{10} ) - 2 \cdot (d_{1} + d_{2} - d_{7} ) + d_{6} \cdot (d_{2} + d_{6} + 2) \\ & \quad + d_{5} \cdot d_{9} + \, 2c_{59} - \frac{{{\text{e}}^{{c_{51} - c_{55} - c_{56} }} }}{{c_{59} }} - {\text{e}}^{{c_{51} }} + d_{7} \cdot c_{21} \cdot \ln (d_{5} ) + d_{8} \cdot (d_{4} \cdot d_{3} )^{1/3} + c_{50} + c_{64} \\ & \quad + c_{40} \cdot \ln \left[ {\left( {\frac{{d_{5}^{{d_{10} }} }}{{d_{4} }} + d_{4}^{2} + {\text{e}}^{{d_{9} }} } \right)^{ - 1/2} } \right] + d_{7} \cdot \left( {\frac{{d_{3} }}{{c_{72} + d_{4} + c_{78} }}} \right) \cdot \left( {2 \cdot d_{2} + \frac{{d_{4} }}{{c_{78} }}} \right) + c_{10} \\ \end{aligned}$$
(2)
where UCS: unconfined compressive strength (kPa); c10 = −9.204; c21 = 9.910; c40 = −15.170; c59 = −9.800; c51 = 5.616; c50 = −9.6502; c55 = 8.759; c56 = −5.498; c64 = −9.812; c72 = −7.635; c78 = −3.177; d0: PI (%); d1: Clay (%); d2: Silt (%); d3: Sand (%); d4: Water (%); d5: Age (days); d6: Lime (%); d7: Cement (%); d8: FA F (%); d9: FA C (%); d10: Slag (%).

Fig. 2 Tree expression of the proposed GEP model
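Because Eq. (2) is explicit, it can be transcribed directly into code. The sketch below is one such transcription, term by term, using the constants listed above; the function name, argument names, and the grouping into intermediate variables are illustrative choices. It assumes inputs in the units stated (percentages as numbers such as 20, age in days) and requires positive water content and age.

```python
import math

# Constants as listed under Eq. (2).
C10, C21, C40 = -9.204, 9.910, -15.170
C50, C51, C55, C56 = -9.6502, 5.616, 8.759, -5.498
C59, C64, C72, C78 = -9.800, -9.812, -7.635, -3.177


def ucs_gep(pi, clay, silt, sand, water, age, lime, cement, fa_f, fa_c, slag):
    """Evaluate Eq. (2); returns the predicted UCS in kPa."""
    d0, d1, d2, d3, d4 = pi, clay, silt, sand, water
    d5, d6, d7, d8, d9, d10 = age, lime, cement, fa_f, fa_c, slag

    # Polynomial terms in the input variables.
    poly = (d0 * (d10 - 2) + d3 + 3 * (d5 - d4 - d10)
            - 2 * (d1 + d2 - d7) + d6 * (d2 + d6 + 2) + d5 * d9)
    # Terms that involve constants only.
    const = (2 * C59 - math.exp(C51 - C55 - C56) / C59 - math.exp(C51)
             + C50 + C64 + C10)
    # Cement content combined with the logarithm of the curing age.
    cement_age = d7 * C21 * math.log(d5)
    # Fly ash Type F combined with the cube root of (water * sand).
    fly_ash_f = d8 * (d4 * d3) ** (1.0 / 3.0)
    # Logarithmic term with slag, water, and fly ash Type C.
    log_term = C40 * math.log((d5 ** d10 / d4 + d4 ** 2 + math.exp(d9)) ** -0.5)
    # Cement combined with sand, water, and silt.
    cement_sand = d7 * (d3 / (C72 + d4 + C78)) * (2 * d2 + d4 / C78)

    return poly + const + cement_age + fly_ash_f + log_term + cement_sand
```

A call such as `ucs_gep(30, 40, 40, 20, 100, 28, 0, 20, 0, 0, 0)` evaluates the formula for a hypothetical clay treated with 20% cement at 28 days; the input values are illustrative and are not taken from the database.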

The accuracy of the GEP model was evaluated through the coefficient of correlation (R), root mean square error (RMSE), and mean absolute error (MAE), defined in Eqs. (3)–(5). Figure 3 illustrates the performance for the training, testing, and verifying phases and for the entire dataset. Table 4 shows that the correlation coefficients of all phases are close to one another, at around 0.95. The root mean square errors of the proposed model are less than 240 kPa, and the mean absolute errors range from 150 to 170 kPa. It should be noted that the selected optimal GEP model was developed from more than a thousand data points gathered from different sources. Even though the data were chosen carefully from studies with similar sample preparation and testing standards, uncertain factors that could affect the final model are unavoidable. Given this variability, the proposed GEP model can be considered accurate and reliable.

$$R = \frac{{\sum\nolimits_{i = 1}^{N} {\left( {O_{i} - \overline{O}} \right)\left( {P_{i} - \overline{P}} \right)} }}{{\sqrt {\sum\nolimits_{i = 1}^{N} {\left( {O_{i} - \overline{O}} \right)^{2} \sum\nolimits_{i = 1}^{N} {\left( {P_{i} - \overline{P}} \right)^{2} } } } }}$$
(3)
$${\text{RMSE}} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{N} {\left( {O_{i} - P_{i} } \right)^{2} } }}{N}}$$
(4)
$${\text{MAE}} = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left| {O_{i} - P_{i} } \right|}$$
(5)
where N: the number of data points presented to the model; Oi and Pi: the observed and predicted outputs, respectively; and Ō and P̄: the mean of observed and predicted outputs, respectively [65].

Fig. 3 The performance of the proposed GEP model

Table 4 Performance of the proposed GEP model
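Equations (3)–(5) can be computed in a few lines. The following is a minimal sketch in plain Python; the function names are hypothetical, and the inputs are assumed to be equal-length sequences of observed and predicted UCS values.

```python
import math


def correlation_coefficient(observed, predicted):
    # Eq. (3): Pearson correlation between observed and predicted outputs.
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_o * ss_p)


def rmse(observed, predicted):
    # Eq. (4): root mean square error.
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    # Eq. (5): mean absolute error.
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a sense of whether the error is dominated by a few poorly predicted specimens.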

3.4 Performance analysis

3.4.1 External criteria

Golbraikh [66] provided criteria for the external validation of models on the testing dataset. It is suggested that at least one of the slopes of the regression lines through the origin (k or k') should be approximately 1.0. Besides, the performance indexes m and n should be lower than 0.1. Moreover, the squared correlation coefficients through the origin between predicted and experimental values (Ro² or Ro'²) are recommended to be close to 1.0 [47, 67]. Table 5 presents the results of these criteria. The proposed model satisfies the required conditions and shows strong performance with a highly accurate predictive capability.
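The text does not give the formulas behind these indexes, so the sketch below assumes one least-squares-through-origin formulation: k and k' are the slopes of the origin-constrained regressions of observed on predicted values and vice versa, Ro² and Ro'² are the corresponding coefficients of determination, and m and n are their relative differences from R². The function name and returned keys are hypothetical, and the definitions used for Table 5 may differ in detail.

```python
def external_validation(observed, predicted):
    """Compute k, k', Ro^2, Ro'^2, m, and n under the assumed definitions."""
    count = len(observed)
    mean_o = sum(observed) / count
    mean_p = sum(predicted) / count

    sum_op = sum(o * p for o, p in zip(observed, predicted))
    k = sum_op / sum(p * p for p in predicted)        # observed ~ k * predicted
    k_prime = sum_op / sum(o * o for o in observed)   # predicted ~ k' * observed

    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    r2 = cov ** 2 / (ss_o * ss_p)

    # Coefficients of determination for the regressions through the origin.
    ro2 = 1 - sum((o - k * p) ** 2 for o, p in zip(observed, predicted)) / ss_o
    ro2_prime = 1 - sum((p - k_prime * o) ** 2
                        for o, p in zip(observed, predicted)) / ss_p

    return {
        "k": k,
        "k_prime": k_prime,
        "Ro2": ro2,
        "Ro2_prime": ro2_prime,
        "m": (r2 - ro2) / r2,
        "n": (r2 - ro2_prime) / r2,
    }
```

For a perfect prediction, both slopes equal 1 and both m and n equal 0, which is the limiting case the criteria are built around.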

Table 5 Statistical parameters for the external validation of the GEP-based model

3.4.2 Comparative study

Table 6 compares several numerical predictive models for the UCS of stabilized soil with the GEP-based model proposed in this research. The predictive model in this study was generated from a large volume of data (1183 data points) collected from many studies in different countries. In contrast, other published models were developed from small groups of test data from specific areas. This broader database is the main advantage of the proposed model over other models for predicting the UCS of stabilized soil.

Table 6 Comparative study

Furthermore, the eleven input variables in the suggested model are the main parameters that affect the UCS result. In particular, the model not only considers the soil properties but also investigates the effect of four chemical binders (cement, lime, fly ash, and slag). This combination of inputs distinguishes the model from previous models.

In addition, the proposed GEP-based model could be applied for a wide range of input parameters. For example, the maximum cement content in the model is 100%, while the normal amount of cement used is around 10–30%. Hence, the applicability of this model is wider than others.

In terms of the model performance, the proposed model achieves a high correlation coefficient (R = 0.951) and low errors. According to a logical hypothesis [68], if the model provides a high correlation coefficient (R > 0.8) and low error values (e.g., MAE and RMSE), the prediction relationship among input and output variables is accurate and reliable. Thus, it demonstrates an outstanding performance with strong predictive capability of the proposed GEP model.

It is true that ANN-based models could slightly outperform GEP-based models. However, the ANN technique is considered a “black-box” approach; hence, the applicability of ANN-based formulas is limited in practice.

In conclusion, the evolutionary predictive model in this study could be applied with confidence to determine the UCS of chemically stabilized clayey soil, considering the effects of the soil characteristics and the types and contents of chemical binders. Therefore, the GEP-based formula is a reliable option for designers and researchers in estimating the UCS value of stabilized clayey soil.

4 Parametric study

4.1 Effect of each variable on the UCS of stabilized clay

The selected optimal GEP-based model (Eq. 2) was used to examine the effect of each input variable on the UCS of stabilized clay. The examined variable was assumed to be varied within the range of its input, while the average values were kept constant for other parameters. The effects of input parameters on the UCS of stabilized soil based on the parametric study are illustrated in Fig. 4.
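The one-variable-at-a-time procedure described above can be sketched generically. The helper below is hypothetical and takes any model as a callable; the `toy_model` function is a stand-in for illustration only and is not the GEP formula.

```python
def sweep_one_variable(model, baseline, name, values):
    """Vary one input over `values` while all other inputs stay at `baseline`."""
    results = []
    for value in values:
        inputs = dict(baseline)  # copy so the baseline itself is never modified
        inputs[name] = value
        results.append((value, model(**inputs)))
    return results


def toy_model(cement, water):
    # Hypothetical stand-in model used only to demonstrate the sweep.
    return 10 * cement - water


# Baseline plays the role of the average values kept constant in the study.
curve = sweep_one_variable(toy_model, {"cement": 20, "water": 100},
                           "cement", [10, 20, 30])
```

Each returned pair is one point on a curve of the kind plotted per variable in the parametric study.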

Fig. 4 Effects of input parameters on the UCS of stabilized soil

4.1.1 Effect of Atterberg limits

Figure 4a shows the effect of the Atterberg limits of the soil on the UCS of stabilized soil. It is obvious that the UCS is reduced when the plastic index is high. The plastic index is an important parameter that correlates with soil behavior [5]. The stiffness and strength of the soil are decreased when the plastic index is increased [47, 70]. At the high level of the plastic index, the soil is more ductile [1]. As a result, the plastic index has a negative effect on the UCS of stabilized soil.

4.1.2 Effect of the particle size

The effect of the particle size of the soil on the UCS of stabilized soil is presented in Fig. 4b–d. A high percentage of clay may have a negative effect on the UCS of stabilized soil. In contrast, the UCS of stabilized soil increases linearly with the percentages of silt and sand. Szymkiewicz [71] concluded that the strength of stabilized soil reaches a higher value if the soil is well graded. Within the scope of this study, the effects of sand particle size and shape and of the type of silt were not considered. They could be examined in future research.

4.1.3 Effect of the total water content

Figure 4e illustrates the effect of the total water content in the stabilized soil. Total water includes the natural water content in the soil and the added water used to mix with the binder in the slurry (the wet mixing method). The added water is zero for the dry mixing method. It should be noted that total water content has a negative correlation with the UCS of stabilized soil. The strength of the material markedly reduces when the amount of total water content is high. Besides, it requires more binder content to achieve the designed strength. The correlation relationship between the UCS and the total water content is expressed as:

$${\text{UCS}} = A \cdot W_{{\text{c}}}^{ - B}$$
(6)

where A and B: constant values, and Wc: the total water content.
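The constants A and B of this power law can be estimated from measured pairs of water content and strength by linear regression in log–log space, since ln(UCS) = ln(A) − B·ln(Wc). The sketch below is one way to do this; the function name is hypothetical, and all input values are assumed to be positive.

```python
import math


def fit_power_law(water_contents, strengths):
    """Least-squares fit of UCS = A * Wc**(-B) on log-transformed data."""
    xs = [math.log(w) for w in water_contents]
    ys = [math.log(s) for s in strengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope of the log-log line is -B and the intercept is ln(A).
    return math.exp(intercept), -slope
```

Fitting in log space weights relative rather than absolute errors, which may or may not match how the trend in the study was fitted.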

This finding agrees well with the reports of Horpibulsuk [2], Kitazume [10], Naveena [6], and Yao [1]. Their studies showed that water content plays a dominant role in the characteristics of stabilized soils, especially compressibility [2]. It reflects the microfabric of the material. A high water content increases the distance between the soil particles and creates a more porous structure in the stabilized soil. Horpibulsuk [72] showed that the pore size increases remarkably when the water content is increased. As a result, it leads to the growth of capillary pore size and a low level of crystalline structure [6]. Therefore, the strength of stabilized soil is reduced with the increase in total water content [1].

4.1.4 Effect of the curing period

The effect of the curing period on the UCS of stabilized soil is demonstrated in Fig. 4f. It indicates the UCS of stabilized soil could achieve higher values by increasing the curing period. The correlation relationship between the UCS and the curing time is:

$${\text{UCS}} = a \cdot \ln t + b$$
(7)

where a and b: constant values, and t: the curing time (day).
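The constants a and b of this logarithmic relationship can be obtained by ordinary least squares with ln(t) as the regressor. The following is a minimal sketch with a hypothetical function name; curing times are assumed to be positive.

```python
import math


def fit_log_curing(curing_days, strengths):
    """Least-squares fit of UCS = a * ln(t) + b."""
    xs = [math.log(t) for t in curing_days]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(strengths) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, strengths))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b
```

With a fitted in this way, the strength gain between two ages t1 and t2 is a·ln(t2/t1), so equal ratios of curing time give equal strength increments.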

That result is similar to the reports of Horpibulsuk [2], Kitazume [10], and Yao [1]. Based on microstructure analysis, Horpibulsuk [73] found that in a short curing period, the volume of large pores in the soil–cement material increases and the volume of small pores decreases, leading to low strength. In contrast, over the long term, the volume of large pores decreases significantly, while the volume of small pores increases. In addition, cementitious products keep growing during that period. Thus, the strength of stabilized soil increases over time [73].

However, the strength gain rate depends on the type of binder. For cement-stabilized soil, the strength increases significantly in the first 1 to 3 months; after that, the strength gain is low [74]. For lime and slag, the strength continues to develop over a long period if the water in the soil and the binder are sufficient for pozzolanic reactions. When lime or slag is mixed with soil, the calcium reacts with the silicates and aluminates to create calcium silicate hydrates and calcium aluminate hydrates [6]. These reactions could continue for a long period if adequate binder is provided [6].

4.1.5 Effect of the binder type and binder content

The effects of the binder types (lime, cement, fly ash, and slag) and the binder content are illustrated in Fig. 4g–k. Generally, the amount of binder is the essential parameter that has a positive effect on the UCS of stabilized soil. Depending on the types of binders, the strength gain rate is different.

When sufficient binder is provided, the hydration compounds form fully and create a hardened skeleton matrix [6]. The hydration products enclose the soil particles. Over a long curing period, the alkalinity of the soil–binder mixture increases; the silica and alumina from clay minerals and amorphous materials on the surface of clay particles are dissolved. They react with calcium ions to form insoluble compounds [6]. Besides, more cementitious products are generated, and they enhance the inter-cluster bonding strength and fill the pore space [73]. Thus, the strength of stabilized soil increases when the binder content is high.

These findings are as expected and agree well with previous studies. This demonstrates that the proposed GEP-based model is reliable and applicable.

4.2 Effect of the combination of variables

4.2.1 Effect of the particle size of the soil

In order to evaluate the effect of the soil particle size on the UCS of stabilized soil, the percentages of sand, silt, and clay in the soil were changed in different ways. In this section, the amount of cement used was 20%, and other binders were 0%. Firstly, the proportion of sand was kept constant at 10%, 30%, and 50%; the percentage of silt was increased, and the percentage of clay was decreased at the same time (it was assumed that the total amount of sand, silt, and clay in the soil is 100%). Figure 5 shows that increasing the amount of silt (and decreasing the clay content) leads to an increase in the UCS. The level of strength gain depends on the percentage of sand (10%, 30%, and 50%). If the amount of sand in the soil is high, the UCS improves significantly with the increase in the silt content.
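The input combinations for this kind of study follow from the constraint that sand, silt, and clay sum to 100%. The sketch below generates them for a fixed sand content; the function name and the silt values in the example are hypothetical.

```python
def silt_clay_series(sand, silt_values):
    """Build sand/silt/clay combinations that sum to 100% for a fixed sand content."""
    rows = []
    for silt in silt_values:
        clay = 100 - sand - silt
        if clay < 0:
            continue  # skip combinations that would exceed 100% in total
        rows.append({"sand": sand, "silt": silt, "clay": clay})
    return rows


# Example: sand fixed at 30%, silt increasing while clay decreases.
series = silt_clay_series(30, [10, 40, 80])
```

Each row can then be passed to the predictive model together with the fixed binder contents (20% cement and no other binder in this section).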

Fig. 5 Effects of the percentage of silt and clay on the UCS of stabilized soil

Figure 6 illustrates the effect of sand content in the soil on the UCS of stabilized soil. The percentage of silt is kept constant at different levels (10%, 30%, and 50%), the amount of sand is increased, and the clay content is decreased. With a low percentage of silt, the UCS of stabilized soil remains nearly constant even when the amount of sand is changed. However, if the silt content is high, the UCS of stabilized soil increases significantly with the amount of sand.

Fig. 6 Effects of the percentage of sand and clay on the UCS of stabilized soil

The effect of the percentage of silt and sand in the soil on the UCS of stabilized soil is shown in Fig. 7. In this case, the clay content is held constant while the percentage of silt is increased and the amount of sand is decreased at the same time. The UCS of stabilized soil shows two different trends. Initially, the UCS increases with the silt content. However, if the silt content continues to increase (and the sand content to decrease), the UCS then decreases. The peak of the UCS curve may indicate a suitable particle size distribution for achieving a reasonable UCS value of the stabilized soil. For example, when the soil contains 10% clay, the UCS of stabilized soil reaches its highest value when the silt and sand contents are 50% and 40%, respectively. Note that some parameters, such as the type of silt, the size and shape of sand, the water content, and the type and amount of binder, could cause different strength gain trends. Hence, these findings should be verified against laboratory results obtained under similar input conditions.
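Locating the peak of such a curve amounts to a one-dimensional scan over the silt content at a fixed clay content. The sketch below shows the idea in Python. The predictor `toy_predictor` is a hypothetical stand-in, deliberately shaped to peak at 50% silt so that the scan has something to find; it is not the GEP model, and its coefficients carry no physical meaning.

```python
def find_peak_silt(predict_ucs, clay_pct, step=5):
    """Scan silt from 0 to (100 - clay) and return (silt, sand, ucs) at the maximum."""
    best = None
    for silt in range(0, 100 - clay_pct + 1, step):
        sand = 100 - clay_pct - silt  # sand is the remainder of the mix
        ucs = predict_ucs(sand, silt, clay_pct)
        if best is None or ucs > best[2]:
            best = (silt, sand, ucs)
    return best


def toy_predictor(sand, silt, clay):
    # Hypothetical placeholder, NOT the GEP model: a concave curve in silt.
    return 1000.0 - 0.5 * (silt - 50) ** 2 - 2.0 * clay


print(find_peak_silt(toy_predictor, clay_pct=10))
```

In practice, `predict_ucs` would be replaced by the fitted model evaluated with the remaining inputs held at their chosen values.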

Fig. 7 Effects of the percentage of silt and sand content on the UCS of stabilized soil

The effect of soil particle size on the UCS of stabilized soil can be explained by the shape and size of the soil particles and the microstructure of the soil–cement material. The size of soil particles affects the shear strength and other characteristics of the soil. Sand and silt particles are irregular in size and shape, and silt particles are smaller than sand particles. According to the classification in ASTM D3282 [75], sand particles have a diameter of 0.075 to 2 mm, silt particles 0.002 to 0.075 mm, and clay particles less than 0.002 mm.
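The size limits quoted above translate directly into a small lookup. The Python sketch below uses only those three thresholds; how values exactly on a boundary are assigned, and the label for particles larger than 2 mm, are assumptions made for illustration.

```python
def classify_particle(diameter_mm):
    """Classify a particle by diameter using the size limits quoted in the text."""
    if diameter_mm <= 0:
        raise ValueError("diameter must be positive")
    if diameter_mm < 0.002:
        return "clay"
    if diameter_mm < 0.075:  # boundary assignment is an assumption
        return "silt"
    if diameter_mm <= 2.0:
        return "sand"
    return "coarser than sand"  # outside the ranges discussed here


for d in (0.001, 0.01, 0.5, 5.0):
    print(d, classify_particle(d))
```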

Clay particles are typically flaky, needle-shaped, or elongated [76]. Hence, the clay particles and clay clusters in the soil easily slide over each other when sheared, leading to low strength and stiffness [73]. Thus, a high clay content could lead to low strength.

The soil microfabric includes domains (groups of clay particles), clusters (groups of domains), peds (groups of clusters), silt or sand grains, micropores, and macropores [77, 78]. If the soil contains sand grains only, large soil–cement clusters are created, and large pore spaces develop. This produces a loose structure in the material, which lowers the strength of the stabilized soil [78].

For soil containing sand, silt, and clay particles, increasing the particle size and the sand content could increase the internal friction angle and shear strength of the soil [79]. Well-graded soil provides the highest shear strength [80]. In this case, the fine particles create the skeleton of the structure, and the clay–cement clusters fill the pores between silt and sand particles. The cementitious products enclose the particles, bind them together, and fill the micropores to create a rigid structure [73].

4.2.2 Effect of cement in combination with other chemical binders

With the cement content fixed at 20%, the performance of stabilized soil was examined by adding other types of binder. Figure 8 shows that the strength gain rate depends on the type of added binder (lime, fly ash Types C and F, and slag). Soil stabilization with cement (20%) and fly ash provides lower strength than with the other binders. Fly ash contributes little to the strength gain from pozzolanic reactions [74]. Furthermore, fly ash Type C performs better than fly ash Type F. According to ASTM C618 [81], the CaO content in fly ash Type C is 24%, much higher than that in fly ash Type F (8%). Hence, fly ash Type C creates more hydration products than fly ash Type F.

Fig. 8 Effects of the added binder types on the UCS of stabilized soil

Figure 8 shows that soil stabilized with cement in combination with slag could achieve high strength. Blast-furnace slag, a by-product of ironmaking, contains the main components of Portland cement, such as CaO, SiO2, and Al2O3. In particular, the amount of SiO2 and Al2O3 in the slag is much higher than in Portland cement. Therefore, it could create more hydration products in the long term through hydration reactions [31, 74].

Lime can be combined with cement to enhance the strength of stabilized soil. By combining 20% cement with lime (in this study, the lime content is up to 20%), the UCS of stabilized soil is improved significantly. The high CaO content in lime (93%) provides an essential mineral for the pozzolanic reactions and generates long-term strength in the stabilized soil [6].

4.2.3 Effect of the total water content and cement content

Figure 9 shows the effect of the total water content and the cement content on the UCS of stabilized soil. The total water content was held constant at different levels (20%, 40%, 80%, and 160%), while the cement content was increased linearly. As mentioned above, the cement content has a positive effect on the UCS. However, the strength gain rate depends on the total water content, and more cement is required to achieve the desired strength if the water content is high. For example, if the total water content is 20%, the strength of stabilized soil could reach 2000 kPa with 7% cement content. On the other hand, if the total water content is 40% or 80%, the cement content needs to be increased to 19% or 35%, respectively, to reach the same strength. This finding could help consultants select suitable binder contents for soil stabilization depending on the water content and support engineers in choosing reasonable methods to enhance the UCS of stabilized clay.
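The three values quoted above for a 2000 kPa target can serve as a rough reading aid for intermediate water contents. The Python sketch below interpolates linearly between them. Linear interpolation is an assumption made purely for illustration; the model's curves are not linear, and no extrapolation outside the quoted 20–80% range is attempted.

```python
# (total water content %, cement content %) pairs quoted in the text for a 2000 kPa target
QUOTED_POINTS = [(20.0, 7.0), (40.0, 19.0), (80.0, 35.0)]


def cement_for_target(water_pct, points=QUOTED_POINTS):
    """Linearly interpolate the required cement content between the quoted points."""
    if not points[0][0] <= water_pct <= points[-1][0]:
        raise ValueError("water content outside the quoted 20-80% range")
    for (w0, c0), (w1, c1) in zip(points, points[1:]):
        if w0 <= water_pct <= w1:
            # Assumed straight line between neighbouring quoted points
            return c0 + (c1 - c0) * (water_pct - w0) / (w1 - w0)


print(cement_for_target(30.0))
print(cement_for_target(60.0))
```

Any value obtained this way is only indicative and would need to be checked against the model or laboratory data.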

Fig. 9 Effects of the cement content and total water content on the UCS of stabilized soil

5 Conclusions

The study gathered over one thousand data points on the UCS of clay stabilized with common cementitious binders, such as lime, cement, fly ash, and slag. The GEP technique was applied to generate the predictive model. Eleven independent variables were considered in the model: the plasticity index (PI), the percentage of clay (Clay), the percentage of silt (Silt), the percentage of sand (Sand), the total water content (Total water), the curing period (Age), the lime content (Lime), the cement content (Cement), the fly ash Type F content (FA F), the fly ash Type C content (FA C), and the slag content (Slag). The results show that the proposed predictive model performs well, with a high correlation coefficient (R = 0.951) and low errors (e.g., RMSE and MAE). Furthermore, the selected optimal model satisfied all external validation criteria. The comparative study shows that the GEP-based model in this study was developed from a large dataset (1183 data points), whereas other studies used small datasets. As a result, the selected optimal GEP model can be applied with confidence to different clay conditions mixed with common chemical binders. The results show notable accuracy and reliability in comparison with previous models. In addition, the GEP model yields transparent and practical mathematical equations that are ready to use in practice. Given these advantages, the proposed GEP-based model could help engineers estimate the UCS of clay stabilized with different binders. However, the model was developed from laboratory test results; thus, it needs to be modified before being applied to in situ conditions, where many uncertain variables exist.
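For readers who wish to reproduce this kind of evaluation on their own data, the three statistics named above can be computed as in the Python sketch below. It assumes that R denotes the Pearson correlation coefficient between observed and predicted UCS; the sample values are invented for illustration and are not taken from the dataset of this study.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R, RMSE, MAE) for paired observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)  # Pearson correlation (assumed definition of R)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return r, rmse, mae


# Invented UCS values in kPa, for illustration only
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(regression_metrics(observed, predicted))
```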

The parametric study was conducted to examine the effects of the essential parameters on the UCS of stabilized soil. Most findings from the parametric analysis are expected and agree well with other experimental results, which supports the reliability and accuracy of the proposed model. The model thus helps consultants understand how to optimize the ultimate strength and choose suitable binders for clay stabilization. The parametric study results indicate that:

  • The plasticity index, the percentage of clay, and the total water content have a negative effect on the UCS of stabilized soil. The relationship between the UCS and the total water content is expressed as $\mathrm{UCS} = A \cdot W_c^{B}$.

  • The percentage of silt and sand in the soil, the binder type, the binder content, and the curing time show a positive effect on the UCS of stabilized soil. Increasing these parameters could enhance the strength of stabilized clay. The relationship between the UCS and the curing time is $\mathrm{UCS} = a \cdot \ln t + b$.

  • Increasing the sand and silt content in the soil could improve the strength of stabilized clay. Well-graded soil provides the highest strength when the clay is mixed with binders. However, the effects of the type of silt, the silt content, and the size and shape of sand particles should be studied further through laboratory tests.

  • Clay stabilized with cement in combination with slag could achieve higher strength than with the other added binders. Lime and fly ash could also be used to enhance the UCS of stabilized soil. Moreover, fly ash Type C performs better than fly ash Type F in stabilizing clay.

  • The stabilized soil could reach higher strength if the natural water content of the soil and the water-to-binder ratio are reduced. Moreover, more cement is required to achieve the desired strength if the water content is high.
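The two functional forms listed above, a power law in the total water content (UCS = A·Wc^B) and a logarithmic law in the curing time (UCS = a·ln t + b), can both be fitted by ordinary least squares after a simple transformation. The Python sketch below illustrates this on synthetic data generated from arbitrarily chosen coefficients; those coefficients and the helper names are assumptions for illustration, not values or methods reported in this study.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_power_law(wc, ucs):
    """Fit UCS = A * Wc**B via ln(UCS) = ln(A) + B * ln(Wc)."""
    b_exp, ln_a = linear_fit([math.log(w) for w in wc], [math.log(u) for u in ucs])
    return math.exp(ln_a), b_exp


def fit_log_law(t, ucs):
    """Fit UCS = a * ln(t) + b by regressing UCS on ln(t)."""
    return linear_fit([math.log(x) for x in t], ucs)


# Synthetic data from arbitrary coefficients (illustration only, not from the paper)
wc = [20.0, 40.0, 80.0, 160.0]                   # total water content, %
ucs_w = [5.0e5 * w ** -1.5 for w in wc]          # assumed A = 5e5, B = -1.5
days = [7.0, 14.0, 28.0, 56.0, 90.0]             # curing time, days
ucs_t = [300.0 * math.log(d) + 400.0 for d in days]  # assumed a = 300, b = 400

A_fit, B_fit = fit_power_law(wc, ucs_w)
a_fit, b_fit = fit_log_law(days, ucs_t)
print(A_fit, B_fit)
print(a_fit, b_fit)
```

A negative fitted B corresponds to the negative effect of water content noted above, and a positive fitted a to the strength gain with curing time.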