Introduction

The construction sector is growing rapidly, which necessitates the use of marginal materials and problematic soils in engineering structures (Onyelowe et al. 2023). However, such materials require stabilization/solidification with additives. Among them, expansive soil is particularly challenging to use in construction applications. These soils generally have a high affinity for water and swell and shrink as their water content changes (Dang et al. 2016). Because of these volume changes, structures built on such soils suffer numerous forms of failure. It is estimated that about 50% of damage to infrastructure globally is due to expansive soil. Road and liner infrastructure is especially affected because of its large exposed area (Maheshwari et al. 2016). To avoid such problems, expansive soils are amended with chemicals, industrial wastes and pozzolanic materials (Reddy et al. 2015). The most popular technique is to stabilize expansive soil with cement. However, a few case studies have shown that cement stabilization has adverse effects, such as emission of greenhouse gases and formation of cracks due to the heat of hydration (Dash and Hussain 2012; Dora et al. 2021). To avoid such effects, researchers have proposed stabilizing expansive soil with lime in combination with industrial and agricultural wastes such as fly ash, rice husk, fibers of bagasse, polypropylene, and bagasse ash (Dang et al. 2016). Sugarcane bagasse is a fibrous waste produced by the sugar refining industry that has the potential to be used as a supplementary cementitious material (Modani & Vyawahare 2013). This biomass residue is burned in boilers as fuel to generate steam during sugar production, which results in the production of ash (Teixeira et al. 2015). The use of bagasse ash as a supplementary cementitious material has long been a topic of significant interest among researchers.

Bagasse ash has gained attention as a potential stabilizer because of its high silica and alumina content (James and Pandian 2018; Shivakumar and Sridharan 2017). The chemical composition of bagasse ash, as indicated by various studies (Katare & Madurwar 2017; Setayesh Gar et al. 2017; Shafiq et al. 2018), meets the requirements for a pozzolanic material under ASTM C618. The ash has been found to contain a significantly higher percentage of silicon dioxide (SiO2) than of other oxides (Le et al. 2018). Bagasse ash reacts readily with lime and can therefore improve soil properties, reducing swell, shrinkage and plasticity while improving strength and durability (Al-Shamraniand and Subhani 2018).

Compressive strength is the main deciding factor for the use of stabilized soils in applications such as subgrades and liners. For subgrade applications, Austroads (2013) and IRC 37 (2012) require a minimum 7-day unconfined compressive strength (UCS) of 600 kPa and 750 kPa, respectively, for chemically stabilized soils. The United States Environmental Protection Agency (USEPA), on the other hand, suggests a minimum value of 250 kPa for liner applications. Such high UCS requirements necessitate stabilization of expansive soil for highway and railway subgrades to ensure better resilience. Similarly, a liner constructed for landfills and/or containment ponds significantly reduces permeability and thus the risk of contaminant migration through leachate (Mei et al. 2020).

Earlier studies on the influence of bagasse ash on the UCS of lime reconstituted expansive soils mainly relied on laboratory testing to determine the optimum dosage of each additive. Machine learning techniques have since gained momentum as a way to save time and use resources better. Stabilization involves many variables in various combinations, such as bagasse ash percentage, lime dosage, curing time, and soil characteristics including plasticity and swelling. A machine learning algorithm can analyze these complex relationships and help maximize compressive strength while reducing the need for laboratory experimentation, which saves time and improves resource management. The present study examines how the strength of lime reconstituted expansive soils can be improved using bagasse ash, by determining various properties of these materials through a series of tests and then modelling them with machine learning algorithms, with a view to sustainable infrastructure development.

Methodology

Statistical analysis of the collected database

A total of 79 records were collected from experimentally tested samples of expansive soil mixed with lime and bagasse ash (BA), as reported in a previous research study (Goutham and Krishnaiah 2024). Each record contains the following data:

• BA: Bagasse ash content (%)

• Lm: Lime content (%)

• LL: Liquid limit (%)

• PL: Plastic limit (%)

• SL: Shrinkage limit (%)

• MDD: Maximum dry density (kN/m3)

• OMC: Optimum moisture content (%)

• UCS: Unconfined compressive strength (kPa)

The collected records were divided into a training set of 75% (60 records) and a validation set of 25% (19 records). The table in the Appendix includes the complete dataset, while Tables 1 and 2 summarize its statistical characteristics and the Pearson correlation matrix. Fig. 1 shows the histograms for both inputs and outputs, and Fig. 2 shows the relations between the inputs and the output. Table 1 shows the minimum, maximum, average, standard deviation, and variance of the studied parameters in the collected database, and Table 2 presents the internal consistency in the form of correlations between the input parameters and the output. In Table 2, the plastic limit (PL) shows the highest correlation with the UCS (83%), followed by the shrinkage limit (SL) at 79%. OMC and BA produced correlations of 66% and 60%, respectively, lime (Lm) produced a correlation of 50%, and the remaining inputs produced correlations below 50%. The curves relating the output to the input parameters (Fig. 2) support the outcome presented in Table 2. These correlations indicate weak internal consistency between the output and the input parameters, which calls for optimization techniques to obtain the optimal UCS for smart design and construction purposes.
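As an illustration of the two operations described above, the following minimal sketch splits 79 records into 60 training and 19 validation records and computes a Pearson correlation coefficient. The random shuffle, the seed and the function names `split_records` and `pearson` are assumptions made for this example; the source does not state how the records were assigned to each set, and the placeholder records below are indices, not the Appendix data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def split_records(records, n_train, seed=0):
    """Shuffle a copy of the records, then split into training and validation.

    Assumption: a seeded random shuffle; the source does not describe the
    assignment rule.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder records (row indices only), standing in for the 79 real records.
records = list(range(79))
train, valid = split_records(records, n_train=60)
print(len(train), len(valid))
```

In practice each record would be a row holding BA, Lm, LL, PL, SL, MDD, OMC and UCS, and `pearson` would be called once per input column against the UCS column to rebuild a matrix like Table 2.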

Table 1 Statistical analysis of collected database
Table 2 Pearson correlation matrix
Fig. 1 Distribution histograms for inputs (in blue) and outputs (in green)

Fig. 2 Relations between inputs and output (UCS)

Research program and sensitivity analysis

Three different Artificial Intelligence (AI) techniques were used to predict the unconfined compressive strength (UCS) of expansive soil mixed with lime and bagasse ash using the collected database. These techniques are “Genetic programming” (GP), “Artificial Neural Network” (ANN, with several candidate network layouts) and “Evolutionary Polynomial Regression” (EPR). Flowcharts for the techniques used are presented in Fig. 3. All three developed models were used to predict UCS from the mix contents (BA, Lm), consistency limits (LL, PL, SL) and compaction parameters (MDD, OMC). The accuracies of the developed models were evaluated by comparing the sum of squared errors (SSE), mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE) and R-squared (R2) between the predicted and calculated UCS values.
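The error measures listed above can be computed as in the following minimal sketch. It assumes that R2 is defined as 1 - SSE/SST (the source does not give its definition, and some tools report the squared correlation instead), and the sample values are hypothetical, not taken from the database.

```python
import math


def regression_metrics(measured, predicted):
    """Return SSE, MAE, MSE, RMSE and R2 for paired measured/predicted values."""
    n = len(measured)
    residuals = [m - p for m, p in zip(measured, predicted)]
    sse = sum(r * r for r in residuals)          # sum of squared errors
    mae = sum(abs(r) for r in residuals) / n     # mean absolute error
    mse = sse / n                                # mean squared error
    rmse = math.sqrt(mse)                        # root mean squared error
    mean_measured = sum(measured) / n
    sst = sum((m - mean_measured) ** 2 for m in measured)
    r2 = 1.0 - sse / sst                         # assumed definition of R2
    return {"SSE": sse, "MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical UCS values in kPa, used only to exercise the function.
measured_ucs = [100.0, 200.0, 300.0, 400.0]
predicted_ucs = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(measured_ucs, predicted_ucs)
print(metrics)
```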

Fig. 3 Flowcharts for different (AI) predictive models

Sensitivity analysis

A preliminary sensitivity analysis was carried out on the collected database to estimate the impact of each input on the UCS values. The “Single variable per time” technique was used to determine the “Sensitivity Index” (SI) for each input using the Hoffman & Gardener (1983) formula as follows:

$$SI \left({X}_{n}\right)= \frac{Y\left({X}_{max}\right)-Y\left({X}_{min}\right)}{Y\left({X}_{max}\right)}$$
(1)

Accordingly, the SI values are 0.76, 0.62, 0.54, 0.82, 0.81, 0.06 and 0.65 for BA, Lm, LL, PL, SL, MDD and OMC, respectively. A sensitivity index of 1.0 indicates complete sensitivity, and a sensitivity index less than 0.01 indicates that the model is insensitive to changes in the parameter.
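Eq. (1) can be applied one input at a time, as in the sketch below. It assumes that the other inputs are held at a fixed baseline while one input moves from its minimum to its maximum; the source does not say how the remaining inputs were treated. The `toy_model`, its coefficients and the bounds are hypothetical and serve only to show the mechanics.

```python
def sensitivity_index(y_at_xmax, y_at_xmin):
    """Eq. (1): SI = (Y(Xmax) - Y(Xmin)) / Y(Xmax)."""
    return (y_at_xmax - y_at_xmin) / y_at_xmax


def one_at_a_time_si(model, baseline, bounds):
    """Vary one input between its bounds while the others stay at baseline."""
    result = {}
    for name, (low, high) in bounds.items():
        y_min = model({**baseline, name: low})
        y_max = model({**baseline, name: high})
        result[name] = sensitivity_index(y_max, y_min)
    return result


def toy_model(inputs):
    """Hypothetical response surface, not one of the models in this study."""
    return 2.0 * inputs["a"] + inputs["b"]


si = one_at_a_time_si(
    toy_model,
    baseline={"a": 1.0, "b": 1.0},
    bounds={"a": (0.0, 4.0), "b": (0.0, 1.0)},
)
print(si)
```

For the toy model, input `a` moves the output from 1 to 9 and input `b` from 2 to 3, so `a` receives the larger index.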

Using GP technique

Three GP models were developed with complexity levels ranging between three and four. The population size, survivor size and number of generations were 1000, 300 and 2000, respectively. Figure 4 shows the improvement in accuracy with increasing complexity. Equation (2) presents the output formula for UCS from the last trial, while Fig. 9(a) shows its fitness. The average error of the total dataset is 13%, while the R2 value is 0.950.

Fig. 4 GP Model accuracy versus complexity level

$$\mathrm{UCS }=\frac{{\mathrm{OMC}}^{2}}{4\mathrm{PL}-\mathrm{BA}.\mathrm{OMC}}+\frac{\mathrm{MDD}}{\mathrm{OMC}-\mathrm{X}+\mathrm{BA}}+\mathrm{MDD}.\mathrm{X},\mathrm{ X}={\left(\frac{2\mathrm{PL}}{\mathrm{OMC}}\right)}^{\left(\frac{2\mathrm{PL}}{\mathrm{OMC}}\right)}$$
(2)
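For convenience, Eq. (2) can be transcribed directly into a small function, as sketched below. The function name `ucs_gp` and the sample inputs are assumptions for illustration; the inputs are arbitrary numbers chosen to make the arithmetic easy to follow, not a record from the database. Units follow the variable list (percent for BA, PL and OMC, kN/m3 for MDD, kPa for UCS).

```python
def ucs_gp(ba, pl, mdd, omc):
    """Evaluate the GP closed-form expression of Eq. (2).

    Note: the expression is undefined where either denominator is zero,
    so inputs should stay within the range covered by the database.
    """
    ratio = 2.0 * pl / omc
    x = ratio ** ratio                      # X = (2PL/OMC)^(2PL/OMC)
    term_1 = omc ** 2 / (4.0 * pl - ba * omc)
    term_2 = mdd / (omc - x + ba)
    term_3 = mdd * x
    return term_1 + term_2 + term_3


# Arbitrary illustrative inputs (not measured data).
print(ucs_gp(ba=0.0, pl=20.0, mdd=16.0, omc=20.0))
```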

Using EPR technique

Finally, the developed EPR model was limited to a 6th-level polynomial. For 7 inputs, there are 924 possible terms (462 + 210 + 84 + 28 + 7 + 1 = 924) as follows:

$$\sum\nolimits_{n=1}^{n=7}\sum\nolimits_{m=1}^{m=7}\sum\nolimits_{l=1}^{l=7}\sum\nolimits_{k=1}^{k=7}\sum\nolimits_{j=1}^{j=7}\sum\nolimits_{i=1}^{i=7}{{{{{X}_{n}.X}_{m}.X}_{l}.X}_{k}.{X}_{j}.X}_{i}+\sum\nolimits_{m=1}^{m=7}\sum\nolimits_{l=1}^{l=7}\sum\nolimits_{k=1}^{k=7}\sum\nolimits_{j=1}^{j=7}\sum\nolimits_{i=1}^{i=7}{{{{X}_{m}.X}_{l}.X}_{k}.{X}_{j}.X}_{i}+\sum\nolimits_{l=1}^{l=7}\sum\nolimits_{k=1}^{k=7}\sum\nolimits_{j=1}^{j=7}\sum\nolimits_{i=1}^{i=7}{{{X}_{l}.X}_{k}.{X}_{j}.X}_{i}+\sum\nolimits_{k=1}^{k=7}\sum\nolimits_{j=1}^{j=7}\sum\nolimits_{i=1}^{i=7}{{X}_{k}.{X}_{j}.X}_{i}+\sum\nolimits_{j=1}^{j=7}\sum\nolimits_{i=1}^{i=7}{X}_{j}.{X}_{i}+\sum\nolimits_{i=1}^{i=7}{X}_{i}+C$$
(3)
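The size of the candidate pool in Eq. (3) can be checked by enumeration. The sketch below assumes that products differing only in the order of their factors count as one term (the source does not spell out the counting rule), in which case the number of distinct degree-k terms in 7 inputs is C(k+6, 6). The function name `count_monomials` is hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def count_monomials(n_inputs, degree):
    """Count distinct products of `degree` factors drawn from `n_inputs` inputs."""
    return sum(1 for _ in combinations_with_replacement(range(n_inputs), degree))


# Degree 0 is the constant C; degrees 1 to 6 are the product terms.
counts = {degree: count_monomials(7, degree) for degree in range(7)}
for degree, count in counts.items():
    # Cross-check the enumeration against the closed form C(k + 6, 6).
    assert count == comb(degree + 6, 6)
    print(degree, count)
```

Under this counting rule the sixth-degree group alone contains 924 terms, and the lower-degree groups contain 462, 210, 84, 28, 7 and 1 terms, so whether the quoted 924 refers to the sixth-degree group or to the whole pool is worth checking against the original source.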

A genetic algorithm (GA) was applied to these 924 terms to select the most effective terms for predicting the values of UCS. The process began with only 2 terms and increased gradually up to 8 terms. Fig. 5 presents the improvement in fitness with an increasing number of terms and indicates that 6 is the optimum number of terms. The output is given in Eq. (4). The average error and R2 values were 11% and 0.963, respectively. The EPR closed-form equation can be applied manually in the design of bagasse ash plus lime reconstituted expansive soil for road and landfill infrastructures (Onyelowe et al. 2023).

Fig. 5 EPR Model accuracy versus number of terms

$$UCS= \frac{788P{L}^{3}-1.62\times {10}^{6}PL}{OM{C}^{3}}+ \frac{2.3\times {10}^{9}}{P{L}^{2}.OM{C}^{2}}-\frac{2.1\times {10}^{9}}{P{L}^{3} .OMC}- \frac{BA.S{L}^{3}}{2.5 OM{C}^{2}}+\frac{75020}{PL}-1628$$
(4)
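Since Eq. (4) is intended for manual use, a direct transcription may help avoid slips in the exponents. The sketch below is one such transcription; the function name `ucs_epr` and the sample inputs are assumptions for illustration, and the inputs are arbitrary numbers rather than a record from the database.

```python
def ucs_epr(ba, pl, sl, omc):
    """Evaluate the EPR closed-form expression of Eq. (4), term by term."""
    term_1 = (788.0 * pl ** 3 - 1.62e6 * pl) / omc ** 3
    term_2 = 2.3e9 / (pl ** 2 * omc ** 2)
    term_3 = 2.1e9 / (pl ** 3 * omc)
    term_4 = ba * sl ** 3 / (2.5 * omc ** 2)
    term_5 = 75020.0 / pl
    return term_1 + term_2 - term_3 - term_4 + term_5 - 1628.0


# Arbitrary illustrative inputs (not measured data); BA, PL, SL, OMC in percent.
print(ucs_epr(ba=5.0, pl=30.0, sl=12.0, omc=25.0))
```

Several terms are large and of opposite sign, so the result is sensitive to rounding of PL and OMC; as with any fitted expression, it should only be evaluated inside the range of the database.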

Using ANN technique

Four ANN models (7-1-1, 7-2-1, 7-3-1 and 7-4-1) were developed to predict UCS values. All the models used the same normalization method (-1.0 to 1.0), activation function (hyperbolic tangent, “Hyper Tan”) and “Back propagation” (BP) training algorithm. Only the number of neurons in the hidden layer was increased, from 1 to 4, to find the optimum layout of the network. The SSE and R2 values of the four ANNs and the optimum network layout are illustrated in Figs. 6 and 7, while the weight matrix of the model is shown in Table 3. The average error of the total dataset is 4% and the R2 value is 0.996.
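The prediction step of such a 7-h-1 network can be sketched as follows. This is a minimal illustration under stated assumptions: inputs and output are scaled linearly to the range -1.0 to 1.0, the hyperbolic tangent is applied to both the hidden and the output neuron (the source does not say whether the output neuron is linear), and the weights shown are hypothetical placeholders, not the values of Table 3.

```python
import math


def scale_to_unit_range(value, low, high):
    """Map a value from [low, high] to [-1, 1]."""
    return 2.0 * (value - low) / (high - low) - 1.0


def unscale_from_unit_range(scaled, low, high):
    """Map a value from [-1, 1] back to [low, high]."""
    return (scaled + 1.0) * (high - low) / 2.0 + low


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through a single-hidden-layer tanh network."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(w_hidden, b_hidden)
    ]
    # Assumption: tanh is also applied at the output neuron.
    return math.tanh(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical 7-2-1 network: 7 scaled inputs, 2 hidden neurons, 1 output.
scaled_inputs = [0.0, 0.5, -0.5, 0.2, -0.2, 0.1, -0.1]
w_hidden = [[0.1] * 7, [-0.1] * 7]   # placeholder weights, one row per neuron
b_hidden = [0.0, 0.0]
w_out = [0.5, -0.5]
b_out = 0.0
scaled_output = forward(scaled_inputs, w_hidden, b_hidden, w_out, b_out)
print(scaled_output)
```

The scaled output would then be converted back to kPa with `unscale_from_unit_range`, using the minimum and maximum UCS of the training data.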

Fig. 6 ANN Model accuracy versus number of neurons in the hidden layer

Fig. 7 The layout of the optimum ANN models

Table 3 Weights matrix for the developed ANN

The relative importance values for each input parameter are illustrated in Fig. 8, which indicates that the compaction parameters (MDD and OMC) and PL are the most influential inputs, followed by BA and LL. The results of all developed models are summarized in Table 4, and the relations between calculated and predicted values are shown in Fig. 9. In Fig. 9, the GP model produced a line of fit of y = 0.999x with MAE of 14.80 kPa, RMSE of 20.00 kPa and R2 of 0.950; the EPR model produced y = 0.992x with MAE of 11.6 kPa, RMSE of 16.50 kPa and R2 of 0.963; and the ANN model produced y = 0.997x with MAE of 4.26 kPa, RMSE of 5.55 kPa and R2 of 0.996. The ANN therefore outperforms the GP and the EPR, having produced the lowest error values, the highest coefficient of determination (R2) and zero outliers beyond the ± 25% performance fit envelope. The Taylor diagram comparing the accuracies of the models (Fig. 10) and the variance distribution of the models (Fig. 11) support the outcome of Fig. 9.
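A line of fit through the origin and an outlier count against a ± 25% envelope can be obtained as in the sketch below. It assumes that x is the calculated (measured) UCS and y the predicted UCS, that the slope is the least-squares estimate without intercept, and that the envelope is taken relative to the measured value; none of these conventions is stated in the source. The sample values are hypothetical.

```python
def fit_through_origin(measured, predicted):
    """Least-squares slope a of the line predicted = a * measured."""
    numerator = sum(m * p for m, p in zip(measured, predicted))
    denominator = sum(m * m for m in measured)
    return numerator / denominator


def count_outliers(measured, predicted, tolerance=0.25):
    """Count points whose prediction deviates by more than tolerance * measured."""
    return sum(
        1
        for m, p in zip(measured, predicted)
        if abs(p - m) > tolerance * abs(m)
    )


# Hypothetical UCS values in kPa, used only to exercise the functions.
measured_ucs = [100.0, 200.0, 300.0]
predicted_ucs = [100.0, 260.0, 300.0]
print(fit_through_origin(measured_ucs, predicted_ucs))
print(count_outliers(measured_ucs, predicted_ucs))
```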

Fig. 8 Relative importance of input parameters

Table 4 Accuracies of developed models
Fig. 9 Relation between predicted and calculated (UCS) values using the developed models

Fig. 10 Comparing the accuracies of the developed models using Taylor charts

Fig. 11 Variance distribution for the developed models

Conclusions

This research presents three models using AI techniques (GP, ANN and EPR) to predict the unconfined compressive strength (UCS) of expansive soil mixed with lime and bagasse ash using mix contents (BA and Lm), consistency limits (LL, PL and SL) and compaction parameters (MDD and OMC). Comparing the developed models leads to the following conclusions:

  1. The ANN model showed the best accuracy (96%), while the GP and EPR models had almost the same accuracy of 88%.

  2. Both the GP and EPR models depend on OMC, PL and BA, with MDD additionally appearing in the GP model and SL in the EPR model, which indicates that Lm and LL have an insignificant impact on the UCS values.

  3. The ANN model confirmed the previous point: its importance analysis showed that the compaction parameters (MDD and OMC) and PL are the most influential inputs, followed by BA and LL.

  4. The developed models are valid within the considered range of parameter values; beyond this range, the prediction accuracy should be verified.