1 Introduction

As a property of soils, permeability measures how readily water percolates through them. A low permeability (k-value) can lead to the generation of excess pore water pressure and secondary consolidation (Lin et al., 2018; Zhang et al., 2021). Measurements of permeability are essential for geotechnical projects affected by groundwater tables or precipitation. The k-value of a soil depends on many factors, including grain size properties, void ratio, fines content, over-consolidation ratio, drainage type, density, and any impurities present (Cashman & Preene, 2020; Smith, 2014).

The k-value can be determined by direct and indirect methods. In direct methods, tests are carried out on soil samples. In indirect methods, by contrast, the k-value is calculated from empirical formulas based on grain size properties and void ratio (Nagy et al., 2013; Osterhout, 1922). Some direct and indirect methods are listed in Table 1. It is not convenient to perform permeability tests on every soil sample, so modeling is often used to provide a rough estimate of the actual k-value. The k-value spans up to 10 orders of magnitude between coarse-grained and very fine-grained soils.

Table 1 Direct and indirect methods for measuring the k-value

The indirect methods were proposed earlier on the basis of certain assumptions and are suitable for conventional soils. For soils incorporating waste materials, however, the values of the constants would change significantly. So, if various waste materials or composites are present, new equations must be developed based on experimental data. This gap can be bridged by developing models with artificial intelligence techniques (Baghbani et al., 2022; Shahin, 2013).

Sand is a natural material used in the majority of construction projects. In India, the expected demand for sand in 2017 was 700 MT, with an annual growth rate of 6%–7%. Mining has led to a 90% decline in sediment levels in key Asian rivers, putting local communities at risk of flooding, land loss, contaminated drinking water, and crop devastation (Ministry of Mines, 2018). India also produces more than 3 million tons of waste foundry sand (WFS) annually. The scarcity of natural sand drives practitioners to replace it with waste and by-products from industries. The replacement materials should meet the strength and other requirements specified in construction laws and codes. One such material, which has been used in construction over recent decades, is WFS. Many researchers have reported the viability of WFS in various geotechnical applications (Gedik et al., 2018; Heidemann et al., 2021; Javed, 1995; Siddique & Singh, 2011; Sinha et al., 2020; Tittarelli, 2018; Winkler & Bolshakov, 2000), hydraulic or fluid barriers (Abichou et al., 1998, 2000, 2002), retaining-wall backfills (Lee et al., 2001), highway sub-bases (Guney et al., 2006; Javed & Lovell, 1994; Mast & Fox, 1998; Partridge et al., 1999) and ground improvement (Vipulanandan et al., 2000). For all such applications, knowledge of the drainage behaviour of WFS or WFS-incorporating sand is necessary. FHWA (2004) has reported a permeability range of \(10^{-3}\)–\(10^{-6}\) cm/s.

This study proposes a model to predict the permeability of soils incorporating a certain kind of industrial material. Sandy soil has been mixed with waste foundry sand at different ratios to cover the range of replacement, and the hydraulic behaviour at different densities has been observed. Little existing research considers the relative density as a governing parameter in indirect methods. The main objective of this study is to explore the drainage behaviour of WFS-incorporated sand in retaining-wall backfills and earthen dam applications.

Sustainable geomaterials are rapidly replacing conventional materials. Owing to their accuracy, soft computing methodologies have gained wide acceptance over the past two to three decades. Many models have been trained for evaluating material properties based on known input parameters (Dalkilic et al., 2023; Khatti & Grover, 2023a, 2023b, 2023c, 2023d, 2023e; Kumar et al., 2023; Länsivaara et al., 2023; Rabbani et al., 2023). Machine learning modeling can be broadly classified into approximation, classification and forecasting (Sarker, 2021). This study adopts artificial neural networks, multi-linear regression, decision trees, and random forest techniques. Several past studies on modelling of the k-value by using AI techniques are listed in Table 2. In most cases, the input variables are the grain size properties of the soil. The performance of the models depends upon the available data sets, the correlation between variables, and the standard deviation in values.

Table 2 AI methods used in the past studies on k-value

2 Materials used

2.1 Sand

Sandy soil acquired in Punjab, India, is used in this study. The grain size properties are presented in Fig. 1 and Table 3. The type of gradation strongly affects the permeability of soil. The specific gravity of the soil is determined as per ASTM D854 (2002), while the GSD parameters are determined as per ASTM D422-63 (2007). The soil is classified as poorly graded sand.

Fig. 1 Grain size distribution curves of the materials used

Table 3 Grain size properties of the materials used

2.2 Waste foundry sand

In this study, WFS is acquired from an iron foundry located in Ludhiana, India. The grain sizes are presented in Fig. 1. The grain size distribution curve of WFS is found to be similar to that of Ottawa sand at the F65 grade (Carey et al., 2020). The grains of WFS are smaller than those of the sand, and its specific gravity is also found to be smaller. The index and engineering properties for this type of WFS are reported by Kumar and Parihar (2022). WFS is classified as poorly graded sand as per the USCS (ASTM D2487, 2006; Casagrande, 1948).

3 Research methodology

This study is divided into four phases: data generation, data preparation, modeling, and drawing conclusions based on the best-performing model (Fig. 2). The data in this study are generated from a series of laboratory-scale tests. The experimental output data are further investigated for outliers and multicollinearity. The phases involved are discussed in detail in the following sections.

Fig. 2 Flowchart of the research methodology

3.1 Lab experiments

To cover a wide range of replacements with WFS and to explore the relative density, 18 different compositions are considered (Table 4). At least five distinct readings of permeability for each composition are measured. The time during which a particular amount of water is drained from the samples is also considered as one of the governing parameters.

Table 4 Proportions of different composite soils

Permeability is affected by the relative density (RD) of soil, as liquid takes more time to flow through denser media. The RD test is a preliminary step for sample preparation in the permeability experiment.

3.1.1 Relative density

The relative density of soil composites indicates the compactness of cohesionless materials. The test in this study is conducted as per ASTM D4254 (2000). To measure the minimum density (\({\rho }_{min}\)), the soil is poured by free fall from a height of 2–3 cm into a relative density mold of 3000 cc, and the mass of soil in the mold is recorded. The maximum density (\({\rho }_{max}\)) is determined by vibrating the filled mold at a frequency of 3,600 vibrations/min under a 115 kg surcharge for 8 min. After the vibration and removal of the load, the settlement of the loading plate is measured, and the reduced volume is thereby determined. Since the mass remains constant, the reduction in volume results in an increased density.

The density for different relative densities is calculated by using Eqs. 1 and 2. The values of densities for different relative densities with variation in WFS replacement level are shown in Fig. 8.

$${\rho }_{\Delta }={\rho }_{max}-{\rho }_{min}$$
(1)
$${\rho }_{d}={\rho }_{max}-\left({R}_{D}\times {\rho }_{\Delta }\right)$$
(2)

3.1.2 Permeability

The permeability of soil indicates the ease with which water flows through a porous medium. Unlike hydraulic conductivity, intrinsic permeability does not depend upon the density and viscosity of the permeating fluid. This study measures permeability for all considered cases (Table 4) as per ASTM D2434 (2019). As the soil composites are granular, the constant head method is performed on all compositions. The total head is maintained at 122.5 cm. A permeameter with a height of 12.73 cm, a diameter of 10 cm, two operating valves, and an air vent is used (Fig. 3). De-aired water is allowed to flow through the sample with two-way drainage. At least five readings of the quantity of outflow per unit of time are noted for each sample. Darcy's law (Eq. 3) is adopted for calculating the k-value in m/day.

$$k=\frac{Q\times L}{h\times A}$$
(3)

where Q is the amount of drained water per unit time; h is water head; A is the cross-sectional area of sample; and L is the length of the sample.
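As a minimal sketch of how Eq. 3 can be applied, the following Python snippet computes k for a constant-head test with the permeameter dimensions and total head reported above. The outflow reading (200 cm³ collected in 60 s) is a hypothetical value chosen only for illustration and is not a measurement from this study; the function name is likewise an assumption.

```python
import math

SECONDS_PER_DAY = 86_400
CM_PER_M = 100

def darcy_k_m_per_day(volume_cm3, time_s, length_cm, head_cm, diameter_cm):
    """Constant-head permeability from Eq. 3, k = Q*L / (h*A), returned in m/day."""
    area_cm2 = math.pi * diameter_cm ** 2 / 4   # cross-sectional area A of the sample
    q_cm3_per_s = volume_cm3 / time_s           # drained water per unit time, Q
    k_cm_per_s = q_cm3_per_s * length_cm / (head_cm * area_cm2)
    return k_cm_per_s * SECONDS_PER_DAY / CM_PER_M

# Geometry and head are taken from the text; the outflow reading is hypothetical.
k = darcy_k_m_per_day(volume_cm3=200.0, time_s=60.0,
                      length_cm=12.73, head_cm=122.5, diameter_cm=10.0)
print(f"k = {k:.2f} m/day")
```

Because Eq. 3 is linear in Q, doubling the collected volume over the same time doubles the computed k-value.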

Fig. 3 Setup for permeability test

3.2 AI-approaches

R offers a rich ecosystem of packages for statistical computing and machine learning, which makes it a versatile tool for data scientists and researchers. Its packages, such as "caret", "mlr" and "tidymodels", provide a wide range of tools and functions to streamline the entire machine-learning workflow, from data preprocessing, feature engineering and model selection to training, validation and evaluation. This study uses RStudio (V 1.4.1564) and the R programming platform (V 4.3.1) to access soft computing techniques.

For training the models, the data are initially split into three sets: training, validation and test data. The models are first trained on the training data set; the validation data set is then used to tune the parameters and to estimate the skill of each model on unseen data. In the final stage, the performance of each model is tested on the test data set. According to a widely used rule of thumb, the number of data points should be at least ten times the number of input parameters (Alwosheel et al., 2018; Haykin, 2009). The minimum number of data points required in this study is 70. To avoid overfitting, k-fold cross-validation is used with k = 5 and a seed value of 42 (Fushiki, 2011). The training and validation sets combined should make up 85% of the total dataset, rounded to the nearest multiple of 5. The 90 data points in this study are therefore divided into 62, 13 and 15 points for training, validation and testing, respectively (Fig. 4). The k here represents the number of folds for cross-validation; it should not be confused with k, which represents the permeability.

Fig. 4 Splitting of the total data set
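The 62/13/15 split and the five-fold cross-validation described above can be sketched as follows. This is an illustration only: the study itself uses R, whereas Python is used here, and the function names and the shuffling scheme are assumptions rather than the authors' implementation. Only the counts (90, 62, 13, 15), the number of folds (5) and the seed (42) come from the text.

```python
import random

def split_indices(n_total=90, n_train=62, n_val=13, seed=42):
    """Shuffle the row indices reproducibly and cut them into train/validation/test."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

def k_fold(indices, n_folds=5):
    """Yield (fit, held_out) index lists; each fold is held out exactly once."""
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, held_out in enumerate(folds):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, held_out

train_idx, val_idx, test_idx = split_indices()
print(len(train_idx), len(val_idx), len(test_idx))

# Cross-validate on the 75 training + validation points; the test set stays untouched.
for fit, held_out in k_fold(train_idx + val_idx):
    print(len(fit), len(held_out))
```

With 75 points and five folds, each fold holds out 15 points and fits on the remaining 60.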

3.2.1 Artificial neural networks (ANN)

The working principle of this computing technique is inspired by the biological neural network of the human brain. The method was first proposed by McCulloch and Pitts (1943). A group of simulated neurons makes up an artificial neural network. Every neuron functions as a node linked to other nodes via connections that resemble biological axon-synapse-dendrite connections. A weight is assigned to each link to indicate how strongly one node affects others (Winston, 1992). Because they can reproduce and model non-linear processes, artificial neural networks have been applied in many disciplines. In civil engineering, they are widely used for soft computing (Lazarevska et al., 2014; Xu et al., 2022; Yang et al., 2021). The input-hidden-output layer schematic is presented in Fig. 5. A trial-and-error approach is used to select the number of hidden neurons, which is found to be optimum at 10 (the 5–10–1 network reported in Section 4.3.1). The hyperparameters are optimized for computation time and the respective error (Table 5). The activation function chosen for the neurons in the hidden layer is the Rectified Linear Unit (ReLU), which helps introduce non-linearity into the model.

Fig. 5 Architecture of the proposed neural network model

Table 5 Hyperparameters for ANN model

3.2.2 Multilinear regression (MLR)

This approach reveals linear relationships between the dependent (y) and independent (x) variables. Since multiple regression takes into account many explanatory variables, it extends the ordinary least-squares regression. The generalized relation is given in Eq. 4, where 'a' represents the intercept and '\(\epsilon\)' represents the error. The coefficients \({b}_{n}\) are determined by minimizing the sum of squared residuals, after which the model is evaluated with statistical parameters. Eq. 5 gives the corresponding power-law form used for the k-value.

$$y=a+{b}_{1}{x}_{1}+{b}_{2}{x}_{2}+\cdots +{b}_{n}{x}_{n}+\epsilon$$
(4)
$$k=a {(BS)}^{{b}_{1}} {(WFS)}^{{b}_{2}} {(RD)}^{{b}_{3}} {(Q)}^{{b}_{4}} {(T)}^{{b}_{5}}$$
(5)
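Eq. 5 can be related to the linear form of Eq. 4 by taking logarithms of both sides. The following is a sketch under the assumptions that all inputs are strictly positive and that the coefficients are fitted by least squares on the transformed variables; the text does not state the fitting procedure explicitly.

$$\ln k=\ln a+{b}_{1}\ln (BS)+{b}_{2}\ln (WFS)+{b}_{3}\ln (RD)+{b}_{4}\ln (Q)+{b}_{5}\ln (T)$$

In this form, \(y=\ln k\), the intercept is \(\ln a\), and the \({x}_{i}\) are the logarithms of the inputs, so the exponents \({b}_{i}\) of Eq. 5 play the role of the regression coefficients in Eq. 4.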

3.2.3 Decision tree model (DT)

A decision tree is a non-parametric supervised learning method that can be used for classification and regression. It aims to discover simple decision rules derived from data features, so as to build a model that can predict the value of a variable (Fig. 6). This technique is widely used in civil engineering fields, where decisions are often made based on upper or lower limits of variables. For example, if the permeability value is less than \(10^{-6}\) cm/s, the soil will be classified as clay (Desai & Joshi, 2010; Singh et al., 2020). Table 6 outlines the hyperparameters and their respective approximate values used for tuning the decision tree, where the criterion is Gini impurity.

Fig. 6 Parts of the decision tree

Table 6 Hyperparameters for DT model
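The Gini impurity criterion and the threshold rule mentioned above can be illustrated with a short sketch. It uses the standard definition of Gini impurity, \(1-\sum {p}_{i}^{2}\), and applies it to the clay/sand threshold example from the text. The sample values and labels are hypothetical, and the function names are assumptions; this is not the tuning code of the study.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_i^2) of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity of the two groups created by value < threshold."""
    left = [lab for v, lab in zip(values, labels) if v < threshold]
    right = [lab for v, lab in zip(values, labels) if v >= threshold]
    n = len(labels)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n

# Hypothetical permeability values (cm/s) with hypothetical soil labels.
k_values = [1e-8, 5e-7, 2e-4, 3e-3]
soil = ["clay", "clay", "sand", "sand"]

print(gini_impurity(soil))                   # impurity before the split
print(split_impurity(k_values, soil, 1e-6))  # impurity after splitting at 1e-6 cm/s
```

A split is preferred when it lowers the weighted impurity; in this toy case the threshold separates the two labels completely.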

3.2.4 Random forest (RF)

The first random decision forest method was developed by Ho (1995). In the fitting process, errors are computed, and the importance of variables is measured. The relevance of variables in a regression or classification task can be ranked by using random forests. Variables that produce high values of this importance score are given higher weightage than those that produce low values. This method mitigates the problem of overfitting, since the output is based on majority voting or averaging. This technique is widely used in geotechnical engineering to calculate engineering and index properties (Dutta et al., 2019; Rauter & Tschuchnigg, 2021).

In this method, the number of trees for the prediction of the k-value is optimized by using an error rate curve, as shown in Fig. 7. More trees than the optimum value increase the calculation time of the model, whereas fewer trees can yield erroneous predictions. The error rate is found to vary insignificantly (it can be considered constant) after 50 trees. The fundamental settings or hyperparameters that influence the behavior and performance of the model are tabulated in Table 7.

Fig. 7 Variation of error rate with the number of trees

Table 7 Hyperparameters for RF model
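The idea of reading the optimum number of trees from an error rate curve can be sketched as below. The stopping rule (the smallest tree count after which the error stays within a relative tolerance of its final value) is an assumption made for illustration, since the text describes only a visual inspection of Fig. 7; the error values are hypothetical.

```python
def smallest_stable_tree_count(errors, tolerance=0.01):
    """errors[i] is the error rate obtained with i + 1 trees.

    Returns the smallest tree count from which every later error stays
    within `tolerance` (relative) of the final error.
    """
    if not errors:
        raise ValueError("errors must not be empty")
    final = errors[-1]
    for n_trees in range(1, len(errors) + 1):
        tail = errors[n_trees - 1:]
        if all(abs(e - final) <= tolerance * abs(final) for e in tail):
            return n_trees

# Hypothetical error curve for 1 to 8 trees.
curve = [0.40, 0.25, 0.18, 0.150, 0.151, 0.150, 0.149, 0.150]
print(smallest_stable_tree_count(curve))
```

Applied to a curve such as the one in Fig. 7, a rule of this kind would return the point beyond which adding trees only increases the computation time.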

3.2.5 Limitations of AI models

  1. ANN: ANN requires a relatively large amount of data for training and may be computationally intensive. It is also often considered a "black-box" model, which makes its decision-making process difficult to interpret. Selecting an appropriate architecture and hyperparameters can be a trial-and-error process, and the model is sensitive to initial conditions.

  2. MLR: MLR assumes a linear relationship between independent and dependent variables. It might not capture complex non-linear relationships in data. Additionally, it is sensitive to multicollinearity among the predictor variables, which can lead to unstable coefficient estimates.

  3. DT: Decision tree assumes that data are non-linearly separable, and it can lead to overfitting, especially when the tree depth is not adequately controlled. It may not perform well on highly imbalanced datasets, and biased trees might be created if one class dominates others.

  4. RF: Random forest is less interpretable than individual decision trees and can be computationally expensive for large datasets. It may not perform optimally when there is a high degree of multicollinearity in the features, and it may struggle with extrapolation because its predictions rely on the range of values seen in the training data.

4 Results and discussion

4.1 Experimental results

The variation in density is plotted against WFS content (Fig. 8). It can be seen that the density reduces by 25% with increased WFS content, because WFS exhibits lower dry density values. The experimental results of the k-value for all cases are plotted in Fig. 9. As seen in the surface heat map, the permeability decreases as the relative density and the replacement level of WFS increase. Full replacement of sand with WFS reduces the k-value by 36%, 51% and 57% for RD values of 65%, 75% and 85%, respectively.

Fig. 8 Variation of density with WFS content

Fig. 9 Variation of permeability with RD and WFS

4.2 Statistical features of data sets

The descriptive statistical summary for the training, validation, testing and total data sets is given in Table 8. The summary covers the count, lower and upper bounds, mean, standard deviation, kurtosis and skewness. For all parameters, the standard deviation is the largest in the validation dataset. Skewness and kurtosis are reported for all datasets to describe the symmetry of the data about the center point and the shape of the distribution. Pearson's coefficient between each pair of input parameters shows their relationship, and the histograms represent the distribution of data values (Fig. 10).

Table 8 Statistical features of different data sets

Fig. 10 Distribution of input parameters with Pearson's coefficient values

4.3 Performance of AI models

4.3.1 Results of ANN

A trial-and-error approach was used to select the number of hidden neurons, which is found to be optimum at 10. The 5–10–1 network with 76 weights, including a skip-layer connection, resulted in an SSE value of 0.0782. More than half of the predicted points are on the negative side of the 1:1 line. Figure 11 shows the output of the ANN model. The error lines of ±20% indicate that most data points lie within that limit (Fig. 11a). The maximum error value is found to be −0.55 m/day, which is unacceptable (Fig. 11c). Consequently, this model is inadequate for determining the permeability of sand and WFS mixtures.

Fig. 11 Results of artificial neural network: (a) performance of the model; (b) actual and predicted k-values; and (c) distribution of errors
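The weight count quoted for the 5–10–1 network can be checked with a short calculation. The sketch below assumes one bias per hidden and output neuron and, for the skip-layer connection, one direct weight from each input to the output; the function name is hypothetical.

```python
def count_weights(n_inputs, n_hidden, n_outputs, skip_layer=False):
    """Number of trainable weights (biases included) in a single-hidden-layer network."""
    hidden = (n_inputs + 1) * n_hidden    # input-to-hidden weights plus hidden biases
    output = (n_hidden + 1) * n_outputs   # hidden-to-output weights plus output biases
    skip = n_inputs * n_outputs if skip_layer else 0   # direct input-to-output links
    return hidden + output + skip

print(count_weights(5, 10, 1, skip_layer=True))   # 60 + 11 + 5 = 76
print(count_weights(5, 10, 1, skip_layer=False))  # 60 + 11 = 71
```

Under these assumptions the total of 76 weights is reproduced only when the skip-layer links are included, which is consistent with a single hidden layer of 10 neurons.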

4.3.2 Results of MLR

The results of the MLR model are presented in Fig. 12. As can be seen, all data points fall between the ±15% error lines (Fig. 12a). The plot of actual and predicted values along the data points is presented in Fig. 12b. The maximum value of the errors is 0.4 m/day (Fig. 12c). MLR assumes that the variance of the residuals is similar at each point of the linear model, a condition known as homoscedasticity. The low degree of reliability is attributed to this assumption.

Fig. 12 Results of multi-linear regression model: (a) performance of the model; (b) actual and predicted k-values; and (c) distribution of errors

$$k=0.5051\frac{{(\text{BS})}^{0.001} {(Q)}^{0.03724}}{{(\text{RD})}^{0.492}{ (\text{WFS})}^{0.0591}{ (T)}^{0.003}}$$
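The fitted relation above can be evaluated directly, as in the following sketch. The coefficients are copied from the equation; the input values in the example are hypothetical, and their units are assumed to match those used in the study's dataset, which are not stated in this section. Because WFS, RD and T carry negative exponents, the expression is undefined when any of them is zero, so the function rejects non-positive inputs.

```python
def mlr_k(bs, wfs, rd, q, t):
    """Evaluate the fitted power-law MLR relation for the k-value."""
    if min(bs, wfs, rd, q, t) <= 0:
        raise ValueError("all inputs must be strictly positive for the power-law form")
    numerator = 0.5051 * bs ** 0.001 * q ** 0.03724
    denominator = rd ** 0.492 * wfs ** 0.0591 * t ** 0.003
    return numerator / denominator

# Hypothetical inputs: only the relative density differs between the two calls.
k_loose = mlr_k(bs=50, wfs=50, rd=65, q=100, t=60)
k_dense = mlr_k(bs=50, wfs=50, rd=85, q=100, t=60)
print(k_loose > k_dense)
```

The exponent of RD is by far the largest in magnitude, so the predicted k-value responds most strongly to a change in relative density.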

4.3.3 Results of DT

The decision tree analysis predicts the k-values based on Q values. The decision-making process is shown in Fig. 13. Each box gives the predicted k-value together with the number of observations (n) and the percentage of observations (%) falling under that condition. The conditions are written beneath the boxes, and the boxes are shaded by value (the larger the value, the darker the color).

Fig. 13 Decision tree based on Q value

Figure 14a presents the scatter plot of actual and predicted k-values acquired from the decision tree model. The model performs poorly in predicting the k-value of the sand-WFS mixture, so it is not recommended. A single predicted value covers a wide range of actual values. The maximum error is 0.6 m/day, the highest among all models (Fig. 14c).

Fig. 14 Results of decision tree model: (a) performance of the model; (b) actual and predicted k-values; and (c) distribution of errors

4.3.4 Results of RF

The RF approach performs better than the other techniques. Results from the RF model are presented in Fig. 15. The error lines that bound the data in the scatter plot are drawn at 15% on the positive side and at 10% on the negative side. For actual k-values less than 3.5 m/day, the data points lie around the 1:1 line, whereas the values greater than 3.5 m/day lean towards the negative error line (Fig. 15a). The maximum value of error is found to be 0.45 m/day (Fig. 15c).

Fig. 15 Results of random forest model: (a) performance of the model; (b) actual and predicted k-values; and (c) distribution of errors

4.4 Comparison of performance of different models

4.4.1 Performance parameters

The effectiveness of the proposed models is evaluated by the following performance parameters: coefficient of determination (\(R^{2}\)), mean squared error (MSE), root mean square error (RMSE), performance index (PI), index of scatter (IOS), index of agreement (IOA), variance accounted for (VAF), and a20 index (Table 9). The mathematical expressions, ideal values and significance of each parameter are also listed in Table 9. Notation y represents the actual data, \(\overline{y }\) represents the mean of the actual data, \(\widehat{y}\) represents the predicted data, n is the number of data points, and m20 represents the number of data points whose predicted values lie within \(\pm\) 20% of the actual data.

Table 9 Insights into the considered performance parameters
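Most of the parameters listed above can be computed with a few lines of code. The sketch below uses commonly adopted definitions of \(R^{2}\), MSE, RMSE, IOS, IOA, VAF and the a20 index; these are assumed to agree with the expressions in Table 9, which is not reproduced here. PI is omitted because several definitions are in use. The sample data are hypothetical.

```python
import math

def performance_metrics(actual, predicted):
    """Common regression metrics; actual values must be non-zero for IOS and a20."""
    n = len(actual)
    mean_a = sum(actual) / n
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_r = sum(residuals) / n
    var_res = sum((r - mean_r) ** 2 for r in residuals) / n
    var_act = ss_tot / n
    ioa_den = sum((abs(p - mean_a) + abs(a - mean_a)) ** 2
                  for a, p in zip(actual, predicted))
    # m20: predictions lying within +/-20% of the corresponding actual value
    m20 = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= 0.2 * abs(a))
    return {
        "R2": 1 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": rmse,
        "IOS": rmse / mean_a,
        "IOA": 1 - ss_res / ioa_den,
        "VAF": (1 - var_res / var_act) * 100,
        "a20": m20 / n,
    }

# Hypothetical k-values in m/day.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.0]
for name, value in performance_metrics(actual, predicted).items():
    print(f"{name}: {value:.4f}")
```

A perfect model would give \(R^{2}\) = 1, IOA = 1, VAF = 100, a20 = 1, and zero MSE, RMSE and IOS.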

The values of these parameters for the training, validation and testing datasets are presented in Tables 10, 11 and 12, respectively. The values demonstrate the performance of different models. The values of R2, MSE and RMSE for the testing dataset are less than those for the training and validation datasets.

Table 10 Performance parameters for the training dataset
Table 11 Performance parameters for the validation dataset
Table 12 Performance parameters for the testing dataset

For the ANN model, the values of \(R^{2}\) for the training, validation and testing datasets are 0.96500, 0.96614 and 0.9126, respectively; its \(R^{2}\) value for the testing dataset is the lowest of all the proposed models. For the training data of the DT model, the values of \(R^{2}\), MSE and RMSE are 0.96106, 0.04289 and 0.2071, respectively.

The performance parameters for RF are the best for all datasets, with the minimum error. The values of \(R^{2}\) for the training, validation and testing datasets are 0.99314, 0.99374 and 0.9579, respectively. After the random forest, the multi-linear regression performs well, with \(R^{2}\) values of 0.98066, 0.96854 and 0.9265 for the training, validation and testing datasets, respectively. For the decision tree, the corresponding \(R^{2}\) values are 0.96106, 0.95567 and 0.9338, which indicates a comparatively poor correlation between the actual and predicted values.

4.4.2 Check for overfitting

Overfitting is a common challenge in machine learning. It means that a model learns the training data so closely that it captures the noise and random fluctuations instead of only the genuine underlying patterns. As a result, an overfitted model performs well on the training data but poorly on unseen or new data, leading to bad generalization. Understanding the issue of overfitting is crucial for building accurate and reliable models for real-world tasks. In this study, the overfitting ratio (OFR) is computed by Eq. 6. The OFR confirms that the RF model is well fitted.

$${\text{OFR}}=\frac{{{\text{RMSE}}}_{validation}}{{{\text{RMSE}}}_{training}}$$
(6)
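Eq. 6 amounts to a single division, as sketched below. The RMSE values in the example are hypothetical and are not taken from Tables 10 and 11; the function name is an assumption.

```python
def overfitting_ratio(rmse_validation, rmse_training):
    """OFR from Eq. 6: validation RMSE divided by training RMSE."""
    if rmse_training <= 0:
        raise ValueError("training RMSE must be positive")
    return rmse_validation / rmse_training

# Hypothetical RMSE values in m/day.
ofr = overfitting_ratio(rmse_validation=0.105, rmse_training=0.100)
print(f"OFR = {ofr:.2f}")
```

A ratio close to 1 indicates similar errors on the training and validation data, whereas a ratio well above 1 suggests that the model fits the training data much better than unseen data.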

4.4.3 Taylor's diagram

Taylor's diagram quantifies the degree of correspondence between the predicted and actual values. Figure 16 depicts a Taylor diagram, which graphically illustrates the following metrics for all the proposed methods: the value of the Pearson correlation coefficient, the root-mean-square error, and the standard deviation. As can be seen, the mark of RF is much closer to the actual value point than other marks are.

Fig. 16 Taylor diagram for comparison of all proposed models
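The three statistics summarized on a Taylor diagram can be computed as sketched below. On such a diagram the error measure is conventionally the centered root-mean-square difference, which is linked to the two standard deviations and the correlation coefficient by the law of cosines; it is assumed here that Fig. 16 follows this convention. The data are hypothetical.

```python
import math

def taylor_statistics(actual, predicted):
    """Return (std of actual, std of predicted, Pearson r, centered RMS difference)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    r = cov / (std_a * std_p)
    crmsd = math.sqrt(sum(((p - mean_p) - (a - mean_a)) ** 2
                          for a, p in zip(actual, predicted)) / n)
    return std_a, std_p, r, crmsd

# Hypothetical k-values in m/day.
std_a, std_p, r, crmsd = taylor_statistics([1.0, 2.0, 3.0, 4.0], [1.2, 1.9, 3.3, 3.6])

# Law-of-cosines identity that makes the diagram possible.
lhs = crmsd ** 2
rhs = std_a ** 2 + std_p ** 2 - 2 * std_a * std_p * r
print(math.isclose(lhs, rhs))
```

A model plots close to the actual value point when its standard deviation matches that of the measurements and its correlation coefficient approaches 1.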

4.4.4 Distribution of residuals

The error distribution highlights the instances where a model consistently under-predicts or over-predicts along the data points. The error distribution comparison of all proposed AI approaches (Fig. 17) indicates that the random forest is the best-fit approach for the prediction of the k-value of sand-WFS mixtures. The box plots depict the lower and upper values of residuals together with outlier points. Investigating these outliers can provide insights into unique scenarios or data points that require special attention. The performance of the models can be compared based on the distance of the median from the origin line. The results are consistent with the performance parameters. Patterns and trends identified from the models' error distribution indicate the absence of systematic errors. These patterns can help increase the robustness of the performance of the models. In addition, the error distribution results show that the models are well-calibrated, particularly in specific prediction ranges.

Fig. 17 Distribution of residuals for the proposed models
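The quantities shown by a box plot of residuals can be computed as in the following sketch. The outlier rule (points beyond 1.5 times the interquartile range from the quartiles) is the common box-plot convention and is assumed here, since the text does not state which rule was used for Fig. 17. The residuals are hypothetical.

```python
import statistics

def box_plot_summary(residuals):
    """Quartiles, median and 1.5*IQR outliers of a list of residuals."""
    q1, median, q3 = statistics.quantiles(residuals, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [r for r in residuals if r < lower_fence or r > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical residuals (actual minus predicted k-value, in m/day).
residuals = [-0.1, -0.05, 0.0, 0.02, 0.05, 0.1, 0.6]
summary = box_plot_summary(residuals)
print(summary["median"], summary["outliers"])
```

A median close to zero indicates little systematic bias, while the listed outliers point to individual samples that deserve a closer look.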

4.5 Sensitivity analysis

Sensitivity analysis is carried out to determine the most influential input parameters in predicting k-values. As RF is the best-performing approach, the sensitivity analysis is based on the RF method, and the results are presented in Table 13. The analysis is made by removing one parameter at a time and noting the resulting value of \(R^{2}\) (Eq. 7). The parameters whose removal causes a reduction in \(R^{2}\) are influential. It is shown that RD and Q are the most influential parameters, which strongly affect the k-values of the composite soil.

Table 13 Sensitivity analysis based on the random forest method

$$\widehat{{s}_{i}}=\widehat{{s}_{a}}-\widehat{{s}_{r}}$$
(7)
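Eq. 7 can be applied parameter by parameter, as sketched below. The sketch assumes that \(\widehat{{s}_{a}}\) is the score (here \(R^{2}\)) of the model trained with all parameters and \(\widehat{{s}_{r}}\) is the score after one parameter is removed; the symbols are not defined in the text, so this reading is an assumption. The \(R^{2}\) values are hypothetical and are not those of Table 13.

```python
def sensitivity_scores(score_all, scores_reduced):
    """Eq. 7 per parameter: score with all inputs minus score with that input removed.

    Returns a dict ordered from the most to the least influential parameter.
    """
    scores = {name: score_all - reduced for name, reduced in scores_reduced.items()}
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))

# Hypothetical R2 values obtained after removing each input in turn.
r2_all = 0.95
r2_without = {"BS": 0.94, "WFS": 0.93, "RD": 0.80, "Q": 0.70, "T": 0.945}

ranking = sensitivity_scores(r2_all, r2_without)
for name, drop in ranking.items():
    print(f"{name}: drop in R2 = {drop:.3f}")
```

The larger the drop in \(R^{2}\) when a parameter is removed, the more influential that parameter is.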

4.6 Comparison with the existing literature

The best-performing model for predicting soil permeability in this study is the random forest. The proposed model is compared with the models available from the existing literature (Table 14). The \(R^{2}\) between the actual and predicted datasets of the available models is relatively lower than that of the model proposed in this study.

Table 14 Comparison of the best-performing model with the existing literature

5 Conclusions

This study explores the drainage behavior of WFS-incorporated sand. The experimental research has been extended to include AI modeling. The following major conclusions are drawn from this study.

  • The permeability tends to decrease as the relative density of the soil increases. A notable reduction in the k-value, up to 140%, can be observed when the relative density is increased from 65% to 85%. Similarly, an increase in the replacement level of WFS is associated with the decrease in the permeability. When sand is completely replaced with WFS, there are reductions of 36%, 51%, and 57% in the k-values for the relative density of 65%, 75%, and 85%, respectively.

  • The \(R^{2}\) value and other performance parameters indicate that the relationship between the actual and predicted values is most pronounced in the random forest method. The order of the performance of all the proposed models can be presented as RF > MLR > ANN > DT.

  • Taylor's diagram is used to verify the outcomes of all the considered AI approaches, and it confirms the good performance of RF, as its mark is nearer to the actual value. The overfitting ratio for RF is close to 1, indicating a strong level of fitness of the model.

  • Sensitivity analysis demonstrates that Q and RD are the most influential parameters for predicting k-values.