Shrink–swell index prediction through deep learning

The application of artificial intelligence in geotechnical engineering has been growing; however, its ability to predict the properties and nonlinear behaviour of reactive soil remains under-explored. Although previous studies provided linear correlations between the shrink–swell index and the Atterberg limits, the resulting model accuracies were unsatisfactory. Artificial intelligence, specifically deep learning, has the potential to improve this accuracy. This research employed deep learning to predict more accurate values of shrink–swell indices and explored two scenarios: Scenario 1 used the features liquid limit, plastic limit, plasticity index, and linear shrinkage, whilst Scenario 2 added the input feature of fines percentage passing through a 0.075-mm sieve (%fines). Findings indicated that deep learning neural networks increased model accuracy in both Scenarios 1 and 2. The accuracy values measured in this study were markedly higher, and the dataset had a wider variance, than most previous studies. Global sensitivity analyses were also conducted to investigate the influence of each input feature. These analyses produced a range of predicted values within the variance of the data in Scenario 2, with %fines having the highest contribution to the variance of the shrink–swell index and a relevant interaction between linear shrinkage and %fines. The proposed Scenario 2 model was around 10-65% more accurate than the preceding models considered in this study and can be used to expeditiously estimate more accurate values of shrink–swell indices.


Introduction
The reactivity of soils is a characteristic that affects the mechanical properties of most clayey grounds [1]. Reactive soils undergo substantial volume changes in response to variations in soil moisture content through swelling and shrinking. This shrink-swell ground movement leads to distress in infrastructure built on and in the vicinity of reactive soils [2,3]. The damage to lightweight structures such as pavements, underground pipes, and residential structures due to these shrink-swell ground movements is well known [4,5]. Severe damage caused by reactive soils has been recorded in Australia, China, Egypt, India, Israel, South Africa, the UK, and the USA, totalling a significant annual financial loss [6,7]. In the UK, the impact of reactive soils summed to £3 billion from 1991 to 2001 due to the effect of droughts, making it the most damaging geohazard in the region [8]. In the USA, the cumulative rehabilitation costs amounted to around US$15 billion per year, more than twice the financial loss incurred from natural disasters such as floods, hurricanes, tornadoes, and earthquakes [8]. Li et al. [9] found that the damage caused by reactive soils in Australia was mostly to lightweight structures, even though no combined estimates are reported in the literature. Approximately 20% of Australian land can be categorised as moderately to very highly reactive, with six out of eight major cities being affected, posing geohazards to structures and infrastructure [6].
The shrink-swell soil index (I_ss) is a soil parameter commonly used to determine the potential characteristic surface movement (y_s) of sites having reactive soils in Australia [10]. This index is determined through laboratory testing using undisturbed soil samples collected from the field. The I_ss index has been used in Australia for more than 15 years, and the Australian Standard AS 1289.7.1.1 provides standardised testing procedures. Estimates of y_s have generally been successful in determining the dimensions of residential slabs and footings. Determining the value of I_ss requires the collection of undisturbed soil samples for the swell test and the simplified core shrinkage test. Undisturbed soil sampling costs relatively more and is more difficult to implement than disturbed sampling. Determining I_ss also takes a long time, as results can take more than four days, involving around two hours of hands-on experimental work depending on the skill level of the individual performing the tests [1]. This can be a significant waiting period for most practitioners and researchers, and the results are sensitive to instrumental conditions, the skills and experience of the tester, and changing ambient conditions. Several studies have attempted to correlate I_ss with other soil properties such as the Atterberg limits (i.e. liquid limit (LL), plastic limit (PL), plasticity index (PI), and linear shrinkage (LS)) to estimate I_ss indirectly. However, most studies have found sub-par correlation coefficients (R² < 0.80) between I_ss and the Atterberg limits, which ranged from 0.20 to 0.53 [9]. For instance, Young and Parmar [11] performed more than 300 laboratory tests to correlate I_ss with more common soil indices such as gravimetric soil moisture content (w), LL, PL, PI, and LS, which resulted in low correlation factors.
Earl [12] suggested that the Atterberg limits alone may not be reliable for estimating values of I_ss, based on his investigations using clay samples from the Shepparton geologic formation. Reynolds [13] performed similar correlation analyses on a dataset of clay samples collected from Central, Southeast, and Northwest Queensland for a pavement design application and also reported weak relationships. Similar weak correlations were found in the investigations conducted by Zou [14] and Li et al. [9], where soil samples were collected from 47 study sites across 37 suburbs in Melbourne. However, these investigations employed simple univariate regressions, which limit the capture of nonlinear relationships between extracted features or variables. In contrast, Jayasekera and Mohajerani [15] found a relatively stronger correlation, with R² values that ranged from 0.85 to 0.91. However, their investigation used a low-variance dataset, with I_ss values that varied only from 3.8 to 5.5, which limited the predictive capacity of their model [15].
Recent advances in data engineering and data science have expanded the application of artificial intelligence (AI) techniques to many disciplines [16]. AI refers to the ability of a machine or robot to display intelligence comparable to humans by learning through experience in performing a specific task with improving measured performance [17]. Machine learning (ML) is a subset of AI referring to algorithms capable of learning and improving performance without explicit programming or hard coding (Fig. 1a). ML tasks include recognising objects, understanding speech, responding to a conversation, solving problems, optimising solutions, greeting people, and driving a vehicle [18][19][20]. Rumelhart et al. [21] initially proposed shallow learning, which initiated ML applications (Fig. 1b). These shallow neural networks offer limited algorithmic capability and were unable to train multiple hidden layers due to limitations in computing power and available data [22].
Recent applications of AI in geotechnical engineering include geotextile [23,24], tunnelling [25], geothermal energy [26], unsaturated flow [27], geo-structural health monitoring [28,29], liquefaction [30], nanotechnology [31], carbon sequestration [32], and soil properties and behaviour prediction [33][34][35]. The ML techniques applied in these past investigations include artificial neural network (ANN), support vector machine (SVM), genetic algorithms (GA), fuzzy logic, image analysis, and adaptive neurofuzzy inference systems (ANFIS). One of the emerging ML techniques in geotechnical engineering is deep learning (DL), an implementation of ANN with multiple hidden layers as presented in Fig. 1c. It allows computation of more complex features of the input layer [36,37]. Each hidden layer computes a nonlinear transformation of the preceding layer [38]. This deep network can have a substantially greater representation of the extracted features that can learn significantly more complex functions than a shallow network [22]. However, understanding the implementation of DL can be a challenge due to its complex network. It is beneficial for users to understand the implementation of algorithms to enable the accurate and confident application of complex models such as DL [22,36,38].
The application of AI to investigate the properties and nonlinear behaviour of reactive soil is currently not well explored, although there has been increased application of AI in geotechnical engineering in general [39]. Although linear correlations between I_ss and the Atterberg limits showed potential, the model accuracies and data ranges of most previous studies were sub-par. DL has the potential to give better accuracy in the prediction of I_ss using the Atterberg limits, given the ability of neural networks to handle complex nonlinear scenarios, promoting adaptive nonlinear models and providing insightful findings [16]. Therefore, the objective of this study is to employ DL to predict more accurate values of I_ss using different combinations of soil properties including LL, PL, PI, LS, and %fines. This can contribute to an efficient process of calculating the maximum potential characteristic surface movement, y_s. This study also carries out a sensitivity analysis to identify the relative influence of each input variable, LL, PL, PI, LS, and %fines, on the targeted output, I_ss in %/pF. The sensitivity analyses elaborate on how the specific DL prediction mechanism functions with respect to the input features, broadening its applicability.

Methodology
The concept of ANN is analogous to the neural network of a human brain in the way it processes information and establishes logical relationships [40]. A collection of connected nodes called artificial neurons comprises a network comparable to that of a human brain. The artificial neurons are connected by links called edges that transmit signals from one neuron to other neurons. These signals are represented by real numbers. Each node and edge has weights that serve as correlation factors adjusting signals as learning occurs. The main function of this network is to obtain the lowest value of a loss or cost function, L(y, ŷ), which gives the optimum weights. The DL process is influenced by two main considerations: the architecture of the neural network and the learning process of the implemented algorithm.

Deep learning architecture
In this current study, a DL network was used to obtain acceptable weights for I_ss prediction and comprised an input layer, ten hidden dense layers, and an output layer. The input and output values are the expected numbers of input and output neurons, which indicate the sizes of the matrices for the calculation. The input layer contained the input vector extracted from a dataset, and the number of artificial neurons in the input layer was determined by the input features extracted. In this study, two scenarios were considered. The first scenario, or Scenario 1, used the input features LL, PL, PI, and LS, whereas the second scenario, or Scenario 2, also used %fines.
Earl [12] and Reynolds [13] suggested that LL, PL, PI, and LS were not sufficient for estimating the values of I_ss. Thus, Earl [12] added clay and silt fraction, and Reynolds [13] included California Bearing Ratio (CBR), percent swell, and other polynomial features as inputs. The resulting values of R² were higher compared to those with the Atterberg limits alone and ranged from 0.51 to 0.78 [13] and from 0.54 to 0.82 [12]. However, the data samples they used were limited (n < 30). In addition, the dataset had low variance (0.1 < I_ss < 4.0), and the models were not tested or validated for their prediction capacity. These improved outcomes led to the development of Scenario 2 of this study, which included %fines as an additional input to predict I_ss. The addition of CBR as an input by Reynolds [13] did not further increase R²; thus, CBR was not considered in the current study.
In this current study, the number and size of the hidden dense layers were determined by trial and error. The hidden dense layers connect each neuron, receiving input from the preceding layer. The dimension of the first five hidden dense layers was constrained to eight, since greater dimensions caused the model calculation to diverge (i.e. erratic and high values of calculated loss). The dimension of the last five hidden dense layers was increased to 128 to generate more nonlinearity in the relationships between neurons. Increasing the number of hidden layers and the value of the dimension, depending on a specific scenario, often leads to improved accuracy [41]. The trade-off is a longer training time for the DL algorithm to obtain the optimum weights. The output layer concludes the DL learning process using one neuron.
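As an illustration, the architecture described above can be sketched in plain numpy. This is a minimal, hypothetical reconstruction of the stated layer sizes (five hidden layers of width 8, five of width 128, one output neuron) with He initialisation and ReLU, not the authors' actual implementation; all function names are illustrative.

```python
import numpy as np

def build_network(n_inputs, seed=0):
    """Weight matrices for the architecture described in the text:
    five hidden dense layers of width 8, then five of width 128,
    then a single output neuron predicting I_ss."""
    rng = np.random.default_rng(seed)
    widths = [n_inputs] + [8] * 5 + [128] * 5 + [1]
    layers = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))  # He init
        b = np.zeros(n_out)
        layers.append((W, b))
    return layers

def forward(layers, x):
    """Forward pass: ReLU on the hidden layers, linear output layer."""
    a = x
    for W, b in layers[:-1]:
        a = np.maximum(0.0, a @ W + b)   # ReLU activation
    W, b = layers[-1]
    return a @ W + b

# Scenario 2 uses five input features: LL, PL, PI, LS, and %fines
net = build_network(n_inputs=5)
pred = forward(net, np.zeros((3, 5)))    # three dummy samples
```

The network returns one prediction per sample, matching the single output neuron described above.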

Deep learning process
The learning process of a DL neural network comprises five main stages, (1) pre-processing, (2) random initialisation, (3) forward propagation, (4) backward propagation, and (5) evaluation. The DL process implemented in this study is summarised in Fig. 2.
Pre-processing a dataset is an essential step that can increase the accuracy of a DL network training and validation. The common pre-processing techniques applied to previous DL networks include the removal of data entries with outliers and missing values, creation of polynomial features, implementing feature scaling, and employing normalisation to a dataset [42].
The dataset used in this study was extracted from the five studies conducted by [12-14, 43, 44], as presented in Appendix 1. A total of 169 and 116 data entries were collected for Scenarios 1 and 2, respectively, with a summary description presented in Fig. 3 and Table 1. Outliers were determined using the interquartile range (IQR), calculated as

Outlier_lb = Q1 − 1.5 × IQR
Outlier_ub = Q3 + 1.5 × IQR

where the IQR is described as

IQR = Q3 − Q1

in which Q1 is the first quartile, Q3 is the third quartile, and the subscripts lb and ub indicate the lower-bound and upper-bound outliers. The circles in Fig. 3 represent the outliers below Outlier_lb or above Outlier_ub. A comparison between DL with and without the outliers showed comparable results, indicating a negligible effect of the outliers. Therefore, in this study, the complete dataset was used without removing the outliers. Data entries with missing values of I_ss, LL, PL, PI, LS, and %fines were omitted for Scenario 2. Polynomial features are commonly created to incorporate a nonlinear relationship between the target and the input features; they are added to improve the accuracy of linear models with limited features or when one feature depends on another. The use of polynomial features was initially implemented on the dataset but did not have a noteworthy effect on the accuracy of the algorithm. Therefore, these features were not used in the DL of this study. The entire dataset was randomly split into two, one part for training and the other for testing, with a ratio of 70-30%, as listed in Table 1. The 70-30% ratio was considered the most suitable division for training and validating neural network models with small-dimensional datasets [45].
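The IQR outlier rule applied to the dataset can be sketched as follows; the 1.5 × IQR fence multiplier is the conventional choice and is assumed here rather than stated in the text.

```python
import numpy as np

def iqr_outlier_bounds(x):
    """Outlier fences from the interquartile range:
    Outlier_lb = Q1 - 1.5*IQR, Outlier_ub = Q3 + 1.5*IQR, IQR = Q3 - Q1."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # 100.0 is an obvious outlier
lb, ub = iqr_outlier_bounds(x)
outlier_mask = (x < lb) | (x > ub)           # entries drawn as circles in Fig. 3
```

For this toy sample the bounds are −1.0 and 7.0, so only the value 100.0 is flagged.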
Two feature scalings, standard scaling and min-max scaling, were tested to check whether the DL process would improve. Standard scaling was observed to be more applicable to this study and was employed using the following equation:

z = (x − x̄) / s

where x is the data entry, x̄ is the mean value of a feature, and s is the standard deviation of a feature. The values of x̄ and s for the input parameters LL, PL, PI, and LS are presented in Table 1 for both the training and testing data. Normalisation was also considered and implemented in preliminary model runs; however, it resulted in lower accuracy than standard scaling.
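A minimal sketch of the standard scaling step; fitting x̄ and s on the training split and reusing them on the testing split is assumed good practice here, not a detail stated in the text.

```python
import numpy as np

def standard_scale(x_train, x_test):
    """z = (x - x_bar) / s, with x_bar and s taken from the training split
    and reused on the testing split."""
    x_bar = x_train.mean(axis=0)
    s = x_train.std(axis=0, ddof=1)   # sample standard deviation
    return (x_train - x_bar) / s, (x_test - x_bar) / s

x_train = np.array([[1.0], [2.0], [3.0]])
x_test = np.array([[2.0]])
z_train, z_test = standard_scale(x_train, x_test)
```

The scaled training column has zero mean and unit sample standard deviation, as the equation above requires.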
The random initialisation method of He et al. [46] was used to initialise the DL process. Following this method, the weights were randomly initialised with values close to zero and then multiplied by √(2/size_{L−1}), where size_{L−1} is the number of neurons in layer L − 1. Multiplying by this term helps account for the nonlinearity of the activation functions. This initialisation proposed by He et al. [46] was specifically used together with the Rectified Linear Unit (ReLU) activation, addressing learning inefficiency and vanishing-gradient issues. The loss function employed in the DL process is the mean squared error (MSE), a commonly used function for regression. The calculated loss is the mean, over the seen data, of the squared differences between a true value in the dataset and a predicted value calculated by the DL algorithm, described as

MSE = (1/N) Σᵢ (yᵢ − ŷᵢ)²

where yᵢ is the actual value, ŷᵢ is the predicted value, and N is the total number of data entries. ReLU by Nair and Hinton [47] was implemented as the activation function for the forward and backward propagation, which reads

f(xᵢ) = max(0, xᵢ)

where xᵢ is the input value of feature i. A simplified representation of the usage of the activation function is presented in Fig. 4. Ridge regression, or L2 regularisation, was also used since the preliminary DL runs experienced overfitting. L2 regularisation adds the squared magnitude of the weights as a penalty term to the loss function, defined as

L = MSE + λ Σᵢ wᵢ²

where λ is a hyperparameter for regularisation and wᵢ is the weight of a feature. The value of λ is taken to be greater than zero. Taking the value of λ too high may lead to excessively small weights and underfitting. After fine-tuning the hyperparameter λ, the value was specified as 1.00 for Scenario 1 and 2.35 for Scenario 2.
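The ReLU activation and the L2-regularised MSE loss described above can be written compactly as follows; the helper names are illustrative only.

```python
import numpy as np

def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def l2_loss(y, y_hat, weights, lam):
    """Loss = MSE + lam * (sum of squared weights), i.e. the ridge penalty."""
    mse = np.mean((y - y_hat) ** 2)
    penalty = lam * sum(np.sum(w ** 2) for w in weights)
    return mse + penalty

y = np.array([1.0, 2.0])
y_hat = np.array([1.0, 4.0])
loss = l2_loss(y, y_hat, weights=[np.array([0.5])], lam=1.0)  # 2.0 + 0.25
```

Here the MSE term is 2.0 and the penalty term is 0.25, so the total loss is 2.25; raising lam inflates the penalty and pushes the optimiser towards smaller weights.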
The Adaptive Moment (Adam) estimation by Kingma and Ba [48] was used in the DL neural network. Adam stochastic optimisation is widely used due to its straightforward implementation, computational efficiency, and low memory requirements. The Adam method combines the momentum gradient descent method and Root Mean Squared Propagation (RMSprop), modelled as

V_dw = β₁V_dw + (1 − β₁)dw,  V_db = β₁V_db + (1 − β₁)db
S_dw = β₂S_dw + (1 − β₂)dw²,  S_db = β₂S_db + (1 − β₂)db²

where V_dw, V_db, S_dw, and S_db are moving averages of the derivatives of the weights and biases, computed in each iteration or epoch, t. The initial values of V_dw, V_db, S_dw, and S_db are assigned to zero and then calculated for each weight.
The calculated values of V_dw, V_db, S_dw, and S_db are then bias-corrected using the power of the current epoch, t, described below:

V_dw_corr = V_dw / (1 − β₁ᵗ),  V_db_corr = V_db / (1 − β₁ᵗ)
S_dw_corr = S_dw / (1 − β₂ᵗ),  S_db_corr = S_db / (1 − β₂ᵗ)

Each weight and bias is then updated using the equations below:

w = w − α V_dw_corr / (√(S_dw_corr) + ε),  b = b − α V_db_corr / (√(S_db_corr) + ε)

The learning rates (α) for Scenarios 1 and 2 were taken as 7.5 × 10⁻⁵ and 5.0 × 10⁻⁵ after fine-tuning by trial and error. The decay rates for both scenarios were β₁ = 0.9 and β₂ = 0.999, with ε = 5.0 × 10⁻⁶. The forward and backward propagation was implemented in a loop until the specified epoch was reached, as shown in the DL process in Fig. 2. Note that every iteration of the optimisation loop comprises forward propagation, cost calculation, backward propagation, and weight updating. The epoch count of the final DL run was 500, since this value resulted in an optimum and stable loss curve with an acceptable learning period, completing the deep learning process in less than three minutes for each scenario.
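One Adam update step, using the Scenario 1 hyperparameters quoted above, can be sketched as below; this is an illustrative single-parameter version, not the authors' implementation.

```python
import numpy as np

def adam_step(w, dw, V, S, t, alpha=7.5e-5, b1=0.9, b2=0.999, eps=5.0e-6):
    """One Adam update (Scenario 1 hyperparameters): momentum term V,
    RMSprop term S, bias correction by the 1-indexed epoch t."""
    V = b1 * V + (1 - b1) * dw          # first-moment (momentum) estimate
    S = b2 * S + (1 - b2) * dw ** 2     # second-moment (RMSprop) estimate
    V_hat = V / (1 - b1 ** t)           # bias-corrected estimates
    S_hat = S / (1 - b2 ** t)
    w = w - alpha * V_hat / (np.sqrt(S_hat) + eps)
    return w, V, S

w = np.array([1.0])
V, S = np.zeros(1), np.zeros(1)        # accumulators start at zero
w, V, S = adam_step(w, dw=np.array([2.0]), V=V, S=S, t=1)
```

On the first step the bias correction cancels the zero initialisation, so the effective step size is close to alpha regardless of the gradient magnitude.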

Sensitivity analysis
Sensitivity analyses help identify the independent influence of input variables on a targeted output. There are two types of sensitivity analyses: local and global. A local sensitivity analysis assesses the local impact of feature variations, concentrating on the sensitivity in the proximate vicinity of a set of feature values [49]. On the other hand, a global sensitivity analysis quantifies the overall importance of the features and their interactions with the predicted results by implementing a comprehensive coverage of input values [50]. This study used a global sensitivity analysis approach by implementing the methods of Saltelli [51] and Sobol [52,53]. The bounds to generate the input features were specified as LL = 15-130, PL = 5-60, PI = 0-100, LS = 0-40, and %fines = 1-100, as listed in Table 1, and the sampling scheme by Saltelli [51] was then implemented. This was based on the range of values presented in Fig. 3b. Three indices were calculated. The first was the first-order Sobol index (S₁), calculated as [52,53]

S₁ = var(E(ŷ | xᵢ)) / var(ŷ)

where var(ŷ) is the variance of the target output, I_ss, E denotes expectation, xᵢ is a feature, and ŷ is the target output. The term E(ŷ | xᵢ) indicates the expected value of the output ŷ when feature xᵢ is fixed. The first-order Sobol index, S₁, reflects the expected reduction in the variance of the model when feature xᵢ is held fixed; thus, S₁ measures the direct effect of each feature on the variance of the model. It is worth noting that the sum of all the calculated values of S₁ should be equal to or less than one.
It is common to calculate S₁ together with the total Sobol sensitivity, S_T, which includes the first-order effects and the sensitivity due to interactions between a feature xᵢ and all other features [54], given by

S_T = E(var(ŷ | x₋ᵢ)) / var(ŷ)

where x₋ᵢ denotes all features except xᵢ. The sum of all the calculated values of S_T is equal to or greater than one.
If the values of S T are substantially larger than the values of S 1 , then there are likely higher-order interactions occurring. Hence, it is worth calculating the second-order or higher-order sensitivity indices (e.g. S 2 ).
The second-order and higher-order sensitivity indices can similarly be expressed as

S₂(i,j) = (var(E(ŷ | xᵢ, xⱼ)) − var(E(ŷ | xᵢ)) − var(E(ŷ | xⱼ))) / var(ŷ)

where var(E(ŷ | xᵢ, xⱼ)) is the variance of the expected output when features xᵢ and xⱼ are fixed jointly. This calculates the amount of variance of ŷ explained by the interaction of features xᵢ and xⱼ.
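These indices are typically estimated by Monte Carlo sampling rather than computed analytically; a library such as SALib would normally be used in practice. The sketch below is a minimal pick-freeze estimator (after Saltelli) on the unit hypercube for S₁ and S_T, shown on a toy additive model; it is illustrative only and not the authors' implementation.

```python
import numpy as np

def sobol_indices(f, d, n=100_000, seed=0):
    """Monte Carlo estimates of first-order (S1) and total-order (ST)
    Sobol indices via pick-freeze sampling with two independent
    matrices A and B drawn on the unit hypercube."""
    rng = np.random.default_rng(seed)
    A, B = rng.random((n, d)), rng.random((n, d))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    S1, ST = np.empty(d), np.empty(d)
    for i in range(d):
        AB = A.copy()
        AB[:, i] = B[:, i]                            # resample only column i
        fAB = f(AB)
        S1[i] = np.mean(fB * (fAB - fA)) / var        # direct effect
        ST[i] = 0.5 * np.mean((fA - fAB) ** 2) / var  # effect incl. interactions
    return S1, ST

# additive toy model y = x1 + 2*x2: no interactions, so S1 and ST both
# tend to [0.2, 0.8] (variance shares 1/12 and 4/12 out of 5/12)
S1, ST = sobol_indices(lambda x: x[:, 0] + 2.0 * x[:, 1], d=2)
```

Because the toy model is purely additive, S₁ and S_T coincide; a gap between them, as observed later for the DL models, signals higher-order interactions.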

Results and discussion
The results of the DL training, testing, and sensitivity analysis are discussed in the following sections.

Prediction of I ss using deep learning
The DL process outlined in Fig. 2 was applied to the randomly allocated dataset for training (118 and 81 data entries for Scenarios 1 and 2, respectively) and testing (51 and 35 data entries for Scenarios 1 and 2, respectively) listed in Table 1. The calculated MSE loss values of the training and testing sets against epochs are presented in Fig. 5. The loss values of both Scenarios 1 and 2 showed acceptable learning curves. The learning curve is a diagnostic tool for algorithms that learn from training datasets incrementally. The learning curves for both Scenarios 1 and 2 display a good fit with negligible underfitting or overfitting. This is characterised by training and testing loss values that decrease to the point of stability with a minimal gap between the two, as shown in Fig. 5. It is common to have a difference between the final loss values of the training and testing curves; a training curve with loss values lower than the testing curve is referred to as a "generalisation gap". It can be observed that Scenario 1 (Fig. 5a) had a relatively wider gap between the training and testing loss values than Scenario 2 (Fig. 5b). This shows that Scenario 2, with features LL, PL, PI, LS, and %fines, can give relatively better I_ss predictions with less overfitting due to generalisation.
The final training and testing performance of the model was assessed in terms of the root mean squared error (RMSE), calculated as

RMSE = √((1/N) Σᵢ (yᵢ − ŷᵢ)²)

The RMSE indicates the average deviation of predictions from the measured values, with values closer to zero indicating better performance. The calculated RMSE for Scenario 1 was 1.26, whilst Scenario 2 had a lower RMSE of 0.90. This strengthens the authors' inference that adding %fines as an input feature, even though it reduces the size of the dataset, allows I_ss to be predicted more reliably.
Further evaluation of the models of Scenarios 1 and 2 was carried out using an identity line, or 1:1 line, as shown in Fig. 6. The 1:1 line has a slope of 1, forming a 45° angle. This line is used as a reference in a two-dimensional scatter plot comparing two datasets that are expected to be alike under ideal conditions. When the actual and predicted data points have equal values, the corresponding scatter falls along the 1:1 line [55]. Using the 1:1 line, two measurements were obtained that reflect the reliability of the predictions of the models. The first is the coefficient of determination, R², calculated as

R² = 1 − RSS/TSS

where RSS is the sum of squares of residuals and TSS is the total sum of squares.
The second measure is the linear regression coefficient or slope that describes the relationship between the predicted and actual values. The values of R 2 and slope range between zero and one, with unity indicating a perfect fit.
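The three evaluation measures (RMSE, R², and the 1:1-line slope) can be computed as below; forcing the fitted line through the origin for the slope is an assumption made for this sketch, since the text does not state whether an intercept was used.

```python
import numpy as np

def evaluate(y, y_hat):
    """RMSE, R^2 = 1 - RSS/TSS, and the slope of a least-squares fit of
    predictions on actual values forced through the origin (1:1-line check)."""
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - rss / tss
    slope = np.sum(y * y_hat) / np.sum(y ** 2)  # through-origin slope
    return rmse, r2, slope

y = np.array([1.0, 2.0, 3.0])
rmse, r2, slope = evaluate(y, y)   # perfect predictions give 0, 1, 1
```

A perfect model gives RMSE = 0 with R² and slope equal to unity, which is why both measures are read against the 1:1 line in Fig. 6.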
The training set of Scenario 1 estimated R² to be 0.81 and the slope to be 0.75 (Fig. 6a), whilst the testing set of Scenario 1 resulted in an R² of 0.76 and a slope of 0.59 (Fig. 6b). These correlations showed improvements compared to most previous studies. For instance, Li et al. [9] reported that the correlation between I_ss and the Atterberg limits ranged from R² = 0.20 to R² = 0.53. Most of the studies performed to date have concluded that I_ss and the Atterberg limits are poorly correlated [11][12][13][14], except Jayasekera and Mohajerani [15], who found a relatively stronger correlation with R² values that ranged from 0.85 to 0.91. However, the study of Jayasekera and Mohajerani [15] focused on a dataset with a low variance, with I_ss values that varied only from 3.8 to 5.5, which limits the predictive ability of their model.
The training set of Scenario 2 estimated R² to be 0.84 and the slope to be 0.95 (Fig. 6c), whilst the testing set of Scenario 2 resulted in an R² of 0.82 and a slope of 0.85 (Fig. 6d). The values of R² in the training and testing sets using the DL architecture and process implemented in Scenario 2 were comparable with Earl [12], noting that Scenario 2 had a wider variance (0.1 < I_ss < 9.0). It can be observed from Fig. 6c that the slope is 0.95, which can be considered a strong correlation. However, due to the generalisation gap discussed earlier, the testing set commonly has lower accuracy than the training set. This holds in Fig. 6d, where the slope decreased to 0.85, which still indicates a strong correlation.

Sensitivity analysis
The bounds to generate the input features are specified in Table 1. The scheme by Saltelli [51] was implemented to generate the input features for predicting the values of I_ss. Sensitivity analysis was then performed (1) to assess the influence of the features on the targeted output, and (2) to identify the relationship between input features and their influence on the target output. Descriptive statistics of the generated input variables using the scheme by Saltelli [51] and the predicted results using DL for Scenarios 1 and 2 are listed in Table 3. The generated input features of Scenario 1 were similar to those in Scenario 2. Interestingly, the predicted values of I_ss in Scenario 1 were observed to be higher than in Scenario 2, leading to higher calculated values of the mean (x̄), standard deviation (s), minimum, Q1, median, Q3, and maximum. It is important to note that in Table 3, the maximum value of the predicted I_ss in Scenario 1 was 16.8 %/pF, almost twice the maximum value in the training set (9 %/pF) presented in Fig. 3 and Table 1. On the other hand, the maximum value of the predicted I_ss in Scenario 2 was comparable to the training data (approximately 9 %/pF). Thus, the range of predicted values of Scenario 2 is more practical and within the acceptable range.
The results of the global sensitivity analysis using the Sobol method [52,53] for Scenarios 1 and 2 are presented in Fig. 7. In Scenario 1, it can be observed from the first-order Sobol indices, S₁, presented in Fig. 7b, that LS exhibited the strongest first-order sensitivity. This signifies that LS has the highest single-parameter contribution to the output variance of I_ss. On the other hand, LL appears to have no first-order effects, suggesting that it has a low contribution to the variation of the predicted values of I_ss. The values of the total-order Sobol index, S_T, were checked afterwards and are shown in Fig. 7a. If the values of S_T are markedly higher than those of S₁, then higher-order interactions are likely occurring, meaning that parameter interactions contribute a fraction of the output variance. The values of S_T (Fig. 7a) were indeed higher than those of S₁; hence, the second-order indices, S₂, were calculated. PI and LS had the strongest feature interaction, followed by PL and LS, in Scenario 1, as shown in Fig. 7c. The remaining interactions can be considered insignificant.
In Scenario 2, it can be observed from S₁, presented in Fig. 7e, that %fines exhibited the strongest first-order sensitivity. This signifies that %fines has the highest single-parameter contribution to the variance of I_ss. The other features, LL, PL, PI, and LS, still appear to have substantial first-order effects on the variation of the predicted values of I_ss. The values of S₁ in Scenario 2 (Fig. 7e) were more than four times lower than in Scenario 1 (Fig. 7b). This reveals that the individual contribution of features in Scenario 2 is more distributed among the variables than in Scenario 1. The values of S_T were then computed (Fig. 7d) and were found to be substantially larger than those of S₁; hence, higher-order interactions are likely occurring. The interaction between LS and %fines had the largest values of S₂, followed by PL-%fines, PI-LS, and PI-%fines, as shown in Fig. 7f.

Model comparison
The predictive accuracy of Scenario 1 was comparable to the models of Earl [12], Reynolds [13], and Li et al. [9]. The accuracy of the developed DL model was compared to the models proposed in previous studies. The comparison used the same seeded scenarios to predict the values of I_ss with the models listed in Table 4.
The developed DL neural network of Scenario 2 performed the best among the models, with the most desirable values of slope, R², and RMSE, indicating that Scenario 2 outperformed the preceding models listed in Table 4. (Table 3: Descriptive statistics of the generated input variables using the scheme by Saltelli [51] and the predicted results using DL for Scenario 1, with features LL, PL, PI, and LS (n = 1,310,720), and Scenario 2, with features LL, PL, PI, LS, and %fines (n = 1,572,864).)

The models of Earl [12] and Reynolds [13] with LL as input had fairly acceptable values of slope, R², and RMSE. The performance of the models and the values of R² conformed to the published work of Earl [12], Reynolds [13], and Li et al. [9] when applied to the compiled dataset used in this study (shown in Appendix 1). However, the models by Jayasekera and Mohajerani [15] underperformed when applied to the test dataset of this study. This may be due to the limited data and low variance of I_ss values in their dataset; when their models were applied to the compiled dataset of this study, the predictive range became inappropriate.

Conclusion
The soil parameter I_ss is widely used in Australia to determine the potential ground surface movement. However, determining I_ss is slow, taking more than four days and involving hours of hands-on experimental work. This study developed an efficient method to estimate accurate values of I_ss through DL neural networks. Two scenarios were proposed: Scenario 1 involved the features LL, PL, PI, and LS, whilst Scenario 2 added the input feature %fines. The proposed models were further investigated using sensitivity analysis to identify the relative influence of the input features on the targeted output, I_ss. This predictive DL model may significantly reduce the waiting period for laboratory test results, which are highly sensitive to instrumental conditions, the skills and experience of the tester, and changing ambient conditions.
Results of implementing DL neural networks showed more reliable predictions in Scenario 2 (training: R² = 0.84 and slope = 0.95; testing: R² = 0.82 and slope = 0.85) than in Scenario 1 (training: R² = 0.81 and slope = 0.75; testing: R² = 0.76 and slope = 0.59). These results suggest that adding a more relevant feature can be more beneficial than adding more data samples. Furthermore, the sensitivity analysis produced a more practical range of predicted values in Scenario 2, with %fines having the highest contribution to the variance of I_ss. The values of R² in the training and testing sets using the DL architecture and process implemented in Scenario 2 were considerably higher, and the dataset had a wider variance, than those of previously conducted studies. This makes Scenario 2, the proposed model, around 10-65% more accurate than the models considered in this study for predicting I_ss. The developed DL neural network of Scenario 2 can therefore be used to estimate more accurate values of I_ss when an expedited result is required for design calculations.

Acknowledgements This work was funded in partnership with the Victorian State Government, which the authors would like to acknowledge and thank.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions. This work was funded in partnership with the Victorian State Government.
Availability of data and materials Data are available in Appendix 1.

Declarations
Conflict of interest The authors declare that there is no conflict of interest regarding the publication of this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.