Introduction

Climate change, and finding resilient solutions to mitigate its impact, is a major challenge for modern engineering. Transportation infrastructure is particularly exposed to unprecedented weather conditions such as sudden, heavy rainfall. Hydraulic earth structures, such as levees and dams, are likewise susceptible to damage [30, 50]. The primary potential failure modes include internal erosion, overtopping, and slope instability; internal erosion alone accounted for 45% of the 126 reported dam failure cases overseas [32]. The crushed rock shoulder, designed to reduce drop-off and enhance safety, is particularly affected. However, there is currently no method to rationally evaluate the erosion resistance of crushed rock shoulder materials, nor a test-based design method for selecting erosion-resistant shoulder materials.

One rational approach is to use hydrodynamics-based computer software such as FLOW-3D Hydro [17, 26, 38, 44, 51], which can estimate sediment transport with dedicated computational models. Different mechanisms, such as bedload transport, suspended load transport, entrainment, and deposition, are coupled through mass conservation. The software, however, requires many input parameters, such as particle diameter, sediment density, critical Shields number, bedload coefficient, entrainment parameter, roughness, molecular diffusion coefficient, and turbulent diffusion multiplier. Obtaining these parameters is not straightforward; it requires a solid understanding of the hydrodynamic conditions, making the software challenging to use in practice without proper training and experience.

Another common and simplified approach to describing erosion is the excess shear stress model presented in Eq. (1) [1, 2, 4, 5, 11, 15, 21,22,23,24,25, 29, 35, 36, 43, 45, 46].

$$\dot{\epsilon}_{r}={k}_{d}{\left({\tau }_{e}-{\tau }_{c}\right)}^{a}$$
(1)

where \(\dot{\epsilon}_{r}\) is the erosion rate (m/sec), \({k}_{d}\) is the erodibility coefficient (m³/N-sec), \({\tau }_{e}\) is the fluid-induced shear stress (Pa), \({\tau }_{c}\) is the critical shear stress (Pa), and “\(a\)” is an empirical exponent that depends on the soil type. The exponent “\(a\)” is suggested to be 1 for cohesive soils and 1.5 for non-cohesive soils [46]. Equation (1) is dimensionally correct only when “\(a\)” equals 1; despite this inconsistency, the equation is widely used in practice because of its simplicity. The excess shear stress model describes the erosion process with two parameters: the critical shear stress, which governs the ultimate erosion depth, and the erodibility coefficient, which governs the erosion rate. Extracting these parameters from erosion tests is not straightforward, however; it demands a data regression procedure and a thorough understanding of the testing method and the hyperbolic assumption underlying the theory. Once the critical shear stress and the erodibility coefficient are evaluated, the model can conveniently predict the erosion behavior of field soils.
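As a minimal illustration of Eq. (1), the sketch below evaluates the erosion rate once the two erosion parameters are known; the numerical values of \({k}_{d}\), \({\tau }_{c}\), and \({\tau }_{e}\) are hypothetical and serve only to demonstrate the calculation.

```python
def erosion_rate(tau_e, tau_c, k_d, a=1.0):
    """Excess shear stress model, Eq. (1): erosion rate in m/s.

    tau_e : fluid-induced shear stress (Pa)
    tau_c : critical shear stress (Pa)
    k_d   : erodibility coefficient (m^3/(N*s))
    a     : empirical exponent (1 for cohesive, 1.5 for non-cohesive soils)
    """
    excess = tau_e - tau_c
    # No erosion occurs below the critical shear stress
    return k_d * excess**a if excess > 0 else 0.0

# Hypothetical parameter values for illustration only
print(erosion_rate(tau_e=12.0, tau_c=5.0, k_d=1e-6, a=1.0))  # ~7e-6 m/s
```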

Different techniques have been developed to evaluate the erosion properties of soils, such as the traditional flume test [19, 20, 40,41,42], the Rotating Cylinder Test [34], the Hole Erosion Test [52], the Erosion Function Apparatus [12], the Jet Erosion Test [24], and the Mini Jet Erosion Test [24]. However, their inherent scale of erosion is not large enough to reproduce the erosion of shoulder rocks subjected to heavy rains.

The availability of big data and increased computational power have ushered in a new era of data-driven approaches that enhance engineering problem-solving abilities. Notably, AI-based machine learning techniques are gaining significant attention and acceptance as powerful tools to address these challenges. Machine learning models effectively handle highly nonlinear problems while revealing unknown correlations among various parameters [47, 48]. These models learn patterns between input and output data, and once these patterns are learned, they can be used to make predictions with reasonable accuracy [10]. Al-Swaidani et al. [6] employed machine-learning models to estimate the strength of problematic clayey soils treated with nano lime and nano pozzolan. They found that the ANN technique can effectively predict the California Bearing Ratio (CBR) and plasticity index of expansive clayey soils. Aregbesola and Byun [7] implemented different machine learning methodologies to classify geogrid reinforcement in stabilized and unstabilized aggregate specimens based on a few aggregate properties. Remarkably, all models achieved a minimum accuracy of 0.9 in predicting unstabilized specimens, and the results suggested that this methodology can effectively determine the type and presence of geogrid reinforcement in aggregates. A review paper by Fatehnia and Amirinia [16] explored the use of AI for predicting the load-bearing capacity of pile foundations. They argue that AI methods such as ANNs offer a significant advantage over traditional methods by providing more accurate predictions due to their ability to handle the complexities of material behavior. The ANN-based modeling approach has also been used to study soil erosion. Harris and Boardmann [27] proposed expert systems and ANNs as an alternative to traditional mathematical models for predicting erosion in the South Downs region of Sussex, England. ASCE [8] details the application of neural networks to rainfall-runoff modeling, streamflow forecasting, and reservoir operations. Licznar and Nearing [33] utilized ANNs to quantitatively predict the amount of soil erosion resulting from rainfall runoff on highway shoulders.

The erosion resistance of soils and rocks may vary due to differences in shape, size, angularity, and material composition. Therefore, employing machine learning methodologies such as ANNs to predict erosion resistance under specific conditions may offer distinct advantages over conventional testing methods, such as cost reduction, lower labor requirements, and time savings.

This study investigates the erosion resistance of commonly used highway shoulder materials through experimental testing and a machine-learning approach. Ten distinct gravel types/gradations were selected for erosion resistance testing. These tests were conducted using the large-scale University of Nebraska-Lincoln Erosion Testing Bed (UNLETB). The erosion results were classified into three performance categories: well-performing, poorly performing, and not acceptable. Because ANNs are data-hungry models, a method was devised to generate additional synthetic data within the bounds of the test results. The ANN model was then trained to predict the performance category of a rock material from parameters characterizing the gradation curve (D10, D30, D60, Cu, Cc). The accuracy of the developed model's predictions was evaluated using the hold-out method and the k-fold cross-validation method. Once trained and assessed, the model can be used to determine the suitability of rocks for erosion resistance on highway shoulders. Detailed discussions of the UNLETB system, test results, ANN training methodology, and performance verification are provided in the subsequent sections.

UNLETB and erosion test results

The idea for the University of Nebraska-Lincoln Erosion Testing Bed (UNLETB) was inspired by the University of Mississippi Erosion Testing Bed (UMETB) [27, 44], which was used to analyze the erosion of levee soils in New Orleans during Hurricane Katrina. The UNLETB concept is to capture the erosion profile under a plunging circular water jet with a waterproof video camera (GoPro 10), as shown in Fig. 1.

Fig. 1
figure 1

a Conceptual Design of UNLETB Front and Top Views, and b UNLETB After Fabrication

UNLETB consists of a large outer tank, sump pumps, PVC pipes, a sample box, a waterproof camera, and a glass plate. Considering that the flow should be sufficient to erode the particles and that the nozzle diameter should be relatively larger than the maximum particle size, the nozzle diameter and flow rate were designed to be 7.62 cm and 4000 cm³/sec, respectively. The tank is filled with water, and pumps circulate the water through the PVC nozzle as a jet onto a 20 cm × 20 cm × 20 cm sample box. The sample box has an acrylic face with 1 cm grids. A glass plate is placed atop the sample box until the flow stabilizes. Then, the camera is switched on to capture video images of the erosion process. Finally, the video images are analyzed frame by frame to determine the erosion depth at the desired time steps.

The materials for this study were selected based on the recommendations of the Nebraska Department of Transportation (NDOT), considering availability and the materials currently used in shoulders. The gradation curves and USCS (Unified Soil Classification System) symbols of the ten provided materials are shown in Fig. 2. Material names are kept as the local names provided by NDOT.

Fig. 2
figure 2

Gradation Curves of the Tested Materials

Gradation parameters such as D10, D30, and D60 represent the particle sizes corresponding to 10%, 30%, and 60% passing, respectively. Additionally, the coefficient of uniformity (Cu) and the coefficient of curvature (Cc) are commonly used in soil characterization and classification. These parameters are believed to affect the erosion of relatively large particles such as those used for highway shoulders. The gradation parameters of the tested materials are presented in Table 1; a short computational sketch of these parameters follows the table.

Table 1 Gradation parameters of the tested materials
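For reference, the gradation parameters in Table 1 follow directly from the gradation curve. The sketch below shows one way to interpolate D10, D30, and D60 (in log-size space, mirroring the logarithmic particle-size axis) and to compute Cu = D60/D10 and Cc = D30²/(D10·D60); the sieve data in the example are illustrative, not taken from the tested materials.

```python
import numpy as np

def gradation_parameters(sizes_mm, percent_finer):
    """Interpolate D10, D30, D60 from a gradation curve and compute Cu, Cc.

    sizes_mm      : particle sizes (mm), any order
    percent_finer : corresponding percent passing values
    """
    order = np.argsort(percent_finer)
    pf = np.asarray(percent_finer, dtype=float)[order]
    # Interpolate in log-size space, as gradation curves are plotted on a log axis
    log_d = np.log10(np.asarray(sizes_mm, dtype=float))[order]
    d10, d30, d60 = 10 ** np.interp([10, 30, 60], pf, log_d)
    cu = d60 / d10                # coefficient of uniformity
    cc = d30**2 / (d10 * d60)     # coefficient of curvature
    return d10, d30, d60, cu, cc

# Illustrative (hypothetical) sieve data: size in mm vs. percent finer
sizes = [25.0, 19.0, 9.5, 4.75, 2.0, 0.425, 0.075]
finer = [100, 92, 70, 48, 30, 12, 3]
print(gradation_parameters(sizes, finer))
```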

The erosion test results obtained with UNLETB for the selected materials are depicted in Fig. 3. The erosion curves are interpreted in terms of erosion depth and erosion rate. The step-like shape of the curves results from the dislodgement of roughly 1 cm rock particles, each registering as 1 cm of erosion. It is evident from Fig. 3 that while some materials, such as Gravel Surface Course, exhibit erosion depths reaching up to 20 cm, others, such as 1.5 in Rock Aggregate, show negligible erosion. The negligible erosion of that material may be attributed to the presence of large-sized crushed aggregates, which may not be suitable for highway shoulders due to potential tire damage. Based on the erosion depths depicted in Fig. 3, two groups of erosion curves can be observed. In the first group (green box in Fig. 3), the erosion depth of specimens varies between 5 and 10 cm, while in the second group (red box in Fig. 3), the erosion depths vary between 10 and 20 cm. Materials with lower erosion depths indicate higher erosion resistance, while those with higher erosion depths suggest lower erosion resistance.

Fig. 3
figure 3

Erosion Test Results from UNLETB

Accordingly, this work classifies the gradation curves into three categories according to experimental observations. The proposed categories of gradation curves are named well-performing (WP), poorly performing (PP), and not acceptable (NA) based on their erosion resistance performance, as depicted in Fig. 4.

Fig. 4
figure 4

Proposed Classification Categories

The left side of Fig. 4 illustrates the three proposed categories, with the well-performing curves enclosed in the green box, the poorly performing curves in the red box, and the not acceptable curves in the blue box. The corresponding erosion results for each box (gradation category) are highlighted in the same color in the right panel. Upon visual observation, it is evident that the erosion depths of the gradation curves within the green box remain within an acceptable range, those within the red box show excessive erosion depths, and the gradation curves within the blue box are deemed not acceptable because of their large particle size. This classification, based on the experimental data and visual observation, offers a simplified yet effective approach to categorizing the erosion performance of a material.

Synthetic data generation

Supervised machine learning methodologies rely on a rich dataset to accurately estimate the unknown function that maps inputs to outputs. The limited data from the tests presented a challenge for training the ANN model. To overcome this, a scheme is proposed to enlarge the database by generating synthetic data. The synthetic data for training the ANN model are generated by uniformly shifting an experimental gradation curve a small distance on the logarithmic scale, toward larger particle sizes (left) and toward smaller particle sizes (right), thus mimicking the gradation of a new material. The overall process of synthetic gradation curve generation is illustrated in Fig. 5.

Fig. 5
figure 5

a Synthetic Data Generation Process from a Single Curve and b All Synthetic Gradation Curves in their Respective Erosion Performance Groups

By utilizing this approach, all the original experimental gradation curves were used to create additional synthetic gradation curves, thereby increasing the diversity and quantity of the training dataset. Although a far larger number of synthetic data points could be generated with this strategy, 364 curves were deemed sufficient for training the ANN model in this work. Each synthetically generated gradation curve depicted in Fig. 5b was assigned to an erosion performance group, denoted by blue, green, and red colors representing NA, WP, and PP, respectively. The range of the synthetic gradation curves, as shown in Fig. 5b, spans from 10% to 90% percent finer because the input parameters (D10, D30, D60, Cc, Cu) required to train the ANN model fall within this range.

With knowledge of the D10, D30, D60 values of the synthetic gradation curves and their respective positions in the gradation plot (i.e., whether they fall into the well-performing (WP), poorly performing (PP), or not acceptable (NA) category), the ANN model was trained efficiently to predict erosion resistance performance.
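A minimal sketch of this shifting scheme is given below, assuming each gradation curve is stored as an array of particle sizes with fixed percent-finer values and that a multiplicative shift factor produces a uniform translation on the logarithmic size axis; the function names, step size, and example curve are hypothetical.

```python
import numpy as np

def shift_gradation_curve(sizes_mm, shift_factor):
    """Shift a gradation curve uniformly on the log size axis.

    shift_factor > 1 moves the curve toward larger particle sizes,
    shift_factor < 1 toward smaller sizes; percent-finer values are unchanged.
    """
    return np.asarray(sizes_mm, dtype=float) * shift_factor

def generate_synthetic_curves(sizes_mm, n_each_side=5, step=0.05):
    """Create synthetic curves by small shifts to both sides of the original."""
    curves = []
    for k in range(1, n_each_side + 1):
        curves.append(shift_gradation_curve(sizes_mm, (1 + step) ** k))   # coarser
        curves.append(shift_gradation_curve(sizes_mm, (1 + step) ** -k))  # finer
    return curves

# Hypothetical original curve sizes (mm); each synthetic curve inherits the
# erosion-performance label (WP, PP, NA) of the parent experimental curve.
original = [25.0, 19.0, 9.5, 4.75, 2.0, 0.425]
synthetic = generate_synthetic_curves(original)
print(len(synthetic))  # 10 synthetic curves from one experimental curve
```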

Machine learning approach

In this study, an ANN was used as a supervised machine learning model for predicting the erosion resistance behavior of different gradations of highway shoulder rocks. ANN models use interconnected nodes in a layered structure that mimics the way the human brain processes information. These models can learn from data sets and predict system behavior without prior knowledge of the input-output relationships [9, 39]. A feed-forward network trained with the backpropagation algorithm under supervised learning was implemented to predict erosion behavior. This method improves the ANN's prediction ability by continuously learning and adjusting the model based on new data through a corrective feedback system.

A multiclass classification ANN model predicting material suitability for highway shoulders was trained using the test data and the generated synthetic data. The performance of the trained ANN model in predicting erosion resistance was evaluated using the hold-out and k-fold cross-validation techniques. Additionally, the impact of different combinations of input parameters, network architectures, and other hyperparameters on the model's accuracy was tested. The overall workflow for extending the UNLETB findings by incorporating the erosion test results into an ANN-based system that accurately and conveniently predicts the erosion resistance of rocks of various gradations is shown in Fig. 6.

Fig. 6
figure 6

Flowchart for ANN-Based Erosion Resistance Prediction System

Identification of input parameters

It is crucial to include all input parameters that directly or indirectly influence the outputs of the ANN during the training process. Simultaneously, it is essential to eliminate redundant parameters from the high-dimensional dataset to reduce the dimensionality of the input space, thereby improving the predictive performance of the ANN model. This process is commonly referred to as the feature selection technique. There are five parameters (D10, D30, D60, Cu, and Cc) that characterize the gradation curve and can be used as the inputs for the ANN model in this study. Two combinations were tested in order to select the best set of input parameters. The first combination included gravel size parameters D10, D30, and D60 as the input parameters. In contrast, in the second combination, uniformity coefficient (Cu) and coefficient of curvature (Cc) were also included as inputs, in addition to D10, D30, and D60. The determination of the most effective combination for predicting erosion resistance performance will be based on the model test results.

Data preprocessing

The results from the sieve analyses, the erosion tests, and the synthetic data were collected in a database. It is generally good practice to normalize or pre-process the input features in the dataset before training to achieve better performance [53]. Pre-processing typically accelerates the learning process and balances the training across all variables by ensuring they are treated equally [18]. Therefore, all the data in the database were rescaled from their original range to a common range between 0 and 1 using the normalization equation given in Eq. (2), where Xnorm is the normalized value, Xinp is the actual input value, Xmin is the smallest value, and Xmax is the largest value in the input dataset. This way, the original distribution of the data was retained, but the scale was changed by applying a uniform scaling factor.

$${X}_{norm}= \frac{{X}_{inp}- {X}_{min}}{{X}_{max}- {X}_{min}}$$
(2)
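Eq. (2) is standard min-max scaling; a short sketch is given below, with illustrative feature values.

```python
import numpy as np

def min_max_normalize(x):
    """Rescale an input feature to the [0, 1] range, Eq. (2)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Illustrative D60 values (mm) for a handful of gradation curves
print(min_max_normalize([4.0, 9.5, 12.7, 19.0, 25.0]))
```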

The outputs of the ANN model are categorical data (WP, PP, NA), so they must be encoded to numerical values before use in training. A popular technique, one-hot encoding [37], was used, where each label (class) is mapped to a binary vector. To achieve this, each categorical value was first transformed into an integer value, and each integer value was then represented as a binary vector in which all elements are set to zero except for the element at the index corresponding to the integer value, which is set to one, as shown in Fig. 7.

Fig. 7
figure 7

One Hot Encoding of Categorical Data
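A compact sketch of the described two-step encoding (label to integer, then integer to binary vector), implemented here with plain NumPy; the class ordering WP, PP, NA is an assumption for illustration.

```python
import numpy as np

def one_hot_encode(labels, classes=("WP", "PP", "NA")):
    """Map categorical labels to integers, then to binary vectors (one-hot)."""
    index = {c: i for i, c in enumerate(classes)}   # e.g. WP -> 0, PP -> 1, NA -> 2
    encoded = np.zeros((len(labels), len(classes)))
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0            # set only the matching position
    return encoded

print(one_hot_encode(["WP", "NA", "PP"]))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```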

Model configuration

The ANN model was created in a three-step process: (1) training using 70% of the data, (2) testing using 15% of the data, and (3) validating using the remaining 15% of the data in the database. If the model meets the training criteria in the first step, it moves on to the testing step, where its performance is evaluated using a previously unseen test dataset. If the model does not meet the training criteria, it returns to the first step for additional training. The type and number of hyperparameters used in the ANN model were decided by trial and error. Burian et al. [14] suggest that the accuracy of an ANN model and its overall performance tend to improve with a decrease in the number of hidden neurons. Selecting an appropriate architecture, that is, determining the optimal number of layers and the number of nodes within each layer, is a crucial and challenging aspect of developing a neural network model. There is no standard process for determining the optimal ANN architecture. Hence, the ANN architecture (number of layers and neurons) was gradually increased, starting with a single hidden layer containing three neurons, until no further improvement in the predictions was observed. The proposed neural network used the Rectified Linear Unit (ReLU) activation function in the hidden layer, known for its speed and improved performance [3]. The output layer of this multiclass classification ANN model represents the class vector, a task facilitated by the softmax activation function. As suggested by Bridle [13], the softmax function transforms the vector of numerical outputs into probabilities. The architecture of the proposed ANN model is shown in Fig. 8.

Fig. 8
figure 8

Architecture of the Proposed Neural Network

To train the constructed ANN model, the Adam optimizer proposed by Kingma and Ba [31] was used. The Adam optimizer is based on the stochastic gradient descent method and is known for its computational efficiency and low memory requirements. The configuration parameters (alpha, beta1, beta2, and epsilon) of this optimizer were kept at their default values. The categorical cross-entropy loss function was used during training to calculate the difference between the predicted probabilities of the classification model and the actual outputs. The equation for the categorical cross-entropy loss function is presented in Eq. (3), where \({y}_{i}\) represents the actual output for a given input and \(\widehat{{y}_{i}}\) denotes the predicted probability for that input.

$$Loss= -\sum_{i=1}^{\text{output size}}{y}_{i}\,\text{log}(\widehat{{y}_{i}})$$
(3)
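The paper does not state which software framework was used to build the network. As one possible implementation, the sketch below reproduces the described configuration in Keras: three inputs (D10, D30, D60), a single ReLU hidden layer, a three-class softmax output, the Adam optimizer with default settings, and the categorical cross-entropy loss of Eq. (3). The eight hidden neurons correspond to the best-performing model reported later in the Results.

```python
from tensorflow import keras
from tensorflow.keras import layers

# One possible implementation of the described configuration (the framework used
# in the study is not stated): 3 inputs (D10, D30, D60), one ReLU hidden layer,
# and a 3-class softmax output for the WP, PP, and NA categories.
model = keras.Sequential([
    keras.Input(shape=(3,)),
    layers.Dense(8, activation="relu"),     # 8 neurons: best model per the Results
    layers.Dense(3, activation="softmax"),  # class probabilities via softmax
])

# Adam optimizer with default settings and categorical cross-entropy loss, Eq. (3)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical training call on normalized inputs and one-hot labels:
# model.fit(X_train, y_train, epochs=300, validation_split=0.15, verbose=0)
```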

Validation of trained model

Two validation techniques were used to evaluate the performance of the model. The first is the hold-out method, in which the data are divided into separate sets: one for training and the other for testing. However, this technique can sometimes lead to a biased estimate of model performance. For this reason, a second validation technique, k-fold cross-validation proposed by Jung and Hu [28], was also used. In k-fold cross-validation, the dataset is divided into k subsets. The hold-out method is then repeated k times, with each of the k subsets serving once as the test set while the remaining k-1 subsets are used for training the model. The value of k in this work was set to five, and the average performance over all k tests was calculated. Both validation techniques are illustrated in Fig. 9.

Fig. 9
figure 9

Validation Techniques Employed for ANN Model
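A brief sketch of the five-fold cross-validation loop, assuming NumPy arrays X (normalized inputs) and y (one-hot labels) and a hypothetical build_model() helper that returns a freshly compiled network such as the one sketched earlier:

```python
import numpy as np
from sklearn.model_selection import KFold

def k_fold_accuracy(build_model, X, y, k=5, epochs=300):
    """Average test accuracy over k folds.

    build_model : hypothetical helper returning a freshly compiled model
    X, y        : NumPy arrays of normalized inputs and one-hot labels
    """
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = build_model()                              # new model for each fold
        model.fit(X[train_idx], y[train_idx], epochs=epochs, verbose=0)
        _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        scores.append(acc)
    return float(np.mean(scores))                          # average performance
```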

Evaluation of ANN model

The number of accurate predictions made by an ANN model is used to assess the model's robustness in classification problems. It plays a critical role in determining the model's classification accuracy and overall reliability. For this purpose, a confusion matrix [49] is commonly used to evaluate the performance of an ANN classification model and indicate how well the model predicts the correct class label for a data set. Each row in the confusion matrix represents the actual class of the samples, and each column represents the predicted class. A confusion matrix typically contains four elements: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). A confusion matrix for two output predictions is presented in Fig. 10.

Fig. 10
figure 10

A Basic Confusion Matrix

TP represents the number of specimens correctly predicted as positive by the model and lies on the diagonal of the confusion matrix. TN refers to the number of specimens correctly predicted as negative, determined by adding up the values in all rows and columns except the row and column of the class in question. FP represents the number of specimens incorrectly predicted as positive, calculated by summing all the values in the column of that class excluding the TP value. FN refers to the number of specimens incorrectly predicted as negative, determined by adding all the values in the row of that class excluding the TP value. The TP, TN, FP, and FN values in the confusion matrix are used to evaluate the ANN model performance through various metrics, such as Accuracy, Precision, Recall, and F1 score, and to identify areas for improvement.
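The per-class counts described above can be obtained directly from a multiclass confusion matrix; the sketch below assumes rows are actual classes and columns are predicted classes, and the example values are loosely based on the averaged matrix reported later in Fig. 12.

```python
import numpy as np

def class_counts(cm):
    """Per-class TP, FP, FN, TN from a multiclass confusion matrix
    (rows = actual classes, columns = predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)              # correct predictions lie on the diagonal
    fp = cm.sum(axis=0) - tp      # column sum minus TP: wrongly predicted as this class
    fn = cm.sum(axis=1) - tp      # row sum minus TP: this class predicted as another
    tn = cm.sum() - tp - fp - fn  # everything else
    return tp, fp, fn, tn

# Illustrative values loosely based on the averaged matrix in Fig. 12 (WP, PP, NA order)
cm = [[27.0, 0.6, 0.0],
      [0.0, 25.0, 0.0],
      [0.0, 0.0, 22.0]]
print(class_counts(cm))
```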

The accuracy metric represents the overall accuracy of the multiclass classification ANN model. It is defined as the total number of correct predictions (the sum of TP over all classes) divided by the total number of predictions, which equals the sum of TP and FP over all classes. The precision metric evaluates the model's ability to identify positive cases accurately and measures the accuracy of predicting a particular class. It is calculated by dividing the number of TP predictions for a class by the sum of the TP and FP predictions. The recall metric represents the ability of the model to detect all positive cases. It is calculated as the number of TP predictions divided by the sum of the TP and FN predictions. The F1 score metric combines the precision and recall measurements into a single value. It is calculated as the harmonic mean of the precision and recall values, which gives equal weight to both measurements. A high F1 score means the model performs well in both identifying positive cases accurately and detecting all positive cases. A low F1 score, on the other hand, indicates the model needs improvement in either precision or recall. The accuracy, precision, recall, and F1 score can be calculated using Eqs. (4)-(7).

$$Accuracy=\frac{\sum \text{True Positives}}{\sum \text{True Positives}+ \sum \text{False Positives}}$$
(4)
$$Precision = \frac{\text{True Positive}}{\text{True Positive}+\text{False Positive}}$$
(5)
$$Recall= \frac{\text{True Positive}}{\text{True Positive}+\text{False Negative}}$$
(6)
$$F1\, Score= 2\times \frac{Precision\times Recall}{Precision+Recall}$$
(7)
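Using the class-based counts, Eqs. (4)-(7) can be evaluated directly; the sketch below, fed with the counts in Table 3, should reproduce the metrics reported later in Table 4.

```python
import numpy as np

def classification_metrics(tp, fp, fn):
    """Per-class precision, recall, F1 (Eqs. 5-7) and overall accuracy (Eq. 4)."""
    tp, fp, fn = (np.asarray(v, dtype=float) for v in (tp, fp, fn))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / (tp.sum() + fp.sum())   # correct predictions over all predictions
    return precision, recall, f1, accuracy

# Class-based counts from Table 3 (WP, PP, NA order)
print(classification_metrics(tp=[27, 25, 22], fp=[0, 1, 0], fn=[1, 0, 0]))
# precision ~ [1.00, 0.96, 1.00], recall ~ [0.96, 1.00, 1.00],
# F1 ~ [0.98, 0.98, 1.00], overall accuracy ~ 0.99
```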

Results

ANN training was initially set to run for 1000 iterations, but little improvement in the loss value was observed after 300 iterations. As a result, the network was trained for 300 epochs. The best-performing ANN model was determined after conducting trials with two combinations of input parameters, different numbers of hidden layer neurons, and two different validation methods. Table 2 summarizes the various models and the optimal parameter values for each.

Table 2 Various ANN models based on different parameters

As shown in Table 2, there is little difference in the accuracy values when the Cu and Cc parameters are included in the input layer. Therefore, to reduce the size of the ANN model and the associated computational time, the first input combination with three variables (D10, D30, D60) was selected. The accuracy values obtained with hold-out validation and k-fold cross-validation differed slightly, mostly because hold-out validation is susceptible to variance, especially when the dataset is small. Therefore, the results from k-fold cross-validation were accepted as an accurate estimate of the model's generalization performance because this method can better detect overfitting. Various trials were conducted using combinations of different numbers of hidden layer neurons and epochs. Initially, training started on a model with three neurons in the hidden layer trained for 32 epochs. Using this model, an accuracy of approximately 67% was achieved. The model's performance improved as the number of neurons and epochs increased. The model with eight neurons in the hidden layer, trained for 300 epochs, showed the best performance among all the models. Therefore, the architecture of the best-performing ANN model consisted of an input layer with three input parameters (D10, D30, D60), one hidden layer with eight neurons, and one output layer with three outputs. The accuracy of this ANN model reached 99% on both the training and testing data, indicating that the model does not suffer from overfitting. As a result, the overall relative improvement in the accuracy of the model was about 48% compared with the initial model. Using the ANN model, the erosion resistance behavior of rocks can be predicted with 99% accuracy based on the D10, D30, and D60 values obtained from a sieve analysis test. Figure 11 illustrates the improvement in the accuracy of the best trained model as the number of iterations increases.

Fig. 11
figure 11

Accuracy vs. Epochs of the Trained Model

Performance evaluation

The average confusion matrix of the validation dataset using k-fold cross-validation is shown in Fig. 12. This matrix compares the actual target values (true labels) with those predicted by the ANN model and gives a holistic view of how well the classification model works and what types of mistakes are present. The main diagonal cells (top left to bottom right) represent the average number of correctly predicted outputs, while the off-diagonal cells represent the average number of wrongly predicted outputs across all five folds. As is clearly visible from Fig. 12, most of the off-diagonal values are zero, indicating that the ANN model exhibits excellent classification capabilities. Out of 75 instances in the validation dataset, 74 (27 + 25 + 22) correct predictions were made, as presented on the main diagonal. The cell in the first row and second column indicates that, on average, 0.6 samples (roughly one sample) were incorrectly predicted as belonging to the PP class when, in reality, they belong to the WP class. The model accurately predicted the rest of the instances in the PP and NA classes.

Fig. 12
figure 12

Average Confusion Matrix

The data in the confusion matrix can be used to evaluate the model performance, first by calculating the matrix elements, i.e., TP, TN, FP, and FN, and then by calculating various metrics based on Eqs. (4)-(7). The values of these confusion matrix elements are presented in Table 3.

Table 3 Class-based true positives, true negatives, false positives, and false negatives

The TP values for the WP, PP, and NA classes from Table 3 indicate that the model made accurate predictions of 27 instances being positive for the WP class, 25 instances being positive for the PP class, and 22 instances being positive for the NA class. The TN value for the WP class was 45, meaning the model correctly identified 45 instances as not belonging to the WP class. Similarly, the TN values for the PP and NA classes were identified as 49 and 53, respectively. The FP values for the WP, PP, and NA classes were calculated as 0, 1, and 0, respectively, meaning the model made one incorrect prediction for the PP class, labeling an instance as positive (i.e., belonging to the PP class) when actually it was negative (i.e., not belonging to the PP class). The FN value of 1 for the WP class was identified, meaning the model incorrectly predicted an instance in the WP class as negative (i.e., not belonging to the WP class) when, in fact, it was positive (i.e., belonging to the WP class).

The performance of the ANN model was quantified using the evaluation metrics given in Eqs. (4)-(7): the overall accuracy and the class-based precision, recall, and F1 score. The values of these calculated metrics are presented in Table 4.

Table 4 Class-based precision, recall, F1 score values and overall accuracy

The precision values for the WP, PP, and NA classes were determined as 1, 0.96, and 1, respectively. This indicates that 100% of the instances predicted as WP or NA were correct, while 96% of the instances predicted as PP were correct. The recall values for the WP, PP, and NA classes were 0.96, 1, and 1, respectively, meaning that 96% of the actual WP cases were correctly identified by the model, while all actual PP and NA cases were identified, indicating perfect recall for those classes. The F1 scores for the WP, PP, and NA classes were 0.98, 0.98, and 1, respectively. An F1 score of 0.98 for WP and PP suggests that the model achieved a good balance between precision and recall, while the perfect F1 score of 1 for NA indicates perfect precision and recall for that class. Finally, the overall accuracy of the ANN model was calculated to be 99%, confirming that 99% of all predictions were correct.

Conclusion

In this paper, the erosion resistance of highway shoulder rocks was evaluated based on experimental studies conducted using the large-scale UNLETB. An ANN multiclass classification model was developed to enable convenient prediction of the erosion performance of shoulder rocks without requiring specialized testing equipment. The model was trained with a successful strategy for generating synthetic data, with the aim of categorizing the erosion performance of rock materials into three groups: Well Performing (WP), Poorly Performing (PP), and Not Acceptable (NA). The best-performing ANN model was obtained after testing various combinations of input parameters, model architectures, training iterations, and validation techniques. The performance of the trained model was assessed using evaluation metrics such as class-based precision, recall, F1 score, and overall accuracy. Based on the results obtained from this study, the following conclusions can be drawn:

  • The ANN model achieved 99% accuracy in its predictions and successfully distinguished the different erosion behaviors of shoulder materials across the three performance groups (WP, PP, and NA) based on information from the gradation curves. Extensive testing of the model's performance using various evaluation methodologies yielded exceptionally favorable outcomes.

  • The successful implementation of the ANN classification model, combined with its ability to accurately categorize erosion into three groups, highlights the potential for the application of machine learning techniques in solving complex problems in the field of geotechnical engineering.

  • This work provides valuable insights into the behavior of shoulder rocks under erosion and can support engineers and researchers in making informed decisions regarding shoulder material selection where erosion resistance is required.