1 Introduction

In general, excavating underground spaces may decrease the load-bearing capacity of the surrounding rocks or walls and create a void as displaced rocks move into the excavated cavity (Galvin 2016). Ground at risk of caving must be permanently or temporarily reinforced by applying tendons. Among all types of tendons, rock bolting systems, categorized as both active and passive reinforcement systems, are more practical for avoiding further collapse in the roofs or walls of excavated spaces (Cao et al. 2012; Galvin 2016; Tincelin 1991). In this type of reinforcing system, rock bolts are generally inserted into a borehole drilled into unstable rocks (Blanco Martín et al. 2011). After extending through the unstable zone, they are anchored into the stable areas using a fixture to stabilize the fracture zone. This mechanism transfers loads from the surrounding rocks, which have lower tensile strength, to the rock bolts (Ho et al. 2019; Thompson et al. 2012). Due to its simplicity, availability of materials, ease of installation in the field, and lower cost, the fully grouted rock bolting system is the most popular rock bolting system and is extensively applied in many fields of mining, civil, and geotechnical engineering (Blanco Martín et al. 2011; Li 2017; Thenevin et al. 2017; Jodeiri Shokri et al. 2024). Unlike other ground support techniques for underground spaces, such as U-shaped steel, steel mesh, and hydraulic props, which passively resist rock mass deformation around the tunnel perimeter, this continuously mechanically coupled (CMC) rock bolt system actively interacts with the surrounding rocks. The bolts and grout, typically cement or resin, act as internal reinforcement, creating a composite structure that behaves more like a solid mass and reduces stress concentrations around the excavation (Blanco Martín et al. 2011; Feng et al. 2017; Nourizadeh et al. 2023a, b).

Pull-out tests, in which a bolt is grouted into a rock sample and then steadily pulled out, are generally performed to determine the failure or maximum pull-out capacity and displacement values of a fully grouted rock bolting system (Che et al. 2020). For several decades, the pull-out test has been conducted in situ or in the laboratory to study the axial load transfer mechanism of fully grouted rock bolts (Aziz et al. 2017; Aziz et al. 2020; Blanco Martín et al. 2011; Che et al. 2020; Chen et al. 2018; Chen et al. 2021; Chen et al. 2019; Entezam et al. 2023; Gregor 2023a, b; Hao et al. 2020; Høien et al. 2021; Jodeiri Shokri et al. 2023; Kılıc et al. 2002; Li et al. 2016; Moosavi et al. 2005; Motallebiyan et al. 2023; Nemcik et al. 2014; Salcher and Bertuzzi 2018; Thenevin et al. 2017; Yu et al. 2022). For instance, Kılıc et al. (2002) conducted 80 laboratory fully grouted pull-out tests on basalt blocks to suggest new empirical relationships between grouting materials and untensioned fully grouted rock bolts. They also presented several relationships for calculating the maximum pull-out capacity based on different influential grout properties, such as the shear strength, the uniaxial compressive strength (UCS), and the curing time (CT), and on bolt profile specifications, such as length, diameter, and bonding area. Moosavi et al. (2005) presented some empirical relationships based on confining pressure and the maximum pull-out capacity of the pull-out test. Blanco Martín et al. (2011) predicted the mechanical behavior of fully grouted rock bolts with a new analytical approach, in which the bolt radius, Young's modulus, and displacement of the free end of the bolt were considered as input parameters. Eventually, they developed a closed-form solution for predicting the load vs. displacement curve resulting from pull-out tests. Feng et al. (2017) designed and performed a series of pull-out tests to find a laboratory solution for interpreting the mechanical behavior of rock bolting systems in strata layers by applying different segmented steel tubes. They found that the bolt installed in the strata layers was more durable and stable. Thenevin et al. (2017) provided a comprehensive dataset of laboratory pull-out tests considering various rebar diameters and grout types under constant radial stiffness or confining pressure boundary conditions. Che et al. (2020) investigated a series of comprehensive pull-out tests in resin-forced soft rocks under various conditions. The tests were analyzed with the discrete element method (DEM), considering a new micro-bond contact model. Chen et al. (2019) developed a new analytical model based on the ratio of the bolt to grout. Giot et al. (2019) carried out several in situ pull-out tests in claystone to investigate the load transfer mechanism of fully grouted rock bolts. The bolts were instrumented with strain gauges along their lengths. The results showed increasing axial strains from the head to the far end of the bolts. Entezam et al. (2023) suggested that the UCS values of grout and the axial-bearing capacities increased when a small proportion of the grout mixture was replaced with fly ash. Motallebiyan et al. (2023) found that the ultimate bearing capacity increased with increasing rib spacing. In other research, Jodeiri Shokri et al. (2023) concluded that the confinement diameter directly affected the maximum peak load values of pull-out tests in the fully grouted rock bolting system.

As the literature review reveals, measuring the axial bearing capacity of fully grouted rock bolting systems through a pull-out test is critical for ensuring their safety, stability, and overall performance. Indeed, accurate predictions underpin the structural integrity of various engineering projects, from underground mining to geotechnical engineering. By accurately forecasting maximum peak load (PL) and displacement (DP) values using machine-learning (ML) methods, researchers and engineers can identify potential failures in reinforcement systems in advance, saving time and reducing costs. In addition, precise predictions contribute to optimizing design parameters, ensuring the efficient use of resources and materials in various projects. Therefore, the accurate prediction of these parameters in pull-out tests is not only essential for the immediate safety of the structures but also contributes to the long-term sustainability and resilience of the built environment.

Moreover, the literature review revealed that although some valuable relationships have been suggested for investigating the fully grouted pull-out test, artificial intelligence (AI) techniques have not yet been employed to predict the most influential parameters, including the peak load of the test and the corresponding displacement. Accordingly, the main objectives of this paper are:

  (a) To present and develop the most recent AI techniques, such as multiple linear regression (MLR), multi-layer perceptron neural network (MLPNN), Bayesian regularized neural network (BRNN), generalized feed-forward neural network (GFFNN), extreme gradient boosting (XGBoost), and ensemble soft computing models, as powerful tools for investigating the pull-out test;

  (b) To build a dataset for predicting the axial-bearing capacity of the fully grouted rock bolting system;

  (c) To find the effects of the most crucial parameters, such as CT and water-to-grout ratio (W/G), on the maximum capacity of the pull-out test.

2 Materials and Methods

2.1 Data measurements

To determine the maximum peak load and displacement values of the rock bolt system, 34 small-scale fully grouted pull-out tests with various water-to-grout ratios (W/G) and CT values were conducted after preparing the required samples.

2.2 Designing fully grouted pull-out tests

2.2.1 Required materials

Cementitious grout (Stratabinder HS), 16 mm diameter rebars, 50 mm diameter and 50 mm long steel pipes, a casting board base, and a tensile machine attachment were chosen as the required materials (Fig. 1). A base with a top layer of polymethyl methacrylate, with 53 mm holes regularly spaced across it, was prepared for casting the required samples. A medium-density fibreboard (MDF) with a melamine laminate overlay was used as the second layer, with 18 mm holes to locate the rebars (Fig. 1d). In addition, an attachment made from a square steel tube, a steel bolt, and a nut was designed for the tensile testing machine (Fig. 1e).

Fig. 1 The materials used for the required pull-out tests, including (a) Stratabinder, (b) rebars, (c) steel pipe, (d) casting board, and (e) attachment for the tensile machine

2.2.2 Sample preparation

As mentioned before, two independent variables, W/G and CT, were considered as input data. The W/G values were 30%, 35%, and 40%. After the grouts were prepared, the samples were carefully cured for 7, 14, 21, and 28 days (Fig. 2). Figure 3 shows the prepared samples after casting them on a base.

Fig. 2 (a) and (b) A view of one of the pull-out samples; (c) procedure of casting samples (Nourizadeh et al. 2021)

Fig. 3 A view of the pull-out test

2.2.3 Pull-out tests

After preparing and casting the required samples, a series of comprehensive pull-out tests was conducted using a tensile testing machine (MTS Insight® electromechanical testing system) at the University of Southern Queensland (UniSQ). The loading rate was 1 mm/min. To better simulate the actual conditions of a rock bolt system in underground spaces, the steel pipe acted as the confining rock, while the grout and rebar together represented the rock bolt system itself. That is, the confining pressure of the surrounding rocks was assumed to be provided by the hoop strength of the pipe. Additionally, the tensile testing machine simulated the external forces acting on the rock bolt by applying a pulling force along its axis. A general view of the pull-out tests and prepared samples is shown in Fig. 3.

3 Overview of employed AI techniques

3.1 Artificial neural network

Artificial neural networks (ANNs) are an AI technique inspired by biological neural networks (Jodeiri Shokri et al. 2013). The data for an ANN are divided into two primary datasets: the training dataset, comprising 80% of the data, and the testing dataset, comprising the rest. ANNs are a strong alternative to statistical methods such as MLR and multiple non-linear regression (MNLR) analyses because they can identify similarities among inputs, which improves the interpolation of uncertain data. Three ANN methods, namely the multi-layer perceptron neural network (MLPNN), Bayesian regularized neural network (BRNN), and generalized feed-forward neural network (GFFNN), were used in this study.

3.1.1 MLPNN

The MLP technique consists of three main layers: input, hidden, and output. Data pass through all layers in the form of signals. Generally, the number of input neurons equals the number of independent variables, while the output neuron(s) represent the dependent variable(s). The number of hidden neurons is determined from previous experience and a trial-and-error procedure. Assigning too many hidden neurons may cause overfitting and may also increase the network processing time. Figure 4 shows the general structure of the MLP. The MLP computes the output value by Eq. 1 (Bakhtavar et al. 2021):

$$ y_{i} = f_{i} \left( {\sum\limits_{j = 1}^{n} {w_{ij} x_{j} } + b_{i} } \right) $$
(1)

where \(x_{j}\) and \(y_{i}\) are the input and output values, respectively, \(w_{ij}\) denotes the weights, \(b_{i}\) denotes the bias values, and \(f_{i}\) is the activation (transfer) function.
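As a minimal sketch of how Eq. 1 is applied layer by layer, the following Python snippet evaluates a hypothetical 2–3–1 network. The weights, biases, and input values are made-up numbers chosen only for illustration, and the function names `logsig` and `layer_output` are assumptions of this sketch rather than part of the study's implementation.

```python
import math

def logsig(z):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_output(x, weights, biases, f):
    """Apply Eq. 1 to one layer: y_i = f(sum_j w_ij * x_j + b_i)."""
    return [f(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(weights, biases)]

# Hypothetical 2-3-1 network; all numbers below are illustrative only.
x = [0.5, 0.25]                                  # two normalized inputs
W1 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.8]]      # hidden-layer weights
b1 = [0.0, 0.1, -0.1]                            # hidden-layer biases
W2 = [[0.5, -0.6, 0.9]]                          # output-layer weights
b2 = [0.05]                                      # output-layer bias

hidden = layer_output(x, W1, b1, logsig)
y = layer_output(hidden, W2, b2, lambda z: z)    # linear output neuron
print(hidden, y)
```

Each hidden neuron output lies between 0 and 1 because of the sigmoid, while the linear output neuron is unbounded.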

Fig. 4 The general architecture of the MLP model

3.1.2 BRNN

In 1992, MacKay proposed the Bayesian regularized algorithm for tackling issues such as finding the optimal hidden layers when designing ANN structures. He applied Bayes' theorem to the regularization procedure (MacKay 1992). The BRNN is a type of backpropagation neural network that combines the conventional sum-of-squares error function with a weight penalty term, as follows (Fig. 5) (Bui 2012):

$$ E_{D} = \sum\limits_{i = 1}^{n} {\left( {y_{i} - t_{i} } \right)^{2} = } \sum\limits_{i = 1}^{n} {\left( {e_{i} } \right)^{2} } $$
(2)
$$ S\left( w \right) = \beta E_{D} + \alpha E_{W} $$
(3)
$$ E_{W} = \sum\limits_{i = 1}^{m} {w_{i}^{2} } $$
(4)

where n is the number of training data, yi is the network output, and ti is the target value. α and β are hyperparameters (regularization parameters), EW is the penalty term (which penalizes large weight values), m is the number of weights, and S(w) is the performance function of the network.
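The performance function in Eqs. 2–4 can be sketched in a few lines of Python. The function name `brnn_objective` and all numerical values are hypothetical and serve only to show how the data error and the weight penalty are combined.

```python
def brnn_objective(predictions, targets, weights, alpha, beta):
    """Return (E_D, E_W, S) following Eqs. 2-4."""
    e_d = sum((y - t) ** 2 for y, t in zip(predictions, targets))  # Eq. 2
    e_w = sum(w ** 2 for w in weights)                             # Eq. 4
    s = beta * e_d + alpha * e_w                                   # Eq. 3
    return e_d, e_w, s

# Made-up values for illustration only.
e_d, e_w, s = brnn_objective(
    predictions=[1.0, 2.0, 3.0],
    targets=[1.5, 2.0, 2.0],
    weights=[0.5, -1.0, 2.0],
    alpha=0.1,
    beta=1.0,
)
print(e_d, e_w, s)
```

A larger α relative to β penalizes large weights more strongly, which is how the regularization limits the effective complexity of the network.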

Fig. 5 Topology of the BRNN model

3.1.3 GFFNN

GFFNNs are generalizations of MLPNNs in which connections can jump over one or more layers. In practice, the GFFNN often handles complex problems considerably more efficiently than the MLPNN. For example, an MLPNN may require hundreds of times more training epochs, whereas a GFFNN needs only a few training epochs (Fig. 6) (Abbaszadeh Shahri and Asheghi 2018).

Fig. 6 A view of the general GFFNN architecture

3.1.4 XGBoost

In 2015, Chen and Guestrin proposed the XGBoost method to solve classification and regression problems (Chen and Guestrin 2016). This technique builds boosted trees efficiently and in parallel (Bhattacharya et al. 2020; Duan et al. 2020; Nguyen et al. 2019; Ren 2017; Zhang and Zhan 2017). The XGBoost model employs gradient boosting (GB) to minimize an objective function (OF). Optimizing the value of the OF, which can be done with different optimization techniques, is the core of the XGBoost method (Nguyen et al. 2019; Duan et al. 2020). The OF defined in XGBoost consists of two main elements, i.e., training loss (L) and regularization (Ω) (Eq. 5) (Zhao et al. 2023):

$$ OF\left( \theta \right) = L\left( \theta \right) + \Omega \left( \theta \right) $$
(5)

The training loss measures how well the model fits the training dataset. The regularization term controls overfitting by penalizing model complexity. In this regard, the complexity of the boosted trees is expressed by Eq. 6 (Zhao et al. 2023):

$$ \Omega \left( f \right) = \left( {\gamma \cdot n} \right) + \frac{1}{2}\lambda \cdot \sum\limits_{j = 1}^{n} {\left( {\omega_{j}^{2} } \right)} $$
(6)

where γ is the complexity cost of each leaf, n is the number of leaves, λ scales the penalty, and \(\omega_{j}\) is the score on leaf j. Notably, the structure score of XGBoost is the OF formulated as Eqs. 7 and 8 (Zhao et al. 2023):

$$ OF = \sum\limits_{j = 1}^{n} q + \left( {\gamma \cdot n} \right) $$
(7)
$$ q = \left( {G_{j} \cdot \omega_{j} } \right) + \left( {\frac{1}{2}\left[ {H_{j} + \lambda } \right]\omega_{j}^{2} } \right) $$
(8)

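where the \(\omega_{j}\) are independent of one another, and q is a quadratic form in \(\omega_{j}\) from which the best \(\omega_{j}\) for a given tree structure can be obtained. Gj and Hj are the sums of the first and second derivatives of the MSE loss function over the samples in leaf j, respectively.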
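To illustrate Eqs. 7 and 8, the sketch below evaluates q at the leaf score that minimizes it. The minimizer \(\omega_{j} = -G_{j}/(H_{j} + \lambda)\) is not stated in the text; it follows from setting the derivative of the quadratic in Eq. 8 to zero and is used here as an assumption. The two-leaf tree, the gradients, and the function names are hypothetical.

```python
def leaf_terms(gradients, hessians, lam):
    """Sum per-sample derivatives into G_j and H_j, then minimize q (Eq. 8)."""
    G = sum(gradients)
    H = sum(hessians)
    omega = -G / (H + lam)                         # minimizer of the quadratic q
    q = G * omega + 0.5 * (H + lam) * omega ** 2   # Eq. 8 evaluated at omega
    return G, H, omega, q

def objective(leaves, gamma, lam):
    """Eq. 7: sum of q over all leaves plus gamma times the number of leaves."""
    return sum(leaf_terms(g, h, lam)[3] for g, h in leaves) + gamma * len(leaves)

# Hypothetical two-leaf tree. For the squared-error loss 0.5 * (pred - y)^2,
# the gradient is (pred - y) and the Hessian is 1 for every sample.
leaf_a = ([-2.0, -1.0], [1.0, 1.0])
leaf_b = ([3.0], [1.0])

print(leaf_terms(*leaf_a, lam=1.0))
print(objective([leaf_a, leaf_b], gamma=0.5, lam=1.0))
```

A lower (more negative) objective indicates a better tree structure, while the γ term adds a fixed cost for every extra leaf.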

3.2 Ensemble model

This research also implemented ensemble learning models to predict PL and DP values resulting from the pull-out test. For this, several base models (sub-models) were integrated to build an ensemble learning model. Four approaches can be used to integrate sub-models: (a) simple averaging ensemble (SAE), (b) weighted averaging ensemble (WAE), (c) integrated stacking ensemble, and (d) separate stacking ensemble models. In addition, super learner ensembles can be implemented using bagging and boosting techniques.

The selection of diverse base models may improve the overall predictive performance of the ensemble technique. For this, base models with reliable performance are combined to provide a more comprehensive analysis. This technique helps to overcome the limitations of individual models while allowing the ensemble to exploit the collective strengths of the base models. Among the base models, those with better performance, as indicated by higher values of the evaluation criteria, are chosen. Opting for high-performance models helps ensure that the ensemble maintains a high level of accuracy and generalizability compared to any single model. Selecting and combining base models in this way may reduce the variance associated with individual models, minimize the impact of noisy data points, and mitigate the risk of overfitting, leading to more stable and consistent predictions. Furthermore, ensemble learning techniques are versatile and can be applied to different data and ML tasks, including regression, classification, and clustering.

3.2.1 Sub-models

Several AI techniques were employed to develop a set of base models. Each base model had a unique structure or architecture. These base models were then averaged to create a final, more robust model. For instance, in developing an ANN model, n basic MLP models with different hidden layers, transfer functions, and optimisers can be built. The outputs of these base models can then be combined using simple averaging, weighted averaging, and integrated and separate stacking techniques.

3.2.2 Averaging techniques

3.2.2.1 SAE

The SAE technique generates the final result by averaging the outputs of all sub-models with equal weights (Fig. 7).

Fig. 7 The SAE procedure
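A minimal sketch of simple averaging is given here. The predictions are hypothetical peak loads from three sub-models for two samples, and the function name `simple_average` is an assumption of this sketch.

```python
def simple_average(predictions_by_model):
    """Average sub-model predictions sample by sample with equal weights."""
    n_models = len(predictions_by_model)
    return [sum(values) / n_models for values in zip(*predictions_by_model)]

# Hypothetical peak-load predictions (kN): one row per sub-model.
preds = [
    [40.0, 30.0],
    [42.0, 33.0],
    [44.0, 36.0],
]
sae = simple_average(preds)
print(sae)
```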

3.2.2.2 WAE

Whereas the SAE technique assigns equal weights to all sub-models, the WAE technique (Fig. 8) assigns a separate weight to each sub-model, which can improve the results, and combines the outputs of all base models as a weighted average. Optimization algorithms can be employed to determine the sub-model weights.

Fig. 8 Diagram of the WAE-DE hybrid algorithm
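The weighted combination itself can be sketched as follows. The weights are supplied by hand and normalized to sum to one; in practice they would come from an optimization algorithm, which is not implemented here. The function name and all values are hypothetical.

```python
def weighted_average(predictions_by_model, weights):
    """Combine sub-model predictions using normalized, non-negative weights."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and not all zero")
    norm = [w / total for w in weights]
    return [sum(w * v for w, v in zip(norm, values))
            for values in zip(*predictions_by_model)]

# Hypothetical predictions (kN) from two sub-models for two samples.
preds = [
    [40.0, 30.0],
    [44.0, 36.0],
]
wae = weighted_average(preds, weights=[3.0, 1.0])  # first model trusted more
print(wae)
```

With equal weights the result reduces to the simple average, so the SAE is a special case of the WAE.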

3.2.2.3 Stacking ensemble

Another method for combining sub-model results is the stacking ensemble (Fig. 9), in which a meta-learner combines a large number of sub-models into a single model. In other words, the base models are trained first, and a meta-learner model is then trained on their predictions. This process can considerably improve the results. In this paper, two stacking ensembles were considered: (a) the integrated stacking ensemble (ISE) and (b) the separate stacking ensemble (SSE).

Fig. 9 The stacking ensemble method

3.2.2.4 Super learner

The super learner (SL) is an ensemble technique that applies stacked generalization with k-fold cross-validation, and it is therefore categorized among the cross-validation ensembles. The k-fold splits of the dataset are fed to each sub-model, and a meta-model is then fitted to the out-of-fold predictions of the base models. Figure 10 shows the SL technique. After dividing the dataset into training and testing datasets, the training dataset was divided into ten folds for cross-validation. Subsequently, k-fold splitting was applied to evaluate all sub-models, and the prediction results obtained from each model were recorded.

Fig. 10 Flowchart of the SL technique
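The SL idea can be sketched with two deliberately simple base models and a one-parameter meta-model. Everything in this sketch is an assumption made for illustration: the base models (a mean predictor and a straight-line fit), the interleaved fold assignment (instead of a random split), the grid search over a convex weight, and the synthetic data, which are not the measurements of this study.

```python
def fit_mean(xs, ys):
    """Base model 1: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Base model 2: least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def out_of_fold(fit, xs, ys, k):
    """Predict every sample with a model trained on the other k-1 folds."""
    oof = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            oof[i] = model(xs[i])
    return oof

def super_learner(xs, ys, k=10):
    """Fit a convex-combination meta-model on out-of-fold predictions."""
    oof_mean = out_of_fold(fit_mean, xs, ys, k)
    oof_line = out_of_fold(fit_line, xs, ys, k)

    def sse(a):
        return sum((a * l + (1 - a) * m - y) ** 2
                   for l, m, y in zip(oof_line, oof_mean, ys))

    alpha = min((a / 20 for a in range(21)), key=sse)  # weight of the line model
    mean_model, line_model = fit_mean(xs, ys), fit_line(xs, ys)
    return alpha, (lambda x: alpha * line_model(x) + (1 - alpha) * mean_model(x))

# Synthetic, exactly linear data used only to exercise the code.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
alpha, predict = super_learner(xs, ys)
print(alpha, predict(25.0))
```

Because the synthetic data are exactly linear, the meta-model places all weight on the straight-line base model; with noisy data the weight would generally fall between 0 and 1.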

3.3 Data analysis

3.3.1 Data presentation

A comprehensive statistical analysis was conducted on the dataset. Two effective parameters were identified as model inputs to predict the maximum peak load and displacement values resulting from pull-out tests on fully cementitious grouted rock bolts. Thirty-four data points with different W/G and CT values were measured. Descriptive statistics of the input and output parameters are given in Table 1. The W/G values varied between 30 and 40%, while CT ranged from 7 to 28 days. The measured maximum PL and DP values were 23.65–53.67 kN and 4.98–12.24 mm, respectively. The boxplots of the effective parameters are illustrated in Fig. 11, which indicates that neither the median nor the equivalence line lies in the centre of the boxes and that the available data are not symmetric. Notably, outliers were found only for displacement. Outliers can cause adverse relationships between the variables and, subsequently, create natural groups in the dataset. Thus, investigating the data to detect outliers and natural groups facilitates the development of predictive models by providing a more homogeneous dataset. Figure 12 shows the Pearson correlations. As seen, both W/G and CT negatively affected the displacement values. Also, the W/G parameter negatively affected the peak load values. The violin and histogram plots of the effective parameters are shown in Figs. 13 and 14, respectively.

Table 1 Descriptive statistics of effective parameters
Fig. 11 Box plot of effective parameters

Fig. 12 Pearson correlation between effective parameters

Fig. 13 Violin plot of effective parameters, including W/G, CT, PL, and DP

Fig. 14 Histogram plot of effective parameters

3.4 Pre-analysis and model evaluation

To facilitate model development for peak load and displacement, the measured data were normalized before being imported into the AI models. This normalization simplifies the modeling process. The available data were normalized using the min–max method, which rescales the data to the range 0 to 1 based on Eq. 9 (Hosseini et al. 2022c):

$$ X_{n} = \frac{{X_{m} - X_{min} }}{{X_{max} - X_{min} }} $$
(9)

where Xn signifies the normalized value of the variable, Xmax and Xmin represent its maximum and minimum values, and Xm is its measured value.
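Eq. 9 can be illustrated with the W/G levels (30, 35, and 40%) and curing times (7, 14, 21, and 28 days) used in this study. The function name `min_max_normalize` is an assumption of this sketch.

```python
def min_max_normalize(values):
    """Rescale values to the range [0, 1] following Eq. 9."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]

wg_levels = [30, 35, 40]        # water-to-grout ratios (%)
curing_times = [7, 14, 21, 28]  # curing times (days)

print(min_max_normalize(wg_levels))
print(min_max_normalize(curing_times))
```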

Afterwards, training and testing datasets were randomly determined. Twenty-four data (80%) were considered for model training, while the rest (20%) were used for model testing. This split was based on the suggestions of other researchers (Hosseini et al. 2022a, b; Wang et al. 2023a, b). The normalized datasets were then used to develop the MLPNN, BRNN, GFFNN, and XGBoost models. The model parameters were tuned to obtain the highest prediction accuracy and performance.

To improve the generalization efficiency of the ANN models, several hyperparameters were carefully adjusted during model training. For this, a k-fold cross-validation procedure was used to analyse the generalization efficiency of the trained models (Qi et al. 2018). Kohavi (1995) proposed the ten-fold cross-validation method to provide an optimized trained model. This method splits the training dataset into ten folds (subsets), of which one fold is used for validation and the remaining folds are used for training (Lin et al. 2018). The procedure is repeated ten times so that each fold serves once as the validation set. The overall performance of the selected hyperparameters is calculated by averaging the performances of the ten iterations.
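The ten-fold procedure can be sketched as a generic routine that splits sample indices and averages a user-supplied score over the folds. The function names, the fixed random seed, and the placeholder scoring function are assumptions of this sketch; a real scoring function would train a model on the training indices and return its validation error.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(score_fn, n_samples, k=10):
    """Average score_fn(train_idx, val_idx) over the k train/validation splits."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(score_fn(train_idx, val_idx))
    return sum(scores) / k

# Split 24 training samples into ten folds and inspect the fold sizes.
folds = k_fold_indices(24, 10)
print([len(f) for f in folds])

# Placeholder score: the validation-fold size, averaged over the ten folds.
print(cross_validate(lambda train_idx, val_idx: len(val_idx), 24, 10))
```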

In this study, various AI models were developed, and their optimal structures were determined by employing evaluation indices of accuracy and performance. In the third step, three evaluation criteria, namely the coefficient of determination (R-squared), variance accounted for (VAF), and root mean square error (RMSE), were used to assess the efficiency of the developed models (Eqs. 10–12) (Dehghani et al. 2021; Hosseini et al. 2022a, b; Shamsi et al. 2021).

$$ R^{2} = 1 - \left( {\frac{{\sum\limits_{i = 1}^{n} {(O_{i} - P_{i} )^{2} } }}{{\sum\limits_{i = 1}^{n} {(P_{i} - \overline{P}_{i} )^{2} } }}} \right) $$
(10)
$$ RMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {(O_{i} - P_{i} )^{2} } } $$
(11)
$$ VAF = 100 \cdot \left( {1 - \frac{{var(O_{i} - P_{i} )}}{{var(O_{i} )}}} \right) $$
(12)

where Oi signifies the measured value, Pi indicates the predicted value, \(\overline{P}_{i}\) denotes the average of the predicted values, and n is the number of data. Values of one, zero, and 100 for R2, RMSE, and VAF, respectively, indicate a model with the highest performance and accuracy.
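The three criteria can be computed with a few lines of plain Python, as sketched here on made-up measured and predicted values. Note that the denominator of R2 in this sketch is taken about the mean of the measured values, which is the common convention and an assumption of this example; the function names are likewise hypothetical.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def _var(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def r_squared(observed, predicted):
    """Coefficient of determination (denominator about the observed mean)."""
    mean_obs = _mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error (Eq. 11)."""
    return math.sqrt(_mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))

def vaf(observed, predicted):
    """Variance accounted for, in percent (Eq. 12)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return 100.0 * (1.0 - _var(residuals) / _var(observed))

# Made-up values for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(r_squared(observed, predicted), rmse(observed, predicted), vaf(observed, predicted))
```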

In the next step, the final rating of the model (FRM) and the color intensity system (CIS) were applied to compare and assess the performance of the developed models. In the FRM procedure, the R2, RMSE, and VAF values were rated. The highest rate was assigned to the model with the highest R2 and VAF values and the lowest RMSE value. The highest possible rate equals the number of models; for instance, the best model has a rate of 10 if there are 10 models. Eq. 13 was used to formulate the FRM rating system (Hosseini 2023; Hosseini et al. 2022c).

$$ FRM = \sum\limits_{i = 1}^{2} {\left( {r_{i}^{{R^{2} }} + r_{i}^{RMSE} + r_{i}^{VAF} } \right)} $$
(13)

where ri indicates the rate of each evaluation criterion, and i equals 1 for the training rates and 2 for the testing rates of the evaluation indicators.
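The rating in Eq. 13 can be sketched as follows. The three-model metric table is hypothetical, the function names are assumptions, and ties are broken by model order, which is a simplification of this sketch. The final lines rate the testing RMSE values listed for the MLPNN peak-load models in Sect. 4.2.

```python
def rate(values, higher_is_better):
    """Rate models from 1 (worst) to n (best) on a single criterion."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    rates = [0] * len(values)
    for r, i in enumerate(order, start=1):
        rates[i] = r
    return rates

def frm(metrics):
    """Sum the rates of R2, RMSE, and VAF over train and test (Eq. 13)."""
    totals = [0] * len(metrics["train"]["R2"])
    for split in ("train", "test"):
        for name, higher in (("R2", True), ("RMSE", False), ("VAF", True)):
            for i, r in enumerate(rate(metrics[split][name], higher)):
                totals[i] += r
    return totals

# Hypothetical evaluation indices for three models.
metrics = {
    "train": {"R2": [0.80, 0.90, 0.85], "RMSE": [4.0, 2.5, 3.0], "VAF": [79.0, 89.0, 84.0]},
    "test": {"R2": [0.75, 0.86, 0.84], "RMSE": [4.5, 3.2, 3.0], "VAF": [74.0, 81.0, 83.0]},
}
totals = frm(metrics)
print(totals)  # the maximum possible total is 3 models x 6 rated indices = 18

# Testing RMSE values of the MLPNN peak-load models reported in Sect. 4.2.
test_rmse = [3.527, 2.343, 2.834, 3.966, 4.638, 3.916,
             2.939, 3.943, 4.075, 4.962, 5.303, 3.808]
print(rate(test_rmse, higher_is_better=False))
```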

4 Results and discussions

Due to the practical applicability and relative ease of implementation of the SAE algorithm, it was used to create the ensemble learning structure in this paper. The SAE method offers several advantages compared to WAE, SE, and SL. It is a straightforward approach that quickly integrates diverse base models with minimal complexity compared to other ensemble techniques. In addition, it effectively reduces variance and improves overall performance, especially when dealing with a limited dataset where computational resources or time constraints are factors to consider. This technique also helps reduce the risk of overfitting by taking an unweighted average of the predictions from multiple models, so that all models are equally weighted in the ensemble and no single model dominates the final prediction. Given the constraints and scope of this study, the SAE provides a reasonable baseline for comparison with more advanced ensemble methods. Four techniques, namely MLPNN, GFFNN, BRNN, and XGBoost, were trained to construct the SAE model, and their prediction results are presented in the following sections.

4.1 Developing multiple regression model

Multiple regression is a statistical model that fits the relationships between the output(s) and the inputs. The MLR model can be formulated as follows (Entezam et al. 2022; Jodeiri Shokri et al. 2020; Shakeri et al. 2020):

$$ Y = c_{0} + c_{1} x_{1} + c_{2} x_{2} + c_{3} x_{3} + \cdots + c_{n} x_{n} $$
(14)

where Y is the response variable, x1, …, xn are the input variables, c0 is a constant, and c1, c2, …, cn are regression coefficients.

Unlike MLR, multiple non-linear regression (MNLR) captures non-linear relationships between dependent and independent variables using both non-linear and linear terms. SPSS software V.24 was used to obtain the MLR and MNLR models. The same training and testing datasets were used to develop the regression models. The MLR models for the maximum peak load and displacement values are given by Eqs. (15) and (16), respectively, and the MNLR models by Eqs. (17) and (18), respectively. The W/G and CT parameters were used as predictors, and peak load and displacement were considered the response parameters.

$$ PL = 88.736 - \left( {1.455 \times \frac{W}{G}} \right) + \left( {0.064 \times CT} \right) $$
(15)
$$ DP = 15.174 - \left( {0.212 \times \frac{W}{G}} \right) - \left( {0.015 \times CT} \right) $$
(16)
$$ \begin{aligned} PL & = - 16418999.497 + \left( {5738720.163 \times \sqrt {\frac{W}{G}} } \right) \\ & + \left( {69.425 \times \log_{10} CT} \right) - \left( {6.025 \times 10^{ - 13} \times e^{W/G} } \right) \\ & + \left( {5.558 \times 10^{ - 12} \times e^{CT} } \right) - \left( {500442.371 \times \frac{W}{G}} \right) \\ & - \left( {2.27 \times CT} \right) \\ \end{aligned} $$
(17)
$$ \begin{aligned} DP & = 14.51 - \left( {1.371 \times \sqrt {\frac{W}{G}} } \right) + \left( {0.438 \times \sqrt {CT} } \right) \\ & - \left( {4.309 \times 10^{ - 18} \times e^{W/G} } \right) - \left( {1.1794.309 \times 10^{ - 12} \times e^{CT} } \right) \\ \end{aligned} $$
(18)
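As a quick check of the linear models, Eqs. 15 and 16 can be evaluated directly. The sketch assumes that W/G is entered in percent (30 to 40) and CT in days (7 to 28), consistent with the ranges reported for the dataset; the function names are hypothetical.

```python
def mlr_peak_load(wg, ct):
    """Eq. 15: peak load (kN) from W/G (%) and curing time (days)."""
    return 88.736 - 1.455 * wg + 0.064 * ct

def mlr_displacement(wg, ct):
    """Eq. 16: displacement (mm) from W/G (%) and curing time (days)."""
    return 15.174 - 0.212 * wg - 0.015 * ct

# Evaluate both models at 28 days of curing for the three W/G levels.
for wg in (30, 35, 40):
    print(wg, round(mlr_peak_load(wg, 28), 3), round(mlr_displacement(wg, 28), 3))
```

The negative W/G coefficients mean that both predicted quantities decrease as the water-to-grout ratio increases.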

4.2 Developing a hybrid MLPNN model

This paper used an ANN to solve the problem due to its complex nature. The Levenberg–Marquardt (LM) algorithm was applied as the learning function for training the ANNs. The performance of the ANNs was controlled by several hyperparameters, such as the number of hidden layers, the number of neurons in the hidden layers, the type of transfer (activation) function, and the learning algorithm. Indeed, the accuracy and results of each ANN depend on the type and value of these parameters. The parameters were set to different values to obtain a network with the best possible performance and to determine an optimal ANN architecture. This process used a trial-and-error procedure and did not follow any particular rule. The properties of the developed networks for predicting peak load are given in Table 2. The “purelin”, “logsig”, “tansig”, and “radbas” functions were considered as transfer functions. Furthermore, the total number of hidden nodes ranged from 12 to 40.

Table 2 Obtained MLPNN models for predicting peak load values

Ten cases were studied, and the results of the MLPNN with various scenarios were obtained (Table 3). The optimal structure of the MLPNN was chosen based on the evaluation criteria. The FRM method (Eq. 13) was applied to rate the R2, RMSE, and VAF of the training and testing procedures. For instance, values of 3.527, 2.343, 2.834, 3.966, 4.638, 3.916, 2.939, 3.943, 4.075, 4.962, 5.303, and 3.808 were calculated for the RMSE indices of the testing dataset for models 1–11, respectively. Subsequently, the rates of 9, 12, 11, 5, 3, 7, 10, 6, 4, 2, 1, and 8 were obtained for the mentioned models, respectively. Also, the CIS was employed to validate the selection of the optimal model and to allow faster visual selection. Herein, an exclusive color (for example, red) was assigned to each row. Then, the model with the highest rate in each row was marked with the most intense red. Less intense (lighter) colours indicate lower rates (weights) of the indices. The results of modeling peak load indicated that model No. 3, with a structure of 2–8–9–1, the transfer function of “logsig–logsig–tansig–tansig”, and a total rate of 63 out of 66, is the first-ranked MLPNN model. As seen in Table 3, high R2 values of 0.896 and 0.819 for training and testing proved the capability of the MLPNN model in predicting peak load values. Figure 15 compares the actual and predicted values of peak loads using model No. 3.

Table 3 Obtained evaluation indices and FRM and CIS to detect the optimal topology of MLPNN for peak load prediction
Fig. 15 Correlations of peak load predicted by modeling MLPNN with measured values

The ensemble model, a new generation of developed models, was constructed in the next step. The more accurate models with the highest rates were selected and averaged to develop a new model with higher accuracy and performance. Models 2, 3, 7, 9, and 11 were selected, with cumulative rates of 51, 64, 52, 49, and 34, respectively. Their network structures were “2–6–9–1”, “2–8–9–1”, “2–12–15–1”, “2–16–18–1”, and “2–20–20–1”. The R-squared values of this model were 0.959 and 0.939 for training and testing, respectively. This means that the ensemble model of MLPNN (EMLPNN) could predict peak load values more accurately than the base models of the MLPNN technique.

The same process was repeated to predict displacement values. Eleven base models were generated, and the best ones, which could accurately predict displacement, were selected to construct the EMLPNN. The results are given in Tables 4 and 5. Model No. 10 was the best, with a total rate of 59 out of 60 and R-squared values of 0.940 and 0.913 for the training and testing datasets. The predicted and actual displacement values are compared in Fig. 16. Eventually, the ensemble model was built by averaging models 7, 8, 9, 10, and 11.

Table 4 Obtained MLPNN models for displacement prediction
Table 5 Obtained evaluation indices and FRM and CIS to detect the optimal topology of MLPNN for displacement prediction
Fig. 16 Correlations of displacement predicted by modeling MLPNN with measured values

4.3 Developing a hybrid BRNN model

In this section, the peak load and displacement values were predicted using the BRNN technique. It is essential to check the stopping criteria to obtain the best BRNN model. Since the system complexity is determined by the number of hidden neurons in the BRNN model, the modeling process was controlled by using the number of hidden nodes as a stopping criterion. The number of hidden nodes was varied between 2 and 11 to avoid overfitting and learning issues, and ten BRNN models were developed accordingly. The performance of the models was evaluated using the system evaluation indicators (Eqs. 10–12). The results of BRNN modeling for predicting peak load and displacement are reported in Tables 6 and 7. The FRM and CIS procedures were used to choose the optimal architecture of the BRNN models. BRNN models nos. 2 and 5 are the best for predicting peak load and displacement, with cumulative rates of 55 and 57 out of 60, respectively. The structures of the optimal models were 2–3–1 and 2–6–1, respectively.

Table 6 Obtained evaluation indices and FRM and CIS to detect the optimal topology of BRNN for peak load prediction
Table 7 Obtained evaluation indices and FRM and CIS to detect the optimal topology of BRNN for displacement prediction

It should be noted that the colours assigned to these models are more intense than those of the other BRNN models. The comparison between the measured and predicted peak load and displacement using the best BRNN models is shown in Figs. 17 and 18, respectively.

Fig. 17 Correlations of peak load predicted by modeling BRNN with measured values

Fig. 18 Correlations of displacement predicted by modeling BRNN with measured values
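The three evaluation indicators used to rate the models (R2, RMSE, and VAF) can be sketched as follows. The definitions below are the commonly used ones and are an assumption here, since the exact forms of the paper's equations are not reproduced in this section: R2 is taken as the coefficient of determination, and VAF as one minus the ratio of residual variance to measured variance, in percent. The function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _variance(values):
    mu = _mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mu = _mean(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mu) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error, in the units of the measured quantity."""
    return math.sqrt(_mean([(m - p) ** 2 for m, p in zip(measured, predicted)]))


def vaf(measured, predicted):
    """Variance accounted for, in percent (assumed definition)."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    return (1.0 - _variance(residuals) / _variance(measured)) * 100.0
```

With these definitions, a prediction that is offset from the measurements by a constant still reaches a VAF of 100, while its RMSE and R2 degrade, which is one reason to read the three indices together rather than in isolation.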

In this step, several BRNN models with acceptable accuracy were selected from the base models. Their average was used as the input of the ensemble BRNN (EBRNN) model to predict the outputs with higher precision. In this regard, all ten base models of peak load prediction (Table 6) were averaged. In contrast, the EBRNN predictive model of displacement was constructed by averaging all models except models 1, 2, 6, and 9 (Table 7). The cumulative rates calculated for models 3, 4, 5, 8, 9, and 10 were 36, 34, 57, 41, 24, and 31, respectively.

4.4 Developing a hybrid GFFNN model

After splitting the database into training and testing datasets and normalising, ten different GFFNN models with various learning algorithms, numbers of hidden nodes, and transfer functions were developed. As mentioned before, the GFFNN architecture was designed through a trial-and-error procedure. The performance of the models was analyzed using R2, RMSE, and VAF. The statistical indices of each developed GFFNN model for predicting peak load and displacement are presented in Tables 8 and 9, respectively. Based on the results of peak load prediction (Table 8), GFFNN1 is the best model, with an optimal architecture of “2–2–1”. Notably, this topology yielded the highest R2 (0.913 and 0.924 for training and testing) and VAF (89.50 and 96.81 for training and testing), and the lowest RMSE values (7.55 and 3.633 for training and testing). Hence, this model was rated with the highest score, i.e., 60. In addition, the CIS method colors the calculated values so that the color intensity decreases from the best value to the worst. As seen from Table 8, the best GFFNN model with high performance and accuracy was Model No. 1, which was colored with the highest color intensity (red). Figure 19 compares the peak load predicted by GFFNN No. 1 with the measured one for each laboratory test.

Table 8 Obtained evaluation indices and FRM and CIS to detect the optimal topology of GFFNN for peak load prediction
Table 9 Obtained evaluation indices and FRM and CIS to detect the optimal topology of GFFNN for displacement prediction
Fig. 19 Correlations of peak load predicted by modeling GFFNN with measured values

Ten GFFNN models with various topologies were trained, and the best model was selected. The results revealed that GFFNN model No. 7 presented the highest accuracy for determining the displacement; therefore, GFFNN7 was chosen as the best model, with a topology of “2–8–1” and a total rate of 57 out of 60 (Table 9). As seen, the color (red) of the GFFNN7 model is more intense than that of the other base models. Figure 20 compares the displacement estimated by GFFNN No. 7 with the measured one for each laboratory test. Similar to BRNN modeling and the construction of EBRNN, the ensemble GFFNN (EGFFNN) model was based on averaging GFFNN Nos. 1, 2, 3, 5, and 9 for peak load prediction and GFFNN Nos. 2, 7, and 9 for displacement prediction.

Fig. 20 Correlations of displacement predicted by modeling GFFNN with measured values

4.5 Developing hybrid XGBoost model

The XGBoost technique was also applied for peak load and displacement prediction. The modeling process was controlled using two stopping criteria: the maximum tree depth and the number of boosting rounds (nrounds). These criteria govern the complexity of the XGBoost model. Like MLPNN, BRNN, and GFFNN, XGBoost is prone to overfitting at large tree depths and high numbers of rounds. Therefore, the number of rounds and the tree depth were limited to the ranges of [50–200] and [1–3], respectively. The best XGBoost model, with the optimal combination of the two stopping criteria, was determined by trial and error for both peak load and displacement prediction. The results for peak load and displacement are shown in Tables 10 and 11, respectively. Based on Tables 10 and 11, ten XGBoost structures were developed and evaluated using the FRM and CIS techniques. The best XGBoost model for estimating peak load was Model No. 2, with a cumulative rate of 60 out of 60, nrounds of 50, and a maximum tree depth of 2. The R2 values of this model were 0.973 and 0.961 for the training and testing phases, respectively. Besides, Model No. 7 was selected as the optimal XGBoost model for predicting displacement values. The specifications of this XGBoost model were 100 for nrounds and 1 for maximum tree depth. This model presented R2 values of 0.965 and 0.957 for training and testing, respectively. The peak load and displacement values predicted by the XGBoost-based models corresponded well to the actual values (Figs. 21 and 22). It can be concluded that the performance of the XGBoost model was higher than that of the MLPNN, BRNN, and GFFNN models.

Table 10 Obtained evaluation indices and FRM and CIS to detect the optimal topology of XGBoost for peak load prediction
Table 11 Obtained evaluation indices and FRM and CIS to detect the optimal topology of XGBoost for displacement prediction
Fig. 21 Correlations of peak load predicted by modeling XGBoost with measured values

Fig. 22 Correlations of displacement predicted by modeling XGBoost with measured values
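To make the role of the two stopping criteria concrete, the sketch below implements plain least-squares gradient boosting with depth-1 trees (stumps) in pure Python. It is not XGBoost itself: regularisation and the second-order approximation used by XGBoost are omitted, and the learning rate of 0.3, the function names, and the small data set of (W/G, CT) pairs with peak loads are all hypothetical. It only illustrates how the number of rounds controls how closely the training data are fitted when the tree depth is fixed at 1.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) of the best depth-1 split."""
    best = None
    for j in range(len(X[0])):
        levels = sorted({row[j] for row in X})
        # Thresholds are taken at observed levels, so both sides are never empty.
        for thr in levels[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]


def fit_boosting(X, y, n_rounds, learning_rate=0.3):
    """Fit n_rounds stumps, each one to the residuals left by the previous rounds."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        j, thr, lv, rv = fit_stump(X, residuals)
        stumps.append((j, thr, lv, rv))
        preds = [p + learning_rate * (lv if row[j] <= thr else rv)
                 for row, p in zip(X, preds)]
    return base, learning_rate, stumps


def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lv if row[j] <= thr else rv) for j, thr, lv, rv in stumps)


def training_sse(model, X, y):
    return sum((t - predict(model, row)) ** 2 for row, t in zip(X, y))


# Hypothetical (W/G ratio, curing time in days) inputs and peak loads.
X = [(0.30, 7), (0.30, 28), (0.40, 7), (0.40, 28), (0.35, 14)]
y = [100.0, 140.0, 70.0, 110.0, 105.0]

for n_rounds in (0, 5, 50):
    print(n_rounds, training_sse(fit_boosting(X, y, n_rounds), X, y))
```

The training error keeps falling as rounds are added, which is why the number of rounds has to be capped and judged on the testing set rather than on the training fit alone.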

This step focused on generating an ensemble XGBoost model (EXGBoost) by averaging base models. First, the peak load predictive models were analyzed. As a result, models nos. 1, 2, 6, 8, and 10 were selected from the XGBoost base models to develop EXGBoost. Likewise, models nos. 2, 3, 4, 5, and 7 were averaged to build the EXGBoost model for predicting displacement with higher accuracy.

4.6 Developing hybrid ensemble models

As explained earlier, ensemble models are developed by averaging several AI base models and using the result as a new generation of model. The results of the selected top base models and their ensemble models are given in Table 12. The models were rated and colored (purple) based on the FRM and CIS methods. The measured and predicted peak load and displacement values are compared, together with their correlation plots, in Table 12 and Figs. 23 and 24.

Table 12 Performance indices for best models and their ensemble models
Fig. 23 Correlations of peak load predicted by modeling EMLPNN with measured values

Fig. 24 Correlations of displacement predicted by modeling EMLPNN with measured values

Table 12 summarizes the performance of the models used for the predictive analysis. In predicting peak load (PL), the best MLPNN, BRNN, GFFNN, XGBoost, EMLPNN, EBRNN, EGFFNN, and EXGBoost models achieved R2 values of 0.896, 0.923, 0.913, 0.973, 0.959, 0.941, 0.922, and 0.998 during the training phase and 0.819, 0.982, 0.924, 0.961, 0.931, 0.981, 0.94, and 0.989 during the testing phase, respectively. The highest R2 and the lowest RMSE values indicated that EXGBoost had the best performance, leading to its top position with a cumulative rating of 1. The results therefore highlight the competitive performance of the various models, with XGBoost and EXGBoost emerging as the top-performing models, closely followed by the EBRNN and EMLPNN models. The findings emphasize the effectiveness of these models in accurate predictive analysis and underline their potential for practical applications. EXGBoost was also the best of the developed models in predicting displacement (DP) (Figs. 25–30).

Fig. 25 Correlations of peak load predicted by modeling EBRNN with measured values

Fig. 26 Correlations of displacement predicted by modeling EBRNN with measured values

Fig. 27 Correlations of peak load predicted by modeling EGFFNN with measured values

Fig. 28 Correlations of displacement predicted by modeling EGFFNN with measured values

Fig. 29 Correlations of peak load predicted by modeling EXGBoost with measured values

Fig. 30 Correlations of displacement predicted by modeling EXGBoost with measured values

Figure 31 shows the Taylor diagram, which allows a more direct comparison of the models. The diagram shows that the EXGBoost models had the best performance in predicting peak load and displacement using the available dataset. Notably, the MLPNN and GFFNN models ranked lowest for predicting peak load and displacement, respectively.

Fig. 31 Taylor’s diagram for measured and predicted values using all developed models for peak load (left) and displacement (right)

The ensemble XGBoost model effectively addressed the challenges associated with predicting the peak load and displacement resulting from pull-out tests of fully grouted rock bolts. XGBoost can capture complex non-linear relationships between the input features and the predicted output, allowing it to model the interactions that influence peak load and displacement. Based on the results, XGBoost provided more accurate predictions than the traditional linear models. The XGBoost model also seems to enable researchers to identify and prioritize the most influential factors affecting the axial-bearing capacity of the fully grouted rock bolting system. Notably, its ability to manage data with missing values or outliers allows the model to provide reliable predictions despite data imperfections, enhancing its practical applicability and reliability in challenging experimental conditions.

4.7 Sensitivity analysis

Sensitivity analysis techniques, such as the cosine amplitude method (CAM), evaluate the impact of input parameters or assumptions on the output of a model or system. By applying such a method, researchers can quantify how strongly the output is related to each input and identify the parameters that have the most critical impact on the model’s behavior. This allows resources and efforts to be prioritized toward addressing and optimizing the most influential factors. In the CAM, the strength of the relation between each input parameter and each output is expressed by a single coefficient (Jong and Lee 2004), as follows:

$$ s_{ij} = \frac{{\sum\limits_{k = 1}^{m} {x_{ik} \cdot x_{jk} } }}{{\sqrt {\left( {\sum\limits_{k = 1}^{m} {x_{ik}^{2} } } \right) \cdot \left( {\sum\limits_{k = 1}^{m} {x_{jk}^{2} } } \right)} }} $$
(19)

where $x_{ik}$ and $x_{jk}$ are the $k$-th values of input parameter $i$ and output parameter $j$, respectively, and $m$ denotes the number of data points.
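Equation (19) can be evaluated directly, as in the following minimal sketch. The function name `cosine_amplitude` is hypothetical, and the example vectors in the usage note are illustrative rather than taken from the pull-out database. For non-negative data the coefficient lies between 0 and 1, with values closer to 1 indicating a stronger relation between the input and the output.

```python
import math


def cosine_amplitude(x_i, x_j):
    """Strength of relation s_ij between an input vector x_i and an output vector x_j."""
    if len(x_i) != len(x_j):
        raise ValueError("input and output vectors must have the same length m")
    numerator = sum(a * b for a, b in zip(x_i, x_j))
    denominator = math.sqrt(sum(a * a for a in x_i) * sum(b * b for b in x_j))
    if denominator == 0.0:
        raise ValueError("s_ij is undefined when one of the vectors is all zeros")
    return numerator / denominator
```

For example, `cosine_amplitude([1, 2, 3], [2, 4, 6])` evaluates to 1 because the two vectors are proportional, whereas `cosine_amplitude([1, 0], [0, 1])` evaluates to 0.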

The histograms of the sensitivity of peak load and displacement to the input parameters are depicted in Fig. 32. The sensitivity analysis revealed that CT is the most influential parameter for both peak load and displacement, whereas W/G shows the least sensitivity for both output parameters.

Fig. 32 The importance of effective parameters

5 Conclusions

This research applied soft computing methods with an ensemble approach to predict the axial-bearing capacity of fully grouted rock bolting systems. By using an ensemble XGBoost model, the study proposed an advanced methodology for improving the predictive accuracy of critical parameters, such as maximum pull-out capacity and displacement values, in mining, civil, and geotechnical projects. For this purpose, thirty-four fully grouted rock bolting samples were cast, based on three different W/G ratios and four different CTs. To evaluate the axial-bearing capacity of the fully grouted rock bolts, a comprehensive series of pull-out tests was conducted to determine the peak load and displacement values. A database was then built with two inputs, W/G ratio and CT, while the peak load and displacement values were considered the output data. Afterwards, different soft computing methods, including MLPNN, BRNN, GFFNN, and XGBoost, were applied to predict the axial-bearing capacity of the fully grouted rock bolting system. In addition, linear and non-linear relationships were derived using the MLR and MNLMR techniques. Along with the regression analysis, the results of the ensemble models of MLPNN, BRNN, GFFNN, and XGBoost were compared using three statistical criteria: R-squared, RMSE, and VAF.

The results revealed that the developed ensemble XGBoost model predicted the peak load and displacement values more accurately than the other methods, such as MLPNN, BRNN, GFFNN, and multiple regression. For the ensemble XGBoost model, the statistical criteria in predicting the peak load values for the training and testing phases were R2 = 0.998 and 0.989, RMSE = 0.0119 and 0.0201, and VAF = 98.984 and 99.473, respectively. For displacement prediction, these values were R2 = 0.985 and 0.979, RMSE = 0.0298 and 0.0435, and VAF = 99.483 and 98.658 for the training and testing phases, respectively. Furthermore, sensitivity analysis was performed using the CAM to determine the impact of the input parameters on the peak load and displacement. The results of the sensitivity analysis indicated that CT (in days) is the most influential parameter for predicting the outputs using this dataset.

This study has some limitations that can be addressed in future work. One of the primary limitations was the relatively small dataset size, comprising only 34 samples with two inputs and two outputs. The study’s reliance on a specific dataset might also introduce constraints associated with the specific conditions under which the data were collected. However, the database can be extended with other influential parameters, such as the mechanical behaviour of grouts, the types of grouts and confinements, and different specifications of the bolt’s profile. Indeed, this is a first step in applying such soft computing methods in this field. Although this paper includes only laboratory data, other types of data collected from in-situ measurements, covering a broader range of geological conditions, rock bolt types, and environmental factors, can also be added to the database. This would allow for a more comprehensive understanding of the predictive capabilities of the ensemble XGBoost model and other machine-learning techniques in various real-world scenarios. Building on the performance of the ensemble XGBoost model, future studies could explore hybrid modeling approaches that combine various ML techniques with other computational methods, such as finite element analysis or computational fluid dynamics. This integration could provide a more holistic and multidimensional perspective on the complex behaviour of fully grouted rock bolt systems under varying conditions.