1 Introduction

The unconfined compressive strength (UCS) of rocks is a key input parameter of numerous constitutive models predicting the strength of rocks (Hoek and Brown 1980, 1997, 2019; Hoek 1983; Hoek et al. 2002). Determining the UCS of a rock sample in the laboratory requires specialist equipment and intact rock samples free of fissures and veins, which are generally not easy to obtain. A viable alternative is to associate the rock’s UCS with various physical and mechanical test indexes, such as the pulse velocity (Vp), Schmidt hammer rebound number (Rn), effective porosity (ne), total porosity (nt), dry density (γd), point load index (Is50), shear wave velocity (Vs), Brazilian tensile strength (BTS), and slake durability index (SDI), by means of simple and multiple regression analyses (Sachpazis 1990; Tuğrul and Zarif 1999; Katz et al. 2000; Kahraman 2001; Yılmaz and Sendır 2002; Yaşar and Erdoğan 2004; Dinçer et al. 2004; Fener et al. 2005; Aydin and Basu 2005; Sousa et al. 2005; Shalabi et al. 2007; Çobanoğlu and Çelik 2008; Vasconcelos et al. 2008; Sharma and Singh 2008; Kılıç and Teymen 2008; Diamantis et al. 2009; Yilmaz and Yuksek 2009; Yagiz 2009; Moradian and Behnia 2009; Khandelwal and Singh 2009; Altindag 2012; Kurtulus et al. 2012; Bruno et al. 2013; Mishra and Basu 2013; Khandelwal 2013; Tandon and Gupta 2015; Karaman and Kesimal 2015; Ng et al. 2015; Armaghani et al. 2016a, b; Azimian 2017; Heidari et al. 2018; Çelik and Çobanoğlu 2019; Barham et al. 2020; Ebdali et al. 2020; Teymen and Mengüç 2020; Li et al. 2020). While a significant number of empirical relationships for predicting the UCS of rocks have been proposed in the literature, their predictive accuracy is generally significantly lower than that obtained using artificial neural networks (ANNs) (Yılmaz and Yuksek 2008, 2009; Dehghan et al. 2010; Yagiz et al. 2012; Ceryan et al. 2013; Minaeian and Ahangari 2013; Yesiloglu-Gultekin et al. 2013; Yurdakul and Akdas 2013; Momeni et al. 2015; Armaghani et al. 2016a, b; Madhubabu et al. 2016; Ferentinou and Fakir 2017; Barham et al. 2020; Pandey et al. 2020; Ceryan and Samui 2020; Ebdali et al. 2020; Teymen and Menguc 2020; Moussas and Diamantis 2021; Armaghani et al. 2021). ANNs are advanced computational models which can simulate highly non-linear relationships between various input and output parameters, but their predictive ability is limited to the range of input parameter values on which they have been trained; that is, neural network (NN) models are unable to provide any predictions beyond the data range on which they have been trained and developed. Within that range, the frequency distribution of the input and output parameter values significantly affects the predictive accuracy of the NN model. A non-uniform input/output parameter value distribution, in which the majority of values are concentrated over a limited range, does not constitute a statistically suitable database for training and developing ANN models.

This research aimed at training and developing three ANN models for the prediction of the UCS of granite by compiling a high-quality, data- and site-independent database spanning the weak to strong granite range, consolidating a significant number of non-destructive test results reported in the literature. As part of ongoing research, this study expands the work recently reported by Armaghani et al. (2021) by introducing the Rn as a third non-destructive test index and by extending the range of the input parameter values, thereby expanding the predictive capability of the NN model. A data- and site-independent database comprising 274 datasets correlating the Rn, Vp, and ne with the UCS of granite was compiled and used to train and develop various ANN models using the Levenberg–Marquardt (LM) algorithm and two widely used optimization algorithms (OAs), namely particle swarm optimization (PSO) and the imperialist competitive algorithm (ICA).

2 Research Significance

Although a variety of semi-empirical and empirical expressions for predicting the UCS of rocks using various non-destructive test parameters have been published in the literature, the majority do not provide a high degree of accuracy or a generalized solution. This is mostly owing to an insufficient description of rock characteristics, complex correlations among the input parameters, limited experimental results, and inconsistent calculation methods. Therefore, a more sophisticated method that is capable of capturing the complex behavior of rock UCS based on a large body of experimental results is required.

Owing to their strength in non-linear modelling, machine learning (ML) algorithms can capture the complex behavior of influencing parameters and provide feasible tools for simulating many complex problems. In the past, several ML algorithms, namely ANN, adaptive neuro-fuzzy inference system (ANFIS), fuzzy inference system (FIS), extreme gradient boosting machine (XGBoost), multivariate adaptive regression splines (MARS), extreme learning machine (ELM), ensemble learning techniques, support vector machine (SVM), and so on, have been employed to estimate desired outputs including rock strength, landslide displacement, slope stability, the soil–water characteristic curve, inverse analysis of soil and wall properties in braced excavations, concrete compressive strength, and so on (Armaghani et al. 2016a, b; Asteris and Kolovos 2017; Zhang et al. 2017, 2018, 2020, 2021, 2022a, b, c; Zhang and Phoon 2022; Zhang and Liu 2022; Wang et al. 2019).

A detailed review of the literature demonstrates the widespread applicability of ANNs in a variety of engineering disciplines, including the prediction of rock UCS. ANNs can simulate highly non-linear relationships between various input and output parameters and provide a quick solution. However, their predictive ability is limited to the range of input parameter values on which they have been trained. In this study, a data- and site-independent database comprising 274 datasets correlating the Rn, Vp, and ne with the UCS of granite was compiled and used to train and develop various ANN models.

3 Literature Review on the Available Proposals

This section presents and discusses a comprehensive review of prior research using semi-empirical and soft computing methodologies for predicting the UCS of granite. Prior to the extended assessment of previous studies, a brief overview of the experimental works is presented in the following sub-sections.

3.1 Experimental Works

The Schmidt hammer rebound number, i.e., Rn, is a hardness index of rock, which is obtained by pressing a spring-loaded plunger perpendicular to the specimen surface and measuring the rebound. Depending on the applied impact energy, two Schmidt hammer types are available: the N-type (2.207 Nm impact energy) and the L-type (0.735 Nm impact energy). The N-type Schmidt hammer was originally used to determine the rebound number of concrete cubes and was later also introduced by the ASTM standards for rock hardness testing, although it has been argued that the higher impact energy may activate fissures and cracks. Note that the Rn depends on the degree of weathering, water content, sample size, hammer axis orientation, and the data reduction techniques (Poole and Farmer 1980; Ballantyne et al. 1990; Katz et al. 2000; Sumner and Nel 2002; Basu and Aydin 2004; Demirdag et al. 2009; Niedzielski et al. 2009; Çelik and Çobanoğlu 2019).

The compressional wave velocity, i.e., Vp, is a mechanical index, which involves the propagation of compressional waves at strains below the yield strain of rock. The Vp is determined from the travel time of the waves between the transmitter and the receiver. It depends on the mineralogy, texture, fabric, weathering grade, water content, and density of the rock (Yasar and Erdogan 2004; Kilic and Teymen 2008; Dehghan et al. 2010; Mishra and Basu 2013; Tandon and Gupta 2015; Momeni et al. 2015; Ng et al. 2015; Heidari et al. 2018).

The effective porosity, i.e., ne, is a physical rock index which quantifies the amount of interconnected void space. The voids take the form of inter-granular space, micro-fractures at grain boundaries, joints, and faults (Franklin and Dusseault 1991). The rock’s porosity depends on various parameters such as the particle size, particle shape, and weathering grade, which may change the pore size distribution and pore geometry, and even result in new pore formation (Tuğrul 2004).

3.2 Semi-empirical Proposals

The significant number of simple and multiple regression relationships correlating the UCS of granite with the three non-destructive test indexes (i.e., Rn, Vp, and ne) used in earlier research is summarized in Table 1. The UCS of granite is generally proposed to decrease exponentially with increasing effective porosity, whereas linear and exponential growth patterns are suggested for the relationships between the UCS and Rn or Vp. Most of the proposed relationships predict the rock UCS using only one input parameter; only a limited number use two or three of the non-destructive test indices as input parameters for the prediction of the UCS of granite. The proposed relationships predict the UCS of weak to very strong granite (ISRM 2007).

Table 1 Available proposals correlating Rn, Vp, and ne with the UCS of granite
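To illustrate how such single-index relationships are calibrated, the following minimal Python sketch fits an exponential-decay form, of the type commonly reported in Table 1, to paired (ne, UCS) measurements. The data points and fitted coefficients shown here are purely hypothetical, not the published values.

```python
# Illustrative only: fitting a Table 1-style functional form to hypothetical
# (ne, UCS) pairs; the data and the fitted coefficients are made up.
import numpy as np
from scipy.optimize import curve_fit

def ucs_exp_decay(ne, a, b):
    """UCS = a * exp(-b * ne): exponential decrease with effective porosity."""
    return a * np.exp(-b * ne)

# Hypothetical example data (ne in %, UCS in MPa) for demonstration only.
ne = np.array([0.5, 1.0, 2.0, 3.5, 5.0, 7.0])
ucs = np.array([190.0, 160.0, 120.0, 80.0, 55.0, 30.0])

(a, b), _ = curve_fit(ucs_exp_decay, ne, ucs, p0=(200.0, 0.3))
print(f"UCS ~ {a:.1f} * exp(-{b:.3f} * ne) MPa")
```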

3.3 Soft Computing Proposals

During the last decades, a significant number of soft computing models, namely ANN, ANFIS, back-propagation neural network (BPNN), FIS, radial basis function NN (RBFN), gene expression programming (GEP), extreme gradient boosting machine with firefly algorithm (XGBoost-FA), and SVM, have been reported for the prediction of the UCS of rocks (Meulenkamp and Grima 1999; Gokceoglu and Zorlu 2004; Yılmaz and Yuksek 2008; Dehghan et al. 2010; Monjezi et al. 2012; Yagiz et al. 2012; Mishra and Basu 2013; Yesiloglu-Gultekin et al. 2013; Momeni et al. 2015; Mohamad et al. 2015; Torabi-Kaveh et al. 2015; Teymen and Mengüç 2020; Barham et al. 2020; Ceryan and Samui 2020; Mahmoodzadeh et al. 2021; Yesiloglu-Gultekin and Gokceoglu 2022; Asteris et al. 2021). The proposed models predict the UCS of various rock types and formation methods spanning the very soft to hard rock range. Interestingly, few models have been proposed that specifically predict the UCS of granite (Yesiloglu-Gultekin et al. 2013; Armaghani et al. 2016a, b, 2021; Cao et al. 2021; Jing et al. 2021; Asteris et al. 2021; Mahmoodzadeh et al. 2021). Table 2 summarizes the ML algorithms, input parameters, database sizes, and prediction accuracies of the granite UCS models reported in the literature. While the type and number of input parameters used generally vary, the Rn and Vp are included in most of the proposed models. In most of the reported models, the underlying database is relatively limited in size.

Table 2 Soft computing models reported in the literature for the prediction of UCS of rocks

4 Experimental Database

The experimental database used in this research to train and develop soft computing models for the prediction of the UCS of granite was compiled from 274 datasets reported in the literature (Tuğrul and Zarif 1999; Mishra and Basu 2013; Ng et al. 2015; Koopialipoor et al. 2022). The database comprises three input parameters obtained from non-destructive tests, namely the Rn, Vp, and ne, and a single output parameter, the UCS of soft to hard granite (UCS = 20.30–211.9 MPa). The study references and the descriptive details of the collected datasets are presented in Tables 3 and 4, respectively.

Table 3 Data used from experiments reported in the literature
Table 4 Descriptive details of the acquired datasets

The consolidated database only includes experimental results reported in compliance with international testing standards and conforming to the principles of statistical analysis regarding statistically significant sample sizes and the statistical distribution of the reported test results. The robustness of the methodology used in this research to consolidate the test results reported by different researchers determines the reliability of the resulting model predictions. A correlation matrix is presented in Fig. 1 to illustrate the degree of correlation (based on the Pearson correlation coefficient) between the parameters. In addition, the comparative histograms of the input and output parameters are shown in Fig. 2. Note that the normalized values of the input and output parameters were used in the comparative histograms.
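For reproducibility, a minimal sketch of these two steps is given below, assuming the compiled database is stored in a CSV file named granite_ucs_database.csv with columns Rn, Vp, ne, and UCS (both the file name and column names are illustrative assumptions).

```python
# Sketch: Pearson correlation matrix (Fig. 1) and normalized comparative
# histograms (Fig. 2), assuming a hypothetical CSV layout.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("granite_ucs_database.csv")   # hypothetical file name
cols = ["Rn", "Vp", "ne", "UCS"]
print(df[cols].corr(method="pearson"))         # correlation matrix (Fig. 1)

# Min-max normalize to a common scale before plotting the histograms (Fig. 2).
norm = (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min())
norm.hist(bins=20)
plt.show()
```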

Fig. 1 Correlation matrix of the input and output variables

Fig. 2 Comparative histograms of the input and output parameters

5 Methodology

This section presents the theoretical background of ANNs, followed by a brief overview of the PSO and ICA algorithms. Thereafter, the methodological development of the ANN-LM model and of the hybrid ANN-PSO and ANN-ICA models is presented and discussed.

5.1 Artificial Neural Network (ANN)

ANNs are developed with the goal of learning from experimental or analytical data. These models can categorize data, forecast values, and aid in decision-making. An ANN maps input parameters to a given output. In comparison to traditional numerical analysis procedures (e.g., regression analysis), a trained ANN can deliver more trustworthy results with far less processing effort. An ANN functions in a manner analogous to the biological neural network of the human brain. The basic building block of an ANN is the artificial neuron, a mathematical model that seeks to replicate the activity of a biological neuron (Hornik et al. 1989; Hassoun 1995; Samui 2008; Das et al. 2011; Samui and Kothari 2011; Asteris and Plevris 2013, 2016; Nikoo et al. 2016, 2017, 2018; Asteris and Kolovos 2017; Asteris et al. 2017; Cavaleri et al. 2017; Psyllaki et al. 2018; Mohamad et al. 2019; Koopialipoor et al. 2020; Pandey et al. 2020; Apostolopoulou et al. 2018, 2019, 2020).

In an ANN, the input data are fed into the neurons and then processed using a mathematical function to produce an output. To imitate the random nature of the biological neuron, weights are assigned to the input parameters before the data reach the neuron. The architecture of a back-propagation neural network (BPNN) can be expressed as \(N-{H}_{1}-{H}_{2}-\cdots -{H}_{\mathrm{NHL}}-M\), where N is the number of input neurons (i.e., the number of input parameters), Hi is the number of neurons in the ith hidden layer for i = 1, …, NHL, NHL is the number of hidden layers, and M is the number of output neurons (output parameters). A typical structure of a single node (with the corresponding R-element input vector) of a hidden layer is presented in Fig. 3.

Fig. 3 Architecture of ANNs

For each neuron i, the individual inputs \({p}_{1}, \dots , {p}_{R}\) are multiplied by the corresponding weights \({w}_{i,1}, \dots , {w}_{i,R}\) and the weighted values are fed to the summation junction, where the dot product (\(W\cdot p\)) of the weight vector \(W=\left[{w}_{i,1}, \dots , {w}_{i,R}\right]\) and the input vector \(p={\left[{p}_{1}, \dots , {p}_{R}\right]}^{T}\) is formed. The bias b (threshold) is added to the dot product, forming the net input \(n\), which is the argument of the activation function ƒ:

$$n=W\cdot p+b={w}_{i,1}{p}_{1}+{w}_{i,2}{p}_{2}+ \dots + {w}_{i,R}{p}_{R}+b.$$
(28)

The suitable selection of the activation/transfer function ƒ plays a key role during the training and development of ANN-based models, affecting both the structure and the prediction accuracy of the ANN. Although the hyperbolic tangent sigmoid and the log-sigmoid functions are the most commonly used activation functions, many other types of functions have been proposed over the last decade. In the research presented herein, an in-depth investigation of the effect of transfer functions on the performance of the trained and developed models was conducted, covering ten different activation functions. The main goal during the development of an ANN model, and especially during its training, is to correlate the various input and output parameters while minimizing the error (Psyllaki et al. 2018; Kechagias et al. 2018; Roy et al. 2019; Armaghani et al. 2019; Cavaleri et al. 2019; Chen et al. 2019; Xu et al. 2019; Yang et al. 2019; Pandey et al. 2020).
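As a minimal illustration of Eq. (28), the sketch below computes the output of a single neuron for an R = 3 input vector, using the hyperbolic tangent sigmoid as the activation function ƒ; the weight and bias values are arbitrary.

```python
# Minimal sketch of Eq. (28): one neuron, three inputs, tanh activation.
import numpy as np

def neuron(p, w, b, f=np.tanh):
    """Return f(n) with net input n = w . p + b (Eq. 28)."""
    n = np.dot(w, p) + b
    return f(n)

p = np.array([0.2, -0.5, 0.8])   # normalized inputs p_1..p_R
w = np.array([0.4, -0.1, 0.7])   # weights w_{i,1}..w_{i,R}
b = 0.05                         # bias (threshold)
print(neuron(p, w, b))
```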

5.2 Optimization Algorithms

Kennedy and Eberhart (1995) invented PSO, inspired by the behavior of social species in groups, such as bird flocks, fish schools, or ant colonies. PSO simulates the sharing of information between members. In the last two decades, PSO has been used in a variety of areas in conjunction with other techniques (Koopialipoor et al. 2019a, b; Moayedi et al. 2020). This approach searches for the best solution using particles whose trajectories are adjusted by a stochastic and a deterministic component. Each particle is influenced by its ‘best’ obtained position and the ‘best’ achieved position of the group, but moves at random. In PSO, a particle \(i\) is defined by its position vector, x, and its velocity vector, \(v\). During each iteration, each particle changes its position as follows:

$${v}_{i}^{k+1}=w\times {v}_{i}^{k}+{c}_{1}{r}_{1}\left({{x}_{\mathrm{best}}}_{i}^{k}-{x}_{i}^{k}\right)+ {c}_{2}{r}_{2}\left({g}_{\mathrm{best}}^{k}-{x}_{i}^{k}\right),$$
(29)
$${x}_{i}^{k+1}={x}_{i}^{k}+{v}_{i}^{k+1}\cdot t,$$
(30)

where \(w\) is the inertia weight, \({c}_{1}\) and \({c}_{2}\) are two positive acceleration constants, and \({r}_{1}\) and \({r}_{2}\) are two random numbers in [0, 1]. Figure 4 illustrates a 2-D representation of the movement of a particle \(i\) between two positions. It can be observed how the particle best position, Pbest, and the group best position, gbest, influence the velocity of the particle at iteration k + 1.
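A direct, minimal implementation of the particle update of Eqs. (29) and (30) is sketched below; the inertia weight and acceleration constants are typical textbook choices, not necessarily those used in this study.

```python
# Sketch of one PSO velocity/position update (Eqs. 29-30) for a whole swarm
# stored as arrays of shape (n_particles, dim).
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, x_best, g_best, w=0.7, c1=1.5, c2=1.5, t=1.0):
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (x_best - x) + c2 * r2 * (g_best - x)  # Eq. (29)
    x_new = x + v_new * t                                            # Eq. (30)
    return x_new, v_new
```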

Fig. 4 Movement of particle \(i\) in the search space during iterations k and k + 1

Atashpaz-Gargari and Lucas (2007) proposed the ICA as a population-based global search optimization algorithm. The ICA begins with the creation of a randomly generated starting population whose members are known as countries. The process creates N countries, after which a specified number of the lowest-cost countries are chosen as imperialists. The remaining countries are distributed as colonies among the empires. In the ICA, imperialists are more powerful when they possess more colonies. The ICA is built on three major operators: assimilation, revolution, and competition. During assimilation, the colonies move toward and are absorbed by their imperialists, whereas the revolution operator introduces sudden, unexpected changes. In the competition step, the imperialists compete to gain more colonies, and the empire that meets the required criteria eventually wins. This procedure is continued until the target benchmark is reached. Several studies provide further information concerning the ICA (Atashpaz-Gargari and Lucas 2007).
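A heavily simplified sketch of the ICA loop described above is given below: it covers only initialization, selection of imperialists, and assimilation of colonies, while the revolution and empire-competition operators are omitted for brevity; cost() is the objective function to be minimized.

```python
# Simplified ICA sketch (assimilation only; revolution/competition omitted).
import numpy as np

rng = np.random.default_rng(0)

def ica(cost, dim, n_countries=50, n_imperialists=5, beta=2.0, iters=200):
    countries = rng.uniform(-1.0, 1.0, (n_countries, dim))  # initial countries
    for _ in range(iters):
        costs = np.array([cost(c) for c in countries])
        order = np.argsort(costs)                 # lowest-cost countries first
        imperialists = countries[order[:n_imperialists]]
        colonies = countries[order[n_imperialists:]]
        # Assimilation: each colony moves toward a randomly assigned imperialist.
        owners = rng.integers(0, n_imperialists, len(colonies))
        step = beta * rng.random((len(colonies), 1))
        colonies = colonies + step * (imperialists[owners] - colonies)
        countries = np.vstack([imperialists, colonies])
    costs = np.array([cost(c) for c in countries])
    return countries[np.argmin(costs)]
```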

5.3 ANN-LM and Hybrid ANN Modelling

The basic steps of a holistic methodology were followed in order to obtain the optimum ANN model for estimating the UCS of granite. Typically, in the context of ANN training and development, the majority of researchers select in advance: (a) the method used to normalize the data, (b) the transfer function, (c) the training algorithm, (d) the number of hidden layers, and (e) the number of neurons in each hidden layer. Specifically, in order to estimate the layout of the neural network (i.e., hidden layers and neurons), several semi-empirical methodologies are available in the literature, taking into account the number of input parameters, the number of output parameters, the number of datasets used for training, or even combinations of the above. Table 5 presents a representative list of these semi-empirical relationships, which are widely accepted and commonly used in practice. In this work, however, rather than relying on empirical formulas, a different approach has been adopted to determine the most suitable layout of the neural network. Specifically, a thorough and in-depth investigation algorithm is employed, the steps of which are outlined as follows.

Table 5 Semi-empirical formulas available in the literature for the selection of a suitable number of neurons per hidden layer

Step 1. Develop and train a plethora of ANN models: multiple alternative ANNs are developed and trained with varying numbers of hidden layers (1 or 2) and neurons (1–30). Furthermore, each alternative ANN is trained with ten different activation functions, using the Levenberg–Marquardt (LM) algorithm as the training algorithm. Finally, the alternatives are further expanded by examining ten different random initializations of the weights and biases for each developed ANN model.

Step 2. Select the optimal architectures: from the previously developed ANNs trained with the LM algorithm, the architecture (ANN-LM) that achieved the best statistical performance indices on the training datasets was selected as optimal.

Step 3. Optimize weights and biases: upon the two optimum architectures selected in the previous step, several different optimization algorithms, such as PSO and ICA, were implemented so that optimized values of the weights and biases are obtained.

Step 4. Propose the optimum model: based on the performance indices achieved by the examined ANNs, the optimum one is selected and proposed.

The proposed algorithm is time consuming, since it requires the development and evaluation of numerous alternative ANN models, but it provides higher credibility in reaching an optimum architecture upon which the optimization algorithms are then applied. An additional benefit of the proposed algorithm, missing from the semi-empirical methodologies available in the literature, is that the optimum combination of transfer functions is also obtained (besides the number of hidden layers and the number of neurons per layer).
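A condensed sketch of Steps 1 and 2 is given below. Since the open-source scikit-learn library does not provide a Levenberg–Marquardt solver, the 'lbfgs' solver is used here as a stand-in, and only a subset of the ten activation functions is shown; the point of the sketch is the exhaustive loop structure, not the specific trainer.

```python
# Condensed sketch of Steps 1-2: grid search over hidden-layer sizes,
# activations, and random initializations, ranking candidates by RMSE.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

def search(X_tr, y_tr, X_te, y_te):
    best = (np.inf, None)
    for n_neurons in range(1, 31):                 # 1-30 neurons
        for act in ("tanh", "logistic", "relu"):   # subset of the 10 functions
            for seed in range(10):                 # 10 random initializations
                net = MLPRegressor(hidden_layer_sizes=(n_neurons,),
                                   activation=act, solver="lbfgs",
                                   max_iter=2000, random_state=seed)
                net.fit(X_tr, y_tr)
                rmse = mean_squared_error(y_te, net.predict(X_te)) ** 0.5
                if rmse < best[0]:
                    best = (rmse, net)
    return best
```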

5.4 Hybridization Procedure of ANN and OAs

In the last decade, many studies in engineering applications have used OAs to improve the ability of ANN models (Koopialipoor et al. 2019a, b; Moayedi et al. 2020; Golafshani et al. 2020). Specifically, the learning parameters (weights and biases) of ANNs are optimized using OAs. PSO and ICA, two extensively utilized OAs, were used to optimize the weights and biases of the ANNs in this study. The ANN works together with the PSO and ICA approaches to provide a prediction model for rock UCS. The methodological development of a hybrid ANN can be described as: (a) initialization of the ANN; (b) setting hyper-parameters such as the number of hidden layers, number of hidden neurons, and activation function; (c) initialization of the OA; (d) selection of the swarm/particle size and other deterministic parameters of the OA; (e) setting the terminating criteria; (f) training of the ANN using the training dataset; (g) calculation of fitness; (h) selection of the optimum values of weights and biases based on the performance criteria; and (i) testing of the ANN. Figure 5a represents the entire process of hybrid ANN construction followed in this work.
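Steps (f)–(h) hinge on treating all network weights and biases as a single flat vector that the OA manipulates. A minimal sketch of this decoding and of an RMSE cost function is given below, assuming the 3-11-1 architecture and, illustratively, a tanh hidden activation; the OA of choice (e.g., the pso_step routine sketched earlier) would iterate on this cost.

```python
# Sketch: a candidate solution encodes all weights and biases as one flat
# vector; the OA cost function decodes it, runs a forward pass, returns RMSE.
import numpy as np

def decode(theta, n_in=3, n_hid=11, n_out=1):
    """Split a flat parameter vector into IW, b1, LW, b2."""
    i = 0
    IW = theta[i:i + n_hid * n_in].reshape(n_hid, n_in); i += n_hid * n_in
    b1 = theta[i:i + n_hid];                             i += n_hid
    LW = theta[i:i + n_out * n_hid].reshape(n_out, n_hid); i += n_out * n_hid
    b2 = theta[i:i + n_out]
    return IW, b1, LW, b2

def cost(theta, X, y):
    IW, b1, LW, b2 = decode(theta)
    hidden = np.tanh(X @ IW.T + b1)           # illustrative hidden activation
    pred = (hidden @ LW.T + b2).ravel()
    return np.sqrt(np.mean((y - pred) ** 2))  # RMSE fitness
```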

Fig. 5 a Construction procedure of hybrid ANNs and b steps of computational modelling

5.5 Performance Indices

The complete dataset of 274 observations was divided into training, validation, and testing datasets prior to computational modelling. Specifically, a total of 183 observations were used for the training of the ANN models, whereas 45 observations were used for validation and 46 for testing. The steps of computational modelling are illustrated in Fig. 5b. After model construction, the predictive accuracy was assessed using four widely used indices (Chandra et al. 2018; Raja and Shukla 2020, 2021a, b; Raja et al. 2021; Khan et al. 2021, 2022; Aamir et al. 2020; Bhadana et al. 2020), namely the root mean square error (RMSE), mean absolute percentage error (MAPE), correlation coefficient (R), and variance account for (VAF). The mathematical expressions of these indices are given as follows:

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}({y}_{i}-{\widehat{y}}_{i}{)}^{2}},$$
(43)
$$\mathrm{MAPE}=\frac{1}{n}{\sum }_{i=1}^{n}\left|\frac{{y}_{i}-{\widehat{y}}_{i}}{{y}_{i}}\right|,$$
(44)
$$R=\sqrt{\frac{{\sum }_{i=1}^{n}({y}_{i}-{y}_{\mathrm{mean}}{)}^{2}-{\sum }_{i=1}^{n}({y}_{i}-{\widehat{y}}_{i}{)}^{2}}{{\sum }_{i=1}^{n}({y}_{i}-{y}_{\mathrm{mean}}{)}^{2}}},$$
(45)
$$\mathrm{VAF}(\%)=(1-\frac{\mathrm{var}({y}_{i}-{\widehat{y}}_{i})}{\mathrm{var}({y}_{i})})\times 100,$$
(46)

where \(n\) denotes the total number of datasets, and \({y}_{i}\) and \({\widehat{y}}_{i}\) represent the target and predicted values, respectively. Recent research has highlighted the limitations of the RMSE, the MAPE, and the R in assessing the predictive accuracy of neural networks (Asteris et al. 2021; Armaghani et al. 2021). To this end, the a20-index was also used to estimate the model accuracy. Note that the a20-index is a recently proposed index which has a physical engineering meaning and can be used to ensure the reliability of a data-driven model (Apostolopoulou et al. 2019, 2020). The mathematical expression of this index is given by:

$$\mathrm{a}20\text{-}\mathrm{index}=\frac{\mathrm{m}20}{M},$$
(47)

where M is the number of samples in the dataset and \(\mathrm{m}20\) is the number of samples whose (experimental value)/(predicted value) ratio lies between 0.80 and 1.20. For a 100% accurate predictive model, the a20-index equals 1 (or 100%). The a20-index thus gives the proportion of samples whose predicted values deviate by no more than ± 20% from the experimental values.
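For completeness, a minimal sketch computing the four indices of Eqs. (43)–(46) together with the a20-index of Eq. (47) is given below, with y the experimental and yhat the predicted UCS values.

```python
# Sketch of the performance indices of Eqs. (43)-(47).
import numpy as np

def indices(y, yhat):
    rmse = np.sqrt(np.mean((y - yhat) ** 2))                  # Eq. (43)
    mape = np.mean(np.abs((y - yhat) / y))                    # Eq. (44)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r = np.sqrt((ss_tot - np.sum((y - yhat) ** 2)) / ss_tot)  # Eq. (45)
    vaf = (1 - np.var(y - yhat) / np.var(y)) * 100            # Eq. (46)
    a20 = np.mean((y / yhat >= 0.80) & (y / yhat <= 1.20))    # Eq. (47)
    return rmse, mape, r, vaf, a20
```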

6 Results and Discussion

6.1 Development of ANN-LM Models

As stated above, a total of 274 datasets was compiled and used to train and develop three ANN-based models. Of the 274 data in the database, 183 observations were used for the training of the ANN models, while 45 observations were used for validation and 46 for testing. The parameters presented in Table 6 were used for the training and development of the ANN models. The combinations of these parameters yield 240,000 different ANN architecture models with one hidden layer.

Table 6 Parameters used for the training of ANN models

The above ANN models were assessed based on the RMSE index using the testing datasets. Table 7 presents the 20 best models for the case of ANN. From the information shown in Table 7, it appears that the optimum ANN with one hidden layer is the ANN-LM 3-11-1 model, which corresponds to the minmax normalization technique in the range [− 1.00, 1.00], 11 neurons in the hidden layer, and the normalized radial basis and symmetric saturating linear (SSL) transfer functions. Table 8 presents the performance indices of the optimum ANN model for both the training and testing datasets, while its architecture is presented in Fig. 6. The model with the lowest RMSE and highest R values in the testing phase (RMSE = 14.8272 and R = 0.9607) was selected as the ANN-LM 3-11-1 model. The a20-index of the optimum ANN-LM 3-11-1 model was determined to be 0.8470 and 0.7174 in the training and testing phases, respectively. To better illustrate the predictive outcomes, Fig. 7 compares the actual and predicted values of granite UCS using two different diagrams, viz., a scatter plot and a line diagram, for the optimum ANN-LM 3-11-1 model.

Table 7 Top 20 optimum structures/architectures of ANN-LM models
Table 8 Performance of the optimum ANN-LM model
Fig. 6 Structure of the optimum developed ANN-LM 3-11-1 model

Fig. 7 Actual vs. predicted UCS using the optimum developed ANN-LM 3-11-1 model

6.2 Development of ANN-PSO Model

The PSO was used to optimize the weights and biases of the optimum ANN-LM 3-11-1 model, and the ANN-PSO 3-11-1 model was constructed. The basic settings for constructing the ANN-PSO 3-11-1 model are presented in Table 9. Specifically, the population size was set between 10 and 100 in steps of 5, 50 different cases of random numbers were considered, and the maximum number of iterations was set to 100. Therefore, the total number of combinations was 95,000. Note that the RMSE was selected as the cost function. The convergence behavior of all the developed ANN-PSO 3-11-1 models is presented in Fig. 8. It can be seen from this figure that the ANN-PSO 3-11-1 model converged to a good solution within 30–40 iterations. Beyond this range, a flat curve was observed, indicating that no better solution was found. The influence of random number generation on ANN-PSO modelling is illustrated in Fig. 9. The performance of all the developed ANN-PSO 3-11-1 models is presented in Table 10. In the training phase, the values of R and RMSE ranged from 0.9170 to 0.9335 and from 0.1815 to 0.2019, respectively, while in the testing phase, these values ranged from 0.9293 to 0.9335 and from 0.1968 to 0.2020, respectively. The performance of the most appropriate ANN-PSO 3-11-1 model is presented in Table 11. As can be observed from this table, the ANN-PSO 3-11-1 model achieved the desired predictive accuracy with a20-index = 0.5519, R = 0.9330, RMSE = 23.4893, MAPE = 20.95%, and VAF = 86.9750 in the training phase, and a20-index = 0.5435, R = 0.9330, RMSE = 23.9205, MAPE = 24.50%, and VAF = 79.3740 in the testing phase. Figure 10 compares the actual and predicted granite UCS values to further illustrate the predictive outcomes; two distinct diagrams, a scatter plot and a line diagram, are provided for the optimum ANN-PSO 3-11-1 model.

Table 9 Hyperparameter configuration of the ANN-PSO 3-11-1 model
Fig. 8 Convergence of the developed ANN-PSO 3-11-1 model

Fig. 9 a, b Training and testing performance curves of the ANN-PSO 3-11-1 models for different random numbers (1:1:100) with the population size held constant at the optimum of 60; c, d training and testing performance curves of the ANN-PSO 3-11-1 models for different population sizes (10:2:60) with the random number held constant at the optimum of 60

Table 10 Performance analysis of ANN-PSO 3-11-1 models
Table 11 Prediction accuracy of the optimum ANN-PSO 3-11-1 model
Fig. 10 Experimental vs. predicted UCS using the optimum ANN-PSO 3-11-1 model

6.3 Development of ANN-ICA Model

Analogous to the PSO, the ICA was utilized to optimize the weights and biases of the optimal ANN-LM 3-11-1 model, and the ANN-ICA 3-11-1 model was created. Table 12 presents the hyper-parameter settings for building the ANN-ICA 3-11-1 model. In particular, the population size was set between 10 and 100 in steps of 5, 10 different sets (1 to 10 in steps of 1) were used for random number generation, and the maximum number of epochs was set to 200. Therefore, the total number of possible combinations was 380,000. The RMSE was selected as the cost function in each case. Figure 11 depicts the convergence behavior of all produced ANN-ICA 3-11-1 models. This graph demonstrates that the ANN-ICA 3-11-1 model converged to a satisfactory solution within 160–180 iterations. Beyond this range, a flat curve was identified, indicating that no better solution was found. The influence of the random number (i.e., initial values of weights and biases) on the performance of the ANN-ICA 3-11-1 model is presented in Fig. 12. Table 13 presents the performance of all produced ANN-ICA 3-11-1 models. During the training phase, the R and RMSE values varied between 0.9241 and 0.9317 and between 0.1832 and 0.1929, respectively. During the testing phase, these values ranged between 0.9272 and 0.9286 and between 0.2030 and 0.2048, respectively. Table 14 displays the performance of the most suitable ANN-ICA 3-11-1 model. As shown in the table, the ANN-ICA 3-11-1 model achieved the desired predictive accuracy with a20-index = 0.6339, R = 0.9286, RMSE = 18.6663, MAPE = 21.21%, and VAF = 85.0725 in the training phase, and a20-index = 0.6739, R = 0.9285, RMSE = 22.5228, MAPE = 16.48%, and VAF = 82.3275 in the testing phase. Figure 13 compares the actual and predicted granite UCS values to further illustrate the predictive outcomes; two separate diagrams, a scatter plot and a line diagram, are shown for the optimum ANN-ICA 3-11-1 model.

Table 12 Hyperparameter configuration of the ANN-ICA model
Fig. 11 Convergence of the developed ANN-ICA 3-11-1 model

Fig. 12 a, b Training and testing performance curves of the ANN-ICA 3-11-1 models for different random numbers (1:1:10) with the population held constant at the optimum size of 75 and the number of empires held constant at the optimum of 12; c, d training and testing performance curves for different population sizes (15:5:100) with the random number held constant at the optimum of 6 and the number of empires held constant at the optimum of 12; and e, f training and testing performance curves for different numbers of empires (2:2:20) with the random number held constant at the optimum of 6 and the population held constant at the optimum size of 75

Table 13 Performance analysis of ANN-ICA models
Table 14 Prediction accuracy of the ANN-ICA 3-11-1 model
Fig. 13 Experimental vs. predicted UCS using the optimum ANN-ICA 3-11-1 model

A performance summary of the developed models is furnished in Table 15. In terms of the a20-index criterion, the developed ANN-LM 3-11-1 model achieved the highest precision, with 84.70% and 71.74% accuracy in the training and testing phases, respectively. The values of the other indices also demonstrate that the proposed ANN-LM 3-11-1 is efficient and robust. Based on the RMSE index, the ANN-LM 3-11-1 model outperforms the ANN-PSO 3-11-1 and ANN-ICA 3-11-1 models by 103.73% and 61.90%, respectively, in the training phase, and by 61.33% and 51.90%, respectively, in the testing phase. These results corroborate the aforementioned premise in every respect.

Table 15 Prediction accuracy of the optimum ANN models

To further demonstrate the overall performance of the optimum ANNs in a succinct manner, the Taylor diagram and error histogram for both the training and testing stages are presented in this sub-section. Note that the Taylor diagram is a two-dimensional mathematical diagram used to provide a concise assessment of a model’s accuracy (Taylor 2001). It denotes the associations between the real and estimated observations in terms of the R, RMSE, and ratio of standard deviations indices. In a Taylor diagram, a model is represented by a point; for an ideal model, the point’s position coincides with the reference point (shown as a black circle). In contrast, the error histogram displays the distribution of the error between the actual and estimated values of a data-driven model. Figure 14 presents the Taylor diagrams for the optimum ANNs generated in this study, whereas Fig. 15 presents the error histogram between the estimated and actual UCS of granite. From these figures, the robustness of the proposed ANN-LM 3-11-1 model can be visualized.
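For reference, the three statistics that position a model on the Taylor diagram can be computed as sketched below (y experimental, yhat predicted).

```python
# Sketch of the Taylor diagram statistics: correlation coefficient, ratio of
# standard deviations, and centered RMS difference.
import numpy as np

def taylor_stats(y, yhat):
    r = np.corrcoef(y, yhat)[0, 1]
    sigma_ratio = np.std(yhat) / np.std(y)
    crmsd = np.sqrt(np.mean(((yhat - yhat.mean()) - (y - y.mean())) ** 2))
    return r, sigma_ratio, crmsd
```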

Fig. 14 Taylor diagram for the optimum ANNs: a training phase and b testing phase

Fig. 15 Error histogram for training (TR) and testing (TS) phases

6.4 Closed-Form Equations for Estimating Granite UCS Using Optimum ANN-LM 3-11-1 Model

It is quite common for the majority of published studies pertaining to the problem at hand to present primarily the architecture of the resulting optimal ANN model, together with the values of the statistical indicators on which the model evaluation was based. Yet, this information alone makes it impossible to assess the reliability of the proposed mathematical model and, more importantly, prohibits any substantial comparison with other models available in the literature. In order to characterize the reliability of the proposed computational model, it is necessary for researchers to provide all the pertinent data that clearly describe the model, so that it can be reproduced and checked by other researchers. Especially in the case of ANNs, apart from the architecture, it is deemed necessary to provide the transfer functions of the proposed ANN model and, more importantly, the finalized values of the weights and biases of the developed and proposed models. In addition, since a publication targets not only researchers but also practicing engineers, it is particularly useful to include a Graphical User Interface (GUI) implementing the proposed ANN-LM 3-11-1 model, so that it can be checked by experts and utilized by anyone interested in this problem.

To overcome this deficiency, the explicit mathematical equation that represents the optimum developed ANN-LM 3-11-1 model, along with the accompanying weights and biases, is provided in this study. Therefore, the proposed model can be readily reproduced (e.g., in a spreadsheet environment) by any interested third party. Even if an understanding of neural networks is lacking, an implementation of the model is still possible, since the required calculation steps are defined through the provided explicit mathematical formula. In light of the above, the equations for the prediction of both the normalized (UCSnorm) and real (UCSreal) values of granite UCS from the Rn, Vp, and ne values are expressed as follows for the optimum developed ANN-LM 3-11-1 model:

$${\mathrm{UCS}}_{\mathrm{norm}}=\mathrm{ satlins}\left(\left[\mathrm{LW}\left\{\mathrm{2,1}\right\}\right]\times \left[\mathrm{radbasn}\left(\left[\mathrm{IW}\left\{\mathrm{1,1}\right\}\right]\times \left[\mathrm{IP}\right]+\left[B\left\{\mathrm{1,1}\right\}\right]\right)\right]+\left[B\left\{\mathrm{2,1}\right\}\right]\right),$$
(48)
$${\mathrm{UCS}}_{\mathrm{real}}= \frac{\left({\mathrm{UCS}}_{\mathrm{norm}}-a\right)\times \left({\mathrm{UCS}}_{\mathrm{max}}-{\mathrm{UCS}}_{\mathrm{min}}\right)}{b-a}+{\mathrm{UCS}}_{\mathrm{min}},$$
(49)

where a = − 1.00 and b = 1.00 are the lower and upper limits of the minmax normalization technique applied to the data, and UCSmax = 211.90 MPa and UCSmin = 20.30 MPa are the maximum and minimum values of granite UCS present in the database used for the training and development of the ANN models. The satlins and radbasn functions are the symmetric saturating linear and normalized radial basis transfer functions, respectively, as discussed in Table 6. Their details (equations and graphs) are presented in Table 18 of the Appendix.

Equation (48) describes the developed ANN-LM 3-11-1 model in a purely mathematical form, so that its reproduction becomes straightforward. In Eq. (48), \([\mathrm{IW}\{\mathrm{1,1}\}]\) is an 11 \(\times\) 3 matrix containing the weights of the hidden layer; \(\left[\mathrm{LW}\left\{\mathrm{2,1}\right\}\right]\) is a 1 \(\times\) 11 vector with the weights of the output layer; \(\left[\mathrm{IP}\right]\) is a 3 × 1 vector with the three input variables; \(\left[\mathrm{B}\left\{\mathrm{1,1}\right\}\right]\) is an 11 \(\times\) 1 vector containing the bias values of the hidden layer; and \(\left[\mathrm{B}\left\{\mathrm{2,1}\right\}\right]\) is a 1 \(\times\) 1 vector with the bias of the output layer. The \(\left[\mathrm{IP}\right]\) vector contains the three normalized values of the input variables Rn, Vp, and ne. It can be expressed as

$$\left[\mathrm{IP}\right]=\left[\begin{array}{c}a+\left(b-a\right)\left(\frac{{n}_{e}-\mathrm{min}\left({n}_{e}\right)}{\mathrm{max}\left({n}_{e}\right)-\mathrm{min}\left({n}_{e}\right)}\right)\\ a+\left(b-a\right)\left(\frac{{V}_{p}-\mathrm{min}\left({V}_{p}\right)}{\mathrm{max}\left({V}_{p}\right)-\mathrm{min}\left({V}_{p}\right)}\right)\\ a+\left(b-a\right)\left(\frac{{R}_{n}-\mathrm{min}\left({R}_{n}\right)}{\mathrm{max}\left({R}_{n}\right)-\mathrm{min}\left({R}_{n}\right)}\right)\end{array}\right],$$
(50)

where \(\mathrm{min}\left({n}_{e}\right)\) = 0.06, \(\mathrm{max}\left({n}_{e}\right)\) = 7.23, \(\mathrm{min}\left({V}_{p}\right)\) = 1160 m/s, \(\mathrm{max}\left({V}_{p}\right)\) = 7943 m/s, \(\mathrm{min}\left({R}_{n}\right)\) = 16.80, and \(\mathrm{max}\left({R}_{n}\right)\) = 72 are the minimum and maximum values of the input parameters (shown in Table 4). The values of the final weights and biases that determine the matrices \(\left[\mathrm{IW}\left\{\mathrm{1,1}\right\}\right]\), \(\left[\mathrm{LW}\left\{\mathrm{2,1}\right\}\right]\), \(\left[\mathrm{B}\left\{\mathrm{1,1}\right\}\right]\), and \(\left[\mathrm{B}\left\{\mathrm{2,1}\right\}\right]\) are presented in Table 16.

Table 16 Finalized weights and bias of the optimum ANN-LM 3-11-1 model

In this matrix multiplication form, Eqs. (48) and (49) can be easily programmed in an Excel spreadsheet and, therefore, can be readily evaluated and used in practice; an equivalent scripted sketch is also given below. It is worth noting that such an implementation can be used by various interested parties (i.e., researchers, students, and engineers) without heavy requirements in effort and time. Figure 16 presents the developed Graphical User Interface (GUI), implementing Eqs. (48) and (49). The developed GUI is also provided as supplementary material and can be used as an alternative tool to estimate the UCS of rocks. Researchers and practitioners can utilize the developed GUI for estimating the rock UCS in the preliminary stages of major or minor projects such as tunneling, railway, and highway works.
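The minimal sketch below reproduces Eqs. (48)–(50) directly. The weight and bias arrays are placeholders to be filled in with the finalized values of Table 16, while satlins and radbasn follow their standard MATLAB definitions.

```python
# Sketch of Eqs. (48)-(50); fill IW, B1, LW, B2 with the Table 16 values.
import numpy as np

def satlins(n):
    """Symmetric saturating linear transfer function."""
    return np.clip(n, -1.0, 1.0)

def radbasn(n):
    """Normalized radial basis transfer function."""
    e = np.exp(-n ** 2)
    return e / np.sum(e)

IW = np.zeros((11, 3))   # hidden-layer weights (Table 16)
B1 = np.zeros(11)        # hidden-layer biases (Table 16)
LW = np.zeros((1, 11))   # output-layer weights (Table 16)
B2 = np.zeros(1)         # output-layer bias (Table 16)

def predict_ucs(ne, vp, rn, a=-1.0, b=1.0):
    lim = {"ne": (0.06, 7.23), "Vp": (1160.0, 7943.0), "Rn": (16.80, 72.0)}
    norm = lambda x, lo, hi: a + (b - a) * (x - lo) / (hi - lo)      # Eq. (50)
    ip = np.array([norm(ne, *lim["ne"]), norm(vp, *lim["Vp"]),
                   norm(rn, *lim["Rn"])])
    ucs_norm = satlins(LW @ radbasn(IW @ ip + B1) + B2)[0]           # Eq. (48)
    return (ucs_norm - a) * (211.90 - 20.30) / (b - a) + 20.30       # Eq. (49)
```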

Fig. 16 GUI for the prediction of UCS (also appended as a supplementary material)

6.5 Comparison of the Optimum ANN-LM 3-11-1 Model with Semi-empirical Relationships Available in the Literature

The accuracy of the optimum ANN-LM 3-11-1 model developed in this study to predict the UCS of granite is compared with the prediction accuracy of the top nine models available in the literature. The prediction accuracy of the various models is assessed using a range of statistical indices, including the a20-index, R, RMSE, MAPE, and VAF. Herein, the comparison is presented for the testing dataset, because a model that makes more accurate predictions in the testing phase generalizes better and should be accepted with more conviction. Table 17 shows that the optimum developed ANN-LM 3-11-1 model significantly outperforms the top nine models reported in the literature. The ANN-LM 3-11-1 model predicts the UCS of granite within ± 20% of the experimental data for 71.74% of the specimens, while the second-ranked model, proposed by Mishra and Basu (2013), does so for 54.35% of the specimens. The variation of the statistical indices suggests that the a20-index is the index most sensitive to prediction accuracy. For example, while the Pearson correlation coefficients of the first- and second-ranked models differ by only 16%, their a20-indexes differ by 32%.

Table 17 Prediction accuracy ranking of the optimum developed ANN-LM model and the top 9 models

It is of interest to reiterate that the developed optimum ANN-LM 3-11-1 model can predict the UCS of granite strictly within the range of values on which it has been trained, presented in detail in Table 4. The prediction accuracy of the proposed model is particularly high in the range where the parameter value distributions are dense, which typically comprises 5% of the total parameter values.

7 Summary and Conclusion

The aim of this research was to estimate the UCS of rocks using three non-destructive test indicators, namely Rn (L), Vp, and ne. For this purpose, a total of 274 datasets was compiled and used to train and validate three ANN-based models: ANN-LM 3-11-1, ANN-PSO 3-11-1, and ANN-ICA 3-11-1. Specifically, the performance of the constructed ANNs was evaluated first, followed by a comparison with the prediction accuracy of models currently available in the literature. Existing models in the literature employ only Vp and/or ne as input parameters for predicting the UCS of granites, whereas the ANN-based models created here utilize three test indicators, viz., Rn (L), Vp, and ne. Using the a20-index, R, RMSE, MAPE, and VAF criteria, the ANN-LM 3-11-1 was found to be the best performing model in both the training and testing phases of UCS prediction. Furthermore, when compared to the UCS of granite predicted using existing models in the literature, the ANN-LM 3-11-1 proposed in this study delivers a more accurate prediction. Considering these results, the developed GUI based on the constructed ANN-LM 3-11-1 model can be used as a tool to estimate the UCS of granites.

The goal of this study was not only to build a high-performance ML model, but also to provide adequate details of the numerous proposals available in the literature. Details of previous studies employing soft computing models to predict granite UCS are also provided and discussed. The main advantages of the study include (a) a closed-form equation; (b) an available GUI platform as a ready-made tool for estimating rock UCS; and (c) the most efficient model. However, the optimum developed ANN-LM 3-11-1 model can predict the UCS only of soft to hard granite ranging between 20.30 and 211.9 MPa; this is one of the limitations of this study. In addition, the prediction accuracy of the ANN-LM 3-11-1 model could be influenced by any potential variations of the input data distribution on which it has been trained and developed. The descriptive statistics of the input parameters demonstrate that the number of samples with Vp ranging between 1000 and 3000 m/s is relatively limited; therefore, extending the database in this range would require a recalibration of the optimum developed ANN-LM 3-11-1 model. Future directions of the study could include (a) a comprehensive evaluation of the ANN-LM model’s accuracy relative to other soft computing models using real-world data from a variety of fields; (b) an evaluation of the ANN-LM 3-11-1 model’s superiority over other hybrid ANNs constructed with different optimization algorithms; and (c) implementation of dimension reduction techniques such as principal component analysis (PCA), independent component analysis, and kernel-PCA for a comprehensive evaluation of results. Nonetheless, to the best of the authors’ knowledge, this is the first research to employ three non-destructive test indicators (Rn, Vp, and ne) to predict the UCS of granites.