Introduction

As lean strategies continue to reshape practically every commercial and scientific domain, the mining industry is also experiencing a transformational shift towards extracting deposits at greater depths (Lööw 2015). Considering the variability in ore grade and the cost of extraction, Mining3 (previously known as CRC Mining) recently proposed the idea of extracting and processing the targeted ore content in place (Bryan 2017). This idea is known as in-place mining (IPM), and it can minimise the movement of rock while maximising the amount of recoverable high-value ore. Depending on the ore body and mining method, IPM encompasses three different schemes: (a) in-line recovery (ILR); (b) in situ recovery (ISR); and (c) in-mine recovery (IMR) (Bryan 2017; Mining3 2017; Mousavi and Sellers 2019). In simple terms, ILR incorporates technologies such as selective cutting and ore upgrading to selectively extract and pre-concentrate the material at a location close to the surface; ISR involves pumping a leaching solution into the ground to dissolve the deeply buried ore and recovering the resulting pregnant solution; IMR is a coupled process in which pre-fragmented rock blocks in designed stopes are subjected to leaching (Bryan 2017; Mining3 2017; Mousavi and Sellers 2019). Preparing the stopes for IMR requires special treatment via cutting and blasting. In the ILR scheme, the most common method to extract useful minerals is to selectively blast the rock blocks near the contact areas of ore bodies and gangue. This process is oftentimes (a) cyclic; (b) hostile if the ore body is deposited under high in situ stress; and (c) poorly predictable. In addition, controlling the energy distribution during blasting to avoid overbreak and underbreak can be challenging; problems such as grade loss, ore dilution, and instability are often the result of poor energy control (Konicek et al. 2013). Hood et al. (2005) concluded that if the targeted minerals could be selectively cut, rather than blasted, from the deposit, mining operations could become more efficient, reliable, and predictable. Yet the existing cutting technologies (i.e. drag bits and roller discs) suffer from the twin issues of excessive wear rates and high reaction forces when cutting abrasive and strong rocks (Ramezanzadeh and Hood 2010). With the emergence of novel technologies such as undercutting, oscillating disc cutting (ODC), and actuated disc cutting (ADC), Vogt (2016) indicated that selective cutting shows great potential for future mining.

Despite their novelty, the practicality of these emerging technologies is often constrained by the estimation of in-place geological resources. It is well known that measure-while-drilling (MWD) data is an indication of the spatial distribution of rock mass conditions. A better understanding of MWD data is beneficial not only to blast design but also to downstream processes such as grade engineering (Sellers et al. 2019). An extensive body of studies has focused on extracting or interpreting rock characteristics from MWD data. For example, researchers at Mining3 performed numerous laboratory and field studies on rock mass characterisation based on MWD techniques for blast design (Segui and Higgins 2002; Smith 2002; Cooper et al. 2011). There have also been analytical approaches using wavelet transformation to solve MWD-related rock mass characterisation tasks (Li et al. 2007; Chin 2018).

According to Lewis and Vigh (2017), reliability and computational cost are the two primary factors governing the further development of inversion technology; adequate data mining techniques such as machine learning (ML) might provide an alternative. Substantial efforts have been devoted to implementing soft computing to model different metrics that affect MWD and rock-cutting operations. Among these, performance prediction and rock type classification have been the two most popular themes. For example, Akin and Karpuz (2008) and Basarir et al. (2014) estimated the rate of penetration (\(ROP\)) using artificial neural networks (ANN) and an adaptive neuro-fuzzy inference system (ANFIS). Hegde et al. (2020) also employed a data-driven approach, the random forest (RF), to predict the specific energy (\(SE\)) associated with drilling operations with high accuracy. Anifowose et al. (2015) concluded that, compared with traditional regression modelling, the non-parametric nature and flexibility of the above-mentioned machine learning algorithms make the prediction of drilling performance more efficient and accurate. Beyond performance prediction, studies such as Kadkhodaie-Ilkhchi et al. (2010), Klyuchnikov et al. (2019), Romanenkova et al. (2020), and Zaitouny et al. (2020) used fuzzy system (FS), ANN, decision tree (DT), and recurrent neural network (RNN) algorithms to classify the rock types associated with MWD data. The successful implementation of soft computing in estimating lithology from MWD data is of great importance for identifying productive and non-productive layers and optimising drilling operations.

For rock cutting, studies have been heavily skewed towards performance prediction rather than rock type classification. For example, to predict \(ROP\), Grima et al. (2000) used ANFIS; Mahdevari et al. (2014) and Fattahi and Babanouri (2017) used support vector regression (SVR); Yagiz (2008), Zhang et al. (2020), Zhou et al. (2020), and Koopialipoor et al. (2020) used ANN. For \(SE\) prediction, Ghasemi et al. (2014) employed a fuzzy system, while Salimi et al. (2018) employed the classification and regression tree (CART). Later, Yilmazkaya et al. (2018) concluded that soft computing applied to rock cutting shows better agreement with actual measurements than traditional regression models. Despite this surge in performance prediction, little research has used soft computing techniques to classify the rock type associated with cutting data. This leaves a gap, as information from historical data is crucial for conducting cutting operations more precisely, selectively, and efficiently. It is also worth mentioning that the successful implementation of any ML algorithm often requires some hand-tuning to find the best configuration. As emphasised by Bergstra et al. (2013), such parameter tuning is critical to realising a method’s full potential. Zhang et al. (2020) also pointed out that the influence of hyperparameters is often overlooked when investigating the potential of ML algorithms for rock cutting or MWD.

To address these knowledge gaps, this paper develops a self-adaptive ANN model to (1) learn from recently acquired laboratory and field cutting data for ODC and ADC and (2) facilitate the further development of these novel cutting techniques for the selective cutting and breakage of ores in intelligent mining. The developed model was coupled with a Bayesian optimisation algorithm to support automated hyperparameter optimisation. The results indicate that the proposed ANN model is capable of identifying the rock types associated with ODC and ADC with impressive precision. The outcome of this paper could (1) facilitate the further development of these novel cutting technologies and (2) enable mining engineers to effectively and precisely map high-value ore grades, based on a much denser database, when planning a selective cutting operation.

Literature review

The nature of this paper necessitates a discussion of the background of the novel cutting technologies developed over the years, as well as the fundamentals of the artificial neural network and the Bayesian optimisation algorithm.

Background information about the selective cutting technologies

Essentially, the foundation of selective cutting lies in the concept of undercutting, where a conventional rolling disc is used as a drag bit. By directly creating tensile stress while constantly rotating its interface with the rock, the undercutting technology combines the advantages of drag bits (low reaction forces) and roller discs (low wear rates) (Ramezanzadeh and Hood 2010). Through a series of laboratory tests and field trials, Ramezanzadeh and Hood (2010) also pointed out that although the machine weight and power required for the undercutting approach are substantially less than for a conventional tunnel boring machine (TBM), reducing excavation costs remains an issue. Meanwhile, manufacturers such as Wirth (2013) and Sandvik Mining (2007) have continued working in this area to improve the cutting efficiency of mechanical cutting in hard rock mining. This prompted the second generation of undercutting technology, known as oscillating disc cutting.

The ODC technology uses an undercutting disc with an internal drive added to oscillate the disc at a small amplitude. The rationale for adding the complication of an oscillating motion to an undercutting disc is that cyclic loading can induce fatigue cracking, thereby weakening the rock. The experimental investigations by Karekal (2013) exhibited force reductions in ODC, while Kovalyshen (2015) predicted such force reductions from an analytical perspective. ODC technology has been licensed to Joy Global by CRCMining since 2006 and was later rebranded as DynaCut (Sundar 2016). Recently, Tadic et al. (2018) conducted a series of field trials using the DynaCut in a sandstone quarry located in Helidon, Queensland, as shown in Fig. 1a. The geotechnical assessment indicated that three types of sandstone exist at the site: (i) low strength sandstone (LSS) with a uniaxial compressive strength (UCS) ranging from 11 to 35 MPa; (ii) medium strength sandstone (MSS) with UCS ranging from 29 to 56 MPa; and (iii) high strength sandstone (HSS) with UCS varying between 47 and 85 MPa. To investigate the potential of processing cutting data for the real/semi-real-time characterisation of rocks, this paper used the ODC data collected from this field trial to train and test the developed ANN model at the field scale.

Fig. 1

Source of the data collected for this study with (a) field trials conducted for ODC using the DynaCut and (b) laboratory tests conducted for ADC using Wobble (Tadic et al. 2018; Xu et al. 2021)

The concept of actuated disc cutting was proposed as an extension of both the undercutting and oscillating disc cutting technologies. An actuated disc cutter attacks rock in an undercutting manner with the disc itself actuated around a secondary axis, as shown in Fig. 1b. Rather than oscillating around a secondary axis at limited frequencies and amplitudes like an ODC, the actuation cyclically moves the disc in a direction other than the linear undercutting motion, inducing off-centric revolutions of the disc over a wide range of frequencies and amplitudes. Dehkhoda and Detournay (2017) first proposed an analytical model to understand the mechanics of ADC. Further experimental studies by Dehkhoda and Detournay (2019) reported the parametric effects of the key ADC variables on the average thrust force and specific energy. Rock cutting tests were performed on two types of sedimentary rock (Savonnière limestone (SL) with UCS ranging from 14 to 19 MPa and Gosford sandstone (GS) with UCS ranging from 28 to 31 MPa) using a CSIRO-customised ADC unit known as Wobble. Wobble is equipped with force, torque, and displacement transducers to monitor the cutting process. Xu (2019) and Xu and Dehkhoda (2019) evaluated the ADC-induced fragmentation process by considering the force dynamics and quantifying the fragments generated. To test the idea of rock characterisation based on cutting data, the ADC data collected from these laboratory tests was used in this paper to train and test the developed ANN model at the laboratory scale.

Fundamentals of the artificial neural network and Bayesian optimisation algorithm

The history of ANN can be traced back to 1943, when McCulloch and Pitts (1943) intended to develop a computing system to mimic human biological systems. By replicating the capabilities of the biological neural network, artificial neurons receive inputs through synapses and send an output once a weighted threshold is exceeded (Shahin et al. 2002; Shahin et al. 2004). The most frequently used type of ANN is the feed-forward multilayer perceptron (MLP). This type of ANN consists of three basic layers: (1) the input layer; (2) the hidden layer; and (3) the output layer. The input layer receives information from the raw data and passes it down to the next layer (i.e. the hidden layer). Once all the hidden layers finish their calculations, the output layer delivers the result. Although these layers have different functions, the way information is shared is determined by the neurons present in each layer. Each neuron first receives a piece of information and then assigns a random weight to it. By summing all the input values multiplied by their corresponding connection weights, the net input for a neuron can be calculated as shown in Eq. (1).

$${a}_{j}=\sum_{i=1}^{n}{x}_{i}{w}_{ij}+{\theta }_{j}$$
(1)

where \({x}_{i}\) is the output of the ith neuron in the previous layer, \(n\) is the number of neurons in that layer, \({w}_{ij}\) is the weight connecting the ith neuron in the previous layer to the jth neuron in the current layer, and \({\theta }_{j}\) is a bias term that controls the horizontal offset of the function (driven by a fixed input of 1).

Once the net input for a neuron is calculated, an activation function is further applied to determine the output value, referring to Eq. (2).

$${x}_{j}=f({a}_{j})$$
(2)

There are many activation functions available. Based on their properties, they can be divided into four categories: (1) bounded; (2) continuous; (3) monotonic; and (4) continuously differentiable. The most commonly used activation function is the rectified linear unit (ReLU). This is because ReLU rarely saturates, which greatly accelerates the convergence of stochastic gradient descent. Other possible activation functions are the sigmoid, arc-tangent, and hyperbolic tangent functions (Fig. 2).
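To make Eqs. (1) and (2) concrete, the following is a minimal sketch of a single neuron's forward pass in Python (NumPy assumed); the input values, weights, and bias are illustrative only and do not correspond to the trained model.

```python
# Minimal sketch of Eqs. (1)-(2): the net input of a neuron and a few common
# activation functions. The values of x, w, and theta are illustrative only.
import numpy as np

def net_input(x, w, theta):
    """Eq. (1): weighted sum of the previous layer's outputs plus a bias term."""
    return np.dot(x, w) + theta

# Candidate activation functions f(a_j) for Eq. (2)
def relu(a):
    return np.maximum(0.0, a)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    return np.tanh(a)

# Example: one neuron receiving three inputs
x = np.array([0.5, -1.2, 3.0])   # outputs of the previous layer
w = np.array([0.1, 0.4, -0.3])   # connection weights w_ij
theta = 0.05                      # bias term

a_j = net_input(x, w, theta)
print(relu(a_j), sigmoid(a_j), tanh(a_j))
```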

Fig. 2

Different types of activation functions used in ANNs

An ANN model must undergo a training phase to be able to learn the possible relationships between the inputs and outputs. The goal of ANN training is to adjust the internal weights of the ANN implicitly (Ghaboussi 2018). By optimising the weights, an ANN model seeks to minimise the difference between the neural network outputs and the desired outputs. Different approaches can be used to solve this optimisation problem. The classical approach is the back-propagation method, where the gradient of the error function is evaluated with respect to the weights and then used to update the weights to improve the response, as shown in Eq. (3).

$${\mathbf{w}}_{\mathrm{next}}={\mathbf{w}}_{\mathrm{now}}-\eta \frac{\partial {E}_{T}}{\partial \mathbf{w}}$$
(3)

where \(\eta\) is the learning rate of the ANN, \({E}_{T}\) is the norm of the error over all samples, \({\mathbf{w}}_{\mathrm{now}}\) is the current vector of unknown weight parameters, and \({\mathbf{w}}_{\mathrm{next}}\) is the updated vector of unknown weight parameters (Hertz 2018).
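The update rule of Eq. (3) can be illustrated with a short gradient-descent loop; the quadratic error surface used below is a stand-in assumption, not the network's actual loss function.

```python
# Minimal sketch of the back-propagation update rule in Eq. (3): the weight
# vector is moved against the gradient of the total error E_T.
import numpy as np

def gradient_of_error(w):
    """Gradient of an illustrative error E_T(w) = ||w - w_target||^2."""
    w_target = np.array([0.3, -0.7])
    return 2.0 * (w - w_target)

eta = 0.1                                            # learning rate
w_now = np.random.default_rng(0).normal(size=2)      # random initialisation

for epoch in range(100):
    w_next = w_now - eta * gradient_of_error(w_now)  # Eq. (3)
    w_now = w_next

print(w_now)   # converges towards w_target
```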

The weights are initialised randomly, and the update process proceeds until a solution to Eq. (3) is reached. However, in cases where the structure of the network is more complex than the nonlinearity between the inputs and the outputs, a problem called overfitting is observed. In this case, the training error is small but the testing error is large. This happens when the ANN “memorises” the training data but cannot generalise well enough, as shown in Fig. 3. The causes of overfitting can be (1) too many hidden neurons or (2) insufficient training data, as too many hidden neurons give the ANN excessive degrees of freedom in the input/output relationship. Underfitting, on the other hand, occurs when both the training and testing errors are large. This happens when the ANN is poorly trained. Underfitting is corrected by either adding more hidden neurons or adding more training data. Good learning occurs when both the training and testing errors are small. In this case, the ANN has learned the training data and can generalise to inputs that it has never “seen” before.

Fig. 3

Overfitting and underfitting of ANN models (Géron 2019)

Over the years, several attempts have been made to improve the performance of the back-propagation algorithm. Among them, simulated annealing (Sexton et al. 1999; Wang et al. 1999), the genetic algorithm (Goldberg and Holland 1988), and Bayesian optimisation (Rasmussen 2003; Lizotte 2008; Murphy 2012; Snoek et al. 2012) have shown great potential. In particular, simulated annealing is a stochastic global method that searches for the optimum based on the likelihood of accepting the current point compared with other points; the genetic algorithm imitates the mechanics of natural selection and natural genetics until no further progress can be made; Bayesian optimisation encodes a prior belief between inputs and outputs, updates the belief according to the laws of probability as information accumulates, and uses the updated belief to guide the optimisation process. Table 1 summarises the pros and cons of the above-mentioned optimisation algorithms. From a practical point of view, Lizotte (2008) and Snoek et al. (2012) concluded that Bayesian optimisation has some advantages over the others because (1) it is not restricted to explicitly modelling the objective and (2) it keeps track of every past evaluation, which in turn makes the optimisation process more efficient.

Table 1 Summary of the popular optimisation algorithms

To better understand Bayesian optimisation, one has to understand its two major components: (1) a probabilistic model that describes the prior beliefs and (2) an acquisition function that selects the next point to evaluate based on those beliefs. For the probabilistic model, most of the literature chooses the Gaussian process (GP) because of its flexibility and tractability. A GP is a random process in which any point \(x\in {\mathbb{R}}\) is assigned a random variable \(h(x)\) and the joint distribution of a finite number of these variables \(p \left[h({x}_{1})\right.\), \(h\left({x}_{2}\right),\)…, \(\left.h({x}_{N})\right]\) is itself Gaussian. For the acquisition function, some popular choices are the probability of improvement (\(PI\)), the expected improvement (\(EI\)), and the upper confidence bound (\(UCB\)), as shown in Eqs. (4)–(6); a short code sketch of the \(PI\) criterion follows the list below. It is worth mentioning that this research uses \(PI\) as the acquisition function due to its simplicity.

  • The probability of improvement is defined as

    $$\begin{array}{c}PI\left(x\right)=P\left(h\left(x\right)\ge h\left({x}^{+}\right)\right)=\Phi \left(Z\right)\\ Z=\frac{\mu \left(x\right)-h\left({x}^{+}\right)}{\sigma \left(x\right)}\end{array}$$
    (4)

    where \(h\left({x}^{+}\right)\) is the value of the best sample up to the present and \({x}^{+}\) is the location of that sample and \(\mu \left(x\right)\) and \(\sigma \left(x\right)\) are the mean and the standard deviation of the GP posterior predictive at \(x\), respectively.

  • Expected improvement is defined as

    $$\begin{array}{c}EI\left(x\right)=\left\{\begin{array}{cc}\left(\mu \left(x\right)-h\left({x}^{+}\right)-\delta \right)\Phi \left(Z\right)+\sigma \left(x\right)\phi \left(Z\right)& \mathrm{if}\ \sigma \left(x\right)>0\\ 0& \mathrm{if}\ \sigma \left(x\right)=0\end{array}\right.\\ Z=\left\{\begin{array}{cc}\frac{\mu \left(x\right)-h\left({x}^{+}\right)-\delta }{\sigma \left(x\right)}& \mathrm{if}\ \sigma \left(x\right)>0\\ 0& \mathrm{if}\ \sigma \left(x\right)=0\end{array}\right.\end{array}$$
    (5)

    where Φ and ϕ are the CDF and PDF of the standard normal distribution, respectively; δ determines the amount of exploration during optimisation, with higher δ values leading to more exploration (a recommended default value for δ is 0.01); the other notations are as defined above.

  • Upper confidence bound is also defined as

    $$UCB\left(x\right)=\mu\left(x\right)+\varsigma \sigma\left(x\right)$$
    (6)

    where \(\varsigma\) is a tuneable parameter used to balance exploitation against exploration in the acquisition function.
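As a minimal sketch of how the \(PI\) acquisition of Eq. (4) guides the search, the snippet below fits a GP to a few observed points and evaluates \(PI\) over a grid of candidates; the toy objective, the sample points, and the use of scikit-learn's GaussianProcessRegressor are illustrative assumptions rather than the implementation used in this study.

```python
# Minimal sketch of the probability-of-improvement acquisition in Eq. (4),
# evaluated from a GP posterior. The objective and observations are toy data.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    return -(x - 2.0) ** 2            # toy function to be maximised

# A few observed points (the belief after some evaluations)
X_obs = np.array([[0.0], [1.0], [3.5]])
y_obs = objective(X_obs).ravel()

gp = GaussianProcessRegressor().fit(X_obs, y_obs)

def probability_of_improvement(x_cand, h_best):
    """Eq. (4): PI(x) = Phi((mu(x) - h(x+)) / sigma(x))."""
    mu, sigma = gp.predict(x_cand, return_std=True)
    z = (mu - h_best) / np.maximum(sigma, 1e-9)
    return norm.cdf(z)

x_grid = np.linspace(0.0, 4.0, 101).reshape(-1, 1)
pi = probability_of_improvement(x_grid, y_obs.max())
x_next = x_grid[np.argmax(pi)]        # next point suggested by the acquisition
print(x_next)
```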

Methodology

Figure 4 illustrates the workflow for developing the predictive model. A detailed discussion of each step is presented in this section.

Fig. 4

The workflow of developing ANN model

Data acquisition and definition of input variables

The datasets used in this paper are the results of two test campaigns: (1) fifty cutting tests conducted in the laboratory using ADC and (2) two hundred and forty-eight cutting trials conducted at the field scale using ODC. As discussed above, the laboratory investigations by Xu (2019) were intended to understand ADC-induced fragmentation under different operating conditions, while the field trials by Tadic et al. (2018) evaluated the scalability of ODC. Despite their different objectives, both studies evaluated the performance of ODC and ADC in terms of specific energy and instantaneous cutting rate. As parametric analyses were not the priority for Tadic et al. (2018), all the ODC cutting tests were conducted at the same oscillating amplitude and frequency, at various cutting depths, with two cutters of different radii, on three types of sandstone known as LSS, MSS, and HSS. On the other hand, the cutting tests conducted by Xu (2019) were performed on two types of sedimentary rock, known as SL and GS, at various actuation amplitudes, frequencies, cutter radii, and cutting depths. Considering that the designs of experiments for the above two studies were rather different, the input data for the ANN model included the following variables in both scenarios for consistency:

  • Specific energy (\(SE\)) indicates the energy required to cut one unit volume of rock. The cutting process is considered more efficient if \(SE\) is lower. The unit for \(SE\) here is \(MJ/{m}^{3}\).

  • Instantaneous cutting rate (\(ICR\)) is the production rate during the period of cutting. The cutting process is considered more efficient if \(ICR\) is higher. The unit for \(ICR\) here is \({m}^{3}/hr\).

  • Cutter radius (\(r)\) has a unit of \(mm\).

  • Cutting depth (\(d\)) has a unit of \(mm\).

Table 2 provides a summary of the statistical information about the raw datasets. The observed \(SE\) for the laboratory trials conducted with ADC varies between 0.99 \(MJ/{m}^{3}\) and 14.77 \(MJ/{m}^{3}\), with a standard deviation of around 3.44 \(MJ/{m}^{3}\) across the fifty experiments. For the field trials conducted with ODC, \(SE\) exhibits a rather narrow spread from 1.21 to 6.88 \(MJ/{m}^{3}\), with a smaller standard deviation of 0.99 \(MJ/{m}^{3}\) over the two hundred and forty-eight experiments. The same trend was observed for \(r\), with ADC showing more variability (i.e. a larger standard deviation) than ODC. As for \(ICR\), the opposite trend to \(SE\) and \(r\) was observed: \(ICR\) for ODC varies significantly between 34.70 and 118.50 \({m}^{3}/hr\), with a standard deviation of 18.12 \({m}^{3}/hr\), while \(ICR\) for ADC only ranges between 0.01 and 0.07 \({m}^{3}/hr\). Based on the above information, Eq. (7) presents the proposed ANN model, where the function \(g\) represents the architecture of the developed model and \(C\) describes the outcome of the model (i.e. the rock type).

Table 2 Descriptive statistics of the datasets
$$C=g(SE, ICR, r, d)$$
(7)

Pre-processing of raw data and assigning training and testing sets

Pre-processing is often necessary when developing a reliable ML model. As the raw inputs often have varying scales, converting them to the same scale can (1) reduce estimation errors and (2) shorten the processing time (Sola and Sevilla 1997). To normalise the input variables for ADC and ODC, a Z-score transformation was applied in this paper to limit the influence of outliers. The formula for Z-score normalisation is given in Eq. (8) (Brase and Brase 2013). The normalised datasets for ADC and ODC can be found in Tables 7 and 8 in the Appendix. After pre-processing, the dataset was randomly divided in half, where the training phase was performed on the first 50% and the remaining 50% was used in the testing phase.

$$z=\frac{X-\mu }{\sigma }$$
(8)

where \(z\) is the normalised data, \(X\) is the raw data, \(\mu\) is the mean value of feature \(X\), and \(\sigma\) is the standard deviation of feature \(X\).
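The sketch below illustrates Eq. (8) applied column-wise, followed by the 50/50 random split described above; the file name and column labels are hypothetical placeholders for the datasets in Tables 7 and 8.

```python
# Minimal sketch of the pre-processing step: Eq. (8) applied feature-wise,
# followed by a 50/50 random split into training and testing sets.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("cutting_data.csv")          # hypothetical raw dataset
features = ["SE", "ICR", "r", "d"]            # inputs of Eq. (7)

# Z-score normalisation, Eq. (8): z = (X - mu) / sigma
df[features] = (df[features] - df[features].mean()) / df[features].std()

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["rock_type"], test_size=0.5, random_state=42)
```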

Designing the architecture of the ANN model

The ANN model developed in this paper has the following four basic components:

  • The number of hidden layers

  • The number of neurons in each hidden layer, which includes the dropout percentage and the shrinkage percentage in each layer

  • The activation function

  • The learning rate

According to Bonilla et al. (2008), different setups of these basic components change the architecture of an ANN model and thus alter the synaptic weighting for each input variable. The back-propagation algorithm then predicts the output based on the synaptic weighting of the input variables. The literature reports that the optimisation of an ANN model is often achieved by trial and error, despite this being clumsy and time-consuming (Bonilla et al. 2008; Horst and Pardalos 2013; Rajabi et al. 2017).

Stemming from the above, the ANN model developed in this paper was coupled with a Bayesian optimisation algorithm to enable an efficient and robust optimisation process. As with simulated annealing and genetic algorithms, it is necessary to specify the domain range for each hyperparameter. The Bayesian algorithm takes the range of each hyperparameter and generates a distribution function when searching for the best value. For the ANN model in this paper, the domain range set for each hyperparameter is presented in Table 3, and a minimal code sketch of such a search is given below. For the activation function, as discussed before, ReLU was employed due to its popularity in solving classification problems. Based on the above details, Fig. 5 provides an example of a typical ANN following this methodology. It is worth mentioning that the actual model (i.e. the fine-tuned model) depends on the output of the Bayesian optimisation, which is discussed further below.
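The following is a minimal sketch of such a Bayesian hyperparameter search, written with the scikit-optimize package as an assumed implementation; the ranges shown are illustrative placeholders for the domains in Table 3, and the objective function is a stub standing in for building and cross-validating the actual network.

```python
# Minimal sketch of coupling the ANN with Bayesian optimisation via
# scikit-optimize. Ranges are placeholders for Table 3; the objective stub
# stands in for the real build/train/cross-validate routine.
from skopt import gp_minimize
from skopt.space import Integer, Real

search_space = [
    Integer(1, 5,    name="n_hidden_layers"),
    Real(0.05, 1.0,  name="neuron_percentage"),
    Real(1e-4, 1.0,  name="learning_rate", prior="log-uniform"),
    Real(0.0, 0.5,   name="dropout"),
    Real(0.0, 0.5,   name="shrinkage"),
]

def cross_validated_log_loss(params):
    """Build an ANN with the given hyperparameters and return the mean
    five-fold cross-validation log-loss (lower is better)."""
    n_layers, neuron_pct, lr, dropout, shrinkage = params
    # ... build, train, and cross-validate the network here ...
    return 0.0   # placeholder value

result = gp_minimize(
    cross_validated_log_loss,
    search_space,
    n_calls=30,          # thirty searches, as in the paper
    acq_func="PI",       # probability of improvement, Eq. (4)
    random_state=42,
)
print(result.x, result.fun)
```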

Table 3 Domain range of each hyperparameter
Fig. 5

An example of ANN model developed

Training, testing, and evaluating model performance

The main issues associated with the training and testing of an ANN model are overfitting and underfitting. Thus, it is important to monitor the error function, which for classification problems is the log-loss function. To better evaluate the performance of the model, k-fold cross-validation and the confusion matrix were also employed in this paper.

Log-loss function

In a classification problem, the log-loss function, or cross-entropy, is often used to evaluate the performance of an algorithm. Essentially, log-loss compares the probabilities predicted by the model against the ground truth. If the difference between the predicted probability and the ground truth is significant, the model is penalised for that prediction. Mathematically, the log-loss function is defined in Eq. (9):

$$log-loss=-\frac1N\sum_{n=1}^N\sum_{m=1}^My_{nm}\;\log\; p_{nm}$$
(9)

where \({p}_{nm}\) is the probability that the model assigns to record \(n\) for label \(m\), \(N\) is the number of records, \(M\) is the number of class labels, and \({y}_{nm}\) equals 1 if \(m\) is the true label for record \(n\) and 0 otherwise.
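As a quick check of Eq. (9), the sketch below computes the log-loss both directly and with scikit-learn's log_loss helper; the labels and probabilities are illustrative.

```python
# Minimal sketch of Eq. (9) alongside the equivalent scikit-learn call.
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 1, 2, 1])                    # true class labels
p = np.array([[0.8, 0.1, 0.1],                     # predicted probabilities p_nm
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7],
              [0.3, 0.6, 0.1]])

# Eq. (9): average negative log-probability assigned to the true label
y_onehot = np.eye(3)[y_true]
manual = -np.mean(np.sum(y_onehot * np.log(p), axis=1))

print(manual, log_loss(y_true, p))                 # the two values agree
```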

k-fold cross-validation

When dealing with a rather small training set, cross-validation is often employed to resample the data and thereby help avoid overfitting and underfitting. The common procedure of k-fold cross-validation involves (1) splitting the training set into k smaller sets; (2) selecting one set; (3) training the model using the remaining k-1 sets; (4) testing the model against the selected set; and (5) computing the average score across all folds. For the ANN model in this paper, the raw data was divided in half into training and testing sets. The training set was then split into five folds, and the model was trained on four folds and validated on the remaining fold, rotating through all five folds; see Fig. 6 and the code sketch following the figure.

Fig. 6

The procedure of k-fold cross-validation for the ANN model
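A minimal sketch of the five-fold procedure using scikit-learn; the random data and the MLPClassifier stand in for the actual training split and the Bayesian-tuned ANN.

```python
# Minimal sketch of five-fold cross-validation on the training set.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(124, 4))               # e.g. half of the ODC records
y_train = rng.integers(0, 3, size=124)            # three rock classes

model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="neg_log_loss")
print(-scores.mean())                             # mean cross-validated log-loss
```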

Confusion matrix

Other than k-fold cross-validation, the confusion matrix can also be used to better visualise the performance of a model. When constructing a confusion matrix, the predictions are plotted against the true labels, as shown in Table 4, where:

  • Positive (P): Observation is positive.

  • Negative (N): Observation is negative.

  • True positive (TP): Observation is positive and prediction is also positive.

  • True negative (TN): Observation is negative and prediction is also negative.

  • False negative (FN): Observation is positive while prediction is negative.

  • False positive (FP): Observation is negative while prediction is positive.

Table 4 A simple guide of the structure of a confusion matrix

Once the confusion matrix is ready, several performance parameters, namely classification accuracy, recall, and precision, can be computed from it; see Eqs. (10)–(12) and the code sketch that follows them.

$$accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
(10)
$$recall=\frac{TP}{TP+FN}$$
(11)
$$precision=\frac{TP}{TP+FP}$$
(12)
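A minimal sketch of computing the confusion matrix and the metrics of Eqs. (10)–(12) with scikit-learn; the labels are illustrative, and the macro averaging across the three classes is an assumption about how the per-class scores are combined.

```python
# Minimal sketch of Eqs. (10)-(12) computed from a confusion matrix.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

y_true = ["LSS", "MSS", "HSS", "MSS", "HSS", "LSS", "HSS"]
y_pred = ["LSS", "MSS", "HSS", "LSS", "HSS", "LSS", "MSS"]

print(confusion_matrix(y_true, y_pred, labels=["LSS", "MSS", "HSS"]))
print(accuracy_score(y_true, y_pred))                   # Eq. (10)
print(recall_score(y_true, y_pred, average="macro"))    # Eq. (11), averaged per class
print(precision_score(y_true, y_pred, average="macro")) # Eq. (12), averaged per class
```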

Results

ANN model training history

For the ANN model proposed in this paper, Bayesian optimisation was employed to search for the best set of hyperparameters during training. To prevent overfitting and underfitting, the training dataset was split into five folds and k-fold cross-validation was conducted. Based on the given domain ranges (see Table 3), the Bayesian optimisation computed thirty searches. Each search was first trained on 4/5 of the training data and then cross-validated on the remaining 1/5. For each search, the log-loss and the number of epochs (i.e. iterations) were recorded for each training and cross-validation run (details of the training history can be found in Table 9).

Figure 7 presents the training history for ADC, from which it can be concluded that, at the 8th search, the mean log-loss from the five cross-validations is almost zero. This indicates that the ANN model found the best set of hyperparameters at the 8th search, as the uncertainty associated with that set is almost zero. The hyperparameters for the 8th search are given in Table 5.

Fig. 7

Training history and hyperparameter tuning for ADC, where (a) the number of epochs (i.e. iterations) was recorded for each training and cross-validation and (b) the log-loss was recorded for each training and cross-validation

Table 5 The value for each hyperparameter of the 8th search of ANN model for ADC

For ODC, the training history is shown in Fig. 8, where it was found that the classifier has the highest accuracy (i.e. the lowest uncertainty) at the 22nd search (details of the training history can be found in Table 10). As the mean log-loss computed from the five cross-validations was lower than for the other searches, it can be concluded that the set of hyperparameters associated with the 22nd search is the most compatible set for training the ODC dataset. Details of the hyperparameters of the 22nd search are shown in Table 6.

Fig. 8

Training history and hyperparameter tuning for ODC, where (a) the number of epochs (i.e. iterations) was recorded for each training and cross-validation and (b) the log-loss was recorded for each training and cross-validation

Table 6 The value for each hyperparameter of the 22nd search of ANN model for ODC

ANN model performance

To evaluate the model’s performance, the confusion matrix was computed, and the accuracy, recall, and precision were calculated using Eqs. (10)–(12). As shown in Fig. 9, using the knowledge gained from the training data, the model could accurately identify the two rock types associated with the testing data in the ADC case. In particular, the proposed ANN model indicated that six sets of the testing data are associated with GS, while the other nineteen sets belong to SL. This prediction matched the actual condition perfectly, resulting in a perfect score for recall, precision, and overall accuracy. In contrast, for ODC, the ANN model correctly predicted that thirteen, thirty, and sixty-seven sets of testing data are associated with LSS, MSS, and HSS, respectively. The model, however, misclassified six sets of actual LSS as MSS, five sets of actual MSS as LSS, two sets of actual MSS as HSS, and one set of HSS as MSS (Fig. 10). Although some misclassifications exist, the proposed ANN model for ODC still scores highly in terms of recall, precision, and overall accuracy.

Fig. 9

The confusion matrix computed for ADC based on the proposed model

Fig. 10

The confusion matrix computed for ODC based on the proposed model

Discussion

ANN could be a powerful tool for processing data in mining engineering-related projects. This section further demonstrates the superiority of the proposed model by comparing its performance with that of a conventional logistic regression model.

Comparison with conventional logistic regression model

Logistic regression (LR) is one of the most fundamental algorithms for solving classification problems. By estimating the relationships between one dependent variable and other independent variables, logistic regression is easy to implement and is often used as a baseline classifier. Taking the same datasets as above, this paper constructed a logistic regression model to predict the rock types associated with the ADC and ODC operations; a minimal code sketch of such a baseline is given after the list below. The modelling process is rather straightforward, with detailed procedures available in the following literature: Chen et al. (2019), Vallejos and McKinnon (2013), and Subasi (2020). As can be seen from Figs. 11 and 12, misclassifications (in both the ADC and ODC scenarios) occur more frequently for the logistic regression model than for the proposed ANN-Bayesian model. This results in lower accuracy values, as follows:

  • The overall accuracy of the proposed ANN-Bayesian model: ADC = 1.00 and ODC = 0.89.

  • The overall accuracy of the logistic regression model: ADC = 0.88 and ODC = 0.73.
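A minimal sketch of the logistic-regression baseline with scikit-learn follows; the synthetic features and labels below are placeholders for the normalised \(SE\), \(ICR\), \(r\), and \(d\) columns and the rock classes of the real datasets.

```python
# Minimal sketch of the logistic-regression baseline used for the comparison.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(248, 4))                     # stand-in for SE, ICR, r, d
y = rng.integers(0, 3, size=248)                  # three rock classes (LSS/MSS/HSS)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = baseline.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))             # compare against the ANN-Bayesian accuracy
```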

Fig. 11

The confusion matrix computed for ADC based on the logistic regression model

Fig. 12

The confusion matrix computed for ODC based on the logistic regression model

It can be further concluded that, although LR is easy to implement, its performance is rather poor in comparison with the proposed ANN-Bayesian model. This is mainly because LR, as a linear classifier, cannot capture the highly non-linear relationships between the cutting variables and the rock types as well as the ANN can.

Applications of the proposed ANN model

Applications of ML techniques in mining and other branches of geoscience and geoengineering are focused on data estimation and forecasting, whereas classical mathematical modelling methods are often constrained by the highly coupled and non-linear relationships between the inputs and the outputs. For the mining industry, knowledge of the locations of high-value recoverable ore grades is very important for minimising extraction costs. Conventional mathematical modelling might not be applicable here, as field data are oftentimes complicated and voluminous and sometimes need real-time processing. A neural network, however, can update its knowledge over time as more information accumulates. The application of this approach can therefore result in greater accuracy and more robust predictions than conventional deterministic or statistical techniques. This paper investigated the applicability of ML, in particular a self-adaptive ANN model, for the classification of rock types associated with two types of cutting methods at the laboratory and field scales. The results indicated that the proposed ANN model is robust and accurate in identifying the rock types for actuated disc cutting (ADC) at the laboratory scale. Further upscaling indicated that the model is also compatible with the field observations for oscillating disc cutting (ODC). From the above analyses and results, it can be concluded that the proposed ANN model has the potential for:

  • Mapping high-value recoverable ore grades more efficiently

  • Optimising selective cutting operations

  • Facilitating decision-making

  • Reducing cutting costs based on the characteristics of rocks

Limitations of the proposed ANN model

Despite delivering promising results during training and testing, the proposed ANN model still has some limitations, most of which concern (1) the stochastic nature of neural networks and (2) the quality of the input data. The successful implementation of a neural network always involves some degree of randomisation, that is, the stochastic assignment of a subset of the weights to continue the optimisation process. Therefore, it is difficult to control the flow of the model other than by checking the output. Further limitations of ANN modelling originate from the data itself. The adage of “garbage in, garbage out” applies to any ANN model, and this paper also reflected the importance of the input data to the performance of a model. As seen in Figs. 9 and 10, the results obtained with laboratory data (i.e. ADC data) exhibit a higher level of accuracy than the results obtained with field data (i.e. ODC data). This indicates that field data may suffer from external variability that is difficult for computational algorithms to interpret, whereas laboratory data offer good internal validity. One possible explanation lies in the scale and homogeneity of the rock specimens used for testing. When selecting samples for the ADC laboratory tests, there is a good chance that the samples were monolithic and relatively homogeneous, whereas the data collected from the machine during the ODC field trials might be affected by rock masses that are often jointed, anisotropic, and heterogeneous. It is also noteworthy that two different cutting mechanisms were reported in this paper, and further research is required to better understand the impact of the cutting mechanics on the proposed model.

Conclusion

This paper presented a self-adaptive ANN model for the characterisation of rock types associated with historical cutting data, to benefit selective cutting for intelligent mining methods. The input data, namely the specific energy, instantaneous cutting rate, cutter radius, and cutting depth, originated from the laboratory tests conducted for ADC and the field trials conducted for ODC. The results and observations can be summarised as follows:

  • With the help of Bayesian optimisation, the ANN model presented an architecture of A: 5.00-0.11-0.88-0.43-0.08 for ADC and an architecture of A: 5.00-0.17-0.08-0.08-0.28 for ODC, where the values correspond, in order, to the number of hidden layers, the neuron percentage in each hidden layer, the learning rate, the allowable dropout percentage in each layer, and the allowable shrinkage percentage in each layer.

  • The results obtained from the model with the above-mentioned architectures were highly encouraging. In particular, the model for ADC was extremely accurate in classifying the rock types, with accuracy, recall, and precision all equal to one. For ODC, some misclassifications existed, resulting in an overall accuracy of 0.89; the observed recall and precision ranged from 0.68 to 0.99.

  • Comparing the results for ADC and ODC, one can conclude that the proposed ANN model is very sensitive to the quality of the input data. Specifically, the model output for the laboratory tests conducted for ADC was more accurate than that for the field trials conducted for ODC.

  • The proposed ANN model seems quite promising for optimising selective mining operations. However, like other ML algorithms, this ANN model is stochastic by nature. This means that the system’s flow is untraceable, and the output of the model might be subject to slight variations for each execution.

It is worth mentioning that the Bayesian optimisation algorithm is quite sensitive to the selection of the prior distributions. When constructing the prior distributions, this study took the default mode of settings for simplicity. Therefore, exploring the different settings of the prior distributions and their effects on the accuracy of the model is yet another valuable topic that can be further discussed.

Field implementation of new mechanical mining technologies is often constrained by the estimation of in-place geological resources. This work developed a self-adaptive neural network to classify the rock type associated with measure-while-cutting data. Accurate and precise inversion by experienced geologists can be time-consuming and labour-intensive, so the proposed framework takes advantage of the available domain knowledge and develops a neural network that learns from the existing knowledge, further enabling the inversion to be performed for selective cutting operations. The framework was trained and tested not only on laboratory data but also on field data. With field data fed to the neural network, the model can quickly identify and classify the rock types associated with the cutting.

This methodology provides the next step towards enabling the interpretation to be performed more precisely, selectively, and efficiently in practice with the machines that are now starting to be tested in mining operations. The value of reducing the carbon footprint of mining operations by tuning the energy of the machine to create the appropriate fragmentation for transport and processing is evident, but it cannot yet be quantified until further field data are available. Selective mining technologies using ODC have recently been tested at the Hillgrove Mine in South Australia. Further study is underway to better understand the performance of the ANN model at a production scale.