1 Introduction

Brønnøy Kalk AS operates an open pit mine in Northern Norway, extracting marble that is processed into a whitening agent used by the paper industry. The fragmented marble is produced by a conventional drill and blast operation. The mine provides raw material to one of the largest suppliers of ground calcium carbonate (GCC) in the world. GCC is important for paper production, as it is used not only as a filler to reduce cellulose and energy consumption but also as a pigment to improve brightness, opacity and ink absorption. In the mine, the carbonate deposit comprises different types of marble strata, which are characterized by their impurities and visual characteristics (Watne 2001). In addition to marble, dykes and veins of dolerite, aplite and pegmatite are abundant in the deposit. However, only the so-called speckled marble is regarded as having the quality required to be mined as a product (Vezhapparambu et al. 2018; Watne 2001). Each blast, containing approximately 100,000 metric tons of raw material, is divided into smaller blocks or selective mining units (SMUs) based on geometric and visual characteristics. Generally, each SMU comprises approximately 5,000 metric tons of rock and contains 10 to 15 blastholes. Rock chips from the blastholes in each unit are sampled to prepare a collective sample characterizing the whole SMU, which is sent to the laboratory for quality assessment by means of the Technical Association of the Pulp and Paper Industry (TAPPI) index, the primary quality indicator. Produced SMUs are carefully blended according to the quality specifications, and the blend is fed to the primary crusher. Lower-quality SMUs may be stockpiled for future blending with high-quality blocks (Vezhapparambu et al. 2018).

Mining and manufacturing industries produce large amounts of waste, some of which is toxic. To reduce such waste and the related environmental impacts, these industries should adopt the best available production techniques. One of the United Nations' (UN's) 2030 Sustainable Development Goals (SDG 12) is to ensure sustainable consumption and production patterns by industries (United Nations 2022). Maximum utilization of the raw material is important to minimize waste and the related environmental impact. In the particular case of Brønnøy Kalk AS, the geometry-based definition of the SMU boundaries means that some high-quality marble may not be included in high-quality blocks, or low-quality marble may be left within the boundaries of high-quality blocks. Therefore, a high-resolution, improved quality assessment method can increase production efficiency and support better blending plans.

Following the mine's digitalization policy, measurement while drilling (MWD) data have been used to define geological features and rock properties, since they provide high-resolution data at low cost (Vezhapparambu and Ellefmo 2020). As part of this policy, MWD data have been collected regularly. MWD is defined as the acquisition of data on drilling parameters gathered from the drilling rig. The main parameters recorded by the MWD system, which vary depending on the drill rig characteristics, are dampening pressure, feed pressure, flush air pressure, rotation pressure, percussion pressure and penetration rate.

For the development of rock property predictive models based on MWD data or other information, different machine learning algorithms such as boosting, neural networks and fuzzy logic have been used (Kadkhodaie-Ilkhchi et al. 2010). Supervised learning algorithms, which aim to find the link between inputs and outputs and to make predictions with minimum error, are widely used in mining engineering applications. Depending on the purpose, the output of the algorithm can be either a continuous quantity, as in regression modelling, or labels or classes, as in classification modelling. Both regression and classification supervised learning algorithms are widely used in mining engineering-related modelling studies. Leighton (1982) improved blast design by considering peaks in one MWD parameter, the penetration rate, as an indicator of weak zones. Segui and Higgins (2002) also stated that MWD data can be used to improve blast results in mines; they proposed that different types and amounts of explosive can be used for each blast hole, according to the information derived from MWD parameters. Basarir et al. (2017) predicted the rock quality designation (RQD) with a soft computing method called the adaptive neuro fuzzy inference system (ANFIS), using MWD data such as bit rotation, bit load and penetration rate. More recently, Vezhapparambu et al. (2018) established a method to classify rock type from MWD data. On the classification side, Chang et al. (2007) used logistic regression to predict probabilities of landslide occurrence by analysing the relationships between instability factors and the past distribution of landslides. Hou et al. (2020) established a method to classify rock mass with a random forest algorithm, using cutterhead power, cutterhead rotational speed, total thrust, penetration, advance rate and cutterhead torque collected during a tunnel boring machine (TBM) operation. Zhou et al. (2016) investigated 11 different supervised learning algorithms, including naïve Bayes (NB), k-nearest neighbour (KNN), multilayer perceptron neural network (MLPNN) and support vector machine (SVM), for rock burst prediction. Caté et al. (2017) employed machine learning for the prediction of gold mineralization, comparing six classification methods in terms of the detection of gold-bearing intervals using geophysical logs as predictors.

This study investigates the relation between MWD data and marble quality, expressed by the TAPPI index, by means of classification modelling. With such models, the mine can define SMU boundaries more accurately and thus improve production efficiency. Initially, a reliable database containing MWD operational parameters and TAPPI values was constructed. For model development, first a basic stand-alone classification algorithm, logistic regression (LR), and then a more advanced ensemble machine learning method, random forest (RF), were used. The prediction performances of the models were thoroughly examined using commonly used performance measures. As a next step, the models' output may be utilized to create a systematic blending strategy to prevent ore loss.

2 Field Studies

The database used in this study was gathered in a recently completed project entitled Increased Recovery in the Norwegian Mining Industry by implementing the geometallurgical concept (InRec). The project was coordinated by the Mineral Production Group at the Norwegian University of Science and Technology and funded by the Norwegian Research Council and three industry partners (Ellefmo and Aasly 2019).

The collected MWD data can be grouped into two subgroups depending on whether they are controlled by the operator or by the medium being drilled. Penetration rate (PR), flush air pressure (FAP), rotation pressure (RP) and dampening pressure (DP) were considered dependent variables, as they change with rock conditions, whereas percussion pressure (PP) and feed pressure (FP) were labelled operator-controlled parameters. PR automatically increases if the rig's rotation, feed and percussion pressures are increased without any change in rock type; it follows that the penetration rate alone may not be a suitable indicator of rock properties and should be combined with other variables. Flush air pressure is the pressure required to carry drill cuttings out of the borehole, so it is highly dependent on borehole depth. Percussion pressure is controlled by the operator and is defined as the pressure released by the piston while hammering. The pressure needed to push the entire hammer downward is known as the feed pressure. The feed and percussion pressures are adjusted by the rig's control system to regulate the lower and upper extreme values of the rotation pressure and dampening pressure (Ghosh et al. 2017; Schunnesson 1998). Rotation pressure is the pressure used to rotate the drilling bit; the rotation speed is fixed by the operator. Lastly, dampening pressure is the pressure that helps control the percussion pressure and is adjusted by the rig (Atlas-Copco 2019).

The MWD data collected during the field work comprise log time (YYYY-MM-DDThh:mm:ss), depth (m), penetration rate (m/min), flush air pressure (bar), percussion pressure (bar), feed pressure (bar), rotation pressure (bar) and dampening pressure (bar), recorded at 10-cm intervals by the MWD system of an Atlas Copco (now Epiroc) T-45 SmartROC drilling rig. Every 3.5 m, a new rod is attached to the drill string, leading to anomalies in the MWD data. The anomalies are especially obvious in the percussion and feed pressures, because these are adjusted by the drilling rig system or by the operator to ensure a smooth drilling process (Ghosh et al. 2017). Since such anomalies can mislead the interpretation of the data, filtering algorithms were implemented: data beyond two standard deviations from the mean of the percussion and feed pressures were filtered out, and, to eliminate the rod change effect, data coinciding with the depths at which rods were changed were also removed. Figure 1 shows the distribution of the filtered MWD data, and summary statistics are given in Table 1.
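As an illustration, the filtering could be implemented as in the following sketch; the file name, column names and the modulo-based rod change rule are assumptions, since the paper does not publish the actual filtering code.

```python
import pandas as pd

# Hypothetical schema: one row per 10-cm logging interval; column names
# (depth in m, pressures in bar) are assumptions, not the actual log format.
mwd = pd.read_csv("mwd_log.csv")  # columns: hole_id, depth, pr, fap, pp, fp, rp, dp

# Filter out records beyond two standard deviations from the mean
# of percussion pressure (pp) and feed pressure (fp).
for col in ["pp", "fp"]:
    mu, sd = mwd[col].mean(), mwd[col].std()
    mwd = mwd[(mwd[col] - mu).abs() <= 2 * sd]

# Remove records coinciding with rod changes: a new rod is attached
# roughly every 3.5 m, so records at depths near multiples of the rod
# length are discarded (tolerance of one 10-cm logging interval).
ROD_LENGTH, TOL = 3.5, 0.1
mwd = mwd[(mwd["depth"] % ROD_LENGTH) > TOL]
```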

Fig. 1 MWD data distribution

Table 1 Summary statistics of filtered MWD data

Pearson correlation coefficients between the MWD parameters are shown in Fig. 2. Due to the strong correlation (0.84) between dampening pressure and feed pressure, feed pressure was not included as an independent variable in the modelling process.
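Continuing the sketch above, the correlation check and the removal of feed pressure might look as follows (illustrative only):

```python
# Pearson correlation matrix over the filtered MWD parameters.
params = ["pr", "fap", "pp", "fp", "rp", "dp"]
corr = mwd[params].corr(method="pearson")
print(corr.round(2))  # expect a strong dp-fp correlation (about 0.84)

# Drop feed pressure before modelling due to its strong correlation
# with dampening pressure.
mwd = mwd.drop(columns=["fp"])
```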

Fig. 2 Pearson correlation coefficients between recorded filtered MWD parameters

The quality (TAPPI) data were obtained from laboratory tests on the drill cuttings of five boreholes at different locations, containing pure and impure marble due to intrusions of non-carbonate lithologies and fracture zones. The holes were selected based on geological variability, in collaboration with the on-site mining geologist. Each hole was divided into several sections with lengths ranging from 1.3 to 3.4 m. Drill cuttings were collected from 29 sections in total, and the quality of each section was determined. To ensure exact georeferencing of the data, "From" and "To" depths were recorded at the beginning and end of each section (Vezhapparambu 2019). After the laboratory tests, the TAPPI value and corresponding quality class of each section were determined and assigned to the corresponding MWD data. It should be noted that the ranges of TAPPI values and the corresponding quality classes were defined considering the blending process necessary to achieve the desired product quality; for example, class 1 corresponds to the highest quality, while class 6 refers to waste material of the lowest quality. Due to a data confidentiality agreement, the TAPPI values cannot be presented. However, the normalized values used to assess the quality of the marble are shown together with the assigned classes in Table 2 and Fig. 3.
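A minimal sketch of this georeferenced assignment, under assumed column names: each 10-cm MWD record receives the quality class of the laboratory section whose "From"-"To" interval contains its depth.

```python
import pandas as pd

# Assumed columns: hole_id, from_depth, to_depth, quality_class (1-6);
# 'mwd' from the earlier sketch is assumed to carry a hole_id column.
sections = pd.read_csv("tappi_sections.csv")

def assign_class(row):
    """Return the quality class of the section containing this MWD depth."""
    hit = sections[(sections.hole_id == row.hole_id)
                   & (sections.from_depth <= row.depth)
                   & (row.depth < sections.to_depth)]
    return hit.quality_class.iloc[0] if len(hit) else None

mwd["quality_class"] = mwd.apply(assign_class, axis=1)
mwd = mwd.dropna(subset=["quality_class"])  # keep only the sampled sections
```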

Table 2 Normalized TAPPI values and assigned quality classes
Fig. 3 Quality class distribution

The dataset comprises 441 MWD records used as independent variables and the corresponding quality classes used as dependent variables. Figure 4 shows the normalized TAPPI values together with the corresponding MWD operational parameters for one of the holes (number 204).

Fig. 4 Recorded MWD data and normalized TAPPI values used in modelling for one of the holes (# 204)

3 Modelling

For the construction of the marble quality classification models, a conventional stand-alone method and an ensemble method are used to predict the quality class. Logistic regression (LR) is employed as the conventional model. Afterwards, the ensemble-based random forest (RF) algorithm, which combines several decision trees, is used. When the dataset contains noise and outliers, an advanced method such as RF is expected to yield more precise results than conventional methods (Breiman 2001).

Supervised models typically work on the same principle. The model takes a p-dimensional input vector, $X = (X_1, X_2, \ldots, X_p)^T$, and an output vector, $Y$. The aim is to find the best function $f(X)$ to predict $Y$, minimizing a loss function or error, $L(Y, f(X))$, that measures the difference between the predicted and measured values (Cutler et al. 2012).

To improve a model's ability to generalize, its hyperparameters need to be tuned carefully, depending on the characteristics of the data and the learning capacity of the algorithm. In this study, the optimum hyperparameters were investigated with the GridSearchCV function provided by the scikit-learn library (Pedregosa et al. 2011). Although computationally demanding, grid search is one of the most powerful techniques for finding optimum hyperparameters. In this method, a list of values for each hyperparameter is provided, and the different combinations are evaluated to find the best set in terms of model performance (Raschka 2016). The lists are defined by the user and depend on dataset characteristics such as its size. Besides grid search, a k-fold cross-validation algorithm was employed in the training stage. In this method, the data are first split into training and testing parts. While the testing data are kept as unseen data, the training data are split into k folds, where k−1 folds are used for model training and the remaining fold is used for validation (Raschka 2016). The k-fold method trains and validates the model k times on different folds of the data, which reduces overfitting (Sadrossadat et al. 2022). As seen in Fig. 3, the database has an unbalanced class distribution in the output. To counter the drawbacks of an unbalanced class distribution, a stratified k-fold cross-validation method was implemented. In the training stage, the number of folds was set to 10, as suggested in the literature, and an external testing set was used to assess the models' predictive performance (Kohavi 1995; Qi et al. 2020; Rodriguez et al. 2010).

The dataset was randomly divided into training and testing groups using a stratified shuffle split to avoid an unbalanced class split between the two sets. The training set, used in the construction step of the models, comprised 352 datapoints (80%). The test set, used to validate the performance and predictive power of the models, comprised 89 datapoints (20%). The models were trained and tuned using the Python programming language (Guido Van and Fred 2009), the JupyterLab web-based user interface (Kluyver et al. 2016) and the scikit-learn library (Pedregosa et al. 2011).
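A minimal sketch of the split and the cross-validation setup with scikit-learn, continuing the earlier data preparation sketch; the random seeds are illustrative, not from the paper.

```python
from sklearn.model_selection import StratifiedKFold, train_test_split

# Features: the five retained MWD parameters; target: the quality class.
X = mwd[["pr", "fap", "pp", "rp", "dp"]].to_numpy()
y = mwd["quality_class"].astype(int).to_numpy()

# Stratified shuffle split: 80% training (352 points), 20% testing
# (89 points), preserving the unbalanced class proportions in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Stratified 10-fold cross-validation, used inside the grid searches below.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
```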

3.1 Logistic Regression (LR) Model

LR is one of the most widely used conventional classification algorithms, as it is easy to implement and performs well for linearly separable cases. LR can be thought of as linear regression adapted to classification problems. It builds on the linear regression model, which establishes multivariate relations between one or more independent variables and a dependent variable:

$$ Y_{n} = \beta_{0} + \beta_{1} X_{1n} + \beta_{2} X_{2n} + \cdots + \beta_{m} X_{mn} + \varepsilon_{n} , $$
(1)

where $Y_n$ is the dependent variable; $X_{1n}, X_{2n}, \ldots, X_{mn}$ are the independent variables; $\beta_0$ is a constant; $\beta_1, \beta_2, \ldots, \beta_m$ are the regression coefficients; and $\varepsilon_n$ is the error term. The main difference between linear regression and LR is that the output of LR varies between 0 and 1, as seen in Fig. 5. LR uses the logistic function defined below (Belyadi and Haghigtat 2021):

$$ f(x) = \frac{1}{1 + e^{-x}}, \quad x \in {\mathbb{R}},\; f(x) \in (0, 1). $$
(2)
Fig. 5 Graphic representation of the logistic function

The LR model has a variety of algorithm-specific hyperparameters, mainly the penalty, the solver and the C value, which should be chosen carefully to avoid both overfitting and underfitting. The penalty specifies the regularization method used to reduce the loss function, for example, l1, l2 or Elastic-Net. The solver is the algorithm used for the optimization problem. The C value defines the strength of the regularization: the lower the C value, the stronger the regularization. After hyperparameter optimization with the grid search method, the accuracy of the model reached its maximum with the following hyperparameters: [solver: newton-cg, penalty: l1, C: 10].
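The tuning step could be sketched as below; this is not the authors' code, and since scikit-learn's newton-cg solver accepts only the l2 penalty, the grid pairs each solver with its compatible penalties.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative grid; the value lists actually searched are not published.
# Note: scikit-learn's newton-cg solver supports only the l2 penalty,
# so l1 regularization is searched via the saga and liblinear solvers.
param_grid = [
    {"solver": ["newton-cg", "lbfgs"], "penalty": ["l2"],
     "C": [0.01, 0.1, 1, 10, 100]},
    {"solver": ["saga", "liblinear"], "penalty": ["l1", "l2"],
     "C": [0.01, 0.1, 1, 10, 100]},
]
lr_search = GridSearchCV(LogisticRegression(max_iter=5000),
                         param_grid, cv=cv, scoring="accuracy")
lr_search.fit(X_train, y_train)
print(lr_search.best_params_)  # the paper reports newton-cg, l1, C = 10
```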

3.2 Random Forest (RF) Model

The RF algorithm is based on the idea of ensemble learning, defined as the aggregation of many decision trees. It was introduced by Breiman (2001) and is used for both classification and regression problems. Its input features (independent variables) can be both categorical and continuous. The RF algorithm is preferred for various reasons, such as its ease of use, strong generalization ability and speed. Although the mathematical idea behind a single decision tree is easy to grasp, the RF algorithm is regarded as a so-called black box model, since the model consists of a large number of trees (Couronné et al. 2018).

The RF algorithm has two sources of randomization. The first is the random selection of the candidate splitting features at each node, starting from the "root node" or "parent node". Bootstrapping, a very popular method in the construction of an RF, is the second source of randomness; it is defined as randomly drawing subsamples from the training data and growing a tree on each of them (Hastie et al. 2009). In this study, however, bootstrapping over the samples was not used, since the model accuracy was already high without it and bootstrapping is computationally demanding.

The RF algorithm builds $f$ from a set of "base learners", $h_1(x), \ldots, h_J(x)$, which are merged to form the "ensemble predictor" $f(x)$. For regression problems, the final output is obtained by averaging the predictions of the individual trees, as given in Eq. (3):

$$ f(x) = \frac{1}{J} \sum_{j = 1}^{J} h_{j}(x). $$
(3)

The error of the model can be calculated with different criteria, of which the mean squared error is the most common; the model is trained until the mean squared error reaches its minimum. For classification problems, the prediction is determined by a majority vote over the trees, and the final output is calculated as in Eq. (4):

$$ f(x) = \arg \max_{y} \sum_{j = 1}^{J} I\left( y = h_{j}(x) \right). $$
(4)

The RF model is trained until the splitting criterion, for example, the misclassification error, Gini index or entropy, reaches its minimum. In the current model, the Gini index, defined in Eq. (5), is used as the splitting criterion. From the randomly selected candidate features, the Gini index measures the impurity of the "child nodes" produced by each feature, and the feature with the smallest Gini index is selected as the next splitting feature. Each node is split into two nodes until the Gini index reaches zero; nodes with a zero Gini index are called "terminal nodes".

$$ \text{Gini index} = \sum_{k \ne k'} p_{k}\, p_{k'}, \qquad p_{k} = \frac{1}{n} \sum_{i = 1}^{n} I\left( y_{i} = k \right), $$
(5)

where $y_i$ is the observed class of sample $i$, $k$ and $k'$ index the classes, and $p_k$ is the proportion of class-$k$ observations in the node (Cutler et al. 2012).
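Equation (5) can be checked with a few lines of Python, using the identity $\sum_{k \ne k'} p_k p_{k'} = 1 - \sum_k p_k^2$; the function below is an illustration, not the library's internals.

```python
import numpy as np

def gini_index(labels):
    """Gini impurity of a node: 1 - sum_k p_k**2, which equals the
    double sum over k != k' of p_k * p_k' in Eq. (5)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_index([1, 1, 2, 2]))  # 0.5: a maximally impure two-class node
print(gini_index([3, 3, 3]))     # 0.0: a pure "terminal node"
```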

Several hyperparameters can be adjusted to build a successful RF model, such as the depth of the trees (max_depth), the number of trees (n_estimators) and the number of variables considered per split (max_features). The depth of the trees is one of the most important hyperparameters. The number of trees in the forest also needs to be tuned: model accuracy generally improves as the number of trees increases, but training takes much longer with more trees (Agrawal 2021). The other important hyperparameter is the number of variables considered when looking for the best split. The highest accuracy was achieved with the following hyperparameters: [max_depth: 6, n_estimators: 118, max_features: 2].
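A sketch of the tuned RF model in scikit-learn, assuming the training and test arrays from Sect. 3; the random seed and the unlisted settings are scikit-learn defaults rather than reported choices.

```python
from sklearn.ensemble import RandomForestClassifier

# Tuned hyperparameters as reported above; criterion='gini' matches the
# splitting criterion of Eq. (5), and bootstrap=False reflects the choice
# of no bootstrapping over samples, so every tree sees the full training set.
rf = RandomForestClassifier(n_estimators=118, max_depth=6, max_features=2,
                            criterion="gini", bootstrap=False,
                            random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))  # overall accuracy on the unseen test set
```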

3.3 Performance Indicators

Several error metrics exist for investigating the performance of a classification model. In this study, the predictive performance of the models is assessed by considering the accuracy, precision, recall and F-score, defined in Eqs. (6), (7), (8) and (9). These metrics are derived from the confusion matrix, which visualizes the predictions with respect to the real classes. An example confusion matrix with two classes is shown in Table 3 and used for the explanation. The positive class is represented by 1, and the negative class by 0. The values on the diagonal are the numbers of correct predictions, while the rest are the numbers of incorrect predictions. The metrics are built from the number of correctly predicted positive-class examples (true positives, TP), correctly predicted examples that do not belong to the class (true negatives, TN) and incorrectly predicted examples (false negatives, FN, and false positives, FP). Accuracy measures the overall success of the classification model and is calculated as the ratio of correct predictions to the total number of predictions. Precision is the percentage of correct positive predictions, that is, the proportion of positive identifications that were actually correct. Recall, or sensitivity, is the percentage of true positives that are successfully predicted; in other words, it expresses what proportion of the real positives were correctly identified. The F-score is the weighted harmonic mean of precision and recall and can be used to interpret the two together (Sokolova and Lapalme 2009).

$$ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}, $$
(6)
$$ {\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}}, $$
(7)
$$ {\text{Recall}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}, $$
(8)
$$ F\text{-score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}. $$
(9)
Table 3 Example confusion matrix with two classes
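These metrics can be computed directly with scikit-learn, as in the following sketch; the toy labels stand in for the test set and the model predictions, and average='weighted' mirrors the class-weighted averaging used for the multiclass models in Sect. 4.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Toy two-class labels in the spirit of Table 3 (1 = positive class);
# the real evaluation would use y_test and the model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))  # diagonal = correct predictions
print(accuracy_score(y_true, y_pred))    # Eq. (6)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")  # Eqs. (7)-(9), class-weighted
print(prec, rec, f1)
```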

4 Results and Discussion

To measure the performance of the LR and RF models, confusion matrices covering the six quality classes were used. Figures 6 and 7 show the confusion matrices for the training, testing and overall datasets for both models. As expected, the RF model clearly outperforms the LR model. On the overall dataset, the LR model predicts 41 of the 47 class 1 datapoints correctly (true positives). However, 20 datapoints are incorrectly labelled as class 1 (false positives): three of them belong to class 2, 16 to class 4 and one to class 5. Predictions for the remaining classes are similarly inaccurate, as seen in Fig. 6. The RF model, on the other hand, labels 45 of the 47 class 1 datapoints correctly (true positives), with only one false positive prediction for class 1, and its accuracy is high for the remaining classes as well.

Fig. 6 Confusion matrices for the LR model

Fig. 7 Confusion matrices for the RF model

The performance of the models was also examined during the training and testing stages to check whether they were overfitting. Table 4 presents the accuracy, precision, recall and F-score of the multiclass models used for the prediction of quality classes. For each model, the overall precision, recall and F-score were calculated as the weighted average over the classes, as shown in Table 4. The tree-based ensemble classification (RF) model produced better predictions than the LR model.

Table 4 Performance metrics of the models

The marble quality classes predicted by the RF model can be used as input to geostatistical methods for building a quality block model. SMU boundaries may then be drawn based on the predicted marble quality rather than on the procedure the mine uses today, which is based on tonnage and geometrical constraints. In that case, taking the predicted marble quality into account, the SMU boundaries would be chosen so that the quality variability within each SMU is low (homogeneous quality) while the variability between SMUs is high, which eases the blending process.

Furthermore, the SMU boundaries defined by the suggested algorithm can also improve production efficiency by reducing the probability of high-quality marble being discarded or low-quality marble being mixed into the product.

5 Conclusions

MWD data-based marble quality class prediction models were developed using two supervised classification methods, LR and RF. The MWD parameters used as independent variables were selected based on statistical evaluations. The RF model provides higher accuracy, precision, recall and F-score than the LR model, and the constructed RF model can be used as a reliable tool for the preliminary prediction of the quality class of marble.

The suggested RF model allows quality assessment and provides high-resolution quality data that geostatistical methods can use as valuable input to create a quality block model for efficient production and blending plans. The predicted classes can potentially be used in an indicator kriging approach, where each predicted class is transformed into a binary indicator to evaluate the portion of the blast or the SMU above a given quality. Such a block model may help the mine define SMU boundaries more accurately, reduce production and ore loss, and improve and strengthen the blending plans. In this way, productivity will be enhanced, contributing to SDG 12 on responsible consumption and production patterns.

As shown by the statistical performance indicators, the suggested models are reliable and accurate; however, model performance can always be improved by adding new data representing a broader range of the geology in the studied mine. Other mines in a similar situation can follow the presented methodology to improve production efficiency and reduce production loss.