1 Introduction

The variations between production schedule forecasts and actual performance for underground mining, particularly in sub-level open stoping (SLOS), have been a concerning subject for years [1,2,3,4]. This has necessitated several studies to establish appropriate mining efficiency factors such as dilution and recovery to improve the prediction of stope performance and, ultimately, ore and metal production forecasts. Mining dilution relates to the percentage of planned or unplanned sub-economic or waste material extraction beyond the delineated stope boundaries. Dilution affects profitability margins due to additional resources and costs required to handle the additional sub-economic or waste material. According to Planeta, Bourgoin [5], a 13% increase in dilution is estimated to potentially cause a 15% drop in revenue and a 60% reduction in profitability. Mining recovery accounts for the economic material left in the stope that is not recoverable due to design and operational constraints, such as blast performance and loading equipment accessibility and capabilities in the extraction confinements. It is a percentage measure of the extraction efficiency, accounting for ore losses for stopes. Accurate dilution and recovery factors improve volumetric estimations on final ore quantities mined, thereby improving resource allocation planning efficiency and effectiveness (equipment, time, labour), as affirmed by Bagde [6]. Importantly, this improves the accuracy and reliability of business cashflow forecasts due to improved accuracy of production forecasts. Production schedules for the long-term production planning and forecast horizon in SLOS operations utilise draft mining stopes due to limited data available as exploration and development progresses. 
As more information is generated from downstream mining processes such as development and sampling, the draft stopes (and, by default, the tonnes and grades) are altered and refined to optimise the mining dilution and recovery of a final mined stope. The draft stopes would later change dimensions to adjust for the final ore drive layout and reef positions within the drive, with further changes at the drill design stage to accommodate the drilling and mucking capabilities on site. Figure 1 summarises the adjustments to draft stopes and the associated changes in stope tonnes and grades, providing insights into the complexity of dilution and recovery factors in SLOS, and why a weak prediction relationship is expected. These adjustments result in deviations between production schedule forecasts generated from draft stopes and actual production from the final modified and mined stopes at the production stage [7, 8].

Fig. 1 Transformation of scheduled draft stopes to final mined stopes in SLOS mining

Extant literature on dilution in underground SLOS mining mainly focuses on the performance of finalised stopes that are ready for drilling and blasting. The dilution factors established in such studies are usually extended, without moderation, to long-term production planning assumptions. This ignores the fact that long-term production schedules utilise draft stopes, and therefore disregards the influence of multiple intermediate adjustments to those draft stopes. As a result, a dilution relationship established relative to draft stopes is expected to be weaker than one established using final stopes, owing to the stope dimension disparities arising from stope adjustments. To address the limitations of the weak relationship envisaged, sensitivity tests and iterative model tuning will be conducted to establish the model’s optimal settings.

The final predicted stope tonnes mined will largely depend on the dilution and recovery factors applied. However, in practice, recovery factors are difficult to relate to original stopes due to multiple intermediate adjustments to stope designs and geological block model updates. Furthermore, metal content recovery may exceed 100% due to excessive overbreak into sub-economic ore zones and, inevitably, due to the heterogeneous nature of ore grade distribution and background grade disparities commonly observed in block models. These factors complicate the formulation of a reliable recovery model [9]. In addition, the transformation of stopes has multiple subfactors that exhibit a complex plurality of influence and interconnectedness. As such, the determination of appropriate dilution factors relative to draft stopes is challenging to decipher without the rigours of sufficiently trained predictor models [10]. Fortunately, machine learning (ML) techniques and their applications in predictive modelling continue to gain momentum, with significant results reported in various mining processes. Specifically, the successful application of white box models such as the decision tree (DT)-based algorithms and principal component analysis (PCA) in improving prediction in mining processes provide a compelling proposition for consideration in the current study [11,12,13,14,15,16].

For clarity, the primary objectives of this study are summarised as follows:

  • Establish a robust and proactive dilution prediction mechanism to support production scheduling optimisation of brownfield expansions or greenfield mining projects based on generic data mostly available at the pre-feasibility stages (geological, geotechnical and stope design attributes).

  • Improve the prediction of dilution on draft stopes used in long-term production schedules, utilising the best available information from known variables at the early stage of stope design and schedule generation.

Following this, the novel contribution of this study lies in establishing suitable dilution factors from early design data to support the generation of robust production schedules and evaluations at the strategic level of detail. This strengthens the strategic planning elements, which may help to unlock business growth opportunities [17]. Most studies on dilution utilise finalised stopes, with model prediction accuracies of over 70% being reported [18,19,20]. The dilution factors predicted in these studies are then applied directly, without moderation, to new projects or brownfield expansions, causing undesirable discrepancies. Furthermore, it is argued that blast damage depends on the rock mass quality (Q), with minimal damage occurring in stopes with higher Q ratings, and vice versa for stopes with lower Q ratings [21]. Therefore, the effects of drill and blast are largely accounted for in the geotechnical modelling if the stopes to be mined lie in the stable zone of the stability graph, which is usually the case for approved mining plans. The deliberate omission of drill and blast factors will therefore not materially compromise the model’s accuracy over the medium to long-term planning horizons. However, these factors may optionally be considered in site-specific model modifications to suit changing conditions and short-term needs [22]. Under this approach, prediction models for dilution of draft stopes are expected to be comparatively weaker than the dilution prediction models commonly based on finalised stopes. Nonetheless, even a minor improvement in dilution prediction at the planning stage reduces the disparity between the production schedule and the final mined output. This is precisely the main objective of this study.

2 Literature Review

Dilution in underground mining and tunnelling has been studied in various contexts in the form of stope hanging wall stability [23,24,25,26,27], overbreak in mining excavations [18, 20, 28] and ore dilution in underground mining stope extraction [7, 9, 29, 30]. Extant literature on mining dilution has identified several attributes as having a considerable impact on ore dilution and recovery [10]. These attributes can be grouped into three broad classes: geological, geotechnical and geometric (physical) factors. While other factors such as drill and blast parameters (drill hole length, diameter, pattern, charge density, explosive type) and human error also influence dilution, their impact has been shown to be minimal [18, 20, 31], and they are therefore not considered in this study.

2.1 Stope Geometry

Stope geometric attributes have been proven to impact the stope hanging wall stability and dilution significantly [24, 29]. Geometric attributes comprise reef dip, stope area, stope width, reef thickness and height. Unplanned dilution is known to be sensitive to stope height and hanging wall dip. Shorter and steeply dipping stopes generally have lower dilution, while longer and shallow dipping stopes have higher dilution [32]. Furthermore, longer strike lengths reduce overall stope stability, thereby increasing unplanned dilution [8, 27]. Thus, variations to stope geometry between pre- and post-development stope design phases impact dilution. Furthermore, dilution is also sensitive to stope widths and spans, particularly in SLOS mining methods [29]. Open stopes with shorter strike lengths and longer height dimensions and/or longer strike lengths and shorter height dimensions are generally more stable (and therefore, less prone to overbreak) than stopes of the same area designed in a square shape [33]. In this context, the appropriateness of the stope design outlay and dimensions contributes significantly to the determination of overall stope geometry and, ultimately, stope performance [18, 23, 34,35,36]. As such, analysing these attributes on mined stopes may provide useful insights on dilution patterns for modelling dilution prediction in stopes planned for future extraction.

2.2 Geotechnical Factors

Geotechnical factors that significantly influence dilution include rockmass characterisation properties such as rock quality designation (RQD), modelled stope span stability thresholds (hydraulic radius), and the existence and nature of faults and bedding plane structures [37, 38]. Early studies primarily sought to predict dilution using variants of the stability graph method, which was first introduced by Mathews, Hoek [23] and later modified by Potvin [24]. This method uses geotechnical rockmass characteristics to establish zonal stability, stable stope spans, and the possible failure criterion and extent in stope excavations. The combined effect of stope hanging wall shape and size has been modelled as the hydraulic radius, which shows a positive correlation with the magnitude of hanging wall instability [39]. The relationship between the rock mass quality (Q) and the rock quality designation (RQD) is expressed as follows:

$$Q=\frac{RQD}{J_n}\times \frac{J_r}{J_a}\times \frac{J_w}{SRF},$$

where Jn, Jr, Ja, Jw and SRF refer to the joint set number, joint roughness number, joint alteration number, joint water reduction factor and stress reduction factor, respectively. Jw is 1 for dry areas (no water seepage), Ja is also 1 for fresh joints, while Jr is largely constant except in areas where a major fault approaches or intersects a stope. Thus, where these terms are effectively constant, Q varies linearly with RQD, and the RQD can therefore be used to assess the impact of rock mass quality on overbreak/dilution performance, as noted in [34]. Further studies concluded that narrower stopes require a shorter hydraulic radius as they are more prone to instability owing to their high sensitivity to dilution [40]. However, predicting the magnitude of changes to stopes has proved challenging, with differing interpretations by scholars resulting in a lack of consensus on the measurement of inputs and the reliability of results based on stope sizes [25, 26]. Structural faults and bedding planes are zones of weakness within rock formations. As such, stopes within these zones are more likely to overbreak excessively, causing large variations between planned and mined stope tonnes. Furthermore, structural faults and defects tend to dictate the failure and propagation mode of blast-induced fracture patterns [18].
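The proportionality between Q and RQD described above can be illustrated with a minimal sketch. The function name `q_value` and all input ratings are hypothetical illustration values, not data from the study site.

```python
def q_value(rqd, jn, jr, ja, jw, srf):
    """Rock mass quality Q from the Q-system expression given above."""
    return (rqd / jn) * (jr / ja) * (jw / srf)


# Hypothetical ratings: dry area (jw = 1) and fresh joints (ja = 1).
fixed = dict(jn=9.0, jr=1.5, ja=1.0, jw=1.0, srf=2.5)

q_at_40 = q_value(40.0, **fixed)
q_at_80 = q_value(80.0, **fixed)

# With the joint and stress terms held fixed, Q scales in proportion to RQD.
ratio = q_at_80 / q_at_40
print(q_at_40, q_at_80, ratio)
```

The linearity only holds while the joint and stress terms stay fixed; a stope intersected by a major fault would change Jr and break the simple proportionality.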

2.3 Geological Factors

Geological factors involve the mineralogy of the rock, the rock formations, lithology and density. The presence of minerals with low hardness or clay within the mineralisation and host rock creates zones of weaknesses that reduce rockmass stability, thereby increasing chances of overbreak and dilution under increasing in situ or blast-induced stresses [41]. Rock strength categories based on the degree of weathering alteration have also been studied for a potential relationship with stope stability, with a positive correlation confirmed [10]. This suggests that the minerals within a rock formation are potential test variables for stope performance studies. However, despite the significant achievements in dilution prediction and control, these methods rarely relate dilution to original draft stopes off the long-term production schedules. They only consider the final stopes ready for drilling and blasting.

2.4 Dilution Prediction with Machine Learning

Machine learning (ML) models can be trained on historical data to “learn” useful underlying relationships for predicting future performance based on unseen data. According to [42,43,44], the predictive capabilities of ML models can be leveraged to improve the accuracy of production schedule inputs. ML models such as artificial neural network (ANN) methods have been used for some time to determine dilution in underground mining, with some high-accuracy results reported [20, 28, 30]. These studies concluded that hanging wall stability and dilution were sensitive to drilling accuracy, explosive charge density, in situ stresses and hanging wall dip angle. Comparable results were also reported from using multiple regression analysis variants for dilution prediction in underground mines [18, 20]. However, prior studies found that ANN models require a relatively large amount of input data to develop an accurate predictive model and tend to overfit on small data sets [45]. Decision tree model variants, such as the random forest (RF), have also been successfully applied to predict hanging wall stability and dilution in underground stopes [34, 46]. In related studies, decision tree (DT) algorithms were successfully used by Ajak, Lilford [47] to predict the existence of undesirable clay material in an orebody at an iron ore operation in Western Australia, attaining a prediction accuracy of over 70%. The results were complemented by findings in Jafrasteh, Fathianpour [48], who also used RF decision tree ensembles, with affirmative results in predicting copper grade for the Sarcheshmeh copper deposit in Kerman, Iran. Although these studies were carried out for open pit operations, both findings support the suitability of decision tree variants for dilution and ore grade prediction in mining processes.

3 Data Collection and Preparation

Data for 282 mined stopes in a narrow to medium width orebody was collected from an anonymised Western Australian gold mine for the period 2016 to 2021. The mine utilises a sub-level, open stoping (SLOS) method for mining shallow to medium dipping orebodies (15–60°). The data was pre-processed to extract key variables, remove outliers and address missing values. Indicator variables were used to quantify categorical data (major faults and weathering condition) in the model. Validation checks on the data revealed a few missing figures and abnormalities. In instances where the data could not be verified, affected stopes were removed to minimise speculation errors and skewing of research data. Stopes with block model output-related outliers, or missing values, were also removed (case deletion) from the study data. Missing values for stope attributes were replaced with averages in instances where neighbouring stopes existed for problematic stopes (mean substitution). The final recovered metal content was used to back-calculate the overbreak and underbreak relative to design (draft) stopes to establish the dilution estimate factors for the original scheduled stopes (draft stopes). The full list of variables considered for this study is presented in Table 1. As alluded to earlier, the choice of variables is influenced by the desire to establish a robust and proactive dilution prediction mechanism based on generic data that is mostly available at the pre-feasibility stages of mine design. Additionally, geotechnical factors such as RQD and rock joints data have also been used successfully with ML techniques in predicting the rates of penetration (ROP) for tunnel boring projects and schedules [49, 50], suggesting such variables are good candidates for consideration in related predictions such as in this study. Furthermore, variability in autonomy levels and independent influence of some parameters suggests their exclusion may not necessarily impact the results [51]. 
As such, operational factors are deliberately excluded as these have been shown to have minimal impact on dilution. However, these may be considered later as an improvement initiative when sufficient data for mined stopes becomes available. Additionally, the impact of variations in stope design methods on dilution has not been investigated as the scope of this study is limited to the SLOS mining operations.
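The case deletion and mean substitution steps described above can be sketched as follows. This is an illustration only: the record layout, the `neighbours` mapping and the function name are hypothetical, and how neighbouring stopes were identified in practice is not stated in the text.

```python
from statistics import mean


def impute_from_neighbours(stopes, neighbours, field):
    """Fill missing `field` values with the mean of neighbouring stopes.

    Stopes with a missing value and no neighbour holding data are dropped
    (case deletion). Returns a new dict; the input is not modified.
    """
    cleaned = {}
    for stope_id, record in stopes.items():
        if record[field] is not None:
            cleaned[stope_id] = dict(record)
            continue
        known = [
            stopes[n][field]
            for n in neighbours.get(stope_id, [])
            if n in stopes and stopes[n][field] is not None
        ]
        if known:
            # Mean substitution from neighbouring stopes.
            cleaned[stope_id] = {**record, field: mean(known)}
        # Otherwise the stope is left out of the cleaned data set.
    return cleaned


# Hypothetical records: RQD is missing for stopes B and D.
stopes = {
    "A": {"rqd": 70.0},
    "B": {"rqd": None},
    "C": {"rqd": 80.0},
    "D": {"rqd": None},
}
neighbours = {"B": ["A", "C"]}  # D has no neighbouring stopes

cleaned = impute_from_neighbours(stopes, neighbours, "rqd")
print(cleaned)
```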

Table 1 Stope variables considered for this study

4 Methods and Model Development

A schematic summary of the methods and processes is presented in Fig. 2. Firstly, hierarchical clustering of stope variables and dilution was performed to identify potential clusters. Correlation and regression tests were conducted using the stepwise selection and elimination (SSE) method, followed by an assessment of the R² values for each cluster’s model.

Fig. 2 Schematic summary of the methods and model generation processes

While SSE methods can omit variables that might be important in a non-linear context, this sub-step was necessary to identify the key determinant variables should the R² values suggest that a strong linear relationship exists. The best model for each cluster was then used to generate a classification and regression tree (CART) predictor model. As CART naturally selects splits based on the most informative variables at each node, manually selecting variables using SSE could lead to a less intuitive and interpretable model, particularly if the relationship exhibits some non-linear traits. To circumvent these potential limitations of SSE-CART, principal component analysis (PCA) was applied to each cluster in place of SSE regression to generate comparative PCA-CART models. Following a comparison of the SSE-CART and PCA-CART results, the best model was selected and optimised through iterative model tuning. IBM’s Statistical Package for the Social Sciences (SPSS), Version 20, was used for clustering, regression, principal component analysis and decision tree model formulation.

4.1 Methods

4.1.1 Hierarchical Clustering

Hierarchical cluster analysis was conducted to gain insights into the data distribution and characterisation. Clustering partitions data into groups based on a similarity or dissimilarity measure to improve homogeneity within groups [52]. This improves exploratory pattern analysis and classification in data analysis and machine learning [53]. K-means and hierarchical clustering are the commonly used data clustering methods cited in extant literature [53,54,55,56].
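As a rough illustration of the agglomerative idea, the sketch below merges the two closest groups repeatedly until a requested number of clusters remains. The average-linkage rule, the Euclidean distance and the sample points are assumptions made for this example; the linkage and distance settings used in SPSS are not stated here, and the study cut the dendrogram at a dissimilarity level rather than at a fixed cluster count.

```python
from itertools import combinations
from math import dist


def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between the members of two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return [sorted(c) for c in clusters]


# Hypothetical standardised (dip, width) values for five stopes.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (2.0, 2.1), (2.2, 1.9)]
result = agglomerate(points, n_clusters=2)
print(result)
```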

4.1.2 Stepwise Selection and Elimination (SSE) Regression

Multiple regression analysis (MRA) was conducted to analyse the relationship between dilution and the draft stope variables. MRA is a statistical method for analysing and interpreting the variance in the dependent variable that is explained by changes in the independent variables [18]. With 14 initial variables, the SSE method was selected because it adds variables to the model incrementally, retaining those that contribute significantly and discarding those that do not [57, 58].

4.1.3 Principal Component Analysis (PCA)

Principal component analysis (PCA) was carried out to investigate the potential to improve the prediction capability of the models, given the low R² values expected. PCA is a multivariate factor reduction technique for extracting important information from multiple inter-correlated variables, consolidating them into new orthogonal variables called principal components [59]. It extracts the most important information from multiple variables, linearly transforming it into a significantly reduced number of new, uncorrelated combinations of the original variables [60]. Principal component variables are uncorrelated by construction and, therefore, do not require correlation analysis and regression testing before modelling in CART.
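As an illustrative restatement, and assuming the input variables are standardised to zero mean and unit variance before extraction (the scaling is not stated here), the transformation can be written as follows. The symbols Z, R, w and t are notation introduced for this sketch only.

$$R=\frac{1}{n-1}Z^{\top}Z,\qquad Rw_k=\lambda_k w_k,\qquad t_k=Zw_k,$$

$$\operatorname{cov}\left(t_j,t_k\right)=w_j^{\top}Rw_k=0\quad \left(j\ne k\right),\qquad \operatorname{var}\left(t_k\right)=\lambda_k,$$

where Z is the n × p matrix of standardised variables, R is their correlation matrix, λk and wk are the eigenvalues and orthonormal eigenvectors of R, and tk is the vector of scores for the k-th principal component. The second line shows why the component scores are mutually uncorrelated and why each eigenvalue measures the variance carried by its component.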

4.1.4 CART Model

The CART technique is a robust decision tree algorithm capable of developing a tree structure for predicting discrete and continuous variables and providing an easy-to-use model for users. The technique recursively splits input data into a hierarchy of root and branch nodes, determined by attribute selection measures such as entropy, variance reduction, information gain and a probability function known as the Gini index, to generate a minimum tree size with maximum data segregation capabilities. The CART model is preferred for its speed, as it discards undesirable input features during data processing, and for the greater interpretability it offers users. Furthermore, minimal or no data transformation is necessary for the CART method to discover non-linear relationships in input data. CART can also handle multimodal quantitative and qualitative data with minimum impact on prediction accuracy [13]. Moreover, outliers that may negatively affect the modelling procedure and accuracy have a negligible impact on the CART method’s results, as the algorithm can detect the outliers automatically and put them in a separate cluster [61]. To facilitate modelling of the splitting criteria and extraction of results, CART models are usually reconfigured with interpretative mathematical expressions (Fig. 3), simulating the model’s splitting decisions from each parent or internal node to the child node for the target (dilution) variable. The splitting occurs at the nodes, governed by conditional rules imposed by the predictor variables’ automatically determined threshold values, until a stopping condition is satisfied. The stopping criterion is imposed by constraints such as child and parent node thresholds or attainment of maximum tree depth. Figure 3 shows a schematic representation of a CART model, where a and b are splitting parameters for a predictor variable X.

Fig. 3 Simplified representation of a decision tree (CART) showing rule-based splitting

As mentioned previously, CART is a rule-based white box algorithm that makes predictions through recursive splits of parent nodes based on binary options imposed by input parameters. As such, the path from the root node to each leaf node can be described as a sequence of conditions, each expressed mathematically in the form X ≤ c, where X represents an input variable and c is a threshold value within the range of X that sets the condition for binary splitting through “If-then” options [13].
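A single binary split of the form X ≤ c can be sketched as below, using the Gini index to score candidate thresholds. This is a minimal illustration of the splitting idea rather than the SPSS implementation; the function names and the RQD and class values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(x, y):
    """Find c in the rule `X <= c` that minimises the weighted Gini impurity."""
    best_c, best_score = None, float("inf")
    values = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive observed values.
    for lo, hi in zip(values, values[1:]):
        c = (lo + hi) / 2
        left = [label for xi, label in zip(x, y) if xi <= c]
        right = [label for xi, label in zip(x, y) if xi > c]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_c, best_score = c, score
    return best_c, best_score


# Hypothetical RQD values and dilution classes (1 = dilution above 40%).
rqd = [35, 40, 45, 60, 70, 80]
high_dilution = [1, 1, 1, 0, 0, 0]

c, impurity = best_threshold(rqd, high_dilution)
print(f"If RQD <= {c} then predict high dilution (weighted Gini = {impurity})")
```

A full tree repeats this search over every predictor at every node until a stopping condition, such as a minimum node size or a maximum depth, is met.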

4.1.5 CHAID Model

It is further proposed to use the chi-square automatic interaction detection (CHAID) decision tree as a comparative model to assess the validity of the CART model. The CHAID algorithm, originally developed by [62], is one of the oldest and most commonly used classification tree models. Unlike CART, the algorithm can develop more than two branches from a single node or root, thereby improving the classification homogeneity (purity) of child nodes. The CHAID model will also serve to check whether any additional splits from its multiway split criteria can improve the prediction capacity of the proposed CART model.

4.2 Model Development

Hierarchical clustering was conducted on the stopes’ variables (Table 1) to assess the existence of distinct clusters within the data. Two clusters with 180 and 38 stopes were identified from the dendrogram at a re-scaled dissimilarity level of 9, as shown in Fig. 4. The sample size for extracted (mined) stopes was limited by the number of stopes already mined at the time, with the number expected to increase as mining progresses. However, this may provide an opportunity for the predictive model to be applied to stopes available for future extraction. Stopes in cluster 1 were predominantly narrow-vein with a moderate dip, while stopes in cluster 2 were in shallow dipping, moderate-width mineralisation. A third cluster was created by considering the full data set. Multicollinearity and regression tests were conducted on all clusters, retaining ten variables for model building (Au, Cu, true thickness, apparent thickness, stope height, dip, major faults, structure count, weathering condition and RQD number). The initial choice of input parameters for a machine learning model is derived mainly from existing literature on the subject and site-specific engineering experience [23, 34, 63]. However, this study focused mainly on parameters known at the early stages of mine planning, when draft production stopes are designed, evaluated and scheduled for long-term production plans.

Fig. 4 Dendrogram representation of stope clustering results

The predictor variables were first entered as one batch for standard multiple regression and were then entered for regression using the SSE method. The model with the highest R² from the SSE method for each cluster was retained for further analysis. PCA was conducted for dimension reduction of the input variables, utilising varimax rotation, which maximises the variance of the factor loadings and thereby improves the interpretability of results [64]. PCA reduced the original variables to four principal component variables for each cluster, based on an eigenvalue threshold setting of 1 (Fig. 5).

Fig. 5 Scree plot for full data cluster’s derived principal components
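The eigenvalue threshold rule used for component retention can be sketched as below. The eigenvalues are hypothetical and are not those plotted in Fig. 5; the strict "greater than 1" comparison is the usual convention and is assumed here.

```python
def retained_components(eigenvalues, threshold=1.0):
    """Return the number of retained components and their share of total variance."""
    kept = [ev for ev in eigenvalues if ev > threshold]
    share = sum(kept) / sum(eigenvalues)
    return len(kept), share


# Hypothetical eigenvalues for ten standardised variables (they sum to 10).
eigenvalues = [2.8, 1.9, 1.4, 1.1, 0.9, 0.7, 0.5, 0.4, 0.2, 0.1]

n_kept, variance_share = retained_components(eigenvalues)
print(n_kept, round(variance_share, 2))
```

For standardised inputs, each original variable contributes one unit of variance, so a component with an eigenvalue above 1 explains more than any single input variable does on its own.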

PCA was necessary to compare and cross-validate the prediction capability obtained in SSE regression (linear analysis) with the result obtained using the derived principal component variables (non-linear analysis). A minimum factor loading of 0.5, as recommended by several scholars [65, 66], was used, although other scholars have recently considered a lower threshold of 0.4 [67]. Bartlett’s test of sphericity, which measures the statistical significance of correlations among variables, was found to be significant (p < 0.01), indicating that the data were appropriate for PCA. Furthermore, sampling adequacy was also assessed and confirmed using the Kaiser-Meyer-Olkin (KMO) measure, with KMO values greater than 0.5.

A decision tree classifier (CART) was used to identify predictor variables that had a high impact on stope dilution using pre-defined dilution categories in 10% dilution increments up to 40%, with dilution over 40% classified as one extreme dilution category. A 40% dilution threshold was selected as this was the current factor established on site, based on reconciliation data of final mined stopes. The 40% baseline threshold also presented an opportunity to check the validity of the dilution factor currently being used for both production and planning purposes on the study site. Preliminary classification results indicated very poor prediction accuracy for the multiple bin categories in 10% increments. As such, a selective focus on a target dilution category necessitated the consolidation of the dilution categories. As the relationship between dilution and stability parameters such as the stability number (N) or the hydraulic radius (HR) is best modelled by a logistic relationship [21, 68], the classification and model prediction can be enhanced by establishing a single cut-off value and consolidating the sub-categories on either side of the cut-off to create two categories. While the consolidation may improve the overall prediction accuracy, the granularity of prediction is reduced because there are fewer categories. As a result, two categories of dilution ranking emerged, with a 40% dilution cut-off. The binary classification approach allowed the few stopes close to the outlier margins to be analysed with minimal impact on the model’s prediction accuracy, owing to the increased category bandwidth. It also allowed the high prediction accuracy of the model on one category to be extended indirectly to the other category, where the model’s direct prediction is weak. Figure 6 shows a generalised logistic regression model and how this concept is applied to reclassify categories.

Fig. 6 General logistic regression model showing the classification of categories based on cut-off point
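The two labelling schemes discussed above, the initial 10% bins and the consolidated binary categories, can be contrasted with a short sketch. The treatment of values that fall exactly on a bin boundary, the assumption of non-negative dilution and the sample values are all choices made for this illustration.

```python
def dilution_bin(dilution_pct):
    """Initial scheme: 10% increments up to 40%, then one extreme category.

    Assumes a non-negative dilution percentage.
    """
    if dilution_pct > 40:
        return ">40%"
    lower = min(int(dilution_pct // 10) * 10, 30)
    return f"{lower}-{lower + 10}%"


def binary_class(dilution_pct, cutoff=40.0):
    """Consolidated scheme: 1 if dilution exceeds the cut-off, else 0."""
    return 1 if dilution_pct > cutoff else 0


sample = [5.0, 18.0, 33.0, 40.0, 47.5, 62.0]  # hypothetical dilution values (%)
bins = [dilution_bin(d) for d in sample]
classes = [binary_class(d) for d in sample]
print(list(zip(sample, bins, classes)))
```

Collapsing five bins into two leaves more cases in each category, which is what trades granularity for prediction accuracy.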

The cases for each cluster were divided into two groups to create a training set (80% of the cases) and a testing set (20% of the cases). The 80:20 ratio has been recommended by several scholars in statistical sampling and machine learning literature [14, 69]. CART models were generated for each cluster based on the clusters’ predictors from SSE regression. The Gini index setting, which maximises the homogeneity of splitting relative to the target variable, was used for all cluster models. A series of sensitivity analyses and tests were conducted, and results were recorded to determine the model accuracy and optimal settings. Model tuning involved pruning, iterative adjustments to the minimum number of cases in the models’ parent and child nodes and adjustments to maximum tree depth to attain optimal prediction accuracy on both training and testing models. The optimal tree growing and stopping criteria for all clusters were set at a minimum of three parent node cases, one child node case and a maximum tree depth of 5, with Gini minimum change in improvement set at 0.0001, based on the validation and tuning tests.
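A reproducible 80:20 partition and the reported growing and stopping settings can be written down as follows. Random assignment with a fixed seed is an assumption, since the text does not say how cases were allocated, and the dictionary key names are hypothetical labels rather than SPSS option names.

```python
import random


def split_cases(case_ids, train_fraction=0.8, seed=0):
    """Shuffle case IDs reproducibly and split them into training and testing sets."""
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Growing and stopping settings as reported in the text (key names are hypothetical).
tree_settings = {
    "min_parent_cases": 3,
    "min_child_cases": 1,
    "max_depth": 5,
    "min_gini_improvement": 0.0001,
}

# Example: a cluster of 180 stopes, identified here by index only.
train_ids, test_ids = split_cases(range(180))
print(len(train_ids), len(test_ids))
```

Reusing the same seed for every model keeps the training and testing cases identical across methods, which is what makes the SSE-CART and PCA-CART results comparable.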

These iterative adjustments are a delicate trade-off between higher model accuracy and model complexity, requiring some diligence to strike the right balance. Comparative CART models were also generated using the PCA-derived components. The same training and test sample data was used for all clusters in both methods to ensure the model results were comparable. To establish the best model, confusion matrices were formulated to assess the models’ prediction effectiveness. In the matrix, a model’s predicted results are presented in the columns while the rows show the actual classification results. Useful measures derived from the confusion matrix for this study are the model’s accuracy and F1 score, where

  • Accuracy—ratio of correctly predicted inputs to total sample size

  • F1 score—harmonic mean that combines precision and recall, to measure model accuracy.

The F1 score is the preferred measure because the study seeks more accurate predictions for the target category of true positives (category > 40%). A confusion matrix for the SSE-CART training model is shown in Table 2, followed by calculations of model prediction metrics.

Table 2 Confusion matrix for SSE-CART Training model for the full data cluster sample

where, TN is true negative, TP is true positive, FN is false negative, and FP is false positive values. The model metrics are calculated as shown below:

$$\textrm{Accuracy}=\left(\textrm{TP}+\textrm{TN}\right)/\left(\textrm{TP}+\textrm{TN}+\textrm{FP}+\textrm{FN}\right)$$
$$\textrm{F}1\ \textrm{score}=2\times \left(\textrm{Recall}\times \textrm{Precision}\right)/\left(\textrm{Recall}+\textrm{Precision}\right)=2\times \left(0.761\times 0.684\right)/\left(0.761+0.684\right)=0.720$$
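The metric calculations above can be reproduced with a few lines of code. Precision is taken as TP/(TP+FP) and recall as TP/(TP+FN), which are the standard definitions; the confusion-matrix counts passed to `classification_metrics` are hypothetical and are not the Table 2 values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall, f1_score(precision, recall)


# Recall and precision as quoted in the F1 calculation above.
f1_reported = f1_score(precision=0.684, recall=0.761)

# Hypothetical counts, for illustration only.
accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=90, fp=10, fn=10)
print(round(f1_reported, 3), round(accuracy, 3), round(f1, 3))
```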

4.3 Hybrid PCA-CART Model

The PCA-derived components were also used to generate dilution prediction models using the CART method. This was useful to investigate the possibility of improving on the observed CART models’ results obtained using SSE regression variables. A similar process of using the Gini settings, with pruning and adjustments to child and parent node cases, as before, was followed to generate PCA-CART models.

5 Results and Discussion

5.1 SSE Regression and PCA

The results of SSE regression models are shown in Table 3. Taken out of context, the R² values of 11.4–21.7% would be considered low for explaining the change in the dependent variable (planning dilution) from the independent variables. However, they are considered appropriate for this study because a low R² is consistent with the absence of data for the numerous intermediate steps that transform the draft stopes in the long-term production schedule into the finalised stopes, from which the dilution is inferred. Furthermore, with maximum recovery capped at 100%, the presence of background grade (inferred and unclassified) and ore grade variation between planned (draft) and mined (final) stopes both suppress the planning dilution factor, which is back-calculated from the final metal quantity recovered. As such, a weak but important relationship is expected. The models for all three clusters had a statistically significant F-change (significance below 0.05), confirming the existence of a significant relationship between some of the predictors and the target variable. On that account, the null hypothesis of no significant relationship between the independent variables and the target variable is rejected.

Table 3 SSE multiple regression results and selected models for all clusters

It was further observed from the standardised coefficients (Beta values) that the largest absolute values belonged to apparent thickness and dip (−2.71 and 2.60) in the full data cluster model, true thickness (−2.93) in the cluster 1 model and stope height (−4.48) in the cluster 2 model. Furthermore, the tolerance and variance inflation factors (VIF) for all models were observed to be above 0.1 and less than 10, respectively. This indicates minimal multicollinearity among the independent variables [70].
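The two collinearity thresholds are linked, since VIF is the reciprocal of tolerance and tolerance is one minus the R² obtained when a predictor is regressed on the remaining predictors. The sketch below shows that relationship; the function names and the R² inputs are hypothetical and are not taken from Table 3.

```python
def tolerance(r_squared_j):
    """Tolerance of predictor j: 1 - R^2 from regressing it on the other predictors."""
    return 1.0 - r_squared_j

def vif(r_squared_j):
    """Variance inflation factor: reciprocal of the tolerance."""
    return 1.0 / tolerance(r_squared_j)

def passes_collinearity_check(r_squared_j, min_tolerance=0.1, max_vif=10.0):
    """True when tolerance is above 0.1 and VIF is below 10, as used in the text."""
    return tolerance(r_squared_j) > min_tolerance and vif(r_squared_j) < max_vif

# Hypothetical R^2 values for two predictors
ok_predictor = passes_collinearity_check(0.5)       # tolerance 0.5, VIF 2
flagged_predictor = passes_collinearity_check(0.95)  # tolerance 0.05, VIF 20
```

Because VIF = 1/tolerance, a tolerance above 0.1 and a VIF below 10 express the same condition.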

5.2 Principal Component Analysis (PCA)

PCA reduced the original 14 variables to four principal components for each cluster, based on an eigenvalue threshold setting of 1, as mentioned earlier. Variables that either failed to meet the minimum loading threshold or cross-loaded on more than one component were removed from the analysis to avoid potential skewing of data in future analyses. The original variables grouped largely into geological (Au and Cu grades), geotechnical (RQD number, weathering condition, structure count), geometrical (reef thickness, stope depth, surface area, dip) and volumetric (stope size (tonnes), height) descriptive classes. As such, the derived principal components were renamed to reflect the descriptive characteristics of the original variables that constituted them. The results of PCA for all clusters are presented in Table 4. Components in cluster 2 classified poorly, which could be attributable to the low sample size, as confirmed by the low KMO value. As shown, the derived principal components accounted for 75.9%, 76.5% and 72.7% of the variation in the data for the full data cluster, cluster 1 and cluster 2, respectively.
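The retention rule and the "variation accounted for" figures can be illustrated with a short sketch. It assumes PCA on a correlation matrix, where the eigenvalues sum to the number of variables; the function name and the eigenvalues are hypothetical and do not come from Table 4.

```python
def retain_components(eigenvalues, threshold=1.0):
    """Keep components with eigenvalue above the threshold; return them and their variance share."""
    kept = [ev for ev in eigenvalues if ev > threshold]
    explained = sum(kept) / sum(eigenvalues)
    return kept, explained

# Hypothetical eigenvalues for a six-variable correlation matrix (they sum to 6)
eigenvalues = [2.4, 1.5, 1.1, 0.5, 0.3, 0.2]
kept, explained = retain_components(eigenvalues)
```

Here three components pass the threshold and together account for about 83% of the variance, which is the same kind of figure as the 75.9%, 76.5% and 72.7% reported for the three clusters.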

Table 4 PCA results for all clusters

Extraction method: principal component analysis

5.3 Classification and Regression Trees (CART)

Pre-validation results and extracted normalised importance of predictor variables showed apparent thickness and stope height as dominant predictors for the full data cluster. For cluster 1, the true thickness variable dominated over the weathering condition, while prediction in cluster 2 was solely dominated by stope height. The predictors used for the CART method were also used to generate comparative models using the CHAID method to cross-validate the CART model’s performance. Overall, the results of the two methods were similar, with the CART method’s accuracy slightly superior. As such, model tuning progressed with the CART method.

SSE-CART and PCA-CART models were generated from the training and testing data sets, and the results are summarised in Table 5 for comparison. Results showed that utilising the consolidated dilution categories improved the overall classification (prediction) accuracy of the models. The dilution category where the model prediction was significantly higher was considered as the default target variable (dilution of more than 40%) because it is the category that requires more attention to better predict variability between design tonnes and final mined tonnes. Furthermore, accurate prediction of one category in a binary classifier is, by default, more likely to improve the classification of the other category. The models’ classification results show that the full data cluster model from SSE-CART had the highest overall prediction accuracy (62.8%). However, the hybrid PCA-CART models had the best comparative prediction accuracy for the target dilution category above 40% for all models.

Table 5 Comparison of results for SSE-CART and hybrid PCA-CART classification

To determine the best overall model, F1 scores were computed for both models, and the results are presented in Table 6. Results show that the PCA-CART hybrid models performed best, with an average F1 score of 0.72 (0.71 and 0.735 for training and testing models respectively). The conventional SSE-CART comes second, with an average F1 score of 0.70 (0.72 and 0.68 for training and testing models respectively). As such, the hybrid PCA-CART model for the full data cluster is selected as the best model for dilution prediction.

Table 6 Models performance metrics summary

The hybrid PCA-CART training and testing models for the full data cluster, simplified for the target category and reconfigured for the rule-based prediction interpretation governed by the model’s predictors (geometric, volumetric and geotechnical principal components), are shown in Figs. 7 and 8, respectively. While the overall sample prediction shows an almost even split for both categories, additional useful information is provided by the principal components’ splitting criterion. Results show that the model’s geometric principal component is the most influential variable, predicting 56.1% of all stopes (i.e. 78/139) below the principal component’s threshold of 0.6 as likely to overbreak by more than 40% (node number 1). This is quite significant, given that 78 stopes account for 89% of the target category (i.e. 78 out of 88 stopes). The volumetric and geotechnical principal components are secondary determinants, accounting for the remaining 11% (i.e. 10 out of 88 stopes) for the target category when the geometric threshold is below 0.6. Further, the volumetric principal component’s threshold splits the 10 stopes evenly, suggesting the sensitivity of further splits based on secondary determinants is poor. Furthermore, there are no splits based on the geological principal component (ore grades), suggesting that these also have minimal effect on dilution prediction.

Fig. 7 Hybrid PCA-CART training model for the full data cluster sample

Fig. 8 Hybrid PCA-CART testing model for the full data cluster sample

Similarly, results of the randomly generated test model using the 80:20 ratio (Fig. 8) also show that 58.1% of the stopes (i.e. 25 out of 43) or 100% of target category stopes below the geometric principal component threshold of 0.6 are also likely to result in dilution above 40%, with no contributions from the secondary determinants, further confirming the immateriality of their contribution to prediction. As such, the geometric principal component alone is deemed sufficient in prediction as it accounts for approximately 90% of the target category prediction.
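The shares quoted for the training and testing trees follow directly from the node counts given in the text. The sketch below recomputes them; the helper name `share` is an assumption, and the counts are those reported above (78 of 139 and 78 of 88 for training, 25 of 43 for testing).

```python
def share(count, total):
    """Percentage share of count in total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)

# Training model (Fig. 7): stopes below the geometric threshold of 0.6
train_share_of_sample = share(78, 139)   # share of all training stopes
train_share_of_target = share(78, 88)    # share of the target category
train_secondary_share = share(10, 88)    # remainder split by secondary components

# Testing model (Fig. 8)
test_share_of_sample = share(25, 43)
```

The training shares round to 56.1%, 88.6% and 11.4%, and the testing share to 58.1%, consistent with the 56 to 58% range and the approximately 90% figure used in the discussion.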

These observations suggest that 56 to 58% of stopes below a geometric principal component value of 0.6 are more likely to result in dilution above 40%. To establish appropriate dilution factors based on these findings, histograms and descriptive statistics were used to examine further the statistical behaviour of stopes for the full data set (Fig. 9) and the set below the geometric threshold of 0.6 and predicted dilution above 40% (Fig. 10). The latter is necessitated by the fact that, unlike the dilution category below 40% with a lower limit of zero, the dilution category above 40% has no distinct upper limit. Histogram results show dilution percentile values from P25 to P90. While a 40% dilution at a P75 confidence level is currently being used for planning assumptions based on stope performance of finalised stopes, this estimate is too optimistic for mine planning as it aligns mostly with a P40 level of confidence for draft stopes (Fig. 9). The relatively even split of the categories noted earlier (Fig. 7) suggests the P50 confidence level is appropriate for inference of overall dilution for draft stopes. This means the model underestimates the dilution 50% of the time and overestimates it 50% of the time, but on average it underestimates dilution by approximately 5.6 percentage points (i.e. 45.58% − 40%).
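The percentile reading used here can be sketched as follows. This is an illustration under stated assumptions: it uses the nearest-rank percentile definition, which may differ from the method behind Figs. 9 and 10, and the dilution sample and function names are hypothetical.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]

def percentile_of_value(values, x):
    """Percentage of observations at or below x (the confidence level implied by x)."""
    return 100.0 * sum(1 for v in values if v <= x) / len(values)

# Hypothetical dilution observations (%), for illustration only
dilution = [12, 25, 33, 38, 41, 47, 52, 60, 71, 95]

p25, p50, p75, p90 = (percentile(dilution, p) for p in (25, 50, 75, 90))
level_of_40 = percentile_of_value(dilution, 40)
```

In this toy sample a 40% planning assumption sits at the P40 level, the same kind of reading the text makes from Fig. 9, while the P50 value is the one that splits the sample evenly.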

Fig. 9 Full data cluster histogram and descriptive statistics for dilution category above (40%)

Fig. 10 Histogram and descriptive statistics comparing the model prediction and actual dilution

Similarly, for stopes below the geometric threshold of 0.6 and a predicted dilution above 40%, the P50 level of confidence threshold shows the dilution will be approximately 67.9%, suggesting the geometric principal component (reef thickness, stope depth, surface area and dip) has a major influence on dilution.

Thus, the dilution extraction statement for the model is framed as follows:

  1) IF “Geometric factor” in PCA variables ≤ 0.600, then only 10% of overbreak cases have a dilution estimate above the threshold, and therefore, the dilution estimate is 45.58% (minimum threshold).

  2) IF “Geometric factor” in PCA variables > 0.600, then 90% of overbreak cases have a dilution above the threshold, and therefore, the overall dilution estimate is 0.5(45.58) + 0.5(67.9) = 56.7% (i.e. 50% weighted for excessive overbreak).
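The blended estimate in rule 2 is an equally weighted average of the two P50 values. A minimal sketch of that arithmetic is given below; the function and parameter names are assumptions, while the 45.58%, 67.9% and 50% weighting are taken from the rule.

```python
def blended_dilution(base_pct, excess_pct, weight_excess=0.5):
    """Weighted average of the base and excessive-overbreak dilution estimates (%)."""
    return (1.0 - weight_excess) * base_pct + weight_excess * excess_pct

# P50 of the full data set and P50 of the excessive-overbreak subset
overall_estimate = blended_dilution(base_pct=45.58, excess_pct=67.9)
```

With these inputs the result is 56.74, which the rule reports as 56.7%.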

5.4 Model Calibration and Example of Application

A new sample of 44 stopes recently completed from different sections of the mine was used to test the model’s performance on unseen data. The resultant PCA-CART classification and dilution prediction on the sample’s principal components showed an F1 score of 77%, which is close to the 72% established during training and testing. The primary classification and prediction were driven by the geometric principal component at a threshold of 0.45 for the dilution category above 40%. Accordingly, the estimated dilution was determined to be 56.7%, as per rule 2 above. A histogram constructed using actual dilution data, calculated from survey-reconciled mined tonnes, is presented in Fig. 10. The results indicate that the actual dilution relative to the draft stopes is 60.8% at the P75 level of confidence, closely aligning with the model’s prediction of 56.7%.

Furthermore, the results strongly suggest the following:

  1. The P75 dilution estimate derived from the finalised stopes’ reconciliation is likely to significantly underestimate dilution if used for brownfield expansions or greenfield projects.

  2. Despite the relatively low prediction accuracy due to the complex dilution relationships arising from changes in the shapes of draft stopes as new information becomes available, there is a high risk of underestimating production forecasts and costs without robust assumptions on dilution prediction.

  3. The prediction capabilities shown by the geometrical principal component indicate that geometrical factors have a strong influence on dilution and, therefore, require special attention for a better understanding of dilution and its implications.

Accordingly, the final mined tonnes and grade are predicted from the modified dilution estimates. Thus, the model provides optionality for a generalised dilution prediction as well as for a more targeted niche of stopes based on their principal components’ values. The study results suggest the dilution estimate of 0.4 (40%), derived from drill and blast performance reconciliation, falls short by approximately 13% and does not adequately relate to the draft stopes on which the schedules and forecasts are built. Furthermore, results show that the overall dilution increases from 45.58% to 56.7% for stopes with predicted dilution above 40% and a geometric principal component value below the 0.6 threshold. While this may appear small, the downstream impact on material movement, mined ore grade and scheduled metal production is significant, particularly when considered over medium to long-term production planning horizons. Based on these findings, the established dilution prediction rules and factors will be applied to production schedules to improve the prediction accuracy of dilution, thereby improving ore and metal production forecasts.

6 Conclusions

Discrepancies between optimised production schedules and actual ore mined per period have been identified as a major concern in sub-level open-stoping underground operations. In this study, a model to improve dilution prediction has been established, considering the multiple intermediate changes between designed and final mined stopes. Linear regression, PCA and CART were applied to a database of mined stopes’ geotechnical, geological and dimensional attributes to create a hybrid ML dilution prediction model known as PCA-CART. Despite the challenges of a complex and indirect dilution relationship caused by unforeseen changes between designed and final mined stopes, the optimised model had an F1 score of 0.72. Furthermore, the study established a dilution factor of 45.5%, suggesting that drill and blast reconciled dilution factors from finalised stopes are likely to underestimate dilution by at least 13% if used for long-term planning and scheduling. The study also concluded that the stopes’ geometric attributes possess high dilution prediction capabilities and may be used on their own to predict overall dilution in the early design and planning stages, particularly in competent ground conditions where geological and geotechnical conditions are relatively stable. Thus, the model can be incorporated into production schedules to improve the prediction accuracy of final mined tonnes and, ultimately, the grade and metal output forecast for narrow reef sub-level stoping operations. The method and framework applied in this study can be extended to other underground mining operations with similar settings to improve stope dilution prediction based on site geological, geotechnical and design factors. Future studies could focus on how to deal with the effects of background grade and ore grade heterogeneity to improve the strength of the prediction relationships and granularity of dilution categories.