Abstract
Tenuous dilution estimates in underground mine production scheduling continue to cause significant variations between schedule forecasts and actual production. This arises partly from the inference of dilution from predecessor stopes’ performance, disregarding that these stopes would have undergone multiple intermediate design changes between scheduling and actual mining. The resultant drill and blast-influenced dilution factors gradually lose their robustness over longer planning horizons or when applied to greenfield or brownfield expansions that do not have prior performance data. To overcome this problem, a new methodology is proposed to predict dilution in underground sub-level open stoping (SLOS) using basic geological, geotechnical and stope design attributes available in the early stage of mine planning. The method utilises principal component analysis (PCA), the classification and regression tree (CART) algorithm and stepwise selection and elimination (SSE) analysis. First, SSE analysis was conducted to identify the most important independent variables to be used with the CART algorithm (i.e., the SSE-CART model) to provide a predictive model. PCA was then performed, and the new principal components were used to propose a new comparative model (i.e., the PCA-CART model). Low R2 values were observed for both models, necessitating the consolidation of dilution categories to increase the models’ prediction bandwidth. The hybrid PCA-CART model outperformed the SSE-CART model with an overall F1 score prediction accuracy of 72% and target dilution category prediction accuracy of over 93% against SSE-CART’s 70% and 72%, respectively. Importantly, this study revealed a 13% minimum underestimation of dilution relative to the original design stopes.
1 Introduction
The variations between production schedule forecasts and actual performance for underground mining, particularly in sub-level open stoping (SLOS), have been a concerning subject for years [1,2,3,4]. This has necessitated several studies to establish appropriate mining efficiency factors such as dilution and recovery to improve the prediction of stope performance and, ultimately, ore and metal production forecasts. Mining dilution relates to the percentage of planned or unplanned sub-economic or waste material extraction beyond the delineated stope boundaries. Dilution affects profitability margins due to additional resources and costs required to handle the additional sub-economic or waste material. According to Planeta, Bourgoin [5], a 13% increase in dilution is estimated to potentially cause a 15% drop in revenue and a 60% reduction in profitability. Mining recovery accounts for the economic material left in the stope that is not recoverable due to design and operational constraints, such as blast performance and loading equipment accessibility and capabilities in the extraction confinements. It is a percentage measure of the extraction efficiency, accounting for ore losses for stopes. Accurate dilution and recovery factors improve volumetric estimations on final ore quantities mined, thereby improving resource allocation planning efficiency and effectiveness (equipment, time, labour), as affirmed by Bagde [6]. Importantly, this improves the accuracy and reliability of business cashflow forecasts due to improved accuracy of production forecasts. Production schedules for the long-term production planning and forecast horizon in SLOS operations utilise draft mining stopes due to limited data available as exploration and development progresses. 
As more information is generated from downstream mining processes such as development and sampling, the draft stopes (and, by default, the tonnes and grades) are altered and refined to optimise the mining dilution and recovery of a final mined stope. The draft stopes would later change dimensions to adjust for the final ore drive layout and reef positions within the drive, with further changes at the drill design stage to accommodate the drilling and mucking capabilities on site. Figure 1 summarises the adjustments to draft stopes and the associated changes in stope tonnes and grades, providing insights into the complexity of dilution and recovery factors in SLOS, and why a weak prediction relationship is expected. These adjustments result in deviations between production schedule forecasts generated from draft stopes and actual production from the final modified and mined stopes at the production stage [7, 8].
Extant literature on dilution in underground mining SLOS methods mainly focuses on the performance of finalised stopes, ready for drilling and blasting. The established dilution factors from such studies are usually extended, without moderation, for application in long-term production planning assumptions. This ignores the fact that long-term production schedules utilise draft stopes and therefore disregards the influence of multiple intermediate adjustments to draft stopes. As a result, the strength of the dilution relationship established using final stopes is expected to be weaker relative to the draft stopes due to the stope dimension disparities arising from stope adjustments. To address the limitations of the weak relationship envisaged, sensitivity tests and iterative model tuning will be conducted to establish the model’s optimal settings.
The final predicted stope tonnes mined will largely depend on the dilution and recovery factors applied. However, in practice, recovery factors are difficult to relate to original stopes due to multiple intermediate adjustments to stope designs and geological block model updates. Furthermore, metal content recovery may exceed 100% due to excessive overbreak into sub-economic ore zones and, inevitably, due to the heterogeneous nature of ore grade distribution and background grade disparities commonly observed in block models. These factors complicate the formulation of a reliable recovery model [9]. In addition, the transformation of stopes involves multiple interconnected subfactors with complex influences. As such, the determination of appropriate dilution factors relative to draft stopes is challenging without sufficiently trained predictor models [10]. Fortunately, machine learning (ML) techniques and their applications in predictive modelling continue to gain momentum, with significant results reported in various mining processes. Specifically, the successful application of white box models such as decision tree (DT)-based algorithms and principal component analysis (PCA) in improving prediction in mining processes provides a compelling proposition for consideration in the current study [11,12,13,14,15,16].
For clarity, the primary objectives of this study are summarised as follows:
- Establish a robust and proactive dilution prediction mechanism to support production scheduling optimisation of brownfield expansions or greenfield mining projects based on generic data mostly available at the pre-feasibility stages (geological, geotechnical and stope design attributes).
- Improve the prediction of dilution on draft stopes used in long-term production schedules, utilising the best available information from known variables at the early stage of stope design and schedule generation.
Following this, the novel contribution of this study arises from the concept of establishing suitable dilution factors based on early design data to facilitate the generation of robust production schedules and evaluations at the strategic level of detail. This enriches the intensity of strategic planning elements, which may facilitate unlocking business growth opportunities [17]. Most studies on dilution utilise finalised stopes, with considerably high model prediction accuracies of over 70% reported [18,19,20]. The predicted dilution factors from these studies are then applied directly to new or brownfield expansions without moderation, causing undesirable discrepancies. Furthermore, it is argued that blast damage depends on the rock mass quality (Q), with minimal damage occurring in stopes with higher Q ratings, and vice versa for stopes with lower Q ratings [21]. Therefore, the effects of drill and blast are largely accounted for in the geotechnical modelling if the stopes to be mined lie in the stable zone of the stability graph, which is usually the case in approved mining cases. The deliberate omission of these factors will not compromise the model’s accuracy materially for the medium to long-term planning horizons. However, these may be considered optionally for site-specific model modifications to suit changing conditions and short-term needs [22]. Following this approach, prediction models for dilution of draft stopes are expected to be comparatively weaker than dilution prediction models commonly based on finalised stopes. Nonetheless, any minor improvement in dilution prediction at the planning stage reduces the disparities between the production schedule and the final mined output. This is precisely the main objective of this study.
2 Literature Review
Dilution in underground mining and tunnelling has been studied in various contexts in the form of stope hanging wall stability [23,24,25,26,27], overbreak in mining excavations [18, 20, 28] and ore dilution in underground mining stope extraction [7, 9, 29, 30]. Extant literature on mining dilution has identified several attributes as having a considerable impact on ore dilution and recovery [10]. These attributes can be classified into three broad classes, namely geological, geotechnical and geometric (physical) factors. While other factors such as drill and blast parameters (drill hole length, diameter, pattern, charge density, explosive type) and human error factors influence dilution, their impact on dilution has been shown to be minimal [18, 20, 31], and they are, therefore, not considered in this study.
2.1 Stope Geometry
Stope geometric attributes have been proven to impact the stope hanging wall stability and dilution significantly [24, 29]. Geometric attributes comprise reef dip, stope area, stope width, reef thickness and height. Unplanned dilution is known to be sensitive to stope height and hanging wall dip. Shorter and steeply dipping stopes generally have lower dilution, while longer and shallow dipping stopes have higher dilution [32]. Furthermore, longer strike lengths reduce overall stope stability, thereby increasing unplanned dilution [8, 27]. Thus, variations to stope geometry between pre- and post-development stope design phases impact dilution. Furthermore, dilution is also sensitive to stope widths and spans, particularly in SLOS mining methods [29]. Open stopes with shorter strike lengths and longer height dimensions and/or longer strike lengths and shorter height dimensions are generally more stable (and therefore, less prone to overbreak) than stopes of the same area designed in a square shape [33]. In this context, the appropriateness of the stope design outlay and dimensions contributes significantly to the determination of overall stope geometry and, ultimately, stope performance [18, 23, 34,35,36]. As such, analysing these attributes on mined stopes may provide useful insights on dilution patterns for modelling dilution prediction in stopes planned for future extraction.
2.2 Geotechnical Factors
Geotechnical factors that significantly influence dilution include rockmass characterisation properties such as rock quality designation (RQD), modelled stope span stability thresholds (hydraulic radius), and the existence and nature of faults and bedding plane structures [37, 38]. Early studies primarily sought to predict dilution using variants of the stability graph method, which was first introduced by Mathews, Hoek [23] and later modified by Potvin [24]. This method uses geotechnical rockmass characteristics to establish zonal stability, stable stope spans and the possible failure criterion and extent in stope excavations. The combined effect of stope hanging wall shape and size has been modelled as hydraulic radius, showing a positive correlation with the magnitude of hanging wall instability [39]. The relationship between the rock quality (Q) and rock quality designation (RQD) is expressed as follows:
$$Q = \frac{RQD}{J_n} \times \frac{J_r}{J_a} \times \frac{J_w}{SRF}$$
where Jn, Jr, Ja, Jw and SRF refer to joint set number, joint set roughness, joint alteration, joint water inflow and stress reduction factor, respectively. Jw is 1 for dry areas (no water seepage), Ja is also 1 for fresh joints, while Jr is largely constant except in areas where a major fault approaches or intersects a stope. Thus, where the remaining terms are effectively constant, Q varies linearly with RQD, and therefore, the RQD can be used to assess the impact of rock mass quality on overbreak/dilution performance, as noted in [34]. Further studies concluded that narrower stopes require a smaller hydraulic radius as they are more prone to instability owing to their high sensitivity to dilution [40]. However, the prediction of the magnitude of changes to stopes has had challenges with different interpretations by scholars, resulting in a lack of consensus on the measurement of inputs and reliability of results based on stope sizes [25, 26]. Structural faults and bedding planes are zones of weakness within rock formations. As such, stopes within these zones are more likely to overbreak excessively, causing large variations between planned and mined stope tonnes. Furthermore, structural faults and defects tend to dictate the failure and propagation mode of blast-induced fracture patterns [18].
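The linear dependence of Q on RQD can be illustrated with a short sketch, assuming the standard Q-system product of three quotients. The joint parameter values below are hypothetical and are held constant only to show the proportionality; they are not taken from the study site.

```python
def q_value(rqd, jn, jr, ja, jw=1.0, srf=1.0):
    """Rock quality Q as the product (RQD/Jn) * (Jr/Ja) * (Jw/SRF)."""
    return (rqd / jn) * (jr / ja) * (jw / srf)

# Hypothetical joint parameters, assumed constant across the stopes compared
params = dict(jn=9.0, jr=1.5, ja=1.0, jw=1.0, srf=2.5)

q_low = q_value(40.0, **params)
q_high = q_value(80.0, **params)

# Doubling RQD doubles Q when every other term is unchanged
ratio = q_high / q_low
print(round(q_low, 3), round(q_high, 3), round(ratio, 3))
```

Under these assumptions RQD acts as a direct proxy for Q, which is the basis for using RQD alone as a model input.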
2.3 Geological Factors
Geological factors involve the mineralogy of the rock, the rock formations, lithology and density. The presence of minerals with low hardness or clay within the mineralisation and host rock creates zones of weaknesses that reduce rockmass stability, thereby increasing chances of overbreak and dilution under increasing in situ or blast-induced stresses [41]. Rock strength categories based on the degree of weathering alteration have also been studied for a potential relationship with stope stability, with a positive correlation confirmed [10]. This suggests that the minerals within a rock formation are potential test variables for stope performance studies. However, despite the significant achievements in dilution prediction and control, these methods rarely relate dilution to original draft stopes off the long-term production schedules. They only consider the final stopes ready for drilling and blasting.
2.4 Dilution Prediction with Machine Learning
Machine learning (ML) models can be trained on historical data to “learn” underlying useful relationships for predicting future performance based on unseen data. According to [42,43,44], predictive capabilities of ML models can be leveraged to improve the accuracy of production schedule inputs. ML models such as artificial neural network (ANN) methods have been used for some time to determine dilution in underground mining, with some high-accuracy results reported [20, 28, 30]. These studies concluded that hanging wall stability and dilution were sensitive to drilling accuracy, explosive charge density, in situ stresses and hanging wall dip angle. Similarly, comparable results were also reported from using multiple regression analysis variants for dilution prediction in underground mines [18, 20]. However, prior studies found that ANN models require a relatively large amount of input data for developing an accurate predictive model and tend to overfit small data sets [45]. Decision tree model variants, such as the random forest (RF), have also been successfully applied to predict hanging wall stability and dilution in underground stopes [34, 46]. In related studies, decision tree (DT) algorithms were successfully used by Ajak, Lilford [47] to predict the existence of undesirable clay material in an orebody at an iron operation in Western Australia, attaining a prediction accuracy of over 70%. The results were complemented by findings in Jafrasteh, Fathianpour [48], who also used RF decision tree ensembles, with affirmative results in predicting copper grade for the Sarcheshmeh copper deposit in Kerman, Iran. Despite these studies being carried out for open pit operations, both findings confirm the suitability of the application of decision tree variants for dilution and ore grade prediction in mining processes.
3 Data Collection and Preparation
Data for 282 mined stopes in a narrow to medium width orebody was collected from an anonymised Western Australian gold mine for the period 2016 to 2021. The mine utilises a sub-level, open stoping (SLOS) method for mining shallow to medium dipping orebodies (15–60°). The data was pre-processed to extract key variables, remove outliers and address missing values. Indicator variables were used to quantify categorical data (major faults and weathering condition) in the model. Validation checks on the data revealed a few missing figures and abnormalities. In instances where the data could not be verified, affected stopes were removed to minimise speculation errors and skewing of research data. Stopes with block model output-related outliers, or missing values, were also removed (case deletion) from the study data. Missing values for stope attributes were replaced with averages in instances where neighbouring stopes existed for problematic stopes (mean substitution). The final recovered metal content was used to back-calculate the overbreak and underbreak relative to design (draft) stopes to establish the dilution estimate factors for the original scheduled stopes (draft stopes). The full list of variables considered for this study is presented in Table 1. As alluded to earlier, the choice of variables is influenced by the desire to establish a robust and proactive dilution prediction mechanism based on generic data that is mostly available at the pre-feasibility stages of mine design. Additionally, geotechnical factors such as RQD and rock joints data have also been used successfully with ML techniques in predicting the rates of penetration (ROP) for tunnel boring projects and schedules [49, 50], suggesting such variables are good candidates for consideration in related predictions such as in this study. Furthermore, variability in autonomy levels and independent influence of some parameters suggests their exclusion may not necessarily impact the results [51]. 
As such, operational factors are deliberately excluded as these have been shown to have minimal impact on dilution. However, these may be considered later as an improvement initiative when sufficient data for mined stopes becomes available. Additionally, the impact of variations in stope design methods on dilution has not been investigated as the scope of this study is limited to the SLOS mining operations.
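The case deletion, mean substitution and indicator coding steps described in this section can be sketched as follows. This is a minimal illustration only: the record layout, field names and values are hypothetical, and the rule of imputing from the mean of neighbouring stopes is one plausible reading of the text rather than the exact procedure used.

```python
from statistics import mean

# Hypothetical stope records; field names and values are illustrative only
stopes = [
    {"id": "S1", "rqd": 70.0, "major_fault": "yes", "neighbours": ["S2", "S3"]},
    {"id": "S2", "rqd": None, "major_fault": "no", "neighbours": ["S1", "S3"]},
    {"id": "S3", "rqd": 80.0, "major_fault": "no", "neighbours": ["S1", "S2"]},
    {"id": "S4", "rqd": None, "major_fault": "yes", "neighbours": []},
]

def preprocess(records):
    by_id = {r["id"]: r for r in records}
    cleaned = []
    for r in records:
        row = dict(r)
        if row["rqd"] is None:
            known = [by_id[n]["rqd"] for n in row["neighbours"]
                     if by_id[n]["rqd"] is not None]
            if not known:
                continue  # case deletion: no neighbouring stope to impute from
            row["rqd"] = mean(known)  # mean substitution from neighbours
        # Indicator variable for the categorical major-fault attribute
        row["major_fault"] = 1 if row["major_fault"] == "yes" else 0
        cleaned.append(row)
    return cleaned

cleaned = preprocess(stopes)
print([(r["id"], r["rqd"], r["major_fault"]) for r in cleaned])
```

In this sketch, stope S2 inherits the average RQD of its two neighbours, while S4 is removed because it has no neighbouring stope from which a value can be inferred.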
4 Methods and Model Development
A schematic summary of the methods and processes is presented in Fig. 2. Firstly, hierarchical clustering of stope variables and dilution was performed to identify potential clusters. Correlation and regression tests were conducted using the stepwise selection and elimination (SSE) method, followed by an assessment of the R2 values for each cluster’s model.
While SSE methods can omit variables that might be important in a non-linear context, this sub-step was necessary in the model formulation to identify key determinant variables should the R2 values suggest that a strong linear relationship exists. The best model for each cluster was then used to generate a classification and regression tree (CART) predictor model. As CART naturally selects splits based on the most informative variables at each node, manually selecting variables using SSE could lead to a less intuitive and interpretable model, particularly if the relationship exhibits some non-linear traits. To circumvent these potential limitations of SSE-CART, principal component analysis (PCA) was then applied to each cluster, instead of SSE regression, to generate comparative PCA-CART models. Following a comparison of the SSE-CART models’ results with those of the PCA-CART models, the best model was selected and optimised through iterative model tuning. IBM’s Statistical Package for Social Sciences software (SPSS), Version 20, was used for clustering, regression, principal component analysis and decision tree model formulation.
4.1 Methods
4.1.1 Hierarchical Clustering
Hierarchical cluster analysis was conducted to gain insights into the data distribution and characterisation. Clustering partitions data into groups based on a similarity or dissimilarity measure to improve homogeneity within groups [52]. This improves exploratory pattern analysis and classification in data analysis and machine learning [53]. K-means and hierarchical clustering are the commonly used data clustering methods cited in extant literature [53,54,55,56].
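The agglomerative idea behind hierarchical clustering can be sketched in a few lines. The sketch assumes Euclidean distance and single linkage, neither of which is stated in the text, and uses hypothetical (dip, true thickness) pairs; in practice the variables would normally be standardised first so that no single attribute dominates the distance.

```python
from math import dist

def agglomerate(points, n_clusters):
    """Merge the two closest clusters (single linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical (dip in degrees, true thickness in metres) pairs
points = [(55.0, 1.2), (52.0, 1.5), (58.0, 1.0), (20.0, 4.0), (18.0, 4.5)]
clusters = agglomerate(points, 2)
print(clusters)
```

Cutting the merge sequence at a chosen number of clusters plays the same role as reading clusters off a dendrogram at a fixed dissimilarity level.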
4.1.2 Stepwise Selection and Elimination (SSE) Regression
Multiple regression analysis (MRA) was conducted to analyse the relationship between dilution and draft stope attributes. MRA is a statistical method for analysing and interpreting the variance in the dependent variable that is explained by changes in the independent variables [18]. With 14 initial variables, the SSE method was selected as it adds variables incrementally to the model, retaining those that contribute significantly and discarding those that do not [57, 58].
4.1.3 Principal Component Analysis (PCA)
Principal component analysis (PCA) was carried out to investigate the potential to improve the prediction capability of the models, given the low R2 values expected. PCA is a multivariate factor reduction technique for extracting important information from multiple inter-correlated variables, consolidating them into new orthogonal variables called principal components [59]. It linearly transforms the original variables into a significantly reduced number of new, uncorrelated combinations of those variables [60]. Principal components are uncorrelated and, therefore, do not require correlation analysis and regression testing before modelling in CART.
4.1.4 CART Model
The CART technique is a robust decision tree algorithm capable of developing a tree structure for predicting discrete and continuous variables and providing an easy-to-use model for users. This technique recursively splits input data into a hierarchy of root and branch nodes, determined by attribute selection measures such as entropy, variance reduction, information gain and an impurity measure known as the Gini index, to generate a minimum tree size with maximum data segregation capabilities. The CART model is preferred for its speed, as it discards undesirable input features during data processing and lends greater interpretability to users. Furthermore, minimal or no data transformation is necessary for the CART method to discover non-linear relationships in input data. Additionally, CART can also handle multimodal quantitative and qualitative data with minimum impact on prediction accuracy [13]. Moreover, the outliers that may negatively affect the modelling procedure and accuracy have a negligible impact on the CART method’s results, as the algorithm can detect the outliers automatically and put them in a separate cluster [61]. To facilitate modelling of the splitting criteria and extraction of results, the CART models are usually reconfigured with interpretative mathematical expressions (Fig. 3), simulating the model’s splitting decisions from each parent or internal node to the child node for the target (dilution) variable. The splitting occurs at the nodes, governed by conditional rules imposed by the predictor variables’ automatically determined threshold values until a stopping condition is satisfied. The stopping criterion is imposed by constraints, such as child and parent node thresholds or attainment of maximum tree depth. Figure 3 shows a schematic representation of a CART model, where a and b are splitting parameters for a predictor variable X.
As mentioned previously, CART is a rule-based white box algorithm that makes predictions through recursive splits of parent nodes based on binary options imposed by input parameters. As such, the path from each root node to the leaf node can be described as a condition that can be expressed mathematically in the form of X ≤ c, where X represents input variable and c is a threshold variable value that lies within the range of X and sets the condition for binary splitting through “If-then” options [13].
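The binary rule X ≤ c can be made concrete with a small sketch of how a single CART node might choose its threshold using the Gini index. This is an illustrative simplification, not the SPSS implementation: it evaluates one predictor only, takes candidate thresholds at the midpoints between consecutive observed values, and uses hypothetical RQD values and dilution classes.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold c, weighted impurity) for the best rule X <= c."""
    best_c, best_impurity = None, float("inf")
    ordered = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values
    for lo, hi in zip(ordered, ordered[1:]):
        c = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= c]
        right = [y for x, y in zip(values, labels) if x > c]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_c, best_impurity = c, impurity
    return best_c, best_impurity

# Hypothetical RQD values and dilution classes for six stopes
rqd = [35, 40, 45, 60, 70, 80]
dilution_class = [">40%", ">40%", ">40%", "<=40%", "<=40%", "<=40%"]
threshold, impurity = best_split(rqd, dilution_class)
print(threshold, impurity)
```

A full tree repeats this search over every predictor at every node, then recurses into the two child nodes until a stopping condition is met.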
4.1.5 CHAID Model
It is further proposed to use the chi-square automatic interaction detection (CHAID) decision tree as a comparative model to assess the validity of the CART model. The CHAID algorithm, originally developed by [62], is one of the oldest and most commonly used classification tree models. Unlike CART, the algorithm can develop more than two branches from a single node or root, thereby improving the classification homogeneity (purity) of child nodes. The CHAID model will also serve to check whether any additional splits from its multiway split criteria can improve the prediction capacity of the proposed CART model.
4.2 Model Development
Hierarchical clustering was conducted on the stopes’ variables (Table 1) to assess the existence of distinct clusters within the data. Two clusters with 180 and 38 stopes were identified from the dendrogram at a re-scaled dissimilarity level of 9, as shown in Fig. 4. The sample size for extracted (mined) stopes was limited by the number of stopes already mined at the time, with the number expected to increase as mining progresses. However, this may provide an opportunity for the predictive model to be applied to stopes available for future extraction. Stopes in cluster 1 were predominantly narrow-vein with a moderate dip, while stopes in cluster 2 were in shallow dipping, moderate-width mineralisation. A third cluster was created by considering the full data set. Multicollinearity and regression tests were conducted on all clusters, retaining ten variables for model building (Au, Cu, true thickness, apparent thickness, stope height, dip, major faults, structure count, weathering condition and RQD number). The initial choice of model input parameters for a machine learning model is derived mainly from existing literature on the subject and site-specific engineering experience [23, 34, 63]. However, this study focused mainly on parameters known at the early stages of mine planning when draft production stopes are designed, evaluated and scheduled for long-term production plans.
The predictor variables were first entered as one batch for flat multiple regression and later entered for regression using SSE method. The model with the highest R2 from the SSE method for each cluster was retained for further analysis. PCA was conducted for dimension reduction of input variables utilising varimax rotation settings to improve factor loading spatial variance, thereby improving the interpretability of results [64]. PCA reduced the original variables to four principal component variables for each cluster, based on an eigenvalue threshold setting of 1 (Fig. 5).
PCA was necessary to compare and cross-validate the prediction capability obtained in SSE regression (linear analysis) with the result obtained using the derived principal component variables (non-linear analysis). A minimum factor loading of 0.5, as recommended by several scholars [65, 66], was used, although other scholars have recently considered a lower threshold of 0.4 [67]. Bartlett’s test of sphericity, which measures the statistical significance of correlations among variables, was found to be significant (p < 0.01), indicating that the data were appropriate for PCA. Furthermore, sampling adequacy was also assessed and confirmed using the Kaiser-Meyer-Olkin (KMO) measure, with KMO values greater than 0.5.
A decision tree classifier (CART) was used to identify predictor variables that had a high impact on stope dilution using pre-defined dilution categories on a 10% dilution increment up to 40%, with the over 40% dilution range classified as one extreme dilution category. A 40% dilution threshold was selected as this was the current factor established on site, based on reconciliation data of final mined stopes. Furthermore, the 40% baseline threshold also presented an opportunity to check the validity of the dilution factor currently being used for both production and planning purposes on the study site. Preliminary classification results indicated very poor prediction accuracy for the multiple bin categories in 10% categorical increments. As such, selective focus on a target dilution category necessitated the consolidation of the dilution categories. As the relationship between dilution and stability parameters such as the stability number (N) or the hydraulic radius (HR) is best modelled by a logistic relationship [21, 68], the classification and model prediction can be enhanced by establishing a single cut-off value and consolidating the sub-categories on either side of the cut-off to create two categories. While the consolidation may improve the overall prediction accuracy, the granularity of prediction is reduced due to fewer categories. As a result, two categories of dilution ranking emerged, with a 40% dilution cut-off. The binary classification approach enabled a few stopes close to the outlier margins to be analysed with minimal impact on the model’s prediction accuracy due to the increased category bandwidth. This also allowed the high prediction accuracy of the model on one category to be inferred indirectly for the other category, where the model’s direct prediction is weak. Figure 6 shows a generalised logistic regression model and how this concept is applied to reclassify categories.
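The consolidation of categories described above can be sketched as two labelling functions. The function names and the handling of values falling exactly on a boundary are assumptions made for illustration; the text specifies only the 10% increments, the 40% cut-off and the single category above it.

```python
def ten_percent_bin(dilution_pct):
    """Original scheme: 10% increments up to 40%, then one extreme category."""
    if dilution_pct > 40:
        return ">40%"
    # Assumption: a value exactly on a boundary joins the bin it starts,
    # except 40, which stays in the 30-40% bin
    lower = min(int(dilution_pct // 10) * 10, 30)
    return f"{lower}-{lower + 10}%"

def binary_bin(dilution_pct, cutoff=40):
    """Consolidated scheme: one category either side of the cut-off."""
    return f">{cutoff}%" if dilution_pct > cutoff else f"<={cutoff}%"

# Hypothetical planning dilution values (percent)
samples = [12.5, 38.0, 40.0, 55.0]
print([ten_percent_bin(d) for d in samples])
print([binary_bin(d) for d in samples])
```

The five-way labels carry more detail, but each bin holds fewer stopes; the two-way labels trade that granularity for more cases per category.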
The cases for each cluster were divided into two groups to create a training set (80% of the cases) and a testing set (20% of the cases). The 80:20 ratio has been recommended by several scholars in statistical sampling and machine learning literature [14, 69]. CART models were generated for each cluster based on the clusters’ predictors from SSE regression. The Gini index setting, which maximises the homogeneity of splitting relative to the target variable, was used for all cluster models. A series of sensitivity analyses and tests were conducted, and results were recorded to determine the model accuracy and optimal settings. Model tuning involved pruning, iterative adjustments to the minimum number of cases in the models’ parent and child nodes and adjustments to maximum tree depth to attain optimal prediction accuracy on both training and testing models. The optimal tree growing and stopping criteria for all clusters were set at a minimum of three parent node cases, one child node case and a maximum tree depth of 5, with Gini minimum change in improvement set at 0.0001, based on the validation and tuning tests.
These iterative adjustments are a delicate trade-off between higher model accuracy and model complexity, requiring some diligence to strike the right balance. Comparative CART models were also generated using the PCA-derived components. The same training and test sample data was used for all clusters in both methods to ensure the model results were comparable. To establish the best model, confusion matrices were formulated to assess the models’ prediction effectiveness. In the matrix, a model’s predicted results are presented in the columns while the rows show the actual classification results. Useful measures derived from the confusion matrix for this study are the model’s accuracy and F1 score, where
- Accuracy: the ratio of correctly predicted inputs to the total sample size
- F1 score: the harmonic mean of precision and recall, used to measure model accuracy.
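The two measures can be computed from actual and predicted labels as in the sketch below. The labels are hypothetical and the function names are illustrative; the dilution category above 40% is treated as the positive class, in line with the target category of the study.

```python
def confusion_counts(actual, predicted, positive=">40%"):
    """Return (TP, FP, FN, TN) with `positive` as the target category."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 if there are no true positives)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for ten stopes, for illustration only.
actual = [">40%"] * 5 + ["<=40%"] * 5
predicted = [">40%"] * 4 + ["<=40%"] + [">40%"] * 2 + ["<=40%"] * 3

tp, fp, fn, tn = confusion_counts(actual, predicted)
acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
```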
The F1 score is the preferred measure because the study seeks more accurate predictions for the target category of true positives (dilution category > 40%). A confusion matrix for the SSE-CART training model is shown in Table 2, followed by calculations of the model prediction metrics.
where, TN is true negative, TP is true positive, FN is false negative, and FP is false positive values. The model metrics are calculated as shown below:
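For reference, the standard definitions of these metrics in terms of the confusion matrix counts are as follows. These are the general textbook forms and do not reproduce the numerical values calculated from Table 2.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{align}
```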
4.3 Hybrid PCA-CART Model
The PCA-derived components were also used to generate dilution prediction models using the CART method. This was useful to investigate the possibility of improving on the observed CART models’ results obtained using SSE regression variables. A similar process of using the Gini settings, with pruning and adjustments to child and parent node cases, as before, was followed to generate PCA-CART models.
5 Results and Discussion
5.1 SSE Regression and PCA
The results of the SSE regression models are shown in Table 3. Taken out of context, the R² values of 11.4–21.7% would be considered low for explaining the change in the dependent variable (planning dilution) by the independent variables. However, they are considered appropriate for this study because a low R² is consistent with the lack of data on the numerous intermediate steps that transform the draft stopes in the long-term production schedule into the finalised stopes from which the dilution is inferred. Furthermore, with maximum recovery capped at 100%, the presence of background grade (inferred and unclassified) and ore grade variation between planned (draft) and mined (final) stopes both suppress the planning dilution factor, which is back-calculated from the final metal quantity recovered. As such, a weak but important relationship is expected. The models for all three clusters had a statistically significant F change (significance below 0.05), confirming the existence of a significant relationship between some of the predictors and the target variable. On that account, the null hypothesis of no significant relationship between the independent variables and the target variable is rejected.
It was further observed from the standardised coefficients (Beta values) that the largest absolute values belonged to the stopes’ apparent thickness (−2.71) and dip (2.60) for the full data cluster, true thickness (−2.93) for cluster 1 and stope height (−4.48) for cluster 2. Furthermore, the tolerance and variance inflation factor (VIF) values for all models were above 0.1 and below 10, respectively. This indicates minimal multicollinearity among the independent variables [70].
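The collinearity check can be illustrated with the standard definitions of tolerance and VIF. In this sketch `r2_j` is a hypothetical input: the R² obtained when predictor j is regressed on the remaining predictors. The function names and example values are illustrative and are not taken from the study's output.

```python
def tolerance(r2_j):
    """Tolerance of predictor j: the share of its variance not explained by the others."""
    return 1.0 - r2_j


def vif(r2_j):
    """Variance inflation factor of predictor j: the reciprocal of its tolerance."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)


def collinearity_acceptable(r2_j, min_tolerance=0.1, max_vif=10.0):
    """Apply the thresholds quoted in the text: tolerance > 0.1 and VIF < 10."""
    return tolerance(r2_j) > min_tolerance and vif(r2_j) < max_vif


# Hypothetical values: a moderately and a strongly collinear predictor.
moderate_ok = collinearity_acceptable(0.50)  # tolerance 0.5, VIF 2
strong_ok = collinearity_acceptable(0.95)    # tolerance about 0.05, VIF about 20
```

Because VIF is the reciprocal of tolerance, the two thresholds are equivalent and express the same limit in two ways.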
5.2 Principal Component Analysis (PCA)
PCA reduced the original 14 variables to four principal component variables for each cluster, based on an eigenvalue threshold setting of 1, as mentioned earlier. Variables that either failed to reach the minimum loading threshold or cross-loaded on more than one component were removed from the PCA to avoid potential skewing of data in subsequent analyses. The original variables grouped largely into geological (Au and Cu grades), geotechnical (RQD number, weathering condition, structure count), geometrical (reef thickness, stope depth, surface area, dip) and volumetric (stope size (tonnes), height) descriptive classes. As such, the derived principal components were renamed to reflect the descriptive characteristics of the original variables that constituted them. The results of PCA for all clusters are presented in Table 4. Components in cluster 2 classified poorly, which could be attributable to the low sample size, as confirmed by the low KMO value. As shown, the derived principal components accounted for 75.9%, 76.5% and 72.7% of the variation in the data for the full data cluster, cluster 1 and cluster 2, respectively.
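The eigenvalue threshold of 1 can be illustrated with a short sketch. The eigenvalues below are hypothetical and are not those behind Table 4; they are chosen to sum to 14 because a PCA on the correlation matrix of 14 standardised variables has eigenvalues that sum to the number of variables. Using a strict "greater than 1" comparison is also an assumption.

```python
def retain_components(eigenvalues, threshold=1.0):
    """Keep components whose eigenvalue exceeds the threshold.

    Returns the number of retained components and the percentage of total
    variance they explain.
    """
    kept = [ev for ev in eigenvalues if ev > threshold]
    explained_pct = 100.0 * sum(kept) / sum(eigenvalues)
    return len(kept), explained_pct


# Hypothetical eigenvalues for 14 standardised variables (they sum to 14).
eigenvalues = [4.2, 2.9, 2.1, 1.4, 0.9, 0.7, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1, 0.1, 0.05]

n_components, explained_pct = retain_components(eigenvalues)
```

With these illustrative values four components are retained and they explain roughly three quarters of the variance, which is the same order as the percentages reported above.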
5.3 Classification and Regression Tree (CART)
Pre-validation results and extracted normalised importance of predictor variables showed apparent thickness and stope height as dominant predictors for the full data cluster. For cluster 1, the true thickness variable dominated over the weathering condition, while prediction in cluster 2 was solely dominated by stope height. The predictors used for the CART method were also used to generate comparative models using the CHAID method to cross-validate the CART model’s performance. Overall, the results of the two models were comparatively similar, with the CART method’s accuracy slightly superior. As such, model tuning progressed with the CART method.
SSE-CART and PCA-CART models were generated from the training and testing data sets, and the results are summarised in Table 5 for comparison. Results showed that utilising the consolidated dilution categories improved the overall classification (prediction) accuracy of the models. The dilution category where the model prediction was significantly higher was considered as the default target variable (dilution of more than 40%) because it is the category that requires more attention to better predict variability between design tonnes and final mined tonnes. Furthermore, accurate prediction of one category in a binary classifier is, by default, more likely to improve the classification of the other category. The models’ classification results show that the full data cluster model from SSE-CART had the highest overall prediction accuracy (62.8%). However, the hybrid PCA-CART models had the best comparative prediction accuracy for the target dilution category above 40% for all models.
To determine the best overall model, F1 scores were computed for both models, and the results are presented in Table 6. Results show that the PCA-CART hybrid models performed best, with an average F1 score of 0.72 (0.71 and 0.735 for training and testing models respectively). The conventional SSE-CART comes second, with an average F1 score of 0.70 (0.72 and 0.68 for training and testing models respectively). As such, the hybrid PCA-CART model for the full data cluster is selected as the best model for dilution prediction.
The hybrid PCA-CART training and testing models for the full data cluster, simplified for the target category and reconfigured for rule-based interpretation governed by the model’s predictors (geometric, volumetric and geotechnical principal components), are shown in Figs. 7 and 8, respectively. While the overall sample prediction shows an almost even split between the two categories, additional useful information is provided by the principal components’ splitting criteria. Results show that the model’s geometric principal component is the most influential variable, predicting 56.1% of all stopes (i.e. 78 out of 139), those below the principal component’s threshold of 0.6, as likely to overbreak by more than 40% (node number 1). This is quite significant, given that these 78 stopes account for 89% of the target category (i.e. 78 out of 88 stopes). The volumetric and geotechnical principal components are secondary determinants, accounting for the remaining 11% (i.e. 10 out of 88 stopes) of the target category when the geometric threshold is below 0.6. Further, the volumetric principal component’s threshold splits the 10 stopes evenly, suggesting that further splits based on the secondary determinants have poor sensitivity. Furthermore, there are no splits based on the geological principal component (ore grades), suggesting that these also have minimal effect on dilution prediction.
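The percentages quoted for the training model follow directly from the stope counts given in the text, as the short check below shows. The variable and function names are illustrative.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


training_stopes = 139  # all stopes in the training model
target_stopes = 88     # stopes in the target category (dilution above 40%)
node1_stopes = 78      # target stopes captured by the geometric split (node 1)

share_of_all = pct(node1_stopes, training_stopes)                   # about 56.1
share_of_target = pct(node1_stopes, target_stopes)                  # about 88.6
remaining_share = pct(target_stopes - node1_stopes, target_stopes)  # about 11.4
```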
Similarly, results of the randomly generated test model using the 80:20 ratio (Fig. 8) also show that 58.1% of the stopes (i.e. 25 out of 43) or 100% of target category stopes below the geometric principal component threshold of 0.6 are also likely to result in dilution above 40%, with no contributions from the secondary determinants, further confirming the immateriality of their contribution to prediction. As such, the geometric principal component alone is deemed sufficient in prediction as it accounts for approximately 90% of the target category prediction.
These observations suggest that the 56 to 58% of stopes that fall below a geometric principal component value of 0.6 are likely to result in dilution above 40%. To establish appropriate dilution factors based on these findings, histograms and descriptive statistics were used to examine further the statistical behaviour of stopes for the full data set (Fig. 9) and for the set below the geometric threshold of 0.6 with predicted dilution above 40% (Fig. 10). The latter is necessary because, unlike the dilution category below 40%, which has a lower limit of zero, the dilution category above 40% has no distinct upper limit. Histogram results show dilution percentile values from P25 to P90. While a 40% dilution at a P75 confidence level is currently used for planning assumptions based on the performance of finalised stopes, this estimate is too optimistic for mine planning as it aligns mostly with a P40 level of confidence for draft stopes (Fig. 9). The relatively even split of the categories noted earlier (Fig. 7) suggests the P50 confidence level is appropriate for inference of overall dilution for draft stopes. This means the model underestimates the dilution 50% of the time and overestimates it 50% of the time, but on average dilution is underestimated by approximately 5 percentage points (i.e. 45.58% − 40%).
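Percentile values of the kind read from the histograms can be computed as sketched below. The dilution values are hypothetical, and linear interpolation between ranked values is an assumed convention; the statistical package used in the study may apply a different one. The last line reproduces the gap between the P50 value and the 40% planning factor quoted in the text.

```python
def percentile(values, p):
    """Percentile by linear interpolation between the ranked values."""
    ordered = sorted(values)
    rank = (p / 100.0) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical dilution values (percent), for illustration only.
dilution = [10, 20, 30, 40, 50, 60, 70, 80, 90]
levels = {p: percentile(dilution, p) for p in (25, 50, 75, 90)}

# Gap between the P50 value reported in the text and the 40% planning factor.
gap_points = round(45.58 - 40.0, 2)
```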
Similarly, for stopes below the geometric threshold of 0.6 and a predicted dilution above 40%, the P50 level of confidence threshold shows the dilution will be approximately 67.9%, suggesting the geometric principal component (reef thickness, stope depth, surface area and dip) has a major influence on dilution.
Thus, the dilution extraction statement for the model is framed as follows:
1) IF “Geometric factor” in PCA variables ≤ 0.600, then only 10% of overbreak cases have a dilution estimate above the threshold, and therefore, the dilution estimate is 45.58% (minimum threshold).
2) IF “Geometric factor” in PCA variables > 0.600, then 90% of overbreak cases have a dilution above the threshold, and therefore, the overall dilution estimate is 0.5(45.58) + 0.5(67.9) = 56.7% (i.e. 50% weighted for excessive overbreak).
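The weighted estimate in rule 2 can be reproduced with the short sketch below. The function and constant names are illustrative; the two P50 values and the 50% weighting are those quoted in the text.

```python
P50_ALL_STOPES = 45.58    # P50 dilution for the full data set (percent)
P50_TARGET_SUBSET = 67.9  # P50 dilution for the excessive-overbreak subset (percent)


def blended_estimate(base, excess, weight_excess=0.5):
    """Weighted average of the base and excessive-overbreak dilution values."""
    return (1.0 - weight_excess) * base + weight_excess * excess


rule_1_estimate = P50_ALL_STOPES
rule_2_estimate = blended_estimate(P50_ALL_STOPES, P50_TARGET_SUBSET)
```

The blended value is 56.74%, which rounds to the 56.7% stated in rule 2.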
5.4 Model Calibration and Example of Application
A new sample of 44 stopes recently completed in different sections of the mine was used to test the model’s performance on unseen data. The resultant PCA-CART classification and dilution prediction on the sample’s principal components gave an F1 score of 77%, which is close to the 72% established during training and testing. The primary classification and prediction were driven by the geometric principal component at a threshold of 0.45 for the dilution category above 40%. Accordingly, the estimated dilution was determined to be 56.7%, as per rule 2 above. A histogram constructed from actual dilution data, calculated using survey-reconciled mined tonnes, is presented in Fig. 10. The results indicate that the actual dilution relative to the draft stopes is 60.8% at the P75 level of confidence (Fig. 10), closely aligning with the model’s prediction of 56.7%.
Furthermore, the results strongly suggest the following:
1. The P75 dilution estimate derived from the finalised stopes’ reconciliation is likely to significantly underestimate dilution if used for brownfield expansions or greenfield projects.
2. Despite the relatively low prediction accuracy due to the complex dilution relationships arising from changes in the shapes of draft stopes as new information becomes available, there is a high risk of underestimating production forecasts and costs without robust assumptions on dilution prediction.
3. The prediction capabilities shown by the geometrical principal component indicate that geometrical factors have a strong influence on dilution and, therefore, require special attention for a better understanding of dilution and its implications.
Accordingly, the final mined tonnes and grade are predicted from the modified dilution estimates. Thus, the model offers the option of a generalised dilution prediction as well as a more targeted prediction for a niche of stopes based on their principal components’ values. The study results suggest the dilution estimate of 0.4 (40%), derived from drill and blast performance reconciliation, falls short by approximately 13% of what is needed to relate adequately to the draft stopes on which the schedules and forecasts are built. Furthermore, results show that the overall dilution increases from 45.6% to 56.7% for stopes with predicted dilution above 40% and a geometric principal component value below the 0.6 threshold. While this may appear small, the downstream impact on material movement, mined ore grade and scheduled metal production is significant, particularly when considered over medium to long-term production planning horizons. Based on these findings, the established dilution prediction rules and factors will be applied to production schedules to improve the prediction accuracy of dilution, thereby improving ore and metal production forecasts.
6 Conclusions
Discrepancies between optimised production schedules and actual ore mined per period have been identified as a major concern in sub-level open-stoping underground operations. In this study, a model to improve dilution prediction has been established, considering the multiple intermediate changes between designed and final mined stopes. Linear regression, PCA and CART were used on a database of mined stopes’ geotechnical, geological and dimensional attributes to create a hybrid ML dilution prediction model known as PCA-CART. Despite the challenges of a complex and indirect dilution relationship caused by unforeseen changes between designed and final mined stopes, the optimised model achieved an F1 score of 0.72. Furthermore, the study established a dilution factor of approximately 45.6%, suggesting that drill and blast reconciled dilution factors from finalised stopes are likely to underestimate dilution by at least 13% if used for long-term planning and scheduling. The study also concluded that the stopes’ geometric attributes possess high dilution prediction capabilities and may be used on their own to predict overall dilution in the early design and planning stages, particularly in competent ground conditions where geological and geotechnical conditions are relatively stable. Thus, the model can be incorporated into production schedules to improve the prediction accuracy of final mined tonnes and, ultimately, the grade and metal output forecast for narrow reef sub-level stoping operations. The method and framework applied in this study can be extended to other underground mining operations with similar settings to improve stope dilution prediction based on site geological, geotechnical and design factors. Future studies could focus on how to deal with the effects of background grade and ore grade heterogeneity to improve the strength of the prediction relationships and the granularity of dilution categories.
References
Williams J, Smith L, Wells P (1972) Planning of underground copper mining. In: Proceedings of the 10th international symposium on the application of computer methods in the minerals industry, Johannesburg, South Africa, 10–14 April 1972. Southern African Institute of Mining and Metallurgy, Johannesburg, pp 251–254. https://www.saimm.co.za/Conferences/Apcom72/251-Williams.pdf
Kuchta M, Newman A, Topal E (2004) Implementing a production schedule at LKAB’s Kiruna Mine. Interfaces (Providence) 34(2):124–134
Nehring M et al (2012) Integrated short- and medium-term underground mine production scheduling. J South Afr Inst Min Metall 112(5):365–378
MacLean J (2017) Biggest risks for mining companies shift, yet challenges remain. Can Min J 138(1):6
Planeta S, Bourgoin C, Laflamme M (1990) The impact of rock dilution on underground mining: operational and financial considerations. In: Proc. nnd CIM Annual General Meeting, Ottawa, Ontario
Bagde MN (2021) Ore and backfill dilution in underground hard rock mining. J Min Sci 57(6):995–1005
Henning JG, Mitri HS (2008) Assessment and control of ore dilution in long hole mining: case studies. Geotech Geol Eng 26(4):349–366
Hughes R (2011) Factors influencing overbreak in narrow vein longitudinal retreat mining. McGill University (Canada), Ann Arbor, p 144
Ibarra-Gutiérrez S, Laflamme M (2021) Blasted ore losses and mineral reserve: reconciliation approaches and impact on stope performance. Mining, Metallurgy & Exploration 38(5):1893–1898
Forster K, Milne D, Pop A (2007) Mining and rock mass factors influencing hangingwall dilution. In: 1st Canada - U.S. Rock Mechanics Symposium
Seera M, Lim CP (2014) Online motor fault detection and diagnosis using a hybrid FMM-CART model. IEEE transactions on neural networks and learning systems 25(4):806–812
Faradonbeh RS, Monjezi M (2017) Prediction and minimization of blast-induced ground vibration using two robust meta-heuristic algorithms. Eng Comput 33(4):835–851
Shirani FR et al (2020) Rockburst assessment in deep geotechnical conditions using true-triaxial tests and data-driven approaches. Int J Rock Mech Min Sci 128:104279
Kaplan UE, Dagasan Y, Topal E (2021) Mineral grade estimation using gradient boosting regression trees. Int J Min Reclam Environ 35(10):728–742
Patil SD et al (2021) Predictive asset availability optimization for underground trucks and loaders in the mining industry. OPSEARCH 58(3):751–772
Perez LA (2021) Classification and regression models in copper refinery. Miner Process Ext Metall:1–7
Chavunduka D, Sifile O, Chimunhu P (2015) Strategic planning intensity and firm performance: a case of Zimbabwe mining development corporation
Jang H, Topal E (2013) Optimizing overbreak prediction based on geological parameters comparing multiple regression analysis and artificial neural network. Tunn Undergr Space Technol 38:161–169
Jang H, Topal E, Kawamura Y (2015) Unplanned dilution and ore loss prediction in longhole stoping mines via multiple regression and artificial neural network analyses. J South Afr Inst Min Metall 115:449–456
Mottahedi A, Sereshki F, Ataei M (2018) Development of overbreak prediction models in drill and blast tunneling using soft computing methods. Eng Comput 34(1):45–58
Stewart P, Trueman R (2008) Strategies for minimising and predicting dilution in narrow-vein mines: NVD method. Australasian Institute of Mining and Metallurgy
Himanshu VK et al (2023) Innovative blasting practices for underground hard rock mining. In: Himanshu VK et al (eds) Blasting technology for underground hard rock mining. Springer Nature Singapore, Singapore, pp 107–117
Mathews K et al (1980) Prediction of stable excavation spans for mining at depths below 1000 metres in hard rock. Golder Associates report to CANMET. Department of Energy and Resources, Ottawa
Potvin Y (1989) Empirical open stope design in Canada. The University of British Columbia (Canada), Ann Arbor, p 1
Suorineni FT (2010) The stability graph after three decades in use: experiences and the way forward. Int J Min Reclam Environ 24(4):307–339
Papaioanou A, Suorineni FT (2016) Development of a generalised dilution-based stability graph for open stope design. Min Technol 125(2):121–128
Abdellah Wael RE, Hefni MA, Ahmed HM (2020) Factors influencing stope hanging wall stability and ore dilution in narrow-vein deposits: Part 1. Geotech Geol Eng 38(2):1451–1470
Jang H, Topal E, Kawamura Y (2015) Decision support system of unplanned dilution and ore-loss in underground stoping operations using a neuro-fuzzy system. Appl Soft Comput 32:1–12
Henning JG (2007) Evaluation of long-hole mine design influences on unplanned ore dilution. McGill University (Canada), Ann Arbor, p 331
Zhao X, Jia’an N (2020) Method of predicting ore dilution based on a neural network and its application. Sustainability 12(4):1550
Shaorui S, Jiaming L, Jihong W (2013) Predictions of overbreak blocks in tunnels based on the wavelet neural network method and the geological statistics theory. Math Probl Eng 2013:706491
Hefni MA, Abdellah Wael RE, Ahmed HM (2020) Factors influencing stope hanging wall stability and ore dilution in narrow-vein deposits: part II. Geotech Geol Eng 38(4):3795–3813
Villaescusa E (2014) Geotechnical design for sublevel open stoping. CRC Press
Chongchong Q et al (2018) Prediction of open stope hangingwall stability using random forests. Nat Hazards 92(2):1179–1197
Delentas A, Benardos A, Nomikos P (2021) Analyzing stability conditions and ore dilution in open stope mining. Minerals 11(12):1404
Cordova DP, Zingano AC, Gonçalves ÍG (2022) Unplanned dilution back analysis in an underground mine using numerical models. REM - International Engineering Journal 75
Milne D, Hadjigeorgiou J, Pakalnis R (1998) Rock mass characterization for underground hard rock mines. Tunn Undergr Space Technol 13(4):383–391
Heiniö M (1999) Rock excavation handbook for civil engineering. Sandvik, Tamrock
Urli V (2015) Ore-skin design to control sloughage in underground open stope mining. University of Toronto (Canada), Ontario, p 108
Heidarzadeh S, Saeidi A, Rouleau A (2019) Evaluation of the effect of geometrical parameters on stope probability of failure in the open stoping method using numerical modeling. Int J Min Sci Technol 29(3):399–408
Petlovanyi M et al (2019) The influence of geology and ore deposit occurrence conditions on dilution indicators of extracted reserves. Rudarsko - Geolosko - Naftni Zbornik 34(1):83–91
Chimunhu P et al (2022) A review of machine learning applications for underground mine planning and scheduling. Res Policy 77:102693
Buaba JA (2023) Application of machine learning techniques to estimate mine safety and health hazards for integration into underground production scheduling optimization. South Dakota School of Mines and Technology, South Dakota, p 119
Chimunhu P et al (2024) Underground mine planning and scheduling optimization: opportunities for embracing machine learning augmented capabilities. In: Nguyen H et al (eds) Applications of artificial intelligence in mining, geotechnical and geoengineering. Elsevier, pp 183–195
Gallwey J, Eyre M, Coggan J (2021) A machine learning approach for the detection of supporting rock bolts from laser scan data in an underground mine. Tunn Undergr Space Technol 107:103656
Jorquera M, Korzeniowski W, Skrzypkowski K (2023) Prediction of dilution in sublevel stoping through machine learning algorithms. IOP Conference Series Earth and Environmental Science 1189(1):012008
Ajak AD, Lilford E, Topal E (2018) Application of predictive data mining to create mine plan flexibility in the face of geological uncertainty. Res Policy 55:62–79
Jafrasteh B, Fathianpour N, Suárez A (2018) Comparison of machine learning methods for copper ore grade estimation. Comput Geosci 22(5):1371–1388
Afradi A, Ebrahimabadi A, Hallajian T (2022) Prediction of TBM penetration rate Using fuzzy logic, particle swarm optimization and harmony search algorithm. Geotech Geol Eng 40(3):1513–1536
Afradi A, Ebrahimabadi A, Hedayatzadeh M (2024) Performance prediction of a hard rock TBM using Statistical and artificial intelligence methods. Journal of Mining and Environment 15(1):323–343
Afradi A, Ebrahimabadi A (2021) Prediction of TBM penetration rate using the imperialist competitive algorithm (ICA) and quantum fuzzy logic. Innov Infrastruct Solut 6(2):103
Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier Science & Technology, Saint Louis
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Sander U, Lubbe N (2018) The potential of clustering methods to define intersection test scenarios: Assessing real-life performance of AEB. Accid Anal Prev 113:1–11
Lin C-L, Fan C-L (2019) Evaluation of CART, CHAID, and QUEST algorithms: a case study of construction defects in Taiwan. Journal of Asian Architecture and Building Engineering 18(6):539–553
Weichert D et al (2019) A review of machine learning for the optimization of production processes. Int J Adv Manuf Technol 104(5-8):1889–1902
Miller AJ (1984) Selection of subsets of regression variables. Journal of the Royal Statistical Society Series A (General) 147(3):389–425
Lindsey C, Sheather S (2010) Variable selection in linear regression. Stata J 10(4):650–669
Abdi H, Williams LJ (2010) Principal component analysis. WIREs Computational Statistics 2(4):433–459
Linting M et al (2007) Nonlinear principal components analysis: introduction and application. Psychol Methods 12(3):336–358
Nilashi M et al (2021) Sustainability performance assessment using self-organizing maps (SOM) and classification and ensembles of regression trees (CART). Sustainability 13(7):3870
Kass GV (1980) An exploratory technique for investigating large quantities of categorical data. J R Stat Soc: Ser C: Appl Stat 29(2):119–127
Jang H, Topal E (2013) Optimizing overbreak prediction based on geological parameters comparing multiple regression analysis and artificial neural network
Dien J, Beal DJ, Berg P (2005) Optimizing principal components analysis of event-related potentials: matrix type, factor loading weighting, extraction, and rotations. Clin Neurophysiol 116(8):1808–1825
Chen C-F, Tsai D (2007) How destination image and evaluative factors affect behavioral intentions? Tour Manag 28(4):1115–1122
Truong Y, McColl R (2011) Intrinsic motivations, self-esteem, and luxury goods consumption. J Retail Consum Serv 18(6):555–561
Ertz M, Karakas F, Sarigöllü E (2016) Exploring pro-environmental behaviors of consumers: an analysis of contextual factors, attitude, and behaviors. J Bus Res 69(10):3971–3980
Madenova Y, Suorineni FT (2020) On the question of original versus modified stability graph factors—a critical evaluation. Min Technol 129(1):40–52
Faradonbeh RS, Jahed Armaghani D, Monjezi M (2016) Development of a new model for predicting flyrock distance in quarry blasting: a genetic programming technique. Bull Eng Geol Environ 75(3):993–1006
Midi H, Sarkar SK, Rana S (2010) Collinearity diagnostics of binary logistic regression model. Journal of Interdisciplinary Mathematics 13(3):253–267
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions. The authors declare that no funding or financial assistance was received for the material presented in this paper.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chimunhu, P., Faradonbeh, R.S., Topal, E. et al. Development of Novel Hybrid Intelligent Predictive Models for Dilution Prediction in Underground Sub-level Mining. Mining, Metallurgy & Exploration (2024). https://doi.org/10.1007/s42461-024-01029-8
DOI: https://doi.org/10.1007/s42461-024-01029-8