1 Introduction

Mine dumps are massive vertical structures containing large amounts of barren or spoil material from mine excavation. The failure of a mine dump often leads to expensive downtime due to production delays, fatal or non-fatal injuries, loss of machinery, and costs associated with dump reconstruction (McQuillan et al. 2020; Oggeri et al. 2019; Xiao et al. 2022). There may also be a significant adverse environmental impact, including deterioration of surface/groundwater quality, degradation of crops through bioaccumulation of metals, and contamination of aquatic and terrestrial ecosystems (Adibee et al. 2013; Welch et al. 2021). Therefore, mine dumps require precise design, operation, and monitoring to avoid structural failures. The composition and characteristics of the materials in a dump define the shear strength, which is a key parameter for a stable dump design. The excavated overburden and interburden (i.e., from above and between coal seams, respectively) material discarded in a dump is often made up of sand/silt/clay or mixtures of sandstone, siltstone, mudstone, claystone, limestone, shale and coal, and may contain concretions or efflorescence (Welch et al. 2021).

In coal mining, draglines are most frequently used since they are best suited for stripping deposits made up of interburdens and relatively flat-lying resource seams. When building a dragline dump, the profile geometry, the spoil and foundation characteristics, the response of groundwater to sequential dumping, and changes in dump profile over time with dynamic loading all contribute to the stability of the dump. Because available strength testing configurations cannot accommodate spoils with large particle sizes, characterisation of dragline spoils is empirical or semi-empirical (Hawley and Cunning 2017). A thorough study by Simmons and McManus (2004) to determine a correlation between back-analysed shear strengths of failed dumps and laboratory-based shear strengths resulted in a coal spoil characterisation framework called the BHP Mitsubishi Alliance Coal (BMAC) framework. The BMAC framework characterises spoil material in coal mines into four categories based on visual-tactile attributes relevant to shear strength, such as particle size, plasticity, fabric structure, age, and consistency/relative density. Over the last two decades, the coal mining industry has relied extensively on the BMAC framework for maintaining dumps through manual in-situ characterisation of spoils using field investigations. However, this manual in-situ spoil characterisation is prone to human biases and errors, which are eventually reflected in the design parameters of dumps and may compromise stability (Mostofa 2015). Hence, investigation of new technologies that can deliver reliable automated spoil characterisation while minimising human subjectivity is necessary. Unmanned aerial vehicle (UAV) based remote sensing technology is a potential solution to overcome the drawbacks of manual spoil characterisation and to provide quick, reliable and objective results.

Recent developments in UAVs coupled with miniaturised high-resolution sensor systems have improved monitoring and management of mines (Ren et al. 2019). These sensors are non-invasive and offer several benefits such as a short revisit cycle, flexible data collection and high precision. Several studies have investigated the use of multi-sensor UAVs (e.g., UAV optical, multispectral and hyperspectral imaging) in mines for geological (Jackisch et al. 2020; Tan and Qiao 2020), metallurgical (Barton et al. 2021) and environmental applications to characterise joints/fractures (Kong et al. 2021; Samieinejad et al. 2017), minerals (Sinaice et al. 2022) and native vegetation (Banerjee and Raval 2022; Banerjee et al. 2020), respectively. Kim et al. (2022) and Tien Bui et al. (2018) demonstrated that UAV-based structure from motion (SfM) is a viable and efficient tool for accurate 3D topographic mapping in complicated terrains such as open-pit mines. Moreover, advances in SfM have improved the capacity of topographic surveying using UAVs by mapping targets at different spatial scales more efficiently (Anderson et al. 2019; Froideval et al. 2019; Mlambo et al. 2017). The outputs are obtained in the form of digital surface models, 3D point clouds and orthomosaics, which possess immense potential for automated spoil material characterisation in surface mines. However, no study to date has used SfM-derived data to characterise coal spoil materials in dumps geotechnically.

Several studies have evaluated different combinations of features extracted from images and the potential of machine learning algorithms for the geotechnical characterisation of materials in other contexts, such as sand grain size mapping in riverbeds (Bae et al. 2019; Lang et al. 2021). These studies showed that characterisation accuracy improves when features (e.g., colour, texture and shape) are extracted from images and fed into machine learning algorithms. For instance, Bae et al. (2019) demonstrated that employing optical features together with textural features extracted using grey level co-occurrence matrix (GLCM) filters from very high resolution (1.23 mm ground sampling distance, i.e., GSD) UAV images can improve the overall characterisation. Experimental analysis of six sand grain classes indicated an accuracy improvement from 42.34% to 91.28% when deploying a maximum likelihood algorithm for characterisation. A more recent study in this area by Lang et al. (2021) developed a convolutional neural network, ‘GRAINet’, to investigate the particle grain size distribution near river systems. The model was trained using high-resolution UAV (0.25 cm) image patches and yielded a low mean absolute error (1.1 cm). However, GRAINet, unlike the approach suggested by Bae et al. (2019), is a data-driven deep learning approach that requires extensive datasets and computational power. Hence, in the context of limited data, combining feature extraction with traditional machine learning, which demands less computational power, is an industry-friendly way to characterise the geotechnical parameters of a mine dump.

Hence, in this study, various machine learning algorithms along with textural feature extraction are explored to develop an optimum workflow to characterise spoil materials based on lithology, fabric structure and consistency/relative density in the presence of a limited number of datasets. With the mining industry increasingly using UAVs for monitoring applications, the proposed workflow would be useful in providing the insights necessary to improve dump stability. Moreover, an effort is made to limit human intervention in the produced outputs to help advance the mining industry towards automation.

2 Material and methods

This section covers the background of the BHP Mitsubishi Alliance Coal (BMAC) framework and describes the methodology adopted in this study for dump characterisation.

2.1 Conventional coal spoil characterisation approach

Out of the many factors affecting coal mine dump stability, Simmons and McManus (2004) identified the shear strength of spoil as the critical parameter. Determining the shear strength parameters of spoil involves time-consuming laboratory tests. Hence, Simmons and McManus (2004) developed a framework that provides the basis to classify spoils with different shear strength parameters based on their visual-tactile attributes. This framework is called the BMAC spoil shear strength framework (Table 1). The coal mining industry routinely utilises the BMAC framework to monitor spoil dumps.

Table 1 BHP Mitsubishi Alliance Coal (BMAC) framework spoil categories and their visual-tactile attributes (Simmons and McManus 2004)

The BMAC category is derived by multiplying each attribute’s category by its relative total weightage. The weighted sum, ∑(Category × Weightage)/100, is rounded to the closest category except when there is substantial mixing of materials from adjacent categories, in which case an intermediate category (i.e., 1.5, 2.5 or 3.5) is assigned. Hence, the original BMAC four spoil category system (i.e., 1, 2, 3 and 4) could be expressed as seven spoil categories (i.e., categories 1, 1.5, 2, 2.5, 3, 3.5, 4).
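The weighted-sum rule described above can be illustrated with a minimal sketch. The attribute names, the weightages and the tie-breaking rule below are hypothetical placeholders chosen for illustration; they are not the values of Table 1.

```python
# Hypothetical weightages (percent, summing to 100); NOT the Table 1 values.
EXAMPLE_WEIGHTS = {
    "particle_size": 30,
    "plasticity": 20,
    "fabric_structure": 20,
    "age": 10,
    "consistency": 20,
}


def bmac_score(categories, weights):
    """Return sum(category x weightage) / 100 over all attributes."""
    if set(categories) != set(weights):
        raise ValueError("categories and weights must cover the same attributes")
    return sum(categories[name] * weights[name] for name in weights) / 100


def bmac_category(score, mixed=False):
    """Round a score to the closest category.

    With mixed=False the result is one of 1, 2, 3, 4. With mixed=True
    (substantial mixing of adjacent categories) half steps are allowed.
    """
    step = 0.5 if mixed else 1.0
    # Adding 0.5 before truncating rounds ties upwards (an assumption).
    rounded = int(score / step + 0.5) * step
    return min(4.0, max(1.0, rounded))


# Hypothetical per-attribute categories assigned in the field.
ratings = {
    "particle_size": 2,
    "plasticity": 3,
    "fabric_structure": 2,
    "age": 3,
    "consistency": 3,
}
score = bmac_score(ratings, EXAMPLE_WEIGHTS)
print(score, bmac_category(score), bmac_category(score, mixed=True))
```

Whether a spoil counts as substantially mixed is a field judgement; the `mixed` flag only shows how that judgement changes the set of permitted categories.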

Although the visual evaluation method provided in BMAC is simple, there are several limitations associated with its implementation, some of which are: (1) the broad characterisation of spoils in BMAC may ignore quantifying the strength of subclasses within a particular category, (2) field examination of these attributes is time-consuming and requires expertise, (3) personnel are exposed to field hazards, and (4) characterisation of spoil is subjective when the attributes fall on the borderline of two categories (e.g., defining the boundary between matrix-supported and framework-supported fabric structure) (Mostofa 2015). Hence, an alternative method is needed to minimise human intervention in this characterisation process. Exploring UAV-based remote sensing is a potential solution.

Even though UAV-based image analysis is a promising alternative, characterising BMAC attributes using UAV-based remote sensing is challenging due to the difference in operational spatial scales. BMAC was developed to provide a framework to standardise field-based material characterisation of dump piles. Geotechnical experts in the field have a detailed view of the material and ability to use invasive methodologies to characterise material categories. On the contrary, UAV-based SfM sensing is limited to passive modes of characterisation. Nevertheless, SfM-based surveying is spatially holistic, efficient, and takes away the human risk element in field surveying. Moreover, this limitation could be resolved through improved analytics involving empirical methodologies to characterise materials, such as partial least square regression, machine learning regression or deep learning.

Several, though not all, of the attributes in BMAC could be derived using UAV-based SfM due to their association with features derivable from images. In this regard, this study characterises mine dumps for particle size and consistency/relative density, which are essential attributes in the BMAC framework and can be obtained through UAV optical imaging. Particle size distribution provides insights into the shearing behaviour of the material by identifying the fabric structure, which is governed by lithology and influenced by fragmentation/breakdown during handling (Gao et al. 2017; Simmons and McManus 2004). Similarly, consistency/relative density influences a material’s shear strength and is affected by the particles’ chemical composition, size, shape and orientation (Baver 1930). Several studies have identified lithology as an essential contributor to most BMAC attributes (Andrade et al. 2011; Bishwal et al. 2017). Therefore, characterisation of mine dumps is also performed for lithology to aid future research on automated attribute derivation. A comprehensive workflow is presented to generate several spectral and textural features from images that provide enough descriptive properties to segregate spoil material based on three BMAC attributes: particle size, consistency/relative density, and lithology. Although this study does not focus on predominant particle size, liquid limit and age, these attributes could in future be inferred from associations with related geotechnical parameters derivable from the image.

2.2 Association between lithology and fabric structure or consistency/relative density

Lithology (physical characteristics of rock such as colour, texture, particle size, and mineralogical composition) of spoil influences the attributes in the BMAC framework (Andrade et al. 2011; Bishwal et al. 2017; Simmons and McManus 2004; Ulusay et al. 1995). It is possible to deduce BMAC parameters from lithology, if a relationship between lithology and relevant parameters could be established empirically. In this regard, this study investigated the association between lithology and consistency/relative density or fabric structure through a Chi-squared test of independence, a statistical test that evaluates the association between two categorical variables (Ugoni and Walker 1995). The hypothesis tested in this study is given below:

H0: No significant relationship between the two variables.

H1: There exists a relationship between the two variables.
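As an illustration of the test stated above, the following minimal sketch computes the Chi-squared statistic and degrees of freedom for a contingency table and compares the statistic with a tabulated critical value. The counts are hypothetical and are not the field data of this study; the helper name `chi_squared_independence` is likewise introduced only for this example.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Upper-tail critical values of the chi-squared distribution at alpha = 0.05.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 6: 12.592, 9: 16.919}

# Hypothetical counts: rows are lithologies, columns are fabric categories.
counts = [
    [30, 10, 5],
    [8, 25, 12],
    [4, 9, 27],
]
stat, dof = chi_squared_independence(counts)
reject_h0 = stat > CRITICAL_05[dof]
print(f"chi2 = {stat:.2f}, dof = {dof}, reject H0 at 5%: {reject_h0}")
```

Comparing the statistic with a critical value is equivalent to comparing the p-value with the significance level; a statistics package would normally be used to obtain the p-value itself.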

2.3 Study area and field investigation

A dump site at a surface coal mine in New South Wales, Australia, was selected as the study area (Fig. 1). A field investigation was carried out to determine the predominant lithology, consistency/relative density and fabric structure of the spoil dumps according to the BMAC framework. Locations of these manually characterised spoil dumps were recorded using a multi-band global navigation satellite system (GNSS) based real-time kinematic (RTK) positioning receiver (Reach RS2, Emlid Ltd., Hong Kong) with centimetre-level precision. These spatially tagged and characterised spoil dump data points were used as ground truth for training and testing image-based characterisation. The characterisation was restricted to a 2 m radius circle around the control points to account for the heterogeneity present in the dump environment.

Fig. 1

a Location of a selected mine dump site in New South Wales b Selected study area, ground control points (GCPs) and distribution of ground truth sample points in the selected dump site. The size of GCPs and sampling point locations are shown using visual markers, as the original points are not visible in the given map scale. These markers do not correspond to the actual size of the objects on the ground

The mine dump categorical labelling was conducted by a team of expert geologists and geotechnical engineers. Lithologies of 323 mine dumps were identified in this process, out of which 217 dumps were labelled for consistency/relative density and fabric structure. The lithology categories consist of carbonaceous coaly rock, coarse sandstone, fine sandstone and mudrock. The consistency/relative density category includes dumps that fall into one of the five BMAC framework categories: 1.5, 2, 2.5, 3, 3.5 and fabric structure is separated into four categories: 1.5, 2, 2.5, and 3.

2.4 Dump characterisation workflow

The workflow implemented to characterise dump piles in the study area is shown in Fig. 2. Briefly, the steps are: collection of ground truth and determination of the association between lithology and geotechnical parameters (Fig. 2a); generation of an orthomosaic from geotagged images captured during ground truth collection (Fig. 2b); generation of textural features from the red, green and blue bands of the orthomosaic (Fig. 2c); smoothing of the optical bands using the optimal filter for noise reduction (Fig. 2d); dimensionality reduction of the 24 generated texture features (Fig. 2e); and classification of the composite of filtered optical bands and reduced textural features (Fig. 2f) with machine learning algorithms to determine the best-performing algorithm for each parameter.

Fig. 2

Workflow of image-based characterisation of coal spoil from step a to f

A detailed explanation of steps from data collection to data processing is presented in Sect. 2.4.1 through to Sect. 2.4.5.

2.4.1 Aerial data acquisition and preprocessing

A DJI Phantom 4 RTK quadcopter (DJI, Shenzhen, China) was used to capture optical (red, green, blue, i.e., RGB) image data around solar noon over the mine dump site on the same day (1st February 2021) as the ground truth data collection. The DJI Phantom 4 RTK incorporates an RGB camera, model DJI FC6310, with a sensor size of 13.2 × 8.8 mm and a focal length of 8.8 mm. The images from the camera are stabilised by an electromechanical gyrostabilised gimbal and tagged with positional values, i.e., latitude, longitude and altitude. The UAV camera system was set to trigger and acquire images at the desired flying height of 87.7 m to achieve a ground sampling distance (GSD) of 2.19 cm. Image acquisition was set to automated capture mode with acquisition programmed at 85% forward and side overlaps to cover an area of 33,112 m².

The raw images from the UAV mission were processed using an SfM-based photogrammetric stitching package, Pix4D Mapper (Pix4D SA, Lausanne, Switzerland). All images were aligned at the highest resolution, with 40,000 key points and 4,000 tie points. After the alignment process, the residuals revealed a good camera calibration with no significant errors. The ground control points (GCPs) obtained in Sect. 2.3 were then used to improve model reconstruction by optimising camera positions and orientation data. The bundle adjustment after marking GCPs resulted in an orthomosaic. Due to the undulating terrain height of the dump site, the orthomosaic had a mean GSD of 5 cm and an absolute planimetric error of 2.3 cm.

2.4.2 Selection of optimum mean filter for image processing

A mean filter was used to minimise heterogeneity within a dump pile by reducing pixel-to-pixel intensity variance and image noise (Kaur and Kaur 2013) in the optical bands. Smoothing lowers local-level noise that may result from (1) sample impurity, i.e., the existence of minute amounts of non-class material inside the target class material, and (2) fluctuating levels of particle shadow, i.e., the variation in shadow length and area projected by debris of different sizes in a target class material. Smoothing thereby contributes to class homogeneity and prevents the creation of erroneous, misclassified pixelated noise.

Mean filters of increasing window size (i.e., 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13 and 15 × 15 pixels) were applied to the orthomosaic to examine the changes in classification accuracy with each filter size. Filtered images were classified for lithology using the Decision Tree (DT) algorithm to identify the optimum filter size. The DT algorithm was chosen for this step because it is relatively fast compared with the other traditional algorithms considered in this study (i.e., k-nearest neighbours (kNN), random forest (RF) and support vector machine (SVM)). This step used a simple single hold-out validation with the training-to-test data ratio set at 50:50 to reduce computational resources and time. The overall accuracy and kappa coefficient were evaluated, and the best filter was selected to smooth the RGB data prior to classification.
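The effect of the window size can be illustrated with a minimal moving-average filter in plain Python. This is a sketch for illustration only, not the software used in the study; the edge handling (replicating the nearest border pixel) and the toy band values are assumptions.

```python
def mean_filter(band, size):
    """Smooth a 2D band (list of rows) with a size x size moving average.

    Pixels beyond the border are taken from the nearest edge pixel
    (an assumption; other padding rules are equally valid).
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    half = size // 2
    rows, cols = len(band), len(band[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(-half, half + 1):
                rr = min(max(r + dr, 0), rows - 1)  # clamp row index
                for dc in range(-half, half + 1):
                    cc = min(max(c + dc, 0), cols - 1)  # clamp column index
                    total += band[rr][cc]
            out[r][c] = total / (size * size)
    return out


# A single bright "noise" pixel on a dark background.
band = [[0, 0, 0, 0, 0] for _ in range(5)]
band[2][2] = 9
for size in (3, 5):
    smoothed = mean_filter(band, size)
    print(size, smoothed[2][2])
```

Larger windows spread an isolated outlier over more pixels, which is why within-class variance falls as the window grows, at the cost of blurring class boundaries and longer processing time.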

2.4.3 Texture feature generation

When spectral data alone is insufficient, texture can provide additional discrimination between features by providing a spatial relationship between pixels in the image (Ait Kerroum et al. 2010; Haralick et al. 1973). Haralick's features, which characterise the joint local statistics of pairs of pixels through the grey level co-occurrence matrix (Moser et al. 2013), were used in this study.

In the context of smoothing input layers prior to textural feature extraction, a high-resolution input grey-level band is necessary for the computation of precise textural measures. Hence, input layers were kept without being smoothed for texture generation.

Simple Haralick textural features (energy, entropy, correlation, inverse difference moment, inertia, cluster shade, cluster prominence, Haralick correlation) from unsmoothed RGB bands were generated using a 3 × 3 window with a sliding length of one pixel. Hence, 24 texture bands (3 bands × 8 textural features) were obtained. However, to avoid resource-intensive classification, it is crucial to identify salient textural features that provide the necessary discriminatory information (Ait Kerroum et al. 2010). Thus, dimensionality reduction is incorporated as the next step to identify texture information-rich bands.
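To make the texture computation concrete, the sketch below builds a grey level co-occurrence matrix for one 3 × 3 window and derives four of the eight features listed above. It is a simplified illustration, not the toolbox used in the study: the two-level quantisation, the single horizontal offset, the symmetric counting and the definition of energy as the sum of squared probabilities are all assumptions.

```python
import math


def glcm(patch, levels, dr=0, dc=1):
    """Normalised, symmetric co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(patch), len(patch[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = patch[r][c], patch[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Energy, entropy, inertia and inverse difference moment of a GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "inertia": sum((i - j) ** 2 * v for i, j, v in cells),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


# 3 x 3 window quantised to two grey levels (illustrative values).
window = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
print(texture_features(glcm(window, levels=2)))
```

Sliding such a window across each band with a step of one pixel, and writing each feature to its own output band, yields the kind of per-pixel texture layers described above.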

2.4.4 Dimensionality reduction of textural features

Dimensionality reduction lowers the number of dimensions using a projection or selection approach (Sellami and Farah 2018). In this work, two dimensionality reduction approaches, i.e., principal component analysis (PCA) and independent component analysis (ICA), were employed to extract two salient textural features out of 24 textural bands to reduce the computational resource requirement and computation time for classification. PCA results in an orthogonal linear transformation in which successive components capture the maximum remaining variance, whereas ICA results in a linear transformation where components are non-Gaussian and statistically independent (Tibaduiza et al. 2012).
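The PCA step can be sketched in plain Python using power iteration with deflation to obtain the leading components of the covariance matrix. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the toy data, the function names and the fixed starting vector are hypothetical, and ICA is not covered here.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of observations (rows) by features (columns)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centred = [[r[k] - means[k] for k in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centred


def leading_component(cov, iterations=200):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    d = len(cov)
    # Fixed start vector; fails only if it is orthogonal to the answer.
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[a] * cov[a][b] * vec[b]
                     for a in range(d) for b in range(d))
    return eigenvalue, vec


def principal_components(rows, k):
    """First k (eigenvalue, direction) pairs and the projected scores."""
    cov, centred = covariance_matrix(rows)
    d = len(cov)
    components = []
    for _ in range(k):
        value, vec = leading_component(cov)
        components.append((value, vec))
        # Deflate: remove the variance explained by this component.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(d)]
               for a in range(d)]
    scores = [[sum(x[a] * vec[a] for a in range(d)) for _, vec in components]
              for x in centred]
    return components, scores


# Toy stand-in for texture bands: five pixels, three correlated features.
pixels = [
    [2.0, 1.9, 0.1],
    [4.0, 4.1, 0.3],
    [6.0, 5.8, 0.2],
    [8.0, 8.2, 0.4],
    [10.0, 9.9, 0.1],
]
components, scores = principal_components(pixels, k=2)
for value, direction in components:
    print(round(value, 4), [round(v, 3) for v in direction])
```

In this sketch each row of `scores` holds the two values that would replace the original feature vector of a pixel in the composite.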

2.4.5 Analysis of model performance in classification

In this study, three sets of composites were generated, (1) RGB, (2) RGB with principal component bands of textural features (RGB + PCA), and (3) RGB with independent component bands (RGB + ICA) of textural features. In these composites, the optical bands were smoothed to reduce noise while texture characteristics were left unaltered to preserve all texture information.

This study used four relatively mature machine learning algorithms, namely kNN, DT, RF and SVM, as these algorithms have been tested and widely adopted in UAV image classification (Maxwell et al. 2018). Four-fold cross-validation was used to evaluate the models to prevent overfitting and improve generalisation performance (Gebrehiwot et al. 2019). It was accomplished by randomly splitting the data into four equal folds. Each run's training set was created by combining three folds, with the remaining fold acting as the testing or validation set to measure classification errors.
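The evaluation procedure can be sketched with a minimal kNN classifier and a four-fold split written in plain Python. The feature values, class labels, value of k and random seed below are hypothetical and serve only to show the mechanics of the procedure, not to reproduce the study's results.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training samples closest to query."""
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n_samples, n_folds=4, seed=0):
    """Shuffle sample indices and deal them into n_folds near-equal folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::n_folds] for i in range(n_folds)]


def cross_validate(features, labels, n_folds=4, k=3, seed=0):
    """Mean accuracy over folds; each fold is the test set exactly once."""
    folds = k_fold_indices(len(features), n_folds, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(features)) if i not in held_out]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        hits = sum(
            knn_predict(train_x, train_y, features[i], k) == labels[i]
            for i in test_idx
        )
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / n_folds


# Two well-separated hypothetical clusters in a two-feature space.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
             (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.4, 0.6)]
cluster_b = [(x + 10.0, y + 10.0) for x, y in cluster_a]
features = cluster_a + cluster_b
labels = ["fine sandstone"] * 8 + ["mudrock"] * 8
print(cross_validate(features, labels))
```

Because every sample is held out exactly once, the averaged accuracy uses all labelled data for testing without ever testing on a sample seen during training.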

Three accuracy metrics, i.e., F-score, overall accuracy and kappa coefficient derived from the confusion matrices were used to compare the performance of the algorithms. The F-score is used to check the ability of the model to identify a class confidently. Overall accuracy is a measure of a model’s ability to predict all classes, whereas kappa is a measure of agreement between ground truths and predictions generated by the model. The kappa coefficient was also included to find the best-performing model due to the imbalanced data.
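The three metrics can be computed directly from a confusion matrix, as the following minimal sketch shows. The matrices are hypothetical; the second one illustrates why kappa is informative for imbalanced data, since a model that never predicts the rare class can still reach a high overall accuracy.

```python
def classification_metrics(matrix):
    """Overall accuracy, kappa and per-class F-scores from a confusion matrix.

    matrix[i][j] counts samples of true class i predicted as class j.
    """
    n = sum(map(sum, matrix))
    n_classes = len(matrix)
    diagonal = [matrix[i][i] for i in range(n_classes)]
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]

    overall = sum(diagonal) / n
    # Agreement expected by chance from the marginal totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (overall - chance) / (1 - chance)

    f_scores = []
    for i in range(n_classes):
        precision = diagonal[i] / col_totals[i] if col_totals[i] else 0.0
        recall = diagonal[i] / row_totals[i] if row_totals[i] else 0.0
        denom = precision + recall
        f_scores.append(2 * precision * recall / denom if denom else 0.0)
    return overall, kappa, f_scores


# Hypothetical balanced case.
balanced = [[5, 1], [1, 3]]
# Hypothetical imbalanced case: the rare class is never predicted.
imbalanced = [[95, 0], [5, 0]]
for name, matrix in (("balanced", balanced), ("imbalanced", imbalanced)):
    oa, kappa, f = classification_metrics(matrix)
    print(name, round(oa, 3), round(kappa, 3), [round(v, 3) for v in f])
```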

3 Results

3.1 Association between lithology and geotechnical attributes

The associations of lithology with (1) fabric structure and (2) consistency/relative density, obtained from Chi-squared tests, are tabulated in Table 2. The p-value obtained from each Chi-squared independence test was compared with the significance level to evaluate the association between variables. A p-value higher than the significance level provides no evidence of dependence between the variables, and thus the null hypothesis (no relationship between variables) cannot be rejected. Conversely, if the p-value is lower than the significance level, the null hypothesis is rejected, indicating that an association exists between the variables. The test indicated that at a 5% significance level (α = 0.05), an association exists between lithology and fabric structure, as the p-value is well below this threshold. Similarly, an association between lithology and consistency/relative density was revealed at a 10% significance level (α = 0.10). The statistical analysis suggests that any processing technique that improves the accuracy of lithological classification would potentially improve the accuracy of the spoil classification based on fabric structure and consistency/relative density.

Table 2 Results of Chi-squared independence test for association between lithology and geotechnical parameters

3.2 Selection of optimum mean filter

The performance of the DT algorithm for lithological classification on RGB images smoothened with different filter sizes is given in Fig. 3. Due to computational constraints, the maximum filter size was 15 × 15 pixels. Results indicate that overall accuracy plateaus at the 9 × 9 window size. Further, the generation of the smoothened image using 9 × 9 pixels takes less time compared to 11 × 11, 13 × 13 and 15 × 15 window sizes while producing the same accuracy. Therefore, 9 × 9 window size was chosen as optimum and RGB bands were smoothened using the selected optimum filter for further analysis.

Fig. 3

Accuracy assessment of different smoothing filters and time taken to produce smoothened image with the specified window sizes

3.3 Dimensionality reduction of textural features

Not all of the 24 textural bands produced by the Haralick texture generation from unsmoothed optical data contain valuable information. Hence, only components with an eigenvalue > 4 for PCA and an eigenvalue > 13,000 for ICA were selected to generate the composite with optical data (Fig. 4). Accordingly, the first two components retrieved from PCA and from ICA were considered to carry the salient textural features required to characterise spoil in addition to the RGB bands.

Fig. 4

Eigenvalue plot for principal component analysis (PCA) and independent component analysis (ICA). The eigenvalues decrease monotonically with the number of components, indicating a lack of information in higher components. The eigenvalue plot can be exploited to identify the optimum number of components

3.4 Analysis of the model performance in classification

3.4.1 Lithology

The overall accuracy and kappa coefficient of the lithology classification of the three composites (i.e., RGB, RGB + PCA and RGB + ICA) using DT, kNN, RF and SVM are given in Fig. 5a. The kNN algorithm, though simplistic, outperformed the other algorithms on the composites of RGB and the transformed components obtained from PCA and ICA. In the RGB composite, SVM achieved slightly higher accuracy (OA: 0.78, kappa: 0.72) than the other algorithms. The best classification results (OA: 0.82, kappa: 0.77) were obtained for RGB + PCA with kNN. These results imply that the additional textural features increased overall accuracy by almost 4%. However, all algorithms produced classifications with overall accuracies within a narrow range (i.e., 0.78–0.82) and kappa coefficients ranging from 0.71 to 0.77. In terms of per-class accuracy, evaluation metrics such as precision, recall, and F-score (Fig. 5b–d) show that carbonaceous coaly rock is classified more accurately (F-score: 0.94–0.96) than the other lithologies, whereas mudrock is classified with low accuracy (F-score: 0.50–0.66). F-scores of coarse sandstone and fine sandstone range from 0.87 to 0.88 and from 0.74 to 0.78, respectively. Figure 6, generated by classifying the RGB + PCA composite using the kNN algorithm, illustrates that carbonaceous coaly rock, coarse sandstone, fine sandstone and mudrock make up 4%, 16%, 30% and 50% of the selected dump site, respectively.

Fig. 5

Overall accuracy and kappa coefficient of classification algorithms on composites formed using optical data, and optical + textural data. Precision, recall and F-Score of classifications, respectively for a Optical data b Optical + textural data transformed with principal component analysis (RGB + PCA) and c and d Optical + textural data transformed with independent component analysis (RGB + ICA)

Fig. 6

Lithological classification produced by a composite of optical data and data transformed with principal component analysis (RGB + PCA) using k-nearest neighbours (kNN)

3.4.2 Fabric structure

Outcomes of the fabric structure classification reveal that kNN has superior performance on the RGB + PCA composite, with an overall accuracy of 0.76 and a kappa coefficient of 0.67 (Fig. 7a). All algorithms, however, generated classifications with overall accuracies falling within a small range (i.e., 0.78–0.82) and kappa coefficients ranging from 0.61 to 0.67. All categories have an F-score greater than 0.5, which denotes the ability to classify the categories confidently (Fig. 7b–d). Values of precision, recall and F-score indicate that category-1.5 can be classified with high accuracy (F-score: 0.86–0.92). F-scores of categories 2, 2.5 and 3 range from 0.64–0.69, 0.71–0.75 and 0.58–0.64, respectively. Figure 8, generated by classifying the RGB + PCA composite using the kNN algorithm, illustrates that cat-1.5, cat-2, cat-2.5 and cat-3 make up 24%, 48%, 16% and 12% of the selected dump site, respectively.

Fig. 7

Overall accuracy and kappa coefficient of classification algorithms on composites formed using optical data, and optical + textural data. Precision, recall and F-Score of classifications, respectively for a Optical data b Optical + textural data transformed with principal component analysis (RGB + PCA) and c and d Optical + textural data transformed with independent component analysis (RGB + ICA)

Fig. 8

Fabric structure classification using k-nearest neighbours (kNN) algorithm on a composite of optical data and data transformed with principal component analysis (RGB + PCA)

3.4.3 Consistency/relative density

For consistency/relative density classification, kNN with RGB + ICA (Fig. 9a) performs slightly better than the other algorithms and composites. The best classification results (OA: 0.71, kappa: 0.63) were obtained with kNN. The tested algorithms produced overall accuracies within a small range (0.65–0.71) across all composites, with kappa coefficients ranging from 0.56 to 0.63. F-scores of categories 1.5, 2, 2.5, 3 and 3.5 range from 0.76–0.81, 0.51–0.58, 0.65–0.71, 0.34–0.47 and 0.85–0.90, respectively. All the categories except category-3 show an F-score greater than 0.5, which denotes the ability to classify these categories confidently (Fig. 9b–d). The low F-score (less than 0.50) in category-3 may have arisen from different lithologies belonging to this category. Figure 10, generated by classifying the RGB + ICA composite using the kNN algorithm, illustrates that cat-1.5, cat-2, cat-2.5, cat-3 and cat-3.5 make up 23%, 30%, 16%, 24% and 8% of the selected dump site, respectively.

Fig. 9

Overall accuracy and kappa coefficient of classification algorithms on composites formed using optical data, and optical + textural data. Precision, recall and F-score of classifications, respectively for a Optical data b Optical + textural data transformed with principal component analysis (RGB + PCA) and c and d Optical + textural data transformed with independent component analysis (RGB + ICA)

Fig. 10

Consistency/relative density classification produced by a composite of optical data and data transformed with independent component analysis (RGB + ICA) using k-nearest neighbours (kNN)

4 Discussion

4.1 Discussion of results

This study presented a workflow to characterise coal mine dumps for lithology, consistency/relative density and fabric structure. The method provides new insights into mapping the properties of coal spoils using machine learning algorithms. The study reveals that adding textural features improves the classification accuracy of all three attributes. The highest accuracy (82%) was obtained for lithology, while consistency/relative density and fabric structure showed nearly equal accuracy (~ 77%). A possible explanation for this discrepancy might be that lithology depends more strongly on colour (Simon et al. 2016) than the other two attributes do. Therefore, lithological classification was inherently better since this study used optical data and textural features derived from the optical data. Additionally, statistical analysis using the chi-squared test identified an association between lithology and fabric structure, and between lithology and consistency/relative density, at the 10% significance level (p-value: 0 and 0.088 respectively). This implies that improvement of lithological classification is likely to improve the derivation of fabric structure and consistency/relative density.

kNN showed superior performance for categorical classification of the three attributes due to the large difference between the number of training data (323 and 217) and the number of features (5). DT and RF had slightly better accuracy than SVM as they are more suited to categorical data and cope with collinearity better than SVM (Merghadi et al. 2018). For all three attributes of interest, machine learning performs well on composites of RGB and transformed textural bands (i.e., either RGB + PCA or RGB + ICA). This result indicates that accuracy can be further improved by using texture information in spoil classification. Given the vast number of textural parameters that can be derived from either spatial co-occurrence matrices or geostatistical functions at different window sizes and lag distances, the addition of textural information to classification may result in an exponential growth in the features used for classification (Fan 2013). To prevent the curse of dimensionality induced by additional textural parameters, which causes classification performance to deteriorate and computation time to increase, dimensionality reduction using PCA or ICA is essential. Although the composite with PCA performed slightly better in the classification of lithology and fabric structure, it is worth noting that the overall accuracies achieved with RGB only, RGB + PCA and RGB + ICA are nearly equal. This outcome could be a result of the low spatial resolution, which may not capture the necessary texture information.

Fabric structure and consistency/relative density both showed a statistically significant association with lithology. Of the two, fabric structure showed the stronger association, significant at the 5% level, as most mine dumps with the same lithology fall into a similar category for fabric structure. Hence, finer categorisation of lithologies (for instance, splitting mudrock into claystone and siltstone) may further improve the association and the inference of geotechnical attributes.

When comparing per-class accuracy, a major factor that affected accuracy was the ambiguity between lithologies that had similar optical properties but different material properties. For instance, at the given spatial resolution, mudrock and sandstone appear similar in the RGB bands due to similar shades of grey, but they are different material types, which increases the error of omission and thereby reduces the overall accuracy. A possible solution to improve accuracy could be either to acquire high-resolution images with ample texture information or to use multi-scale textural features at the expense of processing time. In addition, class imbalance in the reference data due to the nature of the mine environment might also affect the algorithm’s performance. Since the likelihood of selecting a class in simple random selection is related to the per-class area, relatively rare classes will likely make up a smaller fraction of the training set. In the presence of imbalanced data, it is typical for machine learning algorithms to underestimate the proportions of less common groups, as the learning process tries to decrease the overall error by underpredicting rare classes.
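One common way to limit the effect of class imbalance described above is to draw training samples per class rather than by simple random selection. The sketch below shows equal-allocation stratified sampling; it is a general mitigation offered for illustration, not a step reported in this study, and the class proportions and the function name `stratified_sample` are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(labels, per_class, seed=0):
    """Pick up to `per_class` sample indices from every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        # A rare class contributes all of its samples if it has fewer than per_class
        chosen.extend(rng.sample(members, min(per_class, len(members))))
    return sorted(chosen)

# Hypothetical reference data dominated by one class
labels = ["sandstone"] * 80 + ["mudrock"] * 15 + ["coal"] * 5
picked = stratified_sample(labels, per_class=10)
print(Counter(labels[i] for i in picked))
```

With simple random selection of the same 25 samples, the rare classes would be expected to contribute only about five (mudrock) and one or two (coal) samples, which is the under-representation the paragraph above describes.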

4.2 Study contribution and scope for future work

Even though prior studies have highlighted the importance of autonomous spoil characterisation due to the time and effort required in manual BMAC framework-based characterisation, research to date has not explored the use of UAV-based optical remote sensing in coal spoil characterisation. This research addresses the gap by investigating remote sensing as a potential solution for automated characterisation. It is shown that, when trained with sufficient data, machine learning algorithms can produce automated and satisfactory classification results with minimal human intervention. The contribution is specifically beneficial because it (1) requires fewer on-site visits for characterisation, thereby reducing the exposure of mine personnel to hazards, (2) reduces human bias in the results, (3) covers a large area and provides results in minimal time, (4) provides flexibility in multi-spatio-temporal data capture using UAVs, (5) allows verification of whether dump progress follows the predefined design, supporting proactive decision-making, and (6) allows a 3D profile of dump development over time to be generated.

This study primarily aims to demonstrate that materials in a dump site can be classified into different classes through this first-of-its-kind applied use of UAV-based remote sensing. UAV-based classification of dump materials can potentially add practical value to the way dumps are currently monitored, i.e., by making the process more comprehensive, safer, faster, and less biased. The results show that the approach applies to material types classified under the BMAC framework, developed over two decades for coal dumps. Although the study was done at one location, standard four-fold cross-validation was used to split the samples into mutually exclusive ‘training’ and ‘test’ sets, supporting the reliability of the assessment. Furthermore, we do not aim to cross-deploy the trained classification model from one mine site to another. The reported algorithm, by its nature, needs some training input from a given dump site, and as long as this training input can be suitably provided, the method would potentially work in the given environment. If direct cross-deployment of the models is sought, it will be important to add more sites, possibly from other basins, to diversify the input material properties (lithology, fabric structure, etc.). This is outside the scope of the current study and will require substantial data gathering and aerial surveying. Hence, we leave it for future work.
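The four-fold cross-validation mentioned above can be sketched as follows. This is a generic illustration of how samples are partitioned so that each one is used for testing exactly once; the sample count, the random seed and the function name `k_fold_indices` are assumptions for the example, and the study's own splitting procedure may differ in detail.

```python
import random

def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical data set of 20 labelled samples
for fold_number, (train, test) in enumerate(k_fold_indices(20, k=4), start=1):
    # Training and test sets never share a sample within a fold
    assert not set(train) & set(test)
    print(f"fold {fold_number}: {len(train)} training, {len(test)} test samples")
```

In each of the four rounds a classifier would be fitted on the training indices and scored on the held-out test indices, and the four scores would then be averaged.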

Although this study assessed pixel-based classification, a few shortcomings could be addressed in future work. Pixel-based classification techniques assign each individual pixel to a class and do not consider spatial and contextual information. For instance, shadows and changes in sun angle with image acquisition time can cause pixels belonging to one class to be assigned to different classes (Aryaguna and Danoedoro 2016). Object-based classification techniques reduce this effect by grouping pixels into objects and allocating each object to a class. Hence, object-based techniques provide a realistic and geometrically precise spatial mapping of features of interest (Blaschke 2010). Object-based classification could be explored in future work to provide more accurate classification results. Additionally, as more data become available, more advanced machine learning algorithms could be employed to improve the results. Further case studies would be needed to evaluate the applicability of the proposed method to other mine sites.
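The idea of allocating a whole object to one class can be illustrated with a much simplified sketch: given per-pixel predictions and a segment identifier for each pixel, every pixel takes the majority label of its segment. Real object-based workflows also involve an image segmentation step and object-level features, which are not shown here; the labels, segment identifiers and the function name `relabel_by_segment` are hypothetical.

```python
from collections import Counter, defaultdict

def relabel_by_segment(pixel_labels, segment_ids):
    """Replace each pixel label with the majority label of its segment."""
    votes = defaultdict(Counter)
    for label, segment in zip(pixel_labels, segment_ids):
        votes[segment][label] += 1
    majority = {seg: count.most_common(1)[0][0] for seg, count in votes.items()}
    return [majority[segment] for segment in segment_ids]

# Hypothetical flattened image: one shadowed pixel in segment 0 was misclassified
pixel_labels = ["sandstone", "sandstone", "mudrock", "sandstone", "mudrock", "mudrock"]
segment_ids = [0, 0, 0, 0, 1, 1]
print(relabel_by_segment(pixel_labels, segment_ids))
```

In this toy case the isolated "mudrock" pixel inside segment 0 is overruled by its neighbours, which mirrors how grouping pixels can suppress shadow-induced misclassification.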

5 Conclusions

Spoil dumps without proper temporal characterisation may cause design reconciliation issues and structural instability. Coal spoil characterisation has conventionally demanded the presence of experts in the field, which is not only hazardous but also involves intensive field activity. The development of tools such as UAVs coupled with SfM photogrammetry provides immense potential for automating the otherwise manual process of spoil characterisation, which was previously time-consuming and prone to bias. This study presented the use of UAV-based optical sensing for lithology, fabric structure and consistency/relative density characterisation of spoil dumps. The optical data together with derived textural features helped achieve better overall accuracies for lithology, fabric structure and consistency/relative density. The use of textural features slightly improved performance by capturing spatial context in the classification. A further improvement in accuracy could be achieved with high-resolution images, more training data and improvement of the processing workflow. Nevertheless, it is evident from the results that the proposed workflow can be used to remotely map mine dumps and automatically characterise the lithological and geotechnical attributes contributing to the shear strength of the spoil. This study also provides insight into machine learning based classification, which has not previously been explored in the context of mine dump characterisation. Object-based classification and deep learning algorithms could be explored in future work to improve characterisation accuracy. Moving forward, automated dump spoil characterisation would enable proactive decision-making for improved and efficient dump management.