1 Glossary

  • Artificial intelligence (AI), a computing system or machine capable of solving problems typically requiring human or animal intelligence.

  • Backpropagation, an algorithm in neural networks for computing the gradient of a loss function with respect to the network weights.

  • Digital terrain model (DTM), a digitised topographic representation of a geographical surface which is commonly discretised into grids.

  • Facies, the characteristics of a rock body that could be used to determine its origin, features and characteristics.

  • Machine learning (ML), a method that models the ‘rules’ from input data, mapping them to the output. Sub-fields of machine learning include neural networks and deep learning.

  • Multispectral sensor, an image-based sensor capable of measuring light intensities in a discrete number of spectral bands that are not limited to the visible light range.

  • Open-pit mine, a mine where ore deposits are extracted on the surface of the earth.

  • Overfitting, a concept in machine learning where the model produced is tightly coupled with the training dataset and does not generalise well to other datasets within the same domain.

  • Point cloud, a set of data points representing an object or scene in 3D space.

  • Remote sensing, a technique of capturing information from a distance using sensors carried by a platform such as a satellite, aircraft or UAV.

  • Semi-autogenous grinding (SAG), a milling process where gravity is used as a primary force to achieve material breakage by inducing material to fall from the upper regions of a rotating cylinder to impact with and break material in lower regions of the cylinder.

  • Tailing pond, a reservoir for storing liquid tailings, i.e. mine liquid waste product.

  • Test set, a portion of a dataset used to validate the performance of a trained machine learning model.

  • Train set, a portion of a dataset used by a machine learning algorithm for model optimisation.

  • Underground mine, a mine where ore deposits are extracted below the surface of the earth, which typically requires subsurface excavation.

  • Unmanned aerial vehicle (UAV), an aircraft without any human operators onboard.

  • Validation set, a portion of a dataset used to validate a machine learning model during training.

  • Well/borehole logs, a record of geological measurements and analyses of rock surfaces exposed within a drill hole.

2 Introduction

Advances in digitisation in industries have led to a big data revolution that has provided opportunities for improving performance in many tasks through data-driven methods for reasoning, modelling, optimisation and decision making (Thomas and McSharry 2015). In the mining industry, the extensive use of sensors and instrumentation has enabled large amounts of data to be collected in real time from machinery during daily mine operations. These data appear in different forms (e.g. images, point clouds, discrete, time series) and dimensions (e.g. from 1D to 4D/5D and more), which have the potential to be indexed and fused into combined problem space representations. The large amount of collected data can be exploited through artificial intelligence (AI) methods, focusing on data-driven methods such as machine learning (ML) and its sub-fields including deep learning, with the aim of finding correlations, clusters and categories that provide insights for improving both the safety and the productivity of a mine site (RioTinto 2022).

2.1 Deep learning

The theoretical definition and technical details of deep learning (DL) are well explained in a number of references, either comprehensively (Lecun et al. 2015; Goodfellow et al. 2016; Chollet 2021) or succinctly as a section of a review (Méndez et al. 2023; Zhang et al. 2021). This section aims to engage a potential mining audience by giving a high-level explanation of DL. As depicted in Fig. 1, artificial intelligence (AI) encapsulates a range of developing technologies. Machine learning (ML) is capable of performing data classification and regression. The artificial neural network (ANN) is a subfield of ML that uses layers of neurons with weightings that are trained to represent the transformation of the data into an output. DL consists of multiple layers of neurons derived from ANNs and is capable of feature extraction and improved prediction.

In DL methods, the primary data structure is a network of nodes connected by links (also known as a network configuration (Pingel 2022)), where each node has a set of input links, providing inputs from data inputs or the outputs of other nodes, and output links connecting to other nodes or network outputs (Lecun et al. 2015). A node mimics a simplified model of a biological neuron. Historically, this has been an integrate-and-fire model, where a neuron is modelled as an integrator that sums the input values and generates an output if the sum exceeds a threshold (Burkitt 2006). DL is a method that derives rules from data for mapping input data to desired outputs. In the case of supervised learning, it is trained using an algorithm that takes the error between the outputs generated by the DL model for specific example inputs and the ground truth outputs provided by the training data, and uses this error to modify the weights and biases within the network, aiming to optimise the overall performance (e.g. accuracy and precision). When a target level of performance is reached, the training process can end and the resulting DL model can be applied to new data sets to automate the classification of their data vectors.
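As a minimal sketch of the integrate-and-fire idea described above, the following Python function sums the weighted inputs of a single node and emits an output only when the sum exceeds a threshold. The function name, the weights and the threshold values are illustrative assumptions and are not taken from the cited works.

```python
def threshold_neuron(inputs, weights, bias=0.0, threshold=0.0):
    """Simplified node: integrate the weighted inputs, fire if above threshold."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Integrate: weighted sum of the inputs plus a bias term.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Fire: output 1 only when the integrated value exceeds the threshold.
    return 1 if total > threshold else 0


# Hypothetical values chosen only for illustration.
fired = threshold_neuron([1.0, 0.5], [0.6, 0.4], threshold=0.5)
silent = threshold_neuron([0.2, 0.1], [0.6, 0.4], threshold=0.5)
```

Modern DL networks typically replace the hard threshold with a differentiable activation function so that the weights and biases can be adjusted by gradient-based training.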

Fig. 1 Relationship between artificial intelligence, machine learning, neural network and deep learning, inspired by Goodfellow et al. (2016) and Kavlakoglu (2020)

The experimental performance of a DL model is typically assessed by separating (e.g. randomly) the data into a training set, a validation set and a test set. The DL model learns its parameters (i.e. weights and biases) from the training set. The validation set is used to check the model’s performance at certain intervals during training and is not used to optimise the model’s parameters. At the end of the training routine, the model is tested by comparing its predictions with the known correct categories for each input vector in the test set. The practice of data separation aims to ensure the validity of a trained model during and after training and to identify the model’s generalisation problems at different points, i.e. overfitting and underfitting, as explained in Lei (2021), Wagner et al. (2021) and Jabbar and Khan (2015). The broader question of validity concerns the performance of the DL network when applied beyond the initial training and testing context, i.e. to situations where the constraints and distribution of the data inputs encountered may differ, in ways that may or may not be known or anticipated, from the data used to develop the DL network.
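The random separation of data described above can be sketched as follows. This is an illustrative example only: the function name, the 70/15/15 proportions and the fixed seed are assumptions, and the reviewed studies use a variety of split ratios.

```python
import random


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Randomly split samples into training, validation and test subsets."""
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test


# Hypothetical dataset of 100 sample identifiers.
train_set, val_set, test_set = split_dataset(range(100))
```

Because the three subsets are disjoint, performance measured on the test set is not influenced by samples the model has already seen during training.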

The term ‘deep’ in deep learning does not refer to the depth of the learning model’s comprehension capability but refers to the number of layers in the network architecture (Chollet 2021; Kavlakoglu 2020), which raises the question: How many layers does a network architecture need to have to be considered a deep learning method? Although there is no general consensus on the definition, in this paper, a network architecture is considered to be ‘deep’ if it consists of at least two hidden layers, a total minimum of 4 layers including the input and output layers (Kavlakoglu 2020).

Over the past decade, DL has been a focus of attention because of its proven state-of-the-art performance in solving multiple tasks, especially in computer vision, thanks to the increasing amount of publicly available datasets and computational resources. DL gained momentum in 2012 when a convolutional neural network (CNN) called AlexNet (Krizhevsky et al. 2012), created by the research team ‘SuperVision’, outperformed all competitors by a significant margin in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) (Russakovsky et al. 2015). Subsequently, DL has been adopted outside the computer vision field in other research domains and industries to solve numerous tasks involving classification and regression. However, the majority of DL adaptations in other application domains have been implemented with minimal architectural modifications. Changes are typically limited to the datasets and network training methodology used.

2.2 Deep learning in mining

DL implementations have been extensively reviewed for geochemical mapping (Zuo et al. 2019), geosciences (Ayranci et al. 2021), ophthalmology (Wang et al. 2021c; Badar et al. 2020), finance and banking (Huang et al. 2020; Ozbayoglu et al. 2020) and medicine (Liu et al. 2021b; Wang et al. 2021a; Debelee et al. 2020; Bizopoulos and Koutsouris 2019; Bakator and Radosav 2018). However, DL reviews in the mining context, such as by Jung and Choi (2021) and Fu and Aldrich (2020), have not been extensive. The scope of the review by Jung and Choi (2021) includes a broader context of ML with a limited number of DL approaches (only 63 papers were reviewed). Meanwhile, Fu and Aldrich (2020) only include extraction, transportation, and processing of minerals in the mining context, providing a compact overview of DL methods focusing on implementation in these application fields. Other processes in the mining value chain could include exploration, planning, safety and reclamation. This paper aims to provide a comprehensive systematic review of published work on DL implementations in metal and coal mining-related applications, categorised based on the mining tier processes in Fig. 2. The aim is to encourage generalising DL adoption in different mining processes. The literature review aims to answer the following questions:

  1. What are the DL implementation trends in the mining context?

  2. How are the DL methods implemented for mining processes?

The first question is answered by examining trends such as the distribution of DL usage and related network architectures categorised across different mining processes. Answers to the second question consider the application context, the problem to solve, the network architecture, training methods and data. The answers to these questions will be the basis for outlining the limitations of these implementations. As a summary, the gap between state-of-the-art deep learning approaches and their adoption in the mining context will be outlined with suggestions for possible implementation frameworks.

The knowledge domain of these applications could also be derived from other domains such as geotechnics, geoscience, remote sensing, computer vision and robotics because mining is an interdisciplinary field which includes a wide variety of processes. The literature included in this survey is motivated by mining applications and/or applied to data collected from a mine site; work that merely has the potential to be applied in a mining context is not included. To avoid redundancy, this paper excludes DL approaches in mine operations that have already been compiled elsewhere, which include blast-induced impacts (Al-Bakri and Sazid 2021), blast vibration (Maulana et al. 2021) and microseismic event classification (Jinqiang et al. 2021).

Fig. 2 Processes involved in a mining operation, adapted from Jung and Choi (2021)

The mining industry could benefit from adopting DL methods compared to ML techniques and analytical/numerical modelling in several different ways: (a) The large amount of data collected in the mining process could be exploited to make predictions and analyses that increase efficiency and productivity, given that DL is a data-driven method. (b) DL methods do not require feature engineering and extraction, and would therefore require minimal data processing in comparison to ML techniques. (c) In comparison to analytical or numerical modelling, DL methods would require less mining expert intervention in the process of developing the model. (d) Finally, DL methods would take less time to make inferences given a trained model in comparison to numerical or analytical methods.

The articles collated are categorised based on the main processes involved, i.e. exploration, extraction and reclamation, and broken down based on the hierarchy depicted in Fig. 2. The rest of the paper is structured as follows. Section 3 explains the search methodology used to find relevant articles. Section 4 outlines the trends of the published articles based on publication year, DL architecture and mine processes involved. Section 5 discusses the DL implementations in the articles, categorised into subsections based on the three main processes, followed by Sect. 6, which discusses the findings, recommendations and future prospects. Finally, the conclusion is outlined in Sect. 7.

3 Review methodology

Research databases such as Google Scholar, Web of Science, Scopus, Springer, ScienceDirect and IEEE Xplore were used to find relevant deep learning literature. Mining process keywords such as ‘Exploration’, ‘Extraction’ and ‘Reclamation’ were used in combination with their sub-keywords as shown in Fig. 2. These keywords were combined with ‘deep learning’ keywords and specific method terms such as deep learning (DL), artificial neural network (ANN), fully connected network (FCN), convolutional neural network (CNN), generative adversarial network (GAN), recurrent neural network (RNN) and other derived networks to create an initial list of publications.

A second search session was conducted by including distinct relevant literature from the list of references and cited papers of the initial list. The literature searches were repeated until the references were exhausted, i.e. when the relevant articles from the references and cited papers in the respective articles in the collated publication list had been included.

The collated publication list was then filtered by skimming each article, ensuring their relevance within the scope of this paper by reading through the abstract, introduction, methodology (focusing on the network used) and its data. As mentioned in the introduction, only articles that matched the following criteria were included in the final review list, which comprised 111 articles: (a) Motivated and/or applied in a mining context; (b) Using a network consisting of a minimum of two hidden layers; and, (c) Not including the mining context of blast-induced impacts, blast vibration and microseismic event classification.

Fig. 3 Mining-related publications distribution by year, noting that only the first 4 months of 2022 are considered

4 Research trends

A total of 111 articles were included for review in this survey. To examine the trends, Fig. 3 shows the distribution of articles according to their publication year. The number of deep learning articles in the mining context increased exponentially, reaching 37 articles in 2021. Given that the publications included extend only to the first quarter of 2022, the projected number of publications for 2022 would likely be more than double that of the previous year, assuming about the same amount of work is published each quarter. The implementation of ANNs in the mining context can be traced back to 1995 (Maxwell et al. 1995), where an ANN was used to predict the size of materials on a conveyor belt for mineral processing. The limited processing resources and data available then restricted the number of layers included in the neural network to one hidden layer. This one-layer implementation not only requires fewer resources but also reduces the risk of overfitting the machine learning model to the small amount of data used. However, it does not satisfy the deep learning definition in terms of the minimum number of layers required. An analysis of articles based on the publication venue is provided in the associated supplementary information.

Fig. 4 Distribution of published articles in respective mining processes

Figure 4 shows the distribution of published articles in each respective mining process. As shown in Fig. 4, the majority of articles were published covering the topic of minerals extraction, which accounts for about 72% of the total articles collated. The rest of the collated articles covered exploration and reclamation processes, with 31 published articles in total. The large number of articles focusing on extraction could be due to many reasons, although it is notable that extraction represents the primary capital investment in mining. In addition, the number of extraction sub-processes included in the review is twice the number of sub-processes of each of the other processes. Also, some of the sub-processes, for example, mineral exploration and land cover, can be generalised into the geoscience and remote sensing fields, respectively, where the context of application and motivation are not restricted to mining. However, only the articles that were applied specifically in the mining context were included in this review.

Figure 5 shows how the DL network types are distributed with respect to the three main mining processes. As shown in Fig. 5, the majority of DL implementations adopted a CNN approach, which accounted for 78 articles, more than 70% of the published articles reviewed. FCN, which is the basic implementation of an ANN, accounts for a total of 17 articles, about 15% of the total reviewed articles. RNN, which is a DL architecture suitable for time-series or sequential data, accounts for a total of 13 articles. Two articles implemented a 3D convolutional neural network (3D-CNN), which is a type of CNN capable of performing convolution on unorganised volumetric data such as point clouds. Finally, one article reported implementing a deep belief network (DBN), which is a specific type of DL network that employs a greedy learning method for optimisation rather than the backpropagation method used in the other network types included in this review.
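The shares quoted above follow directly from the article counts. The short Python sketch below reproduces the arithmetic using only the counts stated in the text; the variable names are illustrative.

```python
# Article counts per network type as reported in the text above.
counts = {"CNN": 78, "FCN": 17, "RNN": 13, "3D-CNN": 2, "DBN": 1}

total = sum(counts.values())  # should equal the 111 reviewed articles
shares = {name: 100.0 * n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]} articles, {share:.1f}% of {total}")
```

The counts sum to the 111 reviewed articles, with CNN at roughly 70.3% and FCN at roughly 15.3%, consistent with the percentages stated in the text.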

Fig. 5 Distribution of deep learning network used in different mining processes

The emphasis on CNN probably arises because it is a DL technique commonly applied for image-based tasks such as object identification and segmentation. This is the type of network that was made popular by the state-of-the-art performance achieved in the ILSVRC challenge, as mentioned in the introduction. Since then, most development has focused on CNN architectures compared to other types of DL networks. As a result, many simplified libraries and trained models were made easily accessible for a number of well-known CNN implementations such as U-Net (Ronneberger et al. 2015) and Mask R-CNN (He et al. 2017). The availability of large image datasets and the low cost of image-based sensors, in combination with the development of CNN as described, are the possible driving factors for its adoption in the mining context.

5 Deep learning in mining processes

This section provides a review of deep learning implementations in the mining context, categorised based on the three main processes: exploration, extraction and reclamation. This review aims to establish existing knowledge of how DL methods have been adapted for solving problems in mining, in order to identify the gaps in the research conducted to date.

Tables 1, 2, 3, 4, 5 and 6 in the following sub-sections show the publications reviewed for each category of mining processes with the general network type used and applications. These applications briefly answer the question: ‘What type of problem does the DL implementation focus on solving?’ and consider only the component in the problem space for which DL was implemented. For example, DL is used by Wang et al. (2020) (Table 6) to segment the mine site area given a single satellite image with the aim of automating monitoring of land usage changes within a given period. However, the proposed change monitoring component does not adopt a DL technique, hence the change monitoring method is not mentioned in the ‘Specific application’ column. The ‘Specific application’ column also highlights the dataset used in each study with regard to its source, contents, acquisition technique and size.

DL has been used to solve a variety of problems in domains that require different forms of outputs. In the ‘Specific application’ column of Tables 1, 2, 3, 4, 5 and 6, the terms ‘estimating’, ‘classifying’, ‘detecting’, and ‘semantic segmenting’ are used to describe the DL task when solving a particular problem. These tasks are defined as follows:

  1. Estimating: determining the value of a subject within a range of a continuous function. Estimating is comparable to regression. An example is to determine the cost of an operation given a set of conditions (Zhang et al. 2020; Guo et al. 2021).

  2. Classifying: to identify the group to which a data instance belongs. Groups can be obtained by discretising a continuous range of values. For example, a DL method that classifies a truck loading capacity according to whether it is empty, 25%, 50%, 75% or 100% full rather than estimating the loading weight (Sun et al. 2021).

  3. Detecting: to determine the location and region of a target, i.e., localising an object or agent within an environment. An example is to draw a circle around oversized rocks in an image (Loncomilla et al. 2022).

  4. Semantic segmentation: a specific task in image or point cloud perception to classify an individual pixel or point into its respective group. For example, to classify all the pixels in a satellite image that fall within a mine site rather than its surrounding areas (Wang et al. 2020).
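The relationship between estimating and classifying in the list above can be illustrated by discretising a continuous value into groups. The sketch below maps a continuous truck fill fraction to the nearest of the five load classes mentioned in the example. It is an illustration of the discretisation idea only, not the DL classifier of Sun et al. (2021); the function name and the nearest-level rule are assumptions.

```python
def classify_load(fill_fraction, levels=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Map a continuous fill fraction in [0, 1] to the nearest discrete class."""
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must lie in [0, 1]")
    # Pick the class level closest to the continuous estimate.
    return min(levels, key=lambda level: abs(level - fill_fraction))


# Hypothetical estimates: 62% full is closest to the 50% class,
# 90% full is closest to the 100% class.
half_class = classify_load(0.62)
full_class = classify_load(0.90)
```

A classifier trained on such groups predicts the class label directly, whereas an estimator would output the continuous fill fraction itself.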

A DL approach could perform multiple tasks such as detection, classification and semantic segmentation using a single model. An example in a hyperspectral image is to draw a bounding box (i.e. detection), classify the mineral group of the bounding box sub-image (i.e. classification), and then classify all the pixels in the bounding box belonging to the classified mineral type rather than the background (i.e. semantic segmentation) (Galdames et al. 2022). The text in the following sections also includes further discussion of selected implementations and their results as a guide for future studies.

5.1 Exploration

Exploration is the initial process in the mining value chain. During exploration, activities such as mapping, mineral analysis and prospecting are carried out to estimate the mineral location and reserve size. This information is then used for mine planning and cost estimation to identify a feasible operating approach and to gain investment to proceed with setting up the facilities to extract the economically feasible minerals.

Fig. 6 Distribution of published articles relating to mine exploration activities

Figure 6 shows the distribution of mining processes involved in the mine exploration phase and Table 1 outlines the specific DL application in each article. These processes fall into two categories: (a) Mineral exploration, the process of mapping the geology and predicting mineral types and amounts; and (b) Mine planning, the process of planning and estimating costs for mine operations. As mentioned earlier, mining-related exploration articles only accounted for about 13% of the overall articles reviewed. This could be due to the broader research fields these processes belong to, such as earth sciences and geology, wherein mining is an applied subfield.

Referring to the distribution in Fig. 6, the majority of articles reviewed involved mineral analysis, which accounts for 9 articles, about 65% of the total articles reviewed in mine exploration. These articles focused on identifying different types of rocks and minerals in images obtained from different types of imaging sensors, using a CNN as listed in Table 1. Li et al. (2022b) proposed a simplified and lightweight network based on YOLOv3 by experimentally removing unneeded layers and branches. The dataset used was augmented to increase the amount of training data, an approach also implemented by Liang et al. (2021) and Asiedu et al. (2020). A siamese adversarial-based network was proposed by Hao et al. (2022), which takes in the same microscopic image under different polarisations to classify the type of minerals and their origin in a single architecture based on ResNet (He et al. 2016). The performance of the networks proposed by Hao et al. (2022) and Filippo et al. (2021) was cross-validated by testing the trained networks with a dataset from a different context to the one used for training. Ran et al. (2019) proposed an image cropping method as the input layer and identified the rock type of the un-cropped image by voting over the output confidence scores of the crops taken from that image. Jin et al. (2022) proposed a network that uses U-Net (Ronneberger et al. 2015) as the backbone, incorporating inception blocks (Szegedy et al. 2015) and dense connection blocks (Huang et al. 2017) as layers. It achieved a pixel-wise accuracy of 93%, similar to its predecessors, i.e. U-Net, ResNet and SegNet (Badrinarayanan et al. 2017), but was capable of converging faster as well as maintaining its performance across the training epochs without any fluctuations during validation. Baraboshkin et al. (2020) compared the lithology classification performance of AlexNet, VGG (Simonyan and Zisserman 2015), GoogleNet (Szegedy et al. 2015) and ResNet, whereby all the networks achieved similar F1 scores ranging from 93% (ResNet) to 96% (VGG). This shows the maturity of image-based DL methods, whereby the network choice should be based on other aspects such as the amount of data needed and computational time rather than raw classification performance.
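The crop-and-vote idea attributed above to Ran et al. (2019) can be sketched as follows. The exact voting rule is not detailed in this review, so the sketch assumes that the confidence scores of all crops are summed per class and that the class with the largest total is assigned to the whole image. The function name and the example scores are hypothetical.

```python
def vote_over_crops(crop_scores):
    """Combine per-crop confidence scores into one label for the whole image.

    crop_scores: list of dicts mapping class label -> confidence, one per crop.
    Assumption: votes are the summed confidences per class.
    """
    if not crop_scores:
        raise ValueError("at least one crop is required")
    totals = {}
    for scores in crop_scores:
        for label, confidence in scores.items():
            totals[label] = totals.get(label, 0.0) + confidence
    # The label with the largest summed confidence wins the vote.
    return max(totals, key=totals.get)


# Hypothetical confidence scores for three crops of the same rock image.
crops = [
    {"granite": 0.7, "basalt": 0.3},
    {"granite": 0.4, "basalt": 0.6},
    {"granite": 0.8, "basalt": 0.2},
]
image_label = vote_over_crops(crops)
```

Aggregating over several crops means that a single misclassified crop, such as the second one in this example, does not necessarily change the label of the full image.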

The published article relating to geotechnical mapping utilised an RNN based on a long short-term memory (LSTM) network that takes into account well log neighbourhood layers for sedimentary facies classification (Santos et al. 2022). The proposed network outperformed traditional ML techniques such as XGBoost, random forest, naive Bayes and support vector machine (SVM) in terms of classification performance, because the classification of a geological layer depends on its neighbouring layers within the same borehole and on layers from neighbouring boreholes.

Table 1 Description of articles surveyed for mine exploration processes

Li et al. (2020b) reported the use of multiple types of geological information, discretised spatially on a map, as additional layers of an input image to classify ore presence in a particular area. The additional geological information provided the network with the necessary data for the classification task. Zhang et al. (2020) introduced an ant colony optimisation algorithm alongside an FCN for optimising the weights of the network. An experiment was also conducted comparing the performance when varying the network and optimiser configuration in terms of the number of FCN layers and the number of ants. The experiment suggests that the performance during training and testing increases as the number of layers increases. However, the performance peaked at 6 layers and started to decrease as more layers were added.

5.2 Extraction

After completion of the processes required during the planning phase, the mineral deposits are ready to be extracted. The extraction process is carried out using surface and/or underground mining methods, chosen according to the environmental conditions (such as the depth of mineral deposits in the ground) so as to maximise profits (NSW 2022; Bustillo 2018). Different mining methods require different processes, equipment, management and safety requirements. For example, gas monitoring is more of a concern in underground mines, given the closed environment, than in open-pit mines where gases can readily disperse into the open air. This section discusses DL implementation in these various mining extraction processes.

Fig. 7 Distribution of published articles relating to mine extraction activities

Table 2 Description of articles surveyed for equipment management in mine extraction processes

Figure 7 shows the distribution of DL papers for the various mining processes involved in the mine extraction phase. Tables 2, 3, 4 and 5 outline the specific DL methods applied in different processes of the mine extraction phase, i.e. equipment management, geotechnical management, ore preparation and mine safety, respectively. Equipment management in a mine extraction process involves management of the operation and maintenance of all machinery used at the mine site. Equipment operations include navigating through the mine site, extracting minerals, hauling extracted material, and loading and unloading material to/from stockpiles. Maintenance involves detecting and/or predicting failures and following these up with maintenance activities and routines.

Overall, the majority of the articles reviewed in this paper focus on topics in the mine extraction process where DL has been applied. Mine extraction-related articles account for 79 papers, about 71% of the total articles reviewed. The high number of published papers on mine extraction processes could be due to the specificity of mineral extraction fields to mining and the primacy of this activity for investment and revenue generation.

Table 3 Description of articles surveyed for geotechnical management in mine extraction processes

Overall, the majority of mining extraction articles implemented a CNN, which accounted for 53 papers or 67% of all the mine extraction articles included. As explained in the previous section, the wider adoption of CNN might be due to its maturity and readily available data. The implementation of FCN and RNN accounted for 11 and 12 articles, respectively. Additionally, in Table 3, one article implemented a DBN and two articles implemented a 3D-CNN, a DL network that is designed for 3D point clouds.

In the equipment management literature, Liu et al. (2021a) implemented early stopping for training a CNN model. Early stopping is a technique used during DL training that monitors the objective function, typically the loss function, and stops the training when no improvement is made for a certain number of epochs. The aim of DL is to create a generalised model that shows robust performance on datasets beyond those it was trained with. The early stopping method prevents the learning algorithm from overfitting the model to the training set after reaching stable objective function values. DL methods require a large amount of data to be able to produce a generalised model that is robust to data variations during testing. Having a small dataset could cause the model either to learn nothing or to overfit. In the case of overfitting, whilst the model would achieve high performance during training, it would very likely show poor performance when tested with different datasets, even in the same context.
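The early stopping rule described above can be sketched in a few lines of Python. This is a generic illustration, not the implementation of Liu et al. (2021a): the function name, the `patience` and `min_delta` parameters and the example loss values are assumptions. The function receives the monitored loss for each epoch and returns the epoch index at which training would stop.

```python
def early_stopping_epoch(monitored_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once the monitored loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    if not monitored_losses:
        raise ValueError("at least one epoch of loss values is required")
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(monitored_losses):
        if loss < best - min_delta:
            # Improvement: remember the new best value and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # The stopping rule never triggered, so all epochs were run.
    return len(monitored_losses) - 1


# Hypothetical loss curve that plateaus after the third epoch.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.59]
stop_at = early_stopping_epoch(losses, patience=3)
```

With a patience of 3, training halts at epoch index 5, before the small improvement at the final epoch is seen. A larger patience value trades longer training for a lower risk of stopping prematurely.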

Rocky-CenterNet was proposed for detecting rocks in an image using ellipses rather than boxes to bound the region of a detected object (Loncomilla et al. 2022). An ellipse provides a tighter bounding region around a rock than a rectangular box, which tends to include more background area around the detected rock. The tighter-fitting bounding region allowed for more accurate estimation of rock parameters such as the width-to-length ratio. The method achieved a mean average precision of 0.73 at intersection over union (IoU) thresholds of 0.5 and 0.75, which outperformed Mask R-CNN (mean average precision of 0.71).
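For readers unfamiliar with the IoU measure mentioned above, the following sketch computes it for two axis-aligned rectangular boxes. Rectangles are used here purely for simplicity; Rocky-CenterNet bounds rocks with ellipses, for which the overlap computation is more involved. The function name and the box format are assumptions.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth boxes.
iou_half = box_iou((0, 0, 2, 2), (0, 0, 2, 1))      # overlap 2, union 4
iou_disjoint = box_iou((0, 0, 1, 1), (2, 2, 3, 3))  # no overlap
```

A detection is usually counted as correct when its IoU with the ground truth meets a chosen threshold, such as the 0.5 and 0.75 values quoted above.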

Gomilanovic et al. (2022) proposed a method based on an LSTM network to estimate machinery failure (i.e. of a bucket-wheel excavator and belt conveyor) based on the history of failure data collected for the particular machinery used on a mine site. The predicted likelihood of machinery failure over a period of time was compared with that of an analytical method. The LSTM produced a mean RMSE across the different failure types (i.e. mechanical, electrical and others) of 0.008, versus 0.068 for the analytical solution.
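The root mean square error (RMSE) used in the comparison above can be computed as shown in the sketch below. The function name and the example values are illustrative assumptions and do not reproduce the data of Gomilanovic et al. (2022).

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted and observed failure likelihoods.
error = rmse([0.10, 0.20, 0.30], [0.12, 0.18, 0.33])
```

A lower RMSE indicates that the predicted values lie closer to the observed ones, which is the sense in which 0.008 is better than 0.068 in the comparison above.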

Other than comparing a DL method with an analytical method, Mansouri et al. (2019b) compared different CNN architectures for classifying junction types from images obtained from a UAV flown in an underground tunnel. The results showed that AlexNet performed the best, with an average accuracy of 89% compared to 74% and 63% for GoogleNet and Inceptionv3 (Szegedy et al. 2016), respectively. This should be common practice when implementing DL methods in different environments: the baseline performance of multiple networks should be identified before applying network modifications to better suit the target environment.

In geotechnical management, Lu et al. (2020) proposed a network which uses U-Net as the base network and incorporates feature extraction blocks made up of VGG and Inceptionv3 networks in the encoder to perform semantic segmentation of rock fractures. Owing to the small size of the dataset, transfer learning was used to train the network, which produced a pixel-wise f1 score of 93% compared to 78% for Inceptionv3-U-Net and 21% for U-Net.
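
For reference, a pixel-wise f1 score for a binary segmentation mask can be computed as in the sketch below. This is a generic definition assumed for illustration (masks as nested lists of 0/1 values, fracture pixels labelled 1); the surveyed articles may aggregate over classes or images differently.

```python
def pixelwise_f1(pred_mask, true_mask):
    """f1 score over pixels for two binary masks of identical shape."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for pred, true in zip(pred_row, true_row):
            if pred and true:
                tp += 1  # fracture pixel correctly predicted
            elif pred and not true:
                fp += 1  # background predicted as fracture
            elif true and not pred:
                fn += 1  # fracture pixel missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because true negatives do not enter the score, f1 is less flattered than plain accuracy when thin fractures occupy only a small share of the image.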

Point clouds of rock surfaces were discretised into smaller chunks by means of voxelisation for the task of joint and fracture semantic segmentation by Battulwar et al. (2020) and Azhari et al. (2021). A dense point cloud is required for semantic segmentation of small objects such as a fracture on a relatively large area coverage of surfaces. The voxelisation of a point cloud limits the number of points used per sample, which reduces the memory usage during training on limited computing resources rather than using the whole scan in a single data sample. Point clouds produced from a LiDAR scanner could exhibit uneven point density distribution due to the distance differences between objects and the scanner in a scene. A voxel-based down-sampling method was introduced by Azhari et al. (2021) to preserve low-density regions in a LiDAR-produced point cloud. This approach ensures low-density points representing cracks are preserved for the network to sufficiently distinguish the different geometrical features of a cracked surface.
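
A plain voxel-grid down-sampling step can be sketched as below. This is the generic technique, assumed here for illustration, and not the specific density-preserving method of Azhari et al. (2021); the function name and the choice of the centroid as the representative point are assumptions.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Group 3D points into cubic voxels and keep one centroid per voxel.

    `points` is an iterable of (x, y, z) tuples. The result maps each
    occupied voxel index (i, j, k) to the centroid of the points inside it.
    """
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        voxels[key].append((x, y, z))
    return {
        key: tuple(sum(axis) / len(members) for axis in zip(*members))
        for key, members in voxels.items()
    }
```

Each occupied voxel yields exactly one output point, so dense regions close to the scanner are thinned while sparsely sampled regions keep their points, which evens out the point density.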

In the ore preparation application, a number of articles proposed hybrid architectures that integrate parts of one network, designed to perform a specific task, into an existing network architecture. Xiao et al. (2020) proposed RDUNet, which combines the residual network of ResNet with DUNet (Jin et al. 2019); Li et al. (2022a) replaced CNN layers with deformable convolutional layers (Dai et al. 2017) in YOLOv3; and Chen et al. (2022) proposed Res-SSD, which integrates ResNet with SSD (Liu et al. 2016), to perform semantic segmentation for different purposes. The deformable YOLOv3 (Li et al. 2022a) achieved a mean average precision of 98% compared to 96% for YOLOv3 with normal convolutional layers, while Res-SSD scored a mean average precision of 84% compared to 71% and 74% for SSD and YOLOv3, respectively.

In the mining industry, data for a particular application might be hard to obtain for research purposes due to intellectual property considerations, poor data quality or the fact that the data have never been collected before. In the last case, a sensor rig might need to be established and the data collected over a long time period before there are sufficient data to train a DL model. To overcome this issue, images were taken in a lab of a model haulage truck with varying amounts of load capacity to train a DL model to learn the load capacity class (Sun et al. 2021). The lab images were then added to images taken at a mine site to form the training set. As for the test set, only mine site images were used to verify the model’s performance in real-world conditions. Since the research was targeted for adaptation and deployment at mine sites, it is important that only data collected from the mine site were used during testing. The environmental differences between a lab and a mine site might influence the images taken and hence the model trained using the dataset. Such differences could include variations in lighting conditions, presence of dust and sensor setup.

Table 4 Description of articles surveyed for ore preparation in mine extraction processes

Data augmentation is another method for increasing the amount of data available during training. For an image-based model, the input image could be augmented by rotating, flipping, scaling, cropping, skewing, noise addition or any combination of these (Si et al. 2020; Alzubaidi et al. 2022; Lu et al. 2020; Suprunenko 2020; Mustafa et al. 2020; Olivier et al. 2020; Liu et al. 2021c, e, g; Pan et al. 2022). Similar data augmentation could also be applied to a 3D point cloud. Additionally, point clouds could be discretised by using overlapping grids such as overlapping voxels (Azhari et al. 2021). As with images created in the lab, augmented data should be used only for training and should not be included when testing the trained model. When designing a data augmentation pipeline, a sanity check should be performed to ensure the augmented data still represent real data. This is especially true when applying skewing, which could distort an image such that the object in the image no longer represents how the object appears in reality.
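
A few of the geometric augmentations listed above can be sketched without any imaging library, treating an image as a nested list of pixel values. The function names are hypothetical and the sketch covers only flips and 90-degree rotations; scaling, cropping, skewing and noise addition would normally come from an image-processing library.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment_training_images(train_images):
    """Return the originals plus flipped and rotated copies.

    Intended for the training split only; test images are left untouched.
    """
    augmented = []
    for image in train_images:
        augmented.append(image)
        augmented.append(flip_horizontal(image))
        augmented.append(flip_vertical(image))
        augmented.append(rotate_90_clockwise(image))
    return augmented
```

Flips and right-angle rotations only rearrange pixels, so they cannot distort object shape in the way that skewing can.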

In DL, the weights and biases of the function that maps the input to the output are typically optimised using backpropagation. As mentioned earlier, a large amount of data is required to find the optimised weights and biases that generalise over the problem space. Transfer learning is a method where a model’s weights and biases are initialised using those from a model trained using a different dataset. In the mining extraction process, transfer learning was applied by Bewley and Upcroft (2017), Mansouri et al. (2019b), D’Angelo et al. (2019), Yi et al. (2022), Olivier et al. (2019), Liu et al. (2021d), Yang et al. (2021) and Wang et al. (2022b), who initialised their DL networks using the weights and biases of layers trained on the ImageNet dataset.

Table 5 Description of articles surveyed for mine safety in mine extraction processes

A question to ask in DL implementation is: how generalised and/or robust is the trained model? This question identifies whether the trained model is overfitting to the dataset used for training and testing and would likely show poor performance when applied to a new batch of data. Cross validation is a method in ML whereby a trained model is tested with different combinations and permutations of different datasets or dataset splits. In the k-fold validation method, k dataset pools are created by uniquely splitting the collected dataset into training and test sets k times, ensuring high variability in the data combinations in each set, as implemented by Baek and Choi (2020), Choi et al. (2021) and Erdogan Erten et al. (2021). The deep learning model is then trained k times using a different dataset pool in each training session. The average performance across the sessions is then taken as the final performance.
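
The k-fold procedure can be sketched as follows. This shows the standard form of k-fold cross validation, in which each sample appears in exactly one test fold; it is an assumption that the cited articles follow this exact scheme. The function names and the `train_and_score` callback (which would train a model on the training indices and return its score on the test indices) are hypothetical.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for test_fold_id in range(k):
        test_idx = folds[test_fold_id]
        train_idx = [
            index
            for fold_id, fold in enumerate(folds)
            if fold_id != test_fold_id
            for index in fold
        ]
        yield train_idx, test_idx


def cross_validate(n_samples, k, train_and_score, seed=0):
    """Average the score returned by `train_and_score` over the k folds."""
    scores = [
        train_and_score(train_idx, test_idx)
        for train_idx, test_idx in k_fold_splits(n_samples, k, seed)
    ]
    return sum(scores) / len(scores)
```

Fixing the shuffle seed keeps the folds reproducible, so that different networks can be compared on identical splits.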

In cases where a number of datasets are available, the DL model generalisation performance can be validated across the different datasets. These datasets are collected in different contexts to each other, with different factors such as different sensors, locations and weather. Rather than splitting a single dataset into training and test sets, the whole dataset is used for training and the trained model is tested on a different dataset. For example, in Azhari et al. (2021), extra cross-validation was performed by testing the trained model on different datasets captured from different rock formations. The CNN proposed by Mansouri et al. (2018) was trained using a dataset collected from a mine and tested using a dataset collected from a different mine, and vice versa. The trained model for heading estimation was tested by applying different datasets collected by varying the mine environment and location, and aerial vehicle parameters such as sensors, illumination and velocity. A sensitivity analysis allows determining the robustness of a trained model to changes in parameters of the system.

5.3 Reclamation

Reclamation is the process of minimising the negative effects of mining activity on the environment and restoring used land either to its former ecological functionality or for economically beneficial purposes. Land usage after reclamation could include agriculture, farming, development for residential or commercial purposes and wildlife habitation. The process of transforming mine lands happens after the mine’s end of life, when the mineral deposit has been depleted or is no longer profitable enough to be extracted.

Even though the land transformation happens after mine closure, the planning required and related work for reclamation starts during the mine planning phase and stretches through the extraction process. Related activities for mine reclamation include land cover mapping, change of land usage monitoring, and mine hazard investigation such as landslide, subsidence and pollution analysis.

Fig. 8 Distribution of published articles relating to mine reclamation activities

Figure 8 shows the distribution of mining processes involved in the mine reclamation phase and Table 6 outlines the specific DL application applied in each article. Reclamation processes are categorised into two: (a) Land cover, the process of mapping the change in mine land usage; and (b) Mine hazard, the process of monitoring and preventing mine hazards that could have an effect on the environment. Overall, DL-based mine reclamation articles account for about 16% of the total DL in mining articles reviewed in this survey. Out of all the reclamation articles, the majority, i.e. 15 articles, implemented a CNN which takes in images either from a camera or derived from other sensors such as LiDAR and satellite images. The rest, i.e. 3 articles, implemented an FCN to perform classification or estimation using 1D mine data.

Referring to the DL articles in mine reclamation distribution in Fig. 8, 4 articles were published relating to change detection which accounted for about 22%. Even though the research is focused on detecting changes in land usage, only one article uses a DL technique to segment out the differences in land change over time (Tang et al. 2021). This is achieved with a Siamese CNN which takes in two images as input i.e. images of the same area from different time frames and outputs the differences in land use between the two time frames. Meanwhile, the rest of the change detection articles used DL as a tool to segment different types of land use from an image of the same area taken over a period of time. A non-DL-based technique is then applied to the segmentation output comparing the area differences over the period. Wang et al. (2020) implemented a Mask R-CNN which is a DL network that outputs bounding boxes surrounding the region around the object of interest and segments the pixels within the bounding box representing the object. This method provides better classification performance compared to a network that is purposely designed for semantic segmentation such as U-Net. Mask R-CNN restricts the region in the image for semantic segmentation to regions of high confidence where the objects of interest are localised.

Table 6 Description of articles surveyed in mine reclamation processes

The majority of mine reclamation articles reviewed are related to land cover mapping, which accounts for 10 articles, about 56% of all reclamation articles reviewed. Malik et al. (2021) used a multi-input approach by combining image and surface models derived from UAV imagery. For a DL network that takes in multiple inputs rasterised as image layers, Chen et al. (2020) proposed a feature input reduction method to filter and keep the input data that contribute to the performance of the network. Yan et al. (2021) applied transfer learning during model training using weights and biases from the same model trained using a different image dataset, ImageNet. Other than transfer learning, data augmentation can also be used to increase the amount of data for training. For example, Xie et al. (2021) flipped and rotated the training images in different ways to increase the number of training images and randomly changed the luminance and colour space of these images to simulate images taken in different seasons and lighting conditions. These generated images were then used to train a hybrid DL architecture based on U-Net and SegNet, which during testing achieved a pixel-wise f1 score of 67% compared to 63% and 65% for U-Net and SegNet, respectively.

Ji and Luo (2021) proposed an ensemble learning methodology based on CNN to perform semantic segmentation of different land types in a multispectral image. The proposed method was then compared to other DL and machine learning methods: the proposed ensemble method achieved a pixel-wise accuracy of 94% compared to 87%, 83%, 76% and 72% for the base CNN, ANN, extreme learning machine (ELM) and SVM, respectively.

The rest of the published mine reclamation articles relate to mine hazards, accounting for 4 articles, about 22% of the reclamation articles reviewed. Among these articles, Luo et al. (2019) performed feature filtering for the input data and a k-fold validation process.

6 Discussion

The implementation of DL methods in the mining research context has grown exponentially since 2017, as shown in Fig. 3. This adoption is most likely driven by the maturity of DL algorithms, which have rapidly advanced since the outstanding achievement of AlexNet (Krizhevsky et al. 2012) in 2012 and the success of DL implementations in other industries. Additionally, the availability of large datasets in combination with capable computing resources enables rapid implementations of DL in the mining research context, enabling the kinds of implementations and trials documented in this paper as a basis for better understanding the potential economic impacts of DL applied in mining. The aim of the study was to provide a generalised and compact comprehensive review of DL implementation in the mining industry. The review outlines the general type of network used in each study including the tasks that the respective DL methods were aimed to solve. The specifics of the network designs and datasets used have not been detailed since they have been applied to different cases that might not work for or generalise to other cases.

The review has shown that, firstly, DL adoption in the mining context shows an exponential increase and, secondly, there is a wide range of different situations across the exploration, extraction and reclamation processes for which DL methods were implemented, and that the purposes of applying deep learning are estimation, classification and, to a lesser extent, semantic segmentation. It is important to discuss some of the learnings from the review in terms of the range of methods applied, the availability of data and the challenges that the mining environment created for the implementations.

6.1 Range of deep learning methods applied

The majority (70%) of the articles included in this review implemented a CNN method. This could be due to the maturity of this type of network following its rapid improvement post-AlexNet. The affordability of high-quality image-based sensors and their ease of deployment drive the availability of large mining datasets for developing DL models. CNNs are so widely used in the included articles that other forms of data, such as point clouds, geological data and hyperspectral spectra, are rasterised into images to be used as the CNN input. The conversion of data types from these sensing technologies could cause the loss of rich information captured from the complexity of a mine environment.

While CNNs have been proven to work effectively on images, it is not necessarily confirmed that they would be as effective on other data types converted to images. The conversion itself could cause loss of valuable information. As an example, an FCN could be used for image classification tasks by flattening 2D image pixel matrices into 1D vectors. Such an FCN implementation could produce good results, but a CNN has been more effective for image classification. Similarly, in point cloud processing, point clouds were converted or projected into images and other forms of projection prior to the proposal of PointNet (Charles et al. 2017). This shows that selecting the correct DL approach for the data type is vital. Hence, it might be worth developing new architectures for different data types rather than just relying on data conversion to fit the input requirements of a particular network type.

Apart from CNN, the other types of network adopted in the mining literature are FCN, RNN, DBN and 3D-CNN. A wider range of techniques, such as GANs, graph neural networks (GNNs) and transformers, should be considered for performing tasks such as image classification and detection, air particle estimation, point cloud segmentation, and dust and pollution estimation, which can be important for quantifying the influence of mines on their environment.

6.2 Data and implementation access

A DL model is a generalised function for a specific task, obtained by training the DL algorithm on input data to find the optimal weights and biases that map the input data to the labelled output. Hence, having the right data for the task is crucial not only for DL but for any AI application. A common problem in DL mining applications is limited and poor-quality data, which arises from the slow adoption of sensor data. One consequence is overfitting, whereby the model fits the training data well but performs significantly worse during evaluation; another is limited applicability to other sites with different geology, design and equipment.

In comparison with the many thousands or millions of data points in image and financial analysis, the number of data points available for the mining DL studies shown in Tables 1, 2, 3, 4, 5 and 6 is limited, ranging from as few as 305 images with 8 features up to what could be seen as large in the mining context, namely 2 mills with 15,905 data points taken only at half-hour intervals. Other studies show researchers using Kaggle concrete datasets to simulate rock (Yi et al. 2022). This suggests an opportunity for improved DL-focused data collection in mining and for researchers to work together to combine datasets for more generic results.

In terms of data availability, the majority of the datasets used in the included articles were collected at particular mine sites and were not publicly accessible. A dataset collected from a particular mine site might not be generalisable to the same application in a different mine site, or even at the same mine site over time. This could be due to differences in sensors, sensor setup and environmental conditions, and also changes in the nature and content of excavated material as mining progresses at a site. A number of articles used publicly available datasets obtained from government-led projects, organisations and general internet searches. These articles mostly focused on applications that use mapping data. The topological maps of mining areas in West Virginia, USA obtained from WVU (2022), USGS (2022a) and USGS (2022b) were used by Maxwell et al. (2020a, 2020b) to classify land usage in mine sites. Similarly, data from Google Earth (Google Earth 2022) and images obtained from satellites such as Sentinel (ESA 2022), Landsat (NASA 2022) and Gaofen (CRESDA 2022) were used to obtain RGB and hyperspectral images of mining areas, either where the respective satellite covers a full site (Balaniuk et al. 2020; Tang et al. 2021; Kumar and Gorai 2022; Xie et al. 2021; Meng et al. 2021; Chen et al. 2020) or by combining images from different sources taken of the same area (Malik et al. 2021; Luo et al. 2019; Wang et al. 2020). Even though these satellite datasets are publicly available, the proposed methods were not cross-validated with images of different mine sites. Similarly, well logs from Rio Bonito, Brazil collected by the SGB (2022) were used to classify different types of facies by Santos et al. (2022) and should be tested against data from other countries. Images of different rock types collected from general search engines were used to train a rock type classifier (Asiedu et al. 2020); however, the collected images were not detailed or published for replication.

Despite the limited publicly available data, organisations such as Humyn.ai (Humyn 2022) have been organising data science challenges to solve proprietary mining and resources-related problems proposed by mining companies. However, the outcomes of these challenges remain confidential and cover a wide range of disciplines, including data mining (analysis of large datasets to extract patterns) and giving talks as well as machine learning.

Apart from publicly accessible datasets, open access to the code associated with published articles provides technical implementation detail that the high-level ideas and methodology presented in a research article might not sufficiently convey. Appending source code to a published article should be encouraged as part of submission requirements, as currently practised in Computers and Geosciences (Jin et al. 2022; Hao et al. 2022; Liu et al. 2021d; Xu et al. 2021).

6.3 Challenges for deep learning in the mining industries

A general question to ask is whether a deep learning method developed in a research or prototyping context is suitable to be deployed in the industrial context. Two factors for consideration include the model’s suitability in the deployment environment and the feasibility of adopting such technology, e.g., as assessed through a cost-benefit analysis. Deep learning is a data-driven method where the model produced is optimised for the particular dataset used during model development. A model should perform comparatively well in the operational environment if the model was well fitted to the training dataset used to develop it, i.e., in terms of noise and systematic variations. However, including the same noise and variance as expected in the output can be difficult in an environment where the geology and operational processes vary continuously in space and time. This is a general challenge for a priori learning of models that are intended to be applied in dynamic environments. It will require further evaluation of the contextual similarities between the experiment and the deployment in the mining environment, and would benefit from a sensitivity analysis and more testing in different, real environments than is often indicated in the papers. Potential methods for addressing this challenge include the development and application of methods for characterising operational dynamism, the integration of prediction in learning methods and their resulting models, and the use of adaptive models. This paper does not go into these topics since they go beyond the current focus on reviewing the literature on DL in mining.

This review highlights the reported comparisons between the proposed DL method and other methods, such as ML methods and DL methods of other architectures, where available in the collated articles. This should be common practice when proposing a DL framework, especially if the dataset and/or the implementation code is not publicly accessible, in order to provide a fair point of reference against which to compare the proposed method’s performance. Although the DL approaches were shown to be beneficial in comparison, the differences between these task-specific networks, the performance uncertainties in very different mining environments and methods, and the challenging data collection conditions make it difficult to predict the performance of a specific DL implementation in other mining conditions.

6.4 Future directions

The next step after DL implementations in the mining research space should be to transfer the technology for practical adoption. This can be done by making DL an element of an automation process/framework whereby sensor data are interpreted online rather than relying on data collection for offline testing. The automation framework could include sensors beyond a stand-alone camera, such as LiDAR, radar, hyperspectral, environmental/weather sensors and encoders. The automation process should include fusion of sensor interpretations for thorough situational modelling rather than relying on one sensor and data type for a single task.

One reason why DL is not yet popular in terms of adoption in the mining context could be due to how DL is treated by practitioners as a black box without much understanding of the uncertainties in the model, especially in terms of what computations the model carries out when making a decision. A DL method that incorporates symbolic representation (Yi et al. 2018) enabling a human-level understanding of the network’s inference logic has been proposed in an effort for explainable DL. Understanding the network’s explainability i.e. the rationale of a model’s output, would strengthen trust in DL adoption in the mining and other industries. Trust might also be improved by better mathematical modelling of the bounds and degradation characteristics of a DL model and in relation to the dynamism of its operational context.

In terms of uncertainties, a sensitivity analysis should be conducted before deploying DL into a complex environment such as a mining operation. Mining-specific datasets with adversarial attacks applied, similar to Barbu et al. (2019), should be established to develop DL models that are robust to such disturbances (Ren et al. 2020). A model capable of handling adversarial attacks would be more robust when deployed in the real world. The dataset collected in developing a deep learning model could be just a subset of the full environmental context. Incorporating an adversarial network could handle the uncertainties from the out-of-distribution context to a certain extent.

7 Conclusion

This work presents a compact comprehensive review of DL implementations in mining processes. The review has considered which DL methods are implemented in mining research to try to automate the solution of various tasks in exploration, extraction and reclamation-related processes. The adoption rate showed a sharp increase over the last 5 years, even with the slow initial start after DL became mainstream within research communities. DL can be useful in mining industry applications where subjective opinions are used to make decisions, including cases where available models are inaccurate or unsuitable given a large amount of available data and long analysis times. Owing to mine-specific trends resulting from the extraction sequence overlaid on the geological formation, care is required when applying models, and when celebrating successes, outside of their training domain. Further investigations are also required to understand the relationships between the development and deployment environments as well as to understand how a model works, i.e. the rationale of the model’s output.

The similarity of DL frameworks implemented for mining tasks led to a focus on the processes that facilitate learning such as data pre-processing, training and validation methods. These complementary processes enable learning of the model specific to the task and enable the understanding of the trained uncertainties and optimum operational conditions.

Compared to processes such as exploration and reclamation, the extraction process accounts for 71% of articles; extraction is specific to the mining application field, whereas the other applications share knowledge domains such as geoscience and geotechnical engineering.

Most articles adopted a CNN designed for 2D image-based processing taking in image data from vision-based sensors such as cameras, microscopes and satellite images. Different types of data such as point clouds, geological data and signal data were converted into image pixels to enable their use as a CNN input. The implementation of other types of networks such as GAN and GNN should be beneficial to mining research, since these networks have been proven to show better performance in similar applications in different industries.

With the increasing number of deep learning implementations in mining research, deep learning methods have the potential for wider adoption in practice on-site. Importantly, the transition of the mining industry to remote-controlled or automated equipment to enhance the safety of the mining environment provides an important opportunity to integrate deep learning within the automation pipeline, which includes the interaction and fusion with sensors and IoT devices. A better understanding of trained model uncertainties and model inference could strengthen the mining industry’s trust in adopting deep learning methods in the real world.