1 Introduction

In underground coal mining, identifying the interface between coal seams and rock strata is paramount, because it lays the foundation for accurately controlling when and where to lift the rocker arm of the shearer. Nevertheless, a technical problem remains: how to identify rock strata and coal seams accurately and quickly.

In recent years, many scholars at home and abroad have carried out numerous experimental and theoretical studies, and dozens of approaches to coal–rock identification have been put forward. The Pittsburgh Research Center of the United States Bureau of Mines was the first to propose a coal–rock identification technology based on infrared detection. In this method, the shearer cuts rock and coal, which reach different temperatures; a highly sensitive infrared temperature sensor then measures the temperature of the shearer's pick to determine whether it is cutting a coal seam or a rock layer (Markham et al. 1990; Li et al. 2020). Wang et al. (2021) proposed an identification method for coal–rock interfaces based on the fusion of multi-sensor information. This method considers the vibration signals, current signals, acoustic emission signals, and infrared flash temperature signals generated while cutting coal and rock mixed in various proportions with picks of different wear degrees, and it establishes a sample library of multi-cutting signal characteristics for picks of different wear degrees. Based on the "and" decision criterion of Dempster–Shafer (D–S) theory, accurate identification of coal–rock interfaces is achieved (Wang et al. 2021). Nevertheless, this method relies on the relative movement between the pick and the rock layer or coal seam. When there is little difference between the mechanical properties of the coal seam and the rock layer, its accuracy becomes extremely low, making it hardly effective in practice.

Using a probabilistic neural network and the fruit fly optimization algorithm, Si et al. (2016) put forward a diagnosis method based on the vibration of the rocker arm transmission part. However, on-line real-time processing of high-frequency signals imposes high requirements on the equipment, which can hardly be satisfied by the shearer's onboard computer. In addition, the vibration signal fluctuates significantly with changes in the shearer's position and attitude. Zhang et al. (2013) put forward a method to identify coal–rock interfaces based on principal component analysis and a back propagation (BP) neural network. In this method, the time domain signal of the shearer drum torque is first extracted and then compressed via principal component analysis; the resulting signal is input to the BP neural network, which carries out coal–rock identification. Guo proposed a novel nonlinear feature selection method within the framework of a support vector machine to find a better solution for multi-class classification; the effectiveness and superiority of this feature selection method were demonstrated experimentally on multiple data sets (Guo et al. 2021). Taking advantage of two sorts of in situ coal and rock data, Yang et al. (2019) established a support vector coal classification model, which showed significant improvements in identification accuracy.

As deep learning has gained momentum, many researchers have applied image identification technology to coal–rock recognition. The first step is to obtain sectional images of coal seams and rock strata; image enhancement and denoising technologies are then applied to extract features, and finally coal–rock recognition is performed (Huang and Liu 2015; Emily and Zhongyi 2012; Wang et al. 2003; Guo et al. 2019; Zhang et al. 2021). Zhang et al. (2020) integrated the regression-based deep learning target detection algorithm YOLOv2 with a linear imaging model and successfully implemented intelligent identification and positioning of coal and rock images collected underground; the identification accuracy of YOLOv2 for coal and rock reached 78% (Zhang et al. 2020). Xing et al. (2021) used lidar to obtain 3D point cloud information of coal–rock samples, including distance and echo intensity, then used the gray-level co-occurrence matrix for feature extraction and finally DenseNet-40 for fast coal–rock identification. The above image-based coal–rock identification methods are applied in coal processing plants and are not applicable to the differentiation of coal–rock interfaces during mining.

Near infrared (NIR) reflectance spectroscopy has the advantages of high efficiency, high accuracy, and no damage to the sample. Its basic principle is that objects composed of different phases have different characteristic absorption bands in the NIR spectral region, and it has been experimentally shown that changes in environmental factors during detection only cause an overall drift of the spectral curve and do not change the location and depth of the characteristic absorption bands. NIR spectra are obtained by scanning samples with an NIR spectrometer (Zhen and Boshen 2021; Roy et al. 2021). This technology has long been widely applied in remote sensing and has matured; it has also been adopted for quantitative component analysis and detection of coal, minerals, soil, and other materials (Fang et al. 2018; Goetz et al. 2009; Kaihara et al. 2002; Sgavetti et al. 2006). Farrand et al. (2021) used integrated methods, including the expert system/spectral feature fitting material identification and classification algorithm (MICA) and low-abundance substance detection, to match absorption characteristics in image spectra with those in a user-defined spectral library; the purpose of the experiment was to trace acid-producing minerals and diffused metals at a field site in northwest India. Qi et al. (2021) adopted Savitzky–Golay filtering and continuum removal to smooth soil spectral data and, by integrating the spectral angle cosine and spectral correlation coefficient algorithms, realized the classification of soil spectra. Scafutto et al. (2021) used airborne hyperspectral thermal infrared data to detect petroleum hydrocarbons in continental areas. Zhao et al. (2012) applied principal component analysis and a self-organizing mapping neural network with Fuzzy C-means clustering (SOM-FCM) to filter out anomalies and reduce the dimension of coal NIR spectral data; a coal ash prediction model built on a GA-BP (genetic algorithm–back propagation) neural network effectively increased the learning accuracy of the model. For NIR spectral analysis of coal ash based on machine learning, researchers have proposed many learning algorithms to improve the quality of spectral data modeling and to estimate model performance (Meng 2013; Cloutis et al. 2018; Milton et al. 2009; Liang et al. 2016; Zou et al. 2020a, 2020b). As stricter demands are imposed on intelligent tunneling technology and big data technology advances rapidly, the number of samples has increased explosively, creating much higher demand for hyperspectral identification of coal and rock.

In this paper, we propose a convolutional neural network (CNN) coal–rock recognition method based on coal–rock hyperspectral data and design an optimal network structure, which effectively improves the accuracy of coal–rock recognition.

The remainder of this paper is organized as follows. Section 2 describes the coal spectral acquisition experiment and the pre-treatment of the experimental data. Section 3 briefly introduces the CNN, puts forward a one-dimensional (1D) CNN model, explains the selection of the model parameters, and compares the experimental results of this method with those of other methods on the coal hyperspectral data set, demonstrating the effectiveness of the method. Section 4 presents the conclusions of this research.

A total of 120 samples of carbonaceous shale and bituminous coal were collected from the same fully mechanized mining face, and their reflectance spectra in the NIR band (1000–2500 nm) were acquired in the laboratory with an NIR spectrometer at a distance of 1.5 m from the flat surface of each block sample. Based on the acquired reflectance spectra, a 1D-CNN coal–rock identification model was then established.

2 Acquisition of reflectance spectrum data of coal and rock samples

2.1 Spectral acquisition experiments

A total of 120 large block samples of carbonaceous shale and bituminous coal were collected at the junction of the roof and the coal seam of a fully mechanized mining face in the same coal mine and stored in sealed bags; they comprised 96 coal samples and 24 rock samples, all of which were black and similar in appearance. The reflectance spectrum of a flat surface of each sample was acquired in the laboratory. To reduce the influence of the bidirectional reflection characteristics of the measured coal and rock materials, a 100 W tungsten halogen spotlight was used to illuminate the center of the selected flat surface at an incident angle of 90°, producing a circular light spot with a diameter of approximately 10 cm and an illuminance of approximately 20,000 lx. A NeoSpectra NIR spectrometer (Netherlands) acquired the spectra over a wavelength range of 1000–2500 nm with a spectral resolution of 8 nm.

First, a polytetrafluoroethylene (PTFE) white board was used to calibrate the reflection reference. A coupling lens was used for beam collimation; it was connected to the head of a quartz optical fiber, with a laser designator attached to the side, and the other end of the fiber was connected to the spectrometer for spectral acquisition. A fixed collimator was used to align with the detection target. The collimator axis was adjusted to aim vertically at the spot center on the sample surface, and the distance between the collimator and the spot center was kept at l = 1.5 m. As shown in the experimental diagram in Fig. 1, the spectrum acquired by the spectrometer was the average reflectance spectrum of the circular area on the sample surface at the bottom of the viewing cone formed by the field-of-view angle of the orthophotoscope. The spectrometer was connected to a computer via USB 3.0 to acquire the reflectance curves, which were then pre-treated. The reflectance spectra are shown in Fig. 2.

Fig. 1 Experimental platform for acquisition of coal and rock reflectance spectra

Fig. 2 Example reflectance spectra acquired in the experiment

2.2 Sample reflectance spectral data pre-processing

2.2.1 Principle of hyperspectral coal and rock identification

As coal rank increases, the degree of aromatic condensation increases while bridge bonds, side chains, and functional groups decrease, which indicates a reduction in ash content and volatile matter. The increasingly ordered internal arrangement of molecules and the changing degree of parallel orientation among molecules inevitably lead to anisotropy. Many coal properties undergo prominent transformations across middle metamorphic bituminous coal, fat coal, and coking coal, signaling qualitative changes arising from quantitative changes in their structure.

From the anthracite stage onward, the molecular arrangement gradually shifts toward a graphite structure with highly condensed aromatic rings. As coal rank decreases, the regularity of the coal molecular structural units and models decreases, the number of condensed rings in the structural units decreases, and the number of fatty side chains and oxygen-containing functional groups increases. The fundamental absorption frequencies of fatty side chains, oxygen-containing functional groups, and aromatic structures lie in the mid-infrared band, while their combination and overtone (frequency-doubled) absorptions fall in the NIR band; an increase in fatty side chains and oxygen-containing functional groups therefore strengthens these combination and overtone absorptions. As a result, low-rank coal exhibits more absorption valley features in the NIR band.

It is worth mentioning that the volatile matter of coal is determined by the fatty side chains and oxygen-containing functional groups, whose increase contributes to the higher volatile matter of low-rank coal. Hence, as coal rank decreases, fatty side chains and oxygen-containing functional groups are, on the one hand, the major factors behind the increase in absorption valley features in the NIR band and, on the other hand, the major cause of the rise in volatile yield. Thus, based on the evidence given above, it can be concluded that a relationship exists between the volatile yield of coal types and the absorption valley characteristic parameters in the NIR band.

Based on the principles of organic chemistry and infrared spectroscopy, the functional groups corresponding to the respective absorption peaks of coal are listed in Table 1.

Table 1 Coal–rock hyperspectral absorption band

The major spectral characteristics of coal and gangue are presented in Fig. 2. The spectral curve of gangue showed prominent fluctuations over the whole interval, along with prominent absorption valleys that can largely be attributed to the hydroxyl groups in gangue. The reflectance curve of bituminous coal remained relatively low across the whole band, with only slight changes; beyond 1200 nm, its reflectance changed slowly in the same way as that of gangue. In the 1850–2050 nm and 2150–2350 nm bands, the absorption valleys can be attributed to hydroxyl groups. The major differences between the two sample types were as follows: (1) At 900–2500 nm, the reflectance of gangue was higher than that of coal. (2) Large swings occurred at 1900–2150 nm owing to the rapid increase in the spectral reflectance of gangue, whereas the change for coal was smaller in this band. (3) In the long-wave NIR band (2300–2500 nm), the reflectance of gangue decreased, whereas that of most coal samples increased or remained essentially unchanged.

In light of the above analysis, as coal rank increases, the degree of aromatization of the coal molecular structure also increases, the molecular structure model tends toward graphitization, and the C/H ratio rises, which means the fixed carbon content of coal increases. In addition, the slope of the reflectance spectrum of coal in the long-wavelength part of the NIR band decreases. Hence, the higher the coal rank, the smaller the spectral slope, the higher the fixed carbon content, and the lower the spectral reflectance. In the visible/short-wave NIR band, it was not easy to determine universal characteristics of the absorption valleys that distinguish coal from rock. Compared with coal, most gangue samples showed prominent absorption valleys in the long-wave NIR band, concentrated mainly in four bands near 1400 nm, 1900 nm, 2200 nm, and 2350 nm. In terms of overall waveform, the coal measure rocks were much more wavy, with an upward convex waveform, whereas the coal was much more stable, showing little change and a downward concave waveform. The absorption valley characteristics of the reflectance spectra were therefore used to distinguish coal from rock, with the two absorption valleys at 2200 nm and 2350 nm taken as the major differentiating features and the former given priority.
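As a rough illustration of how such absorption valley features could be extracted programmatically, the following sketch (our own illustration, not the authors' code; the wavelength grid and the spectrum are synthetic) locates local reflectance minima with SciPy and could be pointed at the 2200 nm and 2350 nm regions:

```python
# Illustrative sketch: locating NIR absorption valleys in a reflectance spectrum.
import numpy as np
from scipy.signal import find_peaks

def absorption_valleys(wavelengths, reflectance, prominence=0.01):
    """Return the wavelengths of local reflectance minima (absorption valleys)."""
    # A valley in reflectance is a peak in the negated signal.
    idx, _ = find_peaks(-reflectance, prominence=prominence)
    return wavelengths[idx]

# Hypothetical example: 1000-2500 nm sampled every 8 nm (the spectrometer resolution).
wl = np.arange(1000, 2501, 8, dtype=float)
refl = 0.3 + 0.02 * np.sin(wl / 150.0)                 # placeholder spectrum
refl -= 0.05 * np.exp(-((wl - 2200.0) / 30.0) ** 2)    # synthetic valley near 2200 nm
print(absorption_valleys(wl, refl))
```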

2.2.2 Data pre-processing

During spectral data collection, the external light conditions in the experimental environment were constantly changing, and because calibration with the white reference board could not be performed frequently, a baseline drift problem arose. Pre-processing was therefore necessary so that the spectral data of all samples were placed on a uniform baseline. First-order and second-order differentiation examine the rate of change of the spectra by means of derivatives, which weakens, to a certain extent, the influence of changes in external conditions on the experimental results.

Because the samples used in the spectral acquisition experiments were block samples, their surfaces could hardly achieve the ideal uniformity of powder samples; surface inhomogeneity causes scattering when light passes through or reflects from the samples, which introduces errors into the sample spectra. The standard normal variate (SNV) transform subtracts the spectral mean from the original data and divides by the sample standard deviation to obtain normalized sample data. Convolutional smoothing denoising is used to eliminate random noise from device operation and improve the signal-to-noise ratio. The pre-processed spectra are shown in Fig. 3.

Fig. 3 Hyperspectral data pre-processing: a first-order differential; b second-order differential; c SNV transformation; d polynomial smoothing pre-treatment

2.2.2.1 Differential processing

Taking derivatives of the spectral data can eliminate the influence of background drift: the first derivative removes a constant shift of the background, and the second derivative removes a linear shift of the background. In the computer, each spectrum is stored as a two-dimensional array, with one dimension storing the horizontal axis information of the spectrum (i.e., wavelength) and the other storing the absorbance or spectral response value. For n samples, each with p spectral points over the same interval, the spectral information can be combined and stored in an n × p data matrix. The derivative of the spectrum can be obtained by applying the multipoint numerical differentiation formulas:

$$\frac{{\text{d}y}}{{\text{d}\lambda }} = \frac{{ - 2y_{{i - 2}} - y_{{i - 1}} + y_{{i + 1}} + 2y_{{i + 2}} }}{{10\lambda }}$$
(1)
$$\frac{{\text{d}^{2} y}}{{\text{d}\lambda ^{2} }} = \frac{{2y_{{i - 2}} - y_{{i - 1}} - 2y_{i} - y_{{i + 1}} + 2y_{{i + 2}} }}{{7\lambda ^{2} }}$$
(2)

Equations (1) and (2) can be obtained from the five-point quadratic smoothing formula, where y refers to the reflectance and \(\uplambda\) refers to the wavelength interval.
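As an illustration of Eqs. (1) and (2), the following sketch (our own, with assumed variable names, not the authors' code) applies the five-point derivative formulas to a reflectance vector sampled at a uniform wavelength interval:

```python
# Sketch of Eqs. (1) and (2): five-point first and second spectral derivatives.
import numpy as np

def first_derivative(y, dlam):
    """Five-point first derivative, Eq. (1); the two end points on each side are left as zero."""
    d = np.zeros_like(y)
    d[2:-2] = (-2 * y[:-4] - y[1:-3] + y[3:-1] + 2 * y[4:]) / (10 * dlam)
    return d

def second_derivative(y, dlam):
    """Five-point second derivative, Eq. (2)."""
    d = np.zeros_like(y)
    d[2:-2] = (2 * y[:-4] - y[1:-3] - 2 * y[2:-2] - y[3:-1] + 2 * y[4:]) / (7 * dlam ** 2)
    return d
```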

2.2.2.2 Standard normal variate

Whether samples are solid or liquid, ideal uniformity can hardly be achieved. The inhomogeneity of a sample causes scattering as light passes through or reflects from it, and this scattering inevitably introduces errors into the sample spectrum. The standard normal variate (SNV) transform is used to correct spectral errors caused by scattering. This method assumes that the absorbance values at the wavelength points of each spectrum follow a certain distribution (such as a normal distribution), and each spectrum is processed so that it is as close as possible to an "ideal" spectrum (i.e., a spectrum without scattering errors). SNV subtracts the average absorbance of the original spectrum from each spectral point and divides the result by the standard deviation, S, of that spectrum. In essence, SNV standardizes the original spectral data:

$$Z_{{ij}} = \frac{{x_{{ij}} - \overline{{x_{i} }} }}{{S_{i} }}$$
(3)
$$\bar{x}_{i} = \frac{1}{p}\sum\limits_{{j = 1}}^{p} {x_{{ij}} }$$
(4)
$$S_{i} = \sqrt {\frac{1}{{p - 1}}\sum\limits_{{j = 1}}^{p} {\left( {x_{{ij}} - \bar{x}_{i} } \right)^{2} } }$$
(5)

where i indexes the n samples, j indexes the spectral points, and p is the number of spectral points in each spectrum.
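A minimal sketch of the SNV correction in Eqs. (3)–(5), applied row-wise to an n × p matrix of spectra (our own illustration, not the authors' code), is given below:

```python
# Sketch of Eqs. (3)-(5): standard normal variate (SNV) correction,
# one spectrum per row of an n x p matrix.
import numpy as np

def snv(spectra):
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)         # x_bar_i, Eq. (4)
    std = spectra.std(axis=1, ddof=1, keepdims=True)   # S_i, Eq. (5), sample std with (p - 1)
    return (spectra - mean) / std                      # Z_ij, Eq. (3)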

2.2.2.3 Polynomial smoothing filtering

To effectively eliminate spectral noise caused by instrument performance, the environment, and other factors, a polynomial smoothing filtering algorithm was adopted to filter and denoise the spectral data collected in the experiment. The formula is:

$$Y_{{(2m + 1)}} = X_{{(2m + 1) \times k}} \cdot A_{k} + E_{{(2m + 1)}}$$
(6)

In this formula, the reflectance spectrum data points are taken within an interval of r wavelength points around the central wavelength λ; m determines the number of sampled points in the window (2m + 1), A is the smoothing matrix, calculated from the power-function polynomial basis matrix over the interval around the central wavelength point, E is the residual, and Y is the spectral vector after fitting and smoothing of the spectral data. Figure 3 shows the spectral curves of the original coal reflectance after first-order differentiation, second-order differentiation, SNV transformation, and polynomial smoothing pre-treatment, in that order.
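Polynomial smoothing of this kind is commonly implemented as a Savitzky–Golay filter; the sketch below uses SciPy for this purpose, with an assumed window length and polynomial order that are not reported in the paper:

```python
# Illustrative polynomial (Savitzky-Golay) smoothing of a spectrum.
# The window length and polynomial order are assumptions for the sketch.
import numpy as np
from scipy.signal import savgol_filter

def smooth_spectrum(reflectance, window_length=11, polyorder=2):
    # window_length must be odd and larger than polyorder.
    return savgol_filter(np.asarray(reflectance, dtype=float),
                         window_length=window_length, polyorder=polyorder)
```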

3 Convolutional neural network prediction model based on reflectance spectra

In recent years, 1D CNNs have been proposed for processing 1D signals and have achieved excellent performance and high efficiency. In a relatively short time, 1D CNNs have gained momentum and become well recognized in diverse signal processing applications, including early arrhythmia detection in electrocardiograms, structural damage detection, and high-power engine failure monitoring. Compared with two-dimensional CNNs, they can automatically extract complex features from training samples while processing 1D signals directly. Compared with traditional fault diagnosis methods, 1D CNNs excel at processing the original fault data directly in an end-to-end manner; the classification model is therefore more flexible, and the reliance on expert knowledge is eliminated.

3.1 Principle of one-dimensional convolutional neural networks

A 1D CNN consists of three kinds of layers: convolution layers, pooling layers, and fully connected layers. Figure 4 shows the general 1D CNN architecture. A 1D signal is fed into the input layer of the 1D CNN, and the convolution operation between the input signal and the corresponding convolution kernel generates an input feature map. An activation function is then applied to produce the output feature map of the convolution layer. The output of the convolution layer can be expressed as:

$$y_{j}^{l} = f\left( {b_{j}^{l} + \sum\limits_{i \in Mj} {\text{conv1D}\left( {\omega_{ij}^{l - 1} ,x_{i}^{l - 1} } \right)} } \right)$$
(7)
Fig. 4 Architecture of a 1D CNN

In the formula, \(y_{j}^{l}\) refers to the output of the \(j\)-th neuron in the \(l\)-th layer; \(f\left( \cdot \right)\) is a nonlinear activation function; \(b_{j}^{l}\) is the bias of the \(j\)-th neuron in the \(l\)-th layer; \(M_{j}\) is the set of input feature maps connected to neuron \(j\); \(x_{i}^{l - 1}\) is the output of the \(i\)-th neuron in the \(l - 1\)-th layer; and \(\omega_{ij}^{l - 1}\) is the weight connecting the \(i\)-th neuron in the \(l - 1\)-th layer to the \(j\)-th neuron in the \(l\)-th layer.

A pooling layer is usually placed after a convolution layer; it reduces the dimension of the features extracted by the preceding convolution layer, decreases the computational cost, and provides basic translation invariance for the features. The formula is as follows:

$$s_{j}^{l + 1} = f\left( {\beta_{j}^{l}\, \text{Max}\left( {y_{j}^{l} } \right) + b_{j}^{l + 1} } \right)$$
(8)

In the formula, \(\text{Max}\left( \cdot \right)\) refers to the sub-sampling function (maximum sampling is selected in this paper); \(\beta_{j}^{l}\) is the weight coefficient, and \(b_{j}^{l + 1}\) is the bias coefficient.

The output of each neuron in the pooling layer becomes the input of each neuron in the fully connected layer, and the fully connected layer commonly serves as the classifier of the overall 1D CNN.
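To make Eqs. (7) and (8) concrete, the following NumPy sketch (our own illustration; shapes and names are assumptions) computes one convolution layer with bias and ReLU activation followed by non-overlapping max pooling:

```python
# Minimal numpy sketch of Eqs. (7) and (8): a 1D convolution with bias and ReLU,
# followed by non-overlapping max pooling.
import numpy as np

def conv1d_layer(x, kernels, bias):
    """x: (n_in, length); kernels: (n_out, n_in, k); bias: (n_out,). 'Valid' convolution."""
    n_out, n_in, k = kernels.shape
    out_len = x.shape[1] - k + 1
    y = np.zeros((n_out, out_len))
    for j in range(n_out):                      # output feature map j (Eq. 7)
        for i in range(n_in):                   # sum over the input maps in M_j
            for t in range(out_len):
                y[j, t] += np.dot(kernels[j, i], x[i, t:t + k])
        y[j] += bias[j]
    return np.maximum(y, 0.0)                   # f(.) chosen as ReLU

def max_pool1d(y, size=2):
    """Non-overlapping max pooling (Eq. 8 with beta = 1 and b = 0)."""
    trimmed = y[:, : (y.shape[1] // size) * size]
    return trimmed.reshape(y.shape[0], -1, size).max(axis=2)
```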

3.2 Influence of hyperparameter settings on classification accuracy

This paper puts forward a 1D CNN coal and rock recognition model, which is shown in Fig. 5. The model takes coal and rock hyperspectral data as input. In the feature extraction stage, four identical feature extraction layers were designed to extract features from each input sample; each feature extraction layer comprises two convolution layers, a batch normalization layer, a ReLU activation layer, and a pooling layer. A flatten layer then converts the two-dimensional feature matrix, consisting of 1D feature maps, into a 1D feature vector for the classifier. The batch normalization layer was added after the two feature-extraction convolutions so that the data are normalized before being passed to the ReLU activation layer; this increases the speed, improves the performance, and makes the neural network more stable. Maximum pooling was used in the pooling layer to halve the number of values in each feature map.

Fig. 5 Fault diagnosis model of a 1D CNN
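A possible realization of the described architecture is sketched below in PyTorch (our own illustration, not the authors' code). The channel counts, kernel sizes, and input length are assumptions, since they are not reported here, while the four feature extraction blocks, batch normalization, ReLU, pooling that halves each feature map, and the dropout rate of 0.425 selected in Sect. 3.2 follow the description above:

```python
# Sketch of the described 1D CNN: four feature extraction blocks, flatten,
# dropout (0.425), and a two-class classifier. Dimensions are assumptions.
import torch
import torch.nn as nn

def feature_block(in_ch, out_ch, kernel_size=3):
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
        nn.Conv1d(out_ch, out_ch, kernel_size, padding=kernel_size // 2),
        nn.BatchNorm1d(out_ch),          # normalize before the ReLU activation
        nn.ReLU(),
        nn.MaxPool1d(2),                 # halves the number of values per feature map
    )

class CoalRock1DCNN(nn.Module):
    def __init__(self, n_points=188, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            feature_block(1, 16),
            feature_block(16, 32),
            feature_block(32, 64),
            feature_block(64, 64),
        )
        reduced = n_points // 16         # length after four poolings of stride 2
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.425),
            nn.Linear(64 * reduced, n_classes),
        )

    def forward(self, x):                # x: (batch, 1, n_points)
        return self.classifier(self.features(x))

model = CoalRock1DCNN()
logits = model(torch.randn(8, 1, 188))   # hypothetical batch of 8 spectra
```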

In CNNs, the choice of hyperparameters is crucial. To investigate the effect of hyperparameters on the classification performance of the 1D CNN model, we considered three factors: (1) learning rate, (2) number of feature extraction layers, and (3) dropout rate. Five levels were included for each factor, as shown in Table 2, and the ranges of variation were selected in accordance with previous studies. Based on the designed experimental method, the influence of the different hyperparameters on the classification performance of the 1D CNN model was explored, and the results were used to determine the optimal combination of parameters. The influence trend of each factor on each evaluation index is shown in Fig. 6. As shown in Fig. 6a, as the learning rate increased, the accuracy rate first increased dramatically and then decreased, peaking at a learning rate of 0.03. Increasing the number of feature extraction layers from one to five at first improved the classification accuracy, which then decreased; too many feature extraction layers therefore lead to overfitting and harm generality, as shown in Fig. 6b. As shown in Fig. 6c, the classification accuracy peaked at a dropout rate of 0.425. Taking classification accuracy as the indicator, the overall training process of the CNN model adopted a learning rate of 0.03, four feature extraction layers, and a dropout rate of 0.425. With the dropout layer (dropout rate = 0.425), 42.5% of the nodes were randomly eliminated to reduce overfitting, which would otherwise produce high training accuracy and low testing accuracy. Training a new 1D-CNN model with these hyperparameters achieved a classification accuracy of 96.8%, indicating that the hyperparameter set was optimal. When the model was evaluated on the test set, the final classification accuracy reached 94.6%, as shown in Fig. 7.
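The search over these three factors can be sketched as a simple grid search (our own illustration; the factor levels listed and the train_and_evaluate helper are assumptions, not values or code from the paper):

```python
# Sketch of the hyperparameter exploration: train one model per combination of
# learning rate, number of feature extraction blocks, and dropout rate, and keep
# the combination with the best validation accuracy.
from itertools import product

learning_rates = [0.001, 0.01, 0.03, 0.05, 0.1]   # assumed levels, five per factor
n_blocks = [1, 2, 3, 4, 5]
dropout_rates = [0.2, 0.3, 0.425, 0.5, 0.6]

def train_and_evaluate(lr, blocks, dropout):
    """Hypothetical placeholder: build the 1D CNN with these settings, train it,
    and return the validation accuracy. Replace with a real training loop."""
    return 0.0  # placeholder so the sketch runs end to end

best = max(product(learning_rates, n_blocks, dropout_rates),
           key=lambda cfg: train_and_evaluate(*cfg))
print("best (learning rate, blocks, dropout):", best)
```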

Table 2 Hyperparameters and factor levels
Fig. 6 Variation trend of each index: a learning rate; b number of feature extraction layers; c dropout rate

Fig. 7 Test results of different parameter combinations

3.3 Evaluation of prediction effect of neural network model

3.3.1 Effectiveness of the pre-processing method

To demonstrate the effectiveness of pre-processing the spectral data, an experimental comparison between the original data and the pre-processed data was performed using the CNN described above, as shown in Fig. 8. The accuracy obtained with the pre-processed data was approximately 3% higher than that obtained with the original data.

Fig. 8 Comparison between experimental results of pre-processed data and original data

3.3.2 Effectiveness of the model

To verify the effectiveness of the CNN-based approach, the 1D-CNN model was compared with five other models based on machine learning (BP, SVM) and deep learning (MLP, DBN, SAPSO-DBN). To make the methods comparable, the optimizer parameter configuration, cost function, and activation function of the 1D-CNN and DBN were kept consistent in this paper. The first, second, third, and fourth layers of the MLP model were fully connected layers of 400, 300, 200, and 100 neurons, respectively, with ReLU activation and a dropout rate of 0.425 for every fully connected layer. The fifth layer was the output layer, which held two neurons classified by Softmax. The learning rate was set to 0.002, Adam was used as the optimizer, and the loss function was cross-entropy. The batch size was 32, and the number of iterations was 40. GridSearchCV (with 10-fold cross-validation) was used to tune the support vector machine model. A Gaussian kernel was adopted as the kernel function of the support vector machine; the penalty factor, C, was set to 16, and gamma (which controls the width of the Gaussian kernel and determines the distribution of the data mapped to the new feature space) was set to 0.002.
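The SVM baseline setup can be sketched as follows (our own illustration; the spectra X and labels y are placeholders, and the parameter grid is an assumption centred on the reported C = 16 and gamma = 0.002):

```python
# Sketch of the SVM baseline: RBF-kernel SVC tuned with GridSearchCV (10-fold CV),
# then scored with 10-fold cross-validation as in the comparison.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 188))          # placeholder spectra (120 samples)
y = np.array([0] * 96 + [1] * 24)        # 96 coal and 24 rock samples

param_grid = {"C": [1, 4, 16, 64], "gamma": [0.0005, 0.002, 0.008]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
search.fit(X, y)
print("best SVM parameters:", search.best_params_)

scores = cross_val_score(search.best_estimator_, X, y, cv=10)
print("mean 10-fold accuracy:", scores.mean())
```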

For a more accurate assessment of model performance, each model was evaluated with 10-fold cross-validation. The experimental results of the different coal and rock identification methods are shown in Table 3 and Fig. 9; Table 3 also lists the accuracy rates of the six models. The average accuracy rate of the 1D-CNN network was 94.6%, which was 27.8% and 22.4% higher than that of BP and SVM, respectively. These results show that the two machine-learning-based coal–rock recognition methods exhibited inferior performance compared with the 1D-CNN proposed in this paper. This can be explained as follows: the BP algorithm requires a certain degree of data pre-processing, and when dealing with complex classification problems, traditional shallow-feature machine learning algorithms cannot adequately extract features or accurately represent the mapping relationships among diverse data. The SVM performed poorly on data sets with many feature points and was sensitive to random signals, making it prone to overfitting.

Table 3 Accuracy of six different models
Fig. 9 Accuracy of each method in ten tests

The average accuracy rate of the 1D-CNN was 19% and 7.8% higher than that of the MLP and DBN, respectively. The average accuracies of the DBN (86.8%) and MLP (75.6%) were higher than those of the SVM (72.2%) and BP (66.8%). According to the experimental results, the deep learning methods were superior to the machine learning methods, as the latter could hardly learn certain nonlinear relationships in the hyperspectral data of coal and rock, whereas the deep learning methods were significantly advantageous for complex and non-stationary data.

4 Conclusions

On the basis of a 1D CNN, this paper put forward a new method for coal–rock identification, and coal–rock hyperspectral data were used to verify the model. The following conclusions can be drawn from the extensive experiments carried out in this study.

(1) The average accuracy rate of the proposed method in coal–rock identification was 94.6%.

(2) The addition of a dropout layer to the 1D-CNN model effectively improved the accuracy of cross-load training and strengthened the generalization capacity of the model, thereby enhancing the accuracy of coal–rock recognition.

(3) Compared with traditional machine-learning-based methods, the 1D-CNN model excelled in the analysis of intricate non-stationary signals. The 1D CNN adopted in this paper therefore showed clear advantages and higher efficiency compared with the other methods.