Abstract
The identification technology for coal and coal-measure rock is required across multiple stages of coal exploration, mining, separation, and tailings management. However, the construction of identification models necessitates substantial data support. To this end, we have established a near-infrared spectral dataset for coal and coal-measure rock, which includes the reflectance spectra of 24 different types of coal and coal-measure rock. For each type of sample, 11 sub-samples of different granularities were created, and reflectance spectra were collected from sub-samples at five different detection azimuths, 18 different detection zeniths, and under eight different light source zenith conditions. The quality and usability of the dataset were verified using quantitative regression and classification machine learning algorithms. Primarily, this dataset is used to train artificial intelligence-based models for identifying coal and coal-measure rock. Still, it can also be utilized for regression studies using the industrial analysis results contained within the dataset.
Similar content being viewed by others
Background & Summary
The stable and safe supply of coal is crucial for sustaining production and daily life1. It is necessary to constantly enhance the technical level of coal production, in which coal and coal-measure rock identification play a pivotal role throughout the entire coal production process. During the exploration stage2, identification technology aids engineers in rapidly determining the optimal locations, depths, and angles for drilling. During the mining stage3, it enables the shearer to follow the roof, which reduces equipment wear. During the separation stage4, it effectively distinguishes between coal and coal-measure rock, which in turn reduces energy consumption during subsequent processing stages. During the tailing management stage5, unutilized coal can be effectively identified and recovered, thereby minimizing waste. Currently, there are various methods for coal and coal-measure rock identification, such as imaging6, gamma-ray7, and radar8 techniques. In recent years, spectral-based methods have gained widespread attention due to their advantages in speed and efficiency9. Spectroscopy reveals molecular composition and structure through absorption spectra of vibrations at fundamental and overtone frequencies. Spectroscopy serves as a “fingerprint” for different substances10.
Traditionally, identification models relied on database matching methods11. However, with the advancement of artificial intelligence (AI) technology, AI-based identification models have achieved superior performance. Currently, various classification models have been developed using advanced machine learning techniques, including Convolutional Neural Network (CNN)12,13, Broad Learning System (BLS)14, and Bidirectional Long Short-Term Memory network (Bi-LSTM)15, as well as pre-trained Vision Transformer (ViT)16. Notably, Gaussian Support Vector Machine (SVM)17, Linear Discriminant Analysis (LDA)18, BLS19, Random Forest (RF)20, and Extreme Learning Machine (ELM)21 have also been utilized to establish models for classifying coal types and provenance. Specifically, for the quantitative analysis of coal composition, researchers have employed methods such as XGBoost22 and Partial Least Squares (PLS)23 to construct regression models.
While AI-based models are widely used, their practical implementation introduces new challenges, notably the need for diverse training data. Numerous spectral databases have been established, such as the USGS Spectral Library Version 7 by the United States Geological Survey24, primarily featuring mineral reflectance spectra covering visible, near-infrared, and mid-far infrared bands of absorption and reflectance spectra; NASA JPL’s ECOSTRESS Spectral Library - Version 1.025, which focused on minerals and rocks across the 0.4~15.4 μm wavelength range; The “ 2D hyperspectral library of mineral reflectance” by Laurent Fasnacht26,27,28,29 includes reflectance spectra of minerals in near-infrared ranges. These databases provide a valuable data platform for researchers, but they are primarily intended to support fields like geological exploration, environmental monitoring, agriculture, and forestry.
The primary subjects of identification research are coal and coal-measure rock, where coal-measure rock refers to sedimentary rocks like mud shale, sandstone, and limestone. Reflectance spectra can differ due to geological variations and types in different regions30. Even within the same coal mine, reflectance spectra of the same type of coal and coal-measure rock can vary significantly due to differences in detection geometry, such as angle and distance. As well as coal and coal-measure rock characteristics like granularity and surface roughness31. Therefore, a specialized spectral dataset for coal and coal-measure rock identification is needed. We have collected coal and coal-measure rock samples from various mining areas in China, including 12 types of coal-measure rocks and 12 types of coal. Each coal sample underwent XRF (X-ray Fluorescence), XRD (X-ray Diffraction), and ICA (Industrial Component Analysis) treatments. Each collected sample was prepared into 11 different granularity sub-samples, and the reflectance spectra for all sub-samples were obtained using five different detection azimuth ϕi, 18 different detection zenith θi, and nine different light source zenith θo. Fig. 1 show the overview of coal and coal-measure rock sample preparation experiment and spectral acquisition.
Methods
Coal and coal-measure rock samples collection
We collected coal and coal-measure rock samples from different mining areas in various provinces of China using two collection methods: direct acquisition from fully mechanized mining faces and roof drilling. In total, we obtained representative samples of 12 types of coal, including anthracite coal, bituminous coal, and lignite, as well as 12 types of coal-measure rocks, including shale, sandstone, and limestone, as show in Fig. 2. To preserve the original surface morphology of the coal and coal-measure rock, the collected samples were promptly placed in self-sealing plastic bags and sealed for storage.
Component analysis and sub-sample preparation
Component analysis was performed on each sample to investigate the material composition, which produces the spectral characteristics of coal and coal-measure rock reflectance. Three methods were employed for the study: XRD, XRF, and ICA. XRD determined the carbonaceous material structure and mineral composition type of the coal, XRF quantitatively measured the elemental content in the coal, ICA analyses determined the air-dried moisture (Mad), ash content (Aad), volatile matter (Vad), and fixed carbon (FCad) of the coal samples. Table 1 provides information on the instruments used in these three methods. Table 2 presents the configuration for the component analysis of each sample.
Since ICA primarily involves measuring the chemical composition parameters of samples through combustion. The coal industry’s core focus is on coal, rock is considered additional materials produced during mining and are discarded after extraction. The purpose of conducting ICA is to sort coal samples based on chemical composition parameters, in order to classify them by different combustion levels, making this analysis exclusive for coal.
Due to the rough surface condition and particle size of coal and coal-measure rock, which are important factors affecting the spectral reflectance characteristics. For each type of coal and coal-measure rock, block samples with two different surface roughness levels and powdered samples with nine different particle sizes were prepared. The block samples were prepared, with one side having an in-situ fractured rough surface, the other having a relatively smooth surface polished to a surface roughness (Ry) less than 0.3 mm. The powdered samples of each coal and coal-measure rock type were obtained by standard sieving, resulting in nine different particle sizes: 8000 μm, 4750 μm, 2500 μm, 1000 μm, 500 μm, 210 μm, 100 μm, 74 μm, and 45 μm.
Spectral data collection
Based on the relative positional relationship of samples, detectors, and light sources in spherical coordinates, a spherical coordinate-based reflectance spectroscopy measurement platform was designed. The three-dimensional model of the platform is shown in Fig. 1, which mainly includes the platform’s structure, whiteboard, detector, spectrometer, and light source; their parameters are shown in Table 3.
The specific collection process is as follows:
-
(1)
Connect one end of the optical fiber to the detector and the other to the spectrometer.
-
(2)
Connect the spectrometer to the computer via USB and launch the AvaSoft software developed by Avantes.
-
(3)
Measure the radiance illuminance of complete reflection from a fixed position (two distances are 0.5 m, light source zenith of 45°, detection zenith of 0°, and detection azimuth of 0°).
-
(4)
Measure the zero radiance illuminance with the detector covered.
-
(5)
Collect the irradiance of the sample and calculate the reflectance spectrum using Formula 1.
Where R represents the reflectance, S represents the radiance illuminance. The subscripts “ref” and “dark” indicate the radiance illuminance under complete reflection and detector covered, respectively. The parameters θi, θo, and φi represent detection zenith, light source zenith, and detection azimuth.
During the spectrum collection process, the distance between the light source and the sample, as well as between the detector and the sample, is set to 0.5 meters. Five different detection azimuth (0°, 10°, 20°, 30°, 40°), eighteen different light source zenith (0°, 5°, 10°, 15°, 20°, 25°, 30°, 35°, 40°, 45°, 50°, 55°, 60°, 65°, 70°, 75°, 80°, 85°), and nine different detection zenith (10°, 20°, 30°, 40°, 45°, 50°, 60°, 70°, 80°) were set. These three angles were combined in a factorial design, resulting in 32 detection geometries. Thus, a total of 53 reflectance spectra were obtained for each type of sample in each condition. In total, 24 types of samples yielded 10496 reflectance spectra.
Data Records
The near-infrared spectroscopy dataset has been deposited into the Zenodo(https://doi.org/10.5281/zenodo.11137126)32. The directory structure is shown in Fig. 3. The dataset includes a total of 4 folders: Photos, Documentation, Spectra, and Analysis. In the Spectra folder, there are exterior images of all samples in.jpg format. In the Documentation folder, there are text files (.txt) summarizing the origin of each sample, as well as ICA, and XRF results. The Spectra folder contains the reflectance spectrum of each sub-sample at diverse conditions in.csv format. Lastly, in the Analysis folder, there are.csv files containing the results obtained from XRF, XRD, and ICA for each sample. The names of the files in the first three folders correspond to their respective samples.
Technical Validation
To ensure the reliability of the reflectance spectrum collected under diverse conditions, we conducted detailed inspections before, during, and after collection. The specific inspection methods are as follows:
-
(1)
Before collection, three individuals who deeply understood coal and coal-measure rock spectra were assigned different tasks. They were responsible for sample selection according to the labels, adjusting the measurement platform’s collection parameters, and operating the computer software. This division of labor helped to minimize the probability of errors.
-
(2)
During collection, for each sub-sample and each condition, the spectrometer was set to automatically acquire the reflectance spectrum ten times and then take the average. This approach can avoid distortions caused by a single spectral collection.
-
(3)
After collection, interpolation was applied to the exported reflectance spectrum using Avasoft. This step aims to obtain a reflectance spectrum with a wavelength interval of 1 nm and fill in any missing wavelength. By following these meticulous procedures, we ensured the accuracy and reliability of the collected reflectance spectrum for our dataset.
To validate the feasibility of the constructed dataset. In this manuscript, the analysis of reflectance spectra will be conducted using three approaches: binary classification of coal and coal-measure rock, multi-class classification of coal and quantitative regression of coal’s composition. This manuscript will validate the existing algorithms mentioned in the background & summary section. CNN, BLS, and Bi-LSTM will be utilized for the binary classification of coal and coal-measure rock using all spectral data. For the multi-class classification of coal, SVM, LDA, BLS, RF, and ELM will be employed using all spectral data of coal. Quantitative regression of coal’s composition will be performed using XGBoost and PLS using all spectral data of coal. During the dataset validation process, the dataset is divided into training set, testing set, and validation set in a 7:2:1 ratio. The algorithms were trained on a machine with an i7-8750H CPU, Quadro P1000, and Windows 10. The hyperparameters for all algorithms were initially set to their recommended appropriate values, and the specific training parameters can be found in the publicly available GitHub repository listed in the code availability section. The versions of the libraries used for each algorithm are detailed in Table 4.
To further validate the stability of the algorithms, each algorithm was trained and tested ten times. The performance of binary and multi-classification algorithms was evaluated using accuracy, while the effectiveness of the quantitative regression was assessed using Mean Absolute Error (MAE). The calculation methods for these metrics are as follows:
where TP and TN represent correctly identified coal and coal-measure rock, respectively, while FP and FN denote incorrectly identified coal and coal-measure rock. Similarly, TM and FM denote the identification of correct and incorrect coal types. The variable n refers to the total volume of data recognized, yrel represents the real Aad of the sample, and ypre indicates the predicted Aad of the sample.
Fig. 4 displays the accuracy and error for binary classification, multi-classification, and quantitative regression tasks. In Fig. 4(a), the identification accuracy of all algorithms is no less than 94%, with the Bi-LSTM algorithm achieving the highest accuracy, because bi-LSTM can consider the correlation between wide-range wavelengths. It performed well in both the test and validation sets, reaching an average accuracy of 98.84%. In Fig. 4(b), due to the limitations of its linear classifier, the LDA algorithm performed poorly, whereas the ELM algorithm showed the highest and most stable accuracy, with an average accuracy of 98.89% in the validation set. The other algorithms also maintained a stable accuracy around 84%. In Fig. 4(c), XGBoost, with its ability to capture nonlinear relationships, achieved a lower error than to PLS, with an average MAE of 4.75% in the validation set.
In addition, we will further expand the coal and coal-measure rock near-infrared spectral dataset by adding a reflectance spectrum under the interference of external factors, such as dust, water mist, etc. This work will make the dataset universally applicable in various stages of coal mining. We also encourage other researchers in the coal mining field to expand and improve this dataset. The coal and coal-measure rock near-infrared reflectance spectrum collected in different conditions have significant implications for applying identification algorithms in practical work. The aim is to support further research and advancements in the intelligentization of coal mining.
Code availability
In this article, spectral acquisition, calibration, and data export were all completed using Avantes’ Avasoft software, with the software version being 7.8. The algorithms used for data validation were implemented using Python code, and the distribution version of the code can be obtained at the following website: https://github.com/smartLybo/Spectral-data-processing.git.
References
Ralston, J. C., Hargrave, C. O. & Dunn, M. T. Longwall Automation: Trends, Challenges and Opportunities. Int j Min Sci Techno. 27, 733–739 (2017).
Shi, S. et al. Improved Unet in Lithology Identification of Coal Measure Strata. Lithosphere-US. 2022, 15 (2022).
Wang, H. & Zhang, Q. Dynamic Identification of Coal-Rock Interface Based On Adaptive Weight Optimization and Multi-Sensor Information Fusion. Inform Fusion. 51, 114–128 (2019).
Zhang, K., Wang, W., Lv, Z., Fan, Y. & Song, Y. Computer Vision Detection of Foreign Objects in Coal Processing Using Attention Cnn. Eng Appl Artif Intel. 102, 104242 (2021).
Zhu, W. et al. Ash Detection of Coal Slime Flotation Tailings Based On Chromatographic Filter Paper Sampling and Multi-Scale Residual Network. Intelligent Automation & Soft Computing. 38, 259–273 (2023).
Liu, Q., Li, J., Li, Y. & Gao, M. Recognition Methods for Coal and Coal Gangue Based On Deep Learning. Ieee Access. 9, 77599–77610 (2021).
Shao, H. et al. A 91-Channel Hyperspectral Lidar for Coal/Rock Classification. Ieee Geosci Remote S. 17, 1052–1056 (2020).
Zhao, M. et al. Spatial Effect Analysis of Coal and Gangue Recognition Detector Based On Natural Gamma Ray Method. Natural resources research (New York, N.Y.). 31, 953–969 (2022).
Chen, X. et al. Coal Gangue Recognition Using Multichannel Auditory Spectrogram of Hydraulic Support Sound in Convolutional Neural Network. Measurement science & technology. 33, 15107 (2022).
Liancun, X., Zhizhong, Z., Chunxia, C. & Yang, G. Mineral Identification and Geological Mapping Using Near-Infrared Spectroscopy Analysis. 2017 International Conference on Progress in Informatics and Computing (PIC): IEEE, 2017:119–123.
Mei, X. et al. A Real-Time Infrared Ultra-Spectral Signature Classification Method Via Spatial Pyramid Matching. SENSORS-BASEL. 15, 15868–15887 (2015).
Hu, F., Zhou, M., Dai, R. & Liu, Y. Recognition Method of Coal and Gangue Based On Multispectral Spectral Characteristics Combined with One-Dimensional Convolutional Neural Network. Frontiers in earth science (Lausanne). 10 (2022).
Yang, J. et al. Cnn Coal and Rock Recognition Method Based On Hyperspectral Data. Int j Coal Sci Techn. 9, 1–12 (2022).
Zou, L., Yu, X., Li, M., Lei, M. & Yu, H. Nondestructive Identification of Coal and Gangue Via Near-Infrared Spectroscopy Based On Improved Broad Learning. Ieee T Instrum Meas. 10, 8043–8052 (2020).
Ding, Z. W. et al. Recognition Method of Coal–Rock Reflection Spectrum Using Wavelet Scattering Transform and Bidirectional Long–Short-Term Memory. Rock Mech Rock Eng. 57, 1353–1374 (2024).
Yang, J., Chang, B., Zhang, Y., Zhang, Y. & Luo, W. Pcvit: A Pre-Convolutional Vit Coal Gangue Identification Method. Energies. 15, 4189 (2022).
Lei, M., Zhang, L., Li, M., Chen, H. & Zhang, X. Near-Infrared Spectrum of Coal Origin Identification Based On Lvq with Svm Algorithm. Proceedings of the 37th Chinese Control Conference. China, 2018:9016–9020.
Yu, X., Guo, W., Wu, N., Zou, L. & Lei, M. Rapid Discrimination of Coal Geographical Origin Via Near-Infrared Spectroscopy Combined with Machine Learning Algorithms. Infrared Phys Techn. 105, 103180 (2020).
Lei, M., Rao, Z., Li, M., Yu, X. & Zou, L. Identification of Coal Geographical Origin Using Near Infrared Sensor Based On Broad Learning. Applied sciences. 9, 1111 (2019).
Lei, M., Yu, X., Li, M. & Zhu, W. Geographic Origin Identification of Coal Using Near-Infrared Spectroscopy Combined with Improved Random Forest Method. Infrared Phys Techn. 92, 177–182 (2018).
Xiao, D., Li, H. & Sun, X. Coal Classification Method Based On Improved Local Receptive Field-Based Extreme Learning Machine Algorithm and Visible–Infrared Spectroscopy. Acs Omega. 5, 25772–25783 (2020).
Begum, N., Maiti, A., Chakravarty, D. & Das, B. S. Reflectance Spectroscopy Based Rapid Determination of Coal Quality Parameters. Fuel. 280, 118676 (2020).
Li, J. et al. Coal Calorific Value Detection Technology Based On Nirs-Xrf Fusion Spectroscopy. Chemosensors. 11, 363 (2023).
Kokaly, R. F. et al. Usgs Spectral Library Version 7, U.S. Geological Survey, Reston, VA (2017).
Meerdink, S. K., Hook, S. J., Roberts, D. A. & Abbott, E. A. The Ecostress Spectral Library Version 1.0. Remote Sens Environ. 230, 111196 (2019).
Fasnacht, L., Vogt, M., Renard, P. & Brunner, P. A 2D Hyperspectral Library of Mineral Reflectance, From 900 to 2500 Nm. Sci Data. 6 (2019).
Fasnacht, L., Vogt, M.-L., Renard, P. & Brunner, P. A 2D hyperspectral library of mineral refectance, from 900 to 2500 nm – Raw data. Zenodo https://doi.org/10.5281/zenodo.1446397 (2018).
Fasnacht, L., Vogt, M.-L., Renard, P. & Brunner, P. A 2D hyperspectral library of mineral refectance, from 900 to 2500 nm – High dynamic range data. Zenodo https://doi.org/10.5281/zenodo.1476495 (2018).
Fasnacht, L., Vogt, M.-L., Renard, P. & Brunner, P. A 2D hyperspectral library of mineral refectance, from 900 to 2500 nm – Masked high dynamic range data. Zenodo https://doi.org/10.5281/zenodo.1476503 (2018).
Hunt, G. R. Spectral Signatures of Particulate Minerals in the Visible and Near Infrared. Geophysics. 42, 501–513 (1977).
Cloutis, E. A. et al. Spectral Reflectance “Deconstruction” of the Murchison Cm2 Carbonaceous Chondrite and Implications for Spectroscopic Investigations of Dark Asteroids. Icarus. 305, 203–224 (2018).
Lv, Y., Wang, S., & Yang, E. A near infrared spectroscopy dataset of coal and coal-measure rock under diverse conditions, Zenodo, https://doi.org/10.5281/zenodo.11137126 (2024).
Acknowledgements
This work was supported in part by the projects of the Ministry of Industry and Information Technology under Grant TC220A04W-1/167, in part by the National Key Research and Development Program of Shanxi under Grant 202102010101003, in part by the Scientific research projects of the Inner Mongolian under Grant 2022YFHH0007, and in part by the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.
Author information
Authors and Affiliations
Contributions
S.W. developed the overall experimental plan, E.Y. collected and prepared the experimental samples, Y.L. and E.Y.collected and analyzed the spectral data of coal and coal-measure rock, Y.L. and S.G. drafted and modified the manuscript. All authors have read and approved the final manuscript for submission.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lv, Y., Wang, S., Yang, E. et al. A near-infrared spectroscopy dataset of coal and coal-measure rock under diverse conditions. Sci Data 11, 628 (2024). https://doi.org/10.1038/s41597-024-03422-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-024-03422-w
- Springer Nature Limited