Abstract
Based on the complex correlation between the distribution patterns of geochemical elements at the surface and the underlying bedrock types, and on the ability of machine learning algorithms to capture subtle patterns, four machine learning algorithms, namely decision tree (DT), random forest (RF), XGBoost (XGB), and LightGBM (LGBM), were implemented for lithostratigraphic classification and for lithostratigraphic prediction in a Quaternary-covered area, using stream sediment geochemical sampling data from the Chahanwusu River area of Dulan County, Qinghai Province, China. The local Moran's I, representing spatial autocorrelation, and terrain factors, representing surface geological processes, were calculated as additional features. Accuracy, precision, recall, and F1 score were chosen as the evaluation indices, and Voronoi diagrams were used for visualization. The results indicate that the XGB and LGBM models both performed well: they not only achieved relatively satisfactory classification performance but also predicted lithostratigraphic types in the Quaternary-covered area that are essentially consistent with those of neighboring areas of known type. It is therefore feasible to classify lithostratigraphic types from the concentrations of geochemical elements in stream sediments, and the XGB and LGBM algorithms are recommended for lithostratigraphic classification.
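The local Moran's I feature mentioned above can be illustrated with a short sketch of Anselin's local indicator, I_i = ((x_i − x̄) / m2) · Σ_j w_ij (x_j − x̄), where m2 = Σ_k (x_k − x̄)² / n. The neighbourhood definition is an assumption: every other sample within a chosen distance threshold counts as a neighbour, and the binary weights are row-standardised. The study's actual spatial weights are not given here, and the function name `local_morans_i` is hypothetical. In this scheme the function would be applied once per element to produce one extra feature column per element.

```python
import math


def local_morans_i(values, coords, threshold):
    """Local Moran's I (Anselin, 1995) for one element at every sample site.

    Assumption: neighbours are all other sites within `threshold` distance
    (binary distance-band weights), row-standardised so each site's weights
    sum to 1. This is an illustrative choice, not the study's documented one.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in dev) / n          # second moment of deviations
    result = []
    for i in range(n):
        neighbours = [
            j for j in range(n)
            if j != i and math.dist(coords[i], coords[j]) <= threshold
        ]
        if not neighbours or m2 == 0:
            # Isolated site or constant field: no local association defined.
            result.append(0.0)
            continue
        w = 1.0 / len(neighbours)             # row-standardised binary weight
        lag = sum(w * dev[j] for j in neighbours)  # spatial lag of deviations
        result.append(dev[i] / m2 * lag)
    return result
```

For example, values [1, 1, 5, 5] at (0, 0), (1, 0), (10, 0), (11, 0) with a threshold of 2 give I = 1.0 at every site (similar values cluster), whereas strictly alternating values [1, 5, 1, 5] along a line with unit spacing and a threshold of 1 give −1.0 everywhere.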
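Abstract (Chinese version)
Discriminating bedrock types is a very important part of geological surveys and a fundamental task for oil and gas exploration and mineral exploration. In this paper, four machine learning methods, decision tree, random forest, XGBoost and LightGBM, were used to discriminate bedrock types from geochemical sampling data. Taking the concentrations of 15 geochemical elements, their local spatial autocorrelation (Moran's I) and terrain factors as features, different classification models were trained, and the models were validated and evaluated by 10-fold cross-validation. The results show that the ensemble learning algorithms classify better than the decision tree; among them, XGBoost and LightGBM perform best and handle complex high-dimensional spatial data and imbalanced data well. In addition, the constructed classification models successfully predicted the bedrock types underlying Quaternary sediments. Voronoi diagram visualization shows that the predicted bedrock types essentially agree with the true bedrock types around them and can preliminarily delineate the boundaries between bedrock types. It is therefore feasible to use geochemical sampling data to discriminate the underlying bedrock types.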
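The evaluation scheme described above (10-fold cross-validation scored by accuracy, precision, recall and F1) can be sketched in plain Python as follows. Several details are assumptions rather than the study's documented settings: folds are formed by simple shuffling (no stratification), precision, recall and F1 are macro-averaged over lithostratigraphic classes, and `train_and_predict` is a hypothetical callable standing in for any of the four classifiers. All helper names here are illustrative.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def classification_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    n_pred, n_true = Counter(y_pred), Counter(y_true)
    prec, rec, f1 = [], [], []
    for c in labels:
        p = tp[c] / n_pred[c] if n_pred[c] else 0.0
        r = tp[c] / n_true[c] if n_true[c] else 0.0
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    m = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(prec) / m,
        "recall": sum(rec) / m,
        "f1": sum(f1) / m,
    }


def cross_validate(X, y, train_and_predict, k=10, seed=0):
    """Mean of the scores over k folds (assumes len(y) >= k)."""
    totals = {}
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Fit on the training folds, predict the held-out fold.
        y_hat = train_and_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        scores = classification_scores([y[i] for i in test_idx], y_hat)
        for name, value in scores.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / k for name, total in totals.items()}


def majority_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: always predicts the most common class."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * len(X_test)
```

In practice `majority_baseline` would be replaced by a wrapper around a DT, RF, XGBoost or LightGBM model. Macro averaging gives every lithostratigraphic class equal weight, which keeps minority units visible when classes are imbalanced; with strongly imbalanced classes a stratified split would be a reasonable variant.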
Acknowledgments
The authors would like to thank the Co-Construction MapGIS Library by Engineering Research Center for Geographic Information System of China and Central South University for providing MapGIS® software. We also thank Professor ZHANG Shao-ning, senior engineer (The 8th Team of Qinghai Provincial Bureau of Nonferrous Metals and Geological Exploration), and Professor LAI Jian-qing (Central South University) for their kind assistance with data collection.
Additional information
Foundation item
Projects(41772348, 42072326) supported by the National Natural Science Foundation of China; Project(2017YFC0601503) supported by the National Key Research and Development Program, China
Contributors
The overarching research goals were developed by ZHANG Bao-yi; WANG Li-fang was responsible for data curation and data preprocessing; LI Man-yi and LI Wei-xia trained the machine-learning-based lithostratigraphic classifiers and predicted the lithostratigraphic types underlying the Quaternary cover; WANG Fan-yun analyzed and verified the results; JIANG Zheng-wen and Umair KHAN produced the visualizations. The initial draft of the manuscript was written by ZHANG Bao-yi and LI Man-yi; WANG Li-fang replied to the reviewers' comments and revised the final version.
Conflict of interest
ZHANG Bao-yi, LI Man-yi, LI Wei-xia, JIANG Zheng-wen, Umair KHAN, WANG Li-fang, WANG Fan-yun declare that they have no conflict of interest.
About this article
Cite this article
Zhang, By., Li, My., Li, Wx. et al. Machine learning strategies for lithostratigraphic classification based on geochemical sampling data: A case study in area of Chahanwusu River, Qinghai Province, China. J. Cent. South Univ. 28, 1422–1447 (2021). https://doi.org/10.1007/s11771-021-4707-9
DOI: https://doi.org/10.1007/s11771-021-4707-9
Key words
- machine learning
- geochemical sampling
- lithostratigraphic classification
- lithostratigraphic prediction
- bedrock