Abstract
An interpretable model for intelligent lithology identification is proposed that combines stacking ensemble learning with two explanation techniques, Permutation Importance (PI) and Local Interpretable Model-agnostic Explanations (LIME). The method aims to provide more accurate geological information and scientific support for oil and gas exploration. Experiments were run on two public well-logging datasets. Support vector machine (SVM), random forest (RF), and naive Bayes (NB) served as base learners, and an SVM served as the meta-learner that combines their outputs in the stacking algorithm to classify lithology. Model performance was evaluated with Area Under Curve (AUC), precision, recall, and F1-score, and PI and LIME were then used to explain the lithology identification model. The results indicate that the stacking algorithm achieved the best scores on these metrics and the highest prediction accuracy. In the global interpretation, PHIND, GR, and RT had the most significant influence on lithology identification in a natural gas protection area in the United States, while DEN, CAL, and PEF were the most influential variables in the Daqing Oilfield in China. For a single sample, the LIME algorithm provides a quantitative prediction probability together with the degree of influence of each feature variable.
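The global interpretation step rests on permutation importance: shuffle one input variable at a time and measure how much the model's score drops. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The toy data, the threshold rule `toy_predict`, and the helper names are all hypothetical stand-ins; in the study the model would be the trained stacking classifier and the columns would be well-log curves.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, label in zip(X, y) if predict(row) == label)
    return hits / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical toy data: feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]


def toy_predict(row):
    """Hypothetical stand-in for a trained classifier; it uses feature 0 only."""
    return 1 if row[0] > 0.5 else 0


baseline, importances = permutation_importance(toy_predict, X, y)
print("baseline accuracy:", baseline)
print("importance per feature:", importances)
```

A feature the model relies on shows a large accuracy drop when shuffled, while a feature the model ignores shows none. The scores describe the model's dependence on each variable across the whole dataset, which is what distinguishes this global view from the per-sample explanations LIME produces.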
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Author information
Contributions
All authors contributed to the conception and design of this study. Xiaochun Lin collected the data. Xiaochun Lin and Shitao Yin constructed the experimental models and developed and tested the proposed methods. Xiaochun Lin wrote the manuscript, and all authors provided feedback and comments on it. All authors reviewed and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Communicated by: H. Babaie
About this article
Cite this article
Lin, X., Yin, S. Lithology identification based on interpretability integration learning. Earth Sci Inform 16, 2211–2222 (2023). https://doi.org/10.1007/s12145-023-01024-5
DOI: https://doi.org/10.1007/s12145-023-01024-5