Abstract
Lithofacies identification is critical to energy exploration and reservoir evaluation. Machine learning makes it possible to identify lithofacies intelligently from logging data. However, labeled logging data are usually scarce, which limits the effectiveness of the supervised algorithms currently in use, so semi-supervised methods have attracted attention from researchers. In this paper, we apply Tri-Training to lithofacies identification. Building on inductive semi-supervised learning and ensemble learning, the framework uses Random Forest (RF), Gradient-Boosted Decision Trees (GBDT), and Support Vector Machine (SVM) as the baseline supervised classifiers. The baseline classifiers are iteratively retrained with unlabeled data to improve their performance, and the final prediction is produced by combining them as an ensemble. We used seven logging parameters from two wells as input and randomly divided the data into training and test sets 10 times. With only five labeled samples of each lithology, the prediction accuracy improved by an average of 2.1% and 14.5% in the two wells compared with the baseline methods. In addition, we compared Tri-Training with two commonly used semi-supervised methods, the label propagation algorithm (LPA) and Co-Training. The experimental results confirm that Tri-Training achieves better and more stable performance. The Tri-Training method in this paper can be effectively applied to lithofacies identification when labeled logging data are scarce.
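The Tri-Training loop summarized above can be sketched as follows. This is a minimal, dependency-free sketch of the original Tri-training procedure of Zhou and Li (2005): bootstrap initialization, pseudo-labeling of unlabeled samples on which the other two learners agree, an error-rate-based acceptance test, and a majority vote for the final output. It is not the authors' implementation. The names `tri_train`, `tri_predict`, and `NearestCentroid` are hypothetical. The paper uses RF, GBDT, and SVM as base learners; `NearestCentroid` is only a toy stand-in so the example runs with the standard library. Any objects whose `fit(X, y)` returns the fitted model and whose `predict(X)` returns one label per sample could be passed instead, for example scikit-learn's `RandomForestClassifier`, `GradientBoostingClassifier`, and `SVC` (an assumption, not tested here).

```python
import math
import random
from collections import Counter


class NearestCentroid:
    """Hypothetical stand-in learner (not from the paper): assigns each sample
    to the class whose mean feature vector is nearest (Euclidean distance)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for d, v in enumerate(x):
                acc[d] += v
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))
                for x in X]


def tri_train(make_learners, X_lab, y_lab, X_unl, max_rounds=20, seed=0):
    """Tri-training after Zhou & Li (2005); make_learners holds three factories."""
    rng = random.Random(seed)
    n = len(X_lab)
    learners = []
    for make in make_learners:
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample for diversity
        learners.append(make().fit([X_lab[i] for i in idx], [y_lab[i] for i in idx]))
    prev_err, prev_size = [0.5] * 3, [0] * 3  # e'_i and l'_i in the original paper
    for _ in range(max_rounds):
        preds_lab = [h.predict(X_lab) for h in learners]
        preds_unl = [h.predict(X_unl) for h in learners]
        new_sets, updated = [None] * 3, [False] * 3
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Error of the (j, k) pair, measured on labeled samples where they agree.
            agree = [t for t in range(n) if preds_lab[j][t] == preds_lab[k][t]]
            if not agree:
                continue
            err = sum(preds_lab[j][t] != y_lab[t] for t in agree) / len(agree)
            if err >= prev_err[i]:
                continue
            # Pseudo-label the unlabeled samples on which j and k agree.
            pseudo = [(X_unl[t], preds_unl[j][t]) for t in range(len(X_unl))
                      if preds_unl[j][t] == preds_unl[k][t]]
            if prev_size[i] == 0:
                prev_size[i] = math.floor(err / (prev_err[i] - err) + 1)
            if prev_size[i] < len(pseudo):
                if err * len(pseudo) < prev_err[i] * prev_size[i]:
                    updated[i] = True
                elif prev_size[i] > err / (prev_err[i] - err):
                    # Subsample so that err * |pseudo| stays below e'_i * l'_i.
                    keep = math.ceil(prev_err[i] * prev_size[i] / err - 1)
                    pseudo = rng.sample(pseudo, keep)
                    updated[i] = True
            if updated[i]:
                new_sets[i] = (pseudo, err)
        if not any(updated):
            break  # no learner changed: stop iterating
        for i in range(3):
            if updated[i]:
                pseudo, err = new_sets[i]
                X_aug = list(X_lab) + [x for x, _ in pseudo]
                y_aug = list(y_lab) + [lbl for _, lbl in pseudo]
                learners[i] = make_learners[i]().fit(X_aug, y_aug)
                prev_err[i], prev_size[i] = err, len(pseudo)
    return learners


def tri_predict(learners, X):
    """Final ensemble output: majority vote of the three learners."""
    votes = zip(*(h.predict(X) for h in learners))
    return [Counter(v).most_common(1)[0][0] for v in votes]


if __name__ == "__main__":
    rng = random.Random(1)

    def blob(cx, cy, m):  # synthetic 2-D cluster, illustration only
        return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(m)]

    X_lab = blob(0, 0, 5) + blob(4, 4, 5)  # five labeled samples per class
    y_lab = ["sand"] * 5 + ["shale"] * 5
    X_unl = blob(0, 0, 100) + blob(4, 4, 100)
    models = tri_train([NearestCentroid] * 3, X_lab, y_lab, X_unl)
    print(tri_predict(models, [(0.0, 0.0), (4.0, 4.0)]))
```

An update for learner i is accepted only when the product of the estimated error rate and the pseudo-labeled set size decreases, which is the condition Zhou and Li use to keep pseudo-label noise in check. With three learners of the same type, as in this toy run, diversity comes only from the bootstrap samples; combining RF, GBDT, and SVM presumably adds diversity from the different model families as well.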
Data availability
The data are not publicly available for reasons of data privacy. Other relevant materials from the current study are available from the corresponding author on reasonable request.
Funding
The research was supported by the National Natural Science Foundation of China (Nos. 41374116 and 41674113).
Author information
Contributions
All authors contributed to the study conception and design. Xinyi Zhu: Methodology, Software, Visualization, Writing—original draft. Hongbing Zhang: Data collection, Conceptualization, Writing—review & editing. Quan Ren: Validation, Software, Data curation. Dailu Zhang: Supervision, Parameter analysis. Fanxing Zeng: Resources, Data curation. Xinjie Zhu: Resources, Data curation. Lingyuan Zhang: Programming, Visualization. All authors read and approved the final manuscript.
Ethics declarations
Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Communicated by: H. Babaie
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhu, X., Zhang, H., Ren, Q. et al. A Tri-Training method for lithofacies identification under scarce labeled logging data. Earth Sci Inform 16, 1489–1501 (2023). https://doi.org/10.1007/s12145-023-00986-w
DOI: https://doi.org/10.1007/s12145-023-00986-w