A Novel Hybrid Technique of Integrating Gradient-Boosted Machine and Clustering Algorithms for Lithology Classification

  • Solomon Asante-Okyere
  • Chuanbo Shen (corresponding author)
  • Yao Yevenyo Ziggah
  • Mercy Moses Rulegeya
  • Xiangfeng Zhu
Original Paper


A significant body of recent research on lithology identification has focused on improving classification performance through hybrid machine learning methods. To the best of our knowledge, however, no hybrid lithology classification model that integrates the clustering results of well log data has yet been developed. This study therefore exploits the results of clustering well log data into two and three groups, using K-means and Gaussian mixture models (GMM), to construct a more accurate gradient-boosted machine (GBM) lithology model. The K-means-based GBM classifiers achieved improved classification accuracy, while the GMM-based GBM classifiers showed enhanced performance when the developed classifiers were tested on the entire dataset. A close examination of the confusion matrices generated by the classifiers further revealed that the performance gain of the clustering-based hybrid GBM models stems from better recognition of mudstone and siltstone, the main lithofacies found in the Southern Basin of the South Yellow Sea. These findings demonstrate that a clustering-based hybrid GBM model classifies new, independent lithofacies data better than a standalone GBM.
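The abstract describes the hybrid idea only at a high level. The sketch below shows one plausible way to realize it with scikit-learn, assuming that the cluster labels produced by K-means or a GMM (with two and three groups) are appended to the log curves as extra input features for the GBM, and that per-class behaviour is then read from a confusion matrix. The synthetic data, the four-curve feature layout, the helper names (`make_synthetic_logs`, `add_cluster_features`, `evaluate`) and all hyperparameters are hypothetical and are not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_synthetic_logs(n_per_class=200, seed=0):
    """Hypothetical stand-in for well log data: 3 lithofacies, 4 log curves."""
    rng = np.random.default_rng(seed)
    # Invented class means for four curves (think GR, AC, DEN, RT); not real data.
    means = np.array([[110.0, 80.0, 2.55, 5.0],
                      [85.0, 70.0, 2.50, 12.0],
                      [60.0, 65.0, 2.40, 30.0]])
    spread = np.array([12.0, 6.0, 0.05, 6.0])
    X = np.vstack([m + spread * rng.standard_normal((n_per_class, 4)) for m in means])
    y = np.repeat(np.arange(len(means)), n_per_class)
    return X, y


def add_cluster_features(X_train, X_test, method="kmeans", n_groups=(2, 3), seed=0):
    """Append one cluster-label column per group count (assumed 2 and 3).

    Clustering is fitted on the standardized training logs only and then
    applied to the test logs, so no test information leaks into the features.
    """
    scaler = StandardScaler().fit(X_train)
    Z_train, Z_test = scaler.transform(X_train), scaler.transform(X_test)
    train_cols, test_cols = [X_train], [X_test]
    for k in n_groups:
        if method == "kmeans":
            model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        elif method == "gmm":
            model = GaussianMixture(n_components=k, random_state=seed)
        else:
            raise ValueError(f"unknown clustering method: {method}")
        model.fit(Z_train)
        train_cols.append(model.predict(Z_train).reshape(-1, 1))
        test_cols.append(model.predict(Z_test).reshape(-1, 1))
    return np.hstack(train_cols), np.hstack(test_cols)


def evaluate(X, y, method=None, seed=0):
    """Train a GBM on raw logs (method=None) or on logs plus cluster labels."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    if method is not None:
        X_tr, X_te = add_cluster_features(X_tr, X_te, method=method, seed=seed)
    gbm = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    pred = gbm.predict(X_te)
    return accuracy_score(y_te, pred), confusion_matrix(y_te, pred)


if __name__ == "__main__":
    X, y = make_synthetic_logs()
    for method in (None, "kmeans", "gmm"):
        acc, cm = evaluate(X, y, method)
        print(method or "plain GBM", round(acc, 3))
        print(cm)
```

The clustering models are fitted on the training split only, which keeps the comparison with the plain GBM fair. The integer cluster labels are passed to the trees directly, which is workable for two or three groups; one-hot encoding them is an alternative. Dividing each diagonal entry of a confusion matrix by its row total gives the per-class recall, the kind of check used to attribute gains to specific lithofacies such as mudstone and siltstone.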


Keywords: K-means; Gaussian mixture models; Gradient-boosted machine; Lithology



This work was supported by the Major National Science and Technology Programs in the “Thirteenth Five-Year” Plan period (Nos. 2016ZX05024-002-005, 2017ZX05032-002-004), the Outstanding Youth Funding of Natural Science Foundation of Hubei Province (No. 2016CFA055), the Program of Introducing Talents of Discipline to Universities (No. B14031), and the Fundamental Research Fund for the Central Universities, China University of Geosciences (Wuhan, No. CUGCJ1820).



Copyright information

© International Association for Mathematical Geosciences 2019

Authors and Affiliations

  • Solomon Asante-Okyere (1, 2)
  • Chuanbo Shen (1, 2), corresponding author
  • Yao Yevenyo Ziggah (3)
  • Mercy Moses Rulegeya (1, 2)
  • Xiangfeng Zhu (1, 2)

  1. Key Laboratory of Tectonics and Petroleum Resources, Ministry of Education, China University of Geosciences, Wuhan, China
  2. Department of Petroleum Geology, School of Earth Resources, China University of Geosciences, Wuhan, China
  3. Department of Geomatic Engineering, Faculty of Mineral Resource Technology, University of Mines and Technology, Tarkwa, Ghana
