
An investigation of feature selection methods for soil liquefaction prediction based on tree-based ensemble algorithms using AdaBoost, gradient boosting, and XGBoost

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Previous major earthquake events have revealed that liquefaction-susceptible soils are one of the factors causing significant damage to structures. Accurate prediction of the liquefaction phenomenon is therefore an important task in earthquake engineering. Over the past decade, machine learning (ML) methods have been extensively applied to predict soil liquefaction. This paper presents the prediction of soil liquefaction from an SPT dataset using relatively new and robust tree-based ensemble algorithms, namely Adaptive Boosting (AdaBoost), Gradient Boosting Machine (GBM), and eXtreme Gradient Boosting (XGBoost). The main contributions of this paper are as follows. First, stratified random sampling was used to ensure balanced sampling across the classes. Second, feature selection methods, namely Recursive Feature Elimination (RFE), Boruta, and stepwise regression, were applied to develop models with high accuracy and minimal complexity by selecting the variables with the greatest predictive power. Third, the performance of the ML algorithms combined with the feature selection methods was compared in terms of four metrics (overall accuracy, precision, recall, and F-measure) to select the best model. Finally, the best predictive model was identified using a statistical significance test, Wilcoxon's signed-rank test. Furthermore, the computational cost of the tree-based ensemble algorithms was analyzed under parallel and non-parallel processing. The results suggest that all of the developed tree-based ensemble models can reliably estimate soil liquefaction. In conclusion, according to both the validation and the statistical results, the XGBoost model with Boruta feature selection achieved the most stable and best prediction performance among the considered models in all cases.
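The workflow described in the abstract (stratified sampling, feature selection, tree-based ensemble training, metric-based evaluation, and a significance test) can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the dataset file, column names, hyperparameters, and per-fold scores are hypothetical placeholders, and scikit-learn's RFE with an XGBoost base estimator stands in for the feature selection step.

```python
# Minimal sketch of the pipeline described in the abstract (illustrative only).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from xgboost import XGBClassifier
from scipy.stats import wilcoxon

# Hypothetical SPT case-history table: predictor columns plus a binary
# "liquefied" label (1 = liquefaction observed, 0 = no liquefaction).
data = pd.read_csv("spt_liquefaction.csv")            # placeholder file name
X, y = data.drop(columns=["liquefied"]), data["liquefied"]

# Stratified random sampling: the class ratio is preserved in train/test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Feature selection: RFE with an XGBoost base estimator (the paper also
# evaluates Boruta and stepwise regression for this step).
selector = RFE(XGBClassifier(n_estimators=100, random_state=42),
               n_features_to_select=5)
selector.fit(X_tr, y_tr)
selected = X_tr.columns[selector.support_]

# Fit the final ensemble model on the selected features only.
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                      random_state=42)
model.fit(X_tr[selected], y_tr)
pred = model.predict(X_te[selected])

# The four performance metrics reported in the paper.
print("Accuracy :", accuracy_score(y_te, pred))
print("Precision:", precision_score(y_te, pred))
print("Recall   :", recall_score(y_te, pred))
print("F-measure:", f1_score(y_te, pred))

# Wilcoxon signed-rank test on paired per-fold scores of two competing
# model/feature-selection combinations (scores here are made-up placeholders).
scores_xgb_boruta = np.array([0.91, 0.89, 0.93, 0.90, 0.92])
scores_gbm_rfe = np.array([0.88, 0.87, 0.90, 0.89, 0.90])
stat, p_value = wilcoxon(scores_xgb_boruta, scores_gbm_rfe)
print("Wilcoxon p-value:", p_value)
```

Swapping the RFE step for a Boruta implementation or a stepwise procedure, and the XGBoost classifier for AdaBoost or GBM, would reproduce the other model/selector combinations compared in the study under the same evaluation loop.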



Funding

No funding was received for conducting this study.

Author information


Contributions

SD: Conceptualization, Investigation, Writing – review and editing, Writing – original draft, Visualization. EKS: Conceptualization, Methodology, Software, Writing – review and editing, Writing – original draft, Visualization.

Corresponding author

Correspondence to Selçuk Demir.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Data availability

The dataset analyzed during the current study is publicly available at the location cited in the reference section.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Demir, S., Sahin, E.K. An investigation of feature selection methods for soil liquefaction prediction based on tree-based ensemble algorithms using AdaBoost, gradient boosting, and XGBoost. Neural Comput & Applic 35, 3173–3190 (2023). https://doi.org/10.1007/s00521-022-07856-4


  • DOI: https://doi.org/10.1007/s00521-022-07856-4
