
An investigation of feature selection methods for soil liquefaction prediction based on tree-based ensemble algorithms using AdaBoost, gradient boosting, and XGBoost

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Previous major earthquake events have revealed that liquefaction-susceptible soils are one of the factors causing significant damage to structures. Accurate prediction of the liquefaction phenomenon is therefore an important task in earthquake engineering. Over the past decade, machine learning (ML) methods have been extensively applied to predict soil liquefaction. This paper presents the prediction of soil liquefaction from an SPT dataset using relatively new and robust tree-based ensemble algorithms, namely Adaptive Boosting (AdaBoost), Gradient Boosting Machine (GBM), and eXtreme Gradient Boosting (XGBoost). The main contributions of this paper are as follows. First, stratified random sampling was used to ensure balanced sampling across the classes. Second, feature selection methods, namely Recursive Feature Elimination (RFE), Boruta, and stepwise regression, were applied to develop models with high accuracy and minimal complexity by selecting the variables with the greatest predictive power. Third, the performance of the ML algorithms combined with the feature selection methods was compared in terms of four metrics (overall accuracy, precision, recall, and F-measure) to select the best model. Finally, the best predictive model was identified using a statistical significance test, Wilcoxon's signed-rank test. Furthermore, the computational cost of the tree-based ensemble algorithms was analyzed under parallel and non-parallel processing. The results suggest that all of the developed tree-based ensemble models can reliably estimate soil liquefaction. In conclusion, according to both the validation and the statistical results, the XGBoost model with Boruta feature selection achieved the most stable and best prediction performance among the considered models in all cases.
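The workflow described in the abstract (stratified sampling, feature selection, tree-based ensemble training, metric-based evaluation, and a significance test) can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the dataset file, column names, hyperparameters, and per-fold scores are hypothetical placeholders, and scikit-learn's RFE with an XGBoost base estimator stands in for the feature selection step.

```python
# Minimal sketch of the pipeline described in the abstract (illustrative only).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from xgboost import XGBClassifier
from scipy.stats import wilcoxon

# Hypothetical SPT case-history table: predictor columns plus a binary
# "liquefied" label (1 = liquefaction observed, 0 = no liquefaction).
data = pd.read_csv("spt_liquefaction.csv")            # placeholder file name
X, y = data.drop(columns=["liquefied"]), data["liquefied"]

# Stratified random sampling: the class ratio is preserved in train/test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Feature selection: RFE with an XGBoost base estimator (the paper also
# evaluates Boruta and stepwise regression for this step).
selector = RFE(XGBClassifier(n_estimators=100, random_state=42),
               n_features_to_select=5)
selector.fit(X_tr, y_tr)
selected = X_tr.columns[selector.support_]

# Fit the final ensemble model on the selected features only.
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                      random_state=42)
model.fit(X_tr[selected], y_tr)
pred = model.predict(X_te[selected])

# The four performance metrics reported in the paper.
print("Accuracy :", accuracy_score(y_te, pred))
print("Precision:", precision_score(y_te, pred))
print("Recall   :", recall_score(y_te, pred))
print("F-measure:", f1_score(y_te, pred))

# Wilcoxon signed-rank test on paired per-fold scores of two competing
# model/feature-selection combinations (scores here are made-up placeholders).
scores_xgb_boruta = np.array([0.91, 0.89, 0.93, 0.90, 0.92])
scores_gbm_rfe = np.array([0.88, 0.87, 0.90, 0.89, 0.90])
stat, p_value = wilcoxon(scores_xgb_boruta, scores_gbm_rfe)
print("Wilcoxon p-value:", p_value)
```

Swapping the RFE step for a Boruta implementation or a stepwise procedure, and the XGBoost classifier for AdaBoost or GBM, would reproduce the other model/selector combinations compared in the study under the same evaluation loop.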



Funding

No funding was received for conducting this study.

Author information


Contributions

SD: Conceptualization, Investigation, Writing – review and editing, Writing – original draft, Visualization. EKS: Conceptualization, Methodology, Software, Writing – review and editing, Writing – original draft, Visualization.

Corresponding author

Correspondence to Selçuk Demir.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Data availability

The dataset analyzed during the current study is publicly available at the location cited in the reference section.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Demir, S., Sahin, E.K. An investigation of feature selection methods for soil liquefaction prediction based on tree-based ensemble algorithms using AdaBoost, gradient boosting, and XGBoost. Neural Comput & Applic 35, 3173–3190 (2023). https://doi.org/10.1007/s00521-022-07856-4


  • DOI: https://doi.org/10.1007/s00521-022-07856-4
