Abstract
Previous major earthquakes have shown that liquefaction-susceptible soils are one of the factors causing significant damage to structures. Accurate prediction of liquefaction is therefore an important task in earthquake engineering. Over the past decade, several researchers have extensively applied machine learning (ML) methods to predict soil liquefaction. This paper presents the prediction of soil liquefaction from a standard penetration test (SPT) dataset using relatively new and robust tree-based ensemble algorithms, namely Adaptive Boosting (AdaBoost), Gradient Boosting Machine, and eXtreme Gradient Boosting (XGBoost). The contributions of this paper are as follows. First, Stratified Random Sampling was used to ensure balanced sampling from each class. Second, feature selection methods such as Recursive Feature Elimination, Boruta, and Stepwise Regression were applied to develop models with high accuracy and minimal complexity by selecting the variables with significant predictive power. Third, the performance of the ML algorithms combined with the feature selection methods was compared in terms of four performance metrics (Overall Accuracy, Precision, Recall, and F-measure) to select the best model. Lastly, the best predictive model was determined using a statistical significance test, Wilcoxon's signed-rank test. Furthermore, computational cost analyses of the tree-based ensemble algorithms were performed for parallel and non-parallel processing. The results suggest that all developed tree-based ensemble models could reliably estimate soil liquefaction. According to both the validation and statistical results, XGBoost combined with Boruta achieved more stable and better prediction performance than the other models in all considered cases.
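As a rough illustration of two steps named in the abstract, the sketch below shows a stratified random split and the four performance metrics computed from binary predictions. It is a minimal standard-library example, not the authors' implementation. The function names, the 30% test fraction, and the convention that label 1 means "liquefied" are all assumptions made here for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices so each class keeps roughly the same share in both parts.

    test_fraction=0.3 is an illustrative assumption, not a value from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random sampling within each class (stratum)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def classification_metrics(y_true, y_pred, positive=1):
    """Overall Accuracy, Precision, Recall, and F-measure for one positive class.

    positive=1 (hypothetically "liquefied") is an assumed label convention.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

With six liquefied and four non-liquefied cases and `test_fraction=0.5`, for example, `stratified_split` places three liquefied and two non-liquefied cases in the test set, so both subsets keep the original 60/40 class ratio.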
References
Towhata I (2008) Geotechnical earthquake engineering. Springer-Verlag, Berlin
Ishihara K, Koga Y (1981) Case studies of liquefaction in the 1964 Niigata earthquake. Soils Found 21(3):35–52
Seed HB, Idriss IM (1967) Analysis of soil liquefaction: Niigata earthquake. J Soil Mech Found Div 93(3):83–108
Youd T (2014) Ground failure investigations following the 1964 Alaska Earthquake. In: Proceedings of the 10th National Conference in Earthquake Engineering, Earthquake Engineering Research Institute, Anchorage, AK
Chen L, Yuan X, Cao Z, Hou L, Sun R, Dong L, Wang W, Meng F, Chen H (2009) Liquefaction macrophenomena in the great Wenchuan earthquake. Earthq Eng Eng Vib 8(2):219–229
Orense RP, Kiyota T, Yamada S, Cubrinovski M, Hosono Y, Okamura M, Yasuda S (2011) Comparison of liquefaction features observed during the 2010 and 2011 Canterbury earthquakes. Seis Res Lett 82(6):905–918
Yasuda S, Harada K, Ishikawa K, Kanemaru Y (2012) Characteristics of liquefaction in Tokyo Bay area by the 2011 Great East Japan earthquake. Soils Found 52(5):793–810
Papathanassiou G, Mantovani A, Tarabusi G, Rapti D, Caputo R (2015) Assessment of liquefaction potential for two liquefaction prone areas considering the May 20, 2012 Emilia (Italy) earthquake. Eng Geol 189:1–16
Seed HB, Idriss IM (1971) Simplified procedure for evaluating soil liquefaction potential. J Soil Mech Found Div 97(9):1249–1273
Robertson PK, Wride C (1998) Evaluating cyclic liquefaction potential using the cone penetration test. Can Geotech J 35(3):442–459
Andrus RD, Stokoe KH II (2000) Liquefaction resistance of soils from shear-wave velocity. J Geotech Geoenviron Eng 126(11):1015–1025
Cetin KO, Seed RB, Der Kiureghian A, Tokimatsu K, Harder LF Jr, Kayen RE, Moss RE (2004) Standard penetration test-based probabilistic and deterministic assessment of seismic soil liquefaction potential. J Geotech Geoenviron Eng 130(12):1314–1340
Moss R, Seed RB, Kayen RE, Stewart JP, Der Kiureghian A, Cetin KO (2006) CPT-based probabilistic and deterministic assessment of in situ seismic soil liquefaction potential. J Geotech Geoenviron Eng 132(8):1032–1051
Kayen R, Moss R, Thompson E, Seed R, Cetin K, Kiureghian AD, Tanaka Y, Tokimatsu K (2013) Shear-wave velocity–based probabilistic and deterministic assessment of seismic soil liquefaction potential. J Geotech Geoenviron Eng 139(3):407–419
Boulanger R, Idriss I (2014) CPT and SPT based liquefaction triggering procedures. Report No UCD/CGM-14/01
Boulanger RW, Idriss I (2016) CPT-based liquefaction triggering procedure. J Geotech Geoenviron Eng 142(2):04015065
Cetin KO, Seed RB, Kayen RE, Moss RE, Bilge HT, Ilgac M, Chowdhury K (2018) SPT-based probabilistic and deterministic assessment of seismic soil liquefaction triggering hazard. Soil Dynam Earthq Eng 115:698–709
Zhang W, Li H, Li Y, Liu H, Chen Y, Ding X (2021) Application of deep learning algorithms in geotechnical engineering: a short critical review. Artif Intell Rev 54:5633–5673. https://doi.org/10.1007/s10462-021-09967-1
Durante MG, Rathje EM (2021) An exploration of the use of machine learning to predict lateral spreading. Earthq Spect. https://doi.org/10.1177/87552930211004613
Xie Y, Ebad Sichani M, Padgett JE, DesRoches R (2020) The promise of implementing machine learning in earthquake engineering: a state-of-the-art review. Earthq Spect 36(4):1769–1801
Goh AT (1996) Neural-network modeling of CPT seismic liquefaction data. J Geotech Geoenviron Eng 122(1):70–73
Pal M (2006) Support vector machines-based modelling of seismic liquefaction potential. Int J Num Anal Meth Geomech 30(10):983–996
Goh AT, Goh S (2007) Support vector machines: their use in geotechnical engineering as illustrated using seismic liquefaction data. Comput Geotech 34(5):410–421
Hanna AM, Ural D, Saygili G (2007) Neural network model for liquefaction potential in soil deposits using Turkey and Taiwan earthquake data. Soil Dynam Earthq Eng 27(6):521–540
Ülgen D, Engin HK (2007) A study of CPT based liquefaction assessment using artificial neural networks. In: 4th international conference on earthquake geotechnical engineering, pp. 1–12
Rezania M, Faramarzi A, Javadi AA (2011) An evolutionary based approach for assessment of earthquake-induced soil liquefaction and lateral displacement. Eng Appl Artif Intell 24(1):142–153
Zhang J, Zhang LM, Huang HW (2013) Evaluation of generalized linear models for soil liquefaction probability prediction. Environ Earth Sci 68(7):1925–1933
Kohestani V, Hassanlourad M, Ardakani A (2015) Evaluation of liquefaction potential based on CPT data using random forest. Nat Hazards 79(2):1079–1089
Hoang N-D, Bui DT (2018) Predicting earthquake-induced soil liquefaction based on a hybridization of kernel Fisher discriminant analysis and a least squares support vector machine: a multi-dataset study. Bull Eng Geol Env 77(1):191–204
Pirhadi N, Tang X, Yang Q, Kang F (2019) A new equation to evaluate liquefaction triggering using the response surface method and parametric sensitivity analysis. Sustainability 11(1):112
Zhou J, Li E, Wang M, Chen X, Shi X, Jiang L (2019) Feasibility of stochastic gradient boosting approach for evaluating seismic liquefaction potential based on SPT and CPT case histories. J Perform Constr Facil 33(3):04019024
Cai M, Hocine O, Mohammed AS, Chen X, Amar MN, Hasanipanah M (2021) Integrating the LSSVM and RBFNN models with three optimization algorithms to predict the soil liquefaction potential. Eng Comput, 1–13
Zhao Z, Duan W, Cai G (2021) A novel PSO-KELM based soil liquefaction potential evaluation system using CPT and Vs measurements. Soil Dynam Earthq Eng 150:106930
Wang L, Wu C, Tang L, Zhang W, Lacasse S, Liu H, Gao L (2020) Efficient reliability analysis of earth dam slope stability using extreme gradient boosting method. Acta Geotech 15(11):3135–3150
Wang M-X, Huang D, Wang G, Li D-Q (2020) SS-XGBoost: a machine learning framework for predicting Newmark sliding displacements of slopes. J Geotech Geoenviron Eng 146(9):04020074
Bharti JP, Mishra P, Sathishkumar V, Cho Y, Samui P (2021) Slope stability analysis using RF, GBM, CART, BT and XGBoost. Geotech Geol Eng 39(5):3741–3752
Zhang W, Wu C, Zhong H, Li Y, Wang L (2021) Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization. Geosci Front 12(1):469–477
Polikar R (2012) Ensemble learning. Ensemble machine learning. Springer, pp. 1–34
Worasucheep C (2021) Ensemble classifier for stock trading recommendation. Appl Artif Intell, 1–32
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Quinlan JR (1996) Bagging, boosting, and C4.5. AAAI/IAAI 1:725–730
Rocca J (2019) Ensemble methods: bagging, boosting and stacking. Medium, Towards Data Science. https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205
Papadopoulos S, Azar E, Woon W-L, Kontokosta CE (2018) Evaluation of tree-based ensemble learning algorithms for building energy performance estimation. J Build Perform Simul 11(3):322–332
Bou-hamad I, Larocque D, Ben-Ameur H, Mâsse LC, Vitaro F, Tremblay RE (2009) Discrete-time survival trees. Can J Stat 37(1):17–32
Sabbeh SF (2018) Machine-learning techniques for customer retention: a comparative study. Int J Adv Comput Sci Appl, 9(2). https://doi.org/10.14569/IJACSA.2018.090238
Qi Y, Bar-Joseph Z, Klein-Seetharaman J (2006) Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Proteins 63(3):490–500
Musbah H, Aly HH, Little TA (2021) Energy management of hybrid energy system sources based on machine learning classification algorithms. Electric Power Syst Res 199:107436
Muhammad L, Islam MM, Usman SS, Ayon SI (2020) Predictive data mining models for novel coronavirus (COVID-19) infected patients’ recovery. SN Comp Sci 1(4):1–7
Pham BT, Nguyen MD, Nguyen-Thoi T, Ho LS, Koopialipoor M, Quoc NK, Armaghani DJ, Van Le H (2021) A novel approach for classification of soils based on laboratory tests using Adaboost, Tree and ANN modeling. Transp Geotech 27:100508
Wang X, Li Z, Shafieezadeh A (2021) Seismic response prediction and variable importance analysis of extended pile-shaft-supported bridges against lateral spreading: exploring optimized machine learning models. Eng Struct 236:112142
Chen Z, Li H, Goh ATC, Wu C, Zhang W (2020) Soil liquefaction assessment using soft computing approaches based on capacity energy concept. Geosciences 10(9):330
Guyon I, Gunn S, Nikravesh M, Zadeh LA (2008) Feature extraction: foundations and applications. Springer, Berlin
Zheng A, Casari A (2018) Feature engineering for machine learning: principles and techniques for data scientists. O’Reilly Media Inc, Sebastopol
Das SK, Mohanty R, Mohanty M, Mahamaya M (2020) Multi-objective feature selection (MOFS) algorithms for prediction of liquefaction susceptibility of soil based on in situ test methods. Nat Hazards 103:2371–2393
Kuhn M, Johnson K (2019) Feature engineering and selection: A practical approach for predictive models. CRC Press, Boca Raton
Hu J (2021) Data cleaning and feature selection for gravelly soil liquefaction. Soil Dynam Earthq Eng 145:106711
Demir S, Sahin EK (2021) Assessment of feature selection for liquefaction prediction based on recursive feature elimination. Eur J Sci Tech 28:290–294
R Development Core Team (2020) R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.r-project.org
Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evolut Comput 1(1):3–18
Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139
Hastie T, Rosset S, Zhu J, Zou H (2009) Multi-class adaboost. Stat Interface 2(3):349–360
An T-K, Kim M-H (2010) A new diverse AdaBoost classifier. In: 2010 International conference on artificial intelligence and computational intelligence. IEEE, pp 359–363
Natekin A, Knoll A (2013) Gradient boosting machines, a tutorial. Front Neurorobot 7:21
Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 785–794
Qin C, Zhang Y, Bao F, Zhang C, Liu P, Liu P (2021) XGBoost optimized by adaptive particle swarm optimization for credit scoring. Math Probl Eng. https://doi.org/10.1155/2021/6655510
XGBoost-Documentation (2021). https://xgboost.readthedocs.io/en/stable/. Accessed 16 Sept 2021
Zhang H, Qiu D, Wu R, Deng Y, Ji D, Li T (2019) Novel framework for image attribute annotation with gene selection XGBoost algorithm and relative attribute model. Appl Soft Comput 80:57–79
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1):389–422
Shi F, Peng X, Liu Z, Li E, Hu Y (2020) A data-driven approach for pipe deformation prediction based on soil properties and weather conditions. Sustain Cities Soc 55:102012
Sun D, Shi S, Wen H, Xu J, Zhou X, Wu J (2021) A hybrid optimization method of factor screening predicated on GeoDetector and random forest for landslide susceptibility mapping. Geomorphology 379:107623
Svetnik V, Liaw A, Tong C, Wang T (2004) Application of Breiman’s random forest to modeling structure-activity relationships of pharmaceutical molecules. In: International workshop on multiple classifier systems. Springer, pp 334–343
Paja W, Pancerz K, Grochowalski P (2018) Generational feature elimination and some other ranking feature selection methods. Advances in feature selection for data and pattern recognition. Springer, pp. 97–112
Kursa MB, Rudnicki WR (2010) Feature selection with the Boruta package. J Stat Softw 36(11):1–13
Stańczyk U, Zielosko B, Jain LC (2018) Advances in feature selection for data and pattern recognition: an introduction. Advances in feature selection for data and pattern recognition. Springer, pp 1–9
Breaux HJ (1967) On stepwise multiple linear regression. Report no. 1369. Ballistic Research Laboratories, Aberdeen Proving Ground, Maryland
Kumar S, Attri S, Singh K (2019) Comparison of Lasso and stepwise regression technique for wheat yield prediction. J Agrometeorol 21(2):188–192
Chowdhury MZI, Turin TC (2020) Variable selection strategies and its importance in clinical prediction modelling. Fam Med Commun Health 8(1):e000262. https://doi.org/10.1136/fmch-2019-000262
Huang C, Townshend J (2003) A stepwise regression tree for nonlinear approximation: applications to estimating subpixel land cover. Int J Remote Sens 24(1):75–90
Huang C, Davis L, Townshend J (2002) An assessment of support vector machines for land cover classification. Int J Remote Sens 23(4):725–749
Maxwell AE, Warner TA, Fang F (2018) Implementation of machine-learning classification in remote sensing: an applied review. Int J Remote Sens 39(9):2784–2817
Etikan I, Bala K (2017) Sampling and sampling methods. Biom Biostat Int J 5(6):00149
Berndt AE (2020) Sampling methods. J Hum Lact 36(2):224–226
Fink A (2003) How to sample in surveys. Sage, Thousand Oaks
Samui P, Sitharam T (2011) Machine learning modelling for predicting soil liquefaction susceptibility. Nat Hazards Earth Syst Sci 11(1):1–9
Demir S, Sahin EK (2022) Comparison of tree-based machine learning algorithms for predicting liquefaction potential using canonical correlation forest, rotation forest, and random forest based on CPT data. Soil Dynam Earthq Eng 154:107130
Ao S-I (2008) Data mining and applications in genomics. Springer Science & Business Media, Berlin
Sahin EK (2022) Comparative analysis of gradient boosting algorithms for landslide susceptibility mapping. Geocarto Int 37(9):2441–2465. https://doi.org/10.1080/10106049.2020.1831623
Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28(1):1–26
Keyport RN, Oommen T, Martha TR, Sajinkumar K, Gierke JS (2018) A comparative analysis of pixel-and object-based detection of landslides from very high-resolution images. Int J App Earth Obs Geoinf 64:1–11
Funding
No funding was received for conducting this study.
Contributions
SD: Conceptualization, Investigation, Writing-review and editing, Writing-original draft, Visualization. EKS: Conceptualization, Methodology, Software, Writing-review and editing, Writing-original draft, Visualization.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Data availability
The dataset analyzed during the current study is publicly available at the location cited in the reference section.
About this article
Cite this article
Demir, S., Sahin, E.K. An investigation of feature selection methods for soil liquefaction prediction based on tree-based ensemble algorithms using AdaBoost, gradient boosting, and XGBoost. Neural Comput & Applic 35, 3173–3190 (2023). https://doi.org/10.1007/s00521-022-07856-4
DOI: https://doi.org/10.1007/s00521-022-07856-4