Abstract
Machine learning algorithms have recently been applied to build landslide susceptibility maps. The objective of this study is to determine whether machine learning classification algorithms are suitable for obtaining the safety factor from a high-risk-area (HRA) model composed of eight geotechnical properties. Each property value is used as an input to the machine learning model, and the safety factor is the output. Because the data have a discontinuous pattern, they are preprocessed with label encoding and transformed into continuous data. The DT, KNN, LR, RF, and SVM algorithms are selected to perform the classification with a training-to-validation ratio of 7:3. To improve the reliability of the results, the classification is also performed after applying the PCA technique, which reduces the eight dimensions to two principal components. In addition, the SMOTE technique is used to oversample the data so that every class has an equal number of samples, which addresses the class imbalance problem, and the classification results are compared. PCA shows a limited ability to reflect the characteristics of the original data, whereas the data oversampled by SMOTE provide high reliability. The results show that RF is suitable for classification with high accuracy for safety factors in the range of 1.2–2.4. This study demonstrates that even discontinuous data can be classified after preprocessing, and that SMOTE can improve the accuracy of landslide risk mapping.
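The oversampling step described in the abstract can be illustrated with a minimal SMOTE-style sketch. This is not the authors' implementation, which the abstract does not give. The function name `smote_oversample`, the neighbour count `k`, and the toy data (three hypothetical safety-factor classes with eight made-up features each) are all assumptions for illustration. Each synthetic sample is placed on the line segment between a minority-class point and one of its nearest same-class neighbours, until every class matches the size of the largest one.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=5, seed=0):
    """Balance classes by interpolating between same-class neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # every class is raised to the majority size
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if n == target or len(members) < 2:
            continue  # already balanced, or no neighbour to interpolate with
        for _ in range(target - n):
            base = rng.choice(members)
            # k nearest same-class neighbours of the base point (excluding itself)
            others = [m for m in members if m is not base]
            neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
            mate = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> mate
            X_out.append([b + gap * (m - b) for b, m in zip(base, mate)])
            y_out.append(label)
    return X_out, y_out


# Toy data (hypothetical): 3 imbalanced classes, 8 features per sample.
rng = random.Random(42)
sizes = {0: 30, 1: 8, 2: 4}
X = [[rng.gauss(label, 0.3) for _ in range(8)]
     for label, n in sizes.items() for _ in range(n)]
y = [label for label, n in sizes.items() for _ in range(n)]

X_bal, y_bal = smote_oversample(X, y, k=3)
print("before:", Counter(y))
print("after: ", Counter(y_bal))
```

In a workflow like the one in the abstract, oversampling of this kind would normally be applied to the training portion only, so that the validation samples remain original observations. Whether the study did so is not stated here.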
Data availability
All data, models, or code generated or used during the study are available from the corresponding author by request.
Funding
This research was supported by the Daejeon University Research Grants (2023).
Author information
Contributions
Sewon Kim: methodology, software, formal analysis. Hyung-Koo Yoon: conceptualization, data process, writing manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Kim, S., Yoon, HK. Application of classification coupled with PCA and SMOTE, for obtaining safety factor of landslide based on HRA. Bull Eng Geol Environ 82, 381 (2023). https://doi.org/10.1007/s10064-023-03403-0
DOI: https://doi.org/10.1007/s10064-023-03403-0