Spatial prediction of landslide susceptibility using data mining-based kernel logistic regression, naive Bayes and RBFNetwork models for the Long County area (China)

  • Original Paper
Bulletin of Engineering Geology and the Environment

Abstract

The main goal of this study is to assess and compare three advanced machine learning techniques, namely kernel logistic regression (KLR), naïve Bayes (NB), and radial basis function network (RBFNetwork), for landslide susceptibility modeling in Long County, China. First, 171 landslide locations were identified within the study area from historical reports, aerial photographs, and extensive field surveys. The landslides were randomly split in a 70/30 ratio into training and validation sets. Second, 12 landslide conditioning factors were prepared for susceptibility modeling: slope aspect, slope angle, plan curvature, profile curvature, elevation, distance to faults, distance to rivers, distance to roads, lithology, NDVI (normalized difference vegetation index), land use, and rainfall. Third, the correlations between the conditioning factors and landslide occurrence were analyzed using normalized frequency ratios. A multicollinearity analysis of the conditioning factors was carried out using tolerance and variance inflation factor (VIF) values. Feature selection was performed using the chi-squared statistic with 10-fold cross-validation to assess the predictive capability of each conditioning factor, and factors with null predictive ability were excluded to optimize the landslide models. Finally, the trained KLR, NB, and RBFNetwork models were used to construct landslide susceptibility maps. The receiver operating characteristic (ROC) curve, the area under the curve (AUC), and several statistical measures, such as accuracy (ACC), F-measure, mean absolute error (MAE), and root mean squared error (RMSE), were used to assess, validate, and compare the resulting models and to select the best one.
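The frequency-ratio and multicollinearity steps described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code. It assumes each conditioning factor has already been rasterized and classified into integer classes, that landslide pixels are marked by a boolean mask, and that the "normalized" frequency ratio is a min-max rescaling of the raw ratios to [0, 1]; the abstract does not state the exact normalization used. The function names `frequency_ratio`, `normalize_fr`, and `tolerance_and_vif` are hypothetical.

```python
import numpy as np


def frequency_ratio(factor_class, landslide):
    """Frequency ratio (FR) for each class of one conditioning factor.

    factor_class: 1-D integer array, class label of every pixel.
    landslide:    1-D boolean array, True where the pixel is a landslide
                  (must contain at least one True).
    FR = (share of landslide pixels in the class) / (share of all pixels in the class).
    """
    total_pixels = factor_class.size
    total_slides = landslide.sum()
    fr = {}
    for c in np.unique(factor_class):
        in_class = factor_class == c
        share_slides = landslide[in_class].sum() / total_slides
        share_area = in_class.sum() / total_pixels
        fr[int(c)] = float(share_slides / share_area)
    return fr


def normalize_fr(fr):
    """Min-max rescale FR values to [0, 1] (assumed normalization scheme)."""
    values = np.array(list(fr.values()))
    lo, hi = values.min(), values.max()
    span = hi - lo if hi > lo else 1.0
    return {c: (v - lo) / span for c, v in fr.items()}


def tolerance_and_vif(X):
    """Return (tolerance, VIF) for each column of the factor matrix X.

    Each factor is regressed (ordinary least squares with an intercept) on
    all other factors; tolerance = 1 - R^2 and VIF = 1 / tolerance.
    """
    n, p = X.shape
    results = []
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals = y - A @ coef
        r2 = 1.0 - residuals.var() / y.var()
        tolerance = 1.0 - r2
        results.append((tolerance, 1.0 / tolerance))
    return results
```

A factor is commonly flagged as collinear when its tolerance falls below about 0.1 (equivalently, VIF above about 10); that threshold is a widespread rule of thumb rather than a value taken from this paper.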
The validation results show that all three models perform reasonably well and that the KLR model delivers the most stable and best performance. The KLR model, with a success rate of 0.847 and a prediction rate of 0.749, is a promising technique for landslide susceptibility mapping. Given these outcomes, all three models could be used effectively for landslide susceptibility analysis.
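The evaluation measures named in the abstract can be computed as in the following NumPy sketch. It assumes, as is common in the landslide susceptibility literature, that the success rate and prediction rate quoted above are AUC values computed on the training and validation sets respectively, and that binary predictions come from thresholding predicted landslide probabilities at 0.5. Neither assumption is confirmed by the abstract, and the function names are hypothetical.

```python
import numpy as np


def auc(y_true, score):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen landslide sample scores
    higher than a randomly chosen non-landslide sample (ties count 1/2).
    """
    y_true = np.asarray(y_true, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y_true], score[~y_true]
    # Compare every positive with every negative (O(n_pos * n_neg) memory).
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def classification_metrics(y_true, prob, threshold=0.5):
    """ACC, F-measure, MAE and RMSE for predicted landslide probabilities."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(prob, dtype=float)
    pred = p >= threshold          # predicted landslide / non-landslide
    truth = y == 1
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    acc = np.mean(pred == truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    mae = np.mean(np.abs(p - y))            # errors use probabilities, not labels
    rmse = np.sqrt(np.mean((p - y) ** 2))
    return {"ACC": float(acc), "F": float(f_measure),
            "MAE": float(mae), "RMSE": float(rmse)}
```

Under the assumption above, calling `auc` on the training samples would give a success-rate figure and calling it on the 30% validation samples a prediction-rate figure. The pairwise comparison is adequate for a few hundred samples; for full raster maps a rank-based formula scales better.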


Acknowledgements

The authors would like to express their gratitude to the Editor-in-Chief Martin Gordon Culshaw and two anonymous reviewers for their helpful comments on the manuscript.

Funding

This research was supported by the China Postdoctoral Science Foundation (Grant No. 2017 M613168), the Shaanxi Province Postdoctoral Science Foundation (Grant No. 2017BSHYDZZ07), the Scientific Research Program funded by the Shaanxi Provincial Education Department (Program No. 17JK0511), the Open Fund of the Shandong Provincial Key Laboratory of Depositional Mineralization & Sedimentary Minerals (Grant No. DMSM2017029), and the Open-ended Fund of the Key Laboratory for Geo-hazard in Loess Areas, Ministry of Land and Resources of China (Grant No. KLGLAMLR201603).

Author information

Corresponding authors

Correspondence to Wei Chen or Haoyuan Hong.

Additional information

Highlights

• KLR, NB, and RBFNetwork models were compared in this study.

• The chi-squared statistic was used to select conditioning factors.

• The ROC curve, ACC, MAE, and RMSE were used to assess the models' performance.

• KLR showed the most promising results.
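The chi-squared factor selection mentioned in the highlights can be illustrated with a small NumPy sketch. It assumes each conditioning factor is discretized into classes, scores it with Pearson's chi-squared statistic against the landslide/non-landslide label, and averages that score over the training portions of 10 random folds. A factor whose average merit is zero (for example, one that takes a single class everywhere) is the kind of factor that would be excluded for null predictive ability. The fold scheme and the names `chi_squared` and `cv_merit` are assumptions, not the authors' documented procedure.

```python
import numpy as np


def chi_squared(factor_class, landslide):
    """Pearson chi-squared statistic between a discretized factor and the
    binary landslide label, using a (classes x 2) contingency table."""
    classes = np.unique(factor_class)
    labels = np.array([False, True])
    observed = np.array(
        [[np.sum((factor_class == c) & (landslide == l)) for l in labels]
         for c in classes],
        dtype=float,
    )
    # Expected counts under independence: row total * column total / grand total.
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    mask = expected > 0  # skip empty cells (e.g. a fold with only one label)
    return float(np.sum((observed[mask] - expected[mask]) ** 2 / expected[mask]))


def cv_merit(X, y, k=10, seed=0):
    """Average chi-squared merit of each column of X over k training folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    merits = np.zeros(X.shape[1])
    for i in range(k):
        # Train on every fold except fold i.
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        for col in range(X.shape[1]):
            merits[col] += chi_squared(X[train, col], y[train])
    return merits / k
```

Ranking the factors by the returned merits and dropping those at zero mirrors the selection step described in the abstract, under the stated assumptions.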

About this article

Cite this article

Chen, W., Yan, X., Zhao, Z. et al. Spatial prediction of landslide susceptibility using data mining-based kernel logistic regression, naive Bayes and RBFNetwork models for the Long County area (China). Bull Eng Geol Environ 78, 247–266 (2019). https://doi.org/10.1007/s10064-018-1256-z

  • DOI: https://doi.org/10.1007/s10064-018-1256-z
