A novel ensemble decision tree-based CHi-squared Automatic Interaction Detection (CHAID) and multivariate logistic regression models in landslide susceptibility mapping

  • Original Paper
  • Landslides

Abstract

An ensemble algorithm of data mining decision tree (DT)-based CHi-squared Automatic Interaction Detection (CHAID) is widely used for prediction analysis in a variety of applications. As a multivariate method, CHAID has an automatic classification capacity that allows it to analyze large numbers of landslide conditioning factors. Moreover, it produces two or more nodes for each independent variable, where every node contains the numbers of presences or absences of landslides (the dependent variable). Other DT methods, such as Quick, Unbiased, Efficient Statistic Tree (QUEST) and Classification and Regression Trees (CRT), cannot produce multi-branch trees. Thus, the main objective of this paper is to use the CHAID method to obtain the best classification fit for each conditioning factor and then to combine it with logistic regression (LR) to find the coefficients of the best-fitting function that assesses the optimal terminal nodes. In the first step, a landslide inventory map with 296 landslide locations was extracted from various sources over the Pohang-Kyeong Joo catchment (South Korea). The inventory was then randomly split into two datasets: 70 % was used for training the models, and the remaining 30 % was used for validation. Thirteen landslide conditioning factors were used for the susceptibility modeling. CHAID was then applied and revealed that some conditioning factors, such as altitude, soil drain, soil texture, and TWI, appeared as terminal nodes and reflected the best classification fit. Next, the proposed ensemble technique was applied, and the interpretation of the coefficients showed that the relationship among the decision tree branch nodes distance from drain, soil drain, and TWI, respectively, leads to a better assessment of landslide consequences in the study area. The validation results showed success and prediction rates of 75 and 79 %, respectively. This study demonstrated the efficiency and reliability of the ensemble DT and LR model in landslide susceptibility mapping.
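
As a rough illustration of two steps named in the abstract, the sketch below shows a random 70/30 split of 296 inventory locations and the Pearson chi-squared statistic for one conditioning factor against landslide presence or absence, which is the kind of quantity CHAID evaluates when choosing splits. It is not the authors' implementation: the function names, the random seed, and the toy contingency table are hypothetical, and the category merging and p-value adjustment that CHAID also performs are omitted.

```python
import random


def split_inventory(n_locations, train_fraction=0.7, seed=0):
    """Randomly split location indices into training and validation lists."""
    indices = list(range(n_locations))
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    rng.shuffle(indices)
    n_train = round(n_locations * train_fraction)
    return indices[:n_train], indices[n_train:]


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table.

    `table` is a list of rows; each row holds the counts for one class of a
    conditioning factor, ordered as (landslide present, landslide absent).
    Assumes no row or column sums to zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# 296 locations, as in the abstract; 70 % for training, 30 % for validation.
train_ids, validation_ids = split_inventory(296)

# Hypothetical counts for a three-class factor (not data from the paper).
toy_table = [[30, 10], [20, 20], [10, 30]]
toy_statistic = chi_squared(toy_table)
```

A larger statistic (a smaller p-value for the same degrees of freedom) indicates a stronger association between the factor classes and landslide occurrence, which is why a chi-squared test is a natural criterion for ranking candidate splits.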

Acknowledgments

This research was supported by a UPM University Research Grant (05-01-11-1283RU) to stimulate research under the RUGS scheme, with project number 9344100, and by National Research Foundation of Korea grants funded by the Korea government (No. 2012M3A2A1050984). We thank two anonymous reviewers for their valuable and constructive comments, which helped us improve the manuscript.

Author information

Corresponding authors

Correspondence to Biswajeet Pradhan or Hyuck-Jin Park.

About this article

Cite this article

Althuwaynee, O.F., Pradhan, B., Park, HJ. et al. A novel ensemble decision tree-based CHi-squared Automatic Interaction Detection (CHAID) and multivariate logistic regression models in landslide susceptibility mapping. Landslides 11, 1063–1078 (2014). https://doi.org/10.1007/s10346-014-0466-0

  • DOI: https://doi.org/10.1007/s10346-014-0466-0
