Abstract
The decision tree (DT)-based CHi-squared Automatic Interaction Detection (CHAID) algorithm, a data mining technique, is widely used for prediction analysis in a variety of applications. As a multivariate method, CHAID can automatically classify large numbers of landslide conditioning factors. Moreover, it produces two or more nodes for each independent variable, where every node contains the number of landslide presences or absences (the dependent variable). Other DT methods, such as Quick, Unbiased, Efficient Statistical Tree (QUEST) and Classification and Regression Trees (CRT), cannot produce multi-branch trees. Thus, the main objective of this paper is to use the CHAID method to find the best classification fit for each conditioning factor and then combine it with logistic regression (LR) to find the coefficients of the best-fitting function that assesses the optimal terminal nodes. First, a landslide inventory map with 296 landslide locations was compiled from various sources over the Pohang-Kyeong Joo catchment (South Korea). The inventory was then randomly split into two datasets: 70 % was used for training the models, and the remaining 30 % was used for validation. Thirteen landslide conditioning factors were used for the susceptibility modeling. CHAID was then applied and revealed that some conditioning factors, such as altitude, soil drain, soil texture, and topographic wetness index (TWI), formed terminal nodes and reflected the best classification fit. Next, the proposed ensemble technique was applied, and interpretation of the coefficients showed that the relationship among the decision tree branch nodes for distance from drain, soil drain, and TWI leads to a better assessment of landslide consequences in the study area. The validation results showed success and prediction rates of 75 and 79 %, respectively. This study demonstrated the efficiency and reliability of the ensemble DT and LR model in landslide susceptibility mapping.
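The two data-handling steps named in the abstract, the random 70/30 split of the inventory and the chi-squared comparison that CHAID relies on, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the random seed, and the class counts are hypothetical, and ranking factors by the raw Pearson statistic is a simplification that is only comparable when all tables have the same number of classes (CHAID proper merges categories and compares adjusted p-values).

```python
import random


def split_inventory(locations, train_fraction=0.7, seed=0):
    """Randomly split landslide locations into training and validation sets."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    shuffled = list(locations)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table.

    Each row is one class of a conditioning factor, given as
    [absence_count, presence_count].
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def best_factor(tables):
    """Return the factor whose table has the largest chi-squared statistic.

    Simplification: valid only when all tables share the same shape.
    """
    return max(tables, key=lambda name: chi_squared(tables[name]))


if __name__ == "__main__":
    # 296 locations, as in the abstract; the IDs themselves are placeholders.
    train, valid = split_inventory(range(296))
    print(len(train), len(valid))

    # Made-up counts, for illustration only (not data from the study).
    example_tables = {
        "altitude": [[10, 20], [20, 10]],
        "aspect": [[15, 15], [15, 15]],
    }
    print(best_factor(example_tables))
```

A factor whose classes separate landslide presence from absence yields a large statistic, whereas a factor whose classes all show the same presence ratio yields zero, which is why the first kind is preferred as a split variable.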
Acknowledgments
This research was supported by a UPM University Research Grant (05-01-11-1283RU) to stimulate research under the RUGS scheme, with project number 9344100, and by National Research Foundation of Korea grants funded by the Korean government (No. 2012M3A2A1050984). We thank two anonymous reviewers for their valuable and constructive comments, which helped us to improve the manuscript.
About this article
Cite this article
Althuwaynee, O.F., Pradhan, B., Park, HJ. et al. A novel ensemble decision tree-based CHi-squared Automatic Interaction Detection (CHAID) and multivariate logistic regression models in landslide susceptibility mapping. Landslides 11, 1063–1078 (2014). https://doi.org/10.1007/s10346-014-0466-0
DOI: https://doi.org/10.1007/s10346-014-0466-0