Abstract
This paper explores the use of adaptive support vector machines, random forests and AdaBoost for landslide susceptibility mapping in three separate regions of Canton Vaud, Switzerland, based on a set of geological, hydrological and morphological features. The feature selection properties of the three algorithms are studied to analyze the relevance of features in controlling the spatial distribution of landslides. Eliminating irrelevant features yields simpler, lower-dimensional models while keeping classification performance high. An object-based sampling procedure is used to reduce the spatial autocorrelation of the data and to estimate more reliably how well the models generalize when predicting the occurrence of new, unknown landslides. The accuracy of the models, the relevance of features and the quality of the landslide susceptibility maps were found to be high in the regions characterized by shallow landslides and low in those with deep-seated landslides. Despite providing similar skill, random forests and AdaBoost were found to be more efficient at feature selection than adaptive support vector machines. The results of this study reveal the strengths of the classification algorithms, but also highlight: (1) the need to rely on more than one method for identifying relevant variables; (2) the weakness of the adaptive scaling algorithm when used with landslide data; and (3) the lack of additional features that characterize the spatial distribution of deep-seated landslides.
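The idea behind object-based sampling can be illustrated with a minimal sketch. It assumes each sample (for example a pixel) carries the identifier of the mapped landslide object it belongs to, and it assigns whole objects, rather than individual samples, to the training or test set. The function name `object_based_split`, the `test_fraction` parameter and the toy data are hypothetical and are not taken from the paper, whose exact procedure is not described in this abstract.

```python
import random
from collections import defaultdict


def object_based_split(object_ids, test_fraction=0.3, seed=0):
    """Split sample indices so that every object falls entirely in the
    training set or entirely in the test set (illustrative sketch only)."""
    # Group sample indices by the object they belong to.
    by_object = defaultdict(list)
    for index, obj in enumerate(object_ids):
        by_object[obj].append(index)

    # Shuffle the objects, not the samples, then hold out a fraction of them.
    objects = sorted(by_object)
    random.Random(seed).shuffle(objects)
    n_test = max(1, round(test_fraction * len(objects)))
    test_objects = set(objects[:n_test])

    train = [i for obj in objects[n_test:] for i in by_object[obj]]
    test = [i for obj in objects[:n_test] for i in by_object[obj]]
    return sorted(train), sorted(test), test_objects


# Hypothetical toy data: 10 samples belonging to 4 mapped objects.
object_ids = ["A", "A", "A", "B", "B", "C", "C", "C", "D", "D"]
train_idx, test_idx, held_out = object_based_split(object_ids)
```

Because neighbouring samples from the same object never appear on both sides of the split, the test error is less likely to be flattered by spatial autocorrelation than it would be under a purely random split of samples.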
Acknowledgments
This study was partially funded by the Swiss National Science Foundation projects Geokernels: kernel-based methods for geo- and environmental sciences. Phase II (No. 200020-121835/1) and rockslides in Rhône valley (No. 200021-118105). We thank Prof. Stuart Lane for his interesting comments and Pierrick Nicolet for his valuable help. We are also grateful to the two anonymous reviewers, whose constructive comments helped improve the quality of the manuscript.
Cite this article
Micheletti, N., Foresti, L., Robert, S. et al. Machine Learning Feature Selection Methods for Landslide Susceptibility Mapping. Math Geosci 46, 33–57 (2014). https://doi.org/10.1007/s11004-013-9511-0
DOI: https://doi.org/10.1007/s11004-013-9511-0