Machine Learning Feature Selection Methods for Landslide Susceptibility Mapping

Mathematical Geosciences

Abstract

This paper explores the use of adaptive support vector machines, random forests and AdaBoost for landslide susceptibility mapping in three separate regions of Canton Vaud, Switzerland, based on a set of geological, hydrological and morphological features. The feature selection properties of the three algorithms are studied to analyze the relevance of features in controlling the spatial distribution of landslides. Eliminating irrelevant features yields simpler, lower-dimensional models while keeping classification performance high. An object-based sampling procedure is used to reduce the spatial autocorrelation of the data and to estimate generalization skill more reliably when the model is applied to predict the occurrence of new, unknown landslides. The accuracy of the models, the relevance of features and the quality of the landslide susceptibility maps were found to be high in the regions characterized by shallow landslides and low in those with deep-seated landslides. Despite providing similar skill, random forests and AdaBoost were found to be more efficient at feature selection than adaptive support vector machines. The results of this study reveal the strengths of the classification algorithms, but also highlight: (1) the need to rely on more than one method for the identification of relevant variables; (2) the weakness of the adaptive scaling algorithm when used with landslide data; and (3) the lack of additional features that characterize the spatial distribution of deep-seated landslides.
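The idea behind object-based sampling can be illustrated with a minimal sketch. It is not the procedure used in the paper, whose details are not given in the abstract. It only assumes that each training pixel carries the ID of the mapped landslide it belongs to, and that whole landslides, rather than individual pixels, are assigned to the training or test set. The function name `object_based_split`, the `test_fraction` parameter and the toy IDs are hypothetical.

```python
import random
from collections import defaultdict


def object_based_split(object_ids, test_fraction=0.3, seed=0):
    """Split sample indices into train/test sets that share no object.

    object_ids[i] is the (hypothetical) landslide ID of sample i.
    Returns (train_indices, test_indices, held_out_objects).
    """
    # Group sample indices by the object they belong to.
    by_object = defaultdict(list)
    for index, obj in enumerate(object_ids):
        by_object[obj].append(index)

    # Shuffle objects, not pixels, so that neighbouring pixels of the
    # same landslide always end up on the same side of the split.
    objects = sorted(by_object)
    random.Random(seed).shuffle(objects)

    # Hold out at least one whole object for testing.
    n_test = max(1, round(test_fraction * len(objects)))
    test_objects = set(objects[:n_test])

    train = [i for obj in objects[n_test:] for i in by_object[obj]]
    test = [i for obj in objects[:n_test] for i in by_object[obj]]
    return train, test, test_objects


# Toy data: 10 pixels belonging to 4 mapped landslides (IDs are made up).
object_ids = ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D"]
train_idx, test_idx, held_out = object_based_split(object_ids)
```

Because pixels from one landslide are strongly autocorrelated, a pixel-wise random split would let near-duplicates of the test samples leak into training and inflate the apparent skill. Splitting by object keeps the test landslides unseen.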

Acknowledgments

This study was partially funded by the Swiss National Science Foundation projects Geokernels: kernel-based methods for geo- and environmental sciences. Phase II (No. 200020-121835/1) and rockslides in Rhône valley (No. 200021-118105). We thank Prof. Stuart Lane for his interesting comments and Pierrick Nicolet for his valuable help. We are also grateful to the two anonymous reviewers, whose constructive comments helped improve the quality of the manuscript.

Author information

Corresponding author

Correspondence to Natan Micheletti.

About this article

Cite this article

Micheletti, N., Foresti, L., Robert, S. et al. Machine Learning Feature Selection Methods for Landslide Susceptibility Mapping. Math Geosci 46, 33–57 (2014). https://doi.org/10.1007/s11004-013-9511-0

  • DOI: https://doi.org/10.1007/s11004-013-9511-0
