Significance of variable selection and scaling issues for probabilistic modeling of rainfall-induced landslide susceptibility

Spatial Information Research

Abstract

Identifying the input variables (attributes) for probabilistic modeling of rainfall-induced landslides is critical for effective landslide susceptibility characterization. This study evaluates the attribute selectors available in Weka, an open-source machine learning package, for identifying the combination of attributes that best predicts landslides. The study area is the Lake Atitlán watershed in Guatemala, which is highly susceptible to landslides during the rainy season. Landslide initiation points were delineated in the field and on ortho-photos taken after Hurricane Stan in October 2005. Two datasets spanning areas of different sizes were used to compare the attribute selectors and to determine whether model results from the smaller area could be applied successfully to the larger one. The Weka Bayesian network classification algorithm was used to evaluate the attribute selection methods and to identify, for each study area, the combination of attributes with the highest rate of landslide prediction. The filtered subset selector proved the most successful at identifying the ideal combination for both the smaller-scale and the larger-scale datasets.
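To make the idea of filter-style attribute selection concrete, the following is a minimal sketch in Python. It is not the authors' Weka workflow and does not reproduce Weka's filtered subset selector. It ranks single attributes by information gain, a simpler and widely used filter criterion. The attribute names (`slope_class`, `land_use`, `aspect`), the eight sample rows, and the labels are invented for illustration and are not data from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on one discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels, names):
    """Return (attribute name, information gain) pairs, best first."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = information_gain(column, labels)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: each row is one sample point, label 1 = landslide.
names = ["slope_class", "land_use", "aspect"]
rows = [
    ("steep", "crop", "N"),
    ("steep", "forest", "S"),
    ("steep", "crop", "S"),
    ("steep", "forest", "N"),
    ("gentle", "crop", "N"),
    ("gentle", "forest", "S"),
    ("gentle", "crop", "S"),
    ("gentle", "forest", "N"),
]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

ranked = rank_attributes(rows, labels, names)
for name, gain in ranked:
    print(f"{name}: {gain:.3f}")
```

In this toy example the invented `slope_class` attribute ranks first because every landslide point falls in the steep class. A subset-based selector, such as the one the study found most successful, scores combinations of attributes rather than one attribute at a time, so its ranking can differ from a single-attribute ranking like this one.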

Acknowledgements

The authors are grateful to Geólogos del Mundo in Guatemala, especially Laura Nuñez, for providing key material for this work, and to the United States Peace Corps of Guatemala and the organization Ati’tAla’ for their support. This research was funded in part by the US National Science Foundation (PIRE 0530109). Sajinkumar acknowledges the University Grants Commission (UGC), Government of India, for granting the Raman Post-Doctoral Fellowship, which enabled him to carry out post-doctoral research at Michigan Technological University.

Author information

Corresponding author

Correspondence to Thomas Oommen.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 28742 kb)

About this article

Cite this article

Oommen, T., Cobin, P.F., Gierke, J.S. et al. Significance of variable selection and scaling issues for probabilistic modeling of rainfall-induced landslide susceptibility. Spat. Inf. Res. 26, 21–31 (2018). https://doi.org/10.1007/s41324-017-0154-y

  • DOI: https://doi.org/10.1007/s41324-017-0154-y
