A Tool for Classification and Regression Using Random Forest Methodology: Applications to Landslide Susceptibility Mapping and Soil Thickness Modeling

Environmental Modeling & Assessment

Abstract

Classification and regression problems are central to the geosciences. In this paper, we present Classification and Regression Treebagger (ClaReT), a tool for classification and regression based on the random forest (RF) technique. ClaReT is developed in Matlab and has a simple graphical user interface (GUI) that simplifies model implementation, standardizes the method, and makes the classification and regression process reproducible. The tool automatically performs feature selection based on a quantitative criterion and allows a large number of explanatory variables to be tested. First, it ranks the explanatory variables and displays their importance; then, it selects the optimal configuration of explanatory variables; finally, it performs the classification or regression for an entire dataset. It can also evaluate the results in terms of misclassification error or root mean squared error. We tested the applicability of ClaReT in two case studies. In the first, we used ClaReT in classification mode to identify the best subset of landslide conditioning variables (LCVs) and to obtain a landslide susceptibility map (LSM) of the Arno river basin (Italy). In the second, we used ClaReT in regression mode to produce a soil thickness map of the Terzona catchment, a small sub-basin of the Arno river basin. In both cases, we validated the results and compared them with other state-of-the-art techniques. We found that ClaReT produced better results than the techniques compared, was more straightforward to apply, and could serve as a valuable tool for assessing the importance of the variables involved in the modeling.
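The rank, select, and evaluate sequence described in the abstract can be illustrated with a minimal Python sketch. It is not ClaReT's Matlab code. The two error measures follow their standard definitions. The selection rule (testing nested top-k subsets of the importance ranking and keeping the one with the lowest error) is an assumption, since the abstract does not state the quantitative criterion. The function names, the `error_of` callback (a stand-in for training a random forest on a subset and returning its error), and the variable names in the usage lines are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def misclassification_error(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of samples whose predicted class differs from the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def select_by_ranking(
    importance: Dict[str, float],
    error_of: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Rank variables by importance, then keep the top-k subset with the lowest error.

    Assumption: candidate subsets are nested prefixes of the ranking.
    `error_of` is a hypothetical callback that would fit a model on the
    given variables and return its error (for example, one of the two
    measures above computed on held-out data).
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    best_subset: List[str] = []
    best_error = math.inf
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        err = error_of(subset)
        if err < best_error:  # strict: ties keep the smaller subset
            best_subset, best_error = subset, err
    return best_subset, best_error


# Usage with made-up importances and a toy error function (illustrative only).
toy_importance = {"slope": 0.5, "lithology": 0.3, "aspect": 0.1}
toy_errors = {1: 0.30, 2: 0.20, 3: 0.25}
subset, error = select_by_ranking(toy_importance, lambda s: toy_errors[len(s)])
```

In practice, `error_of` would wrap a random forest implementation; the strict comparison favors the smaller subset when two configurations tie.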

Corresponding author

Correspondence to Daniela Lagomarsino.

Cite this article

Lagomarsino, D., Tofani, V., Segoni, S. et al. A Tool for Classification and Regression Using Random Forest Methodology: Applications to Landslide Susceptibility Mapping and Soil Thickness Modeling. Environ Model Assess 22, 201–214 (2017). https://doi.org/10.1007/s10666-016-9538-y

  • DOI: https://doi.org/10.1007/s10666-016-9538-y
