A Tool for Classification and Regression Using Random Forest Methodology: Applications to Landslide Susceptibility Mapping and Soil Thickness Modeling

Lagomarsino, Daniela; Tofani, V.; Segoni, S.; Catani, F.; Casagli, N.

doi:10.1007/s10666-016-9538-y

Published: 20 January 2017

Volume 22, pages 201–214, (2017)

Environmental Modeling & Assessment


Abstract

Classification and regression problems are central to the geosciences. In this paper, we present Classification and Regression Treebagger (ClaReT), a tool for classification and regression based on the random forest (RF) technique. ClaReT is developed in MATLAB and has a simple graphical user interface (GUI) that simplifies model implementation, standardizes the method, and makes the classification and regression process reproducible. The tool automatically performs feature selection based on a quantitative criterion and allows a large number of explanatory variables to be tested. First, it ranks and displays the importance of each parameter; then, it selects the optimal configuration of explanatory variables; finally, it performs the classification or regression for the entire dataset. It can also evaluate the results in terms of misclassification error or root mean squared error. We tested the applicability of ClaReT in two case studies. In the first, we used ClaReT in classification mode to identify the best subset of landslide conditioning variables (LCVs) and to obtain a landslide susceptibility map (LSM) of the Arno river basin (Italy). In the second, we used ClaReT in regression mode to produce a soil thickness map of the Terzona catchment, a small sub-basin of the Arno river basin. In both cases, we validated the results and compared them with other state-of-the-art techniques. We found that ClaReT produced better results and was more straightforward to apply, and that it could serve as a valuable tool for assessing the importance of the variables involved in the modeling.
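The two evaluation measures named in the abstract, misclassification error (classification mode) and root mean squared error (regression mode), can be illustrated with a minimal sketch. This is not ClaReT code: ClaReT is a MATLAB tool, and the function names and sample values below are hypothetical, chosen only to show the standard definitions of the two measures.

```python
import math
from typing import Sequence


def misclassification_error(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of samples whose predicted class differs from the observed class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def root_mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square root of the mean squared difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical classification labels (1 = landslide, 0 = no landslide).
observed_classes = [1, 0, 1, 1, 0]
predicted_classes = [1, 0, 0, 1, 1]
print(misclassification_error(observed_classes, predicted_classes))

# Hypothetical soil thickness values in metres.
observed_thickness = [0.5, 1.2, 2.0]
predicted_thickness = [0.7, 1.0, 2.4]
print(round(root_mean_squared_error(observed_thickness, predicted_thickness), 3))
```

In the first example, two of the five labels disagree, so the error is 2/5. In the second, the squared differences sum to 0.24 over three samples, giving an RMSE of about 0.283 m.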



Author information

Authors and Affiliations

Earth Sciences Department, University of Firenze, Via La Pira 4, 50121, Florence, Italy
Daniela Lagomarsino, V. Tofani, S. Segoni, F. Catani & N. Casagli


Corresponding author

Correspondence to Daniela Lagomarsino.


About this article

Cite this article

Lagomarsino, D., Tofani, V., Segoni, S. et al. A Tool for Classification and Regression Using Random Forest Methodology: Applications to Landslide Susceptibility Mapping and Soil Thickness Modeling. Environ Model Assess 22, 201–214 (2017). https://doi.org/10.1007/s10666-016-9538-y


Received: 21 May 2014
Accepted: 20 October 2016
Published: 20 January 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10666-016-9538-y
