Machine Learning Based Predictive Modeling of Debris Flow Probability Following Wildfire in the Intermountain Western United States


Wildfire followed by large precipitation events is known to trigger both flooding and debris flows in mountainous regions. The ability to predict and mitigate these hazards is crucial to protecting public safety and infrastructure. A re-evaluation of existing prediction models from the literature highlighted the need for more advanced modeling techniques. Data from 15 individual burn basins in the intermountain western United States, comprising 388 instances and 26 variables, were obtained from the United States Geological Survey (USGS). A randomly selected subset of the data was set aside as a validation set, and machine learning models were fit to the remaining training data. Tenfold cross-validation was applied to the training data to obtain nearly unbiased error estimates and to avoid model over-fitting. Linear, nonlinear, and rule-based predictive models, including naïve Bayes, mixture discriminant analysis, classification trees, and logistic regression, were developed and tested on the validation dataset. The new nonlinear approaches were nearly twice as successful as the linear models previously published in the debris flow prediction literature. The new prediction models advance the state of the art of debris flow prediction and improve the ability to accurately predict debris flow events in the wildfire-prone intermountain western United States.
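The evaluation workflow summarized in the abstract (a random validation holdout, then tenfold cross-validation on the remaining training data) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The data are synthetic, the 20% validation fraction is an assumption (the abstract does not state the split), only two made-up predictors are used instead of the 26 variables, and a plain Gaussian naïve Bayes classifier stands in for the full set of models. All function names are hypothetical.

```python
import math
import random

def holdout_split(n, validation_fraction, rng):
    """Shuffle indices 0..n-1 and set aside a validation subset."""
    indices = list(range(n))
    rng.shuffle(indices)
    n_val = int(round(n * validation_fraction))
    return indices[n_val:], indices[:n_val]  # (training, validation)

def k_folds(indices, k):
    """Partition indices into k disjoint folds of near-equal size."""
    return [indices[i::k] for i in range(k)]

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # Small floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior."""
    def log_posterior(params):
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            for v, m, s2 in zip(x, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))

def accuracy(model, X, y):
    """Fraction of rows whose predicted class matches the true class."""
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)

def cross_validate(X, y, train_idx, k=10):
    """Mean held-out accuracy over k folds of the training indices."""
    folds = k_folds(train_idx, k)
    scores = []
    for i, held_out in enumerate(folds):
        fit_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit_gaussian_nb([X[j] for j in fit_idx],
                                [y[j] for j in fit_idx])
        scores.append(accuracy(model, [X[j] for j in held_out],
                               [y[j] for j in held_out]))
    return sum(scores) / k

rng = random.Random(0)
n = 388  # instance count from the abstract; the values below are synthetic
y = [1 if rng.random() < 0.4 else 0 for _ in range(n)]
X = [[rng.gauss(2.0 * t, 1.0), rng.gauss(-1.0 * t, 1.0)] for t in y]

# Assumed 20% holdout; the validation rows are never used for fitting.
train_idx, val_idx = holdout_split(n, 0.2, rng)
cv_acc = cross_validate(X, y, train_idx, k=10)

# Refit on all training rows, then score once on the validation set.
final_model = fit_gaussian_nb([X[j] for j in train_idx],
                              [y[j] for j in train_idx])
val_acc = accuracy(final_model, [X[j] for j in val_idx],
                   [y[j] for j in val_idx])
print(f"10-fold CV accuracy: {cv_acc:.3f}; validation accuracy: {val_acc:.3f}")
```

Keeping the validation rows out of every cross-validation fold means the final validation score is not influenced by any choice made during training, which is the purpose of the two-stage design described above.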






This project was funded by the US Department of Transportation (USDOT) through the Office of the Assistant Secretary for Research and Technology. The authors would also like to thank the following individuals for their contributions to the work described: Caesar Singh, USDOT program manager, and Susan Cannon for providing data and necessary guidance.

Author information



Corresponding author

Correspondence to Ashley N. Kern.

Additional information

Disclaimer: The views, opinions, findings, and conclusions reflected in this paper are the responsibility of the authors only and do not represent the official policy or position of the USDOT/OST-R or any State or other entity.


About this article


Cite this article

Kern, A.N., Addison, P., Oommen, T. et al. Machine Learning Based Predictive Modeling of Debris Flow Probability Following Wildfire in the Intermountain Western United States. Math Geosci 49, 717–735 (2017).



  • Burn severity
  • Naïve Bayes
  • Mixture discriminant analysis
  • Remote sensing