
A combined statistical and machine learning approach for spatial prediction of extreme wildfire frequencies and sizes

Extremes

Abstract

Motivated by the Extreme Value Analysis 2021 (EVA 2021) data challenge, we propose a method that combines statistics and machine learning for the spatial prediction of extreme wildfire frequencies and sizes. The method is tailored to large datasets with missing observations. Our approach relies on a four-stage, bivariate, sparse spatial model for high-dimensional zero-inflated data, which we develop using the stochastic partial differential equation (SPDE) approach so that the latent processes have sparse precision matrices. In Stage 1, the observations are separated into zero and nonzero categories, which are modeled using a two-layered hierarchical Bayesian sparse spatial model to estimate the probabilities of these two categories. In Stage 2, we first obtain empirical estimates of the spatially varying mean and variance profiles of the positive observations across locations, and then smooth these estimates using fixed rank kriging. This approximate Bayesian inference method avoids the high computational burden of fitting spatially varying coefficient models to large spatial datasets. In Stage 3, we model the standardized log-transformed positive observations from Stage 2 using a sparse bivariate spatial Gaussian process. The Gaussian distribution assumption for wildfire counts made in Stage 3 is computationally convenient but misspecified. Thus, in Stage 4, the predicted exceedance probabilities are post-processed using Random Forests. We draw posterior inference for Stages 1 and 3 using Markov chain Monte Carlo (MCMC) sampling. Finally, we design a cross-validation scheme for the artificially generated gaps and compare the EVA 2021 prediction scores of the proposed model with those of competing methods.
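To make the stage structure more concrete, here is a minimal Python sketch of how a zero-inflated predictive distribution of this kind can be turned into exceedance probabilities. It assumes, for illustration only, that for one location we already have the nonzero probability p (Stage 1), the mean μ and standard deviation σ of the log-positive values (Stage 2, shown here as plain empirical estimates without the fixed rank kriging smoothing), and a Gaussian predictive mean and standard deviation for the standardized value Z (Stage 3). Under these assumptions, for a threshold u > 0, P(Y > u) = p · P(Z > (log u − μ)/σ). The function names are hypothetical, the SPDE/MCMC fitting and the Random Forest post-processing of Stage 4 are not shown, and this is not the authors' implementation.

```python
import math
import statistics


def standardize_log_positive(values):
    """Illustrative Stage-2-style standardization at a single location.

    Zeros are dropped (they are handled by the zero/nonzero model), the
    positive values are log-transformed, and the empirical mean and sample
    standard deviation are used to standardize them. Needs >= 2 positive values.
    """
    logs = [math.log(v) for v in values if v > 0]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)
    return mu, sigma, [(x - mu) / sigma for x in logs]


def normal_sf(x, mean=0.0, sd=1.0):
    """Survival function P(X > x) for X ~ N(mean, sd^2)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def exceedance_prob(u, p_positive, mu, sigma, z_mean=0.0, z_sd=1.0):
    """P(Y > u) under an assumed zero-inflated log-Gaussian-type model.

    Assumption: Y = 0 with probability 1 - p_positive; otherwise
    log(Y) = mu + sigma * Z with Z ~ N(z_mean, z_sd^2).
    """
    if u < 0:
        return 1.0  # Y is nonnegative, so it always exceeds a negative threshold
    if u == 0:
        return p_positive  # P(Y > 0) is exactly the nonzero probability
    z = (math.log(u) - mu) / sigma  # threshold on the standardized log scale
    return p_positive * normal_sf(z, z_mean, z_sd)


# Example: exceedance curve at one location over a few thresholds.
mu, sigma, _ = standardize_log_positive([0.0, 1.0, math.e, math.e ** 2])
for u in (0.0, 1.0, 5.0, 20.0):
    print(u, exceedance_prob(u, p_positive=0.4, mu=mu, sigma=sigma))
```

Evaluating such a function over a grid of thresholds gives one predictive exceedance curve per location. Because the Gaussian assumption behind it is misspecified for counts, the abstract's Stage 4 then corrects the predicted probabilities with Random Forests, a step this sketch deliberately leaves out.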


Figs. 1–11

Data availability

The dataset analyzed during the current study is available from the corresponding author on reasonable request.


Acknowledgements

The first three authors (Cisneros, Gong, Yadav) contributed equally to this work by implementing some of the methods and writing parts of the paper. The last two authors (Hazra, Huser) oversaw the whole project, with Hazra having a leading role throughout all practical aspects of the data competition (supervision of the Bedouins Team, methods’ implementation, results’ interpretation, writing). We would like to thank Thomas Opitz for organizing this very interesting data competition for the EVA 2021 Conference, as well as Thomas Mikosch for welcoming a Special Issue in Extremes about this topic. This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-CRG2020-4394.

Author information

Corresponding author

Correspondence to Arnab Hazra.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 183 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cisneros, D., Gong, Y., Yadav, R. et al. A combined statistical and machine learning approach for spatial prediction of extreme wildfire frequencies and sizes. Extremes 26, 301–330 (2023). https://doi.org/10.1007/s10687-022-00460-8

  • DOI: https://doi.org/10.1007/s10687-022-00460-8
