Testing drought indicators for summer burned area prediction in Italy

During the summer season, the Italian territory is vulnerable to extended wildfires, which can have a dramatic impact on human activities and ecosystems. Such wildfire events are usually associated with the presence of drought conditions and are generally more severe in southern Italy, owing to the high temperatures and reduced precipitation that characterize this geographical region. In this work, we discuss the statistical analysis of the burned area (BA) in Italy and build simple data-driven models linking BA to different climatic drivers, comparing the relevance of direct surface soil moisture information to that provided by drought indices such as the Standardized Precipitation Index (SPI) and the Standardized Precipitation Evapotranspiration Index (SPEI). We show that considering surface soil moisture alone is sufficient to produce reliable out-of-sample predictions in a large part of the country. By contrast, SPEI allows for better model performance in the more arid regions.


Introduction
Wildfires are a major threat to human well-being, to societies and infrastructures (Keeley et al. 2011). With burned area (BA) peaks of over 8000 km²/yr (San-Miguel-Ayanz et al. 2013), Mediterranean Europe is a region of primary interest for wildfires. Mediterranean Europe is indeed expected to be one of the areas where climate change is going to have severe impact on the hydrological cycle (Giorgi 2006), with an increase of drought conditions and thus enhanced fire risk (Bedia et al. 2014; Turco et al. 2018a).
Even though short-term prediction, prevention and control measures have significantly improved in the last decades, a coherent, reliable and accurate methodology for the understanding and long-term prediction of wildfire dependence on climate conditions is only partially available (Bowman et al. 2020; Moreira et al. 2020; Leverkus et al. 2021). One reason lies in the complexity of wildfire dynamics, which involves a large range of time and space scales and a multiplicity of different factors (both of natural and anthropogenic origin) interacting with each other. Currently, long-term wildfire estimates in future climate conditions (decadal and multi-decadal) are based on either process-based (Migliavacca et al. 2013; Wu et al. 2015) or data-driven, empirical models (Amatulli et al. 2013; Turco et al. 2014). Studies focusing on shorter-term forecasting (e.g. for the following fire season) succeeded in satisfactorily modelling wildfires by linking pre-fire climate conditions to summer burned area (Abolafia-Rosenzweig et al. 2022), relating the probability of large wildfires to the coincident weather and climate state (Barbero et al. 2019), and/or finding and analysing fire patterns in previous seasons (Ferreira et al. 2020).
In this work, we analyse the annual BA in Italy, a country characterized by the presence of very different climatic zones (Masala et al. 2012). Here we adopt the convention introduced in Metzger et al. (2005): from North to South, the Italian peninsula hosts the so-called Alpine South, Mediterranean Mountains, Mediterranean North and Mediterranean South environmental regions. In particular, the Mediterranean South and the southern regions of Mediterranean North are extremely susceptible to wildfires. In order to reflect such climatic differences and consider areas with similar properties, the country area is divided into sixteen ecoregions following Turco et al. (2017). Even though the extent of burned area has decreased during the last decades in a large part of Mediterranean Europe (Turco et al. 2016), the sheer amount of damage caused by wildfires is still a matter of great concern, leading to a consistent effort to predict, prevent and control fires in order to limit human and economic losses.
In Mediterranean Europe, the extension of the burned area has been shown to be correlated with same-summer dryness conditions (Turco et al. 2017). Empirical regressions of BA on drought indices such as Standardized Precipitation Index (hereafter SPI) (McKee et al. 1993) and the Standardized Precipitation Evapotranspiration Index (hereafter SPEI) (Vicente-Serrano et al. 2010) have been proven to be effective (Turco et al. 2018a;Turco et al. 2018b). SPI and SPEI are computed by aggregating the necessary data (precipitation and, for SPEI, also potential evapotranspiration) in the time span of interest which, in this case, corresponds to the climatic summer (June-July-August). SPI and SPEI thus provide indirect information on fuel conditions (moisture) and amount, which directly affect wildfires (Viegas and Viegas 1994). Precipitation and temperature affect both fuel dryness and total fuel available, the former mainly controlled by coincident conditions and the latter by antecedent conditions, especially in arid regions. That is, low precipitation during fire season increases fuel dryness, therefore raising its flammability, while the occurrence of the same conditions a few months or years in advance (depending upon the vegetation type) can reduce the fuel available to burn. Here we investigate fuel dryness exploring a direct measure of drought conditions-namely, soil moisture (Seneviratne et al. 2010;Zhou et al. 2021). We develop a data-driven model based on Surface Soil Moisture (0-7 cm, hereafter SSM) and compare the results with those obtained using SPI and SPEI. The relation between soil moisture and wildfires has recently been the subject of several studies, linking the former to different aspects of the latter, such as the extent of fires (Krueger et al. 2015;Chaparro et al. 2016), wildfire risk (Thomas Ambadan et al. 2020) or the likelihood of fire occurrence (Jensen et al. 2018). 
Here we use soil moisture as a predictor in an empirical, regressive model in order to predict the extent of wildfires during the summer fire season. The analysis provided relevant information also on how wildfires in different regions (i.e. arid or humid areas) respond to the climatic conditions, and what indicators are more suitable in the different cases.
The rest of this work proceeds as follows. Section 2 outlines the approach, defining the datasets, the explanatory variables (SPI, SPEI, SSM) and the empirical models adopted. Section 3 reports the results, considering both in-sample and out-of-sample prediction. Finally, in Sections 4 and 5 we discuss the results and draw conclusions on the possible application of the methodology presented here.

Data and methods
The BA data analysed here are provided by the European Forest Fire Information System, EFFIS (San-Miguel-Ayanz et al. 2012). We consider the data aggregated in environmentally coherent zones (ecoregions) (Turco et al. 2018a; Metzger et al. 2005; Turco et al. 2017), obtained by merging the appropriate areas available at the NUTS3 (Nomenclature of Units for Territorial Statistics) level. For most of Italy, the BA dataset ranges from 1985 to 2015, while for northern Piedmont, northern Veneto and northwestern Lombardy it covers 1985-2018, for southern Veneto and Emilia-Romagna 1985, for Sicily 1986 and for Sardinia the dataset is limited to 1997-2007 and 2009-2012. This latter data constraint leads us to refrain from applying the empirical model to Sardinia. A thorough analysis of wildfire exposure and risk in this region was performed by Salis et al. (2021). Here, as a proxy for surface soil moisture we employ surface volumetric soil water, measured in m³ of water per m³ of soil, taken from the first layer (swvl1) of the monthly averaged ERA5 dataset (Hersbach et al. 2020; Li et al. 2020) and suitably aggregated over the months of interest, from June to August. ERA5 is the fifth generation of ECMWF reanalysis of the Earth climate in the last decades; the surface layer of the volumetric soil water used in this work, going from the surface to a depth of 7 cm, is the first of the four layers used by the ECMWF Integrated Forecasting System to represent the soil.
The left panel of Fig. 1 shows the average percentage of area that was annually burned by wildfires in the different Italian ecoregions for the considered time span. Clearly, the impact of wildfires in southern Italy stands out as a matter of primary concern; the uppermost value is in Calabria, where roughly 0.75% of the surface is affected by wildfires every year. The right panel of Fig. 1 shows the average volumetric soil water content in the upper 7 cm of soil for each ecoregion, which decreases going southward.
Although most wildfires in Italy have anthropic ignition (Macias Fauria et al. 2011; Tedim et al. 2022), the extent of the burned area is at least partially controlled by factors such as the availability and flammability of fuel, which in turn depends on vegetation species (Wyse et al. 2016; Ganteaume et al. 2011) and on the meteoclimatic conditions (Krawchuk et al. 2009). Following the climate-based approach of Turco et al. (2017, 2018), here we adopt the view that precipitation and potential evapotranspiration can be used as proxies for (at least some of) the climatic drivers of summer wildfires in Mediterranean Europe and for the fuel conditions and, moreover, that surface soil moisture can be used as a proxy for the combined effect of temperature and precipitation. To this end, we use Surface Soil Moisture (SSM), the Standardized Precipitation Index (McKee et al. 1993) (SPI) and the Standardized Precipitation Evapotranspiration Index (Vicente-Serrano et al. 2010) (SPEI) to summarize the climatic information that is relevant to fires. The SPI is estimated by aggregating the precipitation record in a selected time period of n months, then fitting it to a gamma distribution and, finally, transforming it into a variable with normal distribution. Thus, SPI can be interpreted as the deviation of precipitation from its long-term average, expressed in units of the long-term standard deviation. The SPEI is estimated in a similar way but, instead of considering only precipitation, the variable to be aggregated is the difference between precipitation and potential evapotranspiration (Thornthwaite 1948), the latter computed using the FAO56 Penman-Monteith equation (Allan et al. 1998), whose computation requires air temperature, mean wind speed, radiation flux, sensible heat flux, vapour pressure and saturation vapour pressure.
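The SPI recipe described above (aggregate precipitation over n months, fit a gamma distribution, map the result to a standard normal variable) can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in this study: the function names `gamma_cdf` and `spi`, the method-of-moments gamma fit, the clipping of probabilities and the absence of any special handling of zero-precipitation totals are all choices made here for brevity.

```python
import math
from statistics import NormalDist, fmean, pvariance


def gamma_cdf(x, shape, scale):
    """Gamma CDF, i.e. the regularized lower incomplete gamma function
    P(shape, x/scale), evaluated with its power series."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    k = 1
    while term > 1e-15 * total and k < 10_000:
        term *= z / (shape + k)
        total += term
        k += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def spi(monthly_precip, end_month=8, n_months=3):
    """SPI-like index from a list of years, each a list of 12 monthly totals.

    end_month=8 and n_months=3 select June-July-August (months are 1-based).
    Assumes strictly positive seasonal totals (hypothetical simplification).
    """
    totals = [sum(year[end_month - n_months:end_month]) for year in monthly_precip]
    mean = fmean(totals)
    var = pvariance(totals, mean)
    # Method-of-moments estimates (an assumption; other fits are possible).
    shape, scale = mean * mean / var, var / mean
    std_normal = NormalDist()
    index = []
    for total in totals:
        # Clip to avoid infinite values at the tails of the fitted distribution.
        prob = min(max(gamma_cdf(total, shape, scale), 1e-6), 1.0 - 1e-6)
        index.append(std_normal.inv_cdf(prob))
    return index
```

An SPEI-like index would follow the same steps with the aggregated variable replaced by precipitation minus potential evapotranspiration; since that difference can be negative, a different distribution than the gamma would be needed, which this sketch does not cover.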
Since the burned area time series is positively skewed, for the analysis we consider its log transform. To minimize the impact of slowly changing factors, SSM, SPI, SPEI and the natural logarithm of the burned area, considered only on the time range for which the latter dataset was available, have been linearly detrended and standardized.
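The preprocessing just described (log transform of BA, linear detrending, standardization) could look like the following minimal sketch. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify which estimator was used.

```python
import math
from statistics import fmean, pstdev


def detrend_and_standardize(years, values):
    """Remove the least-squares linear trend, then scale to unit variance."""
    y_mean, v_mean = fmean(years), fmean(values)
    sxx = sum((y - y_mean) ** 2 for y in years)
    sxy = sum((y - y_mean) * (v - v_mean) for y, v in zip(years, values))
    slope = sxy / sxx
    # Residuals of the linear fit have zero mean by construction.
    residuals = [v - v_mean - slope * (y - y_mean) for y, v in zip(years, values)]
    spread = pstdev(residuals)  # population standard deviation (assumption)
    return [r / spread for r in residuals]


def preprocess_burned_area(years, burned_area):
    """Natural logarithm of BA followed by detrending and standardization."""
    return detrend_and_standardize(years, [math.log(ba) for ba in burned_area])
```

The same `detrend_and_standardize` helper would be applied to the SSM, SPI and SPEI series, restricted to the years for which BA is available.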
Based on previous analyses (Turco et al. 2017), we build a data-driven model based on linear regression, relating the summer BA to SSM, SPI or SPEI, as
$$\log(\mathrm{BA}) = a \cdot \mathrm{SSM}(m, n) + b + \epsilon, \qquad (1)$$
$$\log(\mathrm{BA}) = a \cdot \mathrm{SPI}(m, n) + b + \epsilon \qquad (2)$$
or
$$\log(\mathrm{BA}) = a \cdot \mathrm{SPEI}(m, n) + b + \epsilon. \qquad (3)$$
Here, a and b are the slope and the intercept of the regression, respectively, and ϵ is a Gaussian and uncorrelated stochastic noise term used to represent any other neglected process (see Appendix A for further information). The value of m indicates the end month of the temporal aggregation and n indicates the total duration (in months) of the aggregation time window. For all the regressions discussed here, we have chosen m = 8 and n = 3 since, as shown in Table 1, this was on average the best of the cases taken into account.
The linear regressions have been determined by a least-squares procedure. We considered both in-sample regressions, i.e. performing the regression on the whole dataset, and out-of-sample regressions, i.e. splitting the dataset into a training and a testing part, using the former to determine the model and the latter to test it. To this end, we adopt a leave-one-out procedure, using all points but one in the training dataset used to build the model and the left-out point to test the forecast, repeating this procedure for all the points in the dataset. Figure 2 shows the correlations between the observed BA and the results provided by the empirical regression models using SSM (left), SPI (centre) or SPEI (right) for all ecoregions except Sardinia.
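The leave-one-out procedure described above can be illustrated with a short, dependency-free sketch. It is an illustration rather than the code used for the analysis: the names `fit_line`, `pearson` and `leave_one_out` are hypothetical, `x` stands for any one of the standardized drivers (SSM, SPI or SPEI) and `y` for the standardized log burned area, both given as plain Python lists.

```python
import math
from statistics import fmean


def fit_line(x, y):
    """Least-squares slope a and intercept b of y = a * x + b."""
    x_mean, y_mean = fmean(x), fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, y_mean - a * x_mean


def pearson(u, v):
    """Pearson correlation coefficient between two equally long series."""
    u_mean, v_mean = fmean(u), fmean(v)
    cov = sum((ui - u_mean) * (vi - v_mean) for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum((ui - u_mean) ** 2 for ui in u))
    norm_v = math.sqrt(sum((vi - v_mean) ** 2 for vi in v))
    return cov / (norm_u * norm_v)


def leave_one_out(x, y):
    """Predict each point from a regression fitted on all the other points.

    Returns the out-of-sample predictions and the slope of each fit.
    """
    predictions, slopes = [], []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        slopes.append(a)
        predictions.append(a * x[i] + b)
    return predictions, slopes
```

The out-of-sample skill would then be `pearson(predictions, y)`, while the in-sample counterpart uses a single `fit_line(x, y)` on the whole series.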

In-sample analysis
The average values of the correlation are remarkable, especially considering the relative scarcity of data employed to approximate such a complex phenomenon. With the exception of the SSM- and SPI-based regressions for Sicily, all correlations are statistically significant. Including potential evapotranspiration in the indicator (i.e. SPEI) yields better correlations in southern ecoregions, in particular Sicily. The value of SPI is an especially good predictor for ecoregions in central Italy; in this area, computing SPEI does not substantially improve the results. Remarkably, the use of SSM information alone provides overall results similar to those of the SPEI index. In particular, the mainland is modelled very well by SSM, while SPEI is better suited for Sicily.

Out-of-sample prediction
The main objective of this study is to provide a simple, reliable and parsimonious method to obtain statistical projections for the burned area during the fire season as a function of the climatic conditions. To test the prediction skill of our approach, we applied an out-of-sample regression with a leave-one-out procedure (see Section 2). The left panel of Fig. 3 shows the average correlation between the predicted and observed BA for each ecoregion, using the surface soil moisture model in the leave-one-out, out-of-sample approach. The central and right panels show the differences between the correlations obtained using SPI or SPEI, respectively, and using SSM. The model using SSM provides satisfactory results, in particular for the Alpine region and for the southern mainland. Note, however, that the SPEI model is more effective in modelling Sicily. This marks an important difference between the two approaches, the former providing on average better results when used on more humid regions, the latter when applied to more arid regions. As in the in-sample case, all correlations are statistically significant, aside from the SSM- and SPI-based ones applied to Sicily.
We next check the mean error to ascertain whether the predictions tend to over- or underestimate the data. Figure 4 shows the mean error for all regressions. The model results for regions including the Apennines, for Apulia and for northern Italy (aside from a narrow band) typically underestimate the data, while coastal ecoregion predictions are mixed, even though they often tend to overpredict the data.
Averaging over all ecoregions, the explained variances of the SSM, SPI and SPEI models are, respectively, 0.48, 0.47 and 0.51. In particular, the SPEI-based model displays a better performance than the other models in Sicily and southern Apulia (the Salento area), confirming the relevance of SPEI for arid regions, and in southern Veneto and Emilia-Romagna. This ecoregion is indeed not arid but, as already pointed out in Metzger et al. (2005), it belongs to the Mediterranean North climatic region, which differs from the majority of northern regions (placed in the Mediterranean Mountains climate) due to its lower altitude. In particular, the climate of Emilia-Romagna is intermediate between northern and southern regions (Crespi et al. 2018), with relatively high temperatures (which explains the relevance of SPEI in this case) and low precipitation, even though it benefits from its placement in the Po Plain (Brunetti et al. 2014). The results obtained with the SSM-based empirical model in a large part of the mainland are comparable to (or better than) the results provided by the other two models and are definitely the best ones in the Alpine and Tyrrhenian regions.
In the leave-one-out procedure, the variations of the coefficient a of the linear regression between burned area and the chosen meteoclimatic variable (SSM, SPI or SPEI) provide relevant information on the temporal stability of the empirical relationship between climate and BA. A highly dispersed ensemble of values of a indicates very different regression coefficients in the course of time, implying that some important driver may have not been taken into account or that the datasets have some outlier (see Appendix A).
To explore this issue, in Fig. 5 we show the standard deviation of the values of a in the leave-one-out procedure for each ecoregion. The Alpine area and the southernmost ecoregions turn out to be less effectively represented by the SSM- and SPI-based models, as they are associated with a larger variability of the regression coefficient a, while the results on central ecoregions are comparable across the different models. A particular case is represented by the Ligurian coast and southern Piedmont, which are characterized by larger variability of the regression coefficient for all models, likely owing to the presence of an outlier (see Appendix A).
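As a toy illustration of this diagnostic, the sketch below computes the standard deviation of the leave-one-out slopes and shows how a single outlier inflates it. The helper names and the synthetic data are assumptions made for the example; they are not taken from the datasets analysed here.

```python
from statistics import fmean, pstdev


def slope(x, y):
    """Least-squares slope of y against x."""
    x_mean, y_mean = fmean(x), fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / sxx


def loo_slope_std(x, y):
    """Standard deviation of the slope a across all leave-one-out fits."""
    slopes = [slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]
    return pstdev(slopes)


# Synthetic example: a perfectly linear series, and a copy with one outlier.
x = [float(i) for i in range(10)]
y_clean = [0.5 * xi for xi in x]
y_outlier = y_clean[:-1] + [20.0]

stable = loo_slope_std(x, y_clean)      # essentially zero
unstable = loo_slope_std(x, y_outlier)  # much larger: the fit depends on one point
```

A large spread therefore flags either a relationship that changes over time or, as in this toy case, a single influential observation.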

Discussion
To further explore the possible geographical differences in Italian wildfire regimes, in Fig. 6 we show the extension of the annual burned area during the fire season, partitioning Italy in two different macro-regions plus Sicily and Sardinia. The northern region, although appreciably represented in terms of total surface, gave a limited contribution to the overall burned area during the time frame 1985-2015. On the other hand, the southern mainland has been severely affected by wildfires, significantly contributing to the total burned area in Italy. Quite importantly, the southern mainland has been shown to be well modelled by the simple empirical models considered here (see Fig. 3), also when employing only soil moisture (SSM) or precipitation (SPI) as a driver. Adding potential evapotranspiration, that is adopting the SPEI driver, allows for a slightly better model performance over the southern mainland and also for much better predictions of the burned area in Sicily, which was not satisfactorily modelled using precipitation alone.
Even though the addition of potential evapotranspiration leads to better predictions in Sicily, the advantage of obtaining successful out-of-sample predictions of summer burned area across the whole Italian mainland using just a single-variable driver is clear. In particular, soil moisture combines information on precipitation and evapotranspiration, thus leading to a comparable performance of the SSM-based model over the whole country, although the use of SPEI is probably preferable in more arid regions and Emilia-Romagna (see Section 3.2). The upper right panel of Fig. 6 also shows the annual burned area in Sardinia for the few years included in the EFFIS dataset. The magnitude of wildfires in this region clearly stands out, making it a matter of prime relevance to engage in further efforts to find long-term data and devise appropriate empirical models for this region. Although the annual burned area in most of Italy has been properly modelled using surface soil moisture, precipitation and potential evapotranspiration data, a few issues are still a matter of concern, such as:
• Sardinia, which has not been modelled due to lack of data, but where wildfire activity is very relevant;
• Sicily, where the SPEI-based model improved the prediction, however still not reaching the performance obtained in the rest of southern Italy;
• Northwestern Italy, which possibly suffers from the lack of information on important drivers;
• In general, the explained variance for the leave-one-out procedure remains around 50%, a fact that indicates the presence of other drivers besides climatic variables. In particular, the characteristics of vegetation can play an important role that, when properly taken into account, could improve the performance of the models (e.g. D'Andrea et al. 2010). Further work on this topic is in progress.
Interestingly, we note that the burned area in all regions shows an oscillating behaviour, with peaks every 5 to 8 years, partially mirroring analogous changes in the SPI, SPEI and SSM drivers, as shown in the lower panels of Fig. 6. Notice, also, that the burned area shows higher variability in the peak values with respect to the drivers, suggesting the presence of other processes controlling the BA response to meteoclimatic variability, such as strong winds due to suitable synoptic conditions (Duane and Brotons 2018), possibly linking the direction of the advection with fire boosting in different seasons (Rodrigues et al. 2019); other works link climate teleconnection indices, in particular the North Atlantic Oscillation and Western Mediterranean Oscillation patterns, to wildfire activity (Rodrigues et al. 2021). Although the time series analysed here is too short to draw any meaningful conclusion on such oscillations, further work on longer time series and more extended areas should consider in detail such fluctuations in wildfire dynamics.
With respect to other past approaches linking surface soil moisture with the extension of wildfires (Krueger et al. 2015; Chaparro et al. 2016; Turco et al. 2019), the SSM-based model considered here displays remarkable features of reliability, simplicity and parsimony, and it is able to satisfactorily model the phenomenon, at least when a climatically coherent partition in homogeneous ecoregions is adopted. Such a result suggests that, with a suitable partition of territories, it could be possible to find other parsimonious approaches (that is, approaches taking advantage of a limited number of datasets) working on different variables that could be thought of as proxies for at least part of the climatic properties of a given area.

Conclusions
In this paper, we implemented and tested simple empirical, data-driven models for the annual burned area during the fire season in Italy. As drivers, we used the dryness conditions during the same year of the fires, represented by either the surface soil moisture, the Standardized Precipitation Index or the Standardized Precipitation Evapotranspiration Index. This approach, as discussed by Turco et al. (2017, 2018a), has revealed a remarkable skill, especially when considering the scarcity of required data. With respect to previous works, here we have focused on a more restricted area, namely the Italian peninsula, and we have tested a new driver based on surface soil moisture (SSM).

The modelled burned area, both for in-sample and out-of-sample predictions, has shown a high and significant correlation with observed data. In particular, the behaviour in the southern mainland, an area strongly affected by wildfires, is accurately predicted using the single-variable approaches, thus allowing for reliable results with a modest effort. The SSM-based model provided similar results to both the SPI and SPEI models, aside from the important cases of arid regions and Emilia-Romagna, where the SPEI-based model performed better. On the other hand, in the Alpine and Tyrrhenian regions the SSM-based model provided better results. The importance of precipitation in modelling the extension of wildfires is particularly relevant when taking into account climate projections for the Mediterranean area. Since precipitation is expected to decrease in the Mediterranean (Giorgi and Lionello 2008; García-Ruiz et al. 2011; Brogli et al. 2019), the burned area is expected to increase even in regions where wildfire activity has been less prominent till now, as in northern Italy.
Finally, future work should explore the performance of empirical models (such as those implemented here) for smaller spatial scales. In particular, the good performance of the SSM-based approach, if confirmed at smaller scales, suggests the use of SSM satellite products as drivers to model and predict summer burned area also in more limited geographical regions. Since SSM is also one of the outcomes of several seasonal prediction systems, empirical models tuned on SSM data could be used for seasonal BA predictions, extending to smaller territorial units the approach followed by Turco et al. (2018).

Appendix A Statistical tests
Pearson correlation in Figs. 2-3 has been proven to be statistically significant (p-value<0.05) in most of the tests for all the ecoregions (in-sample and out-of-sample; SSM, SPI and SPEI; a total of six tests applied on fifteen ecoregions), with the exception of SPI and SSM models for Sicily-an anomaly to be expected, since applying the methodology to that specific region has been shown to provide inaccurate results.
Since the term ϵ in Eqs. 1-3 ideally represents an uncorrelated, Gaussian noise, the residuals (that is, the difference between the actual and the modelled/predicted burned areas) have been tested for normality and whiteness.
To ascertain normality we have applied the D'Agostino-Pearson test, checking that the p-value does not allow rejection of the null hypothesis (which consists in the Gaussianity of the series). Due to the shortness of the datasets, we have merged the fifteen ecoregion datasets into one, suitably standardizing each series. The in-sample SPI and SPEI models failed the Gaussianity test owing to the presence of an outlier in the burned area related to the coastal northwestern ecoregion (Ligurian coast and southern Piedmont) in Fig. 5. Indeed, our dataset shows that just 25.5 ha of forest burned in 1996, a very low value compared with the average drought conditions displayed by SPI and SPEI. Removing the outlier leads to the non-rejection of the null hypothesis, that is, there is no evidence for non-Gaussianity of the residuals.
We check the whiteness of the residuals by computing the autocorrelation function of the residuals beyond lag 0 and checking that no more than 5% of its values exceed ±1.96/√N, where N is the number of points in the dataset (Tong 1990). With this criterion, 8 out of 90 tests, all belonging to three ecoregions corresponding to southern Apulia, northern Lazio and part of Tuscany, fail to comply with those requirements, hinting at the presence of some residual correlation in those areas.
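The whiteness criterion above can be sketched as follows. This is a minimal illustration, not the implementation used for the tests reported here: the function names are hypothetical, and the default maximum lag of N/4 is an assumption, since the text does not state how many lags were examined.

```python
import math
from statistics import fmean


def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = fmean(residuals)
    denom = sum((r - mean) ** 2 for r in residuals)
    num = sum((residuals[t] - mean) * (residuals[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


def is_white(residuals, max_lag=None, tolerance=0.05):
    """True if at most `tolerance` of the lags exceed the 1.96/sqrt(N) bound."""
    n = len(residuals)
    if max_lag is None:
        max_lag = n // 4  # assumption: number of lags is not given in the text
    bound = 1.96 / math.sqrt(n)
    exceeding = sum(abs(autocorrelation(residuals, lag)) > bound
                    for lag in range(1, max_lag + 1))
    return exceeding / max_lag <= tolerance
```

With series as short as those considered here, only a handful of lags are available, so even one value above the bound can push the fraction of exceedances past 5%.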