1 Introduction

Many areas across the world have seen a rise in extreme fires in recent years. These include South America and southern and western Europe, as well as unexpected places above the Arctic Circle, such as the fires in Sweden during the summer of 2018 (de Groot et al. 2013; European Commission 2019). The Mediterranean area is no stranger to these changes: fires have become larger and more frequent, e.g. 200 kha burned in Portugal in mid-October 2017 and 63 kha in Greece in August 2007 (Castellnou et al. 2018). Indeed, forest fires in this area account for more than 80% of the total forest area burned on the European continent (San-Miguel-Ayanz et al. 2020). This trend has been driven by socioeconomic changes that have led to rural depopulation and changes in traditional land use (Chergui et al. 2018). In addition, the combination of longer drought periods with higher woody biomass and flammability of dominant species has created an environment favourable to fire spread (Piñol et al. 1998; Millán et al. 2005). Fire behaviour now frequently exceeds firefighting capabilities, and fire agencies have trouble suppressing flames while keeping both firefighters and citizens safe (Werth et al. 2016).

One of the main drivers of this new wildfire regime is the current state of weather and climate (Viegas et al. 2004; Pereira et al. 2005; de Dios et al. 2022; Balch et al. 2022; Rao et al. 2022). Several studies have related forest fires to meteorological conditions. For instance, Viegas and Viegas (1994) found an association between rainfall and burned area in Portugal; Pausas (2004) found the same relation between inter-annual variability in area burned and rainfall in the Mediterranean basin; Raja (2011) found that relative humidity is more strongly correlated with the monthly number of fires in Germany than temperature or precipitation; and differences in climate subtype between northern and southern Portugal explained the differences in number of fires and burnt area within the country (Parente et al. 2018). Consequently, several wildfire danger rating systems based on fire weather indices have been developed for wildfire prevention and management (San-Miguel-Ayanz et al. 2018). The Canadian Fire Weather Index System (FWI System) is one of the best documented and most widely used indices worldwide (Papagiannaki et al. 2020). Indeed, it has been adopted by the European Forest Fire Information System (EFFIS) and other forest fire agencies.

In recent decades, major efforts have been made to determine the influence of climate change on natural hazards (e.g. forest fires), and to develop models and tools to properly characterise and quantify changes in climatic patterns. However, while the physical processes involved in ignition and combustion are theoretically simple, understanding the relative influence of human factors in determining wildfire is an ongoing task (Mann et al. 2016). Human-caused fires that occur repeatedly in a given geographical area are not simply reducible to individual personal factors and thus subject to pure chance. They are usually the result of a spatial pattern that originates in the interaction of environmental and socioeconomic conditions (Koutsias et al. 2015). This is particularly true in human-dominated landscapes such as Spain. In such areas, anthropogenic ignitions outnumber natural ignitions, reflecting the extensive interfacing of human land uses with the natural environment. Through these interactions, humans significantly alter fuel properties and flammability, directly impacting fire ignition frequency and intensity. Furthermore, human populations play a dual role in fire management, acting both as fire initiators, through accidental or intentional ignitions, and as fire suppressors, through active fire control efforts. In such cases, human influence may cause sudden changes in fire frequency, intensity, and burned area size (Pezzatti et al. 2013). The role of human activities in changing those conditions has not been assessed at a global scale. Human drivers mostly have a temporal dimension, which is why a historical/temporal perspective is often required (Zumbrunnen et al. 2011; Carmona et al. 2012). Multiple local studies have pinpointed recurring factors associated with human-caused fire ignition. These include proximity to transportation infrastructure, land-use interfaces (e.g. forest-agriculture, forest-urban), land management practices, and social conflict dynamics (Leone et al. 2003; Martínez et al. 2009; Vega García et al. 1995).

Despite significant advancements in understanding wildfire behaviour, predictive models often struggle to anticipate the exceptional speed, intensity, and expansive footprint of recent fire events. This discrepancy manifests in numerous wildfires exceeding historical records in terms of fatalities, material losses, burnt area, or propagation speed, indicating potential limitations in current predictive capabilities (Duane et al. 2021). Efforts to comprehend and forecast wildfires have proven remarkably challenging, necessitating a critical reassessment of established knowledge and the development of innovative models by fire experts. Only through such comprehensive efforts can we hope to fully understand and effectively predict these formidable hazards (Moreira et al. 2020; Rogers et al. 2020).

Point process methodologies have been fruitfully applied to the study of wildfires, enabling the evaluation of how spatio-temporal heterogeneity in fire occurrence within a defined timeframe relates to the underlying spatial distribution of land use characteristics, such as vegetation cover, urban zones, and wetland ecosystems (Juan et al. 2012; Møller and Díaz-Avalos 2010; Pereira et al. 2013; Serra et al. 2014). Real-world environmental covariate data often present challenges for wildfire prediction models due to their specific spatio-temporal characteristics. Typically, these data have high spatial and temporal resolution, come in diverse numerical formats, and may suffer from low signal-to-noise ratios relative to the target phenomenon. Effective pre-processing therefore becomes crucial for extracting meaningful signals and building robust models. Additionally, generating artificial covariates that encapsulate relevant fire-related conditions could be a valuable strategy to enhance predictive power amidst noisy data.

The substantial spatio-temporal dimensionality of wildfire data, encompassing both observed occurrences and control cases without occurrences, has traditionally been addressed through data subsetting or aggregation approaches. These typically involve segmenting data by year or spatial regions (e.g. Genton et al., 2006; Turner, 2009; Serra et al., 2014; Xu and Schoenberg, 2011). Some recent approaches have focused more strongly on studying the interplay of the spatial and temporal structures (Gabriel et al. 2017), or on the usefulness of a specific Fire Weather Index aggregating weather data (Fargeon et al. 2018).

Here we use log-Gaussian Cox process models, which have already been identified as useful models for wildfires since they allow spatio-temporal aggregation structures to be captured through random effects (Gabriel et al. 2017; Pereira et al. 2013; Serra et al. 2014). Fitting spatial point process models to some spatial patterns is computationally intensive due to, amongst other things, the large number of individual points in the data set (Burslem et al. 2001; Waagepetersen 2007; Waagepetersen and Guan 2009; Law et al. 2009). In this paper, however, we consider a different and more complex situation, and take a novel approach to understanding wildfires: we combine two likelihoods to describe the distribution of wildfires in the Mediterranean. We estimate the distribution of the fires through a point process approach and, at the same time, model the intensity of each point. In summary, we use a marked point process model instead of a “simple” point process, fitted using the Integrated Nested Laplace Approximation (INLA; see Rue et al. (2009a) and Illian et al. (2012)).

Bayesian inference for log-Gaussian Cox processes using INLA is now well established, but remains challenging for high-dimensional data. The typical approach to overcome this issue is to build a grid structure to summarise the data (Opitz et al. 2020); in this work we instead specify complex spatio-temporal structures in the model to provide a deeper understanding of forest fires. We follow the spread of forest fires across the Mediterranean basin, examining the role of multiple random fields in capturing spatial clustering dynamics in the distribution of fires between 2003 and 2013. Accounting for spatially varying detection probability is a particular strength of inlabru, which was developed specifically for (ecological) datasets with complex observation processes.

When fitting a marked point process model in inlabru, the spatio-temporal structure of the marks (in our case the size of the fires, which is independent of the point distribution according to a Kolmogorov–Smirnov test (Zhang 2014)) and the spatio-temporal structure of the points (the wildfire locations) can be represented with different Gaussian random fields in a shared representation of continuous space and discrete time. In this way, multiple data features and different sources of spatial clustering can be incorporated into a single model of wildfire distribution (Laxton et al. 2022).

The aims of the present study were (1) to model the spatial and temporal distribution of forest fires in the Mediterranean Basin between 2003 and 2013, (2) to understand the most important environmental and socioeconomic factors related to forest fire behaviour and distribution, and (3) to develop, for the first time, a detailed dynamic cartography of the fire regime in the Mediterranean Basin, assessing results from countries of western Asia, northern Africa, and southern Europe.

2 Data on Fire Occurrences and Socioeconomic Covariates

In this section, we present the dataset used in the analysis, introducing the study area and the variables included in the model. The dataset comprises a total of 46,519 fire incidents, each documented with a set of variables derived from the Fire Atlas, socioeconomic data acquired from the World Bank, and the Drought Code value for the specific day and grid cell associated with the incident.

2.1 Study Area

The Mediterranean Sea, the world’s largest inland sea, is bordered to the north by densely populated and highly industrialised regions, transitioning dramatically to less populated and more desert landscapes in the south.

Here we consider the 20 countries located in the Mediterranean Basin (see Fig. 1): Albania, Algeria, Bosnia and Herzegovina, Croatia, Cyprus, Egypt, France, Greece, Israel, Italy, Lebanon, Libya, Montenegro, Morocco, Portugal, Slovenia, Spain, Syria, Tunisia, and Turkey.

The region (defined by Chergui et al. (2018) as the area around the Mediterranean Sea with a Mediterranean-type climate) has an annual dry and warm period (summer) when intense crown fires are frequent (Pausas 2004; Archibald et al. 2013; Bedia et al. 2015). In such ecosystems, fire controls the age and structure of the vegetation, as well as the composition of species (Verdú and Pausas 2007). That is, vegetation depends not only on climate, but also on the fire regime (Trabaud and Galtié 1996), and human activities have strongly regulated fire regimes across the Mediterranean region (Pausas and Vallejo 1999; Keeley et al. 2011). Indeed, there is evidence that fires were frequent during the late Quaternary (Carrión et al. 2004), and they were probably also frequent much earlier, as many species have acquired adaptive mechanisms to persist and regenerate after recurrent fires (Pausas 2004; Pausas and Verdú 2005).

Fig. 1 Study area. Countries coloured in brown have been included in the study. The dashed black box marks the area represented in subsequent plots (Color figure online)

2.2 Fire Occurrence Data: Fire Atlas

This study leverages two distinct fire occurrence data sources. The first is the World Fire Atlas (Andela et al. 2019), a global active fire product derived from data acquired by the Along-Track Scanning Radiometer (ATSR-2) and the Advanced Along-Track Scanning Radiometer (AATSR). These instruments were onboard the second European Remote Sensing Satellite (ERS-2) and the Environment Satellite (ENVISAT), respectively (Page et al. 2008). Spanning the period from November 1995 to the present, with the exception of a brief hiatus between January and June 1996, the World Fire Atlas boasts a comprehensive temporal coverage. Each sensor exhibits a nadir spatial resolution of 1 km and, with a swath width of 512 km, enables an equatorial revisiting period of 3 days.

The Global Fire Atlas dataset tracks the day-to-day dynamics of individual fires based on moderate resolution burned area data. Between 2003 and 2013, about 13.3 million individual fires were identified globally (see Fig. 2; more detail can be seen in Figure 11). For each individual fire, the dataset provides information on the timing and location of the ignition, the fire size, perimeter, duration, daily fire line, daily expansion, speed, and direction of spread. Methodology and validation are presented in Andela et al. (2019). The available data are summarised in Table 1.

Table 1 Factors on fire occurrence from Global Fire Atlas included in the analysis with units and description

Summarising the content of Table 1, Fig. 3 shows that the number of wildfires per year followed an increasing trend between 2003 and 2013, although the number of fires fluctuates from 2007 onwards. A similar trend can be observed in the total burned area (Fig. 3), but with a dramatically low value in 2008. The figure also shows that 2012, despite having fewer fires than 2011, had almost the same burned area. Regarding the relationship between duration and expansion, Fig. 4 shows that fires remained active for between 4.5 and 5 days on average, with an average expansion between 0.45 and 0.7 km\(^2\)/day. In addition, among the variables that would be expected to be most strongly correlated, Fig. 5 shows that speed is only moderately correlated with the other variables related to burned area: the correlation coefficient is 0.67 between speed and total area, 0.61 between speed and expansion, and 0.66 between speed and fire line.

Finally, Fig. 6 shows the land cover types affected by fire between 2003 and 2013. Croplands are the predominant affected land cover, with the largest burned area every year.

Fig. 2 Distribution of forest fire locations observed in the Mediterranean basin between 2003 and 2013

Fig. 3 Number of wildfires per year analysed (left), total burned area per year in km\(^2\) (right)

Fig. 4 Average fire duration (days) per year (left), average daily fire expansion (km\(^2\)/day) per year (right)

Fig. 5 Relationship between variables related to burned area: average speed of fire (km/day) against total burned area per fire (km\(^2\)) (right), against average daily fire expansion (km\(^2\)/day) (centre), and against average daily fire line length (km) (left)

Fig. 6 Summary of affected dominant land cover type, from Friedl et al. (2002), per year

2.3 Drought Code: from Canadian Fire Weather Index (Vitolo et al. 2019)

The Fire Weather Index System (FWI) of the Canadian Forest Fire Danger Rating System works well in forested ecosystems where organic soil layers are the primary surface fuels (Stocks et al. 1989), and it is used by the European Forest Fire Information System (EFFIS) for characterising fire danger in Europe.

Three moisture codes are available: the Fine Fuel Moisture Code (FFMC), the Duff Moisture Code (DMC), and the Drought Code (DC), which have increasing drying timelags. They independently track the movement of water in soil profiles of increasing depth through a “bookkeeping” system in which today’s code is built on yesterday’s. The moisture codes rely on four weather variables: air temperature, relative humidity, wind speed, and precipitation. These codes consist of underlying semi-physical models of moisture movement followed by abstraction equations that cause fire danger to increase as fuel moisture is depleted. The three moisture codes are then combined with wind to yield three fire danger indices that represent potential spread rate, fuel weight consumed, and frontal fire intensity (Van Wagner et al. 1987; Stocks et al. 1988; Wotton 2009).

The Drought Code (DC) (Giuseppe et al. 2019) was developed as part of the Canadian Forest Fire Weather Index System in the early 1970s to represent a deep column of soil that dries relatively slowly. Unlike most other fire danger indices or codes, which operate on gravimetric moisture content and use the logarithmic drying equation to represent diffusion, the DC is based on a model that balances daily precipitation and evaporation (Miller 2020). The Drought Code is a numeric rating of the average moisture content of deep, compact organic layers (deep duff layer, 1–20 cm). This code is a useful indicator of seasonal drought effects on forest fuels and of the amount of smouldering in deep duff layers and large logs. The DC has a long-term response (about 50 days) to weather variations. The rating ranges from zero to infinity, with a default startup value of 15.
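The bookkeeping idea, in which today's code is built from yesterday's code plus a balance between rainfall and evaporation, can be illustrated with a short sketch. It assumes the commonly published form of the Canadian DC equations (Van Wagner 1987) with northern-hemisphere day-length factors; the function name and arguments are hypothetical, the constants should be checked against that reference, and this is not the code used to produce the dataset described here.

```python
import math

# Day-length adjustment factors for January..December (northern hemisphere),
# as tabulated in the commonly published FWI equations (assumption).
DAY_LENGTH_FACTOR = [-1.6, -1.6, -1.6, 0.9, 3.8, 5.8, 6.4, 5.0, 2.4, 0.4, -1.6, -1.6]


def drought_code(dc_prev, temp_c, rain_mm, month):
    """One daily Drought Code update from yesterday's value and noon weather."""
    # Rain phase: only rainfall above 2.8 mm is assumed to reach the deep layer.
    if rain_mm > 2.8:
        effective_rain = 0.83 * rain_mm - 1.27
        moisture_prev = 800.0 * math.exp(-dc_prev / 400.0)
        moisture_now = moisture_prev + 3.937 * effective_rain
        dc_prev = max(0.0, 400.0 * math.log(800.0 / moisture_now))
    # Drying phase: potential evapotranspiration from temperature and day length.
    temp_c = max(temp_c, -2.8)
    pet = max(0.0, 0.36 * (temp_c + 2.8) + DAY_LENGTH_FACTOR[month - 1])
    return dc_prev + 0.5 * pet


# Starting from the default startup value of 15 on a dry July day at 20 °C.
dry_day = drought_code(15.0, temp_c=20.0, rain_mm=0.0, month=7)
# The same day with 10 mm of rain gives a lower (wetter) code.
wet_day = drought_code(15.0, temp_c=20.0, rain_mm=10.0, month=7)
```

Calling the function once per day, feeding each result back in as `dc_prev`, reproduces the slow seasonal build-up that makes the DC a long-memory drought indicator.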

This dataset is available on a daily basis as a grid with a spatial resolution of about 80 km covering the entire globe (see Fig. 7).

Fig. 7 Drought Code grid of the study area

The Drought Code is calculated using a daily time step by interpolating the atmospheric fields at local noon, when fire conditions are considered to be at their worst (Vitolo et al. 2019). In our case, we have processed this information to assign to each fire the DC value corresponding to its date and location.
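The matching of each fire to a DC value by day and grid cell can be sketched as a simple lookup. The record layout, the grid origin (`lat0`, `lon0`), the cell size in degrees and the dictionary-based grid are all hypothetical choices made for illustration; the actual processing pipeline is not described in this detail.

```python
import math
from datetime import date


def grid_cell(lat, lon, lat0, lon0, cell_deg):
    """Row/column index of the regular grid cell containing (lat, lon)."""
    row = math.floor((lat - lat0) / cell_deg)
    col = math.floor((lon - lon0) / cell_deg)
    return row, col


def attach_dc(fires, dc_grid, lat0, lon0, cell_deg):
    """Return fire records with a 'dc' value taken from the matching day and cell.

    dc_grid maps (day, (row, col)) to a Drought Code value; fires without a
    matching entry get dc=None rather than a guessed value.
    """
    out = []
    for fire in fires:
        cell = grid_cell(fire["lat"], fire["lon"], lat0, lon0, cell_deg)
        out.append({**fire, "dc": dc_grid.get((fire["day"], cell))})
    return out


# Toy example: one fire and a one-entry DC grid (values are made up).
fires = [{"lat": 40.2, "lon": -3.7, "day": date(2010, 7, 15)}]
dc_grid = {(date(2010, 7, 15), (13, 8)): 412.5}
matched = attach_dc(fires, dc_grid, lat0=30.0, lon0=-10.0, cell_deg=0.75)
```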

2.4 Socioeconomic Factors: World Bank (WB)

The World Bank (WB) database is a freely accessible repository offering time series data on diverse thematic areas, aggregated at the national level. It contains information going back to the 1960s from more than 250 nations, covering broad subjects including health, nutrition, education, economic advancement, disparities, poverty, and fundamental demographic metrics. The database is built on a rigorous methodology and is updated systematically (World Bank 2014).

The World Development Indicators (WDI) database is a prominent international repository developed and maintained by the World Bank. It contains a large number of socio-economic metrics, primarily aggregated at the national level. Among these, Gross Domestic Product (GDP) and its components, together with population figures, are the most widely used. These indicators are typically annual time series, with many going back to 1960. While a substantial proportion of the indicators originate directly from the World Bank, occasionally in collaboration with other international bodies such as the United Nations (UN), a considerable portion is compiled from external databases. Depending on specific requirements, the WDI offers a consolidated platform for comprehensive data retrieval, removing the need to consult more specialised databases (Van Der Mensbrugghe 2016).

The WDI dataset serves a multitude of functions within academic research and policy analysis contexts, including elucidating historical trends, conducting parameter estimation, calibrating trend analyses, and evaluating structural shifts over time. Examples of applications include the examination of population cohorts, the determination of agriculture’s contribution to GDP and employment, as well as the assessment of indicators pertinent to the Millennium Development Goals (MDGs) (Van Der Mensbrugghe 2016).

We have used the World Bank’s World Development Indicators database between 2003 and 2013 (The World Bank 2003-2013) for data on agriculture, industry and services employment, forest area, rural population, and gross domestic product (see Table 2).

Table 2 Summary of socioeconomic variables included in the analysis (World Bank): total population, density (population per sq. km of land area), employment in agriculture (% of total employment), employment in industry (% of total employment), employment in services (% of total employment), forest area (% of land area), rural population (% of total population), and gross domestic product

As depicted in Fig. 8, a synopsis of various socioeconomic indicators is presented by country. Distinct disparities are discernible across nations for each variable. For example, Lebanon and Israel exhibit population densities exceeding double those of other countries. Moreover, an analysis of the three employment sectors reveals that over 40% of the populace in Albania and Morocco are engaged in agriculture, whereas the proportion of the population employed in industry spans between 15% and 35% across all nations, with nearly 80% of Israel’s population working in services. Lastly, in terms of GDP, France, Italy, and Spain exhibit the highest values.

Fig. 8 Graphical summary of socioeconomic covariates by country. a Density: population per square kilometre. b Employment in agriculture: % of total employment. c Employment in industry: % of total employment. d Employment in services: % of total employment. e Rural population: % of total population. f Gross domestic product (GDP)

3 Model

As previously done with complex ecological systems (Laxton et al. 2022), we assume that the spatial locations of the forest fire data \(x_{i}:i=1,...,n\) have been generated as a partial realisation of a point process that is itself the object of scientific interest.

To model our data, we fit a spatio-temporal log-Gaussian Cox process (Møller et al. 1998). Log-Gaussian Cox processes (LGCP) are widely used to model point patterns, due to their flexibility and their usefulness for modelling aggregation (clusters) relative to some underlying unobserved environmental field (Illian et al. 2010; Simpson et al. 2016). In recent years, a considerable number of papers have used the log-Gaussian Cox process, for example, Brix and Diggle (2001) and Liang et al. (2008) in the context of disease mapping, and Møller and Díaz-Avalos (2010) in the context of wildfires.

3.1 Spatio-Temporal Log-Gaussian Cox Process Models

A spatio-temporal LGCP is defined as a spatio-temporal Poisson point process conditional on the realisation of a stochastic intensity function \(\Lambda (x, t) = \text {exp}\{S(x, t)\}\), where \(S(\cdot )\) is a Gaussian process. Gneiting and Guttorp (2010) review the literature on formulating models for spatio-temporal Gaussian processes. They make a useful distinction between physically motivated constructions and more empirical formulations. An example of the former is given in Brown et al. (2000), who propose models based on a physical dispersion process. In discrete time, with \(\delta \) denoting the time separation between successive realisations of the spatial field, their model can be expressed as

$$\begin{aligned} S(x,t)= \int h_{\delta }(u)S(x-u, t-\delta ) \textrm{d}u + Z_{\delta }(x,t), \end{aligned}$$
(1)

where \(h_{\delta }\) is a smoothing kernel and \(Z_{\delta }\) is a noise process, in each case with parameters that depend on the value of \(\delta \) in such a way as to give a consistent interpretation in the spatially continuous limit as \(\delta \rightarrow 0\).
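A discretised version of Eq. (1) may help fix ideas. The sketch below applies one time step on a periodic one-dimensional grid, replacing the integral with a weighted sum over a short kernel and adding independent Gaussian noise. The grid, the three-point kernel and the function name are illustrative assumptions and are not taken from Brown et al. (2000).

```python
import random


def dispersion_step(field, kernel, noise_sd, rng):
    """One update S(x, t) = sum_u h(u) S(x - u, t - delta) + Z(x, t) on a periodic 1-D grid."""
    n = len(field)
    half = len(kernel) // 2
    new_field = []
    for x in range(n):
        # Discrete convolution: kernel[j] is the weight h(u) for offset u = j - half.
        smoothed = sum(
            kernel[j] * field[(x - (j - half)) % n] for j in range(len(kernel))
        )
        new_field.append(smoothed + rng.gauss(0.0, noise_sd))
    return new_field


rng = random.Random(1)
kernel = [0.25, 0.5, 0.25]       # smoothing weights that sum to one
field = [0.0] * 20
for _ in range(10):              # ten successive realisations of the field
    field = dispersion_step(field, kernel, noise_sd=0.1, rng=rng)
```

With the noise switched off, a single spike in the field is spread to its neighbours according to the kernel weights, which is the dispersion mechanism the equation describes.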

A Cox process is an inhomogeneous Poisson process whose intensity is itself a realisation of a non-negative-valued stochastic process. In the spatio-temporal setting, we write the intensity process as \(\Lambda (x,t)\). The conditional Poisson property of the Cox process precludes any direct interactions between events. This makes it most appealing as a model when an observed pattern is thought to be determined by observed and/or unobserved environmental processes. The model for the stochastic component of \(\Lambda (x,t)\) could be either mechanistic or empirical in character, but empirical models are more common in practice (Diggle 2013). We allow the expectation of \(\Lambda (x,t)\) to vary with \(x\) and \(t\) but assume that its covariance structure is stationary. A convenient re-parameterisation is then

$$\begin{aligned} \Lambda (x,t)= \lambda (x,t) R(x,t), \end{aligned}$$
(2)

where \(R(x,t)\) is a stationary process with expectation 1 and covariance function \(\gamma (u,v)=\sigma ^{2}r(u,v)\). Since \(\textrm{E}[\Lambda (x,t)]=\lambda (x,t)\), it follows that \(\lambda (x,t)\) is the unconditional intensity of the point process, and the stationarity of \(R(x,t)\) implies that the point process is intensity-reweighted stationary.
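The decomposition in Eq. (2) can be illustrated by simulating cell counts from a discretised Cox process. In this sketch the random factor is taken as \(R = \exp (\sigma Z - \sigma ^2/2)\) with \(Z\) standard normal, which has expectation 1; for brevity the cells are simulated independently, so the spatial correlation \(r(u,v)\) is ignored. That simplification, the function names and all numerical values are assumptions for illustration only.

```python
import math
import random


def poisson_draw(mean, rng):
    """Poisson variate by Knuth's multiplication method (adequate for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_cox_counts(lam, sigma, cell_area, rng):
    """Cell counts of a discretised Cox process with Lambda = lam * R and E[R] = 1."""
    counts = []
    for lam_cell in lam:
        # Log-normal factor with unit expectation (independent across cells here).
        r = math.exp(sigma * rng.gauss(0.0, 1.0) - 0.5 * sigma ** 2)
        # Conditional on Lambda, the count in a cell is Poisson.
        counts.append(poisson_draw(lam_cell * r * cell_area, rng))
    return counts


rng = random.Random(42)
counts = simulate_cox_counts([5.0] * 20000, sigma=0.5, cell_area=1.0, rng=rng)
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / (len(counts) - 1)
```

The average count stays close to the deterministic intensity, while the variance exceeds the mean; this overdispersion is what lets a Cox process represent clustering that a plain Poisson process cannot.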

3.2 Marked Point Process Models

The point pattern of forest fire locations is modelled as a log-Gaussian Cox process in inlabru. Fire locations are independent conditional on the point process intensity \(\Lambda (s,t)\):

$$\begin{aligned} \Lambda (s,t) = \exp \left( \alpha _0 + R(s,t) \right) \end{aligned}$$
(3)

The log-intensity of the spatio-temporal point process model is given by an intercept term \(\alpha _0\) and a spatio-temporal Gaussian random field \(R(s,t)\).

The probability of fire presence \(P(s,t)\) depends on both the size of fires and the distribution of the existing fires in space and time. We model \(P(s,t)\) as a logit-transformed Gaussian process (Lindgren and Rue 2015; Laxton et al. 2022):

$$\begin{aligned} P(s,t) = \text{ logit}^{-1} \left( \beta _0 + \sum _{i=1}^I \alpha _i x_i (s,t) + \beta R(s,t) + M(s, t) \right) \end{aligned}$$
(4)

Here, \(\beta _0\) is the intercept term and \(x_{i}(s,t)\) is the value of covariate \(i\), with coefficient \(\alpha _i\), at location \(s\) and time \(t\). The spatial density is incorporated into the model through the spatio-temporal Gaussian random field \(R(s,t)\) multiplied by a scaling parameter \(\beta \), which determines the strength and direction of the interaction between fire location and fire size. The spatio-temporal Gaussian random field \(M(s,t)\) represents the structure in the distribution of the observed fires in space and time that is not explained by the intercept and covariates. The Gaussian random fields \(R(s,t)\) and \(M(s,t)\) are approximated using an SPDE.
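To make the role of the shared field concrete, the sketch below evaluates Eqs. (3) and (4) at a single location and time for given parameter values. It only illustrates how \(R(s,t)\) enters both predictors; it is not the inlabru model specification, and the function names and numerical values are hypothetical.

```python
import math


def inv_logit(eta):
    """Inverse logit transform mapping a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def point_intensity(alpha0, r_field):
    """Eq. (3): intensity of the point process at one location and time."""
    return math.exp(alpha0 + r_field)


def mark_probability(beta0, alphas, covariates, beta, r_field, m_field):
    """Eq. (4): mark predictor sharing the field R through the scaling beta."""
    eta = beta0 + sum(a * x for a, x in zip(alphas, covariates))
    eta += beta * r_field + m_field
    return inv_logit(eta)


# Where R is high (many fires), a positive beta also raises the mark predictor.
r_here = 0.8
intensity = point_intensity(alpha0=-2.0, r_field=r_here)
p_shared = mark_probability(-1.0, [0.3], [1.5], beta=0.5, r_field=r_here, m_field=0.0)
p_unlinked = mark_probability(-1.0, [0.3], [1.5], beta=0.0, r_field=r_here, m_field=0.0)
```

Setting `beta` to zero decouples the marks from the point pattern, so the sign and size of \(\beta \) summarise how fire size relates to fire density.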

Model fitting and inference were carried out in R version 4.1.1 (R Core Team 2021) using the packages inlabru version 2.3.1.9000 (Bachl et al. 2019) and R-INLA version 21.02.23 (Rue et al. 2009b).

3.3 PC Priors

Penalised complexity (PC) priors were used to inform this covariance structure, according to ecological understanding of the distances across which values may be correlated, and the extent to which values may vary. PC priors are interpretable default priors which operate under the principle of Occam’s Razor, penalising complexity away from a simpler base model (Simpson et al. 2017).

Penalised complexity (PC) priors were originally defined by Simpson et al. (2017) to provide a generic and understandable approach to building prior distributions. In contrast with many other approaches to constructing priors, users do not select a generic prior distribution for each parameter without a direct link to a specific model component; instead, they decide how sure they are about how much flexibility is needed in a particular model component. Here, a prior is not a prior on the parameters themselves but on the divergence of a model component from a base model. Specifically, a prior is put on a flexibility parameter \(\xi \) for a model component with density \(\pi (x|\xi )\); the more flexible the model is, the larger the flexibility parameter and hence the distance from the base model, i.e. the “simplest” model for the specific model component (Laxton et al. 2022).

In all of the models fitted in this paper, PC priors are used to inform the Matérn covariance structure of the Gaussian random field. This is carried out using a reparameterisation of the range parameter \(\kappa \) and scaling parameter \(\tau \) from the Matérn covariance function to a range and variance parameter, \(\rho _s\) and \(\sigma ^2\) (Blangiardo et al. 2013).

For the Matérn field, the base model represents the instance where the Gaussian random field has close to no impact; the limiting case with \(\sigma ^2\rightarrow 0\) and \(\rho _s\rightarrow \infty \). Thus, if the data do not support the inclusion of a Gaussian random field in the model, its effects can be ‘removed’ from the analysis.
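The reparameterisation and the way PC prior statements translate into tail probabilities can be sketched numerically. The first function assumes the usual two-dimensional SPDE relations, \(\rho _s = \sqrt{8\nu }/\kappa \) and \(\sigma ^2 = \Gamma (\nu )/\{\Gamma (\nu +1)\,4\pi \,\kappa ^{2\nu }\tau ^2\}\); the second assumes PC prior statements of the form \(P(\rho _s < \rho _0) = p_\rho \) and \(P(\sigma > \sigma _0) = p_\sigma \) in two dimensions. The function names and the numerical values are hypothetical and are not the settings used in the paper.

```python
import math


def matern_range_sigma(kappa, tau, nu=1.0):
    """Map SPDE parameters (kappa, tau) to range and marginal s.d. in two dimensions."""
    rho = math.sqrt(8.0 * nu) / kappa
    sigma2 = math.gamma(nu) / (
        math.gamma(nu + 1.0) * 4.0 * math.pi * kappa ** (2.0 * nu) * tau ** 2
    )
    return rho, math.sqrt(sigma2)


def pc_matern_tail_probs(rho, sigma, rho0, p_rho, sigma0, p_sigma):
    """Prior P(range < rho) and P(s.d. > sigma) implied by the two PC statements."""
    lam_rho = -math.log(p_rho) * rho0        # rate for the range prior (d = 2)
    lam_sigma = -math.log(p_sigma) / sigma0  # rate of the exponential prior on sigma
    return math.exp(-lam_rho / rho), math.exp(-lam_sigma * sigma)


# Hypothetical statements: P(range < 100 km) = 0.05 and P(sigma > 1) = 0.05.
p_short_range, p_large_sd = pc_matern_tail_probs(50.0, 2.0, 100.0, 0.05, 1.0, 0.05)
```

Both tail probabilities shrink as the range gets shorter or the standard deviation larger, which reflects the prior pulling the field towards the base model of long range and small variance.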

For the temporal element of this field, we model the correlation between consecutive years with a first-order autoregressive (AR1) process.

Similarly, PC priors were used to inform the temporal correlation structure of the AR1 process in the models that included this effect. In the AR1 process, PC priors are placed on the correlation parameter (denoted \(\rho _t\) here to distinguish it from the range parameter \(\rho _s\) of the Matérn covariance), with two possible base models: \(\rho _t=0\) (no correlation) or \(\rho _t=1\) (strong positive correlation). We set \(P(\rho _t>0)=0.9\), with base model \(\rho _t=1\). This can be interpreted as a high prior probability of positive correlation between consecutive time points.
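The temporal dependence that the AR1 process encodes can be illustrated by simulating a stationary path, \(x_t = \rho _t x_{t-1} + \varepsilon _t\), in which the innovation variance is scaled so that the marginal standard deviation stays fixed. This is a generic sketch of an AR1 process, not the fitted temporal component; the function names and parameter values are hypothetical.

```python
import math
import random


def simulate_ar1(n_steps, rho_t, sigma, rng):
    """Stationary AR1 path with marginal s.d. sigma; requires |rho_t| < 1."""
    innovation_sd = sigma * math.sqrt(1.0 - rho_t ** 2)
    path = [rng.gauss(0.0, sigma)]           # start from the stationary distribution
    for _ in range(n_steps - 1):
        path.append(rho_t * path[-1] + rng.gauss(0.0, innovation_sd))
    return path


def lag1_correlation(path):
    """Empirical correlation between consecutive values of a path."""
    mean = sum(path) / len(path)
    num = sum((a - mean) * (b - mean) for a, b in zip(path[:-1], path[1:]))
    den = sum((a - mean) ** 2 for a in path)
    return num / den


rng = random.Random(7)
yearly_effect = simulate_ar1(11, rho_t=0.8, sigma=1.0, rng=rng)   # one value per year, 2003 to 2013
long_path = simulate_ar1(20000, rho_t=0.8, sigma=1.0, rng=rng)
estimated_rho = lag1_correlation(long_path)
```

With only eleven yearly values the data carry limited information about \(\rho _t\), which is one reason an informative prior on the correlation is useful.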

3.4 Model Evaluation

A usual way to estimate out-of-sample prediction error is cross-validation (see Geisser and Eddy (1979), and Vehtari and Lampinen (2002) for a Bayesian approach), but scientists have always looked for alternative methods, as cross-validation involves repeated model fits and can run into trouble with sparse data (Gelman and Shalizi 2013). When the aim is model comparison, the most common index is the DIC (Spiegelhalter et al. 2002; Van Der Linde 2005), which, like the Akaike information criterion (AIC), involves two components: a term that measures the goodness of fit and a penalty term for growing model complexity. More recently, the Watanabe–Akaike information criterion (WAIC) (Watanabe and Opper 2010) has been suggested as an appropriate alternative for estimating the out-of-sample expectation in a fully Bayesian approach. This method starts with the computed log pointwise posterior predictive density and then adds a correction for the effective number of parameters to adjust for overfitting (Gelman and Shalizi 2013). WAIC works on the predictive probability density of the observed variables rather than on the model parameters; hence, it can be applied to singular statistical models (i.e. models with non-identifiable parameterisation; see Li et al. 2016). We have also considered the conditional predictive ordinate (CPO) (Pettit 1990) for model evaluation. The CPO is based on leave-one-out cross-validation: it estimates the probability of observing a value after having already observed the others. The mean logarithmic score (LCPO) was calculated as a measure of the predictive quality of the model (Gneiting and Raftery 2007; Roos and Held 2011). High LCPO values indicate possible outliers, high-leverage, and influential observations. Table 5 summarises the WAIC and LCPO for the models developed.
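The two criteria can be written down compactly in terms of pointwise log-likelihood values evaluated over posterior draws. The sketch below is a sample-based illustration of the definitions (WAIC as minus twice the log pointwise predictive density corrected by the pointwise posterior variance, and LCPO as the mean of \(-\log \text {CPO}_i\) with \(\text {CPO}_i\) estimated by a harmonic mean); INLA obtains these quantities from its own approximations rather than from samples, and the function names and toy input are hypothetical.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(loglik):
    """WAIC from loglik[s][i], the log-likelihood of observation i under draw s."""
    n_obs = len(loglik[0])
    lppd, p_waic = 0.0, 0.0
    for i in range(n_obs):
        col = [draw[i] for draw in loglik]
        lppd += log_mean_exp(col)                       # log pointwise predictive density
        mean = sum(col) / len(col)
        p_waic += sum((v - mean) ** 2 for v in col) / (len(col) - 1)  # effective parameters
    return -2.0 * (lppd - p_waic)


def lcpo(loglik):
    """Mean of -log CPO_i, with CPO_i the harmonic mean of the likelihood over draws."""
    n_obs = len(loglik[0])
    total = 0.0
    for i in range(n_obs):
        total += log_mean_exp([-draw[i] for draw in loglik])  # equals -log CPO_i
    return total / n_obs


# Toy input: 3 posterior draws (rows) by 4 observations (columns), made-up values.
toy = [[-1.2, -0.8, -2.0, -1.1], [-1.0, -0.9, -2.4, -1.3], [-1.1, -0.7, -1.9, -1.2]]
toy_waic, toy_lcpo = waic(toy), lcpo(toy)
```

For both criteria, smaller values indicate better predictive performance, so candidate models can be ranked directly on these numbers.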

4 Results

We have formulated two distinct marked point process spatio-temporal models for scrutinising fire size and another marked point process model to elucidate the influence of variables on fire propagation speed. These models were assessed using the widely applicable information criterion (WAIC) and leave-one-out cross-validation (LCPO) to ascertain the optimal fit. Subsequently, a novel model was constructed to elucidate the impact of various covariates on wildfire velocity.

Table 3 illustrates the outcomes of the model in which the mark corresponds to the burned area. It delineates two intercepts: one pertaining to the point process (i.e. the distribution of fires) and another related to the mark process (i.e. the area burned in each fire). Following these intercepts, the table presents various variables incorporated in the model.

Table 3 reports two distinct models. The column labelled “All covariates” corresponds to the full model, incorporating all variables. In the “Reduced model” column, the non-significant variables (agriculture, industry, and services) have been excluded, together with the variable “speed”, which is derived from the total burned area divided by the number of days.

As Table 3 shows, only two variables in the full model have an impact on the area of forest fires: speed and DC both positively affect the burned surface. Surprisingly, variables that according to the literature have a strong relationship with fire size, such as GDP, do not have any effect on the response variable here.

From a modelling perspective, the intercepts of the point and mark processes behave in the same way in both models with area as the mark response.

Table 3 Posterior mean and 95% credible intervals for the regression coefficients; the scaling parameter (\(\beta \)) representing the interaction between R(st) and the probability of fire presence; the temporal correlation parameter of the AR1 process; and the parameters of the spatio-temporal R(st) and M(st) fields

Lastly, our final model includes only the variables that have a clear relationship with the response. However, we have introduced duration instead of speed to avoid the possible correlation with the area. As Table 4 shows, the two variables have a positive effect on the response.

Table 4 Posterior mean and 95% credible intervals for the regression coefficients; the scaling parameter (\(\beta \)) representing the interaction between R(st) and the probability of fire presence; the temporal correlation parameter of the AR1 process; and the parameters of the spatio-temporal R(st) and M(st) fields

Table 5 shows that the model with only DC and duration has the best fit according to the WAIC and LCPO, followed by the model with all covariates.

Table 5 Watanabe–Akaike information criterion (WAIC) and logarithmic score of the conditional predictive ordinate (LCPO) for the models with area as a mark

Upon analysing the spatial dispersion of fires, we can discern three components within the predictions: the mark process, the point process, and the amalgamation of both, referred to as the marked point process. To elucidate these outcomes, Fig. 9 depicts various representations. In the top-left corner, the intensity map for the year 2003 illustrates the joint process (mark and Cox process), delineating the distribution of fires. The colour gradient from blue to yellow signifies the intensity of fires corresponding to their size. Concurrently, the top-right panel displays the total number of fires spanning from 2003 to 2013, with those occurring in 2003 highlighted in yellow. In the bottom-left section, the results of the point process are depicted, showcasing intensity ranging from blue (small size) to yellow (large size). Similarly, the bottom-right map portrays the predictions of the mark field.

When scrutinising the spatial configuration and progression spanning from 2003 to 2013, we can encapsulate the findings in accordance with the distinct components of the model as follows.

Figure 12 shows the estimated mark random field M(st) for 2003–2013 from the marked point process model. It suggests a general escalation in intensity over this timeframe, with 2009 and 2013 standing out as years with particularly heightened predictions. Geographically, certain regions such as the North-West of the Iberian Peninsula, the North coast of Algeria, the East coast of Corsica, Bosnia and Montenegro, and Central Turkey consistently exhibit greater intensity compared to other areas.

Figure 13 shows the estimated point random fields R(st) for 2003–2013 from the marked point process model. The predictions are similar to those in the previous plot, with the high-intensity areas extending also to the South of Italy, the East of Greece, and the West of the Iberian Peninsula.

Finally, Figure 14 shows the mean predicted area of forest fires for 2003–2013 from the marked point process model. In this case, a high number of points accumulate, almost every year, in the North-West of the Iberian Peninsula, the North of Algeria, the South of Italy, and the Centre of Turkey. In contrast, in 2007, the fires were concentrated in the Balkan countries and the South-East of Italy.

4.1 Speed Model

Speed is one of the most interesting variables in forest fire behaviour (Duane et al. 2021). However, in the fire atlas database this variable is derived from extent and time (days) (Andela et al. 2019); for that reason, we need to be cautious with the output obtained. In this case, none of the variables included in the model shows any relationship with speed.

As with the area, we now analyse the spatial structure and temporal evolution with speed as the response. We comment on each of the components included in the model in order to understand the changes in space and time from 2003 to 2013.

Figures 15, 16 and 17 show that M(st) is, in general, spatially homogeneous within each year with respect to speed. We can highlight 2005 and 2013 as the years with the highest predicted speed in the marks. In R(st), there is higher spatial variation, with higher values in Turkey and the west of the Iberian Peninsula every year.

Finally, Figure 17 shows the mean predicted speed of forest fires for 2003–2013 from the marked point process model. As with the area, a high number of points accumulate, almost every year, in the North-West of the Iberian Peninsula, the North of Algeria, the South of Italy, and the Centre of Turkey.

5 Discussion

The outcomes obtained provide novel perspectives for comprehending the spatio-temporal distribution of forest fires, specifically in terms of burned area, within the Mediterranean region. This marks the first instance where such understanding has been achieved through a synthesis of climatic and socioeconomic variables. The analysis has revealed that burned area is related not only to certain variables, such as DC and speed, but also to temporal and spatial factors and their combined effects, which influence the progression of forest fires over time. Moreover, it has contributed towards identifying patterns of fire occurrence at the landscape level. These findings hold potential significance for managers and policymakers in formulating effective fire prevention strategies.

Fig. 9 Wildfire intensity in the Mediterranean basin in 2003 (size, top left). Distribution of large wildfires in the Mediterranean basin in 2003 (top right). Estimated mark random field (M(st), bottom left) and point random field (R(st), bottom right) for 2003 from the marked point process model with all covariates. The colour scale is given as low-high intensity, as we are interested in relative differences across space and not absolute values. A purple background colour is used for visual clarity (Color figure online)

A notable correlation between the drought code (DC) and burnt area was demonstrated. The DC was originally devised to assess the soil water content of deep and compacted duff, typically exceeding a depth of 25 cm (Ruffault et al. 2018). The DC distinguishes itself from the DMC in terms of the soil horizon considered, which is shallower in the DMC, and in its more detailed depiction of the water balance, incorporating a Thornthwaite-type evapotranspiration function (Turner 1972). Consequently, the DC serves as a pivotal index that characterises severe drought conditions within moisture reserves, which are linked to water stress in forest species and can consequently result in increased fuel availability in the event of a fire. Owing to its depth, the DC exhibits the slowest rate of change among the moisture codes, with a time lag of 52 days (Van Wagner et al. 1987). Essentially, the DC value diminishes with effective rainfall and escalates with evapotranspiration, so that higher values correspond to an increased probability of wildfire persistence and smouldering (Van Wagner et al. 1987). Nonetheless, certain studies (e.g. Wilmore (2001), Girardin et al. (2006), McElhinny et al. (2020)) have highlighted the necessity of utilising overwintered DC values when computing fire behaviour at the onset of the fire season. This implies that significant disparities may arise between the use of default DC values and the adoption of overwintered DC values. Consequently, the fire risk in certain regions could be more severe than what is anticipated based on default DC predictions (McElhinny et al. 2020).
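As a rough illustration of what a 52-day time lag implies, one may assume the usual exponential-drying reading of a time lag, i.e. the time needed to lose a fraction \(1 - 1/e\) of the moisture held above equilibrium. The symbols \(m\), \(m_{eq}\) and \(\tau \) are introduced here for illustration only and are not part of the DC equations themselves. Under that assumption, the moisture excess would decay as

\[ m(t) - m_{eq} = \left( m(0) - m_{eq} \right) e^{-t/\tau }, \qquad \tau = 52 \text{ days}. \]

With this simplification, ten rain-free days would remove only \(1 - e^{-10/52} \approx 0.175\) of the excess moisture, whereas 52 days would remove about 0.63 of it. This is consistent with the DC reflecting seasonal rather than short-term drought.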

The scaling parameter represents the strength and direction of the interaction between the point field R(st) and the probability of fire.

In AR1 models, the spatio-temporal field M(st) borrows strength across multiple years, indicating that the spatial arrangement is shaped by the recurring incidence of relatively isolated fire occurrences over time. Likewise, the standard deviation allows ample variation within the field, enabling estimated fire intensities to span from low to very high values (see Figure 12).

The temporal correlation \(\rho _t\) is estimated to be extremely high, and the spatial structure of M(st) is predicted to be the same for each year of data. This indicates that the spatio-temporal field in these models captures a spatial structure in the data that is constant over time.

Given that the AR1 models identify a persistent trend of a static spatial structure over time, they project no alterations in the spatial depiction of the spatio-temporal field across different years. Consequently, the spatio-temporal field is compelled to portray the most probable spatial correlation structure for all years, resulting in an average derived from the complete dataset. However, this averaging process can yield deceptive estimates of fire distribution, particularly when fires emerge in previously unaffected areas.
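The persistence described above can be illustrated with a small simulation. This is a minimal sketch, not the fitted model: it assumes a stationary AR1 recursion in time with independent unit-variance innovations in each cell, whereas in a spatio-temporal model the innovations would be spatially correlated. The function names, the number of cells and the value of the correlation parameter are all hypothetical.

```python
import numpy as np

def simulate_ar1_fields(rho, n_years=11, n_cells=2000, seed=0):
    """Simulate yearly fields following a stationary AR1 process in time.

    Each cell evolves as x_t = rho * x_{t-1} + sqrt(1 - rho**2) * eps_t,
    so every year has unit marginal variance. Innovations are independent
    across cells (illustrative assumption; no spatial correlation).
    """
    rng = np.random.default_rng(seed)
    fields = np.empty((n_years, n_cells))
    fields[0] = rng.standard_normal(n_cells)
    for t in range(1, n_years):
        eps = rng.standard_normal(n_cells)
        fields[t] = rho * fields[t - 1] + np.sqrt(1.0 - rho**2) * eps
    return fields

def first_last_correlation(fields):
    """Pearson correlation between the first and last yearly fields."""
    return np.corrcoef(fields[0], fields[-1])[0, 1]

# Eleven yearly fields, matching the length of a 2003-2013 series.
persistent = simulate_ar1_fields(rho=0.99)
independent = simulate_ar1_fields(rho=0.0)
print(first_last_correlation(persistent), first_last_correlation(independent))
```

For a correlation parameter of 0.99, the theoretical correlation between fields ten years apart is \(0.99^{10} \approx 0.90\), so the first and last fields are nearly identical. With a value of zero, the yearly fields are unrelated. A fire in a previously unaffected area would therefore be poorly represented when the estimated temporal correlation is close to one.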

The model has revealed a correlation between the area burned and the rate of wildfire spread. Recent extreme wildfire events, such as those observed in France in 2016 (Ruffault et al. 2018), Chile in 2016–2017 (Castillo et al. 2020), Spain and Portugal in 2017 (Turco et al. 2019), Greece in 2018 (Lagouvardos et al. 2019), and eastern Australia in 2019–2020 (Nolan et al. 2020), have underscored the limitations of wildfire suppression capabilities. Many of these events are associated with exceptionally rapid rates of spread (Duane et al. 2021), leading to understandable failures in fire suppression systems and resulting in devastating consequences in terms of burned area, even in regions that are well-prepared and well-equipped. The challenge or impossibility of control explains why these fires can affect larger areas. Indeed, the rate of spread constitutes a crucial component of the information utilised to determine operational decisions regarding firefighting strategies and the allocation of firefighting resources, with the primary objective of mitigating the impact of fires (Storey et al. 2021), particularly in terms of burned area or affected structures. Rates of spread, such as the reported 10,000 hectares burned per hour in Portugal in October 2017 (Castellnou et al. 2018), have been documented in the literature as a consequence of exceptional fire-weather conditions (Ruffault et al. 2018; Duane et al. 2021) or prevailing fuel conditions (Duane et al. 2021).

Anthropogenic factors, such as population density and gross domestic product (GDP), may exert both positive and negative influences on fire dynamics (Aldersley et al. 2011). While human activity has been demonstrated to play a significant role in igniting fires (Russell-Smith and Yates 2007), evidence derived from palaeorecords suggests that, on the whole, human interventions have led to a reduction in wildfire extent. This reduction is attributed to factors such as breaks in vegetation continuity, fuel load reductions (Marlon et al. 2008), as well as concerted efforts and investments in fire suppression. GDP, which serves as an indicator of the macroeconomic status within a country or region and is closely linked with education, investment, household income, etc., tends to be associated with increased human settlement, and concurrently exhibits lower burned area (Aldersley et al. 2011). Additionally, Marques et al. (2011) observed a decrease in the proportion of burned area with declining population density in Portugal, positing that lower densities may impede fire detection and prolong the time before initial suppression operations. Conversely, Kim et al. (2019) linked the rise in GDP and population density with an increase in forest fires in South Korea. However, the results of the proposed model revealed a stochastic relationship between population density, GDP, and the response variable. Moreover, Guo et al. (2015) found no significant association between socioeconomic factors such as population density and GDP, and the occurrence of human-caused fires in Chinese boreal forests, which aligns with Martínez et al. (2009) but contradicts findings from other researchers (Chang et al. 2013; Syphard et al. 2007).

The findings also revealed an absence of correlation between the percentage of rural population and the occurrence of wildfires. This observation stands in stark contrast to recent investigations; for instance, Akter and Grafton (2021) examined the nexus between socioeconomic disadvantage and wildfire risk during Australia’s “Black Summer” of 2019–2020, concluding that a connection existed between wildfire exposure and rural areas with clear socioeconomic disadvantages. Conversely, Davis et al. (2018) demonstrated that socioeconomically advantaged populations in the United States inhabit locations highly prone to wildfires due to the presence of high environmental amenities and corresponding property values. Furthermore, no apparent relationship was evident between the area burned and the percentage of the population engaged in agriculture, industry, or service sectors. These findings are surprising given that regions experiencing pronounced processes of land abandonment (characterised by reduced agricultural activity) tend to be more susceptible to large fires, while peri-urban areas witness smaller yet more frequent fires owing to heightened anthropogenic pressure (associated with the development of industry and service sectors) (Carlucci et al. 2019). However, certain agricultural practices aimed at clearing shrubland may potentially lead to uncontrolled fires (Sumarga 2017). Indeed, in Europe’s Mediterranean basin, approximately two-thirds of all fires originate in agriculture (Wunder et al. 2021).

Various factors can exert differential effects contingent upon the region under consideration. In our context, variables have been conceptualised on a scale where local insights may prove beneficial in elucidating the causal relationships underlying forest fires.

Upon scrutinising the models, we contend that the employment of spatio-temporal models confers an advantage in comprehending the diverse dynamics, particularly considering that the temporal and spatio-temporal perspective is not commonly explored in the analysis of forest hazards.

In summary, we can infer that the causality of forest fires is influenced not only by environmental factors but also by socioeconomic variables.

Over time, landscapes have become increasingly hazardous due to land abandonment, resulting in an expansion of forested areas. Areas devoid of trees experience a higher proportional rate of burning compared to wooded regions (Urbieta et al. 2019). In southern Europe, fires exhibit a preference for shrublands over forested types (Moreira et al. 2011; Oliveira et al. 2014), although this preference may vary across different locations (Moreno et al. 2011). Such variations could be attributed to changes in ignition patterns resulting from shifts in wildland–agricultural and wildland–urban interfaces (Rodrigues and de la Riva 2014; Modugno et al. 2016). Landscapes characterised by a diversity of land uses, particularly those featuring a mixture of forest and agriculture, are identified as being the most vulnerable (Ortega et al. 2012). Therefore, the inclusion of vegetation as a factor in analysing causality warrants further investigation.

Changes in ignition sources can influence fire trends. In European Mediterranean countries, a small proportion of fires are ignited by lightning, with the majority ignited by human activities. Fires originating from these two sources often occur in distinct locations (Vázquez and Moreno 1998), potentially affecting the vegetation they consume and the challenges encountered during extinguishment. However, no discernible shifts between these two ignition sources have been observed (Ganteaume et al. 2013). Concerning fires caused by human activities, the majority are deliberate, followed by instances of negligence (Urbieta et al. 2019). In recent years, there has been an increase in fires caused by negligence and a decrease in deliberate ignitions (Ganteaume et al. 2013). Whether these changes are differentially impacting trends in fire occurrence necessitates further investigation (Urbieta et al. 2019).

6 Conclusions

We have delineated three spatio-temporal marked log-Gaussian Cox processes to elucidate and forecast the extent and velocity of forest fires spanning from 2003 to 2013. The findings indicate that socioeconomic covariates exhibit a stochastic influence. When all covariates were incorporated, only DC and speed demonstrated any discernible impact on the area, with this effect dissipating upon the removal of socioeconomic variables and speed.

From a modelling standpoint, further enhancements are conceivable. Integrating socioeconomic variables at the regional level, such as NUTS 2 in Europe (de Rivera et al. 2020), could offer a distinct insight into the influence of these variables on the causality and dynamics of forest fires. However, the accessibility of such variables is restricted beyond European borders.

Another advantageous strategy would involve incorporating covariates into the Cox process, utilising raster layers across the study area to enhance comprehension and prediction not only in the point process but also in the marked point process. However, implementing this enhancement may entail certain computational expenses and time investments.

Ultimately, data regarding investment in fire management, encompassing both extinguishing and mitigation efforts, would be pivotal in enhancing our understanding of forest fires and their progression. Once more, the challenge with such datasets lies in their availability, as this information is not universally accessible across countries, not only at the regional level but also at the national level. Moreover, this information could shed light on one of the prevalent theories in fire management: how varying fire management practices may yield unforeseen consequences in the behaviour of future fires (Minnich 1989; Silva et al. 2010).

We recognise that there are opportunities for refinement in this analysis, and future endeavours are envisaged to comprehend not only the causality of fires but also their intensity and dispersion. Analysing smaller regions to delve deeper into fire dynamics using varying levels of detail will be imperative for comprehending and mitigating fires in specific locales. For this reason, we perceive this work not as a conclusive chapter but as a prelude to a more comprehensive understanding of fires across different scales.