1 Introduction

Boosting methods and logistic regression (LR) models have been jointly analyzed from a methodological perspective for more than 20 years, with similar performances reported (Friedman et al. 2000). One reason could be that boosting can be interpreted as an approximation to additive modeling on the logistic scale using maximum (Bernoulli) likelihood (Friedman 2001). More recently, the comparisons have been extended to many versions of boosting in the context of different models and different health (Ingwersen et al. 2023; de Menezes et al. 2017) or environmental applications (Arabameri et al. 2019; Rizeei et al. 2019). These studies reaffirm the proximity between LR and boosting algorithms, with some differences depending on the boosting version, the modelling and the data. It is frequently assumed that machine learning methods outperform traditional statistical procedures, in particular when dealing with large datasets. In this paper we assess the performance of both methods for detecting burned areas using satellite images. The dataset used here is characterized by severe class imbalance and a large volume of data.

Satellite images are crucial sources of information for monitoring wildfires on the Earth's surface, and the generation of global images of burned areas (BA) has been an important issue since the late 1990s. Consequently, since the 2000s, periodic global BA products have been routinely derived, but with a limited level of accuracy. An excellent review of developments in the detection of burned areas with remote sensing data is provided by Chuvieco et al. (2019).

The most popular burned area product is MCD64A1, from the Moderate Resolution Imaging Spectroradiometer (MODIS) mission (Tomshin and Solovyev 2021), because it provides a monthly global gridded 500 m image of burned areas together with quality information (Giglio et al. 2018). The algorithm used for creating this product relies on a burn-sensitive vegetation index (VI) to create dynamic thresholds. However, some disturbances caused by clouds, atmospheric absorption and sensor-introduced noise can still be present (EEDC 2022), and erroneous identifications of burned areas are possible. For example, in the European Mediterranean countries, MCD64A1 overestimates BA by 25% with regard to the European Forest Fire Information System (EFFIS) (Turco et al. 2019a). FireCCI51 is another global BA product (Lizundia-Loiola et al. 2020), available from 2001 to 2020 and based on the MODIS 250 m reflectance product, but recent studies show little improvement of this product with regard to MCD64A1 (Hall et al. 2021; Vetrita et al. 2021). There are also specific contributions for detecting burned areas with the Landsat and Sentinel missions, but regrettably they are only available in specific regions.

Overall, satellite BA products are based on a great variety of algorithms with different efficiencies depending on image resolution and ecosystem variety. Recently, the fast evolution of machine learning techniques has enabled improvements in the detection of burned areas (Jain et al. 2020) and in the study of wildfire spread patterns (Khanmohammadi et al. 2022). Well-known methods such as random forest (Ramo and Chuvieco 2017; Belgiu and Drăguţ 2016), support vector machines (Zhang et al. 2015; Petropoulos et al. 2011), artificial neural networks (Mas and Flores 2008), convolutional neural networks with long short-term memory (LSTM) (Pinto et al. 2020) and geometric semantic genetic programming (Castelli et al. 2015) are available for detecting burned areas, and gradient boosting based models are also widely used. LR has been used for classifying burned areas for more than two decades (Koutsias and Karteris 2000; Bastarrika et al. 2011), but machine learning techniques have recently become more popular, mainly due to the need to manage big datasets. Comparisons between statistical and machine learning approaches are scarce, but in some cases many similarities are observed in the predictive performances (Ramampiandra et al. 2023).

In this paper, we evaluate the effectiveness of burned area detection in a region spanning over 100,000 km\(^2\) on the Iberian Peninsula. We use remote sensing data and compare the performance of two classifiers with distinct approaches: a traditional statistical classifier, LR, and a machine learning-based classifier, the extreme gradient boosting algorithm (XGBoost). The application presented in the paper starts with a detailed description of the procedure: (a) we briefly describe the MODIS products, (b) we explain the definition of the auxiliary variables and the reference classification variable, and (c) we compare both classifiers for the identification of burned and non-burned pixels. The presence of highly imbalanced data, specifically burned and unburned pixels, can significantly complicate the estimation process, but applying both procedures to the same data allows a fair evaluation. The validation is made by comparing the predicted classification with the target reference and with other external classifications not involved in the estimation process. Both procedures use the same auxiliary variables.

The rest of the paper is organised as follows. Section 2 describes the study region and the data. Subsections 2.1 and 2.2 cover the MODIS remote sensing data, the spectral indices and the additional products. Subsection 2.3 explains how the input dataset is constructed, including the differences of spectral indices, the average density of active fires and the definition of the reference classification. XGBoost and LR are briefly explained in the context of our data analysis in Subsects. 3.1 and 3.2 of Sect. 3, respectively. The final results, and the accuracy metrics obtained in the validation of XGBoost and LR, are shown in Sect. 4. Finally, Sect. 5 presents some conclusions.

2 Study region and data

The region of interest covers several regions of the Iberian Peninsula, including Galicia and the Portuguese regions of Santarém, Braga, Vila Real, Coimbra, Guarda, Aveiro, Viseu, Castelo Branco, Portalegre, Bragança, Porto and Viana do Castelo, with an extension of about 84,348 km\(^2\). Figure 1 illustrates the extent of burned areas within the study region over the designated time period. Galicia is a region of roughly 30,000 km\(^2\) located in the northwest of Spain, north of Portugal, that concentrates a high number of the fires in Spain. In 2017, approximately 80% of the 620 km\(^2\) that burned in Galicia did so within a span of 2 days. During this period, more than 20 fires resulted in burned surfaces exceeding 5 km\(^2\). Portugal, with a land area of about 92,090 km\(^2\), consists of over 66% forested land. In 2017, Portugal lost more than 5000 km\(^2\) to wildfires, its greatest burned area in a single year (Turco et al. 2019b). It is the European country most affected by fires during the last decade (San-Miguel-Ayanz et al. 2020).

Fig. 1 Overview of the burned areas in the region of interest between September and November 2017

For the region of interest, multiple variables have been generated from various data sources. These sources provide information in a variety of formats, including both vector and raster formats. The data are standardized into a stack of rasterized images and projected with the MODIS mission format, denoted as SR-ORG:6974. Most of the data used in this work are derived from multi-spectral satellite images that explicitly capture information related to burned areas; these data are presented in Subsect. 2.1. Additional information, obtained primarily from vector sources, is described in Subsect. 2.2. Finally, Subsect. 2.3 describes the data processing used to extract valuable information.

2.1 Multi-spectral data

The study requires a substantial amount of data, and MODIS provides a significantly larger variety of variable types related to burned areas than other satellite programs. The MODIS program has two satellites, Terra and Aqua, both of which capture daily images of the Earth's surface at 500 m spatial resolution. Both satellites follow the same orbit with a 3 h lag, so that each can fill in missing values of the other.

The download and loading of the data into the R software (R Core Team 2023) were performed using the rsat package (Pérez-Goya et al. 2021). This package assembles the images by region and time of interest into an object that contains the images covering the region of interest for the 91 days of September, October and November 2017. Using rsat, we import a total of 182 layers for each spectral index. This comprises 91 daily layers from MOD09GA and an additional 91 daily layers from MYD09GA, sourced from the Terra and Aqua satellites, respectively.

2.1.1 Spectral indices

Table 1 shows the definition of the spectral indices, and the pre-fire and post-fire differences used in this study. The normalized burn ratio (NBR) is the most popular spectral burn index, originally developed for identifying burned areas (García and Caselles 1991), and later used for burn and fire severity assessment (Lutes et al. 2006). It is defined with the near-infrared and the shortwave-infrared bands, both sensitive to burning but in opposite ways. Other indices, such as the normalized burn ratio 2 (NBR2) (Santana et al. 2018), the burn-sensitive vegetation index (MVI) (Giglio et al. 2009), and the mid-infrared bispectral index (MIRBI) (Trigg and Flasse 2001; McCarley et al. 2018), are also highly responsive to variations in live green vegetation or moisture content. The normalized difference vegetation index (NDVI) is a well-known indicator of vegetation. A zero value means no vegetation, and a value near 1 indicates a high level of vegetation. We also use the near infrared (NIR) band 2 of MODIS, commonly used for monitoring temporal burn signatures (Tucker 1979; Mohler and Goodin 2010). Values close to zero are associated with unburned areas. All of these indices decrease significantly after a fire, making them good indicators of burned pixels (Chen et al. 2011). More comprehensive descriptions of the implications of spectral indices in the context of burned area monitoring can be found in the literature (Pereira 1999; Libonati et al. 2010, 2011).

Table 1 Definition of the spectral indices in terms of the red (R), near infrared (NIR), shortwave-infrared (SWIR1, SWIR2) and thermal infrared (TIRS1) bands; pre-fire and post-fire differences of spectral indices identified with the suffixes pre and post respectively

Data obtained from satellites may contain invalid information due to cloud cover. The Terra and Aqua daily cloud masks are used to remove the cloudy observations from the classification process. This work reduces cloud gaps by using composite images of the indices. These composites are obtained by overlaying the daily indices derived from MOD09GA on the corresponding daily indices derived from MYD09GA. That is, we substitute unavailable pixels of the MOD09GA indices with the available MYD09GA pixels of the corresponding indices, reducing the number of unavailable or erroneous pixels. Several gap-filling methods have been developed to address this issue (Militino et al. 2019a, b; Wang et al. 2022).
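The compositing rule can be illustrated with a minimal Python sketch (the analysis itself was carried out in R). It assumes that each daily index layer is held as a flat list of pixel values with `None` marking cloud-masked pixels; the function name `composite` and the toy values are hypothetical.

```python
from typing import List, Optional


def composite(terra: List[Optional[float]],
              aqua: List[Optional[float]]) -> List[Optional[float]]:
    """Keep the Terra (MOD09GA) value where available, else use Aqua (MYD09GA)."""
    if len(terra) != len(aqua):
        raise ValueError("both layers must cover the same pixels")
    return [t if t is not None else a for t, a in zip(terra, aqua)]


# Toy daily index layers; None marks a cloud-masked pixel (hypothetical values).
terra_nbr = [0.41, None, 0.12, None]
aqua_nbr = [0.39, 0.35, None, None]
filled = composite(terra_nbr, aqua_nbr)
```

A pixel remains missing only when it is masked in both layers, as in the last position of the toy example.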

2.2 Additional products

Apart from the multispectral indices, supplementary products incorporating additional data have been employed to enhance the detection of wildfires. One of these is the MCD12Q1 land cover product, which provides yearly land cover maps at a spatial resolution of 500 m with a legend of 17 classes. The first 11 categories correspond to different types of vegetation, while the remaining categories encompass a variety of croplands, urban and built-up lands, and water bodies. This classification enables the derivation of a binary variable to determine burnability. If a pixel corresponds to one of the first 11 categories, it is considered burnable, whereas pixels falling outside of these categories are deemed unburnable.
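As a small illustration, the burnability rule can be written as follows in Python. The sketch assumes that the land cover classes are coded as the integers 1 to 17 in legend order, which is an assumption about the encoding rather than something stated above; `is_burnable` is a hypothetical name.

```python
def is_burnable(land_cover_class: int) -> bool:
    """True for the first 11 legend categories (vegetation types).

    Assumes classes are coded 1..17 in legend order (an assumption).
    """
    return 1 <= land_cover_class <= 11


# Toy row of land cover codes (hypothetical values).
burnable_mask = [is_burnable(c) for c in [1, 8, 11, 12, 13, 17]]
```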

The study also includes specific products related to the fires and burned areas:

  1. Fire location products. Fire location products allow us to find the date of the nearest fire to compute the pre-fire and post-fire differences. They also improve the detection of burned areas by incorporating features that include spatial information, such as the distance to the nearest fire and the intensity of the point process of active fires.

    (a) The fire location product MCD14DL of October 2017. This is a monthly product of near real-time (NRT) MODIS thermal anomalies or fire locations, each representing the center of a 1 km spatial resolution pixel.

    (b) The visible infrared imaging radiometer suite (VIIRS) of October 2017. It detects active fires and other thermal anomalies (VIIRS 2021), providing a means to identify fire-induced changes in surface reflectance (Loboda et al. 2007). VIIRS data complement and enhance MODIS (MCD14DL) fire detection (NASA 2020).

    Both products provide vector files. To use them together with the spectral indices, they are reprojected into the grid that defines the MOD09GA images.

  2. Burned area products. Burned area products serve a dual purpose. Firstly, they enable us to establish a reference classification for our methods. Secondly, they facilitate the validation of the performance of our classification models.

    (a) The EFFIS wildfire database of October 2017. It contains perimeters of burned areas in Europe since 2003 in vector files (EFFIS 2021). This product is derived from the daily processing of MODIS satellite imagery at 250 m ground spatial resolution. The product is reprojected into the MOD09GA grid and used as a reference for the classification methods. The EFFIS reference classification is a binary variable where the burned pixels are those covered by the EFFIS burned areas. Building upon this reference, we create a refined classification reference, labeled as Lclass, for use in the validation process. Lclass removes atypical data from the EFFIS classification, and identifies as burned only those pixels of EFFIS burned areas greater than 2 km\(^2\) and with \(d.NBR1 > 0.1\). The unburned pixels are those not defined as burned in the previous step, but with \(d.NBR1 \le 0.15\), to avoid isolated burn scars not identified in the EFFIS database (Lutes et al. 2006).

    (b) The MCD64A1 product. It provides burned area data on a 500 m resolution grid. Even though the format is similar, we still need to reproject the MCD64A1 images into the MOD09GA grid. This product has the sole purpose of validating our results.

    (c) The FireCCI5.1 product. It provides burned area data on a 250 m resolution grid. Even though this product provides images, we still need to reproject them into the MOD09GA grid. This product is used for validation purposes only.
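The Lclass rule described under the EFFIS product can be sketched in Python as follows. This is one literal reading of the two thresholds, not the code used in the study: the function name, the argument names and the choice to return `None` for pixels that satisfy neither condition (treated here as the removed atypical data) are assumptions.

```python
from typing import Optional


def lclass_label(in_effis_scar: bool, scar_area_km2: float,
                 d_nbr1: float) -> Optional[int]:
    """Return 1 (burned), 0 (unburned) or None (excluded).

    Burned: inside an EFFIS burned area larger than 2 km^2 with d.NBR1 > 0.1.
    Unburned: not burned by the rule above and d.NBR1 <= 0.15.
    Anything else is excluded (an assumption about the atypical data).
    """
    if in_effis_scar and scar_area_km2 > 2.0 and d_nbr1 > 0.1:
        return 1
    if d_nbr1 <= 0.15:
        return 0
    return None


# Hypothetical pixels: large scar, clean background, unlisted scar, small scar.
labels = [
    lclass_label(True, 3.0, 0.30),
    lclass_label(False, 0.0, 0.05),
    lclass_label(False, 0.0, 0.40),
    lclass_label(True, 1.0, 0.12),
]
```

Under this reading, a pixel with a strong burn signal outside any EFFIS perimeter is left out of Lclass rather than counted as unburned.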

2.3 Input variables for the classifiers

Classification methods require a reference classification, which serves as the dependent or target variable, along with auxiliary or predictor variables. The reference classification is derived from EFFIS burned area data, while the auxiliary variables are obtained from differences of spectral indices, the distance of each pixel to the nearest fire (distAF) and the average intensity of active fires (aF.int).

2.3.1 Differences of spectral indices and distances to the nearest active fire

Differences of spectral indices are a frequent tool in change detection algorithms for identifying burned areas (Van Wagtendonk et al. 2004; Miller et al. 2009). The difference consists in subtracting the index value after the fire (post) from the index value before the detected fire (pre) (Eidenshink et al. 2007). Using the vector files of active fires of October 2017 from the MCD14DL and VIIRS products in the region of interest, we obtain the nearest active fire date using the Dirichlet tessellation, and we define the distance (distAF) of every pixel to the nearest active fire. The Dirichlet tessellation creates a polygon around each center point such that any point inside the polygon is nearer to that center point than to any other center point. Next, we assign to all pixels of the same polygon the date of the closest fire. Fire dates are used for identifying the eight previous and the eight posterior observations for every pixel. These images are drawn from the time series of composite indices defined between September and November 2017. Next, we calculate the difference indices by subtracting the mean of the eight observations after the fire date from the mean of the eight observations before it. The amplitude of 8 days is empirically the most suitable for time series of MODIS images (Giglio et al. 2018).
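The two steps can be sketched in Python under simplifying assumptions. Assigning each pixel to its nearest fire is equivalent to locating the Dirichlet cell it falls in, so the sketch uses a brute-force nearest-neighbour search with Euclidean distances on projected coordinates. Treating the eight observations as the eight nearest valid (non-missing) values on each side of the fire day, and excluding the fire day itself, are assumptions; all names and values are hypothetical.

```python
from math import dist
from statistics import mean
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def nearest_fire(pixel: Point, fires: Sequence[Point]) -> Tuple[int, float]:
    """Index of the closest active fire and its distance (distAF)."""
    distances = [dist(pixel, fire) for fire in fires]
    nearest = min(range(len(fires)), key=distances.__getitem__)
    return nearest, distances[nearest]


def difference_index(series: Sequence[Optional[float]], fire_day: int,
                     window: int = 8) -> Optional[float]:
    """Mean of the last `window` valid values before the fire day minus
    the mean of the first `window` valid values after it (pre minus post)."""
    before = [v for v in series[:fire_day] if v is not None][-window:]
    after = [v for v in series[fire_day + 1:] if v is not None][:window]
    if not before or not after:
        return None
    return mean(before) - mean(after)


# Toy example: two fires with known detection days in a 91-day series.
fires = [(3.0, 4.0), (1.0, 0.0)]
fire_days = [40, 45]
idx, dist_af = nearest_fire((0.0, 0.0), fires)
nbr = [0.5] * 45 + [None] + [0.1] * 45
d_nbr = difference_index(nbr, fire_days[idx])
```

A positive difference indicates that the index dropped after the fire, which is the expected signature of a burned pixel.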

2.3.2 Average density of active fires

Fire locations are typical examples of point patterns (Borrajo et al. 2020), because they are realizations of a random point process in a two-dimensional space (Baddeley et al. 2015). The reference model of a point process is the uniform or homogeneous Poisson point process, where the number of points in a region A follows a Poisson distribution with mean \(\lambda \cdot \text {area}(A)\), where \(\lambda\) is the intensity of the process, defined as the expected number of points per unit area. When the point process is not homogeneous, as is the case in Portugal or Spain, where clusters of municipalities present a higher frequency of wildfires (Martinho 2018), the intensity can be modeled as a function of the spatial coordinates (u) through linear, generalized linear or generalized additive models (GAM). To gain model flexibility, a GAM is used here. The similarity between the Poisson point process log-likelihood and that of Poisson regression allows the intensity to be expressed as log-linear in the parameter \(\theta\). Namely,

$$\begin{aligned} \text {log}\, \lambda _{\theta }(u)=S(u), \end{aligned}$$

where S(u) is a smooth function of the coordinates u. In this case, we use a thin-plate basis function of dimension \(k=30\) (Turner 2009). Figure 2 shows the estimation of the average density computed with the R package spatstat (Baddeley and Turner 2005).
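To make the link between the log-intensity and expected counts concrete, the following Python sketch integrates \(\exp (S(u))\) over a rectangle with a midpoint rule. It does not fit the smooth function; `S` is a hypothetical stand-in for an already fitted surface, and the grid resolution `n` is an arbitrary choice.

```python
from math import exp, log
from typing import Callable


def expected_count(S: Callable[[float, float], float],
                   x0: float, x1: float, y0: float, y1: float,
                   n: int = 100) -> float:
    """Midpoint-rule approximation of the integral of exp(S(u)) over
    the rectangle [x0, x1] x [y0, y1], i.e. the expected number of points."""
    dx = (x1 - x0) / n
    dy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += exp(S(x0 + (i + 0.5) * dx, y0 + (j + 0.5) * dy))
    return total * dx * dy


# Homogeneous special case: S constant, so the result is lambda * area(A).
homogeneous = expected_count(lambda x, y: log(2.0), 0.0, 3.0, 0.0, 2.0)
```

With a constant log-intensity the sketch recovers the homogeneous Poisson mean, here an intensity of 2 points per unit area over an area of 6.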

Fig. 2 Average density of the point pattern of active fires in the region of interest

3 Classifiers

The classifiers allow a supervised classification of burned and unburned pixels. We analyze the dataset using extreme gradient boosting and logistic regression.

The input file is obtained by generating a text file from the raster dataset, and thus it has the following variables: the differences of spectral indices, called d.NBR1, d.NBR2, d.MVI, d.MIRBI, d.NDVI and d.NIR, the average density of active fires by pixel (aF.int), the distance to the nearest active fire (distAF), and the reference classification as dependent or target variable. It has around 500,000 observations.

3.1 The eXtreme Gradient Boosting method (XGBoost)

XGBoost (Chen and Guestrin 2016) is an advanced implementation of the gradient boosting method with many applications in Earth Sciences (Sahin 2022). It is a supervised ensemble learning method, where a single model combines the predictive power of multiple learners. The main ensemble approaches are boosting and bagging, both usually based on decision trees that predict the target variable from several input features. Boosting builds trees sequentially, each one reducing the errors of the previous trees, and it is appropriate for managing large sets of data without specific assumptions. The main advantages of decision trees are their relatively simple structure, the lack of assumptions, and their flexibility and robustness with regard to other methods (Alnahit et al. 2022). Decision trees can effectively deal with nonlinear relationships and diverse variable types, including both categorical and numerical variables. The main difference with bagging methods such as random forest is that boosting uses trees with few splits. In the training step, the parameters of the weak learner are fitted iteratively by minimizing an objective function. In this application, every learner is compared with its previous learners to minimize the binary classification error rate, computed as the ratio of the number of wrong cases over the total number of cases.

The training set is a random sample of 75% of the observations, and tenfold cross-validation over the training set is used to estimate the best hyperparameters. The optimized model is the one whose hyperparameters achieve the minimum mean error among the folds. The main hyperparameters are: (1) ‘Learning rate’, which scales the contribution of each tree by a factor to prevent overfitting and makes the boosting process more conservative. It varies between 0.1 and 0.5. The optimum is 0.1. (2) ‘Maximum depth of a tree’, which controls how deep each tree can grow. It varies among 1, 5 and 10. Higher depth results in more complex models, which are more likely to overfit. The optimum is 1. (3) ‘Minimum sum of instance weight needed in a child’, which sets the minimum weight for further partitioning. If the tree partition step results in a leaf node with a sum of instance weights below this value, the building process gives up further partitioning. It varies among 1, 3, 5, 7 and 9. The optimum is 7. (4) ‘Control of imbalanced classes’, which is fixed for the training set. This is a very specific hyperparameter that makes XGBoost more competitive than other machine learning methods in burned area detection. It is defined as the ratio of unburned over burned pixels, i.e. 27.88. XGBoost has been implemented with the xgboost R package (R Core Team 2023), and we have also used rsat (Pérez-Goya et al. 2021) and dependent packages for downloading, customizing and managing the images, vector and text files.
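The search space and the class-balance weight can be sketched in Python as follows. The keys `eta`, `max_depth`, `min_child_weight` and `scale_pos_weight` are the XGBoost parameter names for the four hyperparameters listed above. The three learning-rate values are an assumption, since only the range 0.1 to 0.5 is stated, and the toy label vector is hypothetical; the sketch builds the grid only and does not train any model.

```python
from itertools import product
from typing import Dict, List, Sequence


def imbalance_ratio(labels: Sequence[int]) -> float:
    """Number of unburned (0) pixels over number of burned (1) pixels."""
    burned = sum(1 for y in labels if y == 1)
    unburned = sum(1 for y in labels if y == 0)
    if burned == 0:
        raise ValueError("no burned pixels in the training set")
    return unburned / burned


def hyperparameter_grid(labels: Sequence[int]) -> List[Dict[str, float]]:
    """All combinations to be scored by cross-validation."""
    learning_rates = [0.1, 0.3, 0.5]      # assumed values within 0.1-0.5
    max_depths = [1, 5, 10]
    min_child_weights = [1, 3, 5, 7, 9]
    weight = imbalance_ratio(labels)      # fixed for the training set
    return [
        {"eta": lr, "max_depth": d, "min_child_weight": w,
         "scale_pos_weight": weight}
        for lr, d, w in product(learning_rates, max_depths, min_child_weights)
    ]


# Toy training labels reproducing the reported ratio of 27.88.
toy_labels = [0] * 2788 + [1] * 100
grid = hyperparameter_grid(toy_labels)
```

Each dictionary in `grid` would then be evaluated by tenfold cross-validation, keeping the combination with the lowest mean error.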

3.2 Logistic regression (LR)

LR is a popular statistical method for supervised classification (Hosmer et al. 2013) that predicts the probability of belonging to a binary class. Fitting LR models requires several assumptions: (a) a binary response variable, (b) independent observations, (c) absence of multicollinearity among explanatory variables, (d) no extreme outliers and (e) a linear relationship between the explanatory variables and the logit of the response variable (James et al. 2013). The assumptions are addressed as follows. The training data set is the same as for XGBoost, and consists of a random sample of 75% of the observations; choosing the data at random relaxes the assumption of independence. The variance inflation factor (vif) quantifies the effects of multicollinearity. In this application all predictor variables have a vif below 5 (far from the limit of 10). We have checked the linearity between the continuous independent variables and the logit (log odds) with the Box–Tidwell test (Box and Tidwell 1962), as well as the absence of outliers.

With very imbalanced data, LR can seriously underestimate the probability of success (burned pixels) (King and Zeng 2001). Then, the minority group will get a high sensitivity rate and a lower specificity rate. Using an adequate sampling scheme and a weighted procedure, we can compensate for the difference between the numbers of successes and failures. This is done with undersampling and weighting (Haixiang et al. 2017). Weights are assigned to the data to compensate for the difference between successes (burned pixels) and failures (unburned pixels). Undersampling consists in drawing at random a number of failures similar (or equal) to the number of successes, which in the original data stand in a ratio of 1:27.88.
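Both corrections can be sketched in Python. Undersampling is implemented here as a random draw from the majority (unburned) class of the same size as the minority (burned) class, and the weights follow the common inverse-frequency scheme in which both classes carry the same total weight. The exact sampling and weighting scheme used in the study is not detailed, so both choices, as well as the function names, are assumptions.

```python
import random
from typing import Dict, List, Sequence


def undersample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Indices of all burned pixels plus an equally sized random
    draw of unburned pixels."""
    rng = random.Random(seed)
    burned = [i for i, y in enumerate(labels) if y == 1]
    unburned = [i for i, y in enumerate(labels) if y == 0]
    keep = rng.sample(unburned, min(len(burned), len(unburned)))
    return sorted(burned + keep)


def class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency weights: each class gets total weight n / 2."""
    n = len(labels)
    n_burned = sum(labels)
    n_unburned = n - n_burned
    return {0: n / (2 * n_unburned), 1: n / (2 * n_burned)}


# Toy labels with roughly the reported imbalance (hypothetical data).
toy = [1] * 10 + [0] * 279
balanced_idx = undersample(toy)
weights = class_weights(toy)
```

With these weights, the ratio between the weight of a burned pixel and that of an unburned pixel equals the imbalance ratio of the data.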

The probability of burned pixels (\(Y=1\)) is given by

$$\begin{aligned} \pi=P(Y=1\mid X_1,X_2,\ldots ,X_8)=\dfrac{1}{1+\exp \left\{ -\left( \beta _0+\sum _{k=1}^8 \beta _k X_k\right) \right\} } \end{aligned}$$

that can also be expressed as

$$\begin{aligned} \log \left(\dfrac{\pi}{1-\pi}\right)=\beta _0+\beta _1X_1+\beta _2X_2+\cdots +\beta _8 X_8, \end{aligned}$$

where \(\{\beta _k \mid k=0,1,\ldots ,8\}\) are the coefficients given in Table 2, fitted by maximum likelihood through an iteratively reweighted least squares algorithm. All of the predictor variables are statistically significant except for the difference spectral index d.NBR2. We do not exclude it, in order to keep the same auxiliary variables in both methods. Convergence is reached in a few iterations and in less than 1 min.
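The relation between the linear predictor, the probability of a burned pixel and the log-odds can be checked numerically with a short Python sketch. The coefficient values below are hypothetical placeholders and are not the estimates of Table 2; the 0.5 threshold in `classify` is the cut-off used later for the final predictions.

```python
from math import exp, log
from typing import Sequence


def burned_probability(beta0: float, betas: Sequence[float],
                       x: Sequence[float]) -> float:
    """Inverse logit of the linear predictor beta0 + sum_k beta_k * x_k."""
    eta = beta0 + sum(b * v for b, v in zip(betas, x))
    return 1.0 / (1.0 + exp(-eta))


def logit(p: float) -> float:
    """Log-odds, the left-hand side of the second expression."""
    return log(p / (1.0 - p))


def classify(p: float, threshold: float = 0.5) -> int:
    """1 (burned) if the probability exceeds the threshold, else 0."""
    return 1 if p > threshold else 0


# Hypothetical coefficients and predictor values (not those of Table 2).
p = burned_probability(-1.0, [0.5, 2.0], [3.0, 0.25])
```

Applying `logit` to the probability returns the linear predictor, which is the sense in which the two expressions are equivalent.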

Table 2 Coefficients, estimated coefficients, standard errors, z-values and p-values obtained for the variables of the training set with LR

3.3 Confusion matrices

The main accuracy metrics used to evaluate the classifiers are shown in Table 3. The true positives (TP) are the pixels both defined and predicted as burned. The false negatives (FN) are the pixels defined as burned but predicted as unburned. The false positives (FP) are those defined as unburned but predicted as burned, and the true negatives (TN) are those both defined and predicted as unburned. Therefore, the detection rate (D) is the proportion of pixels both defined and predicted as burned over the whole set of pixels, the omission error (OE) is the proportion of burned reference pixels incorrectly predicted as unburned, and the commission error (CE) is the proportion of predicted burned pixels that are unburned in the reference. The precision (P) is the proportion of correctly predicted burned pixels over the predicted burned set; it is the complement of the commission error \((P=1-CE)\). The recall (R), also called sensitivity or true positive rate, is the proportion of correctly predicted burned pixels over the burned reference set; it is the complement of the omission error (\(R=1-OE\)). The Dice coefficient (DC) is a similarity measure lying between 0 and 1, defined as twice the number of overlapping burned pixels divided by the total number of burned pixels in both images (Guindon and Zhang 2017). The kappa coefficient \((\kappa )\) measures the agreement between the reference and the predicted classification.

Table 3 Definition of accuracy metrics, where TP, TN, FP, and FN are true positives, true negatives, false positives and false negatives, respectively
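For reference, the metrics can be computed from the four counts as in the Python sketch below. The formulas are the standard definitions matching the verbal descriptions above, with Cohen's kappa taken as the kappa coefficient; the function name and the example counts are hypothetical, and no guard is included for empty classes.

```python
from typing import Dict


def accuracy_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy metrics from the counts of a binary confusion matrix."""
    n = tp + tn + fp + fn
    omission = fn / (tp + fn)        # burned in reference, predicted unburned
    commission = fp / (tp + fp)      # predicted burned, unburned in reference
    observed = (tp + tn) / n         # observed agreement
    # Agreement expected by chance from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "D": tp / n,
        "OE": omission,
        "CE": commission,
        "P": 1.0 - commission,
        "R": 1.0 - omission,
        "DC": 2 * tp / (2 * tp + fp + fn),
        "kappa": (observed - expected) / (1.0 - expected),
    }


# Hypothetical counts for 1000 pixels.
m = accuracy_metrics(tp=30, tn=950, fp=10, fn=10)
```

Because the true negatives dominate under severe imbalance, metrics that ignore TN, such as the Dice coefficient, are more informative here than overall accuracy.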

4 Results

XGBoost and LR learn and estimate from the same random training data set of 75% of the pixels. Both use appropriate weights to compensate for the severe class imbalance, but results inevitably vary due to the randomness of the approximation methods. To account for this variability, we run both procedures 100 times and derive the accuracy metrics. Negligible differences are found among different runs. Removing the weights for imbalanced data results in an increase in misclassifications. The final predictions of both methods are classified as either 0 or 1, depending on whether the predicted probability is less than or equal to 0.5 or greater than 0.5, respectively.

Table 4 gives the means of the TP, TN, FP and FN proportions obtained when comparing the predicted classification of XGBoost and LR with four different classifications: the reference (EFFIS), the redefined Lclass, the MCD64A1 and the FireCCI5.1 classification. The EFFIS classification is used as reference in the estimation process, Lclass is a refined classification derived from EFFIS, and MCD64A1 and FireCCI5.1 are classifications based on MODIS products not involved in the estimation process. For the EFFIS reference, the proportions are calculated over the test set (25% of the input dataset) of 126,233 pixels, and for the rest the means are calculated over the total (100%) of 504,933 pixels in 100 runs. As expected, the results obtained for EFFIS and Lclass with XGBoost and LR are slightly better than those obtained for MCD64A1 and FireCCI5.1. FN and FP rates are lower for the EFFIS and Lclass classifications in both classifiers, but XGBoost tends to provide more FP and fewer FN than LR in all the scenarios.

Table 4 Proportions of true positives (TP), true negatives (TN), false negatives (FN) and false positives (FP) calculated with the means of 100 runs of XGBoost and LR when compared with the EFFIS reference, the redefined Lclass, the MCD64A1, and the FireCCI5.1 classification

Table 5 shows the means of the accuracy metrics estimated with 100 runs of LR and XGBoost predictions in the four scenarios already defined. The comparison with the EFFIS reference is made on the test set, while the comparisons with the rest of the references use the complete dataset (training and testing). The highest detection rates are provided by XGBoost for the EFFIS (0.033) and Lclass (0.032) classifications, yet the LR detection rates are very similar, 0.032 and 0.031 respectively. In all the metrics, the predictions agree better with the EFFIS and Lclass classifications than with the MCD64A1 and FireCCI5.1 products, as expected, because these products are not based upon the reference classification. LR is better in precision and provides a lower number of FP, while XGBoost is better in recall and provides a lower number of FN in all the scenarios. The similarity between the predictions and all the classifications is higher in LR than in XGBoost, as the Dice and \(\kappa\) coefficients show.

Table 5 Means of estimated accuracy metrics of the 100 runs of logistic and XGBoost predictions vs. the EFFIS reference, the redefined Lclass, the MCD64A1 and the FireCCI5.1 classification

Figures 3 and 4 show the mean of the classifications provided by XGBoost and LR, respectively, vs. the reference, the redefined Lclass, the MCD64A1 and the FireCCI5.1 products in the region of interest. NA pixels, plotted in white, are missing or unburnable data, and the green pixels are the true unburned pixels (TN), roughly 96% of the pixels. The true burned pixels (TP, in blue) represent approximately 3% of the pixels. All panels show a strong coincidence of location and identification. Only about 1% of the pixels are misclassified, as false negatives (FN, plotted in red) or false positives (FP, plotted in yellow). The FP (see Fig. 3) lie mainly in the border regions of burn scars, and are more frequent in the MCD64A1 and FireCCI5.1 classifications. The FN are sparsely distributed (see Fig. 4). Both classifiers perform better when compared with the EFFIS and Lclass classifications than when compared with the MCD64A1 and FireCCI5.1 classifications (Fig. 5).

Fig. 3 From the leftmost to the rightmost panel, the XGBoost mean prediction of 100 runs vs. the reference, the redefined Lclass, the MCD64A1 and the FireCCI5.1 classification. NA pixels, plotted in white, are missing data, and pixels in blue, red, yellow and green colors are the true positive, false negative, false positive and true negative pixels, detected in each classification, respectively. The highlighted region is zoomed in Fig. 6

Fig. 4 From the leftmost to the rightmost panel, the Logistic prediction of 100 runs vs. the reference, the redefined Lclass, the MCD64A1 and the FireCCI5.1 classification. NA pixels, plotted in white, are missing data, and pixels in blue, red, yellow and green colors are the true positive, false negative, false positive and true negative pixels, detected in each classification, respectively. The highlighted region is zoomed in Fig. 7

Fig. 5 Importance assessment of the predictor variables in XGBoost

The importance assessment of XGBoost is shown in Fig. 5, where d.NBR1 has the highest contribution to the predicted classification, followed by d.MVI, distAF, d.NBR2, d.NIR, d.NDVI, d.MIRBI and aF.int. This ranking is expected, because NBR1 is one of the most popular burn indices and MVI is the index used in MODIS BA products (Giglio et al. 2018). The rest of the variables have a lower percentage gain, but they are also crucial for detecting burned areas because they improve the classification process. Specifically, including the average density of active fires (aF.int) as a predictor variable reduces both the number of FN (2%) and the number of FP (13%) in the estimated confusion matrices.

Figures 6 and 7 zoom in on the highlighted region of Figs. 3 and 4, respectively. The blue burn scars (TP) are well identified in the four scenarios of both methods, yet identification of burned pixels is higher in the reference and Lclass classifications than in the MCD64A1 and FireCCI5.1 products, as can be expected since the detection rate is 10% higher in those cases (see Table 5). More specifically, Fig. 7 shows that LR has a slightly higher number of FN (plotted in red), but a lower number of FP (plotted in yellow), than XGBoost.

Fig. 6

From left to right, the XGBoost mean prediction of 100 runs vs. the reference, the redefined Lclass, the MCD64A1 and the FireCCI5.1 classification in the highlighted region of Fig. 3. NA pixels, plotted in white, are missing data, and pixels in blue, red, yellow and green are the true positive, false negative, false positive and true negative pixels, respectively, detected in each classification

Fig. 7

From left to right, the LR mean prediction of 100 runs vs. the reference, the redefined Lclass, the MCD64A1 and the FireCCI5.1 classification in the highlighted region of Fig. 4. NA pixels, plotted in white, are missing data, and pixels in blue, red, yellow and green are the true positive, false negative, false positive and true negative pixels, respectively, detected in each classification

5 Conclusions

In this work, we evaluate the extreme gradient boosting algorithm (XGBoost), a machine learning algorithm that in many cases outperforms other machine learning algorithms when detecting burned areas using satellite images, and logistic regression (LR), a traditional statistical method that, in principle, is not specifically oriented to detecting burned areas. Both use the same input set of predictor variables, defined by the differences of spectral indices for identifying vegetation changes, the distance of every pixel to the closest active fire, and the average density of active fires per pixel, computed using point processes. Auxiliary variables contribute to a better classification in both methods, in particular the differences of spectral indices, but the distance to the active fires and the average density are also relevant for identifying burned pixels: in LR because these variables are statistically significant, and in XGBoost because removing them increases the number of misclassifications. Using weights to mitigate the bias effect of imbalanced data also helps to better identify true fires.
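To illustrate how weighting can counteract class imbalance, the following minimal Python sketch computes inverse-frequency class weights and a weighted Bernoulli log-loss. The weighting rule n / (number of classes × class count), the function names and the toy labels are assumptions chosen for illustration; they are not necessarily the weighting scheme used in this study.

```python
import math


def balanced_class_weights(labels):
    """Inverse-frequency weights: n / (n_classes * n_c) for each class c."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_log_loss(labels, probs, weights):
    """Weighted Bernoulli negative log-likelihood, normalised by total weight."""
    total = 0.0
    norm = 0.0
    for y, p in zip(labels, probs):
        w = weights[y]
        total -= w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        norm += w
    return total / norm


# Hypothetical toy sample: 3 burned pixels (1) among 100, not data from the study
labels = [1] * 3 + [0] * 97
weights = balanced_class_weights(labels)

# With these weights both classes carry the same total weight in the loss
burned_weight = weights[1] * labels.count(1)
unburned_weight = weights[0] * labels.count(0)
print(weights, burned_weight, unburned_weight)
```

Under this rule the few burned pixels contribute as much to the fitting criterion as the many unburned ones, so a classifier can no longer reach a low loss simply by predicting "unburned" everywhere.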

XGBoost and LR are conceptually different, but both yield similar results, each with pros and cons. XGBoost extracts model-like structure from data without assuming any type of distribution. LR is a well-known parametric method that requires some assumptions to be fitted. On the other hand, LR offers better interpretability and demonstrates greater robustness, as the estimated coefficients remain relatively stable even when the training dataset changes. Moreover, it is highly efficient and significantly faster than XGBoost (less than 1 min for LR versus more than 4 h for XGBoost).

In all classifications, LR has better agreement coefficients (Dice, overall accuracy and \(\kappa\)) but a slightly smaller detection rate. XGBoost has a lower omission error and a higher commission error than LR. In addition, the gap between omission error and commission error is larger for XGBoost than for LR. Specifically, XGBoost achieves a very low number of false negatives (FN), but the accompanying increase in false positives (FP) is larger than the increase in FN that LR incurs when reducing FP. Consequently, LR slightly outperforms XGBoost in terms of global accuracy metrics. More importantly, LR emerges as a simple, explainable, computationally efficient, and highly competitive model for classifying large sets of binary data with imbalanced classes. This makes LR an excellent choice for analyzing burned areas using satellite images.
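For readers who wish to reproduce this kind of comparison, the agreement measures named above can be computed from a confusion matrix as in the following minimal Python sketch. The formulas are the standard definitions of the Dice coefficient, overall accuracy, Cohen's \(\kappa\), and omission and commission errors; the function name and the counts are hypothetical and are not results from this study. The detection rate is left out because its definition is not restated in this section.

```python
def agreement_metrics(tp, fn, fp, tn):
    """Agreement measures for a binary burned/unburned confusion matrix."""
    n = tp + fn + fp + tn
    overall_accuracy = (tp + tn) / n
    dice = 2 * tp / (2 * tp + fp + fn)
    omission_error = fn / (tp + fn)      # burned pixels that were missed
    commission_error = fp / (tp + fp)    # predicted burned pixels that are unburned
    # Agreement expected by chance, from the row and column marginals
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (overall_accuracy - p_chance) / (1 - p_chance)
    return {
        "dice": dice,
        "overall_accuracy": overall_accuracy,
        "kappa": kappa,
        "omission_error": omission_error,
        "commission_error": commission_error,
    }


# Hypothetical counts for illustration only (not results from the paper)
metrics = agreement_metrics(tp=300, fn=50, fp=50, tn=9600)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because unburned pixels dominate, overall accuracy stays high even when a noticeable share of burned pixels is missed, which is why Dice, \(\kappa\) and the two error rates are more informative for imbalanced data of this kind.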