Background & Summary

Wildfire science is undergoing a data revolution enabled by improvements in the resolution and coverage of satellite-based observations of wildfire1. There has been a steady increase in the number and quality of active-fire monitoring sensors, first and foremost being the Moderate Resolution Imaging Spectroradiometer (MODIS), available 2001-present2, and the Visible Infrared Imaging Radiometer Suite (VIIRS), available 2012-present3. These low-earth orbit systems capture active wildfire data several times daily with no cost to the end user, allowing scientists and wildfire managers to track wildfire ignitions and daily spread4,5,6,7. Historically, data access challenges and processing requirements have limited efforts to consolidate and use these large active fire datasets, but with the launch of open data platforms such as Google Earth Engine8 (GEE) and the NASA Fire Information for Resource Management System9 (FIRMS), these barriers have largely been removed.

Researchers have developed several automated fire-event-delineation algorithms based on active-fire detections, thermal anomalies detected by satellite observation (hereafter “hotspots”). For example, FRY10 is a global database of fire patches built on the novel combination of MCD14A1 hotspots11 (hereafter “MODIS hotspots”) and change-detection maps, which were built on the integration of Collection 6 MCD64A1 burned area12 (hereafter “MODIS burned area”) and FIRECCI4113. The Global Fire Atlas14 and GlobFire15 go beyond fire delineation by using MODIS burned area12 to identify daily fire perimeters, providing information such as daily fire size and estimates of fire front location and direction of fire spread. The Fire Events Delineation16 (FIRED) algorithm provides further improvement through automated temporal clustering of fire events, producing a continuous estimate of fire spread rate and direction for the conterminous United States. However, all four algorithms (FRY10, Global Fire Atlas14, GlobFire15, and FIRED16) are reliant on the change-detection algorithm of MODIS burned area12 instead of fire hotspot interpolation, and thus they are subject to a multi-day temporal uncertainty and a 500-m or coarser resolution. For example, when MODIS burned area12 products were compared against MODIS hotspots, 32% of all MODIS burned area pixels reported fire arrival three or more days after the first hotspot detection12. Newer global burned area mapping algorithms such as FIRECCI5017 and FIRECCI5118 have reported finer spatial resolution and accuracy of day-of-burning estimates over MODIS burned area12, but still suffer from temporal uncertainty, which reduces their usefulness for identifying daily weather conditions during fire spread.

Interpolation of day-of-burning or fire arrival time (hereafter “progression mapping”) can be used to answer complex ecological and fire management questions. Hotspots and hand- or GPS-delineated fire progressions from aerial and ground observations have long been an essential part of wildfire operational management19. However, near-real-time fire progression mapping from satellite observations is gradually being practical20, given that continuous aerial observations by aircraft are no longer necessary. The Fire Event Data Suite7 (FEDS) algorithm uses spatiotemporal aggregation of VIIRS hotspots with an alpha hull to automate fire progression mapping at a 12-hr timescale. However, to better address missing hotspots, most automated fire progression mapping estimates fire arrival date using interpolation as opposed to spatiotemporal aggregation5. When assessed against high-quality infrared fire burned area observations, pooling VIIRS and MODIS and interpolating using kriging or natural neighbour interpolation leads to the best estimates of fire arrival time21. Although these methods are limited to approximately daily-scale accuracy by the limited satellite overpass frequency, methodological and sensor-technology advancements suggest that fire progression mapping on a subdaily scale is becoming possible. For example, coarse-resolution subdaily fire mapping using the Geostationary Operational Environmental Satellites (GOES) is possible for the Continental United States22; however, the satellites’ high view angles and the products’ coarse spatial resolution and high omission errors limit their usefulness at Canadian latitudes.

Improved fire progression mapping represents an opportunity to address a variety of wildfire research questions. For example, progression mapping provides an estimate of the date and thereby the weather conditions under which each pixel burned, necessary information for modelling fire spread under various fire weather conditions6,23,24. When coupled with maps of bottom-up factors such as land cover and topoedaphic characteristics, progression mapping can serve as a foundation for improving the knowledge base of large-scale wildfire spread phenomena, with applications involving wildfire in unconventional settings, such as novel disturbances or treed peatlands. Given that over twenty years of satellite observations of wildfire are available, even relatively rare circumstances or phenomena such as wildland-urban interface fires and high-latitude fires constitute hundreds or thousands of gridded data points for model building.

Here we present the Canadian Fire Spread Dataset25 (CFSDS), an event-based daily fire progression dataset at a resolution of 180 m, built on the combination of high-precision wildfire boundaries and active fire detections from multiple sensors. The CFSDS provides a research-oriented fire growth dataset that is useful for applications such as building statistical and machine-learning fire spread models. In this first release we provide a collection of interpolated wildfire progression maps and associated environmental covariates for all wildfires larger than 1,000 ha that occurred in the forested areas of Canada from 2002 to 2021. Hotspots9 were identified with co-located wildfire perimeters from the National Burned Area Composite26(NBAC). We combined VIIRS and MODIS hotspots9 to produce daily-scale fire progression maps for each fire with at least 6 observed hotspots6. We attributed each burned point (i.e. 180-m pixel) with 50 environmental covariates such as daily weather, derived weather metrics (e.g. drought metrics), topography, and forest attributes. The final product is a dataset of 3,269 fires comprising 70,895 days of burning, each with an estimate of the daily fire spread distance and area of growth.

Methods

We constructed the CFSDS25 from three complementary groups of spatial and temporal data: final fire perimeter polygons, hotspots, and associated environmental covariates. In brief, hotspots were first filtered to predetermined fire perimeters. The detection times of these hotspots were then interpolated using ordinary kriging to provide an estimate of the date at which fire reached any given pixel within the fire perimeter. This raster of fire arrival times (i.e. “day of burning”) was then used to identify environmental covariates associated with the fire (Fig. 1).

Fig. 1
figure 1

CFSDS production process. See Table 3 for covariate descriptions.

Fire event delineation and burned area polygons

We obtained wildfire perimeters from NBAC26, an annual, national fire dataset mapping individual wildfire events in Canada (Fig. 1). NBAC26 integrates three Canadian burned-area products, listed in order of decreasing precedence: pre- and post-fire Landsat imagery processed into normalized burn ratio using the Multi-Acquisition Fire Mapping System27 (MAFiMS); the Canadian National Fire Database28 (CNFDB), created from provincial/territorial wildfire agency polygon data; and the Hotspot and Normalised Difference Vegetation Index Differencing Synergy (HANDS) algorithm29. NBAC26 also includes non-spatial attributes for each fire, such as fire start date and fire end date, used here for subsetting hotspots. From the full NBAC26 record, we selected all fires that: (i) occurred from 2002–2021, and (ii) attained a final area of at least 1,000 ha (Fig. 2).

Fig. 2
figure 2

CFSDS fire polygons from NBAC. Fire year indicates the era of hotspot collection: 2002–2011 includes MODIS only, 2012–2016 includes MODIS and Suomi-NPP (VIIRS), 2017–2021 includes MODIS, Suomi-NPP, and NOAA-20 (VIIRS). Grey indicates mean canopy cover48.

Hotspots

We downloaded all MODIS and VIIRS hotspots within Canada from 2002 to 2021 from the NASA FIRMS9 (Fig. 1). The Collection 6 MODIS hotspots2,30 (MCD14ML, V006/V0061) includes all Terra (MOD14) and Aqua (MYD14) satellite detections at a spatial resolution of 1 km and four daily overpasses. VIIRS hotspots include both the Suomi-NPP31 (VNP14IMGT) and NOAA-2032 (VJ114IMGTDL_NRT) satellite detections at a spatial resolution of 375 m and four daily overpasses3.

In total, this consisted of 3,731,070 hotspots, including all pooled MODIS and VIIRS hotspots9. The temporal resolution of the data depends on the year it was collected, due to satellite launches: four times daily from 2002 to 2011, six times daily from 2012 to 2017, and eight times daily from 2018 to 2021. We filtered hotspots to individual fire events using NBAC26 polygon date and spatial location (Fig. 1). To account for hotspot precision limitations, all hotspots within 1,000 m of a fire polygon were considered for interpolation. Any hotspots occurring more than 30 days before the recorded fire start date or more than 30 days after the recorded fire end date were removed, to eliminate spatially coincident burning (from agricultural or silvicultural debris burning, for example) while still allowing for possible errors in fire start and end date attribution. We also applied a filter based on hotspot “confidence”, as reported by FIRMS9; MODIS hotspots with “confidence” less than or equal to 10 (range 0–100) and VIIRS hotspots with “confidence” of “LOW” (range “LOW” to “HIGH”) were removed, given that these hotspots are commonly associated with sun glint9. Hotspots were also adjusted to represent local standard time based on geographic location.

Fire arrival time interpolation and processing

For each individual fire, fire arrival times were interpolated using kriging on a 180-m resolution base grid (Fig. 1). This resolution was selected to produce outputs that were a practical size for storage and upload, while still being a multiple of our covariate datasets’ resolution. Using all hotspots corresponding to an individual fire, we fit a variogram on hotspot detection times, as represented by the decimal numerical day of year (i.e. Feb. 1, 1200 hr as 32.5). We fit the experimental variogram using an exponential, spherical, and Matern model, and subsequently selected the best-fitting model for each fire. We used the default nugget, sill, and range values as determined by the gstat R package33. Using ordinary kriging, we interpolated the fire arrival time based on the fire-specific variogram fit. The resultant fire arrival time raster was then masked to burned areas as identified in NBAC26. The fire arrival date was rounded to the closest (temporally) date that was observed in the 6 nearest hotspots; this ensured that the predicted burning date occurred during detected fire growth and was not an artefact of averaging6.

The phenomenon of “peak burning” describes the tendency for wildfires in Canada to burn most intensely in the late afternoon or early evening34, a time of day during which the MODIS and VIIRS satellites are not present overhead. This artificially delays wildfire detection; for example, the VIIRS satellite overpasses occur at approximately 0100–0230 hr and 1300–1430 hr for the Canadian boreal zone. Therefore, the uncorrected day-of-burning rasters are biased towards late, nighttime detections, which may not always represent true fire arrival time. To limit the impact of these missing observations, we assigned any hotspot observed between 1900–0800 hr to the previous day. We assigned an observed time of 1900 hr to these hotspots, immediately after diurnal “peak burning” as determined by peak Fire Radiative Power (FRP) using GOES-16 and GOES-1735. In this case, the temporal rounding of hotspots (i.e. kriging inputs) is not to be confused with temporal rounding of the day-of-burning grid (i.e. kriging outputs).

Zero-growth days

Days with no observable fire growth were included in the dataset as zero-growth days, defined as any day with less than three pixels (9.72 ha) of fire growth. Environmental covariates were defined using the spatial location of pixels from the next day of non-zero fire growth (a “fire growth day”). The subsequent fire growth day was taken as the next sequential day in which the fire grew more than three pixels. False zero-growth days are possible, but less likely due to the high detection efficiency of nighttime active fire detections30,31,32.

Environmental covariates

Daily fire arrival time interpolations were intersected with spatial datasets of 50 environmental covariates (Table 1) (Fig. 1), providing a record of the conditions under which each pixel burned. Environmental covariates were resampled to the same 180-m resolution grid used for interpolation of fire progression, where needed. We resampled using a mean aggregate function for continuous variables and a mode aggregate function for categorical variables.

Fire weather

We obtained weather data from the ERA5-Land dataset36, produced globally, hourly at 0.1 deg resolution. Specifically, 2-m air temperature, dewpoint temperature, total precipitation, 10-m u and v components of wind and snow cover variables were downloaded over Canada for the temporal period of the study (2002–2021). The data were further processed to provide the necessary inputs to the Canadian Forest Fire Weather Index (FWI) System37: the noon LST values of temperature, relative humidity, wind speed and the 24-hr accumulated precipitation ending at noon LST each day. The FWI System produces six indices: Fine Fuel Moisture Code (“FFMC”), Duff Moisture Code (“DMC”), Drought Code (“DC”), Initial Spread Index (“ISI”), and Buildup Index (“BUI”), and Fire Weather Index (“FWI”). The FWI System variables were calculated only for the snow-free period corresponding to the main fire season each year; season start was determined by the first day following three consecutive days of less than 50% snow cover. The DC is adjusted at startup by applying an overwintering correction based on the amount of overwinter precipitation38.

In addition to the FWI System outputs, the following daily weather variables were also included in the dataset: the maximum daily temperature (“Tmax”; °C), precipitation summed over 24 hr, from noon the prior day (“Prec”; mm), noon wind speed at 10-m elevation (“WS”; km/h), noon relative humidity (“RH”; %), and maximum vapour pressure deficit over a 24 hr period (“VPD”; hPa). Fire weather data is provided for all fires at the CFSDS data repository25.

Fuels

We represented modeled pre-fire fuels with the Spatialized Canadian National Forest Inventory data product39 (SCANFI). In brief, Landsat multispectral imagery was trained on national forest inventory plots and used to map pre-fire fuels at a 30-m resolution, including percentage deciduous component (“prcD”), percentage coniferous component (“prcC”), biomass in tonnes/ha (“Biomass”), and crown closure percentage (“Closure”).

Topography

All topographic variables were derived from the NASA ASTER Global Digital Elevation Model V00340 30-m digital elevation model (“DEM”). This DEM was subsequently used to calculate elevation (“elev”), slope (“slope”), aspect (“aspect”), and topographic wetness index (“TWI”).

Peatlands and surface water

Peatland presence (“peatland”) and peatland class (“peatCl”) are the modeled probability that any pixel is a peatland, and the modeled peatland type from 9 possible classes (e.g., bog, fen, etc.) (Table 1). These estimates were based on the combination of MODIS reflectance data trained on National Forest Inventory plots to estimate the distribution of treed and open peatlands at a resolution of 250 m, supplemented with Landsat-based land-cover datasets41.

Table 1 Peatland classes41 used in “peatCl” covariate.

Proportion of surface water (“hydrodens”) within a 2 km, 5 km, 10 km, and 25 km of each pixel was calculated using the 50 K CanVec Series Hydrographic Features dataset42.

Anthropogenic

We represented road densities (“roaddens”) and distance to road (“roaddist”) using the 50 K CanVec Transport Features dataset43. Mean road density within 2 km, 5 km, 10 km, and 25 km of each pixel was calculated as the total length of roads per km2. Distance to road is based on linear distance to major roads or rail lines, not including temporary roads such as logging or access roads.

Fire characteristics

We calculated fire daily growth characteristics from the day-of-burn raster (NFIREID_YEAR_krig.tif). This includes daily area burned in hectares (“firearea”), cumulative area burned (“cumuarea”), and percent growth over the prior day’s total area (“pctgrowth”). The day since the start of fire (“fireday”) was calculated until extinguishment, including zero-growth days. Fire spread (metres per day) was estimated from the cumulative area burned relative to the cumulative area burned for the previous day. This assumes circular growth in a constant direction, according to the Eq. 1:

$$sprdistm=2(\sqrt{cumuarea/\pi }-\sqrt{(cumuarea-firearea)/\pi })$$
(1)

using area in m2. This calculation is an approximation of frontal fire spread, and not true linear spread44. This is an approximation of spread distance that has lower precision than the underlying hotspot data, and this value is not rounded based on the underlying resolution of hotspot data.

Ecozones

Ecozones, which are large geographic areas sharing biophysical characteristics, were identified from the National Ecological Framework for Canada45 (Table 2). The Boreal Shield and Taiga Plains ecozones have been split into “East” and “West” subzones46.

Table 2 Ecozone names and associated “ecozone” code45.

Data Records

The complete CFSDS25 for 2002–2021 is provided open access at the Centre for Open Science OSF data repository (https://doi.org/10.17605/OSF.IO/F48RY). This includes two datasets for quantitative analysis and one collection of fire arrival time interpolations:

Firegrowth_pts_v1_YEAR.csv – a data table where each row indicates one pixel (180 m) that burned on a specific day. Each row representing a pixel is attributed with the 50 covariates.

Firegrowth_groups_v1.csv – a summarised data table where each row indicates a single fire growth day. Environmental covariates are summarised over each specific day of a fire by taking the mean value of continuous covariates and the mode of categorical covariates. Latitude and longitude values represent the centroid of fire spread on each specific day.

YEAR_NFIREID_krig.tif – raster dataset depicting fire arrival time for a single fire, with values indicating the day of year at which the fire arrived in any given 180-m pixel. For example, a pixel burned on February 10 would have an integer value of 41. NFIREID and YEAR are replaced with attributes as identified in NBAC26. Rasters are grouped by year for download.

In the Firegrowth_pts and Firegrowth_groups datasets, data columns indicate the covariates under which the fire burned, by pixel and fire growth day respectively (Fig. 1). These variables are listed in Table 3. These datasets are intended for use in model building and statistical analysis.

Table 3 Environmental covariates and other associated fields.

Technical Validation

Validating day-of-burning rasters is difficult because scant “ground truth” validation data is available, making it hard to assess with confidence the accuracy of our day-of-burn rasters. The best data available are fire progression perimeters from wildfire agencies (“agency perimeters”); however, the regular production of agency polygons is limited by resource availability and they are known to slightly underpredict fire arrival time6. Therefore, instead of quantifying absolute error, we validated CFSDS25 perimeters by assessing the degree to which they were an improvement over daily FIRED16 progressions for all of Canada, when compared against agency perimeters (Fig. 3). FIRED16 is the most recent and precise automated fire arrival time dataset with global coverage, compared against FRY10, Global Fire Atlas14 or GlobFire15. Where all three datasets (FIRED, CFSDS25, and agency perimeters16) were available we assessed CFSDS and FIRED against the agency perimeters using two methods: daily perimeter similarity using Sørensen-Dice index and daily area-burned correlation. Wildfire agency perimeters were provided by the provinces of British Columbia (84 fires, n = 497), Alberta (18 fires, n = 96), Saskatchewan (13 fires, n = 31), Manitoba (17 fires, n = 59), and Quebec (12 fires, n = 58).

Fig. 3
figure 3

Estimated fire arrival time for the 2018 Nadina Lake Fire in British Columbia. Fire arrival estimated from the agency polygons (a), FIRED (b), and CFSDS (c).

Preprocessing

The three datasets were substantially different and it was therefore necessary to subset them for a fair comparison. First and most importantly, the agency perimeters vary substantially in quality, including high-quality sources such as infrared aerial platforms and medium- or low-quality sources such as aerial GPS observations of fire perimeters. We only used agency perimeters with higher spatial and temporal precision as a validation dataset, defined as perimeters derived from: aerial photography; GPS aerial observations; processed infrared aerial imagery; Sentinel-2 satellite observations; and Landsat satellite observations. Both CFSDS25 and FIRED16 were subset to include only perimeters from dates which had representative observations in the agency polygons.

Second, CFSDS25 is unfairly advantaged against FIRED16 because CFSDS is constrained with an a priori fire perimeter, whereas FIRED is not. CFSDS25 often matches the final agency perimeter very closely due to similar data sources being used for fire delineation in NBAC26; therefore, we excluded the final perimeter as a data point. Instead all validation stats were calculated on fire progression polygons from the period while the fire was still expanding and we compared area burned as a percentage of final perimeter size instead of absolute area burned. Using percentage area burned prevents FIRED16 from being penalised by a high false-negative rate due to the lack of an a priori perimeter. Finally, to further reconcile the three datasets, we introduced unburned islands to FIRED16 and the agency perimeters by masking them with the fire NBAC26 polygon, as was done for CFSDS25.

Perimeter similarity using Sørensen–Dice index

The Sørensen–Dice index is broadly used in image segmentation and more broadly to examine classification problems. It is equal to twice the area of overlap between perimeters divided by the total number of pixels in both perimeters (Eq. 2).

$${\rm{DSC}}=\frac{2TP}{2TP+FP+FN}$$
(2)

Where TP is true positive, FP is false positive, and FN is false negative. The index varies from 0 to 1, with 0 indicating no spatial alignment and 1 indicating perfect spatial alignment. We assessed perimeter similarity by calculating the Sørensen–Dice index between CFSDS-Agency and FIRED-Agency pairs for the same date. We used this index as an indicator of mean absolute improvement of CFSDS25 over FIRED16.

Validation by Sørensen–Dice index indicates that CFSDS25 agrees more closely with agency polygons than FIRED. Averaged across all provinces, CFSDS25 has an index of 0.73 ± 0.31, while FIRED16 has an index of 0.47 ± 0.32 (Table 4, Fig. 4). This is likely due to the lower spatial resolution of FIRED16 and the temporal lag commonly associated with MODIS burned area products12.

Table 4 Provincial summary of Sørensen–Dice index of agency polygons compared against FIRED16 and CFSDS25 perimeters.
Fig. 4
figure 4

Sørensen–Dice index of agency polygons compared against CFSDS (a) and FIRED (b) polygons. Each point represents the mean index of a single fire, averaged across all days of burning.

Daily burned area correlation

Validation by comparing daily area burned is an established method for intercomparison of remotely sensed fire progressions against agency polygons47. We calculated the Spearman correlation between each dataset’s prediction of daily area burned for individual fires, showing that the CFSDS25 consistently outperforms FIRED (Fig. 5), with CFSDB having a Spearman’s ρ 0.06–0.28 higher than FIRED. On a daily basis, both datasets tended to underpredict total area burned when compared against the agency perimeters, although this could also be attributed to agency perimeters overpredicting daily area burned. While MODIS burned area products12, such as FIRED16, have been shown to be poorly suited for burned area assessments of smaller (<100 ha) fires47, our fire size threshold of 1,000 ha minimised this bias. It is notable that even with this advantage to FIRED16, CFSDS25 was the strongest performer.

Fig. 5
figure 5

Daily area-burned validation by province. Agency polygons from the provincial wildfire agencies of Alberta (a), British Columbia (b), Manitoba (c), Quebec (d), and Saskatchewan (e) provide the validation dataset to compare against the CFSDS or FIRED. Points represent a single agency polygon burning day compared against FIRED or CFSDS using percentage burned of the final perimeter. Dashed black line represents 1:1 fit, or perfect agreement with agency polygons. Dashed, coloured lines represent a linear regression fit of all points. Spearman correlation coefficients are indicated with inset boxes.

Usage Notes

Users interested in examining the spread of a specific fire or set of fires should identify the fire IDs of these fires using NBAC26, available at https://cwfis.cfs.nrcan.gc.ca/datamart/metadata/nbac. Individual fire events were assigned a unique ID by the concatenation of the ‘YEAR’ and ‘NFIREID’ fields from NBAC26. For example, a fire from 2007 with an NFIREID of 346 would have an ID of “2007_346” and a corresponding interpolation filename of “2007_346_krig.tif”.

CFSDS v1.025 will be expanded annually with corresponding covariates as better data become available, and will be made available at the same data repository. (https://osf.io/f48ry/) (https://doi.org/10.17605/OSF.IO/F48RY). This data descriptor was peer reviewed in 2024 on v1.0 of the dataset, and based on the data available at the time.