Live fuel moisture content (LFMC) time series for multiple sites and species in the French Mediterranean area since 1996

Introduction
Live fuel moisture content (LFMC), the ratio of water mass to dry mass of living shoots, is a primary driver of wildfire activity (Chandler et al. 1983; Dennison and Moritz 2009; Nolan et al. 2016) and fuel flammability (Marino et al. 2012; Rossa et al. 2016; Fares et al. 2017; Ruffault et al. 2018). LFMC is an input variable in several fire behavior models (Sullivan 2009; Alexander and Cruz 2013) and is often implicitly accounted for in fire hazard indices in Mediterranean areas (e.g. Viegas et al. 1999; Ruffault and Mouillot 2017).
Despite the importance of LFMC for a wide range of wildfire research studies, its estimation at stand to landscape scales is still highly uncertain, because LFMC results from complex interactions between antecedent and concurrent weather and several biological mechanisms that influence water content (i.e. plant water relations) and dry matter accumulation (i.e. carbon allocation at the leaf level) (Turner 1981; Jolly et al. 2014). There is therefore a need for robust and long-term LFMC datasets to improve our understanding of LFMC variations and refine our predictions.
In 1996, the French organization in charge of the protection of the Mediterranean forest (DPFM) initiated the systematic measurement of LFMC to improve its operational fire danger rating system during the fire season. Weekly measurements have been performed at various sites and on various shrub species across the fire-prone French Mediterranean area. This operational network, called the "Reseau Hydrique" (which could be translated as "hydric network"), has been operated by the National Forest Service (Office National des Forêts, ONF) since then. To date, the "Reseau Hydrique" has produced a dataset that includes 584 "Sites × Years", on 24 species, with 7 to 20 measurement dates per year. In addition, rainfall amounts during the fire season have been recorded since 2009 at some sites.
The raw dataset is currently available on demand via the Reseau Hydrique website but, in its current form, cannot easily be used for scientific purposes for several reasons: (i) the database is not referenced (i.e. does not have a DOI); (ii) information is in French only; (iii) the labels and names of sampling sites and species are not always consistent; (iv) outliers, duplications and inconsistencies in LFMC data have not been corrected; (v) measurement uncertainties (confidence levels) are not provided.
Here, we describe a revised version of the "Reseau Hydrique" dataset. The database was cleaned up and robust estimations of LFMC uncertainties were computed. Metadata description and accessibility are also provided.

Methods
We document the raw LFMC dataset as well as a refactored version that satisfies scientific purposes. We describe protocols for data collection, the clean-up process and confidence interval estimators, as well as some preliminary assessments of data quality.

Site and species description
In each French administrative unit ("département") of the fire-prone Mediterranean area, between one and three sampling sites were selected according to the climatic heterogeneity and the average level of fire hazard observed in the surrounding areas (Fig. 1a, Table 1). Since 1996, measurements have been carried out over 50 sites, 35 of which are geolocalized. There are currently 30 active measurement sites in the region (Fig. 1a, Table 1). All sites are located on south-facing slopes and include a shrub layer, possibly associated with a sparse tree layer. Site labels take the form "DmSn", where m is the "département" number and n is the site number within the "département". In the refactored database presented below, site identifiers are unique and static (unlike in the operational database). Data from non-geolocalized (NG) sites are labelled "DmSNGn", with m the "département" number and n the site number within the "département". The sites cover a wide range of summer water availability (Fig. 1b), expressed as the yearly average ratio of rainfall to evapotranspiration during the fire season (June to September). We estimated both quantities with the SAFRAN climatic dataset, which provides daily climatic variables on an 8-km grid over France (Vidal et al. 2010). In the network, one to three shrub species, selected among the dominant species, are sampled at each site. When a species shows both shrub and tree habits, only shrubby individuals are sampled.

LFMC and rainfall measurements
Apical and lateral shoots of branches fully exposed to the sun are sampled from different individuals of a given species within plots of about 1000 m² that are representative of the surrounding landscape. Sampled individuals are chosen to be representative of the average status of each species. Samples are used to fill five 0.35-l aluminium containers sealed with paper tape, corresponding to an overall fresh mass of about 50 to 75 g. Once at the laboratory, samples are weighed fresh, oven-dried at 60°C for 24 h and weighed dry. Sampling takes place at ca. 12:00 UT. Fuel moisture content is computed and released on a fresh mass basis by the ONF. In the refactored database, values are provided on a dry mass basis (see supplementary S1 for conversion). At each sampling date, the rainfall amount collected in manual gauges since the previous measurement is recorded (data available since 2009).
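The change of basis mentioned above can be illustrated with a short sketch. It assumes the usual definitions (fresh basis: water mass over fresh mass; dry basis: water mass over dry mass), both expressed in percent, and masses net of the container. The function names are hypothetical, and the authoritative conversion is the one given in supplementary S1.

```python
def lfmc_from_masses(fresh_mass_g: float, dry_mass_g: float) -> float:
    """LFMC on a dry mass basis (%), from net fresh and oven-dry masses."""
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    return 100.0 * (fresh_mass_g - dry_mass_g) / dry_mass_g


def fresh_to_dry_basis(fmc_fresh_pct: float) -> float:
    """Convert a moisture content from fresh mass basis (%) to dry mass basis (%).

    Assumes fmc_fresh = 100 * water / fresh mass, so that
    fmc_dry = 100 * fmc_fresh / (100 - fmc_fresh).
    """
    if not 0.0 <= fmc_fresh_pct < 100.0:
        raise ValueError("fresh-basis moisture content must be in [0, 100)")
    return 100.0 * fmc_fresh_pct / (100.0 - fmc_fresh_pct)


# A sample holding as much water as dry matter is 50% on a fresh basis
# and 100% on a dry basis.
example_dry_basis = fresh_to_dry_basis(50.0)
```

Under these definitions the dry-basis value is always at least as large as the fresh-basis value, and the gap widens as the sample gets wetter.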

Measurement frequency
Sampling starts every year at the beginning of the fire season (generally in June) and ends between August and October, depending on fire danger levels. Between 1996 and 2009, the sampling frequency was once or twice a week, depending on the uncertainty of the fire danger evolution. Sampling was cancelled on a given day when a rainfall event had occurred less than 2 days before or was forecast for that day.

Raw data processing, clean-up and robust estimators
Raw data were first converted to a dry weight basis, leading to samples of up to five individual measurements of LFMC. Because the sample size is small, the common estimators of mean and standard deviation are highly sensitive to outliers and thus are not appropriate. We therefore applied the following two-step procedure to filter the dataset and compute robust estimators of the mean and standard error.

Manual filtering of the data
The "manual filter" consisted in setting doubtful or irrelevant values to NA (not available), both for each individual measurement of LFMC and for the mean daily value computed by the ONF for operational purposes. Note that 1857 samples had already been set to NA by the ONF during data collection. Individual measurements were set to NA in two cases: (a) when the mean value was indicated as NA by the ONF (indicating uncertainties regarding the corresponding sample), all individual measurements of that sample were set to NA; (b) when a value was outside the classical biological range for LFMC (> 250 or < 20) and differed by more than 100% from the other values collected at the same site, on the same day and for the same species, that value was set to NA.
All filtered values are identified in the third table (LFMC_raw_Table.csv) with a flag (FlagV_i = 1 when value i is manually filtered). The figure in supplementary S2 shows the distribution of values before and after manual filtering.
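Rule (b) of the manual filter can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the text does not specify how "differed by more than 100%" was computed, so the sketch compares each out-of-range value with the median of the other values of the same sample and measures the difference relative to the smaller of the two. The function name and its parameters are hypothetical.

```python
import statistics


def manual_filter(values, low=20.0, high=250.0, max_rel_diff=1.0):
    """Apply rule (b) to one sample (same site, day and species).

    values: list of individual LFMC measurements (%), None for missing.
    Returns (filtered_values, flags) where flag 1 marks a filtered value.
    Assumption: the relative difference is taken against the median of the
    other values, relative to the smaller of the two numbers.
    """
    filtered, flags = [], []
    for i, v in enumerate(values):
        others = [x for j, x in enumerate(values) if j != i and x is not None]
        flag = 0
        if v is not None and others and (v < low or v > high):
            ref = statistics.median(others)
            if abs(v - ref) > max_rel_diff * min(v, ref):
                flag = 1
        filtered.append(None if flag else v)
        flags.append(flag)
    return filtered, flags


# One implausibly high value among five containers is set to None (NA).
cleaned, sample_flags = manual_filter([80.0, 85.0, 300.0, 78.0, 82.0])
```

A value outside the range but close to its neighbours (for example 260 among 240 and 255) is kept, which matches the requirement that both conditions hold.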

Robust estimator for LFMC and LFMC confidence interval
The following robust estimation method was applied separately to each Site×Species to obtain mean and error values at each date (t).
Robust estimations of LFMC
The common estimators of LFMC at each date t for a sample of i = 1 to n(t) ≤ 5 individual measurements $LFMC_i(t)$ are the mean $\overline{LFMC}(t)$ and the median $\widetilde{LFMC}(t)$ of the individual values. However, a more robust estimator of LFMC, referred to as the robust mean $\widehat{LFMC}(t)$, can be obtained using a bisquare weight function of the median (Mosteller and Tukey 1977). The weights $W_i$, which correct the median for the scatter of the individual data, are defined in Eq. (3), where $\hat{\sigma}(t)$ is a robust estimator of the standard deviation of samples at date t, based on the sample median absolute deviation (Mosteller and Tukey 1977):
$$\hat{\sigma}(t) = \frac{\mathrm{median}_i \left| LFMC_i(t) - \widetilde{LFMC}(t) \right|}{0.6745}$$
The constant 0.6745 makes the estimate unbiased assuming a normal distribution of residuals.
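As an illustration of the kind of estimator described here, the sketch below computes a bisquare-weighted mean centred on the median, with the scale taken from the median absolute deviation divided by 0.6745. The tuning constant `c = 4.685` is a commonly used default for bisquare weights and is an assumption, since its value is not given in the text. The function names are hypothetical.

```python
import statistics


def mad_sigma(values):
    """Robust standard deviation: median absolute deviation / 0.6745."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values]) / 0.6745


def bisquare_robust_mean(values, sigma=None, c=4.685):
    """Bisquare-weighted mean of one sample, with weights centred on the median.

    sigma: scale estimate; defaults to the sample MAD-based estimate.
    c: tuning constant (assumed value, not taken from the paper).
    """
    med = statistics.median(values)
    if sigma is None:
        sigma = mad_sigma(values)
    if sigma == 0:
        return med  # no scatter around the median: nothing to reweight
    weights = []
    for v in values:
        u = (v - med) / (c * sigma)
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    total = sum(weights)
    if total == 0:
        return med
    return sum(w * v for w, v in zip(weights, values)) / total


# Four consistent containers and one aberrant value: the aberrant value
# receives zero weight, so the estimate stays close to the median.
sample = [80.0, 82.0, 84.0, 81.0, 200.0]
robust = bisquare_robust_mean(sample)
```

Passing an externally computed `sigma` lets the same function be reused with a scale estimated from a larger pool of measurements.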
The standard deviation estimates, however, exhibit an unrealistic distribution because of the small size of the samples (see supplementary S4). We thus consider an alternative approach based on the computation of the standard deviation from all LFMC measurement values available for a given Site×Species (i.e. all dates). This approach assumes that the standard deviation of individual LFMC measurements is constant over time (year and day of the year) and, in particular, does not vary with the LFMC mean, which was acceptable in our case. Let N be the number of measurements of a given Site×Species. The standard deviation of individual measurements, $\hat{\sigma}_j$, is given by Eq. (5). This standard deviation is properly estimated as it relies on hundreds of individual measurements ($N_j$). The corresponding value can be used instead of $\hat{\sigma}(t)$ within the robust estimation of LFMC described above in Eqs. (1)-(2). This process can be iterated and converges after a few iterations for both $\widehat{LFMC}(t)$ and $\hat{\sigma}_j$. We illustrate the relationship between this robust mean and the median in supplementary S3.

Fig. 1 a Location of the currently active and closed geolocalized sites (Sx) within the sampling region. White numbers in the background are the French administrative units ("départements"). b Boxplot (in black) and mean value (red dots) of the yearly climatic water availability (ratio of rainfall over ETP for the June to September period, R/ETP) for each site, computed with the SAFRAN climatic analysis. Sites are ranked in increasing order of water availability. Site codes in b are identifiers of the form "DmSn", where m is the "département" number (i.e. administrative county) and n is the site number within the "département"

Table 1 List of the geolocalized sites of the Reseau Hydrique available in the database. The site code is an identifier of the form "DmSn", where m is the "département" number (i.e. administrative county) and n is the site number within the "département"
Robust estimator of the standard error
At this stage, potential remaining outliers within individual measurements are identified and filtered using Thompson's Tau criterion at 5%, computed for each Site×Species (Thompson 1985):
$$\tau = \frac{t_s \, (N - 1)}{\sqrt{N} \, \sqrt{N - 2 + t_s^2}}$$
with N as defined above and $t_s$ the Student's value based on α = 0.05 with df = N − 2 degrees of freedom. Residuals greater than $\tau \hat{\sigma}_j$ are considered as outliers. This leads to the identification of 3981 potential outliers, which are flagged in the raw data table (FlagV_i = 2 when value i is identified as an outlier). The robust standard deviation estimate (Eq. 5) and the number of valid measurements $\tilde{n}(t)$ ($\tilde{n}(t) \le n(t) \le 5$) can be used to evaluate a robust standard error for each sample:
$$\widehat{SE}(t) = \frac{\hat{\sigma}_j}{\sqrt{\tilde{n}(t)}}$$
The 95% confidence interval for $\widehat{LFMC}(t)$ is derived from this standard error. The distribution of robust standard errors is discussed and compared to the distribution obtained using the common median-based standard error in supplementary S4. The benefit of this approach is illustrated in Fig. 2, which exhibits much smoother confidence interval patterns than those computed with standard methods.
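The outlier screening and standard error computation can be sketched as below. Several choices are assumptions rather than the authors' implementation: the Student quantile is approximated by the normal quantile (reasonable when N is in the hundreds, as stated above), the pooled standard deviation is taken as a plain root mean square of residuals, and the confidence interval uses a normal multiplier. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def thompson_tau(n: int, t_crit: float) -> float:
    """Thompson's tau for n values and a Student critical value t_crit (df = n - 2)."""
    return t_crit * (n - 1) / (math.sqrt(n) * math.sqrt(n - 2 + t_crit ** 2))


def pooled_sigma(residuals):
    """Root mean square of residuals pooled over all dates of one Site x Species.

    Assumption: no degrees-of-freedom correction is applied.
    """
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def flag_outliers(residuals, sigma, tau):
    """Return 2 for residuals larger than tau * sigma in absolute value, else 0."""
    return [2 if abs(r) > tau * sigma else 0 for r in residuals]


def standard_error(sigma: float, n_valid: int) -> float:
    """Standard error of a sample mean with n_valid retained measurements."""
    return sigma / math.sqrt(n_valid)


def confidence_interval(estimate: float, se: float, z: float):
    """Symmetric interval estimate +/- z * se."""
    return (estimate - z * se, estimate + z * se)


# Large-N approximation of the two-sided 5% Student quantile.
z95 = NormalDist().inv_cdf(0.975)
tau_500 = thompson_tau(500, z95)
```

For small N the exact Student quantile should replace `z95`; it can be obtained from a statistics library and passed as `t_crit`.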
Zenodo is an open international research data repository created by CERN and OpenAIRE that allows researchers to store and share research datasets and provides digital object identifiers (DOI). The whole dataset consists of four tables:
1. The first table (LFMC_final_Table.csv) contains the live fuel moisture content (LFMC). These are the robust estimates of LFMC and their associated standard errors, both estimated from raw data with the method described above. Each row in the table describes the LFMC at a given date, for a given species and at a given site. The table has 11 columns. The first eight columns indicate the site identifier (SiteCode and SiteName), the species (Species), a unique identifier for a given species at a given site (Site×Species) and the date (date, year, month, day of year). The last three columns are respectively the robust LFMC mean $\widehat{LFMC}$ (labelled RobustLFMC), the standard error $\widehat{SE}$ (labelled RobustStandErrLFMC) and the number of valid measurements $\tilde{n}$ (labelled RobustNval). RobustStandErrLFMC can be used to estimate confidence limits for any desired confidence level.
2. The second table (RainTable.csv) contains site identifiers (SiteCode and SiteName) and rainfall measurements (rainfall), corresponding to rainfall occurring between the day of year of the previous measurement (labelled PreviousDoy) and the day of year when the measurement was performed (labelled Doy). The last column is a flag that identifies doubtful measurements (RainFlag = 1), for which it was uncertain whether the gauge had been emptied at the previous measurement.
3. The third table contains raw data (LFMC_raw_Table.csv).
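A minimal sketch of how confidence limits might be derived from the first table is given below. The column names SiteCode, SiteName, Species, RobustLFMC, RobustStandErrLFMC and RobustNval come from the description above; the exact header of the date column, the date format and all numeric values in the inline sample are hypothetical, and a normal multiplier is assumed for the limits.

```python
import csv
import io
from statistics import NormalDist

# Hypothetical rows mimicking LFMC_final_Table.csv (values are made up).
SAMPLE = """SiteCode,SiteName,Species,date,RobustLFMC,RobustStandErrLFMC,RobustNval
D13S2,Le Télégraphe,Quercus coccifera,2010-07-05,85.0,2.0,5
D13S2,Le Télégraphe,Rosmarinus officinalis,2010-07-05,70.0,3.0,4
"""


def confidence_limits(rows, level=0.95):
    """Return (SiteCode, Species, lower, upper) for each row.

    Assumption: limits are RobustLFMC +/- z * RobustStandErrLFMC, with z the
    normal quantile matching the requested confidence level.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    out = []
    for row in rows:
        mean = float(row["RobustLFMC"])
        se = float(row["RobustStandErrLFMC"])
        out.append((row["SiteCode"], row["Species"], mean - z * se, mean + z * se))
    return out


limits = confidence_limits(csv.DictReader(io.StringIO(SAMPLE)))
```

To work on the real file, the `io.StringIO(SAMPLE)` object would be replaced by an open file handle on the downloaded table, after checking its actual header names and delimiter.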
Technical validation
Figure 2 shows typical LFMC seasonal variations for two species at a given site and their 95% confidence limits (shaded area) before (Fig. 2a) and after (Fig. 2b) data processing. Rainfall series are also shown. Raw LFMC data are affected by outliers and small sample sizes (Fig. 2a, e.g. first day of measurement for Quercus coccifera) and have an erratic and large standard error compared to the data corrected with robust estimators (Fig. 2b). Figure 2 shows that LFMC generally decreases during drying periods and increases after rainfall. Some exceptions exist (Fig. 2, Quercus coccifera after the rain event around Doy 210), suggesting that other processes, such as phenology or tissue ageing, might affect LFMC variations (Jolly et al. 2014).

Fig. 2 Example of LFMC and rain series as reported in the database: a raw data and b robust estimates. Variations are reported here for the site "Le Télégraphe" (D13S2) for a seeder (Rosmarinus officinalis) and a resprouter (Quercus coccifera). Shaded areas indicate the 95% confidence limits. The site code is an identifier of the form "DmSn", where m is the "département" number (i.e. administrative county) and n is the site number within the "département"
As a basic analysis, we applied a linear model to predict the lowest annual values of LFMC (10th percentile) from several variables: (i) a linear effect of a drought index (rain/ETP from June to September) computed each year for each site with the French climate reanalysis SAFRAN (see Section 2), (ii) a regeneration strategy effect accounting for whether a species is a seeder or a resprouter (this information was taken from the BROT database; Paula et al. 2009) and (iii) a site effect. The global model yields an R² of 0.51. All parameters were significant (Table 2). The model coefficients are reported in supplementary S6 and indicate that (i) drought was associated with a decrease in the lowest annual values of LFMC, (ii) resprouters tend to have a higher minimum LFMC than seeders, in agreement with physiological expectations (Vilagrosa et al. 2014), and (iii) unidentified features related to sites (e.g. soil, leaf area index) contribute to determining LFMC. A graphical illustration of the three effects is given in supplementary S6.
The robustness of rainfall measurements at geolocalized sites was also evaluated with SAFRAN. We found a significant relationship between both datasets (supplementary S5). The rainfall and LFMC peaks generally co-occur (e.g. Fig. 2). This variable, however, should be used with caution, as no wind shelter protected the rain gauge and as the minimum distance from the surrounding vegetation (normally equal to four times the height of the vegetation) was not always respected (due to vegetation growth and the need to protect the rain gauge from potential theft).

Discussion and conclusion
The dataset can be used for the following applications:
Altogether, this database may help to improve live fuel moisture and fire danger modelling. Although drought is a primary driver of LFMC, LFMC also depends on phenology and tissue ageing (Jolly et al. 2014), which should be accounted for in LFMC modelling. Also, the spatial variability of LFMC may partly be explained by local soil and vegetation properties. Ongoing field measurements carried out by our group aim at consolidating site descriptions with additional information regarding soil, vegetation cover and structure, as well as site history (fire occurrence or fuel cut during the measurement period), and will be released as soon as possible. Such data are critical to better understand and model LFMC dynamics.

Table 2 Analysis of variance of the linear model used to predict the minimum annual values of LFMC (10th percentile). Explanatory variables included a drought index (rain/ETP from June to September) computed for each site with the French climate reanalysis at 8-km resolution (SAFRAN, Vidal et al. 2010), the regeneration strategy of the species (seeder or resprouter) and a site effect. Because interactions were not significant, all terms were set as additive in this model.