Background

Accurate and representative in situ observations of forests, such as those from National Forest Inventories (NFIs), are needed to report robust statistics on forestry and to assess land cover maps based on remotely sensed observations. This is especially true now that regular and accurate Earth observations are available, for instance from the fleet of Landsat and Sentinel satellites. These developments are opening avenues to better combine classical NFIs statistical surveying and remote sensing-derived products in the domain of forest monitoring.

Against this background, the main objective of this opinion paper is to respond to a recent criticism by Breidenbach et al. (2022) regarding the use of Landsat remote sensing imagery to estimate recent harvest rates across EU countries (Ceccherini et al. 2020), in the hope that constructive dialogue can lead to progress in forest monitoring. More specifically, the three arguments brought forward by Breidenbach et al. (2022), which rely solely on NFIs plots, are that (1) the change in forest loss reported for Sweden and Finland in Ceccherini et al. (2020) is overestimated due to the increased sensitivity of the Global Forest Change (GFC) product (Hansen et al. 2013) after 2015; (2) the sample-based validation we performed in Ceccherini et al. (2021) is flawed because it uses Landsat data; and (3) estimates from Ceccherini et al. (2020) are inconsistent with those derived from a GFC validation based on plot data of NFIs. We conclude by suggesting possible avenues for further research.

Main text

Remote sensing-based assessment of European harvest from clear-cuts (Ceccherini et al. 2020) has triggered extensive debate. Here we discuss the limitations and potential of remote sensing and NFI-based methods in the assessment of harvest rates while addressing the three main criticisms of Ceccherini et al. (2020) raised by Breidenbach et al. (2022), and we suggest constructive ways forward to ensure the complementarity of the different approaches.

Increased sensitivity of the GFC product after 2015. On this claim (1), Breidenbach et al. do not bring any new element to what has been discussed in Nature Matters Arising (Ceccherini et al. 2021). The undocumented change in the GFC algorithm (Hansen et al. 2013) between 2015 and 2016 has already been confirmed in Palahì et al. (2021) and more recently in a GFC blog (GFC, 2021). The impact of that change on harvest statistics was assessed and reported for the first time in our rebuttal and accompanying documents (i.e., Grassi et al. 2021a; Ceccherini et al. 2021).

Circularity in sample-based validation due to Landsat data. Concerning this claim (2), we argue that the statements of Breidenbach et al. are not correct and result from a limited understanding of the validation exercise presented in Ceccherini et al. (2021). In their line of reasoning, the use of Landsat to determine the timing of forest cover loss in the sample-based validation (Ceccherini et al. 2021) is questioned because “Landsat became more sensitive in detecting forest cover loss over time, many losses that occurred in or before the first period are thus detected in the second period” and they therefore conclude that “Landsat cannot be used to validate a Landsat based product.”

First, we note that the sample-based validation presented in Ceccherini et al. (2020) is based on the state-of-the-art methodology used by leading authors in the field and by the GFC authors (e.g., Olofsson et al. 2014; Olofsson et al. 2020; Potapov et al. 2015). Second, the classification of sample points is based solely on visual assessment of independent aerial photographs. We did not use GFC or any other Landsat-based dataset for the classification of ground points: harvest or stable forest classes were determined exclusively with aerial photographs. Furthermore, the statements by Breidenbach et al. (2022) are incorrect because it was not Landsat that became more sensitive in 2016, but the classification algorithm used in the production of the GFC maps. In fact, the most recent sensor (i.e., Landsat 8) has been operational since 2013, so its sensitivity did not change in 2016. As already broadly discussed (Ceccherini et al. 2021), the increased sensitivity of the GFC product after 2015 therefore depends entirely on the undocumented change in the GFC detection algorithm and not on a change in the satellite imagery. The visual assessment of Landsat-derived NDVI time series was used only to attribute a specific year to a harvest that had already been identified with independent photos alone. For this reason, the increased sensitivity of the GFC algorithm did not affect the validation, making claim (2) incorrect.

Inconsistencies between remote sensing and NFIs validation. With this claim (3), Breidenbach et al. (2022) question the validity of the corrected estimates reported in Ceccherini et al. (2021), based on the results of a validation exercise that relies on NFIs’ plots to correct the GFC dataset. While we fully recognize the key value of NFIs in providing high-quality in situ observations, it should be stressed that the current sampling design of NFIs is typically targeted towards the collection of country-wide statistics on forest area and biomass, and not towards the validation of remote sensing products of land cover change such as the GFC. For this reason, their use for that purpose comes with shortcomings that lead to biases and high uncertainties not considered in the analysis. In the following paragraphs, we show that these shortcomings undermine the validation exercise presented, to the point that its results cannot be considered a reference for the evaluation of other studies.

Notably, Breidenbach et al. (2022) themselves, in the very last paragraph of the Appendix, consider their approach non-optimal for estimating actual harvested area, for the following two reasons. First, the NFIs field observations are repeated with a multi-annual frequency; the attribution of the harvest year in their validation is therefore uncertain and has been forced to match the GFC loss year, potentially leading to biased estimates. Second, Breidenbach et al. (2022) recognize that NFIs use “stand-level observations around the sample plots for area estimation rather than only plot-level measurements,” which leads to a clear spatial mismatch between satellite retrievals and the ground truth when the substantially larger stand-level observations are related to the 30-m Landsat pixels. Unfortunately, they neither elaborated further on the potential implications of these issues nor tried to estimate the impact of these shortcomings on the uncertainty of their estimates.

In the following paragraphs, we present a broader perspective on the limitations of using NFIs data to validate a land cover change map. Several drawbacks become apparent when working with the NFIs plots. These drawbacks hinder the further use of the NFIs data by the remote sensing community, and in particular by users who are active in the field of land cover change mapping. Altogether, these limitations call into question the robustness of the NFI-based validation of GFC. Such drawbacks include:

Difficulty in obtaining an appropriate a priori stratified sampling. The spatial sampling of NFIs based on permanent plots cannot be stratified a priori by stable forest and harvest classes as recommended in sample-based validation schemes (Olofsson et al. 2014; GFOI 2016). Stratified sampling is key to reaching an acceptable standard error for estimated user accuracies in this type of work, for the simple reason that the area harvested annually is very small (1–2% of the total forest area). As a result, the NFIs sampling is severely unbalanced, with the sampling effort devoted to the stable forest layer being at least one order of magnitude larger than the sampling in the area affected by final felling. Such a limited sample size for the rare forest harvest class typically does not lead to an acceptable standard error for estimated user accuracies. In other words, despite the large sampling effort, an appropriate statistical representation of forest loss is not ensured by the NFIs design, since the detection of harvest rate is not the primary purpose of national inventories. Furthermore, with NFIs plots only, it is not possible to use a priori the buffer stratum to reduce the weight of omission errors, as recommended by Olofsson et al. (2020). The buffer stratum is the area mapped as stable forest around forest loss; sampling it separately is likely to reduce the uncertainty in area estimates for sample-based validation. The Global Forest Change (or other remote sensing-based) map, while not suitable as it is for mapping forest change area, can be used as a stratification layer for developing a sampling scheme whereby loss areas and their buffers can be used for change assessment. Also, the limited sampling of the loss layer is particularly severe in the most recent years of the time series. In fact, given that the NFIs of both Finland and Sweden are based on a 5-year rolling program, the sampling of the harvest after 2016 is not complete yet and is therefore based on a number of observations systematically lower than those of the previous years (reduction of 37% and 42% of observations available per year in 2016–2018 compared to 2011–2015 for Finland and Sweden, respectively), as shown in Fig. 1a. Despite this sharp reduction in the sampling during the 2016–2018 period, the uncertainty range for final felling in Fig. 1 in Breidenbach et al. (2022) remains identical to that of the previous years, whereas we would expect it to widen, since the uncertainty depends on the sample size (e.g., the standard error of an estimate increases as the number of observations decreases).
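The expected effect of the reduced sampling can be illustrated with a minimal sketch. It assumes simple random sampling and an unchanged underlying proportion, so that the standard error scales with the inverse square root of the number of observations; NFIs designs are more complex, and the function names below are illustrative only.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of an estimated proportion under simple random sampling."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


def se_inflation(reduction: float) -> float:
    """Factor by which the standard error grows when the sample shrinks.

    `reduction` is the fractional loss of observations (0.37 means 37% fewer).
    Assumes the standard error is proportional to 1 / sqrt(n).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / math.sqrt(1.0 - reduction)


# Reductions in observations per year reported in the text (2016-2018 vs 2011-2015).
for country, reduction in {"Finland": 0.37, "Sweden": 0.42}.items():
    print(f"{country}: standard error inflated by about {se_inflation(reduction):.2f}x")
```

Under these simplifying assumptions, the reported reductions would correspond to standard errors roughly 1.26 and 1.31 times larger for Finland and Sweden, respectively.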

Fig. 1

Histogram of NFIs’ forest plots affected by different types of harvesting for Finland and Sweden (a). Note that the number of intact forest plots is equal to 10,861 and 20,725 for Finland and Sweden, respectively. Percentage of managed areas affected by final felling for Finland and Sweden (b)

Because of these limitations of the sampling design, harvest statistics purely based on NFIs field data are rather uncertain, but the combined use of Earth Observation data could indeed increase their precision. In addition, since the number of sample plots that are subject to management operations is rather low, it is impossible to spatially disaggregate the results, except by merging NFIs data with remote sensing products.

Difficulty in assessing commission errors. The analysis presented by Breidenbach et al. (2022), as stated also by the authors, does not allow a full assessment of the commission error on the harvest statistics (areas where GFC assumes a harvest event that is not confirmed). This is particularly relevant since map-based estimates are prone to both commission and omission errors that typically affect map statistics in opposite directions. The correction of only one of the two errors might lead to biased results. The scientific literature and guidelines on the sample-based correction of maps clearly highlight the correction of both omission and commission errors as a key element to assure consistency in the workflow and correctness in the adjusted results (Olofsson et al. 2014; GFOI 2016).
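Why both error types matter can be sketched with the general stratified area estimator described in the good-practice literature (e.g., Olofsson et al. 2014), in which sample counts in each map class are weighted by that class's share of the mapped area. The counts and area weights below are hypothetical and chosen only for illustration; they are not taken from any of the studies discussed here.

```python
from typing import List


def adjusted_area_proportions(counts: List[List[int]], map_weights: List[float]) -> List[float]:
    """Error-adjusted area proportion of each reference class.

    counts[i][j] is the number of sample units mapped as class i whose
    reference label is class j; map_weights[i] is the share of the total
    area mapped as class i.
    """
    adjusted = [0.0] * len(counts[0])
    for i, row in enumerate(counts):
        n_i = sum(row)
        for j, n_ij in enumerate(row):
            adjusted[j] += map_weights[i] * n_ij / n_i
    return adjusted


# Hypothetical example: class 0 = harvest (loss), class 1 = stable forest.
map_weights = [0.02, 0.98]   # the map labels 2% of the area as loss
counts = [
    [80, 20],    # mapped loss: 80 confirmed, 20 commission errors
    [5, 495],    # mapped stable: 5 omitted losses, 495 confirmed stable
]

adjusted = adjusted_area_proportions(counts, map_weights)

# Adding omitted loss to the map area without removing commission errors.
omission_only = map_weights[0] + map_weights[1] * counts[1][0] / sum(counts[1])

print(f"map-based loss share:       {map_weights[0]:.4f}")
print(f"omission-only correction:   {omission_only:.4f}")
print(f"omission + commission:      {adjusted[0]:.4f}")
```

In this hypothetical case, correcting for omission alone gives a loss share of 0.0298, whereas accounting for both error types gives 0.0258, which shows how a one-sided correction can bias the adjusted estimate.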

Difficulty in the temporal attribution of loss year. Given the uncertainty related to the periodic sampling of ground data (5-year interval) combined with that of the GFC classification, the temporal attribution of the management operation differs between NFIs surveys and GFC. For this reason, Breidenbach et al. (2022) forced the loss year of the NFIs plot to match the GFC data where the latter were available, introducing considerable uncertainty in the process. In fact, using the harvest year from the product under validation (i.e., GFC) to fix the timing of the event in the ground truth data introduces circularity in the process and may generate substantial errors. In addition, the temporal mismatch in harvest events between NFIs and GFC is inconsistent between countries. For Sweden, there is a systematic offset of about 1 year between the GFC and NFIs harvest year, while for Finland there is no systematic difference but still considerable uncertainty in the year of loss, as shown in Fig. 2. As a result, the match in harvest year between the two data sources is limited to 35% and 20% of the NFIs plots for Finland and Sweden, respectively.

Fig. 2

Histogram of temporal mismatch between GFC and NFIs attribution of forest harvest. Delta is the difference between the year of measurement of the NFIs plot and the year of change by GFC

The uncertainty in the attribution of the loss year raises an important methodological issue in the temporal attribution of omission errors to a specific observation year. In fact, while the year of harvest can be matched to the GFC year for plots with confirmed loss, this cannot be done in the case of omission errors, since no GFC date is available. For this reason, a temporal misalignment between confirmed loss and omission is expected, given the temporal mismatch between the two datasets reported in Fig. 2. This systematic problem makes the correction of omission errors in the validation intrinsically incorrect.
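The temporal comparison described above can be sketched as follows. The plot records are hypothetical, delta follows the definition in the caption of Fig. 2 (NFIs year minus GFC year), and the match fraction is computed here over plots that have a GFC loss year, which is an assumption about the denominator. Plots without a GFC year represent omissions and cannot be assigned a delta, which is the issue raised in the preceding paragraph.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (NFIs measurement year, GFC loss year or None when GFC reports no loss)
PlotRecord = Tuple[int, Optional[int]]


def year_deltas(plots: List[PlotRecord]) -> List[int]:
    """Delta = NFIs year minus GFC year, for plots where GFC reports a loss."""
    return [nfi_year - gfc_year for nfi_year, gfc_year in plots if gfc_year is not None]


def match_fraction(plots: List[PlotRecord]) -> float:
    """Share of plots with a GFC loss year whose year agrees with the NFIs year."""
    deltas = year_deltas(plots)
    if not deltas:
        return float("nan")
    return sum(1 for d in deltas if d == 0) / len(deltas)


# Hypothetical records for illustration only.
plots: List[PlotRecord] = [
    (2016, 2015),
    (2016, 2016),
    (2017, 2016),
    (2018, None),   # omission: harvest seen in the field, no GFC loss year
    (2015, 2013),
]

histogram = Counter(year_deltas(plots))
print(sorted(histogram.items()))
print(f"exact-year match fraction: {match_fraction(plots):.2f}")
```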

Spatial mismatch between remote sensing and in situ data. The spatial mismatch between satellite and surface data caused by the use of stand variables instead of plot data, as reported by Breidenbach et al. (2022), may hamper the assessment of the map error at the edge of clear-cuts. Our previous assessment (Ceccherini et al. 2021) and other papers on the subject (e.g., Olofsson et al. 2020) showed that a large fraction of the uncertainty in the classification of tree cover loss occurs at the edge of gaps, where the sensitivity of the retrieval may be affected by a change in the detection algorithm. To properly address this fundamental aspect of the validation, a strict majority criterion at the scale of the pixel has to be applied. The adoption of stand-level observations is not acceptable for this purpose, because edge plots straddle two stands under contrasting management.
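A strict majority criterion at pixel scale can be sketched as below. The sketch assumes that the reference interpretation is available as a grid of finer cells inside each 30-m pixel; the grid size and the example values are hypothetical.

```python
from typing import List


def pixel_is_loss(subcells: List[List[bool]]) -> bool:
    """Label a pixel as loss only if strictly more than half of its cells are loss."""
    total = sum(len(row) for row in subcells)
    loss = sum(1 for row in subcells for cell in row if cell)
    return 2 * loss > total


# Hypothetical 4 x 4 reference grids inside single pixels at a clear-cut edge.
half_covered = [
    [True, True, False, False],
    [True, True, False, False],
    [True, True, False, False],
    [True, True, False, False],
]
mostly_covered = [
    [True, True, True, False],
    [True, True, False, False],
    [True, True, False, False],
    [True, True, False, False],
]

print(pixel_is_loss(half_covered))    # exactly half: not a strict majority
print(pixel_is_loss(mostly_covered))  # 9 of 16 cells: strict majority
```

With a stand-level label, by contrast, an edge pixel would inherit the class of whichever stand the observation refers to, regardless of how much of the pixel was actually harvested.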

We argue that these limitations of NFIs data as a validation dataset have not been considered in the formulation of a proper uncertainty analysis. Ultimately, this leads to the conclusion that the validation of GFC based on NFIs data is neither sufficiently robust in its assumptions nor sufficiently comprehensive in its assessment of uncertainties to become a reference dataset for the evaluation of the exercise presented in Ceccherini et al. (2021).

Breidenbach et al. (2022) comment that “Combining the GFC map with adequate reference data into reliable estimators can prove very useful for estimating harvested area and related C-stock losses, as illustrated in various studies.”

We fully agree that NFIs data are an important source to further refine the analysis and assessments based on remote sensing. For instance, the dataset of Breidenbach et al. (2022) shows that in Sweden the fraction of final felling in the total managed area has increased substantially in 2016–2018 compared to previous years, as reported in Fig. 1b. Since this figure is calculated as a ratio between observations at NFIs plots, it is self-consistent and robust.

Unfortunately, this type of plot-level NFIs data has not been openly distributed by countries so far, with a few commendable exceptions. In addition, accurate geographical coordinates of the plots, which are essential to match ground and satellite data, are typically not shared or are shared with degraded precision. This restricted sharing of surface plot data is currently limiting the uptake of this important data source in the remote sensing community. Greater openness in data sharing, compatible with the required protection of land owners’ geo-privacy, and closer cooperation between scientific communities would indeed be very beneficial for advancing the monitoring of forest resources.

Conclusions

Ground surveys are fundamental for the assessment of forest resources both at present and, even more, in the future, when the expansion of Earth observation will require increased availability of reference surface data. On this point, we fully agree with Breidenbach et al. (2022). However, the suitability of this data stream for the validation and calibration of remote sensing products is still suboptimal, because the sampling schemes and field protocols of NFIs are not designed to be used in combination with Earth observations. To overcome these limitations, a careful analysis of the requirements for integrating surface and satellite data should be performed when the NFIs’ sampling scheme is designed and the field protocols are defined.

Although we appreciate the disclosure of NFIs data presented in Breidenbach et al. (2022), we show that their statements neither acknowledge the current shortcomings of NFIs data as a validation set for land cover change maps nor properly quantify the uncertainty of their estimates. We remark that the status quo in the EU relies on sparse and heterogeneous NFIs data that at present are not sufficient to run an EU-wide validation.

In summary, the key challenge is to avoid widening the gap between what can be reliably assessed with NFIs and with Earth observation on the one hand, and the diverse demands of users on the other. This can only be achieved through a long-term collaborative effort between the forestry and remote sensing communities. For the future, we envisage a full integration of NFIs with satellite data to harness the potential of an integrated approach in advancing the monitoring of EU forest resources.

We conclude by suggesting possible avenues for further research. First, a remote sensing-compliant protocol could be deployed at NFIs plots where in situ surveying is planned, checking the temporal and spatial match between the satellite footprint and the NFIs plot. This might guarantee continuity in the NFIs dataset while exploiting synergies between statistical in situ surveying and Earth observation. Second, products such as the Global Forest Change map, while not appropriate as they are for mapping trends of forest change area, can be used as a stratification layer for developing a sampling scheme whereby loss areas and their buffers are used for change assessment. Third, future research should aim to solve the issue of data anonymization to preserve the geo-privacy of land owners. For example, a cloud service (e.g., a Jupyter notebook) managed by designated entities could automatically extract time series of remote sensing observations over NFIs plots without revealing their exact geolocation.
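The last suggestion can be made concrete with a minimal sketch of how such a service might return time series keyed by pseudonymous identifiers instead of coordinates. Everything here is hypothetical: the function names, the salted-hash scheme, and the `sampler` callback standing in for whatever satellite archive the service would query. A real service would also need to assess whether the time series themselves could allow a plot to be re-identified.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def pseudonymous_id(plot_id: str, secret_salt: str) -> str:
    """Derive a stable identifier that does not expose the original plot ID."""
    digest = hashlib.sha256((secret_salt + plot_id).encode("utf-8")).hexdigest()
    return digest[:16]


def extract_series(
    plots: Dict[str, Tuple[float, float]],
    sampler: Callable[[float, float], List[float]],
    secret_salt: str,
) -> Dict[str, List[float]]:
    """Sample a time series at each plot and return it without coordinates.

    `plots` maps a plot ID to (latitude, longitude) and stays on the service
    side; only pseudonymous IDs and the sampled values are returned.
    """
    return {
        pseudonymous_id(plot_id, secret_salt): sampler(lat, lon)
        for plot_id, (lat, lon) in plots.items()
    }


def fake_sampler(lat: float, lon: float) -> List[float]:
    """Stand-in for a real archive query; returns a made-up NDVI series."""
    return [0.80, 0.79, 0.31]


result = extract_series({"plot-001": (61.5, 24.1)}, fake_sampler, secret_salt="keep-me-private")
print(result)
```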