1 Introduction

1.1 REDD+ Forest Reference Emission Levels and Baselines

At the 21st Conference of the Parties (COP) in 2015, Parties to the UNFCCC set out an ambitious plan, the Paris Agreement, for global action on climate change mitigation and adaptation. To mitigate climate change, the Paris Agreement promotes the implementation of policy approaches and positive incentives for reducing emissions from (a) deforestation and (b) forest degradation, and for increasing (c) the conservation of forest carbon stocks, (d) the sustainable management of forests, and (e) the enhancement of forest carbon stocks (or what is commonly referred to as REDD+). Two years prior, at the 19th COP in 2013, Parties agreed on a set of decisions known as the Warsaw Framework for REDD+, which, among other things, set guidelines for the development of national (or in the interim, subnational) forest monitoring systems (UNFCCC 2013a) and forest reference emission levels/forest reference levels (FRELs) (UNFCCC 2013b), which are required for countries to receive results-based payments for REDD+ activities (UNFCCC 2010).

A FREL is a projected level of greenhouse gas emissions (or removals, which are considered negative emissions), given in tons per year for a specific timeframe in the future. It is calculated based on historical emissions from one or more of the REDD+ activities of (a–e) above, with adjustments based on national circumstances permitted when justified. The FREL serves as a baseline to measure a country’s performance in mitigating climate change in the forest sector. Specific methodologies for calculating historical emissions and generating FRELs are not stipulated in UNFCCC decisions, which provides flexibility by allowing for differences in national circumstances/capacities and for FRELs to be improved over time as national capacities and data quality/quantity increase (UNFCCC 2011). However, at COP15, Parties agreed that the historical emissions must be calculated using a combination of remote sensing (RS) and ground-based forest carbon inventory (GBFCI) approaches (UNFCCC 2009). The Earth observation data are typically processed using RS image-processing techniques (e.g., supervised classification and change detection algorithms) to map forest changes over time, while the resultant emissions from these changes are estimated using the GBFCI data (Intergovernmental Panel on Climate Change 2006). After calculating the historical emissions, future emissions are then projected as the final step of generating the FREL. 
Various modeling approaches exist for projecting future emissions and some common approaches include: the use of the historical average emissions rate (i.e., assuming no change in future emissions), temporal regression modeling using historical emissions data (with or without other nonspatial data that reflect the drivers of historical emissions), and spatiotemporal modeling using historical emissions data (with or without other nonspatial data) as well as spatial data that reflect the drivers of historical emissions (e.g., slope maps, locations of roads, etc.) (Busch et al. 2009; Huettner et al. 2009).
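The difference between the first two approaches can be illustrated with a minimal sketch. The emissions series below is hypothetical (it is not taken from any FREL submission), and the function names are ours; the temporal regression is shown in its simplest form, an ordinary least-squares line fitted to emissions against year.

```python
from statistics import mean

def historical_average(emissions):
    """Project a constant future level equal to the historical mean."""
    return mean(emissions)

def linear_trend(years, emissions):
    """Ordinary least-squares fit of emissions = a + b * year."""
    y_bar, e_bar = mean(years), mean(emissions)
    sxx = sum((y - y_bar) ** 2 for y in years)
    sxy = sum((y - y_bar) * (e - e_bar) for y, e in zip(years, emissions))
    b = sxy / sxx            # slope: change in emissions per year
    a = e_bar - b * y_bar    # intercept
    return a, b

# Hypothetical annual emissions (Mt CO2e per year) that decline over time.
years = [2006, 2008, 2010, 2012, 2014]
emissions = [50.0, 46.0, 42.0, 38.0, 34.0]

avg = historical_average(emissions)   # constant projection
a, b = linear_trend(years, emissions)
projected_2020 = a + b * 2020         # trend-based projection for 2020
```

For a declining series such as this one, the historical average gives a higher projected level than the trend line does, which is why the choice of approach matters for the resulting FREL.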

Apart from their use for generating national/subnational FRELs, Earth observation data are also being used for REDD+ implementation at the individual site or project level, and many REDD+ projects are ongoing or already completed in voluntary carbon trading markets. The development of REDD+ projects for these voluntary carbon trading markets was spurred by a decision from the 13th COP in 2007, which encouraged REDD+ demonstration activities and invited developed countries to mobilize resources to support these efforts (UNFCCC 2007). To assure the quality of REDD+ projects, various certification schemes have been established, with the Verified Carbon Standard (VCS) being one of the largest certifiers of projects selling credits into the forest carbon markets (Hamrick and Goldstein 2015; McDermott et al. 2012). In contrast to the FRELs, for VCS-certified REDD+ projects there are detailed methodological requirements on how the future emissions levels (hereafter referred to as “baselines” to differentiate them from FRELs) can be projected. For example, VCS methodologies for generating baselines of emissions from unplanned future deforestation and/or forest degradation (DD) require that the future rate of DD in the project area be estimated based on the historic rates of DD in an area where the agents and drivers of deforestation as well as the soil type/slope/elevation are similar to those found in the project area (e.g., in a larger “reference area”) (Verified Carbon Standard 2013). Earth observation data of the “reference area” are therefore needed to generate the baseline.

From this discussion, it is clear that Earth observation data are essential for developing FRELs and baselines at the national/subnational and project levels, respectively. Thus, the Earth observation derived historical emissions estimates and the projections of future emissions must be accurate to ensure the contributions of REDD+ activities to climate change mitigation are correctly quantified. The Earth observation data quality (e.g., spatial resolution of the data) has a direct impact on the accuracy of the historical forest maps used to estimate emissions (Hilker et al. 2009), while the accuracy of some modeling approaches for projecting future emissions (e.g., regression models) will also be affected by the Earth observation data quantity (number of historical forest maps produced). For these reasons, it is necessary to assess the quality and quantity of Earth observation data being used, as well as how the data are being used, to really understand the contribution of Earth observation to the implementation of REDD+.

1.2 Objectives

This study had two main objectives. The first was to better understand the quality and quantity of Earth observation data being used for REDD+ at the national/subnational and project levels. New sources of Earth observation data are becoming available every year, so it should be possible to increase the quality and quantity of the data used for REDD+ FRELs/baselines over time. To evaluate the quality and quantity of the Earth observation data currently being used at the country and project levels, we computed four metrics related to the spatial and temporal resolution and scale of the data. A previous study (Johnson et al. 2016) performed a similar analysis based on the Earth observation data used by the six countries that submitted their proposed FRELs by the end of 2015, but in this chapter we expand the scope of analysis to include nine additional countries (those that submitted their proposed FRELs in 2016) as well as 15 REDD+ projects.

The second objective was to understand whether or not the Earth observation data quality and quantity were determining the types of REDD+ activities and modeling approaches used to generate FRELs/baselines. For this, we computed two additional metrics related to methods used for FREL/baseline generation: the number of REDD+ activities assessed by Earth observation data for FREL/baseline generation and the level of complexity of the model used to project future emissions. Both the types of activities for which Earth observation data can accurately estimate historical emissions and the level of complexity of the model that can be used to project future emissions are affected by the Earth observation data quality and quantity. We therefore hypothesized that countries/projects using Earth observation data of higher quality and quantity should be assessing historical emissions from more REDD+ activities and also using more complex models to project future emissions. If this is not the case, some factors other than the Earth observation data quality and quantity may be the primary determinants of the REDD+ activities assessed and/or the models selected to project future emissions.

2 Methods

2.1 Data

As of April 2016, 15 countries had submitted reports for technical review to the UNFCCC secretariat containing the details of their proposed FRELs. The reports contain detailed information on the Earth observation data that countries used to estimate historical emissions as well as the approaches they used to project future emissions from the historical data. This information was used to calculate the six metrics in this study. At the project level, similar information was obtained for a sample of 15 VCS-verified REDD+ projects (Verified Carbon Standard 2016a). Tables 14.1 and 14.2 list the countries that had submitted their proposed FRELs to the UNFCCC as of April 2016, as well as the locations of the 15 sample VCS-verified REDD+ projects.

Table 14.1 Metric values calculated at the national/subnational level
Table 14.2 Metric values calculated at the project level

2.2 Metrics

Four metrics were calculated to assess the quality and quantity of Earth observation data used to estimate historical emissions:

$$ \begin{array}{ll} \%mapped & \text{percentage of country (or reference area for projects) mapped} \\ & \text{using Earth observation data} \end{array} $$
(14.1)
$$ \begin{array}{ll} timeframe & \text{length of historical assessment period for which Earth} \\ & \text{observation data were used, in years} \end{array} $$
(14.2)
$$ \begin{array}{ll} \#maps & \text{number of maps produced during the historical assessment} \\ & \text{period} \end{array} $$
(14.3)
$$ spatial\_res\quad \text{spatial resolution of Earth observation data} $$
(14.4)

Of these metrics, Eqs. 14.1–14.3 are indicators of the Earth observation data quantity (area extent and temporal scale/resolution), while Eq. 14.4 is an indicator of the data quality because it determines the minimum changes in forest extent that can be detected. Because the value of Eq. 14.2 may have an effect on the value of Eq. 14.3 (a longer timeframe allows for a greater #maps), we assessed the correlation between the two metrics using the Spearman rank correlation test (p < 0.05) (Wayne 1990).
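As an illustration of the correlation check, the following sketch computes Spearman's rank correlation coefficient as the Pearson correlation of average ranks, which handles ties such as several countries producing the same number of maps. The input values are hypothetical, the function names are ours, and the sketch returns only the coefficient, not the p-value used for the significance test.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical timeframe (years) and #maps values for five countries.
timeframe = [8, 10, 13, 15, 22]
n_maps = [2, 3, 3, 2, 15]
rho = spearman_rho(timeframe, n_maps)
```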

For countries/projects that generated multiple FRELs/baselines using different historical assessment periods, the value reported in Eq. 14.2 is that of the most recent historical assessment period. Additionally, some countries/projects used forest maps from before or after the designated historical assessment period to estimate the historical emissions (i.e., by interpolating the forest extent from maps generated before/after the assessment period), so in these cases the values for both Eqs. 14.2 and 14.3 were changed to match the first and last image years to better reflect the actual Earth observation data used. Finally, for countries/projects that used multiple Earth observation data sources (e.g., because finer spatial resolution data became available in more recent years), the value reported for Eq. 14.4 was that of the coarsest resolution data set because this data set limits the finest-scale changes that can be detected over the entire historical assessment period. After calculating these metrics, median values were computed to show the typical quality and quantity of Earth observation data being used at the national/subnational and project levels.
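The rules above can be sketched as follows. The records are hypothetical (they are not the values in Tables 14.1 or 14.2), the function and key names are ours, and the timeframe is assumed here to be the difference between the last and first image years.

```python
from statistics import median

def eo_metrics(map_years, resolutions_m, fraction_mapped):
    """Return the four data quality/quantity metrics for one country or project."""
    return {
        "pct_mapped": 100.0 * fraction_mapped,
        # Assumption: timeframe spans the first to the last image year.
        "timeframe": max(map_years) - min(map_years),
        "n_maps": len(map_years),
        # The coarsest data set limits the finest detectable change.
        "spatial_res": max(resolutions_m),
    }

# Hypothetical records for three countries/projects.
records = [
    eo_metrics([2000, 2005, 2010], [30], 1.0),
    eo_metrics([1990, 2000, 2005, 2010, 2013], [30, 10], 1.0),
    eo_metrics([2001, 2014], [30], 0.22),
]

# Median values summarize the typical data quantity across records.
median_timeframe = median(r["timeframe"] for r in records)
median_n_maps = median(r["n_maps"] for r in records)
```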

In addition to the four Earth observation data quality/quantity metrics, two additional metrics were calculated to assess how the Earth observation derived historical emissions were used for FREL/baseline generation:

$$ \begin{array}{ll} \#activities & \text{number of REDD+ activities assessed by Earth observation} \\ & \text{data for FREL/baseline generation} \end{array} $$
(14.5)
$$ \begin{array}{ll} model\_complexity & \text{level of complexity of the model used to project future} \\ & \text{emissions} \end{array} $$
(14.6)

Some countries/projects assessed historical emissions for multiple REDD+ activities using Earth observation data, but did not include all of them in their FREL/baseline due to an unacceptable level of estimation uncertainty, so the value reported for Eq. 14.5 was limited to the number of REDD+ activities actually included in the FREL/baseline. For Eq. 14.6, we assigned ordinal values to the different modeling approaches for projecting future emissions (described in Sect. 1.1) based on the quantity of Earth observation data they can incorporate to project future emissions. The historical average modeling approach was assigned a value of “1” because it can be calculated using only 2 years of Earth observation derived data: a map from the starting year and another from the ending year of the historical assessment period. The temporal regression modeling approach was assigned a value of “2” because increasing the timeframe and #maps can increase the model’s prediction accuracy (e.g., R² value). Finally, the spatiotemporal modeling approach was assigned a value of “3” because increasing the timeframe and #maps and incorporating ancillary spatial data sets (which may also be derived from Earth observation data) can increase the model’s prediction accuracy. Other approaches for projecting future emissions have not been used thus far for FREL/baseline generation, although some FRELs/baselines were slightly modified to account for anticipated changes in deforestation drivers (e.g., by including a multiplier to increase the projected future emissions).

2.3 Comparison of Metric Values at National/Subnational and Project Levels

As discussed in Sect. 1.1, there are typically more specific guidelines that must be followed at the project level than at the national/subnational level. To assess whether this resulted in significant differences in the values of any metrics, we compared the metric values at the two different levels using the nonparametric Wilcoxon rank sum test (p < 0.05) (Wilcoxon 1945). This test was selected because the metric values could not be assumed to follow a parametric distribution (values at the project level are determined at least in part by the methodological guidelines, which vary depending on a project’s location). Chile was excluded from this comparison because its data quantity was not uniform (it varied by region).
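A minimal sketch of the rank sum comparison is given below, using the normal approximation with a tie correction and no continuity correction. This is a simplification: statistical packages typically use exact small-sample distributions or continuity corrections, so their p-values will differ somewhat. The function names and sample values are ours and hypothetical.

```python
import math
from collections import Counter

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_test(a, b):
    """Return (U statistic for sample a, two-sided p-value by normal approximation)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1])                 # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2           # Mann-Whitney U form of the statistic
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var == 0:                        # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p

# Hypothetical model_complexity values at the two levels.
national = [1, 1, 1, 1, 2, 1, 1]
project = [2, 2, 1, 3, 2, 2, 3]
u_stat, p_value = rank_sum_test(national, project)
```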

2.4 Relationship Between Earth Observation Data Quality/Quantity and Methods Used for FREL/Baseline Generation

To evaluate the relationship between Earth observation data quantity (timeframe and #maps) and the #activities and model_complexity, we performed ordered logistic regression modeling at both the national/subnational level and the project level. The Earth observation data quality (spatial_res) was excluded from this regression modeling because it was uniform for all countries and projects, as discussed later in Sect. 3.1. The full regression model for each dependent variable is:

$$ \# activities\sim timeframe + \# maps $$
(14.7)

and,

$$ model\_complexity\sim timeframe + \# maps $$
(14.8)

For model_complexity at the national/subnational level, a simple (binary) logistic regression model was used because the dependent variable had only two values (1 or 2).

To explain #activities and model_complexity at each of the national/subnational and project levels, we selected the best among four different regression models: a null model (no explanatory variables), a model using only timeframe, a model using only #maps, and a full model using both timeframe and #maps. Model selection was done based on the Akaike Information Criterion (AIC) (Akaike 1974; Crawley 2005). AIC is a measure of the relative goodness of fit of a statistical model and allows for comparisons among nested models. It is calculated considering the tradeoff between the bias and variance in model construction, and is given by:

$$ {\text{AIC}} = 2 {\text{k}} - 2 \ln \left( L \right) $$
(14.9)

where k is the number of model parameters and ln (L) is the log-likelihood for the statistical model. The model with the lowest AIC value has the highest quality among the models compared. For more information on AIC, readers are referred to a recent review article by Aho et al. (2014).
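A small worked example may help show how Eq. 14.9 penalizes additional parameters. In the sketch below, the null (intercept-only) model for an ordinal outcome is assumed to reproduce the observed category proportions, so its number of parameters is the number of observed categories minus one. The outcome values and the log-likelihood of the competing model are hypothetical, and the function names are ours.

```python
import math
from collections import Counter

def aic(k, log_likelihood):
    """Akaike Information Criterion as in Eq. 14.9: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

def null_log_likelihood(outcomes):
    """Log-likelihood of a model that predicts the observed category proportions."""
    n = len(outcomes)
    return sum(c * math.log(c / n) for c in Counter(outcomes).values())

# Hypothetical ordinal outcomes (e.g., model_complexity) for six projects.
outcomes = [1, 1, 1, 2, 2, 3]
k_null = len(set(outcomes)) - 1      # two thresholds, no explanatory variables
ll_null = null_log_likelihood(outcomes)
aic_null = aic(k_null, ll_null)

# Hypothetical full model: two extra slopes (timeframe and #maps) and a
# slightly higher log-likelihood than the null model.
k_full = k_null + 2
ll_full = -5.5
aic_full = aic(k_full, ll_full)

best = "null" if aic_null < aic_full else "full"
```

In this hypothetical case the small gain in log-likelihood does not offset the penalty for the two extra parameters, so the null model has the lower AIC.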

As the number of samples was relatively low at both analysis levels in our study, we tested the significance of the candidate logistic regression models using the parametric bootstrap approach. Bootstrapping entails randomly drawing (with replacement) from the samples to construct a “bootstrapped” set of the same size as the original sample (some of the original samples may be drawn more than once or not at all in the bootstrapped set), and it is often used to calculate the bias and standard error of a model (Efron and Gong 1983). In this study, the #activities or model_complexity values of the countries/subnational regions or projects were resampled, and for each bootstrapped data set we calculated the difference between the log-likelihood of the logistic regression model and that of the null model. We performed the bootstrapping 1000 times and, in each iteration, compared the log-likelihood difference for the bootstrapped data set with that of the original data set. The logistic regression model results were considered significant if the log-likelihood difference for the bootstrapped data was smaller than that of the original data set in more than 95% of the iterations. All of the statistical analyses in this section and in Sect. 2.3 were conducted using R 3.2.5. Chile was excluded from the analysis in this section because of its nonuniform data quantity.
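The procedure can be sketched in simplified form as follows. This is not the code used in the study (which was run in R): it assumes a binary outcome and a single explanatory variable, fits the logistic regression by Newton-Raphson with step halving, and resamples the response values with replacement as described above. All data and function names are hypothetical.

```python
import math
import random

def log_lik(x, y, b0, b1):
    """Binary logistic log-likelihood, computed in a numerically stable way."""
    total = 0.0
    for xi, yi in zip(x, y):
        s = (b0 + b1 * xi) if yi == 1 else -(b0 + b1 * xi)
        total += -math.log1p(math.exp(-s)) if s >= 0 else s - math.log1p(math.exp(s))
    return total

def fitted_log_lik(x, y, iters=50):
    """Maximized log-likelihood of y ~ 1 + x via Newton-Raphson with step halving."""
    b0 = b1 = 0.0
    ll = log_lik(x, y, b0, b1)
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            z = b0 + b1 * xi
            p = 1 / (1 + math.exp(-z)) if z >= 0 else math.exp(z) / (1 + math.exp(z))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-10:              # near-singular, e.g. (quasi-)separation
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        new_ll = log_lik(x, y, b0 + d0, b1 + d1)
        while new_ll < ll and step > 1e-4:
            step /= 2.0
            new_ll = log_lik(x, y, b0 + step * d0, b1 + step * d1)
        if new_ll < ll:
            break
        b0, b1 = b0 + step * d0, b1 + step * d1
        converged = new_ll - ll < 1e-9
        ll = new_ll
        if converged:
            break
    return ll

def null_log_lik(y):
    """Log-likelihood of the intercept-only model (observed proportion of ones)."""
    n, n1 = len(y), sum(y)
    return sum(c * math.log(c / n) for c in (n1, n - n1) if c > 0)

def bootstrap_test(x, y, n_boot=1000, seed=0):
    """Return the observed log-likelihood gain and the bootstrap p-value."""
    rng = random.Random(seed)
    observed = fitted_log_lik(x, y) - null_log_lik(y)
    smaller = 0
    for _ in range(n_boot):
        y_star = [rng.choice(y) for _ in y]   # resample responses with replacement
        if fitted_log_lik(x, y_star) - null_log_lik(y_star) < observed:
            smaller += 1
    return observed, 1.0 - smaller / n_boot

# Hypothetical data: timeframe (years) and a binary model_complexity indicator.
timeframe = [8, 9, 10, 12, 13, 13, 15, 16, 18, 20, 21, 22]
complex_model = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
gain, p_value = bootstrap_test(timeframe, complex_model)
```

Under the rule stated above, the model would be considered significant when the bootstrapped gain is smaller than the observed gain in more than 95% of iterations, that is, when `p_value` is below 0.05.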

3 Results

3.1 Quality/Quantity of Earth Observation Data Used

The values of Eqs. 14.1–14.6 at the national/subnational level are shown in Table 14.1, while the values and applied methodologies at the project level are shown in Table 14.2.

At the national/subnational level:

  • %mapped ranged from 22%–100% with a median value of 100%;

  • timeframe ranged from 8–22 years with a median value of 13 years;

  • #maps ranged from 2–15 with a median of 3.5;

  • spatial_res was 30 m for all countries (all used Landsat data, although some countries also used higher resolution imagery);

  • #activities ranged from 1–5 with a median of 1 (typically deforestation);

  • model_complexity ranged from 1–2 with a median of 1.

At the project level, one of the four methodologies listed below (Verified Carbon Standard 2016b) was used in each of the 15 projects:

  • VM0006: Methodology for Carbon Accounting for Mosaic and Landscape-scale REDD Projects;

  • VM0007: REDD+ Methodology Framework (REDD-MF);

  • VM0009: Methodology for Avoided Ecosystem Conversion; and

  • VM0015: Methodology for Avoided Unplanned Deforestation.

As can be seen in Fig. 14.1, there were some clear differences in the Earth observation data quality/quantity between projects using different methodologies. As one example, the VM0015 projects all used three maps, while the VM0009 projects used between six and eight maps.

Fig. 14.1 Relationship between historical assessment period and number of maps used to project future emissions in FRELs and VCS-certified REDD+ projects

However, for the projects in general:

  • %mapped was always 100%;

  • timeframe ranged from 9–26 years with a median of 13 years;

  • #maps ranged from 2–15 with a median of 3;

  • spatial_res was always 30 m;

  • #activities ranged from 1–2 with a median of 1; and

  • model_complexity ranged from 1–3 with a median value of 2.

Regarding the test of correlation between timeframe and #maps, we found that the two metrics were not significantly correlated at either the national/subnational or the project level (Spearman’s rank correlation test at the 0.05 significance level; scatterplot shown in Fig. 14.1).

3.2 Comparison of Metric Values at National/Subnational and Project Levels

Comparing the metric values at the national/subnational and project levels, we can see that quite similar data and methods were used at both levels. There were no differences in Earth observation data quality (all used 30 m resolution data). In terms of Earth observation data quantity, there were no significant differences in the timeframe or #maps (Wilcoxon rank sum test at the 0.05 significance level). In terms of how the data were used, there was no significant difference in #activities (Wilcoxon rank sum test at the 0.05 significance level), while model_complexity at the project level was significantly higher than that at the national/subnational level (p < 0.05, Wilcoxon rank sum test).

3.3 Relationship Between Earth Observation Data Quality/Quantity and Methods Used for FREL/Baseline Generation

Results of the regression analyses indicated that #activities and model_complexity were not well explained by the Earth observation data quantity (Table 14.3), as the AIC values were lowest for the null models in all cases. Due to the high uniformity of model_complexity at the national/subnational level (all countries except Peru had the same value, as shown in Table 14.1), regression analysis could not be done at this level for this variable. For the other cases, the full model was worst at explaining both #activities and model_complexity at the project level, but was second best at explaining #activities at the national/subnational level. The parametric bootstrap tests also showed that the models that included the Earth observation data quantity variables (full models, and models with either timeframe or #maps) did not explain #activities or model_complexity significantly better than the null models at the 0.05 significance level.

Table 14.3 Akaike Information Criterion (AIC) values of regression models explaining the #activities and model_complexity at the national/subnational and project levels. Lower AIC values indicate a higher quality model

4 Discussion and Conclusions

The results provide an overview of the Earth observation data quality/quantity being used for REDD+ at the national/subnational and project levels. Most countries and projects have covered 100% of the country/project area using Earth observation data, used a historical assessment period of at least 13 years, produced at least three maps over the assessment period, and used 30 m spatial resolution data. Synthetic Aperture Radar (SAR) data are often suggested as a complementary data source to optical data for forest monitoring and REDD+ (Reiche et al. 2016) (but have not been used thus far for REDD+), and free approx. 30 m resolution SAR data sets have become available in recent years (e.g., 25 m resolution ALOS PALSAR-1/-2 mosaic data from 2007), so if the median values calculated are considered as minimum requirements then PALSAR-1/-2 mosaic data could be used as a complementary data source starting from the year 2021. The Committee on Earth Observation Satellites (CEOS) provides a searchable online database with details of the past, current, and future Earth observation satellite missions of its member agencies, which could further aid these types of searches (Committee on Earth Observation Satellites 2016). It should be noted that aside from the metrics calculated, other metrics could be computed to obtain even more information on Earth observation data quality/quantity.

Comparing the data/methods used at the national/subnational and project levels, it was found that the quality and quantity of the Earth observation data used at the two levels did not differ significantly, even though projects covered smaller areas (for which Earth observation data are likely to be easier to acquire and process). The methodological guidelines at the project level may have somewhat constrained these differences (e.g., methodology VM0015 states that the historical assessment period should be no more than 10–15 years, so even if Earth observation data for a longer period are available and easy to acquire, they cannot be used). The number of REDD+ activities assessed using Earth observation data at the two levels was also not significantly different. However, the level of model complexity was significantly higher at the project level than at the national/subnational level, and this was likely due to the stricter methodological requirements at the project level and possibly the greater ease of acquiring/managing/processing ancillary spatial and nonspatial data at the project level (due to the smaller scale).

In terms of the relationship between Earth observation data quality/quantity and the methods used to generate FRELs/baselines, we found that neither the data quality (all countries/projects used 30 m data) nor the data quantity was a strong determinant of the number of REDD+ activities considered for FRELs/baselines or of the level of complexity of the models used to project future emissions. Although a larger sample size would of course give higher confidence to the analysis (there is a chance that the null model was selected by AIC because of the limited sample size), the results of this study generally suggest that countries/projects are selecting which REDD+ activities to consider and which modeling approaches to use based on factors unrelated to Earth observation data quality/quantity. With regard to the number of activities considered, it is likely that countries/projects are focusing their efforts for now on the REDD+ activity (or the few activities) responsible for the highest level of greenhouse gas emissions, which is typically deforestation, even if enough data are available for assessing other activities.

In regard to the modeling approach selected for projecting future emissions, while there are strict requirements on the models that can be used at the project level, there is great flexibility at the national/subnational level, so it is possible that countries have other motivations besides model prediction accuracy for their selection of a particular modeling approach. For example, a country may opt for a model that predicts higher levels of future emissions to make it less likely that their actual future emissions exceed these predicted levels. For countries with decreasing rates of emissions in more recent years (e.g., Brazil, Indonesia, and many others), the simplest modeling approach—use of the historical average emission rate—will predict higher future emissions than the more complex modeling approaches that take into account the downward trend in emissions, so these countries may be reluctant to use the more complex modeling approaches. In contrast, for countries with increasing rates of emissions in more recent years (e.g., Peru), the more complex modeling approaches that take into account the upward trend in emissions will project higher future emissions, so these countries may be more willing to select a more complex modeling approach. If the goal is that, as Earth observation data quality/quantity increase over time, the methods used to project future emissions for REDD+ also improve, there may be a need for more specific guidance to be given on how countries facing different national circumstances should project their future emissions.