Recommendations for the use of tree models to estimate national forest biomass and assess their uncertainty

Three options are proposed to improve the accuracy of national forest biomass estimates and to decrease the uncertainty related to tree model selection, depending on available data and national contexts. Different tree volume and biomass equations produce different estimates. At the national scale, differences between estimates can be large, even though these estimates form the basis for guiding policies and measures, particularly in the context of climate change mitigation. Few countries have developed national databases of tree volume and biomass equations or explored their potential to decrease the uncertainty of volume and biomass estimates. With the launch of the GlobAllomeTree web platform, most countries in the world could have access to country-specific databases. The aim of this article is to recommend approaches for assessing tree and forest volume and biomass at the national level with the lowest possible uncertainty. The article highlights the crucial need to link the development of allometric equations with national forest inventory planning. Models must represent the tree population considered. Data availability; technical, financial, and human capacities; and the biophysical context, among other factors, will influence the calculation process. The three options are designed to improve the accuracy of national forest assessments in the identified contexts. Further improvements could be obtained through better forest stratification and additional non-destructive field campaigns.


Introduction
Forests produce a number of benefits, including the provision of resources (e.g., timber, fruits, and medicines), the regulation of ecosystem services (e.g., air and water cycles, climate, pollination, and nutrient cycling), and a contribution to culture (e.g., aesthetics and education) (Millennium Ecosystem Assessment 2005). About 13 million hectares of forest were converted to other uses each year during the period 2000-2010 (FAO 2010). Creating new incentives for forest management and conservation, such as payments for environmental services, could provide specific conditional compensation for a voluntary action that preserves or expands forest resources (Wunder 2007). Demand for environmental services could be generated through private preferences (e.g., ecotourism), public preferences (e.g., species protection), or international policies (e.g., capped carbon emissions).
Under the United Nations Framework Convention on Climate Change (UNFCCC), activities related to forest land in developing countries have become one of the potential key mechanisms for climate change mitigation (UNFCCC 2011a, b). This mechanism, named REDD+, aims to mitigate climate change by reducing greenhouse gas (GHG) emissions and by removing GHG through enhanced forest management in developing countries. REDD+ will be implemented following a stepwise approach depending on each country's circumstances (Herold et al. 2012). The final phase involves moving to fully results-based actions, i.e., emissions and removals that should be fully measured, reported, and verified. Consequently, REDD+ will require robust and transparent national forest monitoring systems that produce estimates that are transparent, consistent, as accurate as possible, and that reduce uncertainties, taking into account national capabilities and capacities (UNFCCC 2009).
There are still large uncertainties in assessing the contribution of forest activities to the carbon cycle at national (Morton et al. 2011; Pelletier et al. 2012) and pantropical levels (Achard et al. 2014). While significant efforts focus on mapping forest area and carbon changes (Saatchi et al. 2011; Hansen et al. 2013; Achard et al. 2014), it is also crucial to improve the quality of field plot biomass estimates.
Uncertainty in local forest biomass and carbon stock estimates can arise from different sources of error, notably sampling error and model error. While sampling error can be reduced by optimizing the sampling design, model error can be reduced by improving the quality of the equations and the methodology used to select and apply them. Various authors report significant differences in forest biomass and carbon stock estimates depending on the model used (Kenzo et al. 2009; Melson et al. 2011; Alvarez et al. 2012; Kuyah et al. 2012; Ngomanda et al. 2014). The adequate development and use of tree models are therefore key to improving the robustness of forest carbon stock and stock change estimates.
Tree models and volume tables are crucial for quantifying many forest services, such as the production of commercial volume, bioenergy, and biomass. Most countries use volume models and biomass expansion factors to report national forest biomass (FAO 2010), although these models were developed for different purposes and are used in different ways to meet specific objectives at local and national levels. Some countries, such as Mexico, Indonesia, and Vietnam, are developing databases and guidelines for the use of tree models (Inoguchi et al. 2012; Krisnawati et al. 2012). The first step is defining the most appropriate approach to biomass and carbon stock assessment (at both local and national levels). The fact that few countries have a national database of tree models can be explained by various obstacles, access to information and data sharing probably being the most important. Existing models can be quality controlled and validated. It is generally necessary to test the applicability of the models (UNFCCC 2011a, b) in a particular situation (i.e., to test whether the described tree allometry changes because of differences in soils, elevation, or climate relative to the site where the equation was originally constructed).
The second step is defining the appropriate approach to analyze the forest inventory data, taking into account the available models. Many methods exist for using volume and biomass equations to obtain the most accurate results. These may involve specific methods or default values, depending on the contribution of the inventory categories to the total as well as on the available national financial, technical, and human capacities. It is important to note that the desired accuracy should at least be sufficient to detect changes in biomass and carbon stocks, not just stocks at a given time.
Adequate calculation of forest biomass using existing models may rely on decision trees. Decision trees for using volume and biomass equations rely on a set of general and specific criteria that guide the selection of models.
General criteria to guide the selection can be applicability, robustness, and documentation. Specific criteria can be the variables of interest, tree components considered, species considered, temporal representativeness, geographic location of field measurements, climate zone, statistical parameters, sample size, range of validity, transformation of the output variable, statistical method for residual and model weighting, mathematical form, validation of the data, adequate documentation, materials and methods used, and sampling design (Cifuentes Jara et al. 2014). The construction of a decision tree is based on a series of scientific hypotheses to be presented together with the approach, such as: "Models fitted on fewer than 30 individuals are not robust"; "Species-specific models are more robust than models for ecological zones or pantropical models"; and "Geographical proximity is a good indicator of representativeness of the study area". Identifying the most suitable decision tree requires, in most cases, access to raw data. Summing the number of individuals used for the tree models available in GlobAllomeTree shows that between 56,937 and 121,062 individuals were measured across a range of forest ecosystems (Fig. 1).¹ This number is substantially higher than the sample sizes currently available for creating pantropical models (for moist forests, e.g., Brown 1997: n = 170; Chave et al. 2014: n = 4004). In addition, while pantropical models were mainly developed for trees in closed and mostly unmanaged forests, the GlobAllomeTree database contains individuals from a wide range of forested landscapes and trees outside forests.
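The decision-tree logic above can be sketched in a few lines of Python. This is a minimal illustration, not the GlobAllomeTree schema or the procedure of Cifuentes Jara et al. (2014): the `Equation` record, its fields, and the candidate equations are hypothetical, and only three of the listed criteria are encoded (at least 30 trees, diameter within the range of validity, and preference for species-specific over ecological-zone and pantropical models, with geographical proximity as a tie-breaker).

```python
import math
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Equation:
    # Hypothetical metadata record for a candidate equation (illustrative only)
    name: str
    scope: str              # "species", "ecozone" or "pantropical"
    species: Optional[str]  # only meaningful when scope == "species"
    n_trees: int            # number of trees used to fit the equation
    d_min: float            # lower bound of the diameter validity range (cm)
    d_max: float            # upper bound of the diameter validity range (cm)
    lat: float              # location of the field measurements (degrees)
    lon: float

SCOPE_RANK = {"species": 0, "ecozone": 1, "pantropical": 2}

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def select_equation(candidates: List[Equation], species: str, diameter: float,
                    lat: float, lon: float, min_n: int = 30) -> Optional[Equation]:
    eligible = [
        e for e in candidates
        if e.n_trees >= min_n                               # hypothesis: n < 30 is not robust
        and e.d_min <= diameter <= e.d_max                  # stay within the range of validity
        and (e.scope != "species" or e.species == species)  # species models only for their species
    ]
    if not eligible:
        return None  # no suitable equation: a candidate for new field data
    # Prefer species-specific > ecozone > pantropical, then the nearest measurement site
    return min(eligible, key=lambda e: (SCOPE_RANK[e.scope],
                                        distance_km(lat, lon, e.lat, e.lon)))

# Illustrative candidates (made-up values, not real published equations)
candidates = [
    Equation("A", "species", "Tectona grandis", 25, 5, 60, 10.0, 105.0),
    Equation("B", "ecozone", None, 120, 5, 150, 11.0, 106.0),
    Equation("C", "pantropical", None, 1000, 5, 200, 0.0, 20.0),
    Equation("D", "species", "Tectona grandis", 45, 5, 40, 12.0, 107.0),
]
print(select_equation(candidates, "Tectona grandis", 30, 10.5, 105.5).name)  # D
print(select_equation(candidates, "Tectona grandis", 55, 10.5, 105.5).name)  # B
```

Equation A is never chosen because it rests on fewer than 30 trees; for a 55 cm tree, D falls outside its validity range, so the ecozone model B is used. Other criteria from the list (climate zone, output transformation, documentation) could be added as further filters or ranking keys.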
The aim of this article is to recommend an approach for assessing biomass and volume at the national level that ensures the lowest possible uncertainty. We discuss adequate sampling schemes for developing tree volume and biomass models and how to improve these models once national forest inventories are completed. National circumstances are then considered in the selection of volume and biomass equations and their potential use in national biomass assessment programs. Finally, we discuss the impact of introducing new methods on emission reduction estimates.
2 What is the adequate sampling scheme to develop tree volume or biomass equations?
The total error associated with the use of tree models combines three types of error (Cunia 1987). The first is the error associated with the sampling design, which depends on the number of trees sampled and on the distribution of those trees relative to the population. This error can be calculated and, given enough resources and time, controlled with relative ease (by measuring additional trees or re-measuring part of the sampled population). The second is human error made while measuring the forest and entering and checking data (Westfall 2014). This error cannot be assessed exactly, but procedures exist to minimize it (Picard et al. 2012a, b). Several forest services have developed quality assurance plans that include, for example, measuring data quality, revising the methodology to reduce effort, improving the effectiveness of training sessions, and revising re-measurement programs for quality control (USDA 2012). The third error is associated with the model's predictions, including the choice of one biomass equation among the several available. The first and third errors arise because results are based on samples rather than on the entire population and because natural variability is substantial. These two errors are unavoidable but can be estimated through statistical indicators and reduced as much as possible through efficient sampling strategies and statistical models.
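How the sampling error "can be calculated and controlled" is easiest to see in the simplest case, estimating a mean (e.g., mean biomass per plot) from a simple random sample. A minimal sketch, assuming a coefficient of variation (CV) known from pilot data and the textbook formula n = (z·CV/E)² for a target relative error E; the normal quantile z is used instead of Student's t, so requirements for very small samples are slightly underestimated. Function names and the example CV are illustrative.

```python
import math
from statistics import NormalDist

def z_value(confidence: float) -> float:
    """Two-sided standard normal quantile, e.g. about 1.96 for 95 % confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def required_sample_size(cv_pct: float, allowable_error_pct: float,
                         confidence: float = 0.95) -> int:
    """Smallest n whose confidence half-width is at most allowable_error_pct of the mean."""
    return math.ceil((z_value(confidence) * cv_pct / allowable_error_pct) ** 2)

def relative_half_width(cv_pct: float, n: int, confidence: float = 0.95) -> float:
    """Relative sampling error (%) achieved with n units: z * CV / sqrt(n)."""
    return z_value(confidence) * cv_pct / math.sqrt(n)

n = required_sample_size(cv_pct=50, allowable_error_pct=10)
print(n)                                     # 97
print(round(relative_half_width(50, n), 2))  # 9.95
```

Halving the allowable error quadruples the required sample size, which is why the compromise between desired accuracy and field resources matters.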
Optimizing the sampling strategy will reduce measurement costs and increase the representativeness of the sample relative to the population. A good preliminary overview of the population is therefore needed. Sampling should be transparent, robust, and simple. It will result from a compromise between the desired accuracy and the resources available for field measurements. Optimizing the sampling scheme requires identifying ecosystem types, species composition, forest structure, tree size distribution, tree architecture, and existing data. As part of a systematic sample, we must therefore ensure that the sampling effort takes the different ecosystem types into account. When possible, different ecosystems could be delimited to apply stratification. Stratification uses exogenous information to establish homogeneous sampling strata and thus improve the precision of estimates (Picard et al. 2012a, b). The principle is to increase the sampling intensity in the most variable ecosystems or strata. Remote sensing can be particularly helpful to support stratification, map structural traits of plant canopies, and allow the inference of traits and, in many cases, species ranges (Schimel et al. 2013). Another solution would be sampling along gradients where allometric variation is known, e.g., altitudinal gradients, which strongly influence soil types, climate, vegetation forms, and biomass (Girardin et al. 2010). Soil physical conditions influence floristic composition (Infante Mata et al. 2011) and constrain the amount of biomass stored in tropical forests, highlighting the need to take soil characteristics and species wood density into account when assessing national forest biomass (Gourlet-Fleury et al. 2011).
¹ The map is based on the data of April 2014 from www.globallometree.org.
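Increasing the sampling intensity in the most variable strata can be made concrete with Neyman allocation, a standard textbook rule that is not proposed in the source and is swapped in here as an illustration. It allocates plots in proportion to N_h·S_h, where N_h is the stratum size (e.g., area or number of candidate plots) and S_h is the within-stratum standard deviation, assumed known from pilot data. The strata and numbers below are hypothetical.

```python
import math

def neyman_allocation(strata, n_total):
    """strata: name -> (N_h, S_h). Returns integer sample sizes summing to n_total."""
    products = {h: size * sd for h, (size, sd) in strata.items()}
    total = sum(products.values())
    exact = {h: n_total * p / total for h, p in products.items()}
    alloc = {h: math.floor(x) for h, x in exact.items()}
    # Distribute the units lost to rounding down to the largest fractional parts
    remaining = n_total - sum(alloc.values())
    for h in sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)[:remaining]:
        alloc[h] += 1
    return alloc

# Hypothetical strata: (area in ha, SD of plot biomass in Mg/ha)
strata = {"mangrove": (5000, 40), "lowland moist": (20000, 120), "dry": (15000, 30)}
print(neyman_allocation(strata, 60))  # {'mangrove': 4, 'lowland moist': 47, 'dry': 9}
```

The highly variable lowland moist stratum receives most of the plots even though the dry stratum is almost as large; in practice a minimum number of plots per stratum is usually kept so that each stratum's variance can still be estimated.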
Except in homogeneous ecosystems, such as plantations or mangroves, it is not easy to take floristic composition into account in tropical sampling schemes, because there can be up to 300 tree species per hectare (Gibbs et al. 2007). Identifying plant functional types (Hawthorne 1995) is one option for grouping tree species by architecture, growth strategy, and biomass in order to develop tree models specific to each plant functional type (Henry et al. 2010). Architectural models are a convenient starting point for interpreting plant forms (Valladares and Niinemets 2007), and taking them into account in the sampling strategy is another option (Goodman et al. 2013). However, a series of variations and exceptions to each program of plant development complicates any classification; e.g., Arbutus species exhibit different architectural patterns depending on the light environment (Bell 1993). Moreover, reliable tree species identification and classification systems of tree species based on plant traits are often unavailable in tropical countries. Forest structure should also be considered, in order to capture elements such as basal area, tree height, and the range and shape of diameter distributions. Particular attention should be paid to selecting trees of different sizes. Large trees, which are more difficult to measure, are often ignored in sampling campaigns, although they store large amounts of biomass (Slik et al. 2013). Because large trees account for much of the biomass and volume and of their associated uncertainty, excluding them from models may lead to considerable bias in the estimates.
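One way to make sure that large trees are represented is to select sample trees with an equal number per diameter class rather than in proportion to their abundance. The sketch below assumes a list of candidate tree diameters (cm) from an inventory; the 10 cm class width and three trees per class are illustrative choices, not recommendations from the source.

```python
import random
from collections import defaultdict
from typing import List

def sample_by_diameter_class(diameters: List[float], class_width: float = 10.0,
                             per_class: int = 3, seed: int = 1) -> List[int]:
    """Return indices of trees, drawing up to `per_class` trees in every diameter class."""
    rng = random.Random(seed)
    classes = defaultdict(list)
    for i, d in enumerate(diameters):
        classes[int(d // class_width)].append(i)
    chosen = []
    for k in sorted(classes):  # from the smallest to the largest class
        members = classes[k]
        chosen.extend(rng.sample(members, min(per_class, len(members))))
    return chosen

# Hypothetical inventory: many small stems, few large trees
diameters = [8, 9, 12, 14, 15, 18, 22, 25, 27, 35, 48, 95, 130]
selected = sample_by_diameter_class(diameters)
print(sorted(diameters[i] for i in selected))
```

Every occupied class contributes trees, so the 95 cm and 130 cm trees are always selected, whereas a simple random draw of 12 trees from a real stand dominated by small stems would often miss them.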
Access to data from the various destructive and non-destructive forest field inventories is necessary to identify the range of tree sizes, avoid duplication of effort, and ensure that the sampling strategy is accurate and optimized as far as practicable. Tree model fitting is usually affected by heteroscedasticity in the data. Two methods are usually proposed to address this issue (Picard et al. 2012b). The first is weighted regression, whose results depend on the choice of weighting function. The second is a log transformation; in this case, predictions must be back-transformed to the original scale, and this back-transformation introduces a bias that needs to be corrected. When data are already available, additional sampling should focus mainly on the tree compartments (roots, branches, leaves, etc.), life forms (palms, lianas, etc.), and parts of the population (forest type, location, tree size, etc.) that were not covered by previous sampling.
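The log-transformation route can be sketched with NumPy. The correction factor exp(RSE²/2) used here is the common lognormal bias correction (often attributed to Baskerville 1972); it is one of several possible corrections and is not specified in the source. The data are simulated with a multiplicative error, which makes them heteroscedastic on the original scale; they are not real measurements.

```python
import numpy as np

def fit_log_power(d, agb):
    """OLS fit of ln(AGB) = a + b ln(D); returns a, b and the residual standard error."""
    x, y = np.log(d), np.log(agb)
    b, a = np.polyfit(x, y, 1)  # polyfit returns [slope, intercept]
    resid = y - (a + b * x)
    rse = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))
    return a, b, rse

def predict_agb(d, a, b, rse):
    """Back-transform with the correction factor exp(RSE^2 / 2); exp() of the
    log-scale prediction alone estimates the median, not the mean, biomass."""
    return np.exp(a + b * np.log(d)) * np.exp(rse ** 2 / 2)

# Simulated data with multiplicative error (illustrative values)
rng = np.random.default_rng(0)
d = rng.uniform(5, 100, 200)                                      # diameter, cm
agb = np.exp(-2.0 + 2.5 * np.log(d) + rng.normal(0, 0.3, 200))    # biomass, kg
a, b, rse = fit_log_power(d, agb)
print(round(a, 2), round(b, 2), round(rse, 2))
print(predict_agb(30.0, a, b, rse))
```

With an RSE of 0.3, omitting the correction would underestimate mean biomass by roughly 4-5 %, a bias that accumulates when summed over many trees.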
3 Taking into account the national contexts, what are the potential options for using tree volume and biomass equations as part of a national forest biomass assessment?
The proposed approaches to the use of tree models will have to take into consideration (1) data availability (raw data, metadata, models, and forest inventory data), (2) the biophysical and environmental context, and (3) human, financial, and technical capabilities (Herold and Johns 2007); e.g., some countries have no university courses in forestry or no funds allocated to data collection, storage, and management. Raw data are needed to validate the estimates and perform accuracy assessments. Although highly desirable, locally developed models are not required; available generic models may be used in the early stages of the system. The selected model should be tested at the local level using a limited number of samples from different ecological environments. Finally, the national forest inventory can provide national-scale coverage of ecosystems and the field data required to calculate volume and biomass. Cost issues are also relevant but were not addressed during the workshop.
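Testing a generic model locally can be as simple as comparing its predictions with a small set of harvested trees. A minimal sketch, assuming measured biomass for a few local trees and the generic model's predictions for the same trees; the three statistics (aggregate bias, mean relative error, RMSE) are common choices rather than ones prescribed by the source, and the numbers are made up.

```python
import numpy as np

def validation_stats(observed, predicted):
    """Compare a generic model's predictions with locally harvested trees."""
    obs = np.asarray(observed, float)
    pred = np.asarray(predicted, float)
    err = pred - obs
    return {
        "aggregate_bias_pct": float(100 * err.sum() / obs.sum()),  # error on the total
        "mean_relative_error_pct": float(100 * np.mean(err / obs)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),                 # same unit as observations
    }

# Hypothetical destructive samples (kg) and generic-model predictions for the same trees
observed = [100, 250, 600, 1500]
predicted = [110, 240, 690, 1800]
stats = validation_stats(observed, predicted)
print(stats)
```

With so few trees these statistics are themselves uncertain, so they are best read as a screening check: a large, consistent bias (here about +16 % on the total, driven by the largest tree) signals that the generic model should not be used without adjustment.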
The following three options for using tree models as part of a national forest biomass assessment depend on the information available:
Option 1: Neither models nor raw data are available. In this case, it is better to use a generic model and validate it with destructive harvesting.
Option 2: Raw data are not available, but volume and biomass equations are. Bayesian approaches can then be used to simulate a data set with the same properties as the original raw data (Picard et al. 2012a; Zapata-Cuartas et al. 2012), and the results can be compared against those of option 1.
Option 3: Reliable raw data and models are available. In this case, models that account for tree species, forest types, climate, and range of validity can be considered if the data set is large enough, and the results compared against those of option 1.
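For option 2, a simple way to regenerate a data set from published equations is to simulate pseudo-observations within each equation's reported range of validity. The sketch assumes equations of the form ln(AGB) = a + b ln(D) with a reported residual standard error (RSE) and diameter range, and draws diameters uniformly on the log scale because the original diameter distribution is unknown. It is plain pseudo-data simulation, not the full Bayesian procedure of the cited authors, and the coefficients are hypothetical.

```python
import numpy as np

def simulate_pseudo_data(a, b, rse, d_min, d_max, n, rng):
    """Pseudo-observations from ln(AGB) = a + b ln(D) + e, e ~ N(0, rse^2)."""
    d = np.exp(rng.uniform(np.log(d_min), np.log(d_max), n))  # assumed D distribution
    ln_agb = a + b * np.log(d) + rng.normal(0.0, rse, n)
    return d, np.exp(ln_agb)

rng = np.random.default_rng(42)
# Two hypothetical published equations for the same forest type
d1, y1 = simulate_pseudo_data(-2.1, 2.50, 0.30, 5, 80, 250, rng)
d2, y2 = simulate_pseudo_data(-1.8, 2.40, 0.35, 10, 120, 250, rng)
d_all, y_all = np.concatenate([d1, d2]), np.concatenate([y1, y2])

# Refit a single model to the pooled pseudo-data, to be compared later with option-1 field data
b_pooled, a_pooled = np.polyfit(np.log(d_all), np.log(y_all), 1)
print(round(a_pooled, 2), round(b_pooled, 2))
```

The number of pseudo-observations per equation acts as a weight; tying it to each equation's original sample size is one possible choice.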
This classification into options is, however, purely practical. Practitioners can apply the three options separately, depending on data availability in different forest strata, or combine them. Bayesian methods are one way to combine the results and minimize the uncertainty associated with model errors.
The use of Bayesian methods is not new in forestry (Green et al. 1994), but the use of Bayesian model averaging methods in forestry appears to be a new challenge (Picard et al. 2012a, b;Zapata-Cuartas et al. 2012). Bayesian approaches can solve many problems associated with the use of tree models (some examples follow).

Uncertainty reduction
In many forestry applications, model uncertainty is ignored in the estimation process. Analysts typically select a model from some class of models and then proceed as if the predicted values were observed data with no error. This approach ignores the uncertainty in model selection as well as the uncertainty in estimated model parameters and residual variance, leading to overconfident inferences and riskier decisions. Bayesian calibration provides a coherent, iterative mechanism for reducing model uncertainty (van Oijen et al. 2013). Bayesian hierarchical models also reduce uncertainty by borrowing information from well-represented species and using it in models for less represented species, which typically carry larger uncertainties (Dietze et al. 2008).
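The "borrowing" of information in hierarchical models can be illustrated with a simplified empirical-Bayes version of partial pooling. The sketch assumes each species has an estimated coefficient (e.g., the allometric exponent b) with a known sampling variance, and that the between-species variance tau² is known; a full hierarchical model such as that of Dietze et al. (2008) would estimate these jointly. All values are hypothetical.

```python
import numpy as np

def partial_pooling(estimates, variances, tau2):
    """Shrink species-level estimates toward a precision-weighted group mean."""
    est = np.asarray(estimates, float)
    var = np.asarray(variances, float)
    w = 1.0 / (var + tau2)
    mu = np.sum(w * est) / np.sum(w)          # precision-weighted group mean
    shrink = tau2 / (tau2 + var)              # 1 = keep own estimate, 0 = use mu
    post_mean = mu + shrink * (est - mu)
    post_var = var * tau2 / (var + tau2)      # ignores the uncertainty in mu itself
    return mu, post_mean, post_var

# Hypothetical exponents b_j for three species; the third was fitted on very few trees
b_hat = [2.45, 2.60, 2.10]
var_b = [0.001, 0.002, 0.09]
mu, b_post, v_post = partial_pooling(b_hat, var_b, tau2=0.01)
print(round(mu, 3), np.round(b_post, 3))
```

The well-sampled species keep estimates close to their own fits, while the poorly sampled species is pulled strongly toward the group mean and its variance drops, which is the mechanism described above.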

Selection of volume and biomass equations
Rather than choosing a single model from several candidates, with the risk of not selecting the best available one, Bayesian model averaging (BMA) offers a way to combine different models into a single predictive model. Picard et al. (2012a) used BMA of deterministic models to combine three existing multispecies pantropical biomass equations for tropical moist forests. The resulting model brought a relatively minor, although consistent, improvement in the predictions of aboveground dry biomass of trees and captured features of the biomass response to diameter that no single model was able to fit. BMA is thus an alternative to model selection that integrates the biomass responses of different models (Picard et al. 2012a).
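The idea of averaging rather than selecting can be sketched with a common shortcut: approximate posterior model probabilities from the Bayesian information criterion (BIC), assuming equal prior probabilities for the models. This is not claimed to reproduce the procedure of Picard et al. (2012a), who combined published equations; for self-containment, the candidates here are polynomials in ln(D) fitted to one simulated data set, and averaging is done on the log scale (back-transforming to biomass would still need a bias correction).

```python
import numpy as np

def design(lnd, degree):
    """Polynomial design matrix in ln(D): columns 1, lnD, lnD^2, ..."""
    return np.vander(lnd, degree + 1, increasing=True)

def bma_fit(d, agb, degrees=(1, 2, 3)):
    """Fit candidate models on the log scale and return BIC-based weights."""
    lnd, y = np.log(d), np.log(agb)
    n = len(y)
    fits, bics = [], []
    for deg in degrees:
        X = design(lnd, deg)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        k = X.shape[1] + 1                      # coefficients + residual variance
        bics.append(n * np.log(rss / n) + k * np.log(n))
        fits.append((deg, beta))
    bics = np.array(bics)
    w = np.exp(-0.5 * (bics - bics.min()))      # approximate posterior model probabilities
    return fits, w / w.sum()

def bma_predict_ln(d, fits, weights):
    """Weighted average of the candidate models' ln(AGB) predictions."""
    lnd = np.log(np.atleast_1d(np.asarray(d, float)))
    preds = np.array([design(lnd, deg) @ beta for deg, beta in fits])
    return weights @ preds

def true_ln(x):
    # Hypothetical "true" allometry used only to simulate the data
    return -2.0 + 2.6 * np.log(x) - 0.03 * np.log(x) ** 2

rng = np.random.default_rng(1)
d = rng.uniform(5, 150, 300)
agb = np.exp(true_ln(d) + rng.normal(0, 0.3, d.size))
fits, weights = bma_fit(d, agb)
print(np.round(weights, 3))
print(bma_predict_ln([30.0, 100.0], fits, weights))
```

The weights shift toward whichever model the data support, without discarding the others entirely.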

Reducing sample size
A large number of biomass equations have been developed over the years, providing an opportunity to synthesize parameter values and estimate their probability distributions. These distributions can be used as prior distributions to develop new equations for other species or sites. For example, Dietze et al. (2008) and Zapata-Cuartas et al. (2012) used Bayesian methods that outperform the classical least-squares regression approach at small sample sizes. With this approach, parameter estimates of similar significance can be obtained from a sample of 6 trees rather than the 40-60 trees used in the classical approach, which does not use any prior information. Further, the Bayesian approach suggests that allometric scaling coefficients should be studied as probability distributions rather than as fixed parameter values (Zapata-Cuartas et al. 2012).
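Why prior information helps at small sample sizes can be shown with the conjugate normal update for a linear model. A minimal sketch, assuming ln(AGB) = a + b ln(D) with a residual standard deviation treated as known (a simplifying assumption the cited studies do not make) and a hypothetical prior summarizing published equations; only 6 simulated trees are used.

```python
import numpy as np

def posterior_coefficients(d, agb, prior_mean, prior_cov, sigma):
    """Conjugate normal update for ln(AGB) = a + b ln(D) + e, e ~ N(0, sigma^2),
    treating sigma as known."""
    X = np.column_stack([np.ones(len(d)), np.log(d)])
    y = np.log(agb)
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + X.T @ X / sigma ** 2)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / sigma ** 2)
    return post_mean, post_cov

# Hypothetical prior from published equations and only 6 destructively sampled trees
prior_mean = np.array([-2.0, 2.5])
prior_cov = np.diag([0.5 ** 2, 0.1 ** 2])
rng = np.random.default_rng(3)
d = np.array([8.0, 15.0, 24.0, 37.0, 52.0, 70.0])
agb = np.exp(-1.8 + 2.45 * np.log(d) + rng.normal(0, 0.3, d.size))

post_mean, post_cov = posterior_coefficients(d, agb, prior_mean, prior_cov, sigma=0.3)
X = np.column_stack([np.ones(d.size), np.log(d)])
ols_cov = 0.3 ** 2 * np.linalg.inv(X.T @ X)  # least squares, no prior information
print(post_mean, np.sqrt(post_cov[1, 1]), np.sqrt(ols_cov[1, 1]))
```

Because the posterior precision is the sum of the prior and data precisions, the posterior standard deviation of b is always smaller than the least-squares one; the gain is largest when the sample is small, and it is only as good as the prior is relevant to the new site.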

Mixed models
Similar to BMA, mixed effects (also called multilevel) modeling methods are not new; however, these methods have only recently become more common in forestry applications. A key feature of mixed models is the ability to account for the lack of independence among observations, which is often found in forestry data, e.g., multiple observations within each tree or multiple trees on a sample plot. Both of these situations may be encountered when developing models to predict tree biomass. Failure to account for this interdependence results in biased estimates of model error, which invalidates inference regarding the statistical significance of model parameters (Valentine and Gregoire 2001). Mixed model techniques are usually implemented in one of two ways: (1) via the direct specification of a matrix that mathematically describes the covariance structure within the data or (2) via the specification of random parameters in the model that indirectly account for the correlation among observations. The latter implementation is often preferred in forestry applications because the random parameters essentially customize the model fit for each group of observations (e.g., each tree or plot). Further, values of these random parameters can be predicted for new observations, and thus, the model can be calibrated to local conditions (Westfall 2010). The use of a mixed modeling framework can therefore reduce uncertainty by improving local prediction accuracy.
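Local calibration of a random parameter can be sketched for the simplest case, a plot-level random intercept. The sketch assumes the fixed effects and variance components were already estimated with a mixed model (for example, one fitted with R's nlme or lme4 packages) and predicts the intercept of a new plot from a few measured trees with the best linear unbiased predictor (BLUP). Function names and numbers are hypothetical.

```python
import numpy as np

def calibrate_random_intercept(d_local, agb_local, a, b, var_u, var_e):
    """BLUP of a plot-level random intercept u from n local trees:
    u_hat = n*var_u / (n*var_u + var_e) * mean(residual of the fixed-effects model)."""
    r = np.log(agb_local) - (a + b * np.log(d_local))
    n = r.size
    return n * var_u / (n * var_u + var_e) * r.mean()

def predict_ln_agb(d, a, b, u=0.0):
    """Population-level prediction (u = 0) or plot-calibrated prediction."""
    return a + b * np.log(d) + u

# Hypothetical fixed effects and variance components from a previously fitted mixed model
a, b, var_u, var_e = -2.0, 2.5, 0.04, 0.09
d_local = np.array([10.0, 20.0, 40.0])
agb_local = np.exp(predict_ln_agb(d_local, a, b) + 0.2)  # local trees 0.2 above the mean curve
u_hat = calibrate_random_intercept(d_local, agb_local, a, b, var_u, var_e)
print(round(u_hat, 4))  # 0.1143
```

With only three local trees, the predicted intercept recovers about 57 % of the observed local deviation; the shrinkage factor approaches 1 as more local trees are measured.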
A proposed method for including volume and biomass equations in a national forest biomass assessment would require adapting decision trees to include adjustment methods. Any single regional model is prone to over- or underestimation at any given location. However, Bayesian model averaging (BMA) can group models, e.g., by climate zone, biome, or forest type, and has the advantage of generating weighted model estimates. The weighting can follow the stratification used for the inventory or some prior knowledge, and it can improve the estimates by covering a greater range of variation in sample diameters or species. Also, the resulting coefficients are a weighted average of the information from existing models. This approach is a novel and justifiable method for option 1 and should be a robust (adaptable) alternative to consider when there is adequate information (options 2 and 3).
Regarding the calculation of uncertainty, two options are proposed. The first is to validate the results using data obtained from destructive measurements. This can be costly and requires additional fieldwork and coordination with local authorities. The second uses Monte Carlo simulation or bootstrapping (Molto et al. 2013). These statistical methods are not field intensive but require advanced skills that are not readily available in all countries. In either case, error propagation can be analyzed when the raw data are available to allow it.
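The Monte Carlo option can be sketched for model error at the plot level. A minimal sketch, assuming a fitted equation ln(AGB) = a + b ln(D) with a known coefficient covariance matrix and residual standard error, and residuals that are independent between trees; all inputs are hypothetical. Sampling error between plots would have to be propagated separately (e.g., by bootstrapping plots).

```python
import numpy as np

def monte_carlo_plot_agb(d_plot, coef_mean, coef_cov, rse, n_sim=5000, seed=0):
    """Propagate coefficient and residual error to plot-level AGB (sum over trees)."""
    rng = np.random.default_rng(seed)
    lnd = np.log(np.asarray(d_plot, float))
    coefs = rng.multivariate_normal(coef_mean, coef_cov, size=n_sim)  # (n_sim, 2)
    mean_ln = coefs[:, [0]] + coefs[:, [1]] * lnd                     # (n_sim, n_trees)
    ln_agb = mean_ln + rng.normal(0.0, rse, size=mean_ln.shape)       # tree-level residuals
    totals = np.exp(ln_agb).sum(axis=1)  # exp of simulated values: no bias correction needed
    return totals.mean(), np.percentile(totals, [2.5, 97.5])

d_plot = [12, 18, 25, 33, 47, 60]                     # hypothetical plot diameters, cm
coef_mean = [-2.0, 2.5]
coef_cov = [[0.01, -0.0025], [-0.0025, 0.0009]]       # hypothetical covariance of (a, b)
mean_agb, (lo, hi) = monte_carlo_plot_agb(d_plot, coef_mean, coef_cov, rse=0.3)
print(f"plot AGB ~ {mean_agb:.0f} kg, 95% interval {lo:.0f}-{hi:.0f} kg")
```

The coefficient covariance matters: drawing a and b independently would ignore their usually strong negative correlation and misstate the interval, which is one reason access to raw data (or at least the full covariance matrix) is needed.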
4 Will the establishment of new methodologies have an impact on the emission reduction estimates?
Emission reduction estimates in the context of REDD+ will be reported as part of the GHG inventory in the biennial report and the national communication to the UNFCCC (2010). As part of the national communication, developing countries will very likely be requested to provide an improvement plan (Tulyasuwan et al. 2012). An improvement plan is considered an essential part of a National GHG Inventory System. A well-informed list of possible improvements sorted by priority (e.g., through a Key Category Analysis) aims to increase the transparency, consistency, comparability, completeness, and accuracy of GHG emission and removal estimates. In order to improve national GHG inventories, it may be necessary to consider more accurate methodologies, country-specific forest carbon stock and stock change factors, land area estimates, and other relevant technical elements of the GHG inventory, depending on available resources. Quantitatively estimating the inventory uncertainty for each category and for the inventory as a whole, and considering the magnitude of each emission and removal category, will guide further improvements. Depending on the type of improvement, the IPCC proposes various methods to ensure consistent time series (e.g., extrapolation, interpolation, recalculation). The improvement of the GHG inventory preparation process is expected to improve quality, reduce uncertainty, and increase the robustness of estimates. Thus, improving estimates of forest carbon stocks and stock changes in the context of REDD+ should enhance the credibility of estimates and access to benefits.
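The prioritization and uncertainty steps can be illustrated with the IPCC Approach 1 formulas: a level assessment that ranks categories by their share of the absolute total and flags those reaching a cumulative 95 % as key categories, and error propagation for a sum, U_total = √Σ(U_i·x_i)² / |Σx_i|. This is a simplified sketch (no trend assessment, uncertainties assumed uncorrelated); category names, emissions, and uncertainties are hypothetical.

```python
import math

def key_categories(emissions, threshold=0.95):
    """Approach 1 level assessment: rank by absolute contribution and keep
    categories until the cumulative share reaches the threshold."""
    total = sum(abs(v) for v in emissions.values())
    keys, cumulative = [], 0.0
    for name, value in sorted(emissions.items(), key=lambda kv: abs(kv[1]), reverse=True):
        if cumulative >= threshold:
            break
        keys.append(name)
        cumulative += abs(value) / total
    return keys

def combined_uncertainty(emissions, uncertainty_pct):
    """Approach 1 propagation for a sum of uncorrelated categories (result in %)."""
    num = math.sqrt(sum((uncertainty_pct[k] * x) ** 2 for k, x in emissions.items()))
    return num / abs(sum(emissions.values()))

# Hypothetical inventory (Gg CO2-eq; removals negative) and 95 % uncertainties (%)
emissions = {"deforestation": 12000, "forest degradation": 3000,
             "forest remaining forest": -2500, "energy": 1500, "agriculture": 400}
uncertainty_pct = {"deforestation": 40, "forest degradation": 80,
                   "forest remaining forest": 60, "energy": 10, "agriculture": 50}
print(key_categories(emissions))
# ['deforestation', 'forest degradation', 'forest remaining forest', 'energy']
print(round(combined_uncertainty(emissions, uncertainty_pct), 1))  # 38.7
```

Because deforestation dominates both the total and the combined uncertainty, better tree models and biomass estimates for that category would reduce overall inventory uncertainty the most.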

Conclusion
This article provides clear advice on how to improve tree and forest volume and biomass assessment at local and national scales. The adequate selection and use of volume and biomass equations and the accuracy of estimates contribute to improving national forest inventories. Volume and biomass equations are critical to improving the precision of volume, biomass, and carbon stock and stock change estimates. This can be done in different ways: (1) stratification is valuable, although there will always be a need to sample additional ecosystems or species with particular growth forms; (2) non-destructive measurements during national forest inventory campaigns can enlarge the available data sets, improving model precision and coverage; and (3) variables such as wood density, tree diameter, and tree height should be measured adequately. While the recommendations provided in this article are based on evidence from several site-specific and national cases, the proposed options should be tested, particularly in tropical countries, where forest assessment is more difficult because of the complexity of ecosystems and of the relationships between anthropogenic and biophysical elements. In addition, an adequate institutional context should be established to ensure data accessibility and validation of the process.