Background

Carbon dioxide sequestration and storage in forest ecosystems are important mechanisms for regulating anthropogenic emissions of this gas and contribute to the mitigation of global warming (Husch et al. 2003). The estimation of carbon stocks in forest ecosystems must include measurements in the following carbon pools (Brown 1999; Brown 2002; IPCC 2006; Pearson et al. 2007): live aboveground biomass (AGB) (trees and non-tree vegetation), belowground biomass (BGB), dead organic matter (dead wood and litter biomass), and soil organic matter.

Biomass can be measured or estimated by in situ sampling or by remote sensing (Lu 2006; Ravindranath 2008; GTOS 2009; Vashum and Jayakumar 2012). In situ sampling, in turn, is divided into destructive direct biomass measurement and non-destructive biomass estimation (GTOS 2009; Vashum and Jayakumar 2012).

Non-destructive biomass estimation does not require harvesting trees; it uses biomass equations to estimate biomass at the tree level and sampling weights to estimate biomass at the forest level (Pearson et al. 2007; GTOS 2009; Soares and Tomé 2012). When biomass equations are fitted using least squares, they are called biomass regression equations. Biomass regression equations are developed as linear or non-linear functions of one or more tree-level dimensions. On the other hand, when they are fitted in such a way that tree component biomass is specified as directly proportional to stem volume, the ratios of proportionality are called component biomass expansion factors (BEFs). In either case, biomass equations (regressions or BEFs) are developed from destructively sampled trees (Carvalho and Parresol 2003; Carvalho 2003; Dutca et al. 2010; Marková and Pokorný 2011; Sanquetta et al. 2011; Mate et al. 2014; Magalhães and Seifert 2015 a, b, c).

Biomass regression equations yield the most accurate estimates (IPCC 2003; Jalkanen et al. 2005; Zianis et al. 2005; António et al. 2007; Soares and Tomé 2012) as long as they are derived from a large enough number of trees (Husch et al. 2003; GTOS 2009). Nonetheless, national and regional biomass estimates are generally calculated based on BEFs (Magalhães and Seifert 2015c), especially when using national forest inventory data (Schroeder et al. 1997; Tobin and Nieuwenhuis 2007).

Jalkanen et al. (2005) compared regression-equation-based and BEF-based biomass estimates for pine-, spruce- and birch-dominated forests and mixed forests and concluded that BEF-based biomass estimates were lower and associated with larger errors than regression-equation-based estimates. However, no similar studies have been conducted for tropical natural forests.

The objective of this study was to compare regression-equation-based and BEF-based above- and belowground biomass estimates for an evergreen forest in Mozambique with regard to the following sources of error: (1) random plot selection and variability, (2) biomass model, (3) model parameter estimates, and (4) residual variability around model prediction. To this end, the precision and bias associated with those estimates were critically analysed. This study follows up on the study by Magalhães and Seifert (2015b). However, whereas those authors considered only five tree components, the current study is extended to 11 components (taproot, lateral roots, root system, stem wood, stem bark, stem, branches, foliage, crown, shoot system, and whole tree) and to bias analyses, which Magalhães and Seifert (2015b, c) did not perform for either method of estimating biomass.

Methods

Study area

The study was conducted in Mozambique, in an evergreen forest type named Mecrusse. Mecrusse is a forest type in which the main species in the upper canopy, often the only one, is Androstachys johnsonii Prain (Mantilla and Timane 2005). A. johnsonii is an evergreen tree species (Molotja et al. 2011) and the sole member of the genus Androstachys in the Euphorbiaceae family. Mecrusse woodlands are mainly found in the southernmost part of Mozambique, in Inhambane and Gaza provinces, in the districts of Massangena, Chicualacuala, Mabalane, Chigubo, Guijá, Mabote, Funhalouro, Panda, Mandlakaze, and Chibuto. The easternmost Mecrusse forest patches, located in Mabote, Funhalouro, Panda, Mandlakaze, and Chibuto districts, were defined as the study area, which encompassed 4,502,828 ha (Dinageca 1997), of which 226,013 ha (5 %) were Mecrusse woodlands. Maps showing the area of natural occurrence of Mecrusse in Inhambane and Gaza provinces and the study area, along with a detailed description of the species and the forest type, can be found in Magalhães and Seifert (2015c) and Magalhães (2015).

Data collection

The data were collected in 2012 and 2014. In 2012, a two-phase sampling design was used to determine tree component biomass. In the first phase, diameter at breast height (DBH) and total tree height of 3574 trees were measured in 23 randomly located circular plots (20-m radius). Only trees with DBH ≥5 cm were considered. In the second phase, 93 A. johnsonii trees (DBH range: 5–32 cm; height range: 5.69–16 m) were randomly selected from those measured in the first phase for destructive measurement of tree component biomass, in addition to the first-phase variables. Maps showing the distribution of the 23 random plots in the study area and across the different site classes are given by Magalhães and Seifert (2015c) and Magalhães (2015).

In 2014, an additional 37 trees (DBH range: 5.5–32 cm; height range: 7.3–15.74 m) were felled outside the sampling plots, 21 inside and 16 outside the study area. The 93 trees collected in 2012 were used to fit tree component biomass regression models and to determine tree component BEFs, and the 37 trees collected in 2014 were used to estimate the biases associated with regression-equation-based and BEF-based tree component biomass estimates.

The felled trees (from both 2012 and 2014) were divided into the following components: (1) taproot + stump; (2) lateral roots; (3) root system (1 + 2); (4) stem wood; (5) stem bark; (6) stem (4 + 5); (7) branches; (8) foliage; (9) crown (7 + 8); (10) shoot system (6 + 9); and (11) whole tree (3 + 10). Tree components were sampled and their dry weights estimated as described by Magalhães and Seifert (2015a, b, c, d, e) and Magalhães (2015).
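The additive structure of the 11 components can be encoded as a small lookup table. In the sketch below, `COMPOSITES` and `component_biomass` are hypothetical names and the dry weights are invented for illustration; each composite component is derived by summing its parts, so additivity holds by construction for observed data.

```python
# Hypothetical encoding of the component hierarchy described in the text:
# each composite component is the sum of the listed parts.
COMPOSITES: dict[str, list[str]] = {
    "root system": ["taproot + stump", "lateral roots"],   # (3) = (1) + (2)
    "stem": ["stem wood", "stem bark"],                     # (6) = (4) + (5)
    "crown": ["branches", "foliage"],                       # (9) = (7) + (8)
    "shoot system": ["stem", "crown"],                      # (10) = (6) + (9)
    "whole tree": ["root system", "shoot system"],          # (11) = (3) + (10)
}


def component_biomass(measured: dict[str, float]) -> dict[str, float]:
    """Return dry biomass for all 11 components from the 6 weighed ones."""
    totals = dict(measured)

    def resolve(name: str) -> float:
        # Recursively sum the parts of a composite, caching results in totals.
        if name not in totals:
            totals[name] = sum(resolve(part) for part in COMPOSITES[name])
        return totals[name]

    for name in COMPOSITES:
        resolve(name)
    return totals


# Invented dry weights (kg) for one felled tree, for illustration only.
tree = {"taproot + stump": 20.0, "lateral roots": 10.0, "stem wood": 80.0,
        "stem bark": 10.0, "branches": 25.0, "foliage": 5.0}
result = component_biomass(tree)
print(result["whole tree"])  # 150.0
```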

Data processing and analysis

Tree component biomass

The distinction between biomass regression equations (or simply regression equations) and biomass expansion factors (BEFs) may be confusing, because a BEF is also a biomass equation (an equation that yields biomass estimates): it is a regression of biomass on stem volume through the origin, in which the BEF value is the slope. For clarity, in this study, biomass regression equations refer to biomass equations whose regression coefficients are obtained using least squares (Montgomery and Peck 1982), such that the sum of squared differences between the observed and expected values is minimized (Jayaraman 2000), unlike BEFs, which are not obtained using least squares.

Biomass estimation typically requires estimating both tree component and total tree biomass (Seifert and Seifert 2014). To ensure that minor component biomass estimates add up to major component and whole tree biomass estimates, minor component, major component and whole tree biomass models were fitted using the same regressors (Parresol 1999; Goicoa et al. 2011). To this end, the best tree component and whole tree biomass regression equations were first selected by fitting various linear regressions on combinations of the independent variables (DBH, tree height) and evaluating them with the following goodness-of-fit statistics: coefficient of determination (R²), standard deviation of residuals (Sy.x), mean residual (MR), and graphical analysis of residuals. The mean residual and the standard deviation of residuals were expressed as relative values, hereafter referred to as percent mean residual (MR (%)) and coefficient of variation of residuals (CVr (%)), respectively, because they are more informative. The computation and interpretation of these fit statistics were previously described by Mayer (1941), Gadow & Hui (1999), Ruiz-Peinado et al. (2011), and Goicoa et al. (2011).
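The relative fit statistics can be computed as follows. This is a sketch assuming common textbook definitions (R² = 1 − SSE/SST, Sy.x = √(SSE/(n − p)), CVr = 100 Sy.x / ȳ, MR (%) = 100 × mean residual / ȳ, with residuals taken as observed minus predicted); the cited works may differ in detail, for example in how weights enter R² for weighted fits, and the function name `fit_statistics` is hypothetical.

```python
import math


def fit_statistics(observed: list[float], predicted: list[float], n_params: int) -> dict[str, float]:
    """R² (%), Sy.x, CVr (%) and MR (%) under assumed common definitions."""
    n = len(observed)
    y_bar = sum(observed) / n
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]  # observed - predicted
    sse = sum(e * e for e in residuals)
    sst = sum((y - y_bar) ** 2 for y in observed)
    s_yx = math.sqrt(sse / (n - n_params))  # standard deviation of residuals
    return {
        "R2 (%)": 100.0 * (1.0 - sse / sst),
        "Syx": s_yx,
        "CVr (%)": 100.0 * s_yx / y_bar,               # relative Sy.x
        "MR (%)": 100.0 * (sum(residuals) / n) / y_bar,  # relative mean residual
    }


stats = fit_statistics([2.0, 4.0, 6.0], [3.0, 4.0, 5.0], n_params=2)
print(stats["R2 (%)"], round(stats["CVr (%)"], 2))  # 75.0 35.36
```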

Among the different model forms tested (Y = b0 + b1D², Y = b0 + b1D² + b2H and Y = b0 + b1D²H, where b0, b1 and b2 are regression coefficients, D is the DBH and H is the tree height), the model form Y = b0 + b1D²H was the best for 8 tree components and for whole tree biomass, and the second best for the remaining tree components, as judged by the goodness-of-fit statistics described above. Therefore, to give all tree component and whole tree biomass models the same regressors, and thus achieve additivity, this model form was adopted for all tree component and whole tree biomass models.

Weighted linear least squares was used to address heteroscedasticity. The weight function was obtained by iteratively searching for the weight that homogenised the residuals and improved the other fit statistics. Among the tested weight functions (1/D, 1/D², 1/DH, 1/D²H), the best was 1/D²H for all tree component and whole tree biomass models. Although the selected weight function may not be the best among all possible weights, it was the best approximation found.
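The weighted fit of Y = b0 + b1D²H with weight 1/D²H has a closed-form solution from the 2 × 2 weighted normal equations. The sketch below (hypothetical names `wls_fit` and `fit_component`, plain Python with no statistics package assumed) also returns the coefficient variance–covariance matrix s²(XᵀWX)⁻¹, the kind of matrix needed later for the second-phase error; using n − 2 degrees of freedom for the weighted residual variance is a standard choice assumed here.

```python
def wls_fit(x: list[float], y: list[float], w: list[float]):
    """Weighted least squares for y = b0 + b1*x.

    Returns ((b0, b1), s_bb), where s_bb is the estimated 2x2
    variance-covariance matrix of the coefficients, s2 * (X'WX)^-1.
    """
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx            # determinant of X'WX
    b0 = (swxx * swy - swx * swxy) / det   # solution of the 2x2 normal equations
    b1 = (sw * swxy - swx * swy) / det
    n = len(x)
    # Weighted residual variance with n - 2 degrees of freedom.
    s2 = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / (n - 2)
    s_bb = [[s2 * swxx / det, -s2 * swx / det],
            [-s2 * swx / det, s2 * sw / det]]
    return (b0, b1), s_bb


def fit_component(dbh_cm, height_m, biomass_kg):
    """Hypothetical helper: regress biomass on D^2H with weight 1/D^2H."""
    x = [d * d * h for d, h in zip(dbh_cm, height_m)]
    return wls_fit(x, biomass_kg, [1.0 / xi for xi in x])
```

With equal weights the function reduces to ordinary least squares, which is a convenient check on the algebra.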

Linear models were preferred over nonlinear models because the procedure of enforcing additivity by using the same regressors is only applicable to linear models (Parresol 1999; Goicoa et al. 2011), and because the procedure of combining the errors of the first and second sampling phases in double sampling is limited to biomass regressions estimated by weighted linear least squares (Cunia 1986a).

The regression-equation-based and BEF-based biomass of component c of the kth tree in the hth plot (\( {\widehat{Y}}_{hk} \)) are given by Eqs. (1) and (2), respectively:

$$ {\widehat{Y}}_{hk}={b}_0+{b}_1{D}_{hk}^2{H}_{hk} $$
(1)
$$ {\widehat{Y}}_{hk}= BE{F}_c\times {v}_{hk}= BE{F}_c\times \frac{\pi }{4}\times {D}_{hk}^2\times {H}_{hk}\times ff $$
(2)

where \( {v}_{hk} \), \( {D}_{hk} \) and \( {H}_{hk} \) are the stem volume, DBH and tree height of the kth tree in the hth plot, and ff and \( BE{F}_c \) are, respectively, the average Hohenadl form factor (0.4460) and the tree component BEFs of A. johnsonii estimated by Magalhães and Seifert (2015c).
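Eqs. (1) and (2) can be sketched at the tree level as below. The function names are hypothetical; DBH is assumed to be recorded in cm and height in m, so DBH is converted to m before computing stem volume in m³, and the BEF is assumed to be in Mg m−3 so that Eq. (2) returns Mg. The coefficients b0 and b1 are placeholders for the fitted values in Table 1.

```python
import math

FORM_FACTOR = 0.4460  # average Hohenadl form factor reported in the text


def stem_volume_m3(dbh_cm: float, height_m: float, ff: float = FORM_FACTOR) -> float:
    """Stem volume v = (pi/4) * D^2 * H * ff, with D converted from cm to m."""
    d_m = dbh_cm / 100.0
    return math.pi / 4.0 * d_m ** 2 * height_m * ff


def biomass_regression_kg(dbh_cm: float, height_m: float, b0: float, b1: float) -> float:
    """Eq. (1): Y = b0 + b1 * D^2 * H (D in cm, H in m, Y in kg)."""
    return b0 + b1 * dbh_cm ** 2 * height_m


def biomass_bef_mg(dbh_cm: float, height_m: float, bef_mg_per_m3: float) -> float:
    """Eq. (2): Y = BEF_c * v (BEF in Mg m^-3, so Y is in Mg)."""
    return bef_mg_per_m3 * stem_volume_m3(dbh_cm, height_m)


# A 20-cm, 12-m tree; 0.7334 Mg m^-3 is the average stem BEF quoted in the Discussion.
print(round(stem_volume_m3(20, 12), 4))  # 0.1681
stem_mg = biomass_bef_mg(20, 12, 0.7334)
```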

Computing BEF-based biomass is equivalent to computing biomass with a regression of tree component biomass on stem volume that passes through the origin, in which b0 = 0 and b1 = BEFc. In fact, in ratio estimators, the ratio R (here, the BEF value) is the regression slope when the regression line passes through the origin (Johnson 2000). Eqs. (1) and (2) can therefore be combined into one equation in matrix form:

$$ {\widehat{Y}}_{hk}=b{X}_{hk} $$
(3)

where \( b=\left[\begin{array}{cc}\hfill {b}_0\hfill & \hfill {b}_1\hfill \end{array}\right] \) and \( {X}_{hk}={\left[\begin{array}{cc}\hfill 1\hfill & \hfill {D}_{hk}^2{H}_{hk}\hfill \end{array}\right]}^T \) if \( {b}_0\ne 0 \); and \( b=\left[\begin{array}{cc}\hfill 0\hfill & \hfill {b}_1\hfill \end{array}\right] \), with \( {b}_1= BE{F}_c \), and \( {X}_{hk}={\left[\begin{array}{cc}\hfill 0\hfill & \hfill \frac{\pi }{4}{D}_{hk}^2{H}_{hk} ff\hfill \end{array}\right]}^T \) if \( {b}_0=0 \). T denotes matrix transpose.

The biomass of plot h (\( {\widehat{Y}}_h \)) is estimated by summing the individual biomass values (\( {\widehat{Y}}_{hk} \)) of the \( {n}_h \) trees in plot h. Dividing \( {\widehat{Y}}_h \) by the plot size a gives the biomass \( \widehat{Y} \) on an area basis:

$$ \widehat{Y}=\frac{{\widehat{Y}}_h}{a}=\frac{b{\displaystyle \sum_{k=1}^{n_h}{X}_{hk}}}{a} $$
(4)

where k = 1, 2, …, \( {n}_h \) and h = 1, 2, …, \( {n}_p \); \( {n}_p \) is the number of plots in the sample and \( {n}_h \) is the number of trees in the hth plot.

Denoting \( {S}_h=\frac{{\displaystyle \sum_{k=1}^{n_h}{X}_{hk}}}{a} \), Eq. (4) can be rewritten as:

$$ \widehat{Y}=b{S}_h $$
(5)

where \( {S}_h={\left[\begin{array}{cc}\hfill {S}_{h0}\hfill & \hfill {S}_{h1}\hfill \end{array}\right]}^T \), with \( {S}_{h0}=\frac{n_h}{a} \) and \( {S}_{h1}=\frac{{\displaystyle \sum_{k=1}^{n_h}{D}_{hk}^2{H}_{hk}}}{a} \) if \( {b}_0\ne 0 \); and \( {S}_{h0}=0 \) and \( {S}_{h1}=\frac{{\displaystyle \sum_{k=1}^{n_h}\frac{\pi }{4}{D}_{hk}^2{H}_{hk} ff}}{a} \) if \( {b}_0=0 \).
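The plot vector \( {S}_h \) of Eq. (5) can be computed directly from the tree list of a plot. In the sketch below, the plot area is derived from the 20-m plot radius given under Data collection and expressed in hectares so that \( {S}_h \) is per hectare; this unit choice, the name `plot_vector`, and the conversion of D from cm to m in the BEF case (so that stem volume is in m³) are assumptions for illustration.

```python
import math

# Assumed plot area: circular plot of 20-m radius, expressed in ha (about 0.1257 ha).
PLOT_AREA_HA = math.pi * 20.0 ** 2 / 10_000.0


def plot_vector(trees, bef_case=False, ff=0.4460, area_ha=PLOT_AREA_HA):
    """Return S_h = [S_h0, S_h1] for one plot on a per-hectare basis (Eqs. 4-5).

    trees: list of (dbh_cm, height_m) tuples for the n_h trees in plot h.
    """
    if bef_case:
        # b0 = 0: S_h0 = 0 and S_h1 = sum of stem volumes (m^3) divided by a.
        s_h1 = sum(math.pi / 4.0 * (d / 100.0) ** 2 * h * ff for d, h in trees)
        return [0.0, s_h1 / area_ha]
    # b0 != 0: S_h0 = n_h / a and S_h1 = sum(D^2 H) / a.
    return [len(trees) / area_ha, sum(d * d * h for d, h in trees) / area_ha]
```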

The biomass stock Ȳ (average biomass per hectare) is estimated by summing the area-based biomass Ŷ of all plots and dividing the sum by the number of plots \( {n}_p \):

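$$ \overline{Y}=\frac{b{\displaystyle \sum_{h=1}^{n_p}{S}_h}}{n_p} $$
(6)

Now, denoting \( Z=\frac{{\displaystyle \sum_{h=1}^{n_p}{S}_h}}{n_p} \), that is, the mean of the plot vectors \( {S}_h \), Eq. (6) can be rewritten as follows: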

$$ \overline{Y}=bZ $$
(7)

where \( Z={\left[\begin{array}{cc}\hfill {Z}_0\hfill & \hfill {Z}_1\hfill \end{array}\right]}^T \) if \( {b}_0\ne 0 \); and \( Z={\left[\begin{array}{cc}\hfill 0\hfill & \hfill {Z}_1\hfill \end{array}\right]}^T={Z}_1 \) if \( {b}_0=0 \).

Recall that b is the row vector of the estimates from the second sampling phase (regression coefficients or BEF values), and Z is the column vector of the estimates from the first phase.
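Eqs. (6) and (7) amount to averaging the plot vectors and taking a dot product with b. A minimal sketch, with the hypothetical function name `biomass_stock` and plain Python lists standing in for the row vector b and the column vector Z; if b is in kg per unit of D²H and the plot vectors are per hectare, the result is in kg ha−1.

```python
def biomass_stock(plot_vectors, b):
    """Return (Y_bar, Z), where Z is the mean of the plot vectors S_h and Y_bar = b Z.

    plot_vectors: list of [S_h0, S_h1], one per plot.
    b: [b0, b1] for a regression model, or [0, BEF_c] for a BEF model.
    """
    n_p = len(plot_vectors)
    z = [sum(s[i] for s in plot_vectors) / n_p for i in range(2)]  # first-phase estimates
    y_bar = b[0] * z[0] + b[1] * z[1]  # row vector b times column vector Z
    return y_bar, z
```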

Eqs. (2, 3, 4, 5, 6, 7) were applied to estimate biomass stock of each tree component and whole tree.

The biomass stock [Eq. (7)] is estimated by combining the estimates of the first and second phases (Z and b, respectively). Two main sources of error must therefore be accounted for: plot-level variability (first sampling phase) and the biomass equation, either regression or BEF (second phase).

Cunia (1965, 1986a, 1986b, 1990) demonstrated that the total variance of Ȳ (mean biomass per hectare) can be estimated by Eq. (8):

$$ VA{R}_t=VA{R}_1+VA{R}_2=b\times {S}_{zz}\times {b}^T+{Z}^T\times {S}_{bb}\times Z $$
(8)

where \( VA{R}_1 \) and \( VA{R}_2 \) are the variance components from the first and second sampling phases, respectively; \( {S}_{zz} \) is the variance–covariance matrix of vector Z; and \( {S}_{bb} \) is the variance–covariance matrix of vector b. For this specific case, \( {S}_{bb} \) and \( {S}_{zz} \) are given by Eqs. (9) and (10):

$$ {S}_{bb}=\left[\begin{array}{cc}\hfill {S}_{b_0{b}_0}\hfill & \hfill {S}_{b_0{b}_1}\hfill \\ {}\hfill {S}_{b_0{b}_1}\hfill & \hfill {S}_{b_1{b}_1}\hfill \end{array}\right] $$
(9)
$$ {S}_{zz}=\left[\begin{array}{cc}\hfill {S}_{z_0{z}_0}\hfill & \hfill {S}_{z_0{z}_1}\hfill \\ {}\hfill {S}_{z_0{z}_1}\hfill & \hfill {S}_{z_1{z}_1}\hfill \end{array}\right] $$
(10)

where \( {S}_{b_i{b}_j} \) is the covariance of \( {b}_i \) and \( {b}_j \), \( {S}_{b_i{b}_i} \) is the variance of \( {b}_i \), \( {S}_{z_i{z}_j}=\frac{{\displaystyle \sum_{h=1}^{n_p}\left({S}_{hi}-{\overline{S}}_i\right)\left({S}_{hj}-{\overline{S}}_j\right)}}{\left({n}_p-1\right){n}_p} \) is the covariance of \( {Z}_i \) and \( {Z}_j \), with \( {\overline{S}}_i \) the mean of \( {S}_{hi} \) over the \( {n}_p \) plots, and \( {S}_{z_i{z}_i} \) is the variance of \( {Z}_i \).

Note that if \( {b}_0=0 \) (and hence \( {b}_1= BE{F}_c \)), all variances and covariances involving index 0 are zero, so \( {S}_{bb}={S}_{b_1{b}_1} \) and \( {S}_{zz}={S}_{z_1{z}_1} \). Consequently, \( VA{R}_t= BE{F}_c\times {S}_{Z_1{Z}_1}\times BE{F}_c+{Z}_1\times {S}_{b_1{b}_1}\times {Z}_1 \), which is equal to:

$$ VA{R}_t= BE{F}_c^2\times {S}_{Z_1{Z}_1}+{Z}_1^2\times {S}_{b_1{b}_1} $$
(11)

The square roots of Eqs. (8, 11) are the total standard errors (SE) of Ȳ, the square roots of the first components of Eqs. (8, 11) are the SEs of the first phase, and the square roots of the second components of the same equations are the SEs of the second phase of the relevant methods of estimating biomass stock.

In this study, the error of Ȳ of the first and second sampling phases, and of both phases combined is expressed as the percent SE of the relevant phase or both phases combined, obtained by dividing the relevant SE by Ȳ and multiplying by 100. However, in some cases, the error is expressed as the variance of Ȳ, especially where the proportional influence of a particular source of error needs to be known, because, unlike the SEs, the variances of the first and second phases are additive (sum to total variance) (Cunia 1990).
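Eq. (8), the percent SEs, and the share of total variance per phase can be sketched together, assuming plot vectors as in Eq. (5) and a 2 × 2 coefficient covariance matrix such as one returned by a weighted fit. The function name `cunia_variance` and the dictionary keys are hypothetical. Passing b = [0, BEF_c] with a matrix whose only non-zero entry is the BEF variance reduces the computation to Eq. (11). Shares are computed from variances, not SEs, because only the variances are additive.

```python
import math


def cunia_variance(plot_vectors, b, s_bb):
    """Two-phase variance of Y_bar (Eq. 8) with percent SEs and phase shares.

    plot_vectors: list of [S_h0, S_h1]; b: [b0, b1]; s_bb: 2x2 covariance of b.
    """
    n_p = len(plot_vectors)
    z = [sum(s[i] for s in plot_vectors) / n_p for i in range(2)]
    # Variance-covariance matrix of the mean vector Z (Eq. 10).
    s_zz = [[sum((s[i] - z[i]) * (s[j] - z[j]) for s in plot_vectors) / ((n_p - 1) * n_p)
             for j in range(2)]
            for i in range(2)]
    var1 = sum(b[i] * s_zz[i][j] * b[j] for i in range(2) for j in range(2))   # b S_zz b^T
    var2 = sum(z[i] * s_bb[i][j] * z[j] for i in range(2) for j in range(2))   # Z^T S_bb Z
    var_t = var1 + var2
    y_bar = b[0] * z[0] + b[1] * z[1]
    return {
        "VAR1": var1, "VAR2": var2, "VARt": var_t,
        "SE1 (%)": 100.0 * math.sqrt(var1) / y_bar,
        "SE2 (%)": 100.0 * math.sqrt(var2) / y_bar,
        "SEt (%)": 100.0 * math.sqrt(var_t) / y_bar,
        "phase 1 share (%)": 100.0 * var1 / var_t,
    }
```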

As stated previously, the error of the first sampling phase results from random plot selection and variability, and that of the second phase results from the biomass model (either regression or BEF model). McRoberts and Westfall (2015), Henry et al. (2015), Temesgen et al. (2015), and Picard et al. (2014) distinguish four sources of error (surrogates of uncertainty) in model prediction: (1) model misspecification (also known as statistical model error, i.e., error due to model selection; Cunia 1986a), (2) uncertainty in the values of the independent variables, (3) uncertainty in the model parameter estimates, and (4) residual variability around model prediction.

The first source of error in model prediction arises from the fact that changing the model will generally change the estimates. Here, this error is expected to be negligible because, in general, the predictors explained a large portion of the variation in biomass and the models were associated with small errors (CVr) (Table 1). In fact, according to Cunia (1986a) and McRoberts and Westfall (2015), when the statistical model fits the sample data reasonably well, the statistical model error is generally small and can be ignored. The second source of error was quantified by Magalhães and Seifert (2015b). The third source of error is expressed by the parameter variance–covariance matrix, \( {S}_{bb} \); in this study, it is reported as the standard errors of the regression parameters or of the BEF values, which are the square roots of the respective variances in \( {S}_{bb} \). The fourth source (residual variability around model prediction) is expressed here as the coefficient of variation of residuals (CVr), because it measures the dispersion between the observed and estimated values and thus indicates the error to which the model is subject when used to predict the dependent variable.

Table 1 Regression coefficients (± SE), BEF values (± SE) and the fit statistics for each tree component and for total biomass

Therefore, the methods of estimating biomass under study (regression and BEF models) were compared with regard to the following sources of error: (1) random plot selection and variability, (2) biomass model, (3) model parameter estimates, and (4) residual variability around model prediction. The first constitutes the error of the first sampling phase, and the second constitutes the error of the second phase, which incorporates the third and fourth sources of error.

The percent biases of the regression-equation-based and BEF-based estimates were determined by Eq. (12) using an independent sample of 37 trees (trees not used in fitting the models):

$$ Bias\left(\%\right)=\frac{{\displaystyle \sum P{B}_k-{\displaystyle \sum O{B}_k}}}{{\displaystyle \sum P{B}_k}}\times 100 $$
(12)

where \( P{B}_k \) and \( O{B}_k \) are, respectively, the predicted and observed biomass of component c of the kth tree.
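Eq. (12) translates directly into code. The function name `percent_bias` is hypothetical; following Eq. (12), the denominator is the sum of the predicted values, so a positive bias indicates overestimation and a negative bias underestimation.

```python
def percent_bias(predicted: list[float], observed: list[float]) -> float:
    """Eq. (12): 100 * (sum(PB_k) - sum(OB_k)) / sum(PB_k)."""
    total_predicted = sum(predicted)
    return 100.0 * (total_predicted - sum(observed)) / total_predicted
```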

As described above, the regression-based biomass is estimated by the model form \( Y={b}_0+{b}_1{D}^2H \) [kg], and the BEF-based biomass by \( Y= BEF\times {v}_{hk}= BEF\times \frac{\pi }{4}\times {D}^2H\times ff \) [Mg], which is equal to \( Y= BEF\times \frac{\pi }{4}\times {D}^2H\times ff\times 1000 \) [kg]. Because \( {v}_{hk} \) and H are expressed in m³ and m, respectively, D must be expressed in m in these expressions; with D in cm, the BEF-based biomass (in kg) is therefore \( Y= BEF\times \frac{\pi }{40000}\times {D}^2H\times ff\times 1000= BEF\times \frac{\pi }{40}\times {D}^2H\times ff \).

Table 1 shows that the intercepts of 8 of the 11 regression equations are not statistically significant at α = 0.05; therefore, the regression equation can be generalized as \( Y={b}_1{D}^2H \) [kg] and the BEF model as \( Y={\tilde{b}}_1{D}^2H \) [kg], where \( {\tilde{b}}_1=\frac{BEF\times \pi \times ff}{40} \). Thus, to estimate the percentage difference between regression-based and BEF-based biomass at a given D²H, \( {b}_1 \) and \( {\tilde{b}}_1 \) were contrasted; i.e., the percentage magnitude of \( {\tilde{b}}_1 \) relative to \( {b}_1 \) was taken as an indicator of how the two models (regression and BEF) estimate biomass from a given D²H. Additionally, the average \( {b}_1 \) and \( {\tilde{b}}_1 \) over all components at a given D²H were compared using Student’s t-test.
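The conversion from Eq. (2) to the slope \( {\tilde{b}}_1 \) can be checked numerically. In this sketch, `bef_slope_kg` and `percent_difference` are hypothetical helpers; the sign convention of the contrast (negative when the BEF slope is smaller) is an assumption chosen to match the negative differences described in the Results, and the BEF of 0.7334 Mg m−3 is the average stem BEF reported in the Discussion, used only as an example.

```python
import math


def bef_slope_kg(bef_mg_per_m3: float, ff: float = 0.4460) -> float:
    """b~1 = BEF * pi * ff / 40: BEF model slope on D^2H (D in cm, H in m, Y in kg)."""
    return bef_mg_per_m3 * math.pi * ff / 40.0


def percent_difference(b1_regression: float, b1_tilde: float) -> float:
    """Assumed contrast: 100 * (b~1 - b1) / b1, negative when the BEF slope is lower."""
    return 100.0 * (b1_tilde - b1_regression) / b1_regression


# Consistency check: BEF * v in Mg with D in m, times 1000, equals b~1 * D^2H with D in cm.
d_cm, h_m, bef = 20.0, 12.0, 0.7334
via_volume_kg = bef * math.pi / 4.0 * (d_cm / 100.0) ** 2 * h_m * 0.4460 * 1000.0
via_slope_kg = bef_slope_kg(bef) * d_cm ** 2 * h_m
assert math.isclose(via_volume_kg, via_slope_kg)
```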

Furthermore, the estimation errors of the individual 2014 trees (defined as the percentage difference between predicted and observed biomass) were plotted against those trees’ D²H for each method of estimating biomass, to evaluate the under- or overestimation associated with each method. The mean error per tree of each method was also tested against zero using Student’s t-test. All statistical analyses were performed at α = 0.05.
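The per-tree error and the one-sample t statistic can be sketched with the standard library alone. The error definition, 100 × (predicted − observed) / observed, is an assumption (the text defines it only as the percentage difference between predicted and observed values), and the function names are hypothetical. The p-value requires the t distribution with n − 1 degrees of freedom; if SciPy is available, `scipy.stats.ttest_1samp(errors, 0.0)` returns the same statistic together with its two-sided p-value.

```python
import math
import statistics


def percent_errors(predicted: list[float], observed: list[float]) -> list[float]:
    """Per-tree estimation error, assumed here as 100 * (predicted - observed) / observed."""
    return [100.0 * (p - o) / o for p, o in zip(predicted, observed)]


def one_sample_t(errors: list[float], mu0: float = 0.0) -> tuple[float, int]:
    """Student's t statistic and degrees of freedom for H0: mean error = mu0."""
    n = len(errors)
    se = statistics.stdev(errors) / math.sqrt(n)  # sample SD with n - 1 denominator
    return (statistics.mean(errors) - mu0) / se, n - 1
```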

Results

For all tree components and the whole tree, except foliage, the variation in biomass explained by the predictor variable(s) ranged from 82.14 to 97.75 % for regression models and from 74.54 to 98.85 % for BEF models (Table 1). In general, the variation in biomass explained by the predictor variable(s) was larger in regression models than in BEF models, except for stem and stem wood (Table 1). Less than half of the variation in foliage biomass was explained by the predictor variable(s). All tree components presented non-significant MRs. The residual plots showed no particular trend (see Magalhães and Seifert (2015a, b)): the points were contained in a horizontal band, with the residuals evenly distributed above and below the abscissa, indicating no model defects.

The errors due to model parameter estimates (SE) and those due to residual variability around model prediction (CVr) are larger for BEF models, except for stem and stem wood components.

The regression-equation-based biomass stock estimates were larger than the BEF-based ones for all components except foliage (Table 2). For example, the regression-equation-based BGB, AGB and whole tree biomass stocks were 7.7, 8.5 and 8.3 % larger than the BEF-based ones. However, the proportion of whole tree biomass allocated to each tree component was similar for both methods; for instance, BGB, stem, and crown biomass accounted for 20, 56 and 24 %, respectively, of whole tree biomass under both methods. Additivity is achieved in both methods for the whole tree and for all major tree components, because within each method all tree component models used the same predictors (DBH and H for regression models and stem volume for BEF models).

Table 2 Regression equations based and BEF-based tree component biomass

Overall, the percent SEs of the first sampling phase (error resulting from plot selection and variability) of the BEF-based biomass estimates were slightly, and sometimes negligibly, larger than those obtained using regression equations (Table 3), except for 2 tree components (lateral roots and branches), where they were smaller. In the second sampling phase, considerable differences in percent SEs were found: BEF-based estimates exhibited smaller percent SEs in 6 tree components and larger ones in the remaining five. The total percent SEs (both phases combined) were also negligibly different between the two methods, except for foliage, where a substantial difference was observed. Although the average tree component biomass obtained by the two methods differed slightly (Table 2), the estimates of each method fell within the 95 % confidence intervals of both methods (Table 3).

Table 3 Absolute standard errors (Mg ha−1), percent standard errors, and 95 % confidence limits of the estimates of tree component biomass stocks for each sampling phase using regression equations and BEFs

The percent SE of the first phase results from plot selection and variability, and that of the second phase results from the biomass models (either regression or BEF models). Table 4 shows that, for both methods, the percentage of total error (as total variance) attributed to the first phase (plot selection) is larger than that attributed to the second phase (biomass models), except for foliage, branches and crown. The percentage of total error (as total variance) attributed to BEF models is larger than that attributed to regression models in all tree components except stem wood, stem bark and stem (stem bark + stem wood). For stem wood and stem, the percentage of total error (as total variance) attributed to the BEF model is less than half that attributed to the regression model.

Table 4 Percentage of total error (as variance) attributed to each sampling phase

The BEF-based biomass estimates were more biased than the regression-based ones in 6 of 11 tree components (Table 5). Overall, regression-equation-based biomass tended to be larger than the observed biomass, and BEF-based biomass tended to be smaller. As expected, the percent biases of the BEF-based stem wood and stem biomass are considerably smaller than those of the regression-based estimates. Recall that the BEF models for stem wood and stem were associated with larger R², a smaller percentage of total error (as variance) attributed to the biomass model, smaller errors due to model parameter estimates, and smaller errors due to residual variability around model prediction than the regression models.

Table 5 Comparison of bias between regression-equation-based and BEF-based biomass estimates

At a given D²H, the regression-based biomass estimates tended to be considerably larger than the BEF-based ones (Table 6), supporting the finding from Table 2. However, the percentage differences between regression-based and BEF-based biomass estimates at a given D²H are overestimated for taproot + stump, lateral roots, and foliage, because for those components the intercepts are statistically significant and should therefore not be removed from the model. For example, the regression-based biomass estimate at a given D²H for taproot + stump was expected to be larger than the BEF-based one, in accordance with Table 2 (yielding a negative difference); however, excluding the intercept caused the BEF-based estimate at a given D²H to be larger, producing a positive difference. Likewise, the actual differences between the regression-based and BEF-based biomass estimates at a given D²H for lateral roots and foliage are smaller than those presented in Table 6. According to Student’s t-test, the average biomass estimates of the two methods at a given D²H are statistically different (p-value = 0.01).

Table 6 Comparison between regression-based and BEF-based biomass at a given D²H

The estimation errors per tree plotted against the respective D²H values for the whole tree (Fig. 1) show that the positive and negative errors of the regression model cancel each other out, averaging close to zero; indeed, Student’s t-test showed that the average percent error (1.34 %) is not statistically different from zero (p-value = 0.51). By contrast, the plot shows that the BEF model underestimates biomass, a finding confirmed by Student’s t-test (average error = −8.60 %, p-value = 0.0007).

Fig. 1 Comparison of the estimation errors of the regression model and BEF model for the whole tree biomass

Discussion

This study compares two commonly used methods of estimating tree and forest biomass: regression equations and biomass expansion factors. The study is unique in several respects: (1) the precision and bias associated with each method are critically compared, whereas the errors associated with biomass estimates are rarely evaluated carefully (Chave et al. 2004); (2) the comparison involves 11 tree components, including BGB, which is rarely studied (GTOS 2009); and (3) BGB was further divided into two root components, taproot and lateral roots.

Many biomass studies include only AGB without breaking it down into further components (e.g. Overman et al. 1994; Grundy 1995; Eshete and Ståhl 1998; Pilli et al. 2006; Salis et al. 2006; Návar-Cháidez 2010; Suganuma et al. 2012; Sitoe et al. 2014; Mason et al. 2014), ignoring the fact that different tree components have distinct uses and decomposition rates, which affect the storage time of carbon and nutrients differently (Magalhães and Seifert 2015a). Accordingly, AGB is divided here into six tree components (foliage, branches, crown, stem wood, stem bark, and stem).

Few studies have considered BGB (e.g. Kuyah et al. 2012; Mugasha et al. 2013; Green et al. 2007; Ryan et al. 2010; Ruiz-Peinado et al. 2011; Paul et al. 2014); in most of those studies, the root system was not fully excavated (Green et al. 2007; Ryan et al. 2010; Ruiz-Peinado et al. 2011; Kuyah et al. 2012; Paul et al. 2014), that is, the excavation was done to a predefined depth or fine roots were not considered, or some form of sampling procedure was used (Kuyah et al. 2012; Mugasha et al. 2013). These procedures lead to underestimated or less accurate BGB estimates (Mokany et al. 2006; Mugasha et al. 2013). Furthermore, studies that have broken BGB down into further root components are scarce.

The only available studies that compare regression-equation-based and BEF-based biomass estimates are those by Jalkanen et al. (2005) and Petersson et al. (2012), neither of which considered BGB. The finding that the whole tree BEF-based biomass estimate was 8.3 % lower than the regression-equation-based estimate, with a slightly larger percent error, is in line with Jalkanen et al. (2005), who found that the BEF-based AGB estimate was 6.7 % lower.

It was found here that, for stem wood and stem, the percentage of total biomass error (as total variance) attributed to the BEF model is less than half that attributed to the regression model, and that the BEF models for these components were associated with larger R², smaller biases, smaller errors due to model parameter estimates, and smaller errors due to residual variability around model prediction than the regression models. Therefore, although it has been maintained that biomass regression equations yield more accurate estimates than BEFs (IPCC 2003; Jalkanen et al. 2005; Zianis et al. 2005; António et al. 2007; Soares and Tomé 2012), this might not be true for the stem and stem wood components. This is because the stem BEF value is computed by dividing stem biomass by stem volume, which makes it similar to stem wood density (specific gravity) and thus more realistic (than models using only DBH and tree height) for converting stem volume to stem biomass, as biomass is a function of wood density (Ketterings et al. 2001). The same applies to stem wood biomass, since the difference between stem wood and stem biomass is negligible.

By contrast, using stem volume to obtain the biomass of any other tree component through a BEF value is not realistic, since density varies from component to component, leading to less accurate and less precise estimates. This is aggravated for non-woody components, whose density may differ greatly from stem density. Indeed, the BEF-based foliage biomass was associated with the largest percent error (11.55 %), 84 % of which is attributed to the BEF model (Table 4), besides being associated with the largest errors due to model parameter estimates and to residual variability around model prediction (within and between methods).

In this study, the average stem density of A. johnsonii trees was 754.42 kg m−3 and the average stem BEF was 0.7334 Mg m−3 (733.40 kg m−3). The small difference between these estimates might be due to the stem density having been computed using saturated volume and the stem BEF using green volume. The stem density obtained here is in line with that reported by Bunster (2006) (754 kg m−3) for the same species.
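For scale, the relative difference between the two values can be worked out directly; the stem BEF is about 2.8 % below the stem density:

$$ \frac{754.42-733.40}{754.42}\times 100=\frac{21.02}{754.42}\times 100\approx 2.8\ \% $$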

The errors of the regression-based biomass estimates were the same as those obtained by Magalhães and Seifert (2015b) for the relevant tree components. However, the errors of the BEF-based estimates were slightly different from those obtained by Magalhães and Seifert (2015c); these differences might be attributed to the different approaches used to compute the errors.

The regression-based biomass estimates could have been more precise if non-linear regression models had been used instead of linear ones, as biomass is better described by non-linear functions (Bolte et al. 2004; Ter-Mikaelian and Korzukhin 1997; Schroeder et al. 1997; de Jong and Klinkhamer 2005; Salis et al. 2006). However, the approach developed by Cunia (1986a) for combining the errors from the first and second phases is limited to linear regression models: with non-linear regression, the expression of the error (as variance) may become so complex that it is extremely cumbersome to apply (Cunia 1986a). Nevertheless, the linear models used here performed satisfactorily; the foliage biomass model showed relatively lower performance (R2 = 49.41 %; CVr = 66.21 %; MR = 1.55 %). Foliage biomass models have usually shown relatively poor performance (Brandeis et al. 2006; Mate et al. 2014).
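Why linearity matters can be seen from a sketch of the variance of a linear-model estimate combined with a sampled predictor. Assuming the estimate is b'x̄, with the coefficient vector b (first phase) independent of the predictor vector x̄ (second phase), its variance is Var = x̄'Σ_b x̄ + b'Σ_x b + tr(Σ_b Σ_x), where the trace term is second-order and often dropped. This is the standard variance of a product of independent random vectors, written in the spirit of Cunia's two-phase approach rather than as a reproduction of his formulas. With a non-linear model the estimate is no longer linear in b, so this closed form no longer applies. Function names and numbers are hypothetical.

```python
def quad_form(v, m):
    """Compute v' M v for a vector v and a square matrix m (nested lists)."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))


def trace_of_product(a, b):
    """Compute tr(A B) for two square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def combined_variance(b, cov_b, x_bar, cov_x, include_trace=True):
    """Variance of b'x_bar with b (model) independent of x_bar (sample).

    Var = x_bar' Cov_b x_bar + b' Cov_x b + tr(Cov_b Cov_x);
    the trace term is second-order and is often neglected.
    """
    var = quad_form(x_bar, cov_b) + quad_form(b, cov_x)
    if include_trace:
        var += trace_of_product(cov_b, cov_x)
    return var


# Hypothetical, unit-free numbers chosen only to keep the arithmetic easy
b = [2.0, 3.0]
cov_b = [[0.1, 0.0], [0.0, 0.2]]
x_bar = [1.0, 4.0]
cov_x = [[0.0, 0.0], [0.0, 0.5]]
print(round(combined_variance(b, cov_b, x_bar, cov_x), 3))         # 7.9
print(round(combined_variance(b, cov_b, x_bar, cov_x, False), 3))  # 7.8
```

For the model Y = b0 + b1 × D2H, x̄ could, for example, hold the per-hectare tree count and the per-hectare sum of D2H; this mapping is an illustrative assumption.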

A combined-variable model (Y = b0 + b1 × D2H) was used here to estimate tree component biomass. Sileshi (2014) noted that where compound derivatives of DBH and H are included, there is no unique way to partition the variance in the response. However, the percent contribution of the measurement error of each variable (DBH and H) to the error of the biomass estimate can be estimated with the Monte Carlo error propagation approach, as performed by Magalhães and Seifert (2015b) and Chave et al. (2004), or with a Bayesian approach, as done by Molto et al. (2012).
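A minimal Monte Carlo sketch of the error propagation mentioned above follows. It perturbs DBH and H with independent, normally distributed, unbiased measurement errors, first one variable at a time and then jointly, and compares the resulting variances of predicted biomass. The coefficients, tree dimensions and error standard deviations are hypothetical, and this is not the exact procedure of Magalhães and Seifert (2015b) or Chave et al. (2004).

```python
import random
import statistics


def mc_variances(b0, b1, dbh, h, sd_dbh, sd_h, n=20000, seed=1):
    """Monte Carlo variance of Y = b0 + b1 * DBH**2 * H under measurement
    error in DBH only, in H only, and in both jointly.

    Assumes independent, normally distributed, unbiased errors.
    """
    rng = random.Random(seed)

    def model(d, height):
        return b0 + b1 * d * d * height

    only_d, only_h, both = [], [], []
    for _ in range(n):
        d_obs = rng.gauss(dbh, sd_dbh)
        h_obs = rng.gauss(h, sd_h)
        only_d.append(model(d_obs, h))
        only_h.append(model(dbh, h_obs))
        both.append(model(d_obs, h_obs))
    return (statistics.pvariance(only_d),
            statistics.pvariance(only_h),
            statistics.pvariance(both))


# Hypothetical tree: DBH 20 cm, H 15 m, 5 % measurement error on each
var_d, var_h, var_both = mc_variances(b0=0.01, b1=2.5e-5, dbh=20.0, h=15.0,
                                      sd_dbh=1.0, sd_h=0.75)
share_d = 100.0 * var_d / (var_d + var_h)
print(f"DBH share of propagated variance: {share_d:.0f} %")
```

Because DBH enters the model squared, an equal relative error in DBH contributes, to first order, about four times as much variance as the same relative error in H.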

The error due to model misspecification was ignored here because it is expected to be negligible, as the models overall fitted the sample data reasonably well. However, the foliage biomass models might be associated with a large model misspecification error, as their predictors explained less than half of the variation in biomass; this applies especially to the foliage BEF model.

The current biomass estimates disregarded smaller and younger trees (DBH <5 cm), which may have led to underestimation, as those trees may contribute significantly to the forest biomass stock and are reported to be very important in the United Nations Framework Convention on Climate Change (UNFCCC) reporting process (Black et al. 2004). For example, Vincent et al. (2015) found that small trees (DBH <10 cm) accounted for 7.2 % of aboveground live biomass, which is a considerable share. Lugo and Brown (1992) and Chave et al. (2003) maintained that small tree biomass (DBH <10 cm) is equivalent to 5 % of large tree biomass. Nevertheless, in this study, the share of small tree biomass in aboveground live biomass, or relative to large tree biomass, is expected to be much smaller than that reported by Lugo and Brown (1992), Chave et al. (2003) and Vincent et al. (2015), because the small trees excluded here (DBH <5 cm) include only part of the trees considered small by those authors.
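To put the two cited figures on the same basis, a small-tree biomass equal to a fraction r of large-tree biomass corresponds to a share r/(1 + r) of total aboveground live biomass, assuming that total = small + large. The helper name below is hypothetical.

```python
def share_of_total(ratio_to_large):
    """Convert small-tree biomass expressed as a fraction of large-tree
    biomass into a fraction of total (small + large) biomass."""
    return ratio_to_large / (1.0 + ratio_to_large)


# Lugo and Brown (1992); Chave et al. (2003): 5 % of large-tree biomass
print(round(100 * share_of_total(0.05), 2))  # 4.76
```

Under that assumption, the 5 % figure corresponds to about 4.8 % of total aboveground live biomass, compared with the 7.2 % share reported for small trees (DBH <10 cm) in the preceding paragraph.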

Conclusions

The regression-based BGB and AGB stocks were approximately 33.6 ± 3.3 Mg ha−1 and 134.5 ± 12.9 Mg ha−1, respectively. The BEF-based BGB and AGB stocks were approximately 30.1 ± 3.2 Mg ha−1 and 123.1 ± 12.0 Mg ha−1, respectively.
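The relative size of these errors can be computed directly from the reported figures, assuming that each ± value is the absolute error of the estimate in Mg ha−1.

```python
# (estimate, error) in Mg ha-1, as reported above
stocks = {
    ("regression", "BGB"): (33.6, 3.3),
    ("regression", "AGB"): (134.5, 12.9),
    ("BEF", "BGB"): (30.1, 3.2),
    ("BEF", "AGB"): (123.1, 12.0),
}

# Percent error = 100 * error / estimate
relative_error = {key: 100.0 * err / est for key, (est, err) in stocks.items()}
for (method, pool), pct in relative_error.items():
    print(method, pool, f"{pct:.2f}")
# regression BGB 9.82
# regression AGB 9.59
# BEF BGB 10.63
# BEF AGB 9.75
```

Under that assumption, the regression-based stocks have slightly smaller relative errors for both pools.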

Overall, the regression-based biomass stocks were slightly larger, associated with relatively smaller errors, and less biased than the BEF-based ones. However, because the stem and stem wood BEFs are equivalent to stem and stem wood densities (specific gravities), the corresponding biomasses are in effect computed directly by multiplying stem volume by stem or stem wood density. Consequently, the percentages of their total errors (as total variance) attributed to the BEF model were considerably smaller than those attributed to the biomass regression equations, which were based only on DBH and stem height and ignored stem density.