1 Introduction

Traditionally, forest growth models have been used for exploring different management and silvicultural options (Vanclay 1994, p.1). However, in the last two decades, the United Nations Framework Convention on Climate Change (UNFCCC) and the international negotiations concerning the second commitment period of the Kyoto Protocol (cf. Grassi et al. 2012) have emphasized the need for large-scale growth models, i.e., models that can provide predictions on large areas. The parties to those conventions need to make predictions of their forest resources and carbon stocks at the national level (e.g., Groen et al. 2013).

Apart from climate change issues, large-scale forest growth models can be useful from an economic and technical perspective. They may contribute to the development of national forest policies and the assessment of the sustainability and development potential of the forest sector. For instance, energy production from forest biomass has become a major issue and is now under study in many countries (e.g., Nord-Larsen and Talbot 2004; François et al. 2014).

The availability of large-scale forest growth models is currently limited. The EFISCEN matrix model, which was originally developed by Sallnäs (1990), and the Global Forest Model (Kindermann et al. 2006) count among the very few of their kind, although they have been used in many European countries in order to produce national or regional forecasts since the early 2000s (e.g., Nabuurs et al. 2000; Thürig and Schelhaas 2006; Groen et al. 2013). Other large-scale growth models are being developed to meet the demand for these types of predictive tools (e.g., Wernsdörfer et al. 2012; Packalen et al. 2014).

The use of growth models always implies a certain degree of uncertainty in the predictions. Kangas (1999) and McRoberts and Westfall (2014) list four sources of errors in model predictions: (i) model misspecification; (ii) errors in the independent variables, including the sampling and measurement errors; (iii) the residual error; and (iv) errors in the model parameter estimates. The propagation of these different errors in plot- and stand-level growth predictions has been studied in forestry since the mid-1980s (e.g., Mowrer and Frayer 1986; Gertner 1987; Gertner and Dzialowy 1984; Mowrer 1991; Kangas 1999; Fortin et al. 2009; Mäkinen et al. 2010). However, to the best of our knowledge, very little information is available regarding the error propagation in national and regional forest growth predictions.

While the uncertainty associated with large-scale growth predictions remains to be assessed, some recent studies have addressed the precision of biomass and volume estimates in national forest inventories. It turns out that the uncertainty due to the residual error and the errors in the model parameter estimates is much smaller than the uncertainty induced by the sampling (McRoberts and Westfall 2014; Ståhl et al. 2014; Breidenbach et al. 2014).

When dealing with large-scale growth predictions, it could reasonably be assumed that the same conclusions apply: The sampling error may be the most important component of the total uncertainty. However, an additional challenge arises from the fact that growth models are usually more complicated than biomass and volume models. As such, they might behave differently in terms of error propagation and, therefore, the assertion that the sampling error is the major component should be verified.

Given the need for large-scale growth models and their role in the upcoming international negotiations concerning climate change, it seemed important to assess prediction uncertainty. The purpose of this study was to assess the uncertainty of large-scale predictions induced by the model residual error, the errors in the parameter estimates, and the sampling. More precisely, it aimed at distinguishing the contribution of each source of uncertainty to the total uncertainty of the predictions. Considering the extent of this work, we deliberately omitted the model misspecification and the measurement errors in the assessment.

To achieve our objective, we used the region of Catalonia, Spain, as a case study. Data from the Spanish national forest inventory were used to fit a population growth model based on a transition-matrix approach. This model coupled to a modified Horvitz–Thompson estimator made it possible to make predictions of tree frequencies by species group and diameter class for the entire region. The error propagation was then carried out using Monte Carlo techniques (cf., Efron and Tibshirani 1993).

2 Material and methods

2.1 Data and sampling design

The data we used in the study are a subset of the second and the third Spanish national forest inventory (hereafter referred to as NFI2 and NFI3, respectively) that matched the region of Catalonia. The Spanish NFI follows a stratified sampling design (Alberdi Asensio et al. 2010). The stratification was carried out in each province independently, Catalonia being composed of four provinces. The primary criteria for the stratification were the homogeneity in terms of volume and the relevance of the forest type for management purposes (MAPA 1990, p.170). Other criteria such as the dominant species, the crown coverage, and the development stage also contributed to the stratification. All these criteria were assessed using aerial photographs and existing maps (MAPA 1990, p.170).

In each stratum, the location of the observation points was established during the NFI2 campaign using a 1 × 1 km grid, leading to a systematic design and a similar sampling intensity across the strata. Each observation point consists of a permanent plot.

In Catalonia, the field measurements of NFI2 and NFI3 were carried out in 1989 and 2000, respectively, leaving an 11-year interval between the two series of observations. Although the plots were said to be permanent, they were not all visited again during NFI3, which is partly due to land use changes. Some new permanent plots were established during NFI3 though, to account for new forested areas. For the sake of this study, we considered that the stratum areas remained constant over time and we kept only the plots that were measured in both inventories, for a total of 8795 plots, divided into 115 strata. A summary of the sampling scheme can be found in Table 1.

Table 1 Summary of the sampling scheme

Each permanent plot consisted of four concentric circular subplots in which all trees above a minimum diameter threshold were tagged and measured. More specifically, all trees with diameter at breast height (dbh, 1.3 m in height) greater than 7.5, 12.5, 22.5, and 42.5 cm were measured in a 5-, 10-, 15-, or 25-m-radius subplot, respectively. Different metrics such as tree dbh and height were recorded.
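The tallying rule of the concentric subplots can be sketched as a simple threshold function (a toy illustration of the protocol described above; the function name is ours):

```python
def subplot_radius(dbh_cm):
    """Return the radius (m) of the subplot in which a tree of this dbh is tallied,
    following the concentric-plot thresholds of the Spanish NFI."""
    if dbh_cm >= 42.5:
        return 25.0
    if dbh_cm >= 22.5:
        return 15.0
    if dbh_cm >= 12.5:
        return 10.0
    if dbh_cm >= 7.5:
        return 5.0
    return None  # below the minimum diameter threshold: not tallied
```

A tree of 30 cm dbh, for instance, is tallied only if it stands within 15 m of the plot center.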

When the plots were revisited during NFI3, the status of each tree was also recorded. Consequently, it could be known whether a particular tree survived and, if it did, what its diameter increment had been over the 11-year interval. In addition to this, recruits, i.e., those new trees that had reached the minimum diameter threshold during the interval, were also recorded.

The subset originally contained about a hundred species, which made a species-specific analysis impossible. For practical reasons, the different species were grouped into four classes:

  • Major commercial softwood species (s = 1)

  • Other softwood species (s = 2)

  • Typical Mediterranean deciduous species (s = 3)

  • Other deciduous species (s = 4)

and s was defined as the species group index. The details of this grouping are provided as Supplementary material (see Tables S1, S2, S3, and S4). A summary of the dataset can be found in Table 2.

Table 2 Summary of the dataset

2.2 Estimating the population total

The theory behind the estimators for stratified sampling schemes is well known (cf., Gregoire and Valentine 2008; Mandallaz 2008). The population total of a characteristic y is usually estimated using a Horvitz–Thompson estimator (Horvitz and Thompson 1952):

$$ \widehat{\tau}=\sum_{k=1}^{K}\sum_{i=1}^{n_k}\frac{N_k\,{y}_{k,i}}{n_k} $$
(1)

where τ is the population total, N k is the total number of sampling units in stratum k, y k, i is the variable of interest measured in sample unit i of stratum k, and n k is the sample size in stratum k. The variance of the estimator (1) can be estimated as follows:

$$ \widehat{Var\left(\widehat{\tau}\right)}=\sum_{k=1}^{K}{N}_k^2\left(1-\frac{n_k}{N_k}\right)\frac{\widehat{\sigma_k^2}}{n_k}=\sum_{k=1}^{K}{N}_k\left({N}_k-{n}_k\right)\frac{\widehat{\sigma_k^2}}{n_k} $$
(2)

where \( \widehat{\sigma_k^2} \) is the estimated variance of y k, i in stratum k.
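As a rough numerical illustration of estimators (1) and (2), the following sketch computes the estimated total and its variance for a stratified sample (the strata, totals, and plot observations are invented for the example):

```python
import numpy as np

# Hypothetical stratified sample: per-stratum totals N_k and the observed
# values y_{k,i} of the variable of interest in the n_k sampled plots.
strata = {
    "A": {"N": 5000, "y": np.array([12.0, 15.0, 9.0, 11.0])},
    "B": {"N": 2000, "y": np.array([4.0, 6.0, 5.0])},
}

def ht_total(strata):
    """Horvitz-Thompson estimate of the population total (Eq. 1)."""
    return sum(s["N"] * s["y"].mean() for s in strata.values())

def ht_variance(strata):
    """Estimated variance of the total under stratified sampling (Eq. 2),
    including the finite population correction N_k (N_k - n_k)."""
    return sum(
        s["N"] * (s["N"] - len(s["y"])) * s["y"].var(ddof=1) / len(s["y"])
        for s in strata.values()
    )
```

With these toy figures, `ht_total` returns 68,750 units, and `ht_variance` sums the per-stratum contributions weighted by the finite population correction.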

There are two issues related to estimators (1) and (2). First of all, decision makers are often more interested in tree frequencies by species group and diameter class than in the total number of trees. In many forest inventories, tree diameters are grouped into 5-cm diameter classes in order to better represent the forest structure. The classes are referred to by their median, so that the 20-cm diameter class encompasses all trees with 17.5 cm ≤ dbh < 22.5 cm. Given this preference for frequencies by species group and diameter class, it is necessary to estimate a vector of total frequencies τ instead of a single parameter τ.
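The class labelling convention described above (classes referred to by their median) can be expressed as a one-line mapping (a sketch; the function name is ours):

```python
def diameter_class(dbh_cm):
    """Map a dbh (cm) to its 5-cm diameter class, labelled by the class median,
    so that 17.5 <= dbh < 22.5 falls in the 20-cm class."""
    return 5 * int(dbh_cm / 5 + 0.5)
```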

The second issue is that the number of sampling units N k is not constant within a particular stratum since the plot radius changes depending upon tree dbh. As a result, there is no single N k for stratum k but, instead, a vector of N k . If the diameter classes match the thresholds for the different plot radii, we can define N k as a vector that contains the N ksj where s is the species group index and j is the diameter class index. If p j is the plot radius (m) for diameter class j, then the elements of N k are calculated as

$$ \forall s:{N}_{ksj}=\frac{10000{A}_k}{\pi \kern0.1em {p}_j^2}. $$

Given these two issues, the Horvitz–Thompson estimator (1) can be modified as follows:

$$ \widehat{\boldsymbol{\tau}}=\sum_{k=1}^{K}\frac{1}{n_k}\sum_{i=1}^{n_k}\mathrm{diag}\left({\boldsymbol{N}}_k\right){\boldsymbol{y}}_{k,i} $$
(3)

where y k,i is the vector of tree frequencies in plot i of stratum k. The elements of this vector are the tree frequencies y k,isj . In the context of this study, the vector y k,i contains four subvectors, i.e.,

$$ {\boldsymbol{y}}_{k,i}=\left(\begin{array}{c}{\boldsymbol{y}}_{k,i,1\bullet }\\ {\boldsymbol{y}}_{k,i,2\bullet }\\ {\boldsymbol{y}}_{k,i,3\bullet }\\ {\boldsymbol{y}}_{k,i,4\bullet }\end{array}\right) $$

with each one of these subvectors representing a single species group:

$$ {\boldsymbol{y}}_{k,is\bullet }=\left(\begin{array}{c}{y}_{k,is1}\\ {y}_{k,is2}\\ \vdots \\ {y}_{k,isJ}\end{array}\right) $$

where J is the number of diameter classes.

A “crude” estimator of the variance of estimator (3) is

$$ \widehat{Var\left(\widehat{\boldsymbol{\tau}}\right)}=\sum_{k=1}^{K}\frac{1}{n_k}{\boldsymbol{Z}}_k^{1/2}{\widehat{\boldsymbol{\Phi}}}_k{\boldsymbol{Z}}_k^{1/2} $$
(4)

where \( {\boldsymbol{Z}}_k=\mathrm{diag}\left({\boldsymbol{N}}_k\bullet \left({\boldsymbol{N}}_k-{\boldsymbol{n}}_k\right)\right) \) with • being the element-wise operator (also referred to as the Hadamard product) and \( {\boldsymbol{n}}_k={\left({n}_k,{n}_k,\dots \right)}^T \), and \( {\widehat{\boldsymbol{\Phi}}}_k \) is the estimated variance–covariance matrix of y k,i . This variance–covariance matrix is easily estimated as

$$ {\widehat{\boldsymbol{\Phi}}}_k=\frac{\sum_{i=1}^{n_k}\left({\boldsymbol{y}}_{k,i}-{\overline{\boldsymbol{y}}}_k\right){\left({\boldsymbol{y}}_{k,i}-{\overline{\boldsymbol{y}}}_k\right)}^T}{n_k-1} $$

where \( {\overline{\boldsymbol{y}}}_k=\frac{\sum_{i=1}^{n_k}{\boldsymbol{y}}_{k,i}}{n_k} \).
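A minimal numerical sketch of estimators (3) and (4) for a single stratum with two diameter classes and two plot radii (all figures below are illustrative, not inventory values):

```python
import numpy as np

A_k = 100.0                       # stratum area (ha), assumed
radii = np.array([5.0, 10.0])     # plot radius (m) per diameter class
N_k = 10000.0 * A_k / (np.pi * radii**2)   # sampling units per class (cf. Sect. 2.2)

# y_{k,i}: tree frequencies per plot (rows) and diameter class (columns)
y = np.array([[3.0, 1.0],
              [5.0, 0.0],
              [4.0, 2.0]])
n_k = y.shape[0]

tau_hat = np.diag(N_k) @ y.mean(axis=0)      # Eq. (3), single-stratum case
Phi_hat = np.cov(y, rowvar=False, ddof=1)    # estimated Var(y_{k,i})
Z_half = np.diag(np.sqrt(N_k * (N_k - n_k)))
var_hat = Z_half @ Phi_hat @ Z_half / n_k    # Eq. (4), single-stratum case
```

With several strata, the same quantities would simply be summed over k.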

2.3 Population growth model

Usher (1966) is credited as being the first to introduce a transition matrix model in forestry. Since then, this approach has been widely used in a large array of forest conditions (e.g., Solomon et al. 1986; Liang and Buongiorno 2005; Wernsdörfer et al. 2012; Picard and Liang 2014).

Matrix models are based on the assumption that individuals in a population (e.g., trees) can be classified into discrete classes of an individual-specific key feature, such as the aforementioned diameter classes. To model diameter increments, individuals are allowed to move from diameter class j to j ’ in discrete time with a transition probability of π j ’,j . At the same time, individuals can leave the population if they are harvested or if they die, from any class j with a probability m j , whereas r j new individuals can enter the population into any class j through recruitment. Note that these probabilities and the number of recruits are assumed to be dependent only on the current state. In other words, the previous growth history of a particular individual has no impact on the probabilities at a given time, which is commonly referred to as the Markov assumption (Vanclay 1994, p.44).

We can assume that these probabilities and recruitment are species-group-dependent, so that we obtain π s,j ’,j , m s,j , and r s,j . Using matrix notation, the model can be expressed as

$$ {\boldsymbol{y}}_{k,i,t+1}\mid {\boldsymbol{y}}_{k,i,t}=\boldsymbol{U}\boldsymbol{S}\,{\boldsymbol{y}}_{k,i,t}+{\boldsymbol{r}}_{t+1}+{\boldsymbol{\epsilon}}_{k,i,t+1} $$
(5)

where vector y k,i,t + 1 contains the frequencies by species group and diameter class in plot i of stratum k at time t + 1, matrix U contains the transition probabilities, matrix S is a diagonal matrix whose elements are the probability of survival, vector y k,i,t contains the frequencies by species group and diameter class at time t, r t + 1 is a column vector that contains the mean number of recruits by species group and diameter class at time t + 1, and ϵ k,i,t + 1 is a vector of residual error terms with E[ϵ k,i,t + 1] = 0 and Var(ϵ k,i,t + 1) = Ψ.

For the sake of simplicity, let us assume that there is a single plot radius and, therefore, a unique N k by stratum. Then, the above transition matrix model (5) can be upscaled at the population level in order to provide the total stem frequencies at time t + 1:

$$ \begin{aligned}{\boldsymbol{\tau}}_{t+1}&=\sum_{k=1}^{K}\sum_{i=1}^{N_k}{\boldsymbol{y}}_{k,i,t+1}\\ &=\sum_{k=1}^{K}\sum_{i=1}^{N_k}\left(\boldsymbol{U}\boldsymbol{S}\,{\boldsymbol{y}}_{k,i,t}+{\boldsymbol{r}}_{t+1}+{\boldsymbol{\epsilon}}_{k,i,t+1}\right)\\ &=\boldsymbol{U}\boldsymbol{S}\,{\boldsymbol{\tau}}_t+{\boldsymbol{\rho}}_{t+1}+{\boldsymbol{\xi}}_{t+1}\end{aligned} $$
(6)

where ρ t + 1 is a vector that contains the total number of recruits by species group and diameter class in the population, i.e., \( {\boldsymbol{\rho}}_{t+1}={\sum}_{k=1}^{K}{\sum}_{i=1}^{N_k}{\boldsymbol{r}}_{t+1} \), and \( {\boldsymbol{\xi}}_{t+1}={\sum}_{k=1}^{K}{\sum}_{i=1}^{N_k}{\boldsymbol{\epsilon}}_{k,i,t+1} \). Vector ξ t + 1 actually represents the difference between the true total of the population and the estimated total, i.e., \( {\boldsymbol{\xi}}_{t+1}={\boldsymbol{\tau}}_{t+1}-\boldsymbol{U}\boldsymbol{S}\,{\boldsymbol{\tau}}_t-{\boldsymbol{\rho}}_{t+1} \). The same upscaling can be performed with the vector N k at the cost of a tedious notation that we omit here.
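A toy illustration of the projection in Eq. 6 for one species group and three diameter classes (the probabilities and frequencies below are invented, not the fitted values):

```python
import numpy as np

J = 3
# U: column j gives where the survivors of class j end up; here trees either
# stay in their class or move up one class, so each column sums to 1.
U = np.array([[0.8, 0.0, 0.0],
              [0.2, 0.9, 0.0],
              [0.0, 0.1, 1.0]])
S = np.diag([0.95, 0.97, 0.90])          # survival probabilities per class
tau_t = np.array([1000.0, 400.0, 50.0])  # current totals per diameter class
rho = np.array([120.0, 0.0, 0.0])        # recruitment enters the first class

tau_next = U @ S @ tau_t + rho           # expected totals at t + 1 (Eq. 6)
```

The Markov assumption shows up in the fact that `tau_next` depends only on `tau_t`, not on earlier states.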

If the model is unbiased and the population size is large enough, then we can expect ξ t + 1 to be close to the expectation of ϵ k,i,t + 1, i.e., 0. However, this vector has a variance \( Var\left({\boldsymbol{\xi}}_{t+1}\right)=\mathrm{diag}{\left({\sum}_{k=1}^{K}{\boldsymbol{N}}_k\right)}^{1/2}\boldsymbol{\Psi}\,\mathrm{diag}{\left({\sum}_{k=1}^{K}{\boldsymbol{N}}_k\right)}^{1/2} \) where Ψ is the variance–covariance matrix of the ϵ k,i,t + 1. This variance is unbiasedly estimated by replacing Ψ with its estimate \( \widehat{\boldsymbol{\Psi}} \).

Being population parameters, the “true” matrices U and S and the “true” vectors τ t and ρ t+1 in model (6) are unknown. The total of population τ t can be unbiasedly estimated using the modified Horvitz–Thompson estimator (Eq. 3), which takes into account the different plot radii. The transition and survival probabilities can be estimated from the monitoring of the individual trees (see Supplementary material). However, estimating the total recruitment ρ t+1 is not straightforward. As a matter of fact, the different plot radii hinder us from estimating this vector.

The inventory defines a recruit as a tree that was too small during NFI2 but meets the dbh requirement for being included in NFI3. Because the plot radius varies across the diameter classes, some of those recruits might not be true recruits stricto sensu, i.e., trees that were below the 7.5-cm minimum diameter during NFI2 and that grew over this threshold during the interval.

In fact, the total number of observed recruits, which we denote \( {\overset{.}{\boldsymbol{\rho}}}_{t+1} \), is the sum of the true recruits ρ t+1 plus all the trees above the 7.5-cm minimum diameter that were not recorded in NFI2 because they were located in the outer rings of the plots. In the Spanish inventory, for example, the observed recruits in the 15-cm diameter class are those that were initially smaller than 7.5 cm in dbh, plus those located between 5 and 10 m from the plot center that were initially in the 10-cm diameter class and increased up to the 15-cm diameter class.

It is possible to derive the total number of true recruits by subtracting the “fake” recruits as calculated from the population total τ t and the transition matrix US:

$$ {\boldsymbol{\rho}}_{t+1}={\overset{.}{\boldsymbol{\rho}}}_{t+1}-\boldsymbol{H}\boldsymbol{U}\boldsymbol{S}\,{\boldsymbol{\tau}}_t $$
(7)

where H is a design matrix that accounts for the change in plot radius along with the diameter classes. The details of this matrix are annexed to this paper (see Appendix). Vector \( {\overset{.}{\boldsymbol{\rho}}}_{t+1} \) is easily estimated using the Horvitz–Thompson estimator in Sect. 2.2, whereas H is defined by the inventory protocol.
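The correction in Eq. 7 can be sketched numerically. Here H is a made-up illustrative matrix that removes an assumed share of the projected trees standing in the un-tallied outer ring; it is not the actual design matrix given in the Appendix:

```python
import numpy as np

# Two diameter classes; the plot radius is assumed to widen between them.
US = np.array([[0.76, 0.00],
               [0.19, 0.87]])            # combined transition-survival matrix
tau_t = np.array([2000.0, 500.0])        # population totals at time t
rho_dot = np.array([300.0, 400.0])       # observed recruits (true + "fake")

# Illustrative H: 30 % of the trees projected into class 2 are assumed to
# have stood in the outer ring that was not tallied at time t.
H = np.array([[0.0, 0.0],
              [0.0, 0.3]])

rho_true = rho_dot - H @ US @ tau_t      # Eq. (7): subtract the fake recruits
```

The fake recruits are thus computed from the projected population, not observed directly.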

2.4 Model fit and uncertainty

Model (6) applies at the population level and it does not depend on the strata. The estimation of the probabilities in U and S is made possible through logistic regressions (see Supplementary material, Section S2). However, the sampling scheme is stratified and, consequently, the observations in the dataset may not share the same sampling weights. Using those logistic regressions with no correction factors might give more importance to some strata just because they were more intensively sampled than they would have been in a pure random design, thereby leading to biased estimates of the population parameters.

In such a context, the sampling weights of the observations can be included in the logistic regressions in order to take into account the sampling scheme (Hosmer et al. 2013, p.233). The sampling weights are also used to correct the estimated variance–covariance matrix of the parameter estimates. For more details about this statistical approach, the reader is referred to Hosmer et al. (2013). The SURVEYLOGISTIC procedure available in the SAS System (SAS Institute Inc. 2008, Ch.84) allows for the fit of such models. This is the software we used in this paper to fit the logistic regressions behind the transition and survival probabilities.
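As a rough sketch of what a sampling-weighted logistic fit does, the weighted likelihood can be maximized by iteratively reweighted least squares (IRLS). This toy numpy implementation, on simulated data, ignores the design-based covariance correction that SURVEYLOGISTIC additionally applies:

```python
import numpy as np

def weighted_logit_fit(X, y, w, n_iter=25):
    """Maximize the w-weighted Bernoulli log-likelihood of logit(p) = X beta
    by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                        # weighted score
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])   # weighted information
        beta += np.linalg.solve(hess, grad)               # Newton step
    return beta

# Simulated data with true beta = (-1, 2); even weights reduce the fit to
# an ordinary logistic regression.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
X = np.column_stack([np.ones_like(x), x])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))).astype(float)
beta_hat = weighted_logit_fit(X, y, np.ones_like(y))
```

With unequal sampling weights, over-sampled strata would be down-weighted in both `grad` and `hess`.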

In the end, all the components of model (6) can be unbiasedly estimated using the modified Horvitz–Thompson estimator, the logistic regressions with sampling weights, and the data from NFI2 and NFI3, which yields

$$ {\tilde{\boldsymbol{\tau}}}_{NFI3}=\widehat{\boldsymbol{U}}\widehat{\boldsymbol{S}}\,{\widehat{\boldsymbol{\tau}}}_{NFI2}+{\widehat{\boldsymbol{\rho}}}_{NFI3} $$
(8)

where \( {\widehat{\boldsymbol{\rho}}}_{NFI3}={\widehat{\overset{.}{\boldsymbol{\rho}}}}_{NFI3}-\boldsymbol{H}\widehat{\boldsymbol{U}}\widehat{\boldsymbol{S}}{\widehat{\boldsymbol{\tau}}}_{NFI2} \) and \( {\tilde{\boldsymbol{\tau}}}_{NFI3} \) is the model prediction as opposed to \( {\widehat{\boldsymbol{\tau}}}_{NFI3} \), which is the population total as estimated from NFI3 using the modified Horvitz–Thompson estimator. Term ξ t + 1 in Eq. 6 can be omitted since its expectation is 0 under the assumption of unbiasedness. It still contributes to the variance of \( {\tilde{\boldsymbol{\tau}}}_{NFI3} \) though.

The true total of the population for the third national forest inventory remains unknown. Because the estimated frequencies from NFI3 are unbiased, we took them as a reference. If the growth model is unbiased as well, then we can expect the predicted frequencies to be close to the estimated ones. Absolute and relative biases were estimated by species group and diameter class in order to assess the fit of the model.

To test the impact of the sampling scheme, the growth model was also fitted using standard logistic regressions instead of those adapted to the stratified design. The standard logistic regressions assume even sampling weights across the observations as in a random sampling scheme.

For the purpose of prediction, i.e., predictions beyond the third national forest inventory, the recruitment vector and the variance–covariance of ξ t + 1 were assumed to remain unchanged. For the fourth national forest inventory (NFI4), the model could be used as follows:

$$ {\tilde{\boldsymbol{\tau}}}_{NFI4}=\widehat{\boldsymbol{U}}\widehat{\boldsymbol{S}}\,{\widehat{\boldsymbol{\tau}}}_{NFI3}+\widehat{\boldsymbol{\rho}} $$
(9)

where \( \widehat{\boldsymbol{\rho}}={\widehat{\boldsymbol{\rho}}}_{NFI3} \).

There were at least two sources of uncertainty in model (9). The first one was the uncertainty due to the sampling. This uncertainty stemmed from the estimated population total from the third campaign (\( {\widehat{\boldsymbol{\tau}}}_{NFI 3} \)). The second source of uncertainty was related to the model and came from the estimated transition probabilities (Û), survival probabilities (Ŝ), recruitment (\( \widehat{\boldsymbol{\rho}} \)), and ξ t + 1.

The logistic regression models that define the probabilities in matrices Û and Ŝ are nonlinear because of the logit link functions. Consequently, combining the two aforementioned sources of uncertainty was not straightforward. Monte Carlo techniques offered a simpler alternative to a complex analytical development. The technique consists in drawing a large number of realizations for some random variables in order to reproduce the variability of a particular phenomenon (Vanclay 1994, p.7).

To distinguish the contribution of each source of uncertainty to the total uncertainty, we actually ran three simulations, each one based on 10,000 realizations: a first one with random deviates in \( {\widehat{\boldsymbol{\tau}}}_{NFI3} \), a second one with random deviates in all the other components except \( {\widehat{\boldsymbol{\tau}}}_{NFI3} \), and, finally, a third one with random deviates in all the components. These simulations provided the uncertainty due to the sampling only, to the model only, and to both sources, respectively. There was no obvious link between \( {\widehat{\boldsymbol{\tau}}}_{NFI3} \) and all the other components, neither in terms of covariance nor in the model formulation. Consequently, the sum of the variances obtained in the first two simulations should be approximately equal to that of the third simulation, in which both sources of uncertainty were taken into account. This third simulation thus served as a benchmark to check that the simulations were consistent.
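The variance-partition logic can be illustrated with a scalar analogue of the three simulations (all figures below are invented; the prediction is simplified to a model multiplier applied to a sampled total):

```python
import numpy as np

rng = np.random.default_rng(42)
R = 10_000

tau_hat, tau_se = 1000.0, 40.0      # estimated total and its sampling error
theta_hat, theta_se = 1.05, 0.02    # "model" multiplier and its standard error

# Simulation 1: random deviates in the sampled total only.
sim_sampling = theta_hat * rng.normal(tau_hat, tau_se, R)
# Simulation 2: random deviates in the model component only.
sim_model = rng.normal(theta_hat, theta_se, R) * tau_hat
# Simulation 3: random deviates in both components (the benchmark).
sim_both = rng.normal(theta_hat, theta_se, R) * rng.normal(tau_hat, tau_se, R)

v1, v2, v3 = sim_sampling.var(), sim_model.var(), sim_both.var()
share_sampling = v1 / (v1 + v2)     # share of total variance due to sampling
```

Under independence of the two sources, `v1 + v2` should be close to `v3`, which is exactly the consistency check described above.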

3 Results

3.1 Estimated total frequencies

The estimated total frequencies by species group and diameter class as calculated using estimator (3) are shown in Table 3 for NFI2 and NFI3. The total frequencies tended to decrease along with the diameter classes for all species in the two inventories. The all-diameter total frequencies indicated that the other softwoods as well as the Mediterranean deciduous species were the most abundant groups. For all species groups, the all-diameter total frequencies increased between NFI2 and NFI3, with the most notable increase being in the Mediterranean deciduous species group.

Table 3 Estimated tree frequencies by species group and diameter class (d.c.) for NFI2 and NFI3 (frequencies expressed in thousands of trees; the 70-cm diameter class encompasses all trees with dbh ≥67.5 cm)

The estimated all-diameter and all-species total frequencies were \( {\widehat{\tau}}_{NFI2} \) = 1,000,419,931 trees and \( {\widehat{\tau}}_{NFI3} \) = 1,114,523,220 trees. The 0.95 confidence intervals associated with these all-diameter and all-species total frequencies were

for NFI2,

983,391,969 ≤ τ NFI2 ≤ 1,017,447,894

for NFI3,

1,096,565,991 ≤ τ NFI3 ≤ 1,132,480,450

Considering the extent of the estimated variance–covariance matrices of \( {\widehat{\boldsymbol{\tau}}}_{NFI 2} \) and \( {\widehat{\boldsymbol{\tau}}}_{NFI 3} \), they are not shown here but are available upon request to the authors.

3.2 Population growth model

Taking NFI3 as a reference, the comparison of the estimated biases between the model based on weighted logistic regressions and the one using standard logistic regressions is shown in Fig. 1. Given the larger magnitude of the frequencies in smaller diameter classes, larger differences were also observed in those classes. For all species groups, the logistic regressions accounting for the stratified sampling scheme yielded smaller biases than standard logistic regressions in all diameter classes, except the 25-cm diameter class in the commercial softwoods and the 20-cm diameter class in the Mediterranean deciduous species group (Fig. 1a, c). The largest difference was observed for Mediterranean deciduous species in the 10-cm diameter class, where the model with standard logistic regressions overestimated the total frequency by more than 1 × 107 trees, whereas there was no perceptible bias for the model based on weighted logistic regressions (Fig. 1c).

Fig. 1
figure 1

Estimated biases for the model with weighted logistic regressions (dots) and the model with standard logistic regressions (triangles)

Estimated relative biases are shown in Fig. 2. Those biases were small for all species groups in the smaller diameter classes. Above the 50-cm threshold, the biases increased and ranged from −35 % to 20 %. There were only a few large diameter classes, mainly in the other softwood and other deciduous species groups (Fig. 2b, c), for which the model with the standard logistic regressions performed better. In most cases though, the model with weighted logistic regressions yielded better results. The resulting parameter estimates of those weighted regressions as well as the estimated recruitment vector \( \widehat{\boldsymbol{\rho}} \) are available online as Supplementary material (see Tables S5 and S6).

Fig. 2
figure 2

Estimated relative biases for the model with weighted logistic regressions (dots) and the model with standard logistic regressions (triangles)

The predictions for the fourth national forest inventory as estimated from the third Monte Carlo simulation, i.e., the one that considered both sources of uncertainty, are shown in Table 4. The realizations of the frequencies were nearly Gaussian for all combinations of species groups and diameter classes. In terms of relative values, the width of the Monte Carlo confidence intervals increased along the diameter classes, ranging from ±3 to ±6 % for smaller-diameter classes, up to more than ±20 % in the 70-cm diameter class. The total frequencies per species group had smaller relative errors that led to relative confidence intervals of ±2.5 to ±4.5 %. Compared with the estimated frequencies in NFI3 (Table 3), the total frequencies were expected to significantly increase in the Mediterranean deciduous and the other deciduous species groups.

Table 4 Predicted tree frequencies (\( {\overset{\sim }{\tau}}_{NFI 4} \)) with their relative 0.95 confidence intervals (CIrel) by species group and diameter class (d.c.) for NFI4 (frequencies expressed in thousands of trees; the 70-cm diameter class encompasses all trees with dbh ≥67.5 cm)

There was no major departure between the sum of the variance in the first two Monte Carlo simulations and the variance of the third one, which supported the assumption of independence between sampling and model-related errors. For all the species groups and diameter classes, the uncertainty due to the sampling accounted for more than 60 % of the total variance in most cases, except for the first diameter class and the 30-cm diameter class in the other deciduous species group (Fig. 3). For all species groups, the proportion tended to increase along with the diameter classes, with more than 80 % of the variance due to the sampling in the 70-cm diameter class. The variance induced by the sum of the error terms in the model, i.e., ξ t + 1, accounted for less than 0.2 % of the total variance (results not shown).

Fig. 3
figure 3

Proportion of the total variance due to the sampling variability in the prediction of tree frequencies for NFI4

4 Discussion

Taking both the sampling uncertainty and the model uncertainty into account in an 11-year forecast of the Catalonian forest, we obtained relative errors on predicted frequencies that ranged from ±5 % in the smallest diameter classes to more than ±20 % in the largest ones (Table 4). It appears that the sampling uncertainty represents the biggest share in the total uncertainty. In most cases, the sampling variance accounted for at least 60 % of the total variance, with this proportion increasing along with the diameter classes (Fig. 3). As a matter of fact, the model variance appeared to be the major source of uncertainty only for the 10-cm diameter class. All other things considered, the predictions for the 10-cm diameter class are more strongly influenced by the recruitment vector \( \widehat{\boldsymbol{\rho}} \) than those of larger diameter classes. Although this vector is part of the model, its estimation primarily relies on the estimate of \( {\overset{.}{\boldsymbol{\rho}}}_{NFI3} \) and, to a certain extent, on the estimate of τ NFI2 for recruits in diameter classes larger than 10 cm (see Eq. 7). These two estimates, \( {\widehat{\overset{.}{\boldsymbol{\rho}}}}_{NFI3} \) and \( {\widehat{\boldsymbol{\tau}}}_{NFI2} \), were obtained through sampling. Put this way, it clearly appears that the sampling contributes more uncertainty to the system than the estimates of the transition and mortality probabilities in matrices U and S and the variance of the sum of residual error terms ξ t + 1.

This result is in accordance with recent studies on this topic. Breidenbach et al. (2014) and Ståhl et al. (2014) compared model and sampling variability in a context of biomass estimation from the Norwegian, Finnish, and Swedish national forest inventories, and they found that the model-related variability only accounted for 28 and 10 % of the total variability, respectively. Moreover, McRoberts and Westfall (2014) reported that the model-related uncertainty was dependent on the number of observations used to fit the model. Because most models are based on least squares and maximum likelihood estimators, which are consistent, a larger number of observations obviously results in smaller variances for the parameter estimates.

In our case study, the logistic regression models from which mortality and transition probabilities were predicted were based on a maximum likelihood approach and they were fitted to more than 100,000 tree observations (Table 2), which resulted in small variances for the parameter estimates. Even if the variances associated with mortality and transition probability predictions increased at the edge of the data range, i.e., in the larger-diameter classes, this increase was nothing compared to the increase in the sampling variability.

The sum of the residual error terms ξ t + 1 contributed less than 1 % to the total variance. Working with the estimate of the mean, McRoberts and Westfall (2014) also concluded that the contribution of residual error terms to the uncertainty of the estimate of the mean was relatively small. When estimating the total of a population, the contribution of the residual error terms to the variance of \( \widehat{\tau} \) is N · Var(ϵ). With the estimate of the mean, the contribution of the residual error terms becomes \( \frac{Var\left(\epsilon \right)}{N} \). Thus, when estimating the mean, the contribution of the residual error terms tends toward 0 as the population size increases. This no longer holds when estimating the total, as the variance induced by the residual error terms increases along with the population size. In our case study, this contribution remained relatively small because the variances of the ϵ k,i,t + 1 were already small. This contribution could rapidly increase if the variances of the residual error terms were larger though.

The assumption of independence between the residual error terms from one plot to another may also explain the relatively small contribution of the residual error to the total uncertainty. It is well known that climate conditions affect all diameter increments either positively or negatively. When not explicitly considered in the model, these climate conditions represent a random effect that induces a positive correlation among the residual errors and that may represent an important source of uncertainty in short-term projections (Kangas 1998).

The short time step used in this case study increased the contribution of sampling errors relative to that of model-related errors. In Kangas (1998), the coefficient of variation of stand volume growth predictions almost doubled over a 50-year projection. Similar increases were found for stand basal area and stem density in Fortin et al. (2009) for 15-year growth forecasts. For longer projections, we could reasonably expect the model-related uncertainty to increase and even exceed the sampling uncertainty. Holopainen et al. (2010) found that model-related errors contributed slightly more than sampling errors to the total uncertainty over a 100-year projection.

In the short term, one way to reduce the sampling uncertainty would be to increase the sampling effort. Like the traditional Horvitz–Thompson estimator, estimator (3) is consistent: Its variance–covariance matrix is inversely proportional to the sample size. As a consequence, the error margin decreases in proportion to the inverse square root of the sample size. Considering that the sample is already large (see Table 1), further reducing the error margin would imply a considerable increase in the sampling effort, which is hardly feasible.
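The inverse-square-root relationship implies that quadrupling the sample size only halves the error margin, which is why further gains are so expensive. A minimal illustration, with an invented population variance and sample size:

```python
import math

def error_margin(pop_var, n, z=1.96):
    """95 % margin of error for an estimator with variance pop_var / n."""
    return z * math.sqrt(pop_var / n)

pop_var = 250.0   # hypothetical population variance
n0 = 10000        # hypothetical current sample size
m0 = error_margin(pop_var, n0)
m4 = error_margin(pop_var, 4 * n0)
# Quadrupling the sample size only halves the margin of error.
print(round(m0, 3), round(m4, 3), round(m0 / m4, 2))
```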

Another possibility would be to change the plot size. A larger plot size for some diameter classes would induce a decrease in the variance of the population and result in more precise estimates. This would be particularly effective for diameter classes above 30 cm, which exhibited the largest biases and relative errors. Again, this would require additional means, but probably less effort than increasing the sample size. Angle count sampling (Gregoire and Valentine 2008, Ch.8) might provide the required precision for large trees, but this remains to be investigated.

A third option for reducing the sampling uncertainty is the use of more efficient estimators. The Horvitz–Thompson estimator (3) we used is designed for a stratified scheme with random sampling without replacement. In fact, the within-stratum sampling here is not strictly random but systematic. If there are strong local random effects, two neighboring sampling units may be highly correlated. In such a context, a systematic design reduces the variance of the estimator by avoiding the selection of these neighboring sampling units (cf. Gregoire and Valentine 2008, p.55). Alternative variance estimators exist that may take the greater precision of systematic sampling into account, but none of them is entirely unbiased (Särndal et al. 1992, p.83).

Regardless of the approach, all estimators remain to be tested and adapted for a multivariate response, which raises an additional issue here. The variance estimator (4) is a “crude” estimator of the variance–covariance in a multivariate framework. Actually, the estimation of the covariance between related Horvitz–Thompson estimators is not straightforward (cf., Wood 2008). It involves the joint probability of drawing two sampling units. Even though our estimator already includes the covariances between the diameter classes through Φ in Eq. 4, it does not explicitly account for this joint probability. The calculation of this probability is hindered by the changes in plot size, not only for different observation points but also within the same observation point. This potential improvement of the estimators deserves to be addressed since it is probably the least expensive way to reduce or, at least, to better estimate sampling uncertainty.
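To make the role of the joint inclusion probability concrete, the following sketch (a toy example of our own, not the estimator of Eq. 4) computes the design-based covariance between two Horvitz–Thompson totals under simple random sampling without replacement, where the joint probability has a closed form, and checks it by enumerating every possible sample:

```python
import itertools

# Under SRSWOR with n draws from N units:
#   pi_i = n/N,  pi_ij = n(n-1) / (N(N-1)) for i != j, and pi_ii = pi_i.
y = [3.0, 1.0, 4.0, 1.0, 5.0]   # hypothetical study variable 1
z = [2.0, 7.0, 1.0, 8.0, 2.0]   # hypothetical study variable 2
N, n = len(y), 3
pi = n / N
pij = n * (n - 1) / (N * (N - 1))

# Design-based covariance of the two HT totals:
# sum over unit pairs of (pi_ij - pi_i*pi_j) * (y_i/pi_i) * (z_j/pi_j).
cov = 0.0
for i in range(N):
    for j in range(N):
        p_joint = pi if i == j else pij
        cov += (p_joint - pi * pi) * (y[i] / pi) * (z[j] / pi)

# Check by enumerating all C(N, n) equally likely samples.
samples = list(itertools.combinations(range(N), n))
ty = [sum(y[i] / pi for i in s) for s in samples]
tz = [sum(z[i] / pi for i in s) for s in samples]
my, mz = sum(ty) / len(ty), sum(tz) / len(tz)
cov_enum = sum((a - my) * (b - mz) for a, b in zip(ty, tz)) / len(samples)
print(round(cov, 4), round(cov_enum, 4))  # both equal -17.5 here
```

In the Spanish NFI case discussed above, no such closed form is available for pi_ij because of the changing plot sizes, which is precisely what hinders an exact covariance estimator.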

Regarding the growth model we developed in this study, some of its features deserve to be outlined. First of all, the estimation of the transition probabilities is based on an ordinal logistic model. Whenever possible, some authors (e.g., Liang and Picard 2013) prefer to structure the model so that there are only two possible transitions: staying in the initial diameter class or moving to the next one, which is commonly referred to as the Usher assumption (Vanclay 1994, p.46). A simple logistic regression then makes it possible to estimate the probability of moving to the next class, while the balance of probability represents the probability of staying in the initial diameter class. Limiting the transitions to two possibilities is a valid assumption only if the time step is short enough to prevent transitions of more than one diameter class. In national forest inventories based on permanent sample plots, time steps are usually long enough to allow for more than two possible transition outcomes. It is still possible to use wider diameter classes in order to limit the number of transitions, but at the cost of a reduced model resolution.

In our case study, we could not use wider diameter classes without loss of information. Moreover, using wider diameter classes would have led to a complex situation with some diameter classes having their trees measured over different plot sizes. Using 5-cm diameter classes as we did in this study resulted in many possible transitions. Actually, we had 11 possibilities, including negative transitions that probably reflected some measurement errors. The ordinal logistic regression made it possible to predict the probabilities associated with those 11 transitions for all the species while ensuring that the sum of the probabilities was equal to 1. It also provided a consistent variance–covariance matrix for the parameter estimates, which was a requirement for Monte Carlo simulations.

To the best of our knowledge, only Escalante et al. (2011) used a similar approach, albeit with a multinomial regression instead of an ordinal logistic regression. Boltz and Carter (2006) also used a multinomial regression, but for predicting mortality probabilities simultaneously with two possible transitions. Compared to our approach, the multinomial regression is more flexible, but at the cost of a substantial increase in the number of parameters, since the effect of any covariate is allowed to change across the possible transitions. In this case study, we tried this approach, but it resulted in a multinomial model with 80 parameters and an ill-conditioned variance–covariance matrix that impeded the Monte Carlo simulations.

The ordinal logistic regression is more restrictive: It assumes that the intercepts delimiting the different transitions and the effect of the covariate remain the same regardless of the transitions. It also requires that the response levels can be ordered, which is not a limitation for diameter transitions since the number of classes crossed provides a natural order. In our case study, we managed to fit all the transition probabilities with only 17 parameters (see Supplementary material, Table S5). The assumption of constant intercepts and dbh effect across the transitions is nevertheless debatable: It led to an underestimation of tree frequencies in the larger diameter classes of the other deciduous species group (Fig. 2d). The model could have been fitted to each species group individually. However, this proved complicated since the distribution of the species groups was not balanced across the transitions and the strata; fitting individual models for each species group led to null occurrences in some strata and transition states. The approach we used seemed to be a good trade-off between the number of parameters to be estimated and the capacity to predict the transition probabilities for all species groups and possible transitions. It should be stressed, however, that the predicted frequencies for the larger diameter classes of the other deciduous species group might be underestimated.
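As an illustration of how a cumulative-logit (proportional odds) model keeps the parameter count low, the following sketch derives transition probabilities from a single shared dbh slope and a set of ordered intercepts. The coefficient values and the reduced five-state setting are invented for the example; they are not the fitted parameters of Table S5.

```python
import math

def transition_probs(dbh_cm, intercepts, slope):
    """Probabilities of the ordered transition states under a
    proportional-odds model: one intercept per cumulative threshold,
    one shared covariate effect."""
    # Cumulative probabilities P(Y <= j) from the logistic link.
    cum = [1.0 / (1.0 + math.exp(-(a + slope * dbh_cm))) for a in intercepts]
    cum.append(1.0)  # P(Y <= K) = 1 by construction
    # Differencing the cumulative probabilities yields state probabilities
    # that are positive and sum exactly to 1.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

intercepts = [-1.0, 0.5, 1.5, 2.5]  # hypothetical ordered thresholds
slope = -0.02                       # hypothetical dbh effect
p = transition_probs(25.0, intercepts, slope)
print([round(x, 3) for x in p], round(sum(p), 6))
```

With K thresholds and one slope, the model needs K + 1 parameters instead of the one-set-of-parameters-per-transition cost of a multinomial model, which is the trade-off discussed above.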

A second innovative feature of our model is that it accounts for the stratified sampling scheme. The stratification implies different sampling weights across the strata. As a consequence, some strata are more highly represented in the sample than others, and ignoring the stratification in the model fitting might lead to biased estimates at the population level. In our study, the weighted logistic regressions and the estimation of the recruitment vector both accounted for the stratified sampling scheme. The resulting model provided predictions that were closer to the NFI3 estimates for most species groups and diameter classes. However, the fit was only a few percent better in relative values (Fig. 2). In absolute values, the fit was clearly improved in the smaller diameter classes (Fig. 1), but this was mainly the result of an improved recruitment estimation rather than enhanced mortality and transition probability predictions.
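The effect of the sampling weights can be seen in an intercept-only toy example (all numbers invented): when strata are unequally represented in the sample, the weighted estimate of a mortality probability recovers the population value while the unweighted one does not, which is exactly what a weighted logistic likelihood achieves for the intercept.

```python
# Hypothetical strata: A has 1000 population plots, B has 9000,
# but each contributes 100 sampled plots (weights 10 and 90).
samples = [
    {"n": 100, "weight": 1000 / 100, "deaths": 5},    # stratum A, rate 0.05
    {"n": 100, "weight": 9000 / 100, "deaths": 15},   # stratum B, rate 0.15
]

# Unweighted estimate treats both strata as equally represented.
unweighted = sum(s["deaths"] for s in samples) / sum(s["n"] for s in samples)

# Weighted estimate: each observation counts for the population units
# it represents, recovering the population mortality rate of 0.14.
weighted = (sum(s["weight"] * s["deaths"] for s in samples)
            / sum(s["weight"] * s["n"] for s in samples))

print(unweighted, round(weighted, 4))
```

Here the unweighted estimate (0.10) underestimates the population rate (0.14) because the small stratum is oversampled; with the nearly constant weights of the actual case study, the two estimates differ much less, as noted below.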

One can reasonably wonder whether these weighted logistic regressions are worth the effort. It must be recalled that the original sampling scheme followed a systematic design, which by definition implies constant sampling weights across the strata. Screening the database to keep only the plots that were measured in both NFI2 and NFI3 induced some variability in the sampling weights. However, this limited variability is far from what it would be under stratified random sampling with optimal allocation (Särndal et al. 1992, p.106). It also explains why the estimated probabilities of the standard and weighted logistic regressions were not that different. With greater variability in the sampling weights, the weighted logistic regressions might prove essential to obtain unbiased estimates. In any case, these weighted logistic regressions should be considered more statistically robust than their standard versions.

A third feature is the way recruitment was considered in the model. Like the transition probabilities, recruitment often follows the Usher assumption, i.e., it is assumed to occur only in the smallest diameter class (e.g., Favrichon 1998; Liang and Picard 2013). However, given the time step in our case study, this assumption would lead to an underestimation of tree recruitment. A major issue then arises when it comes to estimating the recruitment in larger diameter classes in combination with the changes in plot size. As already mentioned, this change in plot size makes it difficult to distinguish “fake” recruits from “true” ones. In this study, we estimated the true recruits conditional on the mortality and transition probabilities. In this way, we managed to estimate the recruitment in larger diameter classes without double counts.

However, this conditional estimation is somewhat flawed. First, the elements of \( \boldsymbol{\rho} \) are not restricted to positive values. If the survival and transition probabilities are overestimated, the difference \( \widehat{\boldsymbol{\rho}}={\widehat{\boldsymbol{\tau}}}_{\mathrm{NFI3}}-\boldsymbol{H}\widehat{\boldsymbol{U}}\widehat{\boldsymbol{S}}{\widehat{\boldsymbol{\tau}}}_{\mathrm{NFI2}} \) may yield negative recruitment in some classes. In our case study, this happened for very few diameter classes, but it could still be observed. In addition, underestimating the survival and transition probabilities may lead to overestimating the recruitment in larger diameter classes. For instance, recruitment in the 65- and 70-cm diameter classes is very unlikely over an 11-year period. This negative recruitment in some classes and the recruitment in the larger diameter classes represent major limitations. A state-of-the-art method would consist of estimating the mortality and transition probabilities as well as the recruitment in a single regression, for which the estimator remains to be developed.
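The mechanism behind the negative recruitment can be reproduced with a toy 3-class example (all numbers invented, not the case-study matrices): whenever the projected survivors exceed the observed NFI3 frequencies in a class, the conditional recruitment estimate for that class turns negative.

```python
tau_nfi2 = [100.0, 60.0, 20.0]      # hypothetical frequencies at NFI2

# Hypothetical combined survival/transition matrix: column j gives
# where the surviving trees of class j end up one time step later.
G = [
    [0.70, 0.00, 0.00],
    [0.25, 0.80, 0.00],
    [0.00, 0.15, 0.97],
]

projected = [sum(G[i][j] * tau_nfi2[j] for j in range(3)) for i in range(3)]

tau_nfi3 = [95.0, 70.0, 28.0]       # hypothetical observed NFI3 frequencies

# Conditional recruitment: observed minus projected survivors.
rho = [obs - proj for obs, proj in zip(tau_nfi3, projected)]
print([round(r, 2) for r in rho])   # the last two entries come out negative
```

In this toy setting the survival/transition probabilities are "too high" for classes 2 and 3, so the difference is negative there, which is the flaw discussed above.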

Combining the two sources of uncertainty required upscaling the model to the region level. This upscaling, represented in Eq. 6, was possible because all the components of the growth model were density-independent. In fact, the species group and the diameter class are the only two covariates in our model. Using a density-dependent model (e.g., Favrichon 1998) would imply a more complex error propagation. While such error propagation can be approximated through a second-order Taylor series (see Sambakhe et al. 2014), we preferred to use a simpler model for the sake of the example.
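The reason density independence permits this upscaling is linearity: projecting the sum of stratum-level vectors gives the same result as summing the stratum-level projections. A minimal sketch with an invented two-class projection matrix:

```python
# Hypothetical density-independent projection matrix (linear operator).
G = [
    [0.70, 0.00],
    [0.25, 0.85],
]

def project(G, v):
    """One projection step of a linear (density-independent) matrix model."""
    return [sum(G[i][j] * v[j] for j in range(len(v))) for i in range(len(G))]

stratum_a = [40.0, 10.0]   # invented stratum-level frequency vectors
stratum_b = [60.0, 25.0]
region = [a + b for a, b in zip(stratum_a, stratum_b)]

sum_of_projections = [x + y for x, y in
                      zip(project(G, stratum_a), project(G, stratum_b))]
projection_of_sum = project(G, region)
print(sum_of_projections, projection_of_sum)  # identical for a linear model
```

A density-dependent model would make G a function of the state vector itself, breaking this commutation and forcing the more complex error propagation mentioned above.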

Keeping the model density-independent facilitated the upscaling, but it remains a strong assumption. The model relies on a stationarity assumption: The mortality and transition probabilities remain constant over time (Vanclay 1994, p.44). In other words, the model assumes that the mortality and transition probabilities as well as the recruitment will remain what they were between NFI2 and NFI3. Changes in the average density are likely to result in biased predictions.

An additional limitation of this study is the assumption of constant stratum areas. These areas are likely to change due to land use change or reforestation. Mathematically speaking, this means that vector \( \boldsymbol{N}_k \) in estimators (3) and (4) is no longer constant but is a random variable. An estimate \( {\widehat{\boldsymbol{N}}}_k \) could be predicted if a model of forested area changes were available. The uncertainty associated with this \( {\widehat{\boldsymbol{N}}}_k \) could then be propagated through the estimators using Monte Carlo simulation, as we did in this study, resulting in less precise estimates of the tree frequencies. Since such a model of forested area changes was not available, we could not quantify this source of uncertainty, which would certainly increase the model-related uncertainty. This remains to be investigated.
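How a random stratum area would inflate the uncertainty can be sketched as follows; the area distribution, its parameters, and the tree density are all invented for the illustration.

```python
import random

random.seed(1)
mean_area, sd_area = 50000.0, 2000.0   # hypothetical stratum area (ha)
density = 350.0                        # hypothetical trees per ha

# Fixed-area totals vs. totals where the area is redrawn per realization.
totals_fixed = [mean_area * density for _ in range(5000)]
totals_random = [random.gauss(mean_area, sd_area) * density
                 for _ in range(5000)]

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

# With a fixed area the spread is zero; a random area adds sd_area * density.
print(sd(totals_fixed), round(sd(totals_random)))
```

In an actual Monte Carlo run, this redrawn area would simply multiply each realization of the estimated frequencies, widening the resulting confidence intervals.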

In our study, we did not consider model misspecification; in fact, we assumed the model was correct and unbiased. Ståhl et al. (2014) made the same approximation in their study on biomass estimation in Finland and Sweden. However, there are many circumstances in which we can expect the term \( \boldsymbol{\xi}_{t+1} \) to be different from 0. Mandallaz (2008) pointed out that this is inevitable when the model is external, i.e., not fitted to the data of the inventory at hand. The aforementioned assumptions of constant intercepts and dbh effect in the estimation of the transition probabilities, of density-independent mortality and transition probabilities, and of constant stratum areas may all result in model misspecification. This is not a concern for predictions of tree frequencies over one or two 11-year growth periods. However, we would recommend not using the model for longer growth forecasts, since misspecification might then impair the accuracy of the predictions. Considering that the plots are permanent, refitting the model to the new data after each campaign would be a safer strategy. If the time step were to change, the model could also be adapted using the generalized approach proposed by Harrison and Michie (1985).

In the context of a two-stage forest inventory, Mandallaz and Massey (2012) have shown that it is possible to correct for model misspecification by using a two-stage Horvitz–Thompson estimator. The method could be adapted for an inventory such as the Spanish NFI. However, it requires some observations, which is a major problem when predicting forest growth, as future conditions cannot be observed. If the model were used for updating a former inventory, this kind of estimator could prove efficient. For example, if a subsample of plots had been revisited in 2011, these observations could have been used to estimate and correct for the bias due to model misspecification. If such observations are unavailable, or if the model is used in a purely predictive context, there is no option other than assuming a correct model or positing a plausible bias due to model misspecification.

5 Conclusions

Predicting forest growth on a large scale is challenging because it involves many sources of uncertainty. First, the current forest conditions are not known with certainty since they are estimated through sampling. Second, the growth model parameters remain unknown and the fit only provides estimates of them, which constitutes an additional source of uncertainty in large-scale forest growth predictions.

In this study, we managed to take these two sources of uncertainty into account in growth forecasts of Catalonia’s forests. This required the development of a multivariate version of the Horvitz–Thompson estimator and a population growth model based on a transition matrix approach. Using Monte Carlo simulation, it was then possible to predict tree frequencies by 5-cm diameter class, with their 0.95 confidence intervals. As a general trend, deciduous species are expected to become more abundant, whereas coniferous species should remain relatively stable. These predicted total frequencies were based on the assumption of constant stratum areas and were made for the year 2011, the date at which the fourth national forest inventory campaign should have been carried out. Unfortunately, the fourth campaign was delayed due to financial constraints. Until this inventory is carried out, this study provides estimates of tree frequencies by diameter class at the population level.

While the predictions of total tree frequencies by species group included small relative errors, it turned out that the frequencies in larger diameter classes were predicted with much lower precision. Like previous studies concerning the uncertainty assessment of national biomass and volume estimates (McRoberts and Westfall 2014; Ståhl et al. 2014; Breidenbach et al. 2014), we concluded that the model-related uncertainty is generally smaller than the sampling uncertainty. Even when the model-related error appeared to be greater, as in the case of smaller diameter classes, it seemed to be the consequence of sampling variability in the recruitment estimation. Among the different options for reducing sampling uncertainty, we recommend first improving the current estimators. A consistent variance–covariance estimator of the multivariate Horvitz–Thompson estimator should be developed and the estimators should better account for the sampling scheme and the changing plot size.