5.1 Introduction

To estimate the unknown parameters of interest of the forest inventory from the data collected during the survey campaign, appropriate statistical procedures must be applied; in other words, suitable unbiased estimators must be identified. This chapter describes the unbiased estimators that were developed for the survey campaign of INFC2005 and explains why they can still be used for the campaign of INFC2015.

First of all, it is important to clarify which parameters of the forest inventory need to be estimated. The national territory is subdivided into \(L\) territorial districts (corresponding to the 21 Italian regions and autonomous provinces, called “regions” in the following text), with areal extents equal to \({A}_{1},{A}_{2},\dots ,{A}_{l},\dots ,{A}_{L}\), respectively. The total areal extent of Italy is therefore equal to \(A={\sum }_{l=1}^{L}{A}_{l}\). Each region is characterised by \(K=NF+F+1\) land categories, where \(NF\) is the number of non-woodland categories (set equal to one in INFC to simplify the estimation), \(F\) indicates the number of woodland categories, while the residual class comprises the categories that are excluded from the inventory.

The main population parameters of interest that can be estimated with the data collected during the second phase of the sampling plan are the \(K\times L\) region-level areal extents of the categories \(\left\{{a}_{kl}: k=1,\dots ,K, l=1,\dots ,L\right\}\). Aggregations of these quantities provide other interesting information, such as the areal extent of category \(k\) in the entire national territory, \({A}_{k}={\sum }_{l=1}^{L}{a}_{kl}\); the areal extent of a subset \(C\) of some categories for a given region \(l\), \({a}_{Cl}={\sum }_{k\in C}{a}_{kl}\), or for the entire country, \({a}_{C}={\sum }_{l=1}^{L}{a}_{Cl}\).

The main population parameters that can be estimated at the end of the third phase of the sampling plan concern the measured quantitative variables, such as the number of trees, the growing stock volume and the biomass. In particular, for any measured variable, it is possible to estimate the \(F\times L\) total values \({t}_{kl}\) for any combination of woodland category and region. Analogously to the \({a}_{kl}\) parameters, aggregations of \({t}_{kl}\) may also be of particular interest. Specifically, for a given variable, the quantity \(T={\sum }_{k=1}^{K}{\sum }_{l=1}^{L}{t}_{kl}\) corresponds to the overall value in the entire territory; \({T}_{l}={\sum }_{k=1}^{K}{t}_{kl}\) represents the overall value in region \(l\); and \({T}_{k}={\sum }_{l=1}^{L}{t}_{kl}\) indicates the overall value for the category \(k\) in the whole area. Moreover, for a given subset \(C\) of categories, it is also interesting to know the overall value of the variable in region \(l\), \({t}_{Cl}={\sum }_{k\in C}{t}_{kl}\), and the overall value over the entire territory, \({t}_{C}={\sum }_{l=1}^{L}{t}_{Cl}\). Another set of relevant parameters is the \(F\times L\) density values \({d}_{kl}={t}_{kl}/{a}_{kl}\) and their aggregations, that is \({d}_{k}={t}_{k}/{a}_{k}\), \({d}_{l}={t}_{l}/{a}_{l}\), \({d}_{Cl}={t}_{Cl}/{a}_{Cl}\) and \({d}_{C}={t}_{C}/{a}_{C}\).

Proper estimates of all these parameters and their corresponding variances can be obtained using the estimators developed for the survey campaign of INFC2005 (Fattorini et al., 2011). These estimators are explained in the following sections of this chapter. In particular, Sect. 5.2 provides an essential description of the estimators of areal extents, while Sect. 5.3 briefly illustrates the estimation procedures of the total and density values of quantitative variables. Section 5.4 concludes by explaining why these estimators are still valid for the survey campaign of INFC2015.

5.2 Estimation of Areal Extents

According to the sampling design adopted for INFC2005, during the first phase of the survey campaign (cf. Chap. 2), a point is selected at random in each of the \(NQ\) quadrats of the sampling grid. The \({N}_{0}\le NQ\) selected points are then classified, by aerial photointerpretation, into \(H=NF+V+1\) land use and cover strata, where the \(V<F\) strata refer to a less detailed classification of the \(F\) categories (cf. Sect. 5.1). The last, residual stratum includes all the points that the aerial photointerpretation cannot properly classify.

The \({N}_{0}\) points are also classified according to the region, thus leading to a two-way stratification characterised by \(H\times L\) strata, each denoted as \({\mathrm{U}}_{hl}\), with size \({N}_{hl}(h=1,\dots ,H, l=1,\dots ,L)\).

For any stratum \(hl\) relative to the \(V\) categories and the non-classifiable land category, that is where \(h=NF+1,\dots ,H\), if \({N}_{hl}>0\), a sample \({\mathrm{S}}_{hl}\) of \({n}_{hl}\) points is selected through simple random sampling without replacement. The sampled points are then observed in the field in order to correct possible classification errors arising during the aerial photointerpretation. In contrast, points belonging to the strata relative to the \(NF\) categories, those with \(h=1,\dots ,NF\), are not sampled (cf. Chap. 2).

Following Fattorini et al. (2006) an unbiased estimator of \({a}_{kl}\) is

$$\hat{a}_{kl} = R\left\{ {w_{kl} + \sum \limits_{h > NF} w_{hl} w_{khl} } \right\},{ }k = 1, \ldots ,NF,{ }l = 1, \ldots ,L$$
(5.1)

or

$$\hat{a}_{kl} = R \sum \limits_{h > NF} w_{hl} w_{khl} ,{ }k = NF + 1, \ldots ,K,{ }l = 1, \ldots ,L$$
(5.2)

where \(R\) is the overall areal extent of the \(NQ\) quadrats, \({w}_{hl}={N}_{hl}/NQ\) is the weight of stratum \(hl\), \({w}_{kl}={N}_{kl}/NQ\) is the weight of stratum \(kl\) relative to the non-woodland categories that are not sampled during the second phase, and \({w}_{khl}={n}_{khl}/{n}_{hl}\) is the share of points of \({\mathrm{S}}_{hl}\) belonging to land category \(k\).
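As an illustration of Eqs. 5.1 and 5.2, the following Python sketch computes \({\widehat{a}}_{kl}\) for a single category and a single region. It is not the code used for INFC; the function name, the argument names and the numerical values are hypothetical, and the lists are assumed to hold, stratum by stratum for \(h>NF\), the counts \({N}_{hl}\), \({n}_{hl}\) and \({n}_{khl}\).

```python
def estimate_area(R, NQ, N_h, n_h, n_kh, N_k_unsampled=0):
    """Point estimate of a_kl for one category k in one region l (Eqs. 5.1-5.2).

    R             : overall areal extent of the NQ quadrats
    NQ            : number of quadrats of the sampling grid
    N_h           : first-phase counts N_hl of the sampled strata (h > NF)
    n_h           : second-phase sample sizes n_hl of the same strata
    n_kh          : sampled points of each stratum assigned to category k (n_khl)
    N_k_unsampled : N_kl of the unsampled non-woodland stratum; 0 when k > NF
    """
    if not (len(N_h) == len(n_h) == len(n_kh)):
        raise ValueError("stratum lists must have equal length")
    share = N_k_unsampled / NQ            # w_kl, zero when k > NF (Eq. 5.2)
    for N, n, n_k in zip(N_h, n_h, n_kh):
        if N == 0:
            continue                      # empty stratum: no sample was drawn
        if n <= 0:
            raise ValueError("a non-empty stratum needs at least one sampled point")
        share += (N / NQ) * (n_k / n)     # w_hl * w_khl
    return R * share


# Hypothetical region: 1,000 quadrats covering 1,000 km2 and two sampled strata.
area_k = estimate_area(R=1000.0, NQ=1000, N_h=[200, 100], n_h=[50, 20], n_kh=[40, 5])
# area_k is about 185, in the same units as R
```

With these made-up counts the two strata contribute \(0.2\times 0.8\) and \(0.1\times 0.25\) to the estimated share, so the estimate is about 18.5% of \(R\).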

Fattorini et al. (2006) also show that a conservative estimator of the variance of \({\widehat{a}}_{kl}\) is

$$\begin{aligned} \hat{v}\left( {\hat{a}_{kl} } \right) & = \frac{R^2 }{{NQ - 1}}\left\{ {w_{kl} + \sum \limits_{h > NF} \frac{{N_{hl} - 1}}{{n_{hl} - 1}}w_{hl} w_{khl} - \sum \limits_{h > NF} \frac{{N_{hl} - n_{hl} }}{{n_{hl} - 1}}w_{hl} w^2_{khl} - p_{kl}^2 } \right\}, \\ & \quad \quad \quad \quad \;k = 1, \ldots ,NF, l = 1, \ldots ,L \\ \end{aligned}$$
(5.3)

or

$$\begin{aligned} \hat{v}\left( {\hat{a}_{kl} } \right) & = \frac{R^2 }{{NQ - 1}}\left\{ { \sum \limits_{h > NF} \frac{{N_{hl} - 1}}{{n_{hl} - 1}}w_{hl} w_{khl} - \sum \limits_{h > NF} \frac{{N_{hl} - n_{hl} }}{{n_{hl} - 1}}w_{hl} w^2_{khl} - p_{kl}^2 } \right\}, \\ & \quad \quad \quad \quad \;k = NF + 1, \ldots ,K, l = 1, \ldots ,L \\ \end{aligned}$$
(5.4)

where \({p}_{kl}={\widehat{a}}_{kl}/R\).

In applying Eqs. 5.3 and 5.4, it is necessary that if \({N}_{hl}>1\) then \({n}_{hl}\ge 2\), while if \({N}_{hl}=1\) then \({n}_{hl}\ge 1\).
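The variance estimators of Eqs. 5.3 and 5.4 can be sketched in Python as follows. This is an illustrative sketch rather than the INFC implementation: the function and argument names are hypothetical, and, as a simplifying assumption, the sketch requires at least two sampled points in every non-empty stratum and does not handle strata with a single point.

```python
def estimate_area_variance(R, NQ, N_h, n_h, n_kh, N_k_unsampled=0):
    """Conservative variance estimate of a_kl-hat (Eqs. 5.3-5.4), one k and one l.

    N_h, n_h, n_kh hold N_hl, n_hl and n_khl for the sampled strata (h > NF);
    N_k_unsampled is N_kl for an unsampled non-woodland stratum, 0 when k > NF.
    """
    w_k = N_k_unsampled / NQ              # w_kl
    p = w_k                               # accumulates p_kl = a_kl-hat / R
    first = 0.0                           # sum of (N-1)/(n-1) * w_hl * w_khl
    second = 0.0                          # sum of (N-n)/(n-1) * w_hl * w_khl^2
    for N, n, n_k in zip(N_h, n_h, n_kh):
        if N == 0:
            continue                      # empty stratum contributes nothing
        if n < 2:
            # Assumption of this sketch: single-point strata are not handled.
            raise ValueError("each non-empty stratum needs n_hl >= 2 here")
        w_h = N / NQ
        w_kh = n_k / n
        p += w_h * w_kh
        first += (N - 1) / (n - 1) * w_h * w_kh
        second += (N - n) / (n - 1) * w_h * w_kh ** 2
    return R ** 2 / (NQ - 1) * (w_k + first - second - p ** 2)


# Hypothetical counts for one category in one region.
var_k = estimate_area_variance(1000.0, 1000, [200, 100], [50, 20], [40, 5])
```

A quick sanity check: when a single stratum contains every quadrat and is fully observed (\({N}_{hl}={n}_{hl}=NQ\)), the expression reduces to \({R}^{2}{p}_{kl}(1-{p}_{kl})/(NQ-1)\), which reflects the first-phase sampling only.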

In order to estimate the variances of aggregations of \({\widehat{a}}_{kl}\), such as \({\widehat{A}}_{k}={\sum }_{l=1}^{L}{\widehat{a}}_{kl}\), the covariances among the estimates involved in the aggregations are also needed. According to Fattorini et al. (2006), the covariance between \({\widehat{a}}_{kl}\) and \(\hat{a}_{k^{\prime}l}\) can be properly estimated with

$$\begin{aligned} \hat{c}\left( {\hat{a}_{kl} ,\hat{a}_{k{^{\prime}}l} } \right) & = - \frac{R^2 }{{NQ - 1}}\left\{ { \sum \limits_{h > NF} \frac{{N_{hl} - n_{hl} }}{{n_{hl} - 1}}w_{hl} w_{khl} w_{k^{\prime}hl} + p_{kl} p_{k^{\prime}l} } \right\},{ } \\ & \quad \quad k^{\prime} \ne k = 1, \ldots ,K,{ }l = 1, \ldots ,L \\ \end{aligned}$$
(5.5)

while the covariance between \({\widehat{a}}_{kl}\) and \(\hat{a}_{k^{\prime}l^{\prime}}\) should be estimated using

$$\hat{c}\left( {\hat{a}_{kl} ,\hat{a}_{k{^{\prime}}l{^{\prime}}} } \right) = \frac{{\hat{a}_{kl} \hat{a}_{k{^{\prime}}l{^{\prime}}} }}{NQ - 1},{ }k,k^{\prime} = 1, \ldots ,K,{ }l^{\prime} \ne l = 1, \ldots ,L$$
(5.6)
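The way variances and covariances combine for an aggregated estimate such as \({\widehat{A}}_{k}\) follows the standard identity for the variance of a sum, \(v\left(\sum_{i}{X}_{i}\right)=\sum_{i}v\left({X}_{i}\right)+2\sum_{i<j}c\left({X}_{i},{X}_{j}\right)\). The Python sketch below applies it to estimated variances and covariances; the function name, the data layout and the numbers are hypothetical.

```python
def variance_of_sum(variances, covariances):
    """Estimated variance of a sum of estimates.

    variances   : list with the estimated variance of each term
    covariances : dict mapping index pairs (i, j), i < j, to the estimated
                  covariance between term i and term j
    """
    total = sum(variances)
    for (i, j), c in covariances.items():
        if not (0 <= i < j < len(variances)):
            raise ValueError("covariance keys must be pairs (i, j) with i < j")
        total += 2.0 * c                  # each unordered pair counts twice
    return total


# Two hypothetical regional estimates with a negative estimated covariance.
v_sum = variance_of_sum([4.0, 9.0], {(0, 1): -1.5})
```

In this made-up case the result is \(4+9-3=10\).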

Although Eqs. 5.1 and 5.2 represent unbiased estimators of the areal extents for all the combinations of land category and region, the sum of the \(K\times L\) estimates \({\widehat{a}}_{kl}\) is not precisely equal to \(A\). It is instead equal to \(R\) times the share of points collected during the first sampling phase that fall within the borders of the national territory. Analogously, for each region \(l\), \({\sum }_{k=1}^{K}{\widehat{a}}_{kl}\) is not equal to \({A}_{l}\), since it corresponds to \(R\) times the share of points falling in region \(l\).

Since the values of \(A\) and \({A}_{l}\) are known, the problem can be solved by calibrating the \({\widehat{a}}_{kl}\)-values so that the territorial totals correspond to the actual values. The calibration can be done using the following calibration factor,

$$p_{kl}^{cal} = \hat{a}_{kl} /\mathop \sum \limits_{k = 1}^K \hat{a}_{kl} ,{ }k = 1, \ldots ,K,{ }l = 1, \ldots ,L$$
(5.7)

which is characterised by the fact that \({\sum }_{k=1}^{K}{p}_{kl}^{cal}=1\) for any region \(l\).

The calibrated estimates of areal extents can then be obtained as

$$\hat{a}_{kl}^{cal} = A_l \times p_{kl}^{cal} .$$
(5.8)
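The calibration of Eqs. 5.7 and 5.8 amounts to rescaling the estimates of a region so that they add up to its known areal extent. A minimal Python sketch, with a hypothetical function name and made-up values, is the following.

```python
def calibrate_areas(a_hat, A_l):
    """Calibrated areal extents of one region (Eqs. 5.7-5.8).

    a_hat : list of the K uncalibrated estimates a_kl-hat of the region
    A_l   : known areal extent of the region
    """
    total = sum(a_hat)
    if total <= 0:
        raise ValueError("the uncalibrated estimates must sum to a positive value")
    # A_l * p_kl^cal, with p_kl^cal = a_kl-hat / sum over k of a_kl-hat
    return [A_l * a / total for a in a_hat]


# Hypothetical region whose uncalibrated estimates sum to 100 instead of 200.
calibrated = calibrate_areas([10.0, 30.0, 60.0], A_l=200.0)
```

By construction the calibrated values keep the proportions of the original estimates and sum to \({A}_{l}\).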

The formulas to estimate the variances and covariances for \({\widehat{a}}_{kl}^{cal}\) can be found in Fattorini et al. (2006).

5.3 Estimation of Total and Density Values of Quantitative Variables

During the third phase of the sampling procedure, for each of the \(\left(V+1\right)\times L\times F\) second-phase strata concerning a forest category, if \({n}_{khl}>0\), a sample \({\mathrm{Q}}_{khl}\) of \({m}_{khl}\) points is selected through simple random sampling without replacement. Each sampled point is then observed in the field, and the amount of any variable of interest is measured within a circular area centred on it (cf. Chaps. 2 and 4). According to Fattorini et al. (2006), the obtained data can then be used to estimate \({t}_{kl}\), for any variable, through the following unbiased estimator

$$\hat{t}_{kl} = NQ \sum \limits_{h > NF} w_{hl} w_{khl} \overline{x}_{khl} ,k = NF + 1, \ldots ,K,l = 1, \ldots ,L$$
(5.9)

where \({\overline{x} }_{khl}\) is the sample mean of the Horvitz-Thompson total estimates of the variable observed in the sample points of stratum \(khl\).
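Equation 5.9 can be sketched in Python as shown below, again for one forest category in one region. The names and values are hypothetical; the list `xbar_kh` is assumed to contain, for each stratum, the sample mean \({\overline{x} }_{khl}\) already computed from the third-phase plots.

```python
def estimate_total(NQ, N_h, n_h, n_kh, xbar_kh):
    """Point estimate of t_kl for one forest category in one region (Eq. 5.9).

    NQ      : number of quadrats of the sampling grid
    N_h     : first-phase counts N_hl of the sampled strata (h > NF)
    n_h     : second-phase sample sizes n_hl
    n_kh    : second-phase points assigned to category k (n_khl)
    xbar_kh : third-phase sample means of the variable (x-bar_khl)
    """
    if not (len(N_h) == len(n_h) == len(n_kh) == len(xbar_kh)):
        raise ValueError("stratum lists must have equal length")
    acc = 0.0
    for N, n, n_k, xbar in zip(N_h, n_h, n_kh, xbar_kh):
        if N == 0 or n_k == 0:
            continue                          # no points of category k here
        acc += (N / NQ) * (n_k / n) * xbar    # w_hl * w_khl * x-bar_khl
    return NQ * acc


# Hypothetical counts and stratum means for one category in one region.
total_k = estimate_total(1000, [200, 100], [50, 20], [40, 5], [10.0, 4.0])
# total_k is about 1700, in the units of the stratum means
```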

Fattorini et al. (2006) also provide conservative estimators of the variances and covariances of \({\widehat{t}}_{kl}\), that is

$$\begin{aligned} & \hat{v}\left( {\hat{t}_{kl} } \right) = \frac{NQ^2 }{{NQ - 1}}\left\{ { \sum \limits_{h > NF} w_{hl} \frac{{N_{hl} - 1}}{{n_{hl} - 1}}w_{khl} \left( {n_{khl} - 1} \right)\frac{{s_{khl}^2 }}{{m_{khl} }}} \right. \\ & \quad \quad \quad \quad \quad \quad \;\left. { + \sum \limits_{h > NF} w_{hl} \frac{{N_{hl} - 1}}{{n_{hl} - 1}}w_{khl} \left( {1 - w_{khl} } \right)\overline{x}^2_{khl} + \sum \limits_{h > NF} w_{hl} w^2_{khl} \overline{x}^2_{khl} - \overline{x}^2_{kl} } \right\}, \\ & \quad \quad \quad \quad k = NF + 1, \ldots ,K,{ } l= 1, \ldots ,L, \\ \end{aligned}$$
(5.10)
$$\begin{aligned} \hat{c}\left( {\hat{t}_{kl} ,\hat{t}_{k{^{\prime}}l} } \right) & = - \frac{NQ^2 }{{NQ - 1}}\left\{ { \sum \limits_{h > NF} \frac{{N_{hl} - n_{hl} }}{{n_{hl} - 1}}w_{hl} w_{khl} w_{k{^{\prime}}hl} \overline{x}_{khl} \overline{x}_{k{^{\prime}}hl} + \overline{x}_{kl} \overline{x}_{k{^{\prime}}l} } \right\}, \\ & \quad \quad \quad k^{\prime} \ne k = NF + 1, \ldots ,K,l = 1, \ldots ,L, \\ \end{aligned}$$
(5.11)

and

$$\hat{c}\left( {\hat{t}_{kl} ,\hat{t}_{k{^{\prime}}l{^{\prime}}} } \right) = \frac{{\hat{t}_{kl} \hat{t}_{k{^{\prime}}l{^{\prime}}} }}{NQ - 1},{ }k,k^{\prime} = NF + 1, \ldots ,K,{ }l^{\prime} \ne l = 1, \ldots ,L$$
(5.12)

where \({s}_{khl}^{2}\) is the sample variance of the estimated values of the variable observed in the sample points of stratum \(khl\).

In applying Eq. 5.10, it is necessary that if \({n}_{khl}>1\) then \({m}_{khl}\ge 2\), while if \({n}_{khl}=1\) then \({m}_{khl}\ge 1\).

The density values, \({d}_{kl}\), can be straightforwardly estimated with

$$\hat{d}_{kl} = \hat{t}_{kl} /\hat{a}_{kl}^{cal} .$$
(5.13)

Unfortunately, the variance of \({\widehat{d}}_{kl}\), which is a ratio estimator, is intractable. Approximately unbiased estimates of the variances and covariances of (5.13) can however be obtained using the common approach of linearising the ratio, that is, retaining only the first-order term of its Taylor series expansion (Särndal et al., 1992). The reliability and precision of these approximate estimates depend on the level of precision of \({\widehat{t}}_{kl}\) and \({\widehat{a}}_{kl}^{cal}\). For this reason, in some circumstances the error of density estimates has not been reported.
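For reference, the generic first-order Taylor (delta-method) approximation of the variance of a ratio \(\widehat{d}=\widehat{t}/\widehat{a}\) is \(\widehat{v}(\widehat{d})\approx \left[\widehat{v}(\widehat{t})+{\widehat{d}}^{2}\widehat{v}(\widehat{a})-2\widehat{d}\,\widehat{c}(\widehat{t},\widehat{a})\right]/{\widehat{a}}^{2}\). The Python sketch below implements this textbook formula; it is not necessarily the exact expression used for INFC, and the covariance between the total and the areal extent is treated as an input that must be supplied separately. All names and numbers are hypothetical.

```python
def ratio_variance(t_hat, a_hat, v_t, v_a, c_ta):
    """First-order Taylor approximation of the variance of d = t_hat / a_hat.

    t_hat, a_hat : estimated total and estimated areal extent
    v_t, v_a     : their estimated variances
    c_ta         : estimated covariance between t_hat and a_hat (an input
                   assumed to be available; not derived in this sketch)
    """
    if a_hat == 0:
        raise ValueError("the estimated areal extent must be non-zero")
    d = t_hat / a_hat
    return (v_t + d * d * v_a - 2.0 * d * c_ta) / (a_hat * a_hat)


# Made-up values: d = 2, numerator = 400 + 4*25 - 2*2*50 = 300.
v_d = ratio_variance(t_hat=200.0, a_hat=100.0, v_t=400.0, v_a=25.0, c_ta=50.0)
```

The approximation is accurate only when \(\widehat{a}\) is precisely estimated, which is consistent with the caution expressed above about reporting errors of density estimates.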

For the survey campaign of INFC2005, the estimators described by Eqs. (5.9) and (5.10) were also modified to obtain the estimates of the total values of the quantitative variables for subsets of population units (e.g., the trees) identified by relevant qualitative attributes, such as the tree species. Let us consider \(M\) non-overlapping subsets. The total value, \({t}_{kml}\), of a quantitative variable of interest for subset \(m\), forest category \(k\), and region \(l\) can be properly estimated with

$$\begin{aligned} & \quad \quad \quad \hat{t}_{kml} = NQ \sum \limits_{h > NF} w_{hl} w_{khl} \overline{x}_{kmhl} \\ & k = NF + 1, \ldots ,K - 1,\,\,\,m = 1, \ldots ,M,\,\,l = 1, \ldots ,L. \\ \end{aligned}$$
(5.14)

The estimated variance of \({\widehat{t}}_{kml}\) is therefore

$$\begin{aligned} \hat{v}\left( {\hat{t}_{kml} } \right) & = \frac{NQ^2 }{{NQ - 1}}\left\{ { \sum \limits_{h > NF} w_{hl} \frac{{N_{hl} - 1}}{{n_{hl} - 1}}w_{khl} \left( {n_{khl} - 1} \right)\frac{{s_{kmhl}^2 }}{{m_{khl} }}} \right. \\ & \quad \quad \quad \quad + \sum \limits_{h > NF} w_{hl} \frac{{N_{hl} - 1}}{{n_{hl} - 1}}w_{khl} \left( {1 - w_{khl} } \right)\overline{x}_{kmhl}^2 \\ & \left. {\quad \quad \quad \quad + \sum \limits_{h > NF} w_{hl} w_{khl}^2 \overline{x}_{kmhl}^2 - \overline{x}_{kml}^2 } \right\} \\ & k = NF + 1, \ldots ,K - 1,\,\,\,m = 1, \ldots ,M,\,\,l = 1, \ldots ,L, \\ \end{aligned}$$
(5.15)

where \({\overline{x} }_{kmhl}\) and \({s}_{kmhl}^{2}\) are, respectively, the sample mean and sample variance of the estimated values of the variable observed in the sample points of stratum \(khl\) for the population units that belong to subset \(m\).

Further modifications of the estimators also made it possible to provide estimates for two specific cases. On the one hand, they were modified to estimate the areal extents of forest categories for subsets, \(m=1,\dots ,M\), of sample points identified by qualitative attributes measured during the third phase of the sampling plan. On the other hand, they were modified to estimate \({a}_{kml}\) and \({t}_{kml}\) in those circumstances in which subset \(m\) is identified during the second phase of the sampling plan. See Fattorini et al. (2011) for further details about these estimators.

5.4 Comparison Between the Two Forest Inventories

The estimation strategy presented above was used to estimate the parameters of interest for INFC2005. The same strategy remained valid for INFC2015: the changes made between the two forest inventories have no impact on its use, as explained in the following.

The sampling plan adopted for INFC2005 consisted of a three-phase structure (cf. Chap. 2). In the first phase, carried out using aerial photointerpretation, the area was divided into polygons of equal size (1 km²), and one point was randomly selected from each polygon. The population of selected points was then divided into 21 strata corresponding to the territorial districts of the national territory (regions), and into strata corresponding to the land use and cover categories. In the second phase, carried out by surveys on the ground, a stratified sample of points was selected from the strata defined in the first phase, only for the land use and cover categories of interest for INFC2005. The selected sample points were then assigned to strata corresponding to the forest types. In the third phase, an additional sample of points was selected from each forest stratum. Plots were then laid out around each of these points, in order to define the area on which the measurements of the variables were carried out (cf. Chap. 4).

In INFC2015, the adopted sampling plan did not undergo substantial changes. The three-phase structure was maintained: the first phase was designed to classify the land use and cover; the second was aimed at definitively classifying the land use and cover and the forest types at the sampling points; the third sought to define the areas for the survey of the variables of interest. Therefore, no change altered the original design of the sampling plan. The changes made in INFC2015 concerned exclusively the definition of strata and the sample sizes. Specifically:

  i. The number of points in the national territory was equal to 301,306 for INFC2005 and equal to 301,271 for INFC2015;

  ii. The classification scheme of the land use and cover used for the INFC2005 aerial photointerpretation included 12 classes/subclasses, while the scheme used for the INFC2015 included 13 classes/subclasses (including the class of non-classified points);

  iii. To estimate the INFC2015 areal extent, 3 new classes of forest inventory interest must be added to the 13 classes/subclasses, which following the aerial photointerpretation of the INFC2005, belonged to the classes/subclasses not of interest for the subsequent phases of the inventory;

  iv. The number of classes/subclasses from aerial photointerpretation, which were not of interest for the forest inventory went from seven (for INFC2005) to eight (for INFC2015);

  v. The number of classes of forest inventory interest was equal to four in the INFC2005, which became eight in the INFC2015.

As explained, the modifications resulted from the slightly different system of aerial photointerpretation adopted for INFC2015 and from changes in the classes/subclasses of interest. This led to a different number of strata in INFC2005 and INFC2015, as well as to the move of some units from one stratum to another between the two survey waves. Nevertheless, from a methodological point of view, the sampling plan adopted during the two waves remained the same, since it was a three-phase design, with the definition of areas at each phase based on stratification. This confirmed the possibility of using, for INFC2015, the same estimation techniques adopted for INFC2005, both for the estimation of areal extents and for the values of the variables of interest.

The estimators used for INFC2015 were implemented in the open-source software R (R Core Team, 2020), a free environment for statistical computing.