
Annals of Forest Science, Volume 73, Issue 4, pp 839–847

The effects of temporal differences between map and ground data on map-assisted estimates of forest area and biomass

  • Ronald E. McRoberts
  • Erik Næsset
  • Terje Gobakken
Original Paper
Part of the following topical collections:
  1. Forest inventories at the European level

Abstract

Key message

When areas of interest experience little change, remote sensing-based maps whose dates deviate from ground data can still substantially enhance precision. However, when change is substantial, deviations in dates reduce the utility of such maps for this purpose.

Context

Remote sensing-based maps are well-established as means of increasing the precision of estimates of forest inventory parameters. The general practice is to use maps whose dates correspond closely to the dates of ground data. However, as national forest inventories move to continuous inventories, deviations between map and ground data dates increase.

Aims

The aim was to assess the degree to which remote sensing-based maps can be used to increase the precision of estimates despite differences between map and ground data dates.

Methods

For study areas in the USA and Norway, maps were constructed for each of two dates, and model-assisted regression estimators were used to estimate inventory parameters using ground data whose dates differed by as much as 11 years from the map dates.

Results

For the Minnesota study area that had little change, 7-year differences in dates had little effect on the precision of estimates of proportion forest area. For the Norwegian study area that experienced considerable change, 11-year differences in dates had a detrimental effect on the precision of estimates of mean biomass per unit area.

Conclusions

The effects of differences in map and ground data dates were less important than temporal change in the study area.

Keywords

Landsat, Lidar, Model-assisted estimator

1 Introduction

Forest inventory and monitoring programs report estimates of parameters related to forest area and biomass using data acquired from arrays of ground plots. Although completely valid inferences can be constructed using only the ground data, the resulting precision may be less than acceptable, particularly for highly variable populations and for regions for which sampling intensities are small due to cost and logistical constraints. Remotely sensed auxiliary data, often in the form of forest attribute maps, have the potential to increase the precision of estimates with no increase in sample size.

Although the utility of maps based on remotely sensed auxiliary information for enhancing inference is well-documented (GOFC-GOLD 2014; GFOI 2013), acquisition and processing of the remotely sensed data for large regions may be expensive, labor-intensive, and time-consuming. For example, the Forest Inventory and Analysis (FIA) program of the US Forest Service conducts the nation’s national forest inventory (NFI) and reports inferences for parameters related to forest area and biomass at 5-year intervals. Because acquisition of nationwide remotely sensed data and construction of the necessary maps are beyond the scope of the program’s capabilities, particularly at 5-year intervals, the Landsat-based National Land Cover Dataset (NLCD) (Vogelmann et al. 2001; Homer et al. 2004, 2007) constructed by the US Geological Survey is used. However, the NLCD dates do not necessarily coincide with the FIA reporting dates, and the NLCD has been updated at only approximately 10-year intervals. As a second example, the utility of lidar-assisted approaches for estimating forest biomass is increasingly reported (e.g., d’Oliveira et al. 2012). However, for many countries with tropical forests, the cost of even a single set of lidar data, whether acquired wall-to-wall or in strips, may be prohibitive. The cost of multiple sets of lidar data corresponding to periodic remeasurements of the ground plots would be even more prohibitive.

In addition to the cost factors, estimation procedures based on aggregating map unit data, often characterized as pixel counting, are inherently biased because of map classification and prediction errors. Further, map accuracy indices produce no direct estimates of bias or variances. A popular emerging approach that produces these estimates while simultaneously compensating for the effects of both outdated maps and map errors is to combine map estimates with ground data using the design-based, model-assisted regression estimator (Baffetta et al. 2009; Gregoire et al. 2011; McRoberts 2010, 2011; d’Oliveira et al. 2012; McRoberts and Walters 2012; McRoberts et al. 2013; Næsset et al. 2011, 2013a, 2013b; Vibrans et al. 2013; Sannier et al. 2014), also characterized as the generalized regression estimator (GREG) (Särndal 2011). This estimator adjusts map-based estimates for classification and prediction errors due to factors such as deviations between the dates of the remotely sensed data and the ground reference data (Sect. 2.3.3). Although this feature makes the estimator unbiased, or at least nearly unbiased, the trade-off for adjustment of greater map errors is less precise estimates. Nevertheless, if the map errors are not substantial, the estimator may still be more precise than simple random sampling estimators that use only the ground reference data. Other than Næsset et al. (2011) who comment that fitted models do not compensate for the effects of temporal deviations between response and predictor variables, few reports have been published regarding the degree to which compensation is possible for substantially outdated maps or the degree to which precision is affected.

The overall objective of the study was to assess the degree to which temporal differences between remote sensing-based maps and ground data affect the bias and precision of estimates of inventory parameters. For a study area in Minnesota, USA, the population parameter of interest was proportion forest area for which a Landsat-based map of the probability of forest cover was used to enhance estimation. For a study area in Våler, Norway, the population parameter of interest was mean biomass per unit area for which a biomass map based on airborne laser scanning (ALS) data was used to enhance estimation. The rationale for these choices was twofold. First, forest area and volume-related variables such as biomass are the two most important and commonly reported forest inventory and monitoring variables. Second, these study areas, response variables, and auxiliary data provide a diverse context for the study.

2 Materials and methods

2.1 Study areas

2.1.1 Minnesota study area

The study area was defined by the portion of the row 27, path 27, Landsat scene in northeastern Minnesota, USA, that was cloud-free for the two image dates, 16 July 2002 and 30 July 2007 (Fig. 1). The Landsat Thematic Mapper (TM) spectral data were transformed using the normalized difference vegetation index (NDVI) transformation (Rouse et al. 1973) and the three tasseled cap transformations (TCgreen, TCbright, TCwet) (Kauth and Thomas 1976; Crist and Cicone 1984) for each image. The six original bands of spectral data and the four transformations were used as independent variables when constructing models of the relationship between the ground and remotely sensed data (Sect. 2.2.1).
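As a rough illustration of the NDVI transformation named above, the following minimal sketch assumes per-pixel red and near-infrared values (conventionally TM bands 3 and 4). The function name and the handling of a zero denominator are illustrative assumptions, not the authors' processing chain.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index for a single pixel."""
    denom = nir + red
    if denom == 0:
        # Assumption for this sketch: pixels with no signal get NDVI 0.
        return 0.0
    return (nir - red) / denom
```

Values near 1 indicate dense green vegetation, whereas values near 0 or below indicate bare ground or water.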
Fig. 1

Study area in northeastern Minnesota, USA. Source: state boundaries - National Atlas of the United States, 2005

Ground training data were obtained for permanent plots established by the FIA program using a quasi-systematic sampling design that is regarded as producing an equal probability sample (McRoberts et al. 2010). Each FIA plot consists of four 7.32-m (24-ft) radius circular subplots that are configured as a central subplot and three peripheral subplots with centers located at distances of 36.58 m (120 ft) and azimuths of 0°, 120°, and 240° from the center of the central subplot. Centers of forested, partially forested, or previously forested plots are estimated using global positioning system (GPS) receivers, whereas centers of non-forested plots are verified using aerial imagery and digitization methods. Data were available for 238–252 FIA plots measured each year in the interval [2000, 2009]. Plots in the study area are remeasured at 5-year intervals; so, for example, the plots measured in 2000 were remeasured in 2005.

Field crews visually estimate the proportion of each subplot that satisfies the FIA definition of forest land: minimum area of 0.4 ha (1.0 ac), minimum crown cover of 10 %, minimum crown cover width of 36.6 m (120 ft), and forest land use. Field crews also observe species and measure diameter at-breast-height (dbh) (1.37 m, 4.5 ft) and height for all trees with dbh of at least 12.7 cm (5 in). Growing stock volumes are estimated for individual measured trees using statistical models, aggregated at subplot-level, expressed as volume per unit area, and considered to be observations without error (McRoberts and Westfall 2014).

For this study, data for only the central subplot of each plot were used to avoid dealing with spatial correlation among observations for subplots of the same plot. Deletion of data for the remaining subplots resulted in little loss of information, because the correlation among observations for subplots of the same plot was greater than 0.85. Because the 168.3-m² subplots are considerably smaller than the 900-m² TM pixels, the proportion of a subplot bisected by a forest/non-forest boundary may be considerably different from the proportion of the pixel bisected by the same forest/non-forest boundary. Therefore, such subplots were deleted for purposes of model construction but retained for purposes of estimation. In addition, because FIA field crews classify subplots with respect to land use, not land cover, subplots whose tree cover has been removed are still classified as forest if forest land use is expected to continue. Thus, observations of land cover for subplots with forest land use but no measurable volume were considered to be missing at random and were deleted for purposes of model construction but retained for purposes of estimation. Following deletions, forest/non-forest observations for 186–202 plots per year remained for model construction. For the central subplots, proportion forest was combined with the 10 Landsat variables for pixels containing subplot centers. For future reference, the term plot refers to the central subplot of each FIA plot cluster.
Fig. 2

Study area in Våler Municipality in southeastern Norway

The Minnesota study area consists primarily of State and County ownerships that are managed for timber production with rotation cycles on the order of 40 years. However, the study area also includes substantial numbers of private ownerships which may or may not be managed for specific objectives. The dominant forest types are aspen-birch (Populus spp., Betula spp.) and maple-beech-birch (Acer spp., Fagus spp., Betula spp.) with lesser amounts of spruce-fir (Picea spp., Abies spp.).

2.1.2 Våler study area

The 853-ha study area was located in Våler Municipality in southeastern Norway and included 176 systematically distributed, circular, 200-m2 forest inventory plots (Fig. 2). The dominant tree species are Norway spruce (Picea abies (L.) Karst.) and Scots pine (Pinus sylvestris L.). Tree-level aboveground biomass (AGB, Mg/ha) was estimated for both 1999 and 2010 using statistical models based on field observations of species and measurements of dbh (1.3 m) and height (Marklund 1988). For 1999 and 2010, plot-level AGB was estimated as the sum of individual tree AGB predictions, scaled to Mg/ha, and considered to be observations without error (McRoberts and Westfall 2014). Aerial stereo photography was used to delineate four classes related to stand age and species dominance that served as the basis for four strata: (1) recently regenerated forest, (2) young forest, (3) mature, spruce-dominated forest, and (4) mature, pine-dominated forest. Sampling intensities were approximately equal for the first three strata, but for the fourth stratum, the intensity was only approximately one-third of that for the other three strata (Næsset et al. 2013a).

Wall-to-wall ALS data were acquired for the study area in 1999 and 2010. For each year, distributions of first echo heights were constructed for the 200-m2 circular plots and 200-m2 square grid cells that tessellated the study area. A threshold of 1.3 m above the ground surface was used to remove the effects of echoes from ground vegetation whose biomass is not included in tree-level AGB. For each plot and cell, heights corresponding to the 10th, 20th, …, 100th percentiles of the distributions were calculated and were available for inclusion as independent variables for constructing models of the relationship between AGB and the ALS metrics.
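The height metrics described above can be sketched as follows. The nearest-rank percentile definition, the strict "greater than" comparison against the 1.3-m threshold, and the zero fill for cells with no canopy echoes are assumptions of this sketch; the source does not state which conventions were used.

```python
def height_percentiles(first_echo_heights, threshold=1.3):
    """Heights at the 10th, 20th, ..., 100th percentiles of canopy echoes.

    Echoes at or below `threshold` (m above ground) are discarded as
    ground vegetation. Percentiles use the nearest-rank definition.
    """
    canopy = sorted(h for h in first_echo_heights if h > threshold)
    if not canopy:
        return [0.0] * 10  # assumption: no canopy echoes -> all metrics 0
    n = len(canopy)
    metrics = []
    for p in range(10, 101, 10):
        rank = (p * n + 99) // 100  # integer ceiling of p * n / 100
        metrics.append(canopy[rank - 1])
    return metrics
```

The same function would be applied to each plot and each grid cell so that the model predictors are computed identically for the sample and the population.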

2.2 Map construction

2.2.1 Mapping forest/non-forest

The relationship between a dichotomous response variable such as forest/non-forest, here denoted Y (y = 0 denotes non-forest, y = 1 denotes forest), and continuous independent variables, X, is often expressed in the form,
$$ {\mathrm{p}}_{\mathrm{i}}=\mathrm{f}\left({\mathbf{X}}_{\mathrm{i}};\boldsymbol{\upbeta} \right)+{\upvarepsilon}_{\mathrm{i}} $$
(1)
where i indexes population units, \( p_i \) is the probability that \( y_i = 1 \), β is a vector of parameters to be estimated, and \( \varepsilon_i \) is the random residual with mean 0 (Agresti 2007). The function \( f(\mathbf{X}_i; \boldsymbol{\beta}) \) expresses the expectation of Y in terms of X and β and is often formulated using the logistic function, leading to the model,
$$ {\mathrm{p}}_{\mathrm{i}}=\frac{ \exp \left({\upbeta}_0+{\displaystyle \sum_{\mathrm{j}=1}^{\mathrm{J}}{\upbeta}_{\mathrm{j}}{\mathrm{x}}_{\mathrm{i}\mathrm{j}}}\right)}{1+ \exp \left({\upbeta}_0+{\displaystyle \sum_{\mathrm{j}=1}^{\mathrm{J}}{\upbeta}_{\mathrm{j}}{\mathrm{x}}_{\mathrm{i}\mathrm{j}}}\right)}+{\upvarepsilon}_{\mathrm{i}} $$
(2)
where j = 1, …, J indexes the independent variables, and exp (.) is the exponential function. The model parameters are estimated using maximum likelihood methods as described by Agresti (2007).
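To make the mean function of Eq. (2) concrete, the sketch below evaluates it for one population unit and computes the Bernoulli log-likelihood that maximum likelihood estimation would maximize. The function names are hypothetical, the optimizer itself is omitted, and no guard against overflow for extreme linear predictors is included.

```python
import math

def logistic_probability(x, beta0, betas):
    """Mean function of Eq. (2): probability of forest for one unit."""
    eta = beta0 + sum(b * xj for b, xj in zip(betas, x))
    # Algebraically equal to exp(eta) / (1 + exp(eta)).
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(ys, xs, beta0, betas):
    """Bernoulli log-likelihood of 0/1 observations ys given predictors xs."""
    total = 0.0
    for y, x in zip(ys, xs):
        p = logistic_probability(x, beta0, betas)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total
```

Parameter values that separate forest from non-forest observations more cleanly yield a larger log-likelihood, which is the criterion the fitting procedure climbs.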

Parameters for the binomial logistic regression model were estimated separately using the 2002 FIA and Landsat data and using the 2007 FIA and Landsat data. A three-step procedure was used to assess quality of fit of the models to the data: (1) all plot observation/model prediction pairs, (yi, \( {\widehat{\mathrm{p}}}_{\mathrm{i}} \)), were ordered with respect to \( {\widehat{\mathrm{p}}}_{\mathrm{i}} \); (2) the ordered pairs were grouped into categories of approximately equal numbers of pairs, and the group means of the plot observations and the corresponding model predictions were calculated; and (3) a graph of the observation means versus the model prediction means was constructed. If the model is correctly specified, a graph of means of observations against means of predictions should lie along the 1:1 line.
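The three-step quality-of-fit procedure can be sketched as below. The number of groups is left to the caller because the source does not report it, and the function name is hypothetical.

```python
def grouped_means(observations, predictions, n_groups):
    """Return (mean prediction, mean observation) for each group.

    Assumes 1 <= n_groups <= number of pairs.
    """
    # Step 1: order the pairs with respect to the predictions.
    pairs = sorted(zip(predictions, observations))
    # Step 2: split into groups of approximately equal size.
    size, extra = divmod(len(pairs), n_groups)
    result, start = [], 0
    for g in range(n_groups):
        stop = start + size + (1 if g < extra else 0)
        group = pairs[start:stop]
        mean_pred = sum(p for p, _ in group) / len(group)
        mean_obs = sum(o for _, o in group) / len(group)
        result.append((mean_pred, mean_obs))
        start = stop
    # Step 3: the caller plots mean_obs against mean_pred.
    return result
```

For a correctly specified model, the returned points should fall close to the 1:1 line.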

2.2.2 Mapping biomass

For the Våler study area, a nonlinear logistic model was used to estimate the relationship between AGB and the ALS metrics. The model had the mathematical form,
$$ {\mathrm{y}}_{\mathrm{i}}=\frac{\upalpha}{1+ \exp \left({\upbeta}_0+{\displaystyle \sum_{\mathrm{j}=1}^{\mathrm{J}}{\upbeta}_{\mathrm{j}}{\mathrm{x}}_{\mathrm{i}\mathrm{j}}}\right)}+{\upvarepsilon}_{\mathrm{i}}, $$
(3)
where i indexes population units, \( x_{ij} \) is the jth lidar metric, α and the βs are parameters to be estimated, and \( \varepsilon_i \) is the residual term. An advantage of the logistic model expressed by Eq. (3) over a linear model is that all predictions are constrained by the lower horizontal asymptote of ŷ = 0 and the upper horizontal asymptote of \( \widehat{\mathrm{y}}=\widehat{\upalpha} \), which is estimated from the sample data. This logistic regression model should not be confused with the binomial logistic regression model described in Sect. 2.2.1.
The model was fit using least squares techniques with the parameters estimated separately for each stratum for each of 1999 and 2010 using the corresponding inventory and ALS data. For each model, the quality of fit of the model to the data was assessed using pseudo-R² calculated as
$$ {\mathrm{R}}^{2*}=\frac{{\mathrm{SS}}_{\mathrm{mean}}-{\mathrm{SS}}_{\mathrm{err}}}{{\mathrm{SS}}_{\mathrm{mean}}}, $$
(4)
where SSmean is the sum of squared deviations of the observations around their mean and SSerr is the sum of squared deviations of the observations from their predictions. The same three-step procedure as described in Sect. 2.2.1 was also used, albeit using ŷi rather than \( {\widehat{\mathrm{p}}}_{\mathrm{i}} \).
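Eq. (4) translates directly into a few lines of code. The sketch below is illustrative only, with a hypothetical function name, and assumes the observations are not all identical.

```python
def pseudo_r2(observations, predictions):
    """Pseudo-R^2 of Eq. (4): (SS_mean - SS_err) / SS_mean."""
    mean = sum(observations) / len(observations)
    ss_mean = sum((y - mean) ** 2 for y in observations)
    ss_err = sum((y - p) ** 2 for y, p in zip(observations, predictions))
    return (ss_mean - ss_err) / ss_mean
```

Perfect predictions give a value of 1, and predictions no better than the observation mean give a value of 0.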

2.3 Analyses

2.3.1 Assumptions and technical objectives

All analyses were based on three underlying assumptions: (1) a finite population, U, consisting of N units in the form of either square, 900-m2 Landsat pixels for the Minnesota dataset or square 200-m2 grid cells for the Våler dataset; (2) a sample, S, of n population units in the form of the plots; and (3) availability of auxiliary remotely sensed Landsat data for all pixels and ALS data for all lidar cells. In the following sections, the terms population unit, pixel, and grid cell are used interchangeably.

For assessments of forest area, the objective is typically to estimate the area for a class of the response variable. Because the estimate of class area is simply the product of total area which is usually known and the estimate of the class area proportion, the parameter of interest for the Minnesota portion of the study was proportion forest at time t, denoted \( \mu^t \). For the Våler study area, the parameter of interest was mean AGB at time t, also denoted \( \mu^t \). For inventory applications, the ultimate objective is construction of an inference in the form of an approximately 95 % confidence interval for \( \mu^t \) expressed as
$$ {\widehat{\mu}}^{\mathrm{t}}\pm 2\cdot \sqrt{V\widehat{a}r\left({\widehat{\mu}}^t\right)}, $$
(5)
where \( V\widehat{a}r\left({\widehat{\mu}}^t\right) \) is the estimator of the variance of \( {\widehat{\mu}}^t \). Thus, the study emphasis was estimation of μ t and the standard error of its estimate, \( SE\left({\widehat{\mu}}^t\right)=\sqrt{V\widehat{a}r\left({\widehat{\mu}}^t\right)} \).
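As a small worked example of Eq. (5), the sketch below builds the approximate 95 % interval from an estimate and its variance. The function name is hypothetical.

```python
import math

def confidence_interval(mean_estimate, variance_estimate):
    """Approximate 95 % interval of Eq. (5): estimate +/- 2 * SE."""
    half_width = 2.0 * math.sqrt(variance_estimate)
    return mean_estimate - half_width, mean_estimate + half_width
```

For instance, an estimated proportion of 0.721 with a standard error of 0.028 gives an interval of roughly 0.665 to 0.777.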

2.3.2 Simple random sampling estimators

For both study areas and for each year for which ground data were available, the parameters of interest were estimated using the simple random sampling (SRS) estimators,
$$ {\widehat{\mu}}_{\mathrm{SRS}}^{{\mathrm{t}}_{\mathrm{grnd}}}=\frac{1}{\mathrm{n}}{\displaystyle \sum_{\mathrm{i}\in \mathrm{S}}{\mathrm{z}}_{\mathrm{i}}^{{\mathrm{t}}_{\mathrm{grnd}}}} $$
(6a)
and
$$ V\widehat{a}r\left({\widehat{\mu}}_{\mathrm{SRS}}^{{\mathrm{t}}_{\mathrm{grnd}}}\right)=\frac{1}{n\left(n-1\right)}{\displaystyle \sum_{\mathrm{i}\in \mathrm{S}}{\left({\mathrm{z}}_{\mathrm{i}}^{{\mathrm{t}}_{\mathrm{grnd}}}-{\widehat{\mu}}_{\mathrm{SRS}}^{{\mathrm{t}}_{\mathrm{grnd}}}\right)}^2}, $$
(6b)
where \( t_{grnd} \) denotes the date of the ground data and z denotes the ground observations of forest or non-forest for the Minnesota study area or AGB for the Våler study area.
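The SRS estimators of Eqs. (6a) and (6b) can be sketched as follows. The function name is hypothetical and at least two observations are assumed.

```python
def srs_estimate(z):
    """Return (mean, variance of the mean) per Eqs. (6a) and (6b)."""
    n = len(z)
    mean = sum(z) / n
    variance = sum((zi - mean) ** 2 for zi in z) / (n * (n - 1))
    return mean, variance
```

With forest/non-forest observations coded 1 and 0, the mean is the estimated proportion forest.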

2.3.3 Model-assisted regression estimators

Model-assisted regression estimation is an approach to increasing precision that uses auxiliary information. For both study areas and each ground data year, an initial estimate can be calculated as the mean over map predictions, regardless of the year of the map,
$$ {\widehat{\mu}}_{\mathrm{i}\mathrm{nit}}^{{\mathrm{t}}_{\mathrm{map}}}=\frac{1}{\mathrm{N}}{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{N}}{\widehat{\mathrm{z}}}_{\mathrm{i}}^{{\mathrm{t}}_{\mathrm{map}}}}, $$
(7a)
where \( t_{map} \) denotes the date of the remotely sensed data and \( \widehat{z} \) denotes the prediction of the probability of forest from Eq. (2) or the AGB prediction from Eq. (3). However, this estimator may be biased due to map classification and prediction error for multiple reasons such as changes in the response variable between the map and ground data dates. The bias of this estimator is estimated as
$$ \mathrm{B}\widehat{\mathrm{i}}\mathrm{a}\mathrm{s}\left({\widehat{\mu}}_{\mathrm{i}\mathrm{nit}}^{{\mathrm{t}}_{\mathrm{map}}}\right)=\frac{1}{\mathrm{n}}{\displaystyle \sum_{\mathrm{i}\in \mathrm{S}}\left({\widehat{\mathrm{z}}}_{\mathrm{i}}^{{\mathrm{t}}_{\mathrm{map}}}-{\mathrm{z}}_{\mathrm{i}}^{{\mathrm{t}}_{\mathrm{grnd}}}\right)}. $$
(7b)
The model-assisted, generalized regression estimator (GREG) of the mean is
$$ \begin{array}{c}\hfill {\widehat{\mu}}_{\mathrm{GREG}}^{{\mathrm{t}}_{\mathrm{grnd}}}={\widehat{\mu}}_{\mathrm{i}\mathrm{nit}}^{{\mathrm{t}}_{\mathrm{map}}}-\mathrm{B}\widehat{\mathrm{i}}\mathrm{a}\mathrm{s}\;\left({\widehat{\mu}}_{\mathrm{i}\mathrm{nit}}^{{\mathrm{t}}_{\mathrm{map}}}\right)\hfill \\ {}\hfill =\frac{1}{\mathrm{N}}{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{N}}{\widehat{\mathrm{z}}}_{\mathrm{i}}^{{\mathrm{t}}_{\mathrm{map}}}}-\frac{1}{\mathrm{n}}{\displaystyle \sum_{\mathrm{i}\in \mathrm{S}}\left({\widehat{\mathrm{z}}}_{\mathrm{i}}^{{\mathrm{t}}_{\mathrm{map}}}-{\mathrm{z}}_{\mathrm{i}}^{{\mathrm{t}}_{\mathrm{grnd}}}\right)}\hfill \end{array} $$
(7c)
with variance estimator,
$$ V\widehat{a}r\left({\widehat{\mu}}_{\mathrm{GREG}}^{{\mathrm{t}}_{\mathrm{grnd}}}\right)=\frac{1}{\mathrm{n}\left(\mathrm{n}-1\right)}{\displaystyle \sum_{\mathrm{i}\in \mathrm{S}}{\left({\upvarepsilon}_{\mathrm{i}}-\overline{\upvarepsilon}\right)}^2}, $$
(7d)
where \( {\varepsilon}_i=\left({\widehat{z}}_i^{t_{map}}-{z}_i^{t_{grnd}}\right) \) and \( \overline{\varepsilon}=\frac{1}{n}{\displaystyle \sum_{i\in S}{\varepsilon}_i} \) (Särndal et al. 1992, Sect. 6.5; Särndal 2011). The potential advantage of the GREG estimators is that \( {\displaystyle \sum_{i\in S}{\left({\varepsilon}_i-\overline{\varepsilon}\right)}^2} \) from Eq. (7d) may be smaller than \( {\displaystyle \sum_{i\in S}{\left({z}_i^{t_{grnd}}-{\widehat{\mu}}_{SRS}^{t_{grnd}}\right)}^2} \) from Eq. (6b), in which case \( V\widehat{a}r\left({\widehat{\mu}}_{GREG}^{t_{grnd}}\right) \) should be smaller than \( V\widehat{a}r\left({\widehat{\mu}}_{SRS}^{t_{grnd}}\right) \). The GREG bias and variance estimates are generally expected to be smaller when \( |t_{map} - t_{grnd}| \) is smaller. However, the estimator is still, at worst, nearly unbiased, regardless of the difference in dates.
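Eqs. (7a) to (7d) can be sketched together as one function. The argument names are hypothetical; the map predictions for the sample units are passed separately from the predictions for all N population units.

```python
def greg_estimate(map_all, map_sample, ground_sample):
    """Return (mu_init, bias, mu_greg, variance) per Eqs. (7a)-(7d).

    map_all: map predictions for all N population units.
    map_sample: map predictions for the n sample units.
    ground_sample: ground observations for the same n units.
    """
    n = len(ground_sample)
    mu_init = sum(map_all) / len(map_all)                      # Eq. (7a)
    eps = [zh - z for zh, z in zip(map_sample, ground_sample)]
    bias = sum(eps) / n                                        # Eq. (7b)
    mu_greg = mu_init - bias                                   # Eq. (7c)
    variance = sum((e - bias) ** 2 for e in eps) / (n * (n - 1))  # Eq. (7d)
    return mu_init, bias, mu_greg, variance
```

Because the variance depends on the spread of the residuals around their mean, a map that is wrong by the same amount everywhere is fully corrected at no cost in precision. Precision is lost when map errors vary among units, as happens when change is uneven across the study area.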

2.3.4 Stratified estimators

For the Våler study area, the unequal sampling intensities within strata necessitated use of stratified estimators. Because the plots were distributed systematically, the within-strata sample sizes were considered random rather than fixed as would be the case for stratified sampling. Thus, the post-stratified (STR) estimators as provided by Cochran (1977) were used,
$$ {\widehat{\mu}}_{\mathrm{STR}}={\displaystyle \sum_{\mathrm{h}=1}^{\mathrm{H}}{\mathrm{w}}_{\mathrm{h}}\cdot {\widehat{\mu}}_{\mathrm{h}}}, $$
(8a)
and
$$ V\widehat{a}r\left({\widehat{\mu}}_{\mathrm{STR}}\right)={\displaystyle \sum_{\mathrm{h}=1}^{\mathrm{H}}\left[{\mathrm{w}}_{\mathrm{h}}\cdot \frac{{\widehat{\upsigma}}_{\mathrm{h}}^2}{\mathrm{n}}+\left(1-{\mathrm{w}}_{\mathrm{h}}\right)\cdot \frac{{\widehat{\upsigma}}_{\mathrm{h}}^2}{{\mathrm{n}}^2}\right]}, $$
(8b)
where n is the total sample size, h = 1, …, H denote the strata, \( w_h \) are the strata weights calculated as the proportions of the study area in strata, and the within-strata means and variances, \( {\widehat{\mu}}_h \) and \( {\widehat{\sigma}}_h^2 \), are estimated using both the SRS and the GREG estimators.
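Eqs. (8a) and (8b) can be sketched as below. The function name is hypothetical, and the within-strata means and variances are assumed to have been computed beforehand with either the SRS or the GREG estimators.

```python
def post_stratified(weights, stratum_means, stratum_variances, n):
    """Return (mean, variance) per Eqs. (8a) and (8b).

    weights: stratum weights w_h, summing to 1.
    stratum_variances: within-stratum variance estimates sigma_h^2.
    n: total sample size over all strata.
    """
    mean = sum(w * m for w, m in zip(weights, stratum_means))
    variance = sum(
        w * s2 / n + (1.0 - w) * s2 / n ** 2
        for w, s2 in zip(weights, stratum_variances)
    )
    return mean, variance
```

As a check of Eq. (8a), the stratum weights and within-stratum SRS means reported for the 1999 ground data in Table 2 combine to about 109.5 Mg/ha, which agrees with the reported study-area estimate of 109.47 to within rounding of the weights.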

2.3.5 Estimating mean proportion forest and biomass per unit area

For the Minnesota study area, three estimates of proportion forest were calculated for each \( t_{grnd} \in [2000, 2009] \): one using the SRS estimators, one using the GREG estimators and the 2002 map, and one using the GREG estimators and the 2007 map. For the Våler study area, three estimates of mean AGB were calculated for each of 1999 and 2010: one using the SRS estimators within strata, one using the GREG estimators and the 1999 map within strata, and one using the GREG estimators and the 2010 map within strata. Under the assumption of unbiasedness of the estimators, differences in the three estimates for the same ground data year should be small.

3 Results

For the Minnesota study area, the logistic regression models adequately represented the relationships between the probability of forest and Landsat variables (Fig. 3). If \( {\widehat{p}}_i<0.5 \) is used to predict non-forest and \( {\widehat{p}}_i\ge 0.5 \) is used to predict forest, the overall accuracies of the 2002 and 2007 classifications were 0.89 and 0.92, respectively. For the Våler study area, \( R^{2*} \) values for the eight biomass models, one for each of the four strata for each of the 2 years, were in the range 0.72–0.96 with six of the eight greater than 0.90; these large \( R^{2*} \) values were reflected in the strong and similar relationships between observations and model predictions for both 1999 and 2010 (e.g., Fig. 4).
Fig. 3

Accuracy of logistic regression model predictions for Minnesota study area

Fig. 4

Group means of biomass observations versus group means of predictions for stratum 1 and 1999 for Våler study area

For the Minnesota study area, annual estimates of proportion forest for each year in the 2000–2009 period were similar, regardless of the estimation approach (Table 1). The bias estimates were uniformly small, not more than 15 % of the estimates of the means, and had little effect on the GREG estimates. SE estimates were also uniformly small, not more than 4 % of the means, with the GREG estimates slightly smaller than the SRS estimates.
Table 1

Estimates of proportion forest area for Minnesota study area

SRS: simple random sampling estimators. GREG: model-assisted regression estimators, by map year \( t_{map} \).

| Ground data year, \( t_{grnd} \) | Sample size | SRS \( {\widehat{\mu}}_{SRS} \) | SRS SE | 2002 map \( {\widehat{\mu}}_{init} \) | 2002 map Bîas | 2002 map \( {\widehat{\mu}}_{GREG} \) | 2002 map SE | 2007 map \( {\widehat{\mu}}_{init} \) | 2007 map Bîas | 2007 map \( {\widehat{\mu}}_{GREG} \) | 2007 map SE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 252 | 0.709 | 0.028 | 0.643 | −0.033 | 0.676 | 0.025 | 0.639 | −0.019 | 0.659 | 0.025 |
| 2001 | 238 | 0.637 | 0.031 | 0.643 | 0.028 | 0.615 | 0.024 | 0.639 | 0.018 | 0.621 | 0.025 |
| 2002 | 249 | 0.721 | 0.028 | 0.643 | −0.026 | 0.668 | 0.022 | 0.639 | −0.022 | 0.661 | 0.022 |
| 2003 | 247 | 0.700 | 0.029 | 0.643 | −0.044 | 0.687 | 0.023 | 0.639 | −0.003 | 0.643 | 0.022 |
| 2004 | 245 | 0.679 | 0.030 | 0.643 | 0.002 | 0.641 | 0.024 | 0.639 | 0.006 | 0.633 | 0.023 |
| 2005 | 252 | 0.707 | 0.028 | 0.643 | −0.020 | 0.673 | 0.025 | 0.639 | −0.017 | 0.656 | 0.025 |
| 2006 | 238 | 0.662 | 0.030 | 0.643 | 0.002 | 0.641 | 0.022 | 0.639 | −0.007 | 0.647 | 0.023 |
| 2007 | 249 | 0.750 | 0.027 | 0.643 | −0.055 | 0.698 | 0.020 | 0.639 | −0.051 | 0.690 | 0.019 |
| 2008 | 247 | 0.749 | 0.028 | 0.643 | −0.093 | 0.734 | 0.021 | 0.639 | −0.022 | 0.692 | 0.021 |
| 2009 | 245 | 0.699 | 0.029 | 0.643 | −0.018 | 0.661 | 0.023 | 0.639 | −0.014 | 0.653 | 0.022 |


For the Våler study area, the three population-level estimates of mean AGB were similar for 1999 and also for 2010 (Table 2). With the exception of stratum 2, the within-strata estimates were also similar. When the ground and map years were the same, all within-strata GREG bias estimates were less than 5 % of the estimated means, and the GREG SE estimates for the entire study area were less than 2 % of the estimated means with the latter considerably smaller than the SRS estimates. However, when the ground and map years differed, estimates of bias were considerably larger which, in turn, caused the SE estimates to be larger. Nevertheless, for the 2010 ground data and the 1999 map, the GREG SE estimate was still smaller proportionally by 0.13 than the SRS estimate, but such was not the case for the 1999 ground data and the 2010 map for which the GREG SE estimate was larger proportionally by 0.25 than the SRS estimate.
Table 2

Estimates of mean biomass per unit area (Mg/ha) for the Våler study area

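Within-stratum estimators. SRS: simple random sampling estimators. GREG: model-assisted regression estimators, by map year \( t_{map} \).

| Ground data year, \( t_{grnd} \) | Stratum number | Stratum weight | Sample size | SRS \( {\widehat{\mu}}_{SRS} \) | SRS SE | 1999 map \( {\widehat{\mu}}_{init} \) | 1999 map Bîas | 1999 map \( {\widehat{\mu}}_{GREG} \) | 1999 map SE | 2010 map \( {\widehat{\mu}}_{init} \) | 2010 map Bîas | 2010 map \( {\widehat{\mu}}_{GREG} \) | 2010 map SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1999 | 1 | 0.126 | 31 | 49.21 | 6.96 | 48.84 | 0.16 | 48.69 | 2.18 | 112.86 | 66.77 | 46.09 | 5.37 |
| 1999 | 2 | 0.231 | 55 | 114.89 | 7.92 | 96.82 | −0.02 | 96.84 | 2.17 | 143.77 | 55.92 | 87.85 | 6.03 |
| 1999 | 3 | 0.269 | 58 | 153.79 | 8.93 | 156.74 | −0.11 | 156.85 | 4.46 | 124.83 | −33.61 | 158.44 | 13.00 |
| 1999 | 4 | 0.374 | 32 | 94.59 | 6.73 | 92.23 | −4.45 | 96.68 | 3.68 | 101.75 | 2.06 | 99.69 | 8.54 |
| 1999 | All | 1.000 | 176 | 109.47 | 3.89 | | | 106.84 | 1.78 | | | 105.98 | 4.85 |
| 2010 | 1 | 0.126 | 31 | 116.07 | 9.80 | 48.84 | −66.70 | 115.55 | 6.03 | 112.86 | −0.09 | 112.95 | 2.47 |
| 2010 | 2 | 0.231 | 55 | 171.23 | 11.98 | 96.82 | −56.36 | 153.18 | 6.85 | 143.77 | −0.42 | 144.18 | 3.67 |
| 2010 | 3 | 0.269 | 58 | 118.47 | 14.96 | 156.74 | 35.21 | 121.53 | 14.39 | 124.83 | 1.71 | 123.11 | 4.00 |
| 2010 | 4 | 0.374 | 32 | 95.77 | 9.20 | 92.23 | −5.62 | 97.85 | 9.73 | 101.75 | 0.89 | 100.87 | 2.66 |
| 2010 | All | 1.000 | 176 | 121.87 | 6.22 | | | 119.24 | 5.42 | | | 118.38 | 1.75 |
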
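The proportional differences in SE quoted for the Våler study area can be reproduced from the study-area ("All") rows of Table 2. The helper below is a hypothetical illustration of that arithmetic.

```python
def proportional_se_change(se_greg, se_srs):
    """Change in SE of GREG relative to SRS, as a proportion of the SRS SE."""
    return (se_greg - se_srs) / se_srs

# 2010 ground data with the 1999 map: GREG SE 5.42 versus SRS SE 6.22.
change_2010_ground = proportional_se_change(5.42, 6.22)
# 1999 ground data with the 2010 map: GREG SE 4.85 versus SRS SE 3.89.
change_1999_ground = proportional_se_change(4.85, 3.89)
```

Rounded to two decimals, the first value is −0.13 (GREG more precise) and the second is 0.25 (GREG less precise), matching the figures reported in the text.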

4 Discussion

For the Minnesota study area, the small bias estimates associated with the GREG estimators can be attributed to the accuracy of both the 2002 and 2007 maps and the lack of change over the 2000–2009 interval. One result is the similarity in the annual estimates of proportion forest. The slightly smaller SEs for the GREG estimators than for the SRS estimators can be attributed to the combination of the utility of the auxiliary map data and the effectiveness of the GREG estimators. Despite differences in map and ground data dates of as much as 7 years, no appreciable effects on either estimates of proportion forest area or the precision of estimates as indicated by SEs were discernible.

For the Våler study area, the 1999 population estimates of mean AGB were similar as were the within-strata estimates, except for stratum 2, regardless of the estimation method; likewise, the 2010 estimates were very similar, with the exception of stratum 2, regardless of the estimation method. The reason results were different for stratum 2 is not apparent. When the ground and map years were the same, the smaller GREG estimates of SEs relative to the SRS estimates can be attributed to the utility of the combination of map auxiliary data and the GREG estimators. However, when the ground and map years differed, the greater bias and SE estimates can be attributed to AGB change that is not reflected in the outdated maps. Further, when the ground and map years differed, the utility of the auxiliary map data for increasing precision was greatly diminished, despite the accuracy of the adjusted estimates of the means.

The beneficial features of the GREG estimators are important, particularly for the Våler study area. First, as previously noted, when the ground and map years were the same, the GREG estimates of SEs were much smaller than the SRS estimates. Second, when the ground and map years differed, the bias estimates were large, but the GREG adjustment for them compensated for the fact that the outdated maps did not reflect current ground conditions. Therefore, and perhaps most importantly, the GREG estimates for the entire study area were very similar, regardless of the ground year and map year combination and despite changes in the resource between the 2 years. However, the price to be paid for the large GREG adjustments for estimated bias was much greater SE estimates; in particular, the GREG SE estimate for the combination of the 1999 ground data and the 2010 map was larger than the SRS SE estimate.

5 Conclusions

Three conclusions can be drawn from the study. First, the generalized regression estimators use the auxiliary information in the Landsat-based forest/non-forest maps and the airborne laser scanning-based biomass maps to increase the precision of estimates. This feature of the estimators was confirmed by the smaller standard errors for the generalized regression estimates of mean proportion forest and mean AGB than for the simple random sampling estimates when ground and map years were the same.

Second, the feature of the model-assisted generalized regression estimators that corrects for estimated bias makes the estimator unbiased, or at least nearly unbiased. This feature was illustrated by the similarity in estimates of mean proportion forest for the Minnesota study area and estimates of mean AGB for the Våler study area despite using maps that were outdated by 7 to 11 years. In particular, the corrections for estimated bias produced comparable estimates of population means, regardless of the temporal differences between the ground and map data and regardless of the change in the resource between the ground and map years.

Third, the price to be paid for using outdated maps is loss of precision, particularly when substantial change in the response variable occurs between the map and ground data dates. For the Minnesota study area for which change was rare, differences in dates by as much as 7 years had only negligible effects on both bias estimates and precision. However, for the Våler study area for which change was more substantial, differences in dates by 11 years had detrimental effects on precision to the extent that in one case the simple random sampling estimates were more precise than the generalized regression estimates.
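The case in which the simple random sampling estimates were more precise can be understood from a short derivation. It is a sketch under assumptions: an equal-probability sample of size $n$, map predictions $\hat{y}_j$ and ground observations $y_j$ at the sample plots, and the commonly used variance estimators shown below; the notation is introduced here for illustration and may differ from the equations earlier in the paper.

```latex
\begin{align}
\varepsilon_j &= \hat{y}_j - y_j, \qquad j = 1, \dots, n \\
\widehat{\mathrm{Var}}\left(\hat{\mu}_{\mathrm{GREG}}\right) &= \frac{s_{\varepsilon}^{2}}{n},
\qquad
\widehat{\mathrm{Var}}\left(\hat{\mu}_{\mathrm{SRS}}\right) = \frac{s_{y}^{2}}{n} \\
s_{\varepsilon}^{2} &= s_{\hat{y}}^{2} + s_{y}^{2} - 2\, s_{\hat{y} y} \\
s_{\varepsilon}^{2} < s_{y}^{2}
&\iff s_{\hat{y}}^{2} < 2\, s_{\hat{y} y}
\iff r_{\hat{y} y} > \frac{s_{\hat{y}}}{2\, s_{y}}
\end{align}
```

Here $s^{2}$ denotes a sample variance, $s_{\hat{y} y}$ the sample covariance, and $r_{\hat{y} y}$ the sample correlation between map predictions and ground observations. Under these assumptions, the map increases precision only while that correlation exceeds $s_{\hat{y}} / (2 s_{y})$. Change in the response variable between the map and ground dates lowers the correlation, and once it falls below the threshold the map-assisted standard error exceeds the simple random sampling standard error.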

Although broad generalizations based on these two study areas are ill-advised, two observations can still be made: (1) despite relatively large temporal differences between map and ground data dates and substantial change in the response variable, the adjustment for estimated bias produced similar estimates of population means; and (2) the crucial factor affecting precision is not necessarily the temporal difference between map and ground data dates but rather the degree of change in the response variable.

Notes

Acknowledgments

The authors thank Dr. James A. Westfall, Northern Research Station, US Forest Service, for a careful and detailed review of the paper.

References

  1. Agresti A (2007) An introduction to categorical data analysis. Wiley-Interscience, Hoboken
  2. Baffetta F, Fattorini L, Franceschi S, Corona P (2009) Design-based approach to k-nearest neighbours technique for coupling field and remotely sensed data in forest surveys. Remote Sens Environ 113:463–475
  3. Cochran WG (1977) Sampling techniques, 3rd edn. Wiley, New York
  4. Crist EP, Cicone RC (1984) Application of the tasseled cap concept to simulated Thematic Mapper data. Photogramm Eng Remote Sens 50:343–352
  5. d’Oliveira MVN, Reutebuch SE, McGaughey RJ, Andersen H-E (2012) Estimating forest biomass and identifying low-intensity logging areas using airborne laser scanning in Antimary State Forest, Acre State, Western Brazilian Amazon. Remote Sens Environ 124:479–491
  6. GFOI (2013) Integrating remote-sensing and ground-based observations for estimation of emissions and removals of greenhouse gases in forests: methods and guidance from the global forest observations initiative. Group on Earth Observations, Geneva
  7. GOFC-GOLD (2014) A sourcebook of methods and procedures for monitoring and reporting anthropogenic greenhouse gas emissions and removals associated with deforestation, gains and losses of carbon stocks in forests remaining forests, and forestation. GOFC-GOLD Report version COP20-1, (GOFC-GOLD Land Cover Project Office, Wageningen University, The Netherlands). Last accessed: December 2014
  8. Gregoire TG, Ståhl G, Næsset E, Gobakken T, Nelson R, Holm S (2011) Model-assisted estimation of biomass in a LiDAR sample survey in Hedmark county, Norway. Can J For Res 41:83–95
  9. Homer C, Huang C, Yang L, Wylie B, Coan M (2004) Development of a 2001 National Landcover Database for the United States. Photogramm Eng Remote Sens 70:829–840
  10. Homer C, Dewitz J, Fry J, Coan M, Hossain N, Larson C, Herold N, McKerrow A, VanDriel JN, Wickham J (2007) Completion of the 2001 national land cover database for the conterminous United States. Photogramm Eng Remote Sens 73:337–341
  11. Kauth RJ, Thomas GS (1976) The Tasseled Cap — A graphic description of the spectral-temporal development of agricultural crops as seen by Landsat. In: Proceedings of the symposium on machine processing of remotely sensed data. Purdue University, West Lafayette, pp 41–51
  12. Marklund LG (1988) Biomass functions for pine, spruce, and birch in Sweden. Swedish University of Agricultural Sciences, Department of Forest Survey, Umeå (in Swedish)
  13. McRoberts RE (2010) Probability- and model-based approaches to inference for proportion forest using satellite imagery as ancillary data. Remote Sens Environ 114:1017–1025
  14. McRoberts RE (2011) Satellite image-based maps: Scientific inference or pretty pictures? Remote Sens Environ 115:715–724
  15. McRoberts RE, Walters BF (2012) Statistical inference for remote sensing-based estimates of net deforestation. Remote Sens Environ 124:394–401
  16. McRoberts RE, Westfall JA (2014) Effects of uncertainty in model predictions of individual tree volume on large area volume estimates. For Sci 60:34–42
  17. McRoberts RE, Hansen MH, Smith WB (2010) United States of America. In: Tomppo E, Gschwantner T, Lawrence M, McRoberts RE (eds) National forest inventories, pathways for common reporting. Springer, Heidelberg, pp 567–582
  18. McRoberts RE, Gobakken T, Næsset E (2013) Inference for lidar-assisted estimation of forest growing stock volume. Remote Sens Environ 128:268–275
  19. Næsset E, Gobakken T, Solberg S, Gregoire TG, Nelson R, Ståhl G, Weydahl D (2011) Model-assisted regional forest biomass estimation using LiDAR and InSAR as auxiliary data: a case study from a boreal forest area. Remote Sens Environ 115:3599–3614
  20. Næsset E, Bollandsås OM, Gobakken T, Gregoire TG, Ståhl G (2013a) Model-assisted estimation of change in forest biomass over an 11 year period in a sample survey supported by airborne LiDAR: a case study with post-stratification to provide activity data. Remote Sens Environ 128:299–314
  21. Næsset E, Gobakken T, Bollandsås OM, Gregoire TG, Nelson R, Ståhl G (2013b) Comparison of precision of biomass estimates in regional field sample surveys and airborne LiDAR-assisted surveys in Hedmark County, Norway. Remote Sens Environ 130:108–120
  22. Rouse JW, Haas RH, Schell JA, Deering DW (1973) Monitoring vegetation systems in the great plains with ERTS. Proceedings of the Third ERTS Symposium, NASA SP-351, Volume 1. NASA, Washington, pp 309–317
  23. Sannier C, McRoberts RE, Fichet L-V, Makaga EMK (2014) Using the regression estimator with Landsat data to estimate proportion forest cover and net proportion deforestation in Gabon. Remote Sens Environ 115:138–148
  24. Särndal C-E (2011) Combined inference in survey sampling. Pak J Stat 27:359–370
  25. Särndal C-E, Swensson B, Wretman J (1992) Model assisted survey sampling. Springer, New York, 693 pp
  26. Vibrans AC, McRoberts RE, Moser P, Nicoletti AL (2013) Using satellite image-based maps and ground inventory data to estimate the area of the remaining Atlantic forest in the Brazilian state of Santa Catarina. Remote Sens Environ 130:87–95
  27. Vogelmann JE, Howard SM, Yang L, Larson CR, Wylie B, Van Driel N (2001) Completion of the 1990s National Land Cover Data Set for the conterminous United States from Landsat Thematic Mapper data and ancillary data sources. Photogramm Eng Remote Sens 67:650–662

Copyright information

© INRA and Springer-Verlag France 2015

Authors and Affiliations

  • Ronald E. McRoberts (1), Email author
  • Erik Næsset (2)
  • Terje Gobakken (2)
  1. Northern Research Station, U.S. Forest Service, Saint Paul, USA
  2. Department of Ecology and Natural Resource Management, Norwegian University of Life Sciences, Ås, Norway
