1 Introduction

Ecological monitoring programs often produce annual estimates of abundance or biodiversity to assess changes in the status of populations or ecosystems (Marsh and Trenham 2008; Fraixedas et al. 2020). Estimating the total number of individuals in a population, or species in a community, is however a difficult task. For birds and butterflies, for instance, regional population estimates may be derived from transect counts where the proportion of missed individuals cannot be quantified and it may not be clear which spatial area the individuals counted belong to (Ralph et al. 1995, van Swaay et al. 2008). Raw abundance estimates therefore often have no easily interpreted scale. To anchor raw estimates of the ecological quantity of interest, and to put a direct focus on temporal change rather than absolute level, they are usually rescaled relative to some baseline into index values. The use of a baseline gives index values a meaningful scale, and indices are interpreted as the proportional change in the ecological quantity relative to the baseline.

Each choice of baseline gives a different index, and therefore has different associated uncertainty. A standard choice of the baseline is the first year of study (Gregory et al. 2019) so that indices represent the change in the ecological quantity relative to the first year. An issue with this approach is that uncertainty in the raw estimate in the reference year propagates into the uncertainty of all subsequent index values. This may considerably inflate uncertainty (Buckland and Johnston 2017), and is particularly problematic in cases where uncertainty of the raw estimate in the reference year is larger than in subsequent years, which is often the case if the reference year is the first year of the study and sampling effort increases over time. One approach to counter this effect is to select a year with a somewhat lower uncertainty as the reference (Fedy and Aldridge 2011), but this still means that uncertainty in the single reference year propagates into all indices. An alternative is to use the mean over multiple years instead of a single year as the reference to try to get a more stably estimated baseline. This kind of reference is not widely used in practice (but see e.g. Carlson et al. 2012, Knape 2016, Gregory et al. 2019). Another approach that has been suggested to reduce the influence of the choice of baseline (Buckland and Johnston 2017) is to use smoothing methods (Siriwardena et al. 1998; Fewster et al. 2000; Buckland et al. 2005; Soldaat et al. 2017; Harrison et al. 2014; Knape 2016). By smoothing indices over multiple years more stable indices may be obtained, but also in this case using a single reference year to anchor the smoothed index is common practice.

A broader empirical and quantitative assessment of how the choice of reference affects the uncertainty of population indices is currently lacking, and is the aim of this study. I compare indices defined from a single reference year to indices defined from reference periods consisting of sequences of years of varying lengths. I examine how the choice of reference period affects the uncertainty of annual indices estimated independently each year, and of smoothed index estimates obtained from GAMMs, for 100 bird species in Sweden.

2 Methods

For ease of presentation, I discuss abundance indices in the following sections, but the general ideas apply more broadly to ecological indices of temporal change, and particularly to biodiversity indices. Abundance indices are often derived from models with a log-link and, again for the sake of presentation, we assume that we have raw unscaled annual abundance indices over time at the log scale, \(\hat{\mu }_1\), \(\hat{\mu }_2\), ... These may, for example, be estimated from fixed year effects in a Poisson GLM or, in the case of smoothed indices, from a GAM with a Poisson response. If year 1 is used as the reference, then a standard relative abundance index in year \(t\) is

$$\begin{aligned} \hat{I}_t = \frac{\exp (\hat{\mu }_t)}{\exp (\hat{\mu }_1)} \end{aligned}$$
(1)

To evaluate the uncertainty of indices we focus on the variance of \(\log (\hat{I}_t)\), from which approximate confidence intervals and standard errors can be computed. The reason for focusing on the variance at the log scale is that it is unaffected by simple scaling of the index, e.g. the variance of \(\log (\hat{I}_t)\) is identical to that of \(\log (100 \hat{I}_t)\). The variance of \(\log (\hat{I}_t)\) can be expressed as

$$\begin{aligned} \text {V}(\log (\hat{I}_t)) = \text {V}(\hat{\mu }_t - \hat{\mu }_1) = \text {V}(\hat{\mu }_t) + \text {V}(\hat{\mu }_1) - 2 \text {Cov}(\hat{\mu }_t, \hat{\mu }_1) \end{aligned}$$
(2)

This shows that the uncertainties of both \(\hat{\mu }_1\) and \(\hat{\mu }_t\) contribute to the variance of \(\log (\hat{I}_t)\), and that the variance of \(\hat{\mu }_1\) will dominate if the uncertainty of \(\hat{\mu }_1\) is larger than that of \(\hat{\mu }_t\), which is often the case in practice.
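As a small numerical illustration of Eq. 2, the sketch below evaluates the variance of the log index for a noisy and for a precise reference year. The function name and all variance values are hypothetical and chosen only for illustration; they are not taken from the Swedish Bird Survey.

```python
def log_index_variance(var_t, var_ref, cov_t_ref=0.0):
    """Variance of log(I_t) = mu_t - mu_ref, following Eq. 2."""
    return var_t + var_ref - 2.0 * cov_t_ref


# Hypothetical variances: the focal year is estimated from many routes,
# the noisy reference year from few routes.
var_focal = 0.01
var_noisy_ref = 0.09
var_precise_ref = 0.01

v_noisy = log_index_variance(var_focal, var_noisy_ref)
v_precise = log_index_variance(var_focal, var_precise_ref)

# Fraction of the log-index variance that is due to the reference year alone
# (covariance assumed to be zero in this illustration).
share_from_reference = var_noisy_ref / v_noisy
```

With these made-up values, most of the variance of the log index comes from the reference year rather than from the year of interest.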

The approach evaluated here is to try to reduce the part of the variance in \(\log (\hat{I}_t)\) that is due to uncertainty about the index in the reference year by using multiple reference years. Using 2 years as the reference we can define an alternative index

$$\begin{aligned} \hat{I}_t = \frac{\exp (\hat{\mu }_t)}{(\exp (\hat{\mu }_1) + \exp (\hat{\mu }_2))/2} \end{aligned}$$
(3)

The arithmetic mean of the raw index in the first 2 years is the reference for this index, and we would often expect the alternative index to have less uncertainty due to the denominator being more precisely estimated. This idea can be extended to use the mean over the first \(l\) years as the reference:

$$\begin{aligned} \hat{I}_t = \frac{\exp (\hat{\mu }_t)}{\frac{1}{l} \sum _{j=1}^{l} \exp (\hat{\mu }_j)} \end{aligned}$$
(4)

As \(l\) increases we would expect the uncertainty of \(\log (\hat{I}_t)\) to decrease further. Whether, and to what extent, this happens in practice is the focus of this paper. I will explore this in an analysis of monitoring data but first briefly describe how the uncertainty of the reference period indices may be computed in practice.
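The index in Eq. 4 can be computed directly from the log-scale estimates. The following is a minimal sketch in Python; the function name and the example values of \(\hat{\mu }_t\) are hypothetical, and years are numbered from 1 as in the text.

```python
import math


def reference_period_index(mu, t, l):
    """Index for year t relative to the arithmetic mean of the first l years (Eq. 4).

    mu is a list of log-scale abundance estimates; t and l are 1-based.
    """
    baseline = sum(math.exp(m) for m in mu[:l]) / l
    return math.exp(mu[t - 1]) / baseline


# Hypothetical raw abundance estimates of 80, 120, 100 and 90 on the log scale.
mu_hat = [math.log(80.0), math.log(120.0), math.log(100.0), math.log(90.0)]

index_single = reference_period_index(mu_hat, t=4, l=1)  # baseline is year 1 (Eq. 1)
index_two = reference_period_index(mu_hat, t=4, l=2)     # baseline is mean of years 1-2 (Eq. 3)
```

Setting l = 1 recovers the single reference year index of Eq. 1, so the same function covers both definitions.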

2.1 Computing uncertainty estimates for reference period indices

In contrast to single year reference indices, \(\log (\hat{I}_t)\) for reference period indices is a non-linear function of the \(\hat{\mu }_j\) in the reference period. This can make it more difficult to compute uncertainty estimates. One approach is to use a delta approximation (see Appendix A). A second approach is simulation methods such as bootstrapping, where all \(\hat{\mu }_t\) are generated according to some simulation procedure and the non-linear function \(\log (\hat{I}_t)\) is computed from the samples. A third approach is Bayesian methods based on Monte Carlo integration, used for example for indices for the North American breeding bird survey (Link and Sauer 2002). In such cases, Monte Carlo samples of the \(\hat{\mu }_t\) will usually be available and \(\log (\hat{I}_t)\) can be computed for each sample to yield a posterior distribution. A fourth option would be to use the geometric mean of the \(\exp (\hat{\mu }_j)\) as the reference instead of the arithmetic mean. Then \(\log (\hat{I}_t)\) would be a linear function of the \(\hat{\mu }_j\) and the uncertainty could be computed from contrasts.
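The fourth option can be made concrete. With a geometric mean baseline, \(\log (\hat{I}_t) = \hat{\mu }_t - \frac{1}{l}\sum _{j=1}^{l} \hat{\mu }_j\) is a linear contrast, so its variance is a quadratic form in the covariance matrix of the estimates. The sketch below assumes a hypothetical diagonal covariance matrix (independent annual estimates) purely for illustration, and it applies to the geometric mean variant, not to the arithmetic mean index of Eq. 4.

```python
def geometric_reference_variance(cov, t, l):
    """Variance of log(I_t) when the baseline is the geometric mean of years 1..l.

    log(I_t) = mu_t - (1/l) * sum_j mu_j is a linear contrast c'mu,
    so its variance is c' * cov * c. Years t and l are 1-based.
    """
    n = len(cov)
    c = [0.0] * n
    c[t - 1] += 1.0
    for j in range(l):
        c[j] -= 1.0 / l
    return sum(c[i] * cov[i][j] * c[j] for i in range(n) for j in range(n))


# Hypothetical covariance matrix: independent estimates with decreasing variance,
# mimicking sampling effort that increases over time.
variances = [0.09, 0.04, 0.04, 0.01]
cov_hat = [[variances[i] if i == j else 0.0 for j in range(4)] for i in range(4)]

var_l1 = geometric_reference_variance(cov_hat, t=4, l=1)
var_l2 = geometric_reference_variance(cov_hat, t=4, l=2)
var_l3 = geometric_reference_variance(cov_hat, t=4, l=3)
```

In this made-up example the variance of the log index falls as the reference period grows, because the baseline contribution is averaged over more years.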

Reference period indices are currently implemented in the R-packages rtrim (Bogaart et al. 2020) and poptrend (Knape 2016). The rtrim package uses a delta approximation (Appendix A). The poptrend package, used in the case study on Swedish birds and simulations below, instead uses a simulation approach. Simulated parameter estimates are drawn from a multivariate normal distribution with covariance matrix equal to a covariance matrix for the parameter estimates (Wood 2006b; Mandel 2013; Harrison et al. 2014).
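The simulation approach can be sketched in a few lines. The code below is not the poptrend implementation; it is a standard-library illustration that draws parameter vectors from a multivariate normal distribution and evaluates \(\log (\hat{I}_t)\) for each draw. The estimates, the covariance matrix, the number of draws, and the use of an equal-tailed percentile interval are all assumptions made for the example.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_log_index(mu, cov, t, l, n_draws, rng):
    """Draw mu* ~ N(mu, cov) and return log(I_t) per draw (arithmetic mean baseline, Eq. 4)."""
    L = cholesky(cov)
    n = len(mu)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sim = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        baseline = sum(math.exp(m) for m in sim[:l]) / l
        draws.append(sim[t - 1] - math.log(baseline))
    return draws


def interval_width(draws, level=0.95):
    """Width of an equal-tailed percentile interval (an assumed interval type)."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return hi - lo


# Hypothetical log-scale estimates and covariance matrix for four years.
mu_hat = [math.log(100.0)] * 4
cov_hat = [
    [0.09, 0.01, 0.0, 0.0],
    [0.01, 0.04, 0.0, 0.0],
    [0.0, 0.0, 0.04, 0.0],
    [0.0, 0.0, 0.0, 0.01],
]

rng = random.Random(1)
width_single = interval_width(simulate_log_index(mu_hat, cov_hat, t=4, l=1, n_draws=4000, rng=rng))
width_three = interval_width(simulate_log_index(mu_hat, cov_hat, t=4, l=3, n_draws=4000, rng=rng))
```

Because the non-linear index is simply recomputed for each simulated parameter vector, the same draws can be reused for any choice of reference period.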

2.2 Case study

To investigate how multiple reference years affect the uncertainty of population indices in practice, I analyzed data from the Swedish Bird Survey (Lindström and Green 2020). These data consist of annual line transect counts of birds from about 700 survey routes spread across a regular grid over Sweden. Not all routes are surveyed in every year. The survey was initiated in 1996 when 47 routes were surveyed, increasing to 84 in 1997, 166 in 1998, 179 in 1999, and 203 in 2000. The number of routes then continued to increase and 400–500 routes are now surveyed annually. Because so few routes were surveyed in the first 2 years, 1998 is used as the single reference year to compute official indices.

I ranked the species in the survey according to the number of non-zero counts across all years and routes. I then selected the 100 species with the most non-zero counts for analyses of the impact of the baseline on index uncertainty. For each species, I first removed routes that had only zero counts and then fitted (I) a negative binomial regression model with a log link, and route and year as fixed factors and (II) a negative binomial GAMM model with log link, route as a fixed factor, and year both as a smooth function and as a random effect (Knape 2016). The smooth function was implemented as a cubic regression spline. To get a uniform analysis across all 100 species I fixed the number of degrees of freedom in the GAMM at 8. This conforms with other empirical studies of bird census data (Fewster et al. 2000). I fixed the degrees of freedom since model selection of degrees of freedom in the GAMM analysis can lead to smooth functions that are near linear (1 degree of freedom). In this case there is a simple approximate relationship between uncertainty and the choice of reference period, which is examined further in Appendix B. The random time effect in the GAMM model was included to handle short term variation in abundance, and can affect uncertainty of the estimated trend (Knape 2016). The overdispersion parameter of the negative binomial distribution was estimated rather than fixed a priori, but was not included in the covariance matrix used to estimate uncertainty. Indices were then computed from the estimated year effects for (I) (hereafter referred to as ‘independently estimated annual indices’), or from the estimated smooth function evaluated at the years of interest (see below) for (II) (referred to as ‘smoothed indices’).

To evaluate the effect of choice of reference year or period on uncertainty in the resulting indices, I compared the uncertainty of index values representing year 2020 computed from different reference periods. I used two sets of reference periods, one containing years early in the series when the number of surveyed routes was low, and another containing years in the middle part of the series when more routes were surveyed. Specifically, the first set had reference periods of varying lengths that all ended in 2000 (i.e. 2000, 1999–2000, 1998–2000, etc), and the second set had periods that all ended in 2010 (2010, 2009–2010, 2008–2010 etc). The longest reference period used all 15 years from 1996 to 2010. For the same species and model, all reference period indices were computed from the same model fit. In other words, all years were included in model fits irrespective of the reference period.
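The two sets of reference periods can be enumerated programmatically. The helper below is a hypothetical illustration of the design described above, with 1996 as the first survey year; it is not part of the published analysis code.

```python
def reference_periods(end_year, first_year=1996):
    """All reference periods ending in end_year, from a single year back to first_year."""
    return [
        list(range(start, end_year + 1))
        for start in range(end_year, first_year - 1, -1)
    ]


early = reference_periods(2000)   # 2000, 1999-2000, ..., 1996-2000
middle = reference_periods(2010)  # 2010, 2009-2010, ..., 1996-2010
```

The first set contains 5 periods and the second 15, the longest covering every year from 1996 to 2010.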

For all models and baselines, the amount of uncertainty was measured via the width of a 95% confidence interval for the log scale index in 2020. Data were analyzed using the R-package poptrend (Knape 2016), which uses mgcv (Wood 2006a) as the model fitting engine, and computes confidence intervals using the simulation procedure described above (Wood 2006b; Mandel 2013).

To complement the case study with a more controlled set up, I also analyzed simulated data. As the results were largely similar to the results of the case study the details are provided in Appendix C. R-code for the analyses is available in Supplement 1.

3 Results

3.1 Independently estimated annual indices

For reference periods at the start of the series (ending in 2000), increasing the number of years in the reference period from one to two reduced the width of confidence intervals by between a few percent and almost 60%, with a median of 18% (Fig. 1a). With three reference years the median reduction was around 24% compared to the single year 2000. Also including the first 2 years, in which fewer routes were sampled, did not, on average, lead to further reductions but instead to slight increases. In a very small number of cases, a reference period including years early in the series led to higher uncertainty than a single reference year.

Fig. 1

Boxplots of the relative uncertainty of indices for 100 bird species for independently estimated annual indices (a and b) and for smoothed indices (c and d) when using reference periods of varying length ending in 2000 (a and c) or ending in 2010 (b and d). Uncertainty is measured via the width of 95% confidence intervals at the log scale for indices in year 2020. In panels a and b, uncertainty is relative to the uncertainty of independently estimated annual indices with year 2000 as the baseline, and in c and d relative to smooth indices with 2000 as the baseline

Using a single reference year (2010) in the middle of the series gave uncertainty in the same range as when using multiple reference years early in the series (Fig. 1b). With longer reference periods ending in the middle of the series, the width of confidence intervals was reduced by up to over 60%, with a median reduction of over 40% and a reduction of at least 25% for each species, compared to indices with year 2000 as the reference. Including the first few years in these reference periods led to only slight increases in uncertainty (Fig. 1b).

3.2 Smoothed indices

For reference periods at the start of the series, adding more baseline years did not have a strong effect on uncertainty in the smoothed indices (Fig. 1c). Indices with reference periods ending in the middle of the series had lower uncertainty (Fig. 1d). This was not only a consequence of more recent years being included in the reference period, as extending the reference period backwards in time gave less index uncertainty than using only year 2010 as the reference (Fig. 1d). This is the opposite of what one would expect for a log-linear index, for which uncertainty is mainly determined by the location of the midpoint of the reference period (Appendix B).

The magnitude of reduction in uncertainty for longer reference periods was lower than for independently estimated annual indices. The maximum reduction in uncertainty was on average around 15% compared to using year 2000 as the reference (Fig. 1d).

All results reported above are scaled by the width of confidence intervals for an index with the single year 2000 as the reference period ((width of the confidence interval for index with reference period)/(width of the confidence interval for index with year 2000 as the reference)). This scaling removes heterogeneity in estimation errors due to species properties, such as abundance and overdispersion. The variability in this scaling factor (the denominator of the ratio) is presented in Fig. 2.

Fig. 2

Boxplots of the uncertainty of indices for year 2020 for 100 bird species and independently estimated annual indices and smooth indices when using the single reference year 2000. Uncertainty is measured via the width of 95% confidence intervals at the log scale for indices in year 2020

An illustration of the different models fitted to data on willow warblers can be found in Appendix D (Fig. 8).

4 Discussion

Choosing a baseline to anchor indices of population or biodiversity status is often necessary for meaningful presentation of change. The typical baseline choice is a single year in early parts of the series. The results of this study show that when annual indices are independently estimated, uncertainty can be greatly reduced by redefining the index using a longer reference period as the baseline. A single reference year may give the impression that uncertainty about population change or biodiversity is large, while in fact the main portion of uncertainty comes from asserting the level in the specific reference year. A longer reference period may therefore be beneficial and give a more accurate picture of uncertainty.

Previous studies have suggested that smoothed indices are less sensitive to the choice of reference year or period (Buckland and Johnston 2017). The results of this study confirm this empirically in that the length of reference periods had less impact on uncertainty than for independently estimated annual indices, and for all baselines there was less variation in uncertainty. Even so, smooth indices with long reference periods had, on average, around 15% less uncertainty than smooth indices with a single reference year. Longer reference periods therefore can be useful also for smooth indices, as long as the smooth index is not near linear in which case there is not much to gain from using reference periods instead of a single year (Appendix B).

When deciding on a baseline, the first priority should be a choice that reflects the purpose of the index. If the purpose is to compare the current status to the status in a specific year in the past, then a single reference year is appropriate despite potentially high uncertainty. In other cases, a reference period may fit well with the purpose of the index. Using a moving ten-year period as the reference may, for example, fit well with IUCN red list assessments, where change during the last ten-year period is an important criterion. In the context of consequences of climate change, a reference period coinciding with a climate normal period such as 1961–1990 could be suitable. Often there may not be an obvious year or period against which comparisons should be made, as the main purpose of many indices is to understand how the size of a population, or the biodiversity of a community, has changed relative to the past in a looser sense. In such cases, choosing a baseline so that the index does not convey irrelevant uncertainty should be an important consideration. In light of the results here, using the mean over a large part, or all, of the series, or over the previous ten-year period (Buckland and Johnston 2017), seems like a reasonable default choice in such situations.

Alternative suggestions have been to entirely remove the influence of baseline years by focusing on the slope or curvature of estimated smooth index curves (Buckland and Johnston 2017). Specifically, p-values for whether the first or second derivative of the smooth curve deviates from zero may be computed (Fewster et al. 2000), and one may use these to produce indicators for periods where the change in the curve, or the slope of the curve, is significant. Such indicators are a highly useful complement to indices, but address a partially different question. They are indicators for periods of change in status, but do not provide direct estimates of the cumulative magnitude of the change. Both of these are of prime interest for monitoring, and can be simultaneously presented in displays of indicators.

Uncertainty is an important but sometimes neglected component of population indices (Fraixedas et al. 2020). It is imperative that uncertainty estimates account for the main sources of error, which mainly comes down to a sound choice of model for producing raw index estimates. Given that important sources of error in the data have been properly accounted for, indices should be presented in a way that does not include irrelevant uncertainty. Smoothing indices and using longer reference periods, separately or in combination, are useful approaches to achieve this.