Smoothing and trend detection in waterbird monitoring data using structural time-series analysis and the Kalman filter
Cite this article as: Soldaat, L., Visser, H., van Roomen, M. et al. J Ornithol (2007) 148(Suppl 2): 351. doi:10.1007/s10336-007-0176-7
Many wildlife-monitoring programmes have long time series of species abundance that cannot be summarized adequately by linear trend lines. To describe long time series better, generalized additive models may be used to obtain a smooth trend line through abundance data. We describe another approach to estimating a smoothed trend line through time series consisting of one observation per time point, such as a year or month. This method is based on structural time-series models in combination with the Kalman filter and is implemented in the TrendSpotter software. One of its strengths is the ability to test changes in smoothed abundances between years, taking serial correlation into account. The trend method is applied in the Dutch Waterbird Monitoring Scheme (DWMS), a monitoring scheme for migrating and overwintering waterbirds. Taking the numbers of overwintering Greater Scaup (Aythya marila) in the Netherlands as an example, we demonstrate three applications of the method: (1) trend calculation and classification for each year in the time series, (2) assessment of alerts for alarming population declines and (3) testing yearly abundance against a population threshold. We discuss the situations in which TrendSpotter is to be preferred over other methods.
Keywords: Kalman filter · Structural time-series analysis · Population alert · Trend analysis · Generalized additive model (GAM) · TRends and Indices for Monitoring data (TRIM) · Aythya marila · Greater Scaup
Long time series of wildlife data are becoming more widespread now that many countries have well-established monitoring programmes. In the Netherlands, nature monitoring was strongly intensified and standardized in the last decade of the twentieth century. This development was mainly driven by the wish to base nature policy on sound knowledge of trends in animal and plant numbers and by international regulations such as the European Union’s Habitats and Birds Directives. The Dutch Network for Ecological Monitoring (NEM; van Strien 2006) now contains 14 monitoring schemes for different species groups, ranging from butterflies and plants to birds. Most time series cover periods of 10–15 years, but time series of bird species span more than 20 years. The longest time series, starting in 1975, come from the Dutch Waterbird Monitoring Scheme.
In most NEM monitoring schemes data analysis is performed using log-linear Poisson regression, a form of generalized linear models (GLMs) (McCullagh and Nelder 1989; ter Braak et al. 1994). The programme used is TRends and Indices for Monitoring data (TRIM) (Pannekoek and van Strien 2001), which calculates yearly indices per species and also provides an overall linear trend estimate that is based on the yearly indices and is meant to describe the change over the entire study period. However, long time series may show alternating periods of increase and decrease, and linear trends do not summarize these time series adequately. Furthermore, to separate patterns of genuine change from annual fluctuations, it is helpful to apply a smoothing algorithm. Although TRIM is able to assess trends in parts of the time series by using change points, smoothing would require other techniques such as Loess estimators or generalized additive models (GAMs).
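To make the idea of a log-linear trend concrete, the sketch below fits `log E[y_t] = a + b * t` to a single series of counts by Poisson maximum likelihood (Newton–Raphson) and returns `exp(b)`, the multiplicative change per year. It is a simplified illustration, not TRIM: TRIM also models site effects, overdispersion and serial correlation, none of which appear here, and the function name `loglinear_trend` is hypothetical.

```python
import numpy as np


def loglinear_trend(counts, n_iter=50, tol=1e-10):
    """Fit log E[y_t] = a + b * (t - mean t) by Poisson maximum likelihood.

    Returns exp(b), the estimated multiplicative change per year
    (1.0 means no overall trend).
    """
    y = np.asarray(counts, dtype=float)
    t = np.arange(len(y), dtype=float)
    # Centring the year covariate improves numerical conditioning.
    X = np.column_stack([np.ones_like(t), t - t.mean()])
    beta = np.array([np.log(y.mean()), 0.0])   # start at "no trend"
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                  # fitted counts
        score = X.T @ (y - mu)                 # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])         # Fisher information matrix
        step = np.linalg.solve(info, score)    # Newton-Raphson step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return float(np.exp(beta[1]))
```

A series that rises and then falls by the same amount, such as `[100, 200, 400, 200, 100]`, returns a factor of 1.0 ("no trend"), which is exactly the limitation of a single linear trend described above.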
GAMs are among the most widely used methods for smoothing time series. The smoothed trend line of a GAM usually looks much like a moving average, but in addition a GAM quantifies the uncertainty of the trend through confidence limits. GAMs are used for the analysis of bird monitoring data, for instance in the Breeding Bird Scheme and the Wetland Bird Survey in the UK (Fewster et al. 2000; Atkinson et al. 2006). However, GAMs are not satisfactory in every situation. In such cases, an alternative smoothing method from the class of structural time-series models, in combination with the Kalman filter, may be helpful (Visser 2004, 2005, 2007). In this paper we demonstrate the main characteristics of this method and some applications, using the TrendSpotter software package. We first give a brief introduction to TrendSpotter and compare the characteristics of three methods of analysis (TRIM, GAM and TrendSpotter). We then focus on the application of TrendSpotter to counts of overwintering Greater Scaup (Aythya marila) in The Netherlands.
Materials and methods
Data and data processing
In the Dutch Waterbird Monitoring Scheme (DWMS) waterbirds are counted on all important water bodies in the Netherlands. The counts mainly concern migrating and overwintering birds. Counts are carried out in 8–12 months per season, depending on the region (Soldaat et al. 2004; van Roomen et al. 2006a, b).
Although the data contain many missing values, because not all water bodies are counted each month, the scheme comes close to a total census for many species. For these species a large part of the total population is counted, and the sum of all birds at individual sites is a good approximation of the number of birds in the whole country. For the Greater Scaup, for instance, probably more than 90% of all birds present in the country have been counted each year since 1990.
First, missing monthly counts were imputed with UINDEX (Bell 1995; Underhill and Prŷs-Jones 1994) at the level of so-called monitoring sites, which include all important wetlands. We did not use TRIM here, because TRIM cannot cope with monthly counts. Before imputation, the sites were grouped into 11 regional strata in which a more or less comparable population development was expected.
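Underhill-type imputation (Underhill and Prŷs-Jones 1994) is based on a multiplicative model in which the expected count at a site in a given month is the product of a site factor and a month factor. The sketch below fills the missing cells of a site × month table by iterating that model until the imputed values stop changing. It illustrates the idea under that assumption only; it is not UINDEX, which may include further factors (for example years) and other convergence rules, and the name `impute_site_month` is hypothetical.

```python
import numpy as np


def impute_site_month(counts, tol=1e-8, max_iter=1000):
    """Impute missing counts in a sites x months table (np.nan = missing).

    Missing cells are repeatedly replaced by row_total * column_total / grand_total,
    the fitted value of a multiplicative site x month model.
    """
    y = np.array(counts, dtype=float)
    missing = np.isnan(y)
    filled = np.where(missing, np.nanmean(y), y)   # crude starting values
    for _ in range(max_iter):
        row = filled.sum(axis=1, keepdims=True)    # site totals
        col = filled.sum(axis=0, keepdims=True)    # month totals
        expected = row * col / filled.sum()
        new = np.where(missing, expected, y)       # observed cells stay fixed
        if np.max(np.abs(new - filled)) < tol:
            return new
        filled = new
    return filled
```

If the observed counts follow the multiplicative pattern exactly, the imputed value converges to the value that pattern implies.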
After the imputed monthly counts had been summed to seasonal sums per monitoring site, seasonal sums in which more than 90% of the birds were imputed were set to missing, in order to exclude the extremely high imputed values that are occasionally generated in the first step. These missing values were then imputed again in a database that was not divided into regional strata.
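The screening step (summing imputed monthly counts per site and discarding seasonal sums in which more than 90% of the birds were imputed) can be sketched as follows. Both arrays are sites × months: `observed` holds the raw counts with `np.nan` for months that were not counted, and `imputed` holds the completed table. The function name is hypothetical, and the subsequent re-imputation in an unstratified database is not shown.

```python
import numpy as np


def seasonal_sums(observed, imputed, max_imputed_share=0.9):
    """Seasonal sum per site, set to np.nan if too many birds were imputed."""
    obs = np.asarray(observed, dtype=float)
    imp = np.asarray(imputed, dtype=float)
    was_missing = np.isnan(obs)
    totals = imp.sum(axis=1)                                  # seasonal sum per site
    imputed_birds = np.where(was_missing, imp, 0.0).sum(axis=1)
    # Share of the seasonal sum that comes from imputed months (0 if no birds).
    share = np.divide(imputed_birds, totals,
                      out=np.zeros_like(totals), where=totals > 0)
    return np.where(share > max_imputed_share, np.nan, totals)
```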
Characteristics of the methods of analysis
TrendSpotter estimates smoothed population numbers $\mu_1, \mu_2, \ldots, \mu_t, \ldots, \mu_N$ for a time series of $N$ equidistant measurements $y_1, y_2, \ldots, y_t, \ldots, y_N$. The expectations of these smoothed population numbers are denoted $m_1, m_2, \ldots, m_t, \ldots, m_N$. TrendSpotter also estimates the standard deviations of the smoothed population numbers, $\mathrm{SD}_1, \mathrm{SD}_2, \ldots, \mathrm{SD}_t, \ldots, \mathrm{SD}_N$, and the standard deviations of the differences $\mu_N - \mu_t$, denoted here as $\mathrm{SD}_{N-t}$.
The confidence intervals are estimated from the deviations of the observed values from the smoothed line. This treats temporal variation as the main source of error, although measurement errors and errors due to imputation are implicitly included in the yearly measurements. Missing data are easily incorporated into the model. In the present version of TrendSpotter it is not possible to add information on differences in the reliability of individual measurements.
In TrendSpotter the flexibility of the trend is set by the standard deviation of a noise process. This standard deviation (parameter) can be chosen by hand or optimized by maximum likelihood (the default). If the parameter is set to zero, the estimated trend equals the ordinary linear regression line. If it is set to a very large value, the trend passes through all measurements $y_1, y_2, \ldots, y_N$.
All values of $m_t$, $\mathrm{SD}_t$ and $\mathrm{SD}_{N-t}$ are estimated with the Kalman filter. This filter is attractive because it gives minimum mean-square-error estimates among linear estimators, and among all estimators when the disturbances are normally distributed. Although normally distributed residuals (one-step-ahead prediction errors, called innovations) are not a necessary condition for the proper use of the Kalman filter, normality is still desirable: with normal residuals, for example, limits of ±1.96 × standard error can be interpreted as 95% confidence limits. Normality of the residuals can be checked with a normal probability plot.
For mathematical details about structural time-series analysis and the Kalman filter please refer to Harvey (1989).
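The following sketch shows how a Kalman filter and a Rauch–Tung–Striebel smoother can produce m_t, SD_t and SD_{N−t} for a single series, with missing years simply skipped at the update step. It assumes the trend follows an integrated random walk (the "smooth trend" model in Harvey 1989): y_t = μ_t + ε_t, μ_{t+1} = μ_t + β_t, β_{t+1} = β_t + η_t, where the standard deviation of η_t acts as the flexibility parameter (zero gives a straight line). This model is consistent with the behaviour described above but is an assumption, not TrendSpotter's code; both standard deviations are supplied by hand rather than estimated by maximum likelihood, and the diffuse start is approximated by a large initial variance `kappa`. The function name `smooth_trend` is hypothetical.

```python
import numpy as np


def smooth_trend(y, sigma_eta, sigma_eps, kappa=1e6):
    """Kalman filter + RTS smoother for an (assumed) integrated random walk trend.

        y_t        = mu_t + eps_t,     eps_t ~ N(0, sigma_eps**2)
        mu_{t+1}   = mu_t + beta_t
        beta_{t+1} = beta_t + eta_t,   eta_t ~ N(0, sigma_eta**2)
    Missing observations are np.nan. Returns (m, sd, sd_diff): the smoothed
    trend m_t, its standard deviation SD_t and the SD of mu_N - mu_t.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    T = np.array([[1.0, 1.0], [0.0, 1.0]])  # transition of the state (mu, beta)
    Z = np.array([1.0, 0.0])                # the observation picks out mu_t
    Q = np.diag([0.0, sigma_eta**2])        # only the slope is disturbed
    H = sigma_eps**2

    a_pred = np.zeros((n, 2)); P_pred = np.zeros((n, 2, 2))
    a_filt = np.zeros((n, 2)); P_filt = np.zeros((n, 2, 2))
    a, P = np.zeros(2), kappa * np.eye(2)   # large variance ~ diffuse start
    for t in range(n):
        a_pred[t], P_pred[t] = a, P
        if not np.isnan(y[t]):              # missing year: no update step
            F = Z @ P @ Z + H               # innovation variance
            K = P @ Z / F                   # Kalman gain
            a = a + K * (y[t] - Z @ a)
            P = P - np.outer(K, K) * F
        a_filt[t], P_filt[t] = a, P
        a, P = T @ a, T @ P @ T.T + Q       # predict the next year

    # Backward pass (RTS smoother); J[t] links year t to year t + 1.
    a_s, P_s = a_filt.copy(), P_filt.copy()
    J = np.zeros((n, 2, 2))
    for t in range(n - 2, -1, -1):
        J[t] = np.linalg.solve(P_pred[t + 1], T @ P_filt[t]).T
        a_s[t] = a_filt[t] + J[t] @ (a_s[t + 1] - a_pred[t + 1])
        P_s[t] = P_filt[t] + J[t] @ (P_s[t + 1] - P_pred[t + 1]) @ J[t].T

    # Cov(x_t, x_N | all data) = J[t] J[t+1] ... J[N-2] P_s[N-1].
    sd_diff = np.zeros(n)
    C = P_s[n - 1].copy()
    for t in range(n - 2, -1, -1):
        C = J[t] @ C
        var = P_s[n - 1][0, 0] + P_s[t][0, 0] - 2.0 * C[0, 0]
        sd_diff[t] = np.sqrt(max(var, 0.0))

    return a_s[:, 0], np.sqrt(P_s[:, 0, 0]), sd_diff
```

Applied to log-transformed counts, as in the Greater Scaup analysis, `m` and `sd` are on the log scale. The standard deviations grow towards both ends of the series, which is the widening of confidence intervals at the start and end of a time series mentioned later in the paper.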
Table 1 Characteristics of the three methods of analysis (TRIM, GAM and TrendSpotter). Recoverable rows: confidence intervals method; detection of change points; test of indices between years. Recoverable entries: "Yes, using the raw data"; "Yes, using indices of other models".
All three trend methods discussed can be used to identify change points (Table 1), but they differ in their ability to test differences between years. In TRIM, each yearly index can be tested statistically against the base year, taking serial correlation into account. TrendSpotter is also able to test indices between years: more specifically, it tests the smoothed population number of the last year against that of each preceding year, again taking serial correlation between years into account. GAM indices may also be compared between years, but such a test is limited because serial correlation cannot be taken into account. Substantial positive serial correlation may be present in bird monitoring data, because some of the same individual birds are counted at the same sites each year. If such serial correlation is ignored, confidence intervals may be too narrow, leading to more type I errors (changes declared significant that are not real).
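Testing the last year against each preceding year needs the covariance between the smoothed values, not only their separate standard deviations: SD_{N−t}² = SD_N² + SD_t² − 2 Cov(μ_N, μ_t). The sketch below computes this using the batch form of an assumed smooth-trend (integrated random walk) model, in which the posterior precision of μ_1, …, μ_N is W/σ_ε² + DᵀD/σ_η², with D the second-difference matrix and W marking the observed years. For this model the batch form gives the same smoothed values and covariances as a Kalman smoother with a diffuse start, but it is a swapped-in formulation chosen for brevity, not TrendSpotter's code. The two-sided z-test with 1.96 and the name `compare_last_year` are also assumptions.

```python
import numpy as np


def compare_last_year(y, sigma_eta, sigma_eps, z_crit=1.96):
    """Test smoothed mu_N against every mu_t under a smooth trend model.

    Returns (diff, sd_diff, significant) with diff = m_N - m_t.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    observed = ~np.isnan(y)
    W = np.diag(observed.astype(float))            # 1 for observed years
    D = np.diff(np.eye(n), n=2, axis=0)            # (n-2) x n second differences
    precision = W / sigma_eps**2 + D.T @ D / sigma_eta**2
    cov = np.linalg.inv(precision)                 # posterior covariance of mu
    m = cov @ (np.where(observed, y, 0.0) / sigma_eps**2)
    diff = m[-1] - m                               # m_N - m_t
    # Var(mu_N - mu_t) = C_NN + C_tt - 2 C_tN: the covariance term matters.
    var_diff = cov[-1, -1] + np.diag(cov) - 2.0 * cov[:, -1]
    sd_diff = np.sqrt(np.clip(var_diff, 0.0, None))
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(sd_diff > 0, diff / sd_diff, 0.0)
    return diff, sd_diff, np.abs(z) > z_crit
```

Dropping the covariance term would overstate SD_{N−t} for neighbouring years, whose smoothed values move together, and make real changes harder to detect.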
Trend calculation and classification
Table 2 TrendSpotter results of the analysis of mean monthly numbers of Greater Scaup in The Netherlands. The analysis was performed on log-transformed data. $y_t$ mean population number (untransformed), $m_t$ smoothed population number, $\mathrm{SD}_t$ standard deviation of $m_t$, $m_N - m_t$ difference between $m_N$ and $m_t$, $\mathrm{SD}_{N-t}$ standard deviation of the difference between $m_N$ and $m_t$
Table 3 Classification of trends based on the 95% confidence interval of the yearly change rate (YCR) (CL confidence limit of YCR, CI confidence interval of YCR)
Lower CL > 1.05: > 5% increase per year (≈ doubling in 15 years)
1.00 < lower CL ≤ 1.05: increase, but unsure whether > 5% per year
CI contains 1.00 and lower CL ≥ 0.95 and upper CL ≤ 1.05: population changes less than 5% per year
0.95 ≤ upper CL < 1.00: decrease, but unsure whether > 5% per year
Upper CL < 0.95: > 5% decrease per year (≈ halving in 15 years)
CI contains 1.00 and (lower CL < 0.95 or upper CL > 1.05): CI too large for reliable trend classification
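The classification rules above translate directly into a small function. It takes the lower and upper 95% confidence limits of the YCR, a multiplicative rate in which 1.00 means no change, and returns the corresponding interpretation. The name `classify_trend` is hypothetical; the cases are checked in an order that keeps the six classes mutually exclusive.

```python
def classify_trend(lower_cl, upper_cl):
    """Classify a trend from the 95% confidence limits of the yearly change rate."""
    if lower_cl > 1.05:
        return "> 5% increase/year"
    if lower_cl > 1.00:                      # 1.00 < lower CL <= 1.05
        return "increase, but unsure whether > 5%/year"
    if upper_cl < 0.95:
        return "> 5% decrease/year"
    if upper_cl < 1.00:                      # 0.95 <= upper CL < 1.00
        return "decrease, but unsure whether > 5%/year"
    # Remaining case: the confidence interval contains 1.00.
    if lower_cl >= 0.95 and upper_cl <= 1.05:
        return "population changes less than 5%/year"
    return "CI too large for reliable trend classification"


print(classify_trend(1.06, 1.12))  # > 5% increase/year
print(classify_trend(0.90, 1.10))  # CI too large for reliable trend classification
```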
Table 4 Yearly change rate (YCR), its 95% confidence interval and associated trend classification for the Greater Scaup in the Netherlands, as well as total change rate (TCR), its 95% confidence interval and associated population alerts
Alerts in 2004
No alert for decrease over the whole time series
No alert for decrease over the last 25 years
No alert for decrease over the last 20 years
>25% decrease over the last 15 years!
No alert for decrease over the last 10 years
No alert for decrease over the last 5 years
An extension of trend calculation is the assessment of population alerts. A population alert highlights any declines in bird numbers that are of conservation concern (de Nobel et al. 2002). The British Trust for Ornithology (BTO) alert system (http://www.bto.org/birdtrends2005/alerts.htm) uses strong declines (>50%) and moderate declines (>25%) as alert thresholds, over different time periods (the whole time series and the last 25, 10 and 5 years of the time series). The BTO system uses GAMs to calculate smoothed indices of animal abundance and takes into account the standard errors of the indices to test for significance of the alerts.
TrendSpotter can also be used to calculate alerts by converting the TCR values. The population of Greater Scaup has a TCR of 0.375 since 1990 (see Table 4, fifth column), which corresponds to a 62.5% decline [100 × (1.00 − 0.375) = 62.5]. This decline is significantly larger than 25%, because the upper confidence limit of the TCR (0.607) still corresponds to a 39.3% decline for 1990–2004 [100 × (1.00 − 0.607) = 39.3], but it is not significantly larger than 50%. If an alert threshold of a 25% decline is used, a moderate alert is therefore generated for the Greater Scaup for the decline since 1990 (Table 4). Alert thresholds may be set at any value, e.g. a decline of >10%, >25% or >50%, and may also be based on a 90% instead of a 95% confidence interval.
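The conversion from TCR to a percentage decline, and the check of whether that decline is significantly larger than an alert threshold, can be written as below. An alert is raised when even the upper confidence limit of the TCR implies a decline beyond the threshold. The function names are hypothetical; the values in the usage lines are the Greater Scaup figures quoted above (TCR 0.375, upper confidence limit 0.607).

```python
def percent_decline(tcr):
    """Convert a total change rate (1.00 = no change) into a percentage decline."""
    return 100.0 * (1.0 - tcr)


def decline_alert(tcr_upper_cl, threshold):
    """True if the decline is significantly larger than `threshold`.

    `threshold` is a fraction (0.25 for a 25% decline). The decline is
    significant when even the upper confidence limit of the TCR lies
    below 1 - threshold.
    """
    return tcr_upper_cl < 1.0 - threshold


print(percent_decline(0.375))             # 62.5
print(round(percent_decline(0.607), 1))   # 39.3
print(decline_alert(0.607, 0.25))         # True: moderate alert
print(decline_alert(0.607, 0.50))         # False: no alert at the 50% threshold
```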
Testing against a population threshold
Another application of TrendSpotter is to test whether a smoothed index is above or below a given population threshold. Using the standard deviations $\mathrm{SD}_t$ estimated by TrendSpotter, the smoothed population number for each year in the time series can be compared statistically with a predefined target, for instance the favourable conservation status in the framework of the EU Birds Directive. For the Greater Scaup this conservation status was set at a monthly mean of at least 25,000 birds (LNV 2006). The smoothed population numbers for 1988–1995 were significantly above this value, as can be seen by comparing the lower confidence limit in Fig. 1 with the solid grey line. From 1999 until 2003, however, the upper confidence limit was below the favourable conservation status, indicating that numbers were significantly below the threshold. In 2004 no significant difference from the favourable conservation status was found, partly because of a slight increase in the numbers of Greater Scaup counted, but also because of the wider confidence intervals at the end of the time series. The widening of confidence intervals at the start and end of a time series is inherent to smoothing techniques.
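A sketch of the threshold comparison follows. It assumes, as in the Greater Scaup analysis on log-transformed data, that m_t and SD_t are on a natural-log scale, so the 95% limits exp(m_t ± 1.96 SD_t) are back-transformed before being compared with a threshold expressed in birds. The function name and the example values of m_t and SD_t are hypothetical; only the 25,000-bird threshold comes from the text.

```python
import math


def compare_with_threshold(m_t, sd_t, threshold, z=1.96):
    """Compare a smoothed log-scale estimate with a population threshold in birds.

    m_t and sd_t are assumed to be on the natural-log scale; the confidence
    limits are back-transformed with exp() before the comparison.
    """
    lower = math.exp(m_t - z * sd_t)
    upper = math.exp(m_t + z * sd_t)
    if lower > threshold:
        return "significantly above"
    if upper < threshold:
        return "significantly below"
    return "not significantly different"


# Hypothetical values, not taken from the Greater Scaup results:
print(compare_with_threshold(math.log(40000), 0.10, 25000))  # significantly above
print(compare_with_threshold(math.log(15000), 0.10, 25000))  # significantly below
print(compare_with_threshold(math.log(24000), 0.20, 25000))  # not significantly different
```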
Time series analysis with TrendSpotter has proved useful in the Dutch Waterbird Monitoring Scheme. Time series of up to 30 years, with alternating periods of increase and decline, were adequately described by a smoothed line. Changes in population abundance since any year could be tested statistically, and trends could be classified for each individual year, converted into population alerts and tested against population thresholds. GAMs offer similar advantages but, as noted earlier, do not take serial correlation into account in trend classification and population alerts.
TrendSpotter may also be useful in other monitoring schemes, although there are some restrictions on the minimum length of the time series and the maximum percentage of missing yearly values at the start of the time series. Furthermore, the data should not contain too many zero values or values close to zero. Its main limitation, however, is its inability to take into account the uncertainties of the imputation models applied to the raw data; in that respect, a GAM applied to the raw data is preferable. Currently, TrendSpotter is best applied in situations where smoothing the time series with a GAM would be less satisfactory. This applies not only to census data: TrendSpotter may also help in assessing confidence limits of composite indices for species groups, a type of biodiversity indicator that is becoming increasingly popular (see, e.g. Gregory et al. 2005). TrendSpotter is also the method of choice for smoothing time series that contain only one value per time point, e.g. the first laying date of Lapwing in the Netherlands. Many applications of TrendSpotter to composite indices and to time series with a single value per year can be found at http://www.natuurcompendium.nl.
A future development of TrendSpotter will be the inclusion of weights for each year in the time series, which will enable one to incorporate confidence intervals of the indices computed by TRIM or other imputing models. Such a combination of TRIM and TrendSpotter would offer many of the advantages of a GAM based on raw data and is easier to apply than a GAM.
Unfortunately, even such a combination may not completely solve the conceptual problems in the calculation of confidence intervals of trends in waterbirds. For a census, currently, there does not seem to be an analysis method available to take into account imputation uncertainties without, at the same time, including between-site variation. We have chosen to apply TrendSpotter for the DWMS because we prefer to neglect the imputation uncertainties rather than to include them and unavoidably incorporate between-site variation as obtained from bootstrapping using a GAM. This preference is based on the assumption that imputation uncertainties in the DWMS are probably small, as the number of missing counts is limited and extremely high imputed values are excluded from the trend analysis.
Two additional features of TrendSpotter should be noted. First, the programme may be run on time series with cyclic patterns, as are usually present in monthly counts of waterbirds. This could be interesting in the analysis of shifts in the seasonal patterns of birds as an effect of, e.g., climate change. Second, similar to applying covariate models in a GAM, explanatory variables may be added to each record in the dataset, which enables the detection of causal factors for observed changes in population abundance.
This publication would not have been possible without the data collected by many volunteer bird counters, and the financial support of the Dutch Ministry of Agriculture, Nature and Food Quality, the Ministry of Transport, Public Works and Water Management and Vogelbescherming Nederland. Thanks are also due to A. Gmelig Meyling for implementing the calculations described in the paper and to J. Blew and an anonymous reviewer for helpful comments on a previous draft of the paper.