Introduction

Methods for the accurate assessment of population trends are a valuable tool in wildlife management, in particular providing key evidence to prioritise action such as conservation effort (BRIG 2007). Whilst current methods typically rely upon direct data collection, which can be impractical and costly on the large spatial and temporal scales required, a growing number overcome these issues by instead using publicly available citizen science data (Bonney et al. 2009). Citizen science datasets are becoming an important resource in ecology. They hold the potential to reduce the economic costs of conducting large-scale studies to estimate species presence or absence to aid management decisions (Croft et al. 2017). However, there is an ongoing challenge in how to handle publicly collected data, which often contain various biases as a result of uneven recording intensity, irregular spatial coverage, and differences in sampling effort (Dickinson et al. 2010; Isaac et al. 2014). This inconsistency in data quality has been documented to be one of the predominant reasons that scientists avoid using citizen science datasets in peer-reviewed publications (Burgess et al. 2017).

Subsampling methods were initially used to predict population trends for UK birds and butterflies (Warren et al. 2001; Thomas et al. 2004). These methods compare changes in species distributions across time periods while accounting for differences in survey effort. A recent publication by Hof and Bright (2016) applies subsampling to mammals for the first time. They use a method based on citizen science data to quantify population trends in England for the West European hedgehog (Erinaceus europaeus), a UK Biodiversity Action Plan (BAP) species documented to have experienced recent declines (JNCC 2010). The study compares data sourced from the Global Biodiversity Information Facility (GBIF), an international data portal for biodiversity information, and a nationwide “Hog Watch” public participation survey (which they report is now also available via GBIF) across two 15-year time periods, 1960–1975 and 2000–2015. In accordance with their hypothesis, the findings confirmed a decline in hedgehog distribution across England, suggesting a loss of 5–7.4% since 1975.

This study introduces some interesting ideas about how readily available occurrence data could be used to infer changes in species status, accounting for numerical differences in recording effort, and demonstrates some merit by reproducing a suspected trend. However, we raise the concern that there is insufficient consideration of spatial differences in recording, which, based on our experience, are a feature of publicly available mammal data (Croft et al. 2017) and likely also of data for other taxa. If not properly accounted for, this could have a significant impact on the distribution of randomly selected records. For instance, it has been shown that in more recent datasets, recording is significantly greater at more accessible locations, i.e. in urban centres and close to roads (Warton et al. 2013). In a random subsample, such biases could mean that a greater number of the selected records fall within the same cells, thereby reducing the number of cells occupied and consequently skewing predictions towards a decline.

Here, we test this hypothesis by applying the methodology to a suite of terrestrial mammal species in England. We specifically include species such as deer which are known to be expanding both in range and population. By conducting this analysis, we aim to assess the degree to which the subsampling method is susceptible to the spatial biases which occur in citizen science data, and thus whether it is a valid, reliable approach for predicting population trends for these taxa. We did not assess the reliability of this approach for predicting population trends for other taxa.

Materials and methods

Following the method used in Hof and Bright (2016), occurrence records within each of the two study periods, 1960–1975 and 2000–2015, were downloaded from GBIF for all terrestrial mammal species in the UK on the basis of “human observation”, “machine observation”, or “observation” (https://doi.org/10.15468/dl.byq1fk). For comparison, occurrence records for the same time periods were also downloaded from the National Biodiversity Network (NBN) Atlas (https://data.nbn.org.uk/): the UK node for GBIF. The NBN is the UK’s largest biodiversity information portal obtaining data from a suite of providers such as government, charities, and local record centres. Typically, the NBN is updated more frequently than GBIF and should offer a greater number of records to analyse. If robust to recording bias, we argue that predicted trends should be insensitive to data source, and hence, predictions based on either GBIF or NBN should be consistent.

From each download, coordinates (“decimallatitude”/“decimallongitude”), “eventdate”, and “species” were extracted for all observations, excluding any which did not fulfil this basic level of information. Coordinates of both datasets were transformed from WGS84 to the British National Grid (BNG), before being assigned to a respective 10-km Ordnance Survey (OS) grid cell to enable data analysis at the 10-km by 10-km scale. To allow a comparison of West European hedgehog trends with those published by Hof and Bright (2016), data were only analysed within the political boundary of England. Therefore, all records outside of this boundary were omitted.
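The assignment of records to grid cells can be illustrated with a short Python sketch. It is not the code used in this analysis: it assumes that coordinates have already been transformed to BNG eastings and northings in metres (the WGS84 to BNG transformation is left to a projection library), and the function name, field names, and example coordinates are hypothetical.

```python
def grid_cell_10km(easting_m, northing_m, cell_size_m=10_000):
    """Return the (column, row) index of the grid cell containing a BNG point.

    Assumes easting/northing are already in British National Grid metres.
    """
    return (int(easting_m // cell_size_m), int(northing_m // cell_size_m))


# Hypothetical records, for illustration only.
records = [
    {"species": "Erinaceus europaeus", "easting": 451_230.0, "northing": 206_890.0},
    {"species": "Erinaceus europaeus", "easting": 458_900.0, "northing": 201_100.0},
    {"species": "Erinaceus europaeus", "easting": 461_000.0, "northing": 206_000.0},
]

# The set of occupied cells is what the coverage calculation is based on.
occupied_cells = {grid_cell_10km(r["easting"], r["northing"]) for r in records}
```

Here the first two records fall in the same 10-km cell, so three records occupy only two cells.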

GBIF and NBN datasets do not have consistent records for observer, coordinate uncertainty, or time of sighting, and so, it was not feasible to discard observations where there were multiple sightings, by the same observer, at the same locality or observations with less than 100 m accuracy, without removing a significant proportion of both datasets. Nevertheless, in accordance with Hof and Bright (2016), duplicate records with identical “site”, “eventdate”, and “species” were discarded in an effort to remove some of the bias in survey effort.
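The duplicate filter described above can be sketched as follows. This is an illustrative Python sketch rather than the code used in the analysis; it assumes each record is a dictionary with “site”, “eventdate”, and “species” keys, and the function name is hypothetical.

```python
def drop_duplicate_records(records):
    """Keep the first record for each (site, eventdate, species) combination."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["site"], rec["eventdate"], rec["species"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical records, for illustration only.
example = [
    {"site": "SP50", "eventdate": "2005-06-01", "species": "Erinaceus europaeus"},
    {"site": "SP50", "eventdate": "2005-06-01", "species": "Erinaceus europaeus"},
    {"site": "SP50", "eventdate": "2005-06-02", "species": "Erinaceus europaeus"},
]
cleaned = drop_duplicate_records(example)
```

Only exact matches on all three fields are removed, so repeat sightings at the same site on different dates are retained.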

Using these cleaned datasets, historic and recent coverage of each species (33 in total with sufficient data to perform the analysis; nominally at least 15 records across 10 unique sites per dataset) across England was then calculated by dividing the number of 10-km by 10-km grid cells with recorded occurrences by the total number of 10-km by 10-km grid cells within England (1739, as documented by Hof and Bright (2016)). However, due to suspected increased surveying effort over time, the recent dataset was nearly always larger than the historic dataset. For example, recent records for West European hedgehog were 7 and 35 times more numerous than historic records in the GBIF and NBN datasets, respectively. Therefore, similar to Hof and Bright (2016), we applied a subsampling approach, matching the size of the larger dataset to that of the smaller dataset by taking a random sample and omitting any remaining records. Subsequently, the percentage coverage of the subsample was calculated and hence the relative change in coverage. This process was repeated 10,000 times, calculating the mean and the 5% and 95% confidence limits for the relative change between the two study periods.
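The subsampling procedure can be sketched in Python as follows. This is a minimal illustration, not the code used in the analysis. It assumes that each record has been reduced to the identifier of its 10-km cell, that the relative change is expressed as a percentage of historic coverage (the exact formula is an assumption), and that the confidence limits are taken as the 5th and 95th percentiles of the repeated estimates. All function and parameter names are hypothetical.

```python
import random
import statistics

TOTAL_CELLS_ENGLAND = 1739  # number of 10-km cells in England, as given in the text


def coverage(record_cells, total_cells=TOTAL_CELLS_ENGLAND):
    """Percentage of grid cells containing at least one record."""
    return 100.0 * len(set(record_cells)) / total_cells


def subsampled_change(historic_cells, recent_cells, n_iter=10_000, seed=0,
                      total_cells=TOTAL_CELLS_ENGLAND):
    """Mean, 5th and 95th percentile of the relative change in coverage (%).

    Each input is a non-empty list with one grid-cell identifier per record.
    The larger dataset is randomly subsampled down to the size of the smaller.
    """
    rng = random.Random(seed)
    n = min(len(historic_cells), len(recent_cells))
    changes = []
    for _ in range(n_iter):
        # Sampling n records from a dataset of size n leaves it unchanged,
        # so only the larger dataset is effectively reduced.
        hist_cov = coverage(rng.sample(historic_cells, n), total_cells)
        rec_cov = coverage(rng.sample(recent_cells, n), total_cells)
        # Assumed definition: change relative to historic coverage.
        changes.append(100.0 * (rec_cov - hist_cov) / hist_cov)
    cuts = statistics.quantiles(changes, n=20)  # cut points at 5%, 10%, ..., 95%
    return statistics.mean(changes), cuts[0], cuts[-1]


# Hypothetical example: four historic records in four cells, and a larger
# recent dataset in which most records are clustered in a single cell.
historic = ["A", "B", "C", "D"]
recent = ["A"] * 30 + ["B", "C", "D", "E"]
mean_change, lower, upper = subsampled_change(historic, recent, n_iter=200)
```

In this toy example the recent dataset occupies more cells than the historic one, yet a subsample of four records will usually be drawn mostly from the heavily recorded cell, which illustrates how clustering can push the estimate towards a decline.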

In order to measure the validity of our results, we compared predicted trends for each species against a consensus determined from a review of trends published in the wider literature (Aebischer et al. 2011; Battersby 2005; BCT 2014; Croft et al. 2017; Harris et al. 1995; Harris et al. 2015). Mostly, these trends were only qualitative, indicating whether populations are significantly increasing, decreasing, or stable. The latter is difficult to define quantitatively, and there are several classification methods available. Initially, we considered using confidence limits to determine if a change is statistically different from zero. However, we found that the variance generated by the subsampling approach was generally too small to predict any stable trends. We also considered setting limits based on definitions from the International Union for Conservation of Nature (IUCN), which classes species as “vulnerable” if they experience a 20% decline in abundance (IUCN 2001), although we note that, arguably, a 25% increase or decrease would need to occur to observe a change in the UK, and this threshold is therefore used for the “moderate” declines forming UK BAP species listings (BRIG 2007). However, based on these criteria, all populations would be classified as “stable”. Rather than either of these criteria, we instead suggest a simple classification based on the average magnitude of change in coverage for each dataset across all species. Changes with a magnitude at or above this average are considered significant, i.e. an increase or a decline depending on direction. Population changes with magnitudes less than the average across all species are considered stable.
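This classification rule can be written down compactly. The Python sketch below is illustrative only: the function name is hypothetical and the example values are invented to show the behaviour of the rule; they are not results from this study.

```python
def classify_trends(change_by_species):
    """Classify each species using the mean magnitude of change as threshold.

    change_by_species maps a species name to its mean relative change (%).
    Returns the threshold and a dict of labels.
    """
    magnitudes = [abs(change) for change in change_by_species.values()]
    threshold = sum(magnitudes) / len(magnitudes)
    labels = {}
    for species, change in change_by_species.items():
        if abs(change) < threshold:
            labels[species] = "stable"
        elif change > 0:
            labels[species] = "increase"
        else:
            labels[species] = "decline"
    return threshold, labels


# Invented values, for illustration only.
threshold, labels = classify_trends(
    {"species A": -15.0, "species B": 2.0, "species C": 13.0}
)
```

Because the threshold is derived from the dataset itself, the same percentage change may be classified differently depending on which other species are included in the analysis.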

Finally, to confirm, as suggested by previous studies (Warton et al. 2013), that more recent datasets contain greater clustering of records, which we argue may confound predictions, we computed the variance to mean ratio (VMR) for each historic and recent dataset. VMR is a measure of dispersion and can be used to quantify how clustered records are. Values greater than 1 indicate clustering (over-dispersion) and values less than 1 indicate under-dispersion; higher VMR values indicate a greater degree of clustering in records.
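As an illustration of the statistic, the Python sketch below computes VMR from the number of records per grid cell. The unit over which counts are taken is an assumption here (records per 10-km cell, including cells with no records), as is the use of the sample variance; the function name is hypothetical.

```python
from collections import Counter
import statistics


def variance_to_mean_ratio(record_cells, total_cells):
    """VMR of record counts per grid cell, including unoccupied cells.

    record_cells holds one grid-cell identifier per record; total_cells is
    the number of cells in the study area (at least 2).
    """
    counts = list(Counter(record_cells).values())
    counts += [0] * (total_cells - len(counts))  # cells without any records
    return statistics.variance(counts) / statistics.mean(counts)


# Four records spread evenly over four cells, versus four in a single cell.
even_vmr = variance_to_mean_ratio(["A", "B", "C", "D"], total_cells=4)
clustered_vmr = variance_to_mean_ratio(["A", "A", "A", "A"], total_cells=4)
```

The evenly spread records give a VMR of 0 (under-dispersed), whereas the same number of records concentrated in one cell gives a VMR of 4 (clustered).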

Results

Initially focusing on the West European hedgehog, we observed, in agreement with Hof and Bright (2016), an increase in the number of occurrence records within the more recent study period (2000–2015) compared with the historic period (1960–1975) in both the GBIF and NBN datasets. Perhaps unsurprisingly, the dataset obtained from GBIF was most similar to that reported in Hof and Bright (2016), albeit with an increase in the number of recent records (nearly double) clustered over a marginally reduced area (80% versus 91.4%). The dataset obtained from the NBN contained fewer historic records than that reported in Hof and Bright (2016) over a reduced spatial coverage. It also showed a substantial increase in the volume of recent records (five times), but similar to the dataset obtained from GBIF, these were distributed over a reduced area compared to that of Hof and Bright (74% of England compared to 91.4% in the original publication). Both historic and recent datasets based on GBIF and the NBN showed indications of clustering (VMR > 1), with VMR values of 3.4 (historic) and 84.1 (recent) for GBIF and 3.9 and 115.1 for the NBN, respectively.

Following the subsampling approach, we predicted a change in population of −15.7% (± 0.01%; 95% CI) and −7.81% (± 0.01%) based on the GBIF and NBN datasets, respectively. Comparing these to the mean magnitude of change across all species of 10% and 3% for the GBIF and NBN datasets, respectively, we would conclude that both indicate a significant decline in population. The published trends on West European hedgehogs (Aebischer et al. 2011; Battersby 2005; BCT 2014; Croft et al. 2017; Harris et al. 1995; Harris et al. 2015) suggest that the species experienced a decline between 1960 and 1995 and has remained stable since. Therefore, an overall decline would be expected between the two study periods, which is supported by these results. It is worth noting that neither the results from this study nor those of Hof and Bright (2016) predict the West European hedgehog to be declining by 25% or more, despite it being listed as a UK BAP species, judged as showing more than a 50% decline over 25 years (JNCC 2010).

As mentioned previously, all deer species in England have published population trends showing an overall increase in abundance (Table 1). Muntiacus reevesi (muntjac deer) and Capreolus capreolus (roe deer) have displayed population increases both between 1960 and 1995 and post 1995, whereas Cervus elaphus (red deer) and Dama dama (fallow deer) experienced population increases between 1960 and 1995 but have remained stable post 1995. Using the threshold set for significant change in populations, the GBIF dataset predicts all four deer species to remain stable. The NBN dataset predicts roe deer to be increasing, red deer to remain stable, and both fallow deer and muntjac deer to be declining (Table 1; highlighted in grey). VMR scores showed historic and recent records to be clustered for all species based on both GBIF and NBN datasets, with the exception of muntjac, where historic records were marginally under-dispersed. In all cases, the VMR of recent records was substantially greater (approximately ten times) than that of the historic records: average VMR of 5.4 (historic) to 53.1 (recent) and 8.0 to 74.2 for GBIF and the NBN datasets, respectively.

Table 1 A summary of the subsampling results for 33 mammal species in England using GBIF and NBN opportunistic occurrence datasets

Of the 33 species analysed in this report, only 12% (n = 4) displayed trends across both datasets that matched trends published in the wider literature, with one species remaining stable and three decreasing; 6% (n = 2) had trends that matched published data for the GBIF dataset but not the NBN dataset, and both were species in decline; 18% (n = 6) showed trends that matched published trends for the NBN dataset but not the GBIF dataset; and 61% (n = 20) displayed trends for both GBIF and NBN datasets that did not match trends published in the literature. Of these 20 species, 35% (n = 7) showed conflicting trends between the two datasets, which means that GBIF and NBN predictions disagreed for a total of 45% (n = 15) of species. It should be noted that for one species, Vulpes vulpes (red fox), a GBIF trend could not be calculated due to lack of data. A decrease in distribution was calculated from NBN data, which disagreed with published trends for the species.

Trends based on the GBIF dataset showed a clear bias towards predicting a decline, with 65% (n = 17) of the 26 incorrectly predicted trends classified with a more pessimistic outlook than expected. The mean percentage change across all species was also skewed heavily towards decline, at −8.7%. Trends based on the NBN dataset also showed some signs of bias towards predicting a population decline, but this was more marginal: 56% (n = 13 out of 23 incorrect predictions) were classified with a more pessimistic outlook than expected, with a mean change across all species of −1.1%.

In general, VMR statistics across all species, for GBIF and NBN datasets, indicated some degree of clustering (likely owing to underlying habitat preference of species) of both historic and recent records, with clustering of recent records substantially greater. On average, VMR for recent records was approximately ten times greater than that computed for historic records: 3.2 (historic) to 36.2 (recent) for GBIF and 4.1 to 56.9 for the NBN.

Discussion

In this paper, we have tested the efficacy of subsampling methodology based on citizen science data to predict population trends for mammal species. Our hypothesis was that the method does not adequately address spatial bias which is known to occur in this type of data (Dickinson et al. 2010) and as a consequence that the predictions generated are not reliable to infer change, even at a qualitative level. The results across the 33 mammal species tested confirm this suspicion with only 12% showing predicted trends consistent with other published studies independent of which dataset, GBIF or NBN, was used. The lack of consistency in the remaining 88% of cases, particularly the disagreement between predictions produced using different albeit similar datasets, highlights the method’s dependence on choice of dataset and an inherent vulnerability to the survey biases they contain. It must however be noted that the published trends were also not always consistent across sources.

It is unclear whether this failure is solely due to the presence of spatial bias, but we suggest that the greater spatial clustering observed in more recent datasets, albeit not noticeably around urban areas as suggested by previous studies (Warton et al. 2013), may cause subsampling to underestimate the number of occupied cells. This clustering is likely due largely to the proliferation of mobile phone apps (e.g. mammal tracker) and the internet (Dickinson et al. 2012; Tulloch et al. 2013). Consistent with this, the resultant predictions are biased towards a general decline in populations when based on GBIF data and, to a lesser extent, when based on the NBN data. Both show that when trends were predicted incorrectly, they were more likely to indicate a more negative outcome. We note however that there is a substantial percentage of incorrectly predicted trends for which we see a more positive outlook than expected. For such cases, spatial bias could still be argued to be responsible if recording patterns conflict with the underlying species distribution, thereby reducing clustering in the recent dataset (compared to that which may be anticipated from habitat preference alone). In either case, the extent to which any spatial bias may affect the results is a complex problem which is likely dependent on other factors in the method such as the density of records in the historic dataset (number and coverage of records which have opposite impacts on predicted trend; an increased number of records, or subsample size, should induce a positive change and mitigate the effects from spatial bias whereas a decrease in coverage should elicit a similar effect by increasing the difference between the historic and subsampled datasets) and the coverage of records in the recent, larger, dataset.

A common approach to mitigate spatial bias is by aggregating data to a lower resolution, thus reducing variability. However, the nature of the subsampling method, which relies on spatial variation of records to indicate relative confidence in presence (typically, we would expect these to appear towards the range edge), means that aggregation also reduces model sensitivity preventing detection of any population changes. Nevertheless, we repeated the analysis based on a 25-, 50-, and 100-km grid to test whether this may provide a solution. The results showed no improvement in the qualitative classification of trends.

Several toolkits have been developed to address the problem of bias in citizen science data (Isaac et al. 2014). One example is Frescalo (Hill 2012), which attempts to standardise survey effort spatially by scaling the number of records for each species relative to the proportion of “benchmark” species observed within a given neighbourhood compared to the number expected; its application may improve the reliability of this method. However, there are a number of competing methodologies developed and successfully applied for other taxa which may provide a simpler, more robust alternative. For instance, a method developed by Ball et al. (2011) to produce an atlas of UK hoverfly species proposed that species status could be estimated as a proportion of the total records of all species recorded within the same time frame. Therefore, in the instance of the West European hedgehog, the number of occurrences of hedgehogs per year would be divided by the total number of observations of all mammal species during that year. The results could subsequently be presented as an index from a baseline date, for example 1995, and any deviation from this baseline index would suggest a trend in species abundance. Extensions of this basic idea have been further developed by the Game and Wildlife Conservation Trust (Aebischer et al. 2011) and the British Trust for Ornithology (Harris et al. 2015) using models to smooth the effects from variable survey bias. A more comprehensive toolkit has also been developed by the Centre for Ecology and Hydrology (CEH) to calculate species trends specifically accounting for various biases that frequently occur in occurrence data (Isaac et al. 2014). The methods incorporated vary in complexity from calculating an overall trend for total records over time and using the deviance from this trend to calculate individual species population trends (Telfer et al. 2002), to occupancy models that incorporate a range of covariates (Isaac et al. 2014).
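The proportional index described above for the hedgehog example can be sketched in Python as follows. This is a simplified illustration of the idea rather than the procedure of Ball et al. (2011); the function name is hypothetical and the record counts are invented for illustration.

```python
def relative_recording_index(species_counts, all_species_counts, baseline_year):
    """Yearly share of records for one species, indexed to a baseline year (=100).

    species_counts and all_species_counts map year -> number of records.
    Years with no records of any species are skipped.
    """
    proportions = {
        year: species_counts.get(year, 0) / total
        for year, total in all_species_counts.items()
        if total > 0
    }
    baseline = proportions[baseline_year]
    return {year: 100.0 * p / baseline for year, p in sorted(proportions.items())}


# Invented counts, for illustration only.
hedgehog_records = {1995: 50, 2005: 30}
all_mammal_records = {1995: 1000, 2005: 1200}
index = relative_recording_index(hedgehog_records, all_mammal_records, 1995)
```

In this example the hedgehog's share of all mammal records halves between 1995 and 2005, so the index falls from 100 to approximately 50 even though total recording effort has increased.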

We conclude that in order to reduce the economic costs of conducting large-scale studies to predict population trends in mammal species and inform management decisions, the potential of citizen science datasets needs to be harnessed by accounting for relevant biases. At present, without modification to account for spatial survey bias, subsampling approaches are not a reliable tool to predict trends in UK mammal species. More widely, this raises questions regarding the general applicability of subsampling methods based on unstructured citizen science data to predict trends for any taxa. To alleviate the problem of spatial bias, we would initially recommend that future studies consider other available methods using a systematic approach, starting with the simplest method and increasing complexity. However, if a subsampling methodology is applied, it must be applied across a suite of species including some with known qualitative trends in order to provide a benchmark for validation.