
Bayesian estimation of species relative abundances and habitat preferences using opportunistic data

Environmental and Ecological Statistics

Abstract

We develop a new statistical procedure to monitor relative species abundances and their respective preferences for different habitat types, using opportunistic data. Following Giraud et al. (Biometrics 72(2):649–658, 2015), we combine the opportunistic data with some standardized data in order to correct the bias inherent to the opportunistic data collection. Species observations are modeled by Poisson distributions whose parameters quantify species abundances and habitat preferences, and are estimated using Bayesian computations. Our main contributions are (i) to tackle the bias induced by habitat selection behaviors, (ii) to handle data where the habitat type associated with each observation is unknown, and (iii) to estimate the habitat selection probabilities of the species. As an illustration, we estimate common bird species habitat preferences and abundances in the region of Aquitaine (France).


References

  • Ball S, Morris R, Rotheray G, Watt K (2011) Atlas of the hoverflies of Great Britain (Diptera, Syrphidae). Centre for Ecology and Hydrology, Wallingford

  • Bellamy PE, Brown NJ, Enoksson B, Firbank LG, Fuller RJ, Hinsley SA, Schotman AGM (1998) The influences of habitat, landscape structure and climate on local distribution patterns of the nuthatch (Sitta europaea L.). Oecologia 115(1–2):127–136

  • Biggs CR, Olden JD (2011) Multi-scale habitat occupancy of invasive lionfish (Pterois volitans) in coral reef environments of Roatan, Honduras. Aquat Invasions 6:347–353

  • Boutin J, Roux D, Eraud C (2003) Breeding bird monitoring in France: the ACT survey. Ornis Hung 12(13):1–2

  • Boyce M, McDonald L (1999) Relating populations to habitats using resource selection functions. Trends Ecol Evol 14:268–272

  • Buckland S, Anderson D, Burnham K, Laake J (1993) Distance sampling: estimating abundance of biological populations. Chapman & Hall, New York

  • Calenge C, Dufour A, Maillard D (2005) K-select analysis: a new method to analyse habitat selection in radio-tracking studies. Ecol Model 186:143–153

  • Dickinson JL, Zuckerberg B, Bonter DN (2010) Citizen science as an ecological research tool: challenges and benefits. Annu Rev Ecol Evol Syst 41(1):149–172

  • Fithian W, Elith J, Hastie T, Keith D (2014) Bias correction in species distribution models: pooling survey and collection data for multiple species. Methods Ecol Evol 6:424–438

  • Fuller RM, Devereux BJ, Gillings S, Amable GS, Hill RA (2005) Indices of bird-habitat preference from field surveys of birds and remote sensing of land cover: a study of south-eastern England with wider implications for conservation and biodiversity assessment. Glob Ecol Biogeogr 14:223–239

  • Giraud C, Calenge C, Coron C, Julliard R (2015) Capitalizing on opportunistic data for monitoring species relative abundances. Biometrics 72(2):649–658

  • Isaac NJB, van Strien AJ, August TA, de Zeeuw MP, Roy DB (2014) Statistics for citizen science: extracting signals of change from noisy ecological data. Methods Ecol Evol 5:1052–1060

  • Jiguet F, Devictor V, Julliard R, Couvet D (2012) French citizens monitoring ordinary birds provide tools for conservation and ecological sciences. Acta Oecol 44:58–66

  • Lele SR, Merrill EH, Keim J, Boyce MS (2013) Selection, use, choice and occupancy: clarifying concepts in resource selection studies. J Anim Ecol 82:1183–1191

  • Link WA, Sauer JR (1998) Estimating population change from count data: application to the North American breeding bird survey. Ecol Appl 8:258–268

  • MacKenzie D (2005) What are the issues with presence–absence data for wildlife managers? J Wildl Manag 69:849–860

  • Mair L, Ruete A (2016) Explaining spatial variation in the recording effort of citizen science data across multiple taxa. PLoS ONE 11(1):1–13

  • Manly B, McDonald L, Thomas D, MacDonald T, Erickson W (2002) Resource selection by animals. Statistical design and analysis for field studies. Kluwer Academic Publisher, London

  • Mason CF, Macdonald SM (2004) Distribution of foraging rooks, Corvus frugilegus, and rookeries in a landscape in Eastern England dominated by winter cereals. Folia Zool 53(2):179–188

  • Mysterud A, Ims R (1998) Functional responses in habitat use: availability influences relative use in trade-off situations. Ecology 79:1435–1441

  • Phillips S, Dudík M, Elith J, Graham C, Lehmann A, Leathwick J, Ferrier S (2009) Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecol Appl 19:181–197

  • Plummer M (2003) JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In: 3rd International workshop on distributed statistical computing (DSC 2003), vol 124. Vienna, Austria

  • Plummer M (2014) rjags: Bayesian graphical models using MCMC. R package version 3-13

  • Pollock KH (1982) A capture recapture design robust to unequal probability of capture. J Wildl Manag 46:752–757

  • R Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria

  • Roy H, Adriaens T, Isaac N, Kenis M, Martin G, Brown PEA (2012) Invasive alien predator causes rapid declines of native European ladybirds. Divers Distrib 18:717–725

  • Royle JA, Nichols JD, Kéry M (2005) Modelling occurrence and abundance of species when detection is imperfect. Oikos 110(2):353–359

  • Telfer M, Preston C, Rothery P (2002) A general method for measuring relative change in range size from biological atlas data. Biol Conserv 107:99–109

  • Tulloch A, Szabo J (2012) A behavioural ecology approach to understand volunteer surveying for citizen science datasets. Emu 112:313–325

  • van Strien A, van Swaay C, Termaat T (2013) Opportunistic citizen science data of animal species produce reliable estimates of distribution trends if analysed with occupancy models. J Appl Ecol 50:1450–1458

Acknowledgements

The authors would like to thank Benjamin Auder for his contribution to the computer code for data treatment. We sincerely thank Denis Roux and all observers from the ACT network. We also thank the managers of both programs Faune d’Aquitaine and STOC-EPS, as well as the observers participating in these programs. This work was partially funded by public grants as part of the “Investissement d’avenir” project, reference ANR-11-LABX-0056-LMH, LabEx LMH, and reference ANR-10-CAMP-0151-02, Fondation Mathématiques Jacques Hadamard, by the Chair “Modélisation Mathématique et Biodiversité” of VEOLIA-Ecole Polytechnique-MNHN-F.X, and by the Mission for Interdisciplinarity at CNRS.

Author information


Corresponding author

Correspondence to Camille Coron.

Additional information

Handling Editor Pierre Dutilleul.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 267 KB)

Supplementary material 2 (pdf 238 KB)

Appendix


1.1 Datasets

For the first (standardized) dataset we used the ACT (Alaudidae, Columbidae, Turdidae) monitoring survey (see Boutin et al. (2003) for more details concerning this dataset and its protocol), in which 13 species of birds are monitored (see Table 3). The observers are professionals from the technical staff of the participating organizations. The Aquitaine region was discretized into 66 quadrats, in each of which a 4-km-long route was randomly placed in non-urban habitat (see Fig. 4). Each route was traveled twice between April and mid-June and included 5 points separated by exactly 1 km; on each pass, each point was visited for exactly 10 min. The species of every bird heard or seen was recorded, and for each point and each species, we have access to the maximum of the counts from the two visits (in order to take advantage of the maximum detectability and to avoid effects due to migration, as explained in Pollock 1982). This protocol was repeated for several years and we use data from 2008 to 2011, which leads to 239 visits of quadrats (some of the quadrats were not visited each year) and therefore to \(13\times 5\times 239=15{,}535\) counts, corresponding to 7899 reported bird observations (some species are not always detected).

For opportunistic data, we used the dataset collected by the website www.fauneaquitaine.org (handled by the LPO), on which anyone can register and report the species, number of detected individuals, date, and location associated with any bird observation made in Aquitaine. The level of precision of the location is variable: exact location, locality indication, or municipality indication. For numerical analyses, to deal with this inhomogeneity in location information, we use the municipality in which each observation was made, which is always given. As previously, we selected all such records between April and mid-June for the years 2008–2011. This led to 693,581 bird observations in 1622 municipalities (see Fig. 4), monitoring 34 species. Note that observers can go anywhere, for an unknown amount of time, and that they report their observations with an unknown probability (that might depend on the observed species); therefore these data do not provide any information concerning observation effort.

For the validation dataset used to assess the predictive power of our approach, we used the data from the STOC program (Suivi temporel des oiseaux communs), which is a French breeding bird survey carried out by the French Museum of Natural History (MNHN, Museum National d’Histoire Naturelle). The protocol of this survey (see Jiguet et al. 2012 for more details) is the following: each observer is assigned a \(2\times 2\) km square whose position is chosen uniformly at random within 10 km of his/her house. The observer then distributes 10 observation points over the square, which have to be representative of the different habitat areas in the square, and each point is visited twice between April and mid-June, for 5 min. Every observation of each species (heard or seen) is reported and the maximum count among the two visits is kept, as for the ACT program. As previously, we use all such records for the years 2008–2011. This leads to 86,526 bird observations in 38 squares (see Fig. 4), monitoring 34 species (the same as for the LPO dataset).

Fig. 4 Positions of data collection

Table 3 provides the list of the 34 bird species under study.

Table 3 List of the 34 bird species under study

1.2 Identifiability and models implementation

1.2.1 Reparametrization of the model

Let us recall our model for the observations, where c is a cell of the site j:

$$\begin{aligned} \begin{aligned} X_{ick}&\sim \mathcal {P}\mathrm{oisson}\left( \int _{\mathcal {A}_c}N_{ij}\frac{S_{ih(x)}}{\sum _{h'}S_{ih'}V_{h'j}}\times E_{ck}\frac{q_{h(x)k}}{\sum _{h'}q_{h'k}V_{h'c}}\times P_{ik}\,dx\right) \\&=\mathcal {P}\mathrm{oisson}\left( N_{ij}E_{ck}P_{ik}\sum _{h} \frac{q_{hk}}{\sum _{h'}q_{h'k}V_{h'c}}\frac{S_{ih}}{\sum _{h'}S_{ih'}V_{h'j}}V_{hc}\right) . \end{aligned} \end{aligned}$$

For standardized data, we can assume either that \(q_{h0}/q_{10}\) is known for all h (generally equal to 1), or that each cell of the standardized dataset is small enough to be composed of a single habitat. In addition, we assume that \(E_{c0}/E_{10}\) is known for all c for the standardized dataset. To implement our model while ensuring identifiability of the parameters, we use the following change of variables

$$\begin{aligned} \tilde{N}_{ij}= & {} \frac{N_{ij}P_{i0}E_{10}}{\sum _{h'}\frac{S_{ih'}}{S_{i1}}V_{h'j}},\quad \tilde{P}_{ik}=\frac{P_{ik}P_{10}}{P_{i0}P_{1k}}, \quad \tilde{E}_{ck}=\frac{E_{ck}P_{1k}}{P_{10}E_{10}}\frac{V_c}{\sum _{h'}\frac{q_{h'k}}{q_{1k}}V_{h'c}},\\ \tilde{q}_{hk}= & {} \frac{q_{hk}}{q_{1k}},\quad \tilde{S}_{ih}=\frac{S_{ih}}{S_{i1}} \end{aligned}$$

where \(V_c=\sum _hV_{hc}\). Using this change of variables, we get that for all i, c and k,

$$\begin{aligned} X_{ick}\sim \mathcal {P}\left( \tilde{N}_{ij}\tilde{E}_{ck}\tilde{P}_{ik}\sum _{h} \tilde{q}_{hk}\tilde{S}_{ih}V_{hc}\right) \end{aligned}$$
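As an illustration of the reparametrized intensity above, the following minimal Python sketch evaluates the Poisson rate for one species, cell and dataset. The function name `poisson_rate`, the argument names and all numerical values are hypothetical and chosen only for this example; this is not the JAGS code distributed with the article.

```python
import numpy as np


def poisson_rate(N_t, E_t, P_t, q_t, S_t, V_c):
    """Rate N~_ij * E~_ck * P~_ik * sum_h q~_hk * S~_ih * V_hc.

    q_t, S_t and V_c are vectors indexed by habitat h (hypothetical inputs).
    """
    q_t = np.asarray(q_t, dtype=float)
    S_t = np.asarray(S_t, dtype=float)
    V_c = np.asarray(V_c, dtype=float)
    habitat_sum = float(np.sum(q_t * S_t * V_c))
    return N_t * E_t * P_t * habitat_sum


# Two habitats; the first one is the reference (q~ = S~ = 1).
rate = poisson_rate(N_t=2.0, E_t=0.5, P_t=1.0,
                    q_t=[1.0, 2.0], S_t=[1.0, 3.0], V_c=[0.25, 0.75])

# One simulated count from the corresponding Poisson distribution.
rng = np.random.default_rng(0)
count = rng.poisson(rate)
```

With these illustrative values the habitat sum is \(0.25+4.5=4.75\), so the rate equals \(2\times 0.5\times 1\times 4.75=4.75\).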

with

$$\begin{aligned} \frac{\tilde{N}_{ij}}{\tilde{N}_{i1}}=\frac{N_{ij}}{N_{i1}}\frac{\sum _{h'}\tilde{S}_{ih'}V_{h'1}}{\sum _{h'}\tilde{S}_{ih'}V_{h'j}},\quad \tilde{P}_{i0}=1,\quad \tilde{P}_{11}=1,\quad \tilde{q}_{h0}=1, \quad \tilde{q}_{11}=1, \quad \tilde{S}_{i1}=1 \end{aligned}$$

for all i, c, k, and \(\tilde{E}_{c0}\) is known for all c.

In particular, for standardized data, for which we can assume that the habitat associated with each cell c is known (denoted by h(c)), we get:

$$\begin{aligned} X_{ic0}\sim \mathcal {P}\left( \tilde{N}_{ij}\tilde{E}_{c0}\tilde{S}_{ih(c)}\right) , \end{aligned}$$

where \(\tilde{E}_{c0}\) is known for each cell c. This is a generalized linear model with \(IJ+I(H-1)\) unknown parameters (the quantities \(\tilde{N}_{ij}\) as well as the habitat selection parameters \(\tilde{S}_{ih}=S_{ih}/S_{i1}\) for \(h>1\)). On the log scale, the Poisson mean for each species i is the sum of a site effect \(\log \tilde{N}_{ij}\), a habitat effect \(\log \tilde{S}_{ih(c)}\) and the known offset \(\log \tilde{E}_{c0}\), so these parameters are identifiable if and only if the matrix Y with size \(C\times (J+H-1)\) giving, for each cell c visited by the STOC dataset, the site and habitat associated with this cell (when this habitat is not the first habitat), has rank \(J+H-1\). More precisely, the matrix Y is such that for all \(c\in \{1,\dots ,C\}\), \(Y_{c,j(c)}=1\), \(Y_{c,J+h(c)-1}=1\) if \(h(c)>1\), and \(Y_{cl}=0\) elsewhere.
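The rank condition can be checked numerically. The sketch below builds the matrix Y from a list of sites j(c) and habitats h(c) and compares its rank with \(J+H-1\). The helper names `design_matrix` and `is_identifiable` and the toy layouts are assumptions made for illustration; sites and habitats are numbered from 1 as in the text.

```python
import numpy as np


def design_matrix(site, habitat, J, H):
    """Build the C x (J + H - 1) matrix Y described in the text.

    site[c] is in 1..J and habitat[c] is in 1..H (1-based labels).
    """
    C = len(site)
    Y = np.zeros((C, J + H - 1))
    for c in range(C):
        Y[c, site[c] - 1] = 1.0  # site indicator, column j(c)
        if habitat[c] > 1:
            # habitat indicator, column J + h(c) - 1 (1-based)
            Y[c, J + habitat[c] - 2] = 1.0
    return Y


def is_identifiable(site, habitat, J, H):
    """True when Y has full column rank J + H - 1."""
    Y = design_matrix(site, habitat, J, H)
    return bool(np.linalg.matrix_rank(Y) == J + H - 1)


# Site 1 is observed in both habitats, so site and habitat effects separate.
ok = is_identifiable(site=[1, 1, 2], habitat=[1, 2, 1], J=2, H=2)

# Each site is observed in a single, different habitat: effects are confounded.
confounded = is_identifiable(site=[1, 2], habitat=[1, 2], J=2, H=2)
```

In the second layout the columns for site 2 and habitat 2 coincide, so the rank drops to 2 and the site and habitat effects cannot be told apart.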

1.2.2 Implementation with JAGS

The computer code associated with Sect. 3.1 is given in the numerical Additional File SimulatedData.Rnw. This program calls three models that are written in separate files: one for our model (Additional file ModelSimulatedData.txt), one for the model in which we use only standardized data (Additional file ModelStandardizedSimulatedData.txt), and one for the model in which differences in habitat preferences are neglected (Additional file ModelWithoutHabitatSimulatedData.txt). To make our estimations we used 10000 iterations (n.iter), with a thinning interval of 10, 1 chain and 1000 iterations for adaptation (n.adapt). The computation time is about one hour per generated dataset for the three models, using a dual-core Intel i5 processor.

The computer code associated with Sect. 2 is given in the numerical Additional File RealData.Rnw. This program calls four models that are written in separate files: one for our model (Additional file ModelWithHabitat.txt), one for the model in which we use only standardized data (Additional file ModelStandardizedOnly.txt), one for the model in which differences in habitat preferences are neglected (Additional file ModelWithoutHabitat.txt), and one for the model in which space is considered as a single quadrat (Additional file ModelOneQuadrat.txt). To make our estimations we used 10000 iterations (n.iter), with a thinning interval of 10, 1 chain and 1000 iterations for adaptation (n.adapt). The computation time is about one hour per model, using a dual-core Intel i5 processor.

1.3 Some details on the numerics: the prediction of the STOC data

Let \(\mathcal C^{\textit{STOC}}_{j}\) denote the set of all the observation points c in the quadrat j surveyed in the STOC dataset. The STOC counts for the species i in the quadrat j are

$$\begin{aligned} X_{ij}^{\textit{STOC}}=\sum _{c\in \mathcal C^{\textit{STOC}}_{j}} X_{ic}^{\textit{STOC}}. \end{aligned}$$

Let us denote by h(c) the habitat type of the observation point c. In our Model (1), the expected number of individuals of the species i at the observation point \(c\in \mathcal C^{\textit{STOC}}_{j}\) is given by

$$\begin{aligned} \int _{\mathcal {A}_{c}}\frac{N_{ij} S_{ih(c)}}{\sum _{h'} S_{ih'}V_{h'j}} \,dx = \frac{N_{ij} S_{ih(c)}V_{c}}{\sum _{h'} S_{ih'}V_{h'j}}. \end{aligned}$$

Taking into account a variable observational effort \(E_{c}^{\textit{STOC}}\) on each observation point c, we then predict \(X_{ij}^{\textit{STOC}}\) from the estimation based on our Model (1) by

$$\begin{aligned} \widehat{X}_{ij}^{model}=\hat{N}^{model}_{ij} \sum _{c\in \mathcal C^{\textit{STOC}}_{j}} E_{c}^{\textit{STOC}} \frac{\hat{S}^{model}_{ih(c)}V_{c}}{\sum _{h'} \hat{S}^{model}_{ih'}V_{h'j}}, \end{aligned}$$

where the observational effort \(E_{c}^{\textit{STOC}}\) is given by the number of years of observation at the observation point c.
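The prediction formula for Model (1) can be sketched in a few lines of Python. The function `predict_stoc_count`, its argument layout and the numbers used below are hypothetical; in practice the estimates \(\hat{N}^{model}_{ij}\) and \(\hat{S}^{model}_{ih}\) would come from the fitted model.

```python
import numpy as np


def predict_stoc_count(N_hat, S_hat, V_hj, points):
    """Predicted STOC count for one species i in one quadrat j.

    N_hat  : estimated abundance N_ij
    S_hat  : estimated habitat selection parameters S_ih, indexed by habitat
    V_hj   : area of each habitat in quadrat j
    points : iterable of (habitat index (0-based), area V_c, effort E_c)
    """
    S_hat = np.asarray(S_hat, dtype=float)
    V_hj = np.asarray(V_hj, dtype=float)
    denom = float(np.dot(S_hat, V_hj))  # sum_h' S_ih' V_h'j
    total = sum(E_c * S_hat[h] * V_c for h, V_c, E_c in points)
    return N_hat * total / denom


# Two habitats and two observation points with 4 and 2 years of effort.
pred = predict_stoc_count(
    N_hat=100.0,
    S_hat=[1.0, 3.0],
    V_hj=[60.0, 40.0],
    points=[(0, 1.0, 4), (1, 1.0, 2)],
)
```

Here the denominator is \(60+3\times 40=180\) and the weighted sum over points is \(4+6=10\), so the prediction is \(100\times 10/180\approx 5.56\).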

For the one quadrat model with habitat [One Quadrat with hab] displayed in Table 2, the prediction is given by

$$\begin{aligned} \widehat{X}_{ij}^{model}=\hat{N}^{model}_{i} \sum _{c\in \mathcal C^{\textit{STOC}}_{j}} E_{c}^{\textit{STOC}} \frac{\hat{S}^{model}_{ih(c)}V_{c}}{\sum _{h'} \hat{S}^{model}_{ih'}V_{h'}}, \end{aligned}$$

with \(V_{h'}\) the area of the habitat type \(h'\) in the whole quadrat.

When Model (2) is used for estimation, the predictions are given by

$$\begin{aligned} \widehat{X}_{ij}^{model}=\hat{N}_{ij} \sum _{c\in \mathcal C^{\textit{STOC}}_{j}} E_{c}^{\textit{STOC}}\frac{V_{c}}{V_{j}}. \end{aligned}$$

1.4 Additional results on simulated data

In this section we provide additional results to the ones presented in Sect. 3.1 for simulated data. Figure 5, as a complement to Fig. 1, provides the posterior distributions of the habitat selection probabilities \(S_{i2}\) for all i, showing that these posteriors are good approximations of the reference values that we wish to estimate.

Fig. 5 Posterior distributions of the habitat selection probabilities \(S_{i2}\) for all i (\(S_{i1}=1\) for all i, each graph corresponds to a different value of i). The reference values are given in red

1.5 Additional results on real data

1.5.1 Some ecological results

In this section we provide additional results that address natural ecological questions. In Fig. 6 we give maps of the estimated densities of the Eurasian nuthatch, with and without habitat structure. The substantial differences between these two maps highlight the necessity of taking habitat into account. In Fig. 7 we give the mean preferences of all considered species, for each habitat type. These preferences differ markedly (the highest being 10 times larger than the lowest), which is crucial to take into account when predicting the reaction of species to environmental change, for instance.

Fig. 6 Relative density maps of the Eurasian nuthatch, without (left) and with (right) taking into account habitat structure. For each quadrat the gray level indicates the relative density \(\frac{\hat{N}^{model}_{ij}V_1}{\hat{N}^{model}_{i1}V_j}\) where \(V_l\) is the area of quadrat l

Fig. 7 Mean species estimated habitat selection probabilities, without (left) and with (right) habitat dependent detectability

1.5.2 Taking into account habitat dependent detectability

So far we have assumed that the detectability \(P_{ik}\) of species i in dataset k does not depend on the habitat, which might be unrealistic since, in particular, the range of vision (or hearing) can differ from one habitat to another. If so, our estimation of the habitat selection parameters \(S_{ih}\), but also of the species relative abundances, can be biased. Due to identifiability constraints, we cannot add and estimate an unknown list of parameters \(\alpha _h\) accounting for the dependence of detectability on the habitat (since they would be indistinguishable from the species habitat selection probabilities \(S_{ih}\)). Our proposal is to use an auxiliary dataset that can provide information concerning the detectability associated with each considered habitat. We test this idea using a dataset provided by VigieNature that gives, for different kinds of habitats, the respective numbers of observations made in different distance ranges. The program associated with this section is given in the file alpha.R.

For each habitat h, we can assume that the detection probability is equal to 1 when observed individuals are “close enough” to the observer, since we only want to quantify the loss in detectability in each habitat due to the limitation in the range of vision (or hearing) in this habitat. More precisely, we assume that the detection probability is equal to 1 when the observed individual is less than 25 m from the observer. Then if we denote by \(Y_h\) the number of observed individuals in habitat h and by \(Y_{1h}\) the number of observed individuals in habitat h at a distance of less than 25 m from the observer, we can quantify the detectability in habitat h by the quantity

$$\begin{aligned} \alpha _h=\frac{Y_{h}/Y_{1h}}{Y_{1}/Y_{11}}. \end{aligned}$$

The result of these calculations is given in Table 4. As expected, the detectability is lower in urbanized areas and forests than in open and agricultural landscapes. The impact of taking habitat detectability into account is illustrated in Fig. 7. This figure shows that the difference between the highest and the lowest mean habitat selection probabilities is even higher than predicted without taking habitat detectability into account (the former being about 15 times larger than the latter).
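The computation of \(\alpha _h\) amounts to a ratio of ratios, as the following Python sketch shows. It is not the authors' alpha.R program: the function name `relative_detectability` and the counts are hypothetical, and the first habitat in the list plays the role of the reference habitat 1.

```python
def relative_detectability(total, near, ref=0):
    """alpha_h = (Y_h / Y_1h) / (Y_ref / Y_1ref) for each habitat h.

    total[h] : number of individuals observed in habitat h (Y_h)
    near[h]  : number observed at less than 25 m in habitat h (Y_1h)
    ref      : index of the reference habitat (habitat 1 in the text)
    """
    ratios = [t / n for t, n in zip(total, near)]
    return [r / ratios[ref] for r in ratios]


# Hypothetical counts: an open habitat (reference) and a closed habitat.
alpha = relative_detectability(total=[400, 300], near=[100, 150])
```

With these made-up counts, the open habitat has four observations for every close-range one against two in the closed habitat, giving \(\alpha =(1, 0.5)\): the closed habitat is half as detectable as the reference.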

Table 4 The detectability associated with each habitat, taking into account differences in ranges of vision


Cite this article

Coron, C., Calenge, C., Giraud, C. et al. Bayesian estimation of species relative abundances and habitat preferences using opportunistic data. Environ Ecol Stat 25, 71–93 (2018). https://doi.org/10.1007/s10651-018-0398-2


  • DOI: https://doi.org/10.1007/s10651-018-0398-2
