Abstract
We propose an extension of the N-mixture model that enables the estimation of abundances of multiple species as well as the correlations between them. Our novel multi-species N-mixture model (MNM) is the first to address the estimation of both positive and negative inter-species correlations, which allows us to assess the influence of the abundance of one species on another. We provide extensions that permit the analysis of data with an excess of zero counts, and relax the assumption that populations are closed through the incorporation of an autoregressive term in the abundance. Our approach provides a method of quantifying the strength of association between species’ population sizes and is of practical use to population and conservation ecologists. We evaluate the performance of the proposed models through simulation experiments in order to examine the accuracy of both model estimates and coverage rates. The results show that the MNM models produce accurate estimates of abundance, inter-species correlations and detection probabilities at a range of sample sizes. The MNM models are applied to avian point data collected as part of the North American Breeding Bird Survey between 2010 and 2019. The results reveal an increase in Bald Eagle abundance in south-eastern Alaska in the decade examined.
1 Introduction
Abundance in animal communities is of great interest in ecology, particularly in the areas of conservation and wildlife management (Witmer 2005; Nichols and MacKenzie 2004). Count data is an attractive option for estimating abundance due to the relative affordability with which it may be collected and the reduced risk of harm to both animals and humans when compared to more direct data collection methods (Verdade et al. 2013). However, count data for animal abundance tends to suffer from imperfect detection: not every individual present is observed, so the recorded counts do not represent the total abundance. The smaller the detection probability, the more severely the raw counts underestimate abundance. Because of this, traditional modelling techniques, such as generalised linear models (McCullagh and Nelder 1989), cannot be applied directly to the data, as they do not accommodate imperfect detection.
N-mixture models (Royle 2004) constitute a class of models which may be used to estimate abundance from count data. These models assume that the population under analysis is closed, i.e., that it does not change through births, deaths, or migration during the study. Conditional on the population size at a site, the counts at each site and time are considered independent and identically distributed (i.i.d.) random variables that follow a Binomial distribution. In the original N-mixture model, the detection probability is estimated using the data, without the specification of any prior distribution with fixed parameters. The population size at each site is treated as a random effect, with an assumed probability distribution. The distributions that are typically considered for the population size at each site are the Poisson and Negative Binomial, although any other non-negative discrete distribution could also be considered.
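To make this structure concrete, the following Python sketch evaluates the likelihood contribution of a single site under a Poisson N-mixture model by summing over the unobserved population size. The function names, the example counts and the truncation bound `n_max` are illustrative assumptions and do not come from the paper.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability of n individuals given mean abundance lam (> 0)."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))


def binomial_pmf(y, n, p):
    """Binomial probability of counting y of the n individuals present."""
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def site_likelihood(counts, lam, p, n_max=200):
    """Marginal likelihood of the repeated counts at one site.

    The unknown abundance N is summed out from max(counts), the smallest
    value compatible with the data, up to the truncation bound n_max.
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        detection = math.prod(binomial_pmf(y, n, p) for y in counts)
        total += poisson_pmf(n, lam) * detection
    return total


# Hypothetical counts from three visits to one closed site.
example = site_likelihood([3, 5, 4], lam=8.0, p=0.5)
```

With a single visit, the sum reduces to a Poisson distribution with mean \(\lambda p\), so abundance and detection cannot be separated; it is the repeated counts at a closed site that carry the information about p.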
The ability to estimate correlations between species abundances allows us to relax any assumption of independent species abundances. This is the aim of the multi-species N-mixture (MNM) models presented in this paper—a new class of models which estimate abundance for multiple species simultaneously while accounting for imperfect detection, and estimate inter-species correlations, which are intended to allow for inferences about the relationships between different species.
The remainder of the paper is organised as follows. In Sect. 2, we review related work on multi-species models. In Sect. 3, we introduce our novel modelling framework to estimate abundance and inter-species correlations in animal communities based on spatio-temporal count data. We also describe the model formulation, estimation procedure, and the computation of the inter-species correlations. In Sect. 4, we present the data obtained from the North American Breeding Bird Survey (NABBS) (Pardieck et al. 2020), which will be used to illustrate our modelling approach. Later, in Sect. 5, we compare the models fitted to the NABBS data to identify the best-fitting one. Finally, in Sect. 6, we present a general discussion.
2 Related works
Several multi-species modelling frameworks have been developed previously which allow for the analysis of occurrence data (Dorazio and Royle 2005; Yamaura et al. 2011) or count data (Yamaura et al. 2012; Golding et al. 2017; Gomez et al. 2017) of multiple species.
Dorazio and Royle (2005) developed a model for estimating the size of a biological community by modelling the probability of detection as a Binomial random variable, and the probability of occurrence as a Bernoulli random variable. They allow rates of detection and occurrence to vary among species, and not every species is assumed to be present at every location. However, the aim of their model is to determine the number of species, not the number of individuals of each species, as is the aim of N-mixture models.
Yamaura et al. (2011) developed a multi-species model that estimates animal abundance from occurrence data. This is an extension of the single-species model developed by Royle and Nichols (2003), in which binary detection/non-detection data is linked to abundance. Yamaura et al. (2012) extended this model to count data. The assumption behind these models is that the abundances or detection probabilities of species in the community might be linked by species-level or functional group-level characteristics. However, inter-species abundance correlations are not explored within these models.
Gomez et al. (2017) developed a multi-species N-mixture model whose aim was to allow for the estimation of abundance of rare species by borrowing strength from other species in the community. This was done by assuming detection probabilities are drawn at random from a Beta distribution. Another multi-species N-mixture model was developed by Golding et al. (2017), which used the dependent double-observer method to create a multi-species dependent double-observer abundance model. This allowed them to address an issue of false-positive errors in detection. The focus of both Gomez et al. (2017) and Golding et al. (2017) was an improvement in detection probability. None of the preceding multi-species models allow us to make inferences as to the relationships within an ecological community, as we propose to do with our multi-species N-mixture model.
Moral et al. (2018) developed an extension to the single-species N-mixture model, which allowed for the estimation of abundances of two species, and the correlations between these abundances. However, this model only examines two species, and is therefore not as complete as the model we propose here, which allows us to examine whole communities.
Dorazio and Connor (2014) developed a multi-species N-mixture model which allowed for abundances of species with similar traits to be correlated. However, to guarantee a positive definite correlation matrix, they only allow for positive correlations through the use of a distance metric d coupled with a spatial autocorrelation structure of the type \(e^{-\frac{d}{\phi }}\). The framework we present here is more general in that it accommodates both positive and negative correlations, while guaranteeing positive definiteness of the covariance matrix via an inverse-Wishart prior. We also explore ways of incorporating zero-inflation and open population dynamics, which is not something attempted by Dorazio and Connor (2014).
Finally, Niku et al. (2019) describe generalised linear latent variable models—a modelling technique which allows for obtaining correlation matrices in an elegant manner. However, these models do not allow for the incorporation of imperfect detection.
3 Methods
The models developed in this section are multi-species extensions of the original N-mixture model of Royle (2004), which allow for accurate estimation of both the latent abundances and inter-species correlations, while accounting for imperfect detection and, in the autoregressive variants, relaxing the closure assumption.
3.1 Multi-species N-mixture model (MNM Model)
Consider a study in which count data \(Y_{its}\) are collected, where \(Y_{its}\) is the number of individuals of species s (\(s=1,\ldots ,S\)) observed at site i (\(i=1,\ldots ,R\)) on sampling occasion t (\(t=1,\ldots ,T\)). The true abundance at site i for species s is given by \(N_{is}\). Each of the \(N_{is}\) individuals is detected with probability \(p_{its}\), and it is assumed that species populations are closed with respect to births, deaths and migration (i.e., that the population sizes do not change due to any of these factors, akin to the N-mixture model proposed by Royle (2004)). Our model assumes that \(N_{is}\) follows a Poisson distribution, and may be written as:
\[
\begin{aligned}
Y_{its} \mid N_{is}, p_{its} &\sim \text{Binomial}(N_{is}, p_{its}), \\
\text{logit}(p_{its}) &= {\textbf{z}}^\top _{it} b_{s}, \\
N_{is} \mid \lambda _{is} &\sim \text{Poisson}(\lambda _{is}), \\
\log (\lambda _{is}) &= a_{is} + {\textbf{x}}^\top _{i} \varvec{\beta }, \\
{\textbf{a}}_{i} &\sim \text{MVN}(\varvec{\mu }_{a}, \varvec{\Sigma }_{a}),
\end{aligned}
\]
where \({{\textbf {a}}}_{i} = (a_{i1}, \ldots , a_{iS})^{\top }\). The Poisson rate parameter \(\lambda _{is}\) represents the mean abundance at site i, and \({\textbf{a}}_{i}\) is an S-dimensional vector that contains the random effects \(a_{is}\) that allow us to estimate inter-species correlations. In the above model, covariates may be incorporated in the detection probability and the abundance, with \({\textbf{z}}^\top _{it}\) the it-th row of the design matrix \({\textbf{Z}}\) of dimension \(RT \times q_p\), \(b_{s}\) the \(q_{p} \times 1\) parameter vector for the probability of detection, \({\textbf{x}}^\top _{i}\) the i-th row of the design matrix \({\textbf{X}}\) of dimension \(R \times q_{\lambda }\), and \(\varvec{\beta }\) the \(q_{\lambda } \times 1\) parameter vector for the abundance. Here, \(q_{p}\) and \(q_{\lambda }\) represent the number of covariates associated with the detection probability and the abundance, respectively. Note that different covariate effects may be estimated per species, and other species-level random effects may also be included.
3.2 Hurdle–Poisson model (MNM-Hurdle model)
In this section, we develop a further extension of the multi-species N-mixture model, appropriate for scenarios in which the number of zero counts exceeds that expected under a Poisson distribution. We now allow the latent abundances to follow a Hurdle-Poisson distribution, with \(\lambda _{is}\) defined as in the MNM model, and \(\theta \) the probability of obtaining a zero count.
The Hurdle-Poisson distribution consists of two separate processes. The first is a Bernoulli process, which determines whether a site is occupied (count is non-zero) or unoccupied (count is zero). If the count is non-zero, a second random variable with a zero-truncated Poisson distribution determines the value of the count, i.e.,
\[
P(N = n) =
\begin{cases}
\theta , & n = 0, \\
(1-\theta )\,\dfrac{\lambda ^{n} e^{-\lambda }}{n!\,(1 - e^{-\lambda })}, & n = 1, 2, \ldots
\end{cases}
\]
We then define the latent abundances \(N_{is}\) as
which yields
If the Bernoulli process is equal to 0, then the site is unoccupied and \(N_{is}\) is equal to 0. However, if the Bernoulli process is equal to 1, then the hurdle is crossed, and the value of \(N_{is}\) is determined by the zero-truncated Poisson process. Similar to the MNM model, populations are assumed to be closed.
We assume a single probability of obtaining a zero count \(\theta \). However, \(\theta \) may also be allowed to vary by site and/or species, and may depend on covariates through a logit link. All other parameters are distributed as described in the MNM model in Sect. 3.1.
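The two-stage mechanism described in this section can be illustrated with a short simulation. The sketch below is a minimal Python example, not the authors' implementation: the function names are hypothetical, the Poisson sampler is Knuth's multiplication method (adequate for moderate \(\lambda \)), and the zero-truncated draw uses simple rejection.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def sample_hurdle_poisson(lam, theta, rng):
    """Draw a latent abundance: zero with probability theta, otherwise
    a value from a zero-truncated Poisson(lam)."""
    if rng.random() < theta:
        return 0  # hurdle not crossed: site unoccupied
    while True:  # hurdle crossed: reject zeros to truncate the Poisson
        n = sample_poisson(lam, rng)
        if n > 0:
            return n


def hurdle_poisson_mean(lam, theta):
    """Mean of the Hurdle-Poisson distribution."""
    return (1.0 - theta) * lam / (1.0 - math.exp(-lam))
```

Under this construction the proportion of zeros is governed by \(\theta \) alone, whereas under a plain Poisson distribution it is tied to \(\lambda \) through \(e^{-\lambda }\).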
3.3 Autoregressive model (MNM-AR model)
In order to model populations over multiple years, a further extension to the multi-species N-mixture model is proposed, which allows us to relax the assumption that species populations are closed with respect to births, deaths and migration. We do this through the inclusion of an autoregressive term in the abundance parameter.
The study design now consists of data collected over K years (\(k=1,\ldots ,K\)) for S species at R locations, each with T sampling occasions. The observed abundance (Y) and actual abundance (N) are now allowed to vary by year:
If \(k=1\), then \(\lambda _{i1s}\) is defined as before:
However, for \(k>1\), we allow \(\lambda _{iks}\) to depend on the latent abundance at year \(k-1\):
The term \(\text {log}(N_{i(k-1)s}+1)\) is used here rather than the simpler \(N_{i(k-1)s}\) to avoid the rapid increase in sampled \(\lambda \) values when N values are large (Fokianos and Tjøstheim 2011).
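Written out, the autoregressive structure described in this section might take the following form. This is a sketch assembled from the surrounding definitions rather than a quotation of the original equations: the index order in \(Y_{itks}\) and \(p_{itks}\), and the species-specific coefficient \(\phi _{s}\), are assumptions.

\[
\begin{aligned}
Y_{itks} \mid N_{iks} &\sim \text{Binomial}(N_{iks}, p_{itks}), \\
N_{iks} \mid \lambda _{iks} &\sim \text{Poisson}(\lambda _{iks}), \\
\log (\lambda _{i1s}) &= a_{is} + {\textbf{x}}^\top _{i} \varvec{\beta }, \\
\log (\lambda _{iks}) &= a_{is} + {\textbf{x}}^\top _{i} \varvec{\beta } + \phi _{s} \log (N_{i(k-1)s}+1), \quad k > 1.
\end{aligned}
\]

On the log scale, a term that is linear in \(N_{i(k-1)s}\) would multiply \(\lambda _{iks}\) by \(e^{\phi _{s} N_{i(k-1)s}}\), whereas the logarithmic term multiplies it by \((N_{i(k-1)s}+1)^{\phi _{s}}\), which grows far more slowly for large abundances.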
3.4 Hurdle-autoregressive model (MNM-Hurdle-AR model)
A straightforward combination of the MNM-Hurdle model and the MNM-AR model produces the MNM-Hurdle-AR model. This model accommodates excess zeros while also accounting for an autoregressive structure in the data. The zero-inflation is introduced as in the MNM-Hurdle model, i.e.,
where
3.5 Model estimation
The models described in this paper are implemented using a Bayesian framework. Each of the above models was implemented in R (R Core Team 2020a) through the probabilistic programming software JAGS (Plummer 2003; Plummer 2017) using four chains with 50,000 iterations each, of which the first 10,000 were discarded as burn-in, and a thinning of five to reduce autocorrelation in the MCMC samples. Parameter convergence was determined using the potential scale reduction factor (\({\hat{R}}\)), a diagnostic criterion proposed by Gelman and Rubin (1992). An \({\hat{R}}\) value that is very close to one is an indication that the four chains have mixed well. If the \({\hat{R}}\) value was less than 1.05, the chains were considered to have mixed properly, and the posterior estimates of the parameters were considered reliable.
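As an illustration of the convergence check, the following Python sketch computes the basic Gelman and Rubin (1992) potential scale reduction factor for one scalar parameter from several chains. It is a simplified version with a hypothetical function name; the diagnostic reported by JAGS-related R packages may apply additional corrections.

```python
import math
import random
import statistics


def potential_scale_reduction(chains):
    """Basic R-hat for one parameter; chains is a list of equal-length lists."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean(statistics.variance(chain) for chain in chains)
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Toy example: four chains sampling the same distribution, and four chains
# stuck around different values.
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0, 1, 2, 3)]
rhat_mixed = potential_scale_reduction(mixed)
rhat_stuck = potential_scale_reduction(stuck)
```

If thinning is applied after burn-in, the settings above retain (50,000 − 10,000)/5 = 8,000 draws per chain, or 32,000 draws across the four chains.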
Prior distributions were assigned as follows: \(\varvec{\mu }_{a}\), the vector of means of the random effect \({\textbf{a}}\), was assigned a multivariate Normal prior with mean vector \(\varvec{\mu }_{0}\) and a diagonal variance-covariance matrix \(\varvec{\Sigma }_{0}\). \(\varvec{\Sigma }_{a}\), the variance-covariance matrix of \({\textbf{a}}\), was assigned an inverse-Wishart prior with a diagonal scale matrix \(\varvec{\Omega }\) and \(\nu = S+1\) degrees of freedom, which results in a marginal Uniform(-1,1) prior on the correlations (Plummer 2017):
\[
\begin{aligned}
\varvec{\mu }_{a} &\sim \text{MVN}(\varvec{\mu }_{0}, \varvec{\Sigma }_{0}), \\
\varvec{\Sigma }_{a} &\sim \text{inverse-Wishart}(\varvec{\Omega }, \nu ).
\end{aligned}
\]
An inverse-Wishart distribution is specified as the prior for the covariance matrix of the random effect \({{\textbf {a}}}\). Criticisms of the inverse-Wishart prior include the dependency imposed between correlations and variances, and the fact that there is a single degree of freedom parameter which determines the uncertainty for all variance parameters. It is demonstrated by Alvarez et al. (2014) that when the variance is small relative to the mean, the correlation is biased towards zero, and the variance is biased towards larger values, though when working with count data, typically variances are large relative to the mean. Despite these issues, the inverse-Wishart distribution is a prior distribution commonly assigned to a covariance matrix in Bayesian analysis due to its conjugacy with the Normal distribution, and for these models the inverse-Wishart distribution provides a good solution due to its guarantee of providing a positive definite covariance matrix.
In the Hurdle and Hurdle-AR models, \(\theta \) is assigned a Beta prior with the value of both shape parameters equal to one, which is equivalent to a vague uniform prior:
\[
\theta \sim \text{Beta}(1, 1).
\]
In the AR and Hurdle-AR models, \(\varvec{\phi }\) is assigned a multivariate Normal prior, with hyperparameters \(\varvec{\mu _{\phi }}\) (mean vector) and \(\varvec{\Sigma _{\phi }}\) (diagonal covariance matrix):
\[
\varvec{\phi } \sim \text{MVN}(\varvec{\mu _{\phi }}, \varvec{\Sigma _{\phi }}).
\]
Extensive simulation studies were carried out to examine the accuracy of parameter estimates; see Appendix B for more details.
3.6 Inter-species correlations
The presence of the multivariate normal random effect \({{\textbf {a}}}\) in the abundance provides a link between species’ abundances. The covariance matrix of the random effect, \(\varvec{\Sigma }_{a}\), and hence the corresponding correlation matrix, may be estimated directly from the Bayesian model. In this sense, the inter-species correlations for the latent abundances \(N_{s}\) and \(N_{s'}\), for all \(s \ne s'\), are calculated for each model as:
\[
\text{Corr}(N_{s}, N_{s'}) = \frac{\text{Cov}(N_{s}, N_{s'})}{\sqrt{\text{Var}(N_{s})\,\text{Var}(N_{s'})}}.
\]
The derivation of \(\text {Cov}(N_{s}, N_{s'})\) can be found in Appendix A.
The inter-species correlations for the MNM model and Hurdle model are assumed not to vary by year, so these models have a single analytic correlation matrix. However, in the AR and Hurdle-AR model, we assume latent abundances change by year, which requires the computation of K analytic correlation matrices. Note that the MNM and AR models required the use of properties of conditional variance and covariance to determine analytic correlations. In the Hurdle and Hurdle-AR models, the properties of conditional variance and covariance were merged with second-order Taylor approximations to make their computation feasible.
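The link between the random-effect covariance and the correlation of the latent abundances can be checked numerically. The Python sketch below treats two species under the MNM model without covariates (an assumption made to keep the example short), computes the analytic correlation from the log-normal moments of \(\lambda \), and compares it with a Monte Carlo estimate. All function names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def analytic_correlation(mu, sigma):
    """Corr(N_1, N_2) when log(lambda) ~ MVN(mu, sigma) and N | lambda ~ Poisson."""
    mean_lam = [math.exp(mu[s] + sigma[s][s] / 2.0) for s in range(2)]
    var_lam = [(math.exp(sigma[s][s]) - 1.0) * mean_lam[s] ** 2 for s in range(2)]
    cov_lam = mean_lam[0] * mean_lam[1] * (math.exp(sigma[0][1]) - 1.0)
    # Law of total variance: Var(N) = E(lambda) + Var(lambda).
    var_n = [mean_lam[s] + var_lam[s] for s in range(2)]
    return cov_lam / math.sqrt(var_n[0] * var_n[1])


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulated_correlation(mu, sigma, n_sites, rng):
    """Simulate abundances of two species at n_sites sites and correlate them."""
    # Cholesky factor of the 2 x 2 covariance matrix, written out by hand.
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[0][1] / l11
    l22 = math.sqrt(sigma[1][1] - l21**2)
    n1, n2 = [], []
    for _ in range(n_sites):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a1 = mu[0] + l11 * z1
        a2 = mu[1] + l21 * z1 + l22 * z2
        n1.append(sample_poisson(math.exp(a1), rng))
        n2.append(sample_poisson(math.exp(a2), rng))
    return pearson(n1, n2)
```

In this setting the correlation between the abundances is smaller in magnitude than the correlation between the random effects, because the Poisson sampling adds species-specific variance that is not shared.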
4 Case study: North American Breeding Bird Survey
In this section, we describe the application of the multi-species N-mixture models to a real world case study, to examine bird populations using data collected as part of the North American Breeding Bird Survey (NABBS).
The North American Breeding Bird Survey (Pardieck et al. 2020) was first conducted in 1966, and now provides data annually on more than 400 bird species across 3700 routes in the United States and Canada. Each of these routes is approximately 24.5 miles long and is composed of 50 stops, approximately 0.5 miles apart. At each stop, every bird seen or heard within a 0.25-mile radius is recorded. For the sake of our models, each of these routes is considered a site, and each of the 50 stops along a route is a sampling occasion.
We examine data collected in Alaska in the 10-year period 2010–2019. There are 94 routes in Alaska (Fig. 1) at which data was collected during this time, and each of these routes is composed of 50 sampling locations, totalling 4700 observations per bird species per year.
Bald Eagle populations in Alaska are estimated at between 8000 and 30,000 birds, accounting for roughly half of the global population (Hodges 2011; Hansen 1987; King et al. 1972). For this reason, Bald Eagles were chosen as a species of interest. Several other species were chosen; these included waterbirds such as geese, swans and snipes which were chosen for their relationships with Bald Eagles, as Bald Eagles are known to prey on waterbirds such as ducks, geese and grebes when fish are in short supply (Dunstan and Harper 1975; Todd et al. 1982; McEwan and Hirth 1980). Additionally, a selection of species with inland habitats, such as thrushes and swallows, were examined. In total, 10 species were selected for analysis, of the 233 total species present in Alaska within the 10-year period. The full list of species selected and the frequency with which they were observed is given in Table 1.
The models described in Sect. 3 were fitted to the NABBS data. Each was fitted three times, varying the dimension of the detection probability. Initially, detection probability was allowed to vary by site, species and year. Subsequently, models were fitted in which detection probability varies only by site and species, and then by species alone.
Initially, the models were fitted without covariates, and results were compared using their Bayesian Information Criterion (BIC) (Delattre et al. 2014) values. Subsequently, latitude, longitude and their interaction term latitude \(\times \) longitude were included in the linear predictors for the abundance parameters, and models were again compared using BIC values. All covariates were scaled to have zero mean and unit variance.
Initial examination of this data revealed that \(93.2\%\) of observations (438,040 of a total of 470,000 observations) consisted of zero counts. This suggested that a model with a hurdle component might provide an appropriate framework for this data. Furthermore, this data was collected over the course of a decade. For this reason, we might expect that an autoregressive term may be useful to incorporate the time dependence.
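The totals quoted above can be recovered with a short calculation, assuming that each of the 94 routes contributes all 50 stops in each of the 10 years for each of the 10 selected species:

\[
94 \times 50 \times 10 \times 10 = 470{,}000, \qquad \frac{438{,}040}{470{,}000} \approx 0.932.
\]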
Each model was fitted using four chains with 50,000 iterations each, of which the first 10,000 were discarded as burn-in, using a thinning value of five. All prior distributions were assigned as described in Sect. 3.5.
5 Results
Initially, the models were fitted without covariates and were compared using BIC values. The result of this comparison was that the Hurdle-AR model, in which detection probability varies by species (Hurdle-AR(C)), provided the best fit for the NABBS data. However, the addition of a response surface for latitude and longitude in the linear predictors for the abundance parameters results in the Hurdle model in which detection probability varies by species (Hurdle(C)) producing the lowest BIC value. This suggests that, within the range of models produced, this model provides the best fit for our data. The variance which was initially explained by the addition of the autoregressive term is now explained by the latitude and longitude covariates, which render the autoregressive component unnecessary. The result of this comparison is given in Table 2.
The latent inter-species correlations are given in Fig. 2, while the derivation of the analytic correlations, which vary by site and year, is given in Appendix A. The latent correlations are obtained after the probability of detection and other covariates are taken into account. They may be interpreted as an interaction strength metric, which allows for the study of the influence of one species’ abundance on the others (Berlow et al. 2004; Moral et al. 2018).
6 Discussion
We have proposed a multi-species extension to the N-mixture model, which allows for the estimation of inter-species abundance correlations through the addition of a random variable in the abundance. Results of simulation studies (see Appendix B) reveal that this model performs well under a range of scenarios, with abundances and detection probabilities that range from low to high. For this reason, we believe that this approach represents an attractive framework for examining multi-species abundances.
Issues with parameter convergence were encountered when fitting the Hurdle and Hurdle-AR models. When zero-inflation and abundance are large, and detection probability is small, issues with convergence occurred in up to 20\(\%\) of parameters. While this convergence issue does not appear to negatively affect the relative biases of parameter estimates (as can be seen in Appendix B, Tables 3 and 4), coverage probability for detection probability p and random effect mean \(\mu _{a}\) is negatively impacted (Appendix D). In the same models, we see larger coverage for N. This is to be expected, and is due to zero counts being perfectly predicted.
Previous works have demonstrated that N-mixture models can sometimes suffer from issues with identifiability (Dennis et al. 2015) wherein probability of detection estimates are very close to zero and abundance estimates are infinite. To address this issue, we have performed extensive simulation studies, detailed in Appendix B, in which we assess the estimates of abundance and detection probability for a large range of sample sizes, detection probabilities, abundance sizes, and in the case of the Hurdle and Hurdle-AR models, zero-count probabilities. These simulations showed no evidence that this modelling framework suffers from these identifiability issues.
The models presented here all use the Poisson distribution to model the latent abundances. However, any other count distribution might instead be used, for example, the Negative Binomial. Our calculations for the analytic correlations, however, reflect only the use of the Poisson distribution.
Case study results reveal that the difference in BIC values between the model with the lowest BIC value (Hurdle(C) with covariates) and the model with the second-lowest BIC value (Hurdle-AR(C) without covariates) is 223. This sizeable difference in BIC values suggests that the Hurdle(C) model with covariates provides a better fit than the Hurdle-AR(C) model without covariates.
Case study detection probability values range from 0.047 (Tree Swallow) to 0.564 (Swainson’s Thrush). Estimates of the maximum latent abundance N per species are provided in Appendix C, which reveals that while N-mixture models occasionally suffer from identifiability issues as described above, this does not appear to be an issue for this case study.
The estimates for Bald Eagle abundance produced by this model are plotted by site and year in Fig. 3. Of the 94 possible sites in Alaska, the Bald Eagle population is concentrated at 18 sites at the southeastern coast, along a 300-mile stretch of islands called the Alexander Archipelago. Examination of this figure suggested a possible increase in Bald Eagle abundance in this area between 2010 and 2019. The mean abundance was calculated per year (Fig. 4), and a one-sided Mann-Kendall test (Mann 1945; Kendall 1948) for an increasing trend in time series data was performed. The result of this was a Kendall’s \(\tau \) value of 0.6 and a p-value of 0.0082, indicating that it was appropriate to reject the null hypothesis that no increasing trend exists. We can therefore conclude that Bald Eagle abundances increased in the area of the Alexander Archipelago in the decade between 2010 and 2019.
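The trend test described above can be sketched in Python as follows. This is an illustrative implementation of a one-sided exact Mann-Kendall test that assumes no tied values; it is not the authors' code, the function name is hypothetical, and it returns Kendall's \(\tau \) together with the exact p-value under the null hypothesis of no trend.

```python
import math


def mann_kendall_increasing(series):
    """Return (tau, p_value) for a one-sided test of an increasing trend.

    Assumes all values in the series are distinct (no ties).
    """
    n = len(series)
    # S = number of concordant pairs minus number of discordant pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    n_pairs = n * (n - 1) // 2
    tau = s / n_pairs
    # Null distribution: counts[k] is the number of orderings of n distinct
    # values that contain exactly k discordant pairs (inversions).
    counts = [1]
    for m in range(2, n + 1):
        extended = [0] * (len(counts) + m - 1)
        for k, c in enumerate(counts):
            for j in range(m):
                extended[k + j] += c
        counts = extended
    # Without ties S = n_pairs - 2 * inversions, so S >= s is equivalent to
    # inversions <= (n_pairs - s) / 2.
    max_inversions = (n_pairs - s) // 2
    p_value = sum(counts[: max_inversions + 1]) / math.factorial(n)
    return tau, p_value
```

For a series of ten distinct yearly means, a \(\tau \) of 0.6 corresponds to S = 27, that is, 36 concordant and 9 discordant pairs among the 45 pairs of years.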
In the models that contain an autoregressive component, we obtain separate \(\text {Corr}(N_{s}, N_{s'})\) per year. As a feature of model formulation, the correlation between two species does not change sign from year to year. We can accommodate a change in sign by allowing for an unstructured covariance matrix of the autocorrelation coefficient \(\varvec{\Sigma _{\phi }}\), and this particular extension is the subject of ongoing work. Furthermore, the models presented in this paper assume that sites are independent of one another. A further extension we are currently working on is the incorporation of spatial dependence.
Data availability
The North American Breeding Bird Survey data used in this research are available from Pardieck et al. (2020) at https://www.sciencebase.gov/catalog/item/52b1dfa8e4b0d9b325230cd9.
Code Availability
Code for simulating data and fitting models is provided via the following link: https://github.com/niamhmimnagh/MNM.
Change history
26 November 2022
Missing Open Access funding information has been added in the Funding Note.
References
Alvarez I, Niemi J, Simpson M (2014) Bayesian inference for a covariance matrix. arXiv preprint arXiv:1408.4050
Berlow EL, Neutel A-M, Cohen JE, De Ruiter PC, Ebenman B, Emmerson M, Fox JW, Jansen VA, Jones JI, Kokkoris GD et al (2004) Interaction strengths in food webs: issues and opportunities. J Anim Ecol 73(3):585–598
Delattre M, Lavielle M, Poursat M-A (2014) A note on BIC in mixed-effects models. Electron J Stat 8(1):456–475
Dennis EB, Morgan BJT, Ridout MS (2015) Computational aspects of N-mixture models. Biometrics 71(1):237–246
Dorazio RM, Connor EF (2014) Estimating abundances of interacting species using morphological traits, foraging guilds, and habitat. PLoS ONE 9(4):e94323
Dorazio RM, Royle JA (2005) Estimating size and composition of biological communities by modeling the occurrence of species. J Am Stat Assoc 100(470):389–398
Dunstan TC, Harper JF (1975) Food habits of bald eagles in north-central Minnesota. J Wildl Manag 39:140–143
Fokianos K, Tjøstheim D (2011) Log-linear Poisson autoregression. J Multivar Anal 102:563
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472
Golding JD, Nowak JJ, Dreitz VJ (2017) A multispecies dependent double-observer model: a new method for estimating multispecies abundance. Ecol Evol 7(10):3425–3435
Gomez JP, Robinson SK, Blackburn JK, Ponciano JM (2017) An efficient extension of N-mixture models for multi-species abundance estimation. Methods Ecol Evol 9(2):340–353
Hansen AJ (1987) Regulation of bald eagle reproductive rates in southeast Alaska. Ecology 68(5):1387–1392
Herdin M, Czink N, Ozcelik H, Bonek E (2005) Correlation matrix distance, a meaningful measure for evaluation of non-stationary MIMO channels. In: 2005 IEEE 61st vehicular technology conference, vol 1, pp 146–140
Hodges JI (2011) Bald Eagle population surveys of the north Pacific Ocean, 1967–2010. Northwest Nat 92(1):7–12
Plummer M (2003) JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd international workshop on distributed statistical computing, vol 124. Vienna, Austria, pp 1–10
Kendall MG (1948) Rank correlation methods
King JG, Robards FC, Lensink CJ (1972) Census of the bald eagle breeding population in southeast Alaska. J Wildl Manag 36:1292–1295
Lin LI-K (1989) A concordance correlation coefficient to evaluate reproducibility. Biometrics 45:255–268
Mann HB (1945) Nonparametric tests against trend. Econometrica 13:245–259
McCullagh P, Nelder JA (1989) Generalized linear models, Chapman and Hall/CRC monographs on statistics and applied probability series, 2nd edn. Chapman & Hall
McEwan LC, Hirth DH (1980) Food habits of the bald eagle in north-central Florida. The Condor 82(2):229–231
Moral RA, Hinde J, Demétrio CGB, Reigada C, Godoy WAC (2018) Models for jointly estimating abundances of two unmarked site-associated species subject to imperfect detection. J Agric Biol Environ Stat 23(1):20–38
Nichols JD, MacKenzie DI (2004) Abundance estimation and conservation biology. Anim Biodivers Conserv 27(1):437–439
Niku J, Hui FKC, Taskinen S, Warton DI (2019) gllvm: Fast analysis of multivariate abundance data with generalized linear latent variable models in R. Methods Ecol Evol 10(12):2173–2182
Pardieck KL, Ziolkowski D, Lutmerding M, Aponte V, Hudson M-AR (2020) North American Breeding Bird Survey Dataset 1966–2019. Geological Survey data release, US. https://doi.org/10.5066/P9J6QUF6
Plummer M (2017) JAGS Version 4.3.0 user manual. Lyon, France. http://www.stat.yale.edu/~jtc5/238/materials/jags_4.3.0_manual_with_distributions.pdf
R Core Team (2020a) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. https://www.R-project.org/
R Core Team (2020b) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. https://www.R-project.org/
Royle JA (2004) N-mixture models for estimating population size from spatially replicated counts. Biometrics 60(1):108–115
Royle JA, Nichols JD (2003) Estimating abundance from repeated presence-absence data or point counts. Ecology 84(3):777–790
Su Y-S, Yajima M (2020) R2jags: Using R to Run ’JAGS’. R package version 0.6-1. https://CRAN.R-project.org/package=R2jags
Todd CS, Young LS, Owen RB Jr, Gramlich FJ (1982) Food habits of bald eagles in Maine. J Wildl Manag 46:636–645
Verdade LM, Roberto MJ, Maria K, Ferraz PMB (2013) Counting capybaras. In: Capybara. Springer, pp 357–370
Witmer GW (2005) Wildlife population monitoring: some practical considerations. Wildl Res 32(3):259–263
Yamaura Y, Royle JA, Kuboi K, Tada T, Ikeno S, Makino S (2011) Modelling community dynamics based on species level abundance models from detection/nondetection data. J Appl Ecol 48(1):67–75
Yamaura Y, Royle JA, Shimada N, Asanuma S, Sato T, Taki H, Makino S (2012) Biodiversity of man-made open habitats in an underused country: a class of multispecies abundance models for count data. Biodivers Conserv 21(6):1365–1380
Acknowledgements
We are grateful to the associate editor and an anonymous referee, who helped to substantially improve the quality of the original manuscript.
Funding
Open Access funding provided by the IReL Consortium. Niamh Mimnagh’s work was supported by a Science Foundation Ireland Grant Number 18/CRT/6049. The opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Science Foundation Ireland. Estevão Prado’s work was supported by a Science Foundation Ireland Career Development Award Grant Number 17/CDA/4695 and SFI research centre 12/RC/2289P2. Andrew Parnell’s work was supported by: a Science Foundation Ireland Career Development Award (17/CDA/4695); an investigator award (16/IA/4520); a Marine Research Programme funded by the Irish Government, co-financed by the European Regional Development Fund (Grant-Aid Agreement No. PBA/CC/18/01); European Union’s Horizon 2020 research and innovation programme under Grant Agreement No 818144; SFI Centre for Research Training 18/CRT/6049, and SFI Research Centre awards 16/RC/3872 and 12/RC/2289P2.
Contributions
All authors contributed to methodology design. NM and RM analysed data and led the writing of the manuscript. AP and EP contributed critically to the drafts and gave final approval for publication.
Ethics declarations
Competing interest
The authors have no competing interests to declare that are relevant to the content of this article.
Ethical approval
Not applicable.
Informed consent
Not applicable.
Additional information
Communicated by Luiz Duczmal.
Appendices
Analytic correlations
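In this section, we present the analytical expressions for the correlation between the latent abundances (\(N_{s}\) and \(N_{s'}\)) for all \(s \ne s'\) for the MNM model. For convenience of notation, we drop the dependence on i and t, writing \(Y_{s} = (\{Y_{its}\})\), \(N_{s} = (\{N_{is}\})\), \(p_{s} = (\{p_{its}\})\), \(\lambda _{s} = (\{\lambda _{is}\})\). We need the expectation, variance and covariance of the log-normally distributed \(\lambda _{s}\), which are given by
\[ {\mathbb {E}}(\lambda _{s}) = \exp \left( \mu _{s} + \tfrac{1}{2}\Sigma _{ss}\right) , \]
\[ \text {Var}(\lambda _{s}) = \exp \left( 2\mu _{s} + \Sigma _{ss}\right) \left( \exp (\Sigma _{ss}) - 1\right) , \]
\[ \text {Cov}(\lambda _{s}, \lambda _{s'}) = \exp \left( \mu _{s} + \mu _{s'} + \tfrac{1}{2}(\Sigma _{ss} + \Sigma _{s's'})\right) \left( \exp (\Sigma _{ss'}) - 1\right) , \]
where \(\mu =\varvec{\mu _{a}}+{{\textbf {x}}}_{i}\varvec{\mu _{\beta }}\) and \(\Sigma =\Sigma _{a}+{{\textbf {x}}}_{i}^{2}\varvec{\Sigma _{\beta }}\), with \(\mu _{s}\) and \(\Sigma _{ss'}\) denoting the corresponding elements of \(\mu \) and \(\Sigma \). As \(N_{s} \sim \text {Poisson}(\lambda _{s})\) and \(Y_{s} \sim \text {Binomial}(N_{s}, p_{s})\), we can write the conditional expectation and variance directly:
\[ {\mathbb {E}}(N_{s} \mid \lambda _{s}) = \text {Var}(N_{s} \mid \lambda _{s}) = \lambda _{s}, \]
\[ {\mathbb {E}}(Y_{s} \mid N_{s}, p_{s}) = N_{s}p_{s}, \qquad \text {Var}(Y_{s} \mid N_{s}, p_{s}) = N_{s}p_{s}(1 - p_{s}). \]
The unconditional expectation and variance of \(N_{s}\), and the unconditional covariance between \(N_{s}\) and \(N_{s'}\), can be derived using the laws of total expectation, variance and covariance as follows:
\[ {\mathbb {E}}(N_{s}) = {\mathbb {E}}\left[ {\mathbb {E}}(N_{s} \mid \lambda _{s})\right] = {\mathbb {E}}(\lambda _{s}), \]
\[ \text {Var}(N_{s}) = {\mathbb {E}}\left[ \text {Var}(N_{s} \mid \lambda _{s})\right] + \text {Var}\left[ {\mathbb {E}}(N_{s} \mid \lambda _{s})\right] = {\mathbb {E}}(\lambda _{s}) + \text {Var}(\lambda _{s}), \]
\[ \text {Cov}(N_{s}, N_{s'}) = {\mathbb {E}}\left[ \text {Cov}(N_{s}, N_{s'} \mid \lambda _{s}, \lambda _{s'})\right] + \text {Cov}\left[ {\mathbb {E}}(N_{s} \mid \lambda _{s}), {\mathbb {E}}(N_{s'} \mid \lambda _{s'})\right] . \]
We assume that, given the correlated effects \({{\textbf {a}}}\), the latent and observed abundances are independent, which means that \(\text {Cov}(N_{s}, N_{s'} \mid \lambda _{s}, \lambda _{s'})=0\). So,
\[ \text {Cov}(N_{s}, N_{s'}) = \text {Cov}(\lambda _{s}, \lambda _{s'}), \qquad \text {Corr}(N_{s}, N_{s'}) = \frac{\text {Cov}(\lambda _{s}, \lambda _{s'})}{\sqrt{\left[ {\mathbb {E}}(\lambda _{s}) + \text {Var}(\lambda _{s})\right] \left[ {\mathbb {E}}(\lambda _{s'}) + \text {Var}(\lambda _{s'})\right] }}. \]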
The correlations between \(N_{s}\) and \(N_{s'}\) can be derived in a similar way for the other models presented in this paper. The models with a hurdle component require the Hurdle-Poisson forms of \({\mathbb {E}}(N_{s} \mid \lambda _{s})\) and \(\text {Var}(N_{s} \mid \lambda _{s})\), so that \({\mathbb {E}}(N_{s})\), \(\text {Var}(N_{s})\) and \(\text {Cov}(N_{s}, N_{s'})\) must be approximated using quadratic Taylor expansions. Correlations for models with an autoregressive component follow the same form as the MNM model, with the following substitution for \(\lambda \)
where
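The MNM correlation described in this appendix can be evaluated numerically. The following is a minimal sketch, assuming \(\log \lambda \) is multivariate normal with mean vector `mu` and covariance matrix `sigma`, and combining the standard log-normal moments with the laws of total variance and covariance (so that \(\text {Var}(N_{s}) = {\mathbb {E}}(\lambda _{s}) + \text {Var}(\lambda _{s})\) and \(\text {Cov}(N_{s}, N_{s'}) = \text {Cov}(\lambda _{s}, \lambda _{s'})\)). The function names are hypothetical and are not taken from the authors' code.

```python
import math

def lognormal_moments(mu, sigma):
    """Mean vector and covariance matrix of lambda = exp(a), a ~ MVN(mu, sigma)."""
    size = len(mu)
    mean = [math.exp(mu[j] + sigma[j][j] / 2.0) for j in range(size)]
    # Cov(lambda_j, lambda_k) = E(lambda_j) E(lambda_k) (exp(sigma_jk) - 1)
    cov = [[mean[j] * mean[k] * (math.exp(sigma[j][k]) - 1.0)
            for k in range(size)] for j in range(size)]
    return mean, cov

def abundance_correlation(mu, sigma):
    """Correlation matrix of the latent abundances N under the MNM assumptions."""
    mean, cov = lognormal_moments(mu, sigma)
    size = len(mu)
    # Law of total variance with N | lambda ~ Poisson(lambda)
    var_n = [mean[j] + cov[j][j] for j in range(size)]
    return [[1.0 if j == k else cov[j][k] / math.sqrt(var_n[j] * var_n[k])
             for k in range(size)] for j in range(size)]

# Two species with a latent covariance of 0.5 on the log scale
corr = abundance_correlation([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]])
print(round(corr[0][1], 3))
```

Because the Poisson sampling variance is added to each diagonal term, the correlation between abundances is always smaller in magnitude than the correlation between the corresponding \(\lambda \) values.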
Simulation study
In this section, we describe the simulation studies which were used to determine the accuracy of the estimates produced by the multi-species N-mixture models.
To determine if our modelling framework produces accurate estimates at contrasting sample sizes, a series of simulations were run in which we varied the number of sites, \(R \in \{10,100\}\), the number of sampling occasions, \(T \in \{5,10\}\), and the number of species observed, \(S \in \{5,10\}\). Within these simulations, we varied the detection probability p, and the mean number of individuals per site \(\lambda \). Small values for p lay between 0.1 and 0.4, while large values for p lay between 0.5 and 0.9. Small values for \(\lambda \) had a median value of 7 and standard deviation of 10, while large values for \(\lambda \) had a median value of 55 and standard deviation of 74.
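The data-generating process behind these simulations can be sketched as follows. This is an illustrative outline only, not the authors' R code: it assumes no covariates, a detection probability that is constant within each species, and a small-abundance scenario. All function names, the seed and the parameter values in the usage lines are hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular factor L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - partial)
            else:
                low[i][j] = (a[i][j] - partial) / low[j][j]
    return low

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (suitable for moderate lam)."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def rbinom(n, p, rng):
    """Binomial draw as a sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def simulate_mnm(n_sites, n_occasions, mu_a, sigma_a, p, seed=1):
    """Return (N, Y): N[i][s] latent abundance, Y[i][t][s] observed count."""
    rng = random.Random(seed)
    n_species = len(mu_a)
    low = cholesky(sigma_a)
    abundance, counts = [], []
    for _ in range(n_sites):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_species)]
        # Correlated site-level effects a_i ~ MVN(mu_a, sigma_a)
        a = [mu_a[s] + sum(low[s][k] * z[k] for k in range(s + 1))
             for s in range(n_species)]
        n_i = [rpois(math.exp(a_s), rng) for a_s in a]
        y_i = [[rbinom(n_i[s], p[s], rng) for s in range(n_species)]
               for _ in range(n_occasions)]
        abundance.append(n_i)
        counts.append(y_i)
    return abundance, counts

# Two negatively associated species, 10 sites, 5 sampling occasions
N, Y = simulate_mnm(10, 5, [math.log(7.0)] * 2,
                    [[0.7, -0.4], [-0.4, 0.7]], [0.3, 0.8])
```

Every observed count is bounded above by the latent abundance at that site, which is the imperfect-detection mechanism that the N-mixture model is designed to invert.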
In the case of the Hurdle and Hurdle-AR models, we also varied the probability of a zero-count occurring, \(\theta \in \{0.2, 0.7\}\). For each combination of parameters, we simulated 100 datasets and estimated N, \(\varvec{\Sigma }_{a}\), and p. We also estimated values for \(\theta \) and \(\phi \), in the case of the Hurdle and AR models, respectively. Relative mean bias was calculated for the estimated probability of obtaining a zero count \({\hat{\theta }}\), autocorrelation coefficient \({\hat{\phi }}\), probability of detection p, and mean of the abundance random effects \(\varvec{\mu }_a\). The smaller the relative bias, the closer our estimated parameters were to the true values. We compared \({\hat{N}}\) to N using the concordance correlation coefficient (Lin 1989), which is given by the formula:
\[ \rho _{c} = \frac{2\rho \sigma {\hat{\sigma }}}{\sigma ^{2} + {\hat{\sigma }}^{2} + (\mu - {\hat{\mu }})^{2}}, \]
where \(\rho \) is Pearson’s correlation coefficient, \(\sigma \) and \(\mu \) are the standard deviation and mean of the true values of N, and \({\hat{\sigma }}\) and \({\hat{\mu }}\) are the standard deviation and mean of the estimated values of N. The Pearson correlation coefficient is a measure of the strength of a linear association between two variables. However, the Pearson correlation is invariant under changes in location and scale. If two variables exhibit a linear relationship, but are very different in terms of their location or scale, the Pearson correlation coefficient will not reveal this. The concordance correlation coefficient, however, does take into account differences in location and scale. For this reason, the concordance correlation coefficient, rather than the Pearson correlation coefficient, was chosen as the measure of agreement between the true abundance and the estimated abundance. The higher the value of the concordance correlation coefficient, the closer our estimates for N were to the true values.
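The difference between the two coefficients can be illustrated with a short sketch. It uses the population (divide-by-n) form of Lin's concordance correlation coefficient, \(2\,\text {cov}/(\sigma ^{2} + {\hat{\sigma }}^{2} + (\mu - {\hat{\mu }})^{2})\); the function names and the toy data are hypothetical.

```python
import math

def _moments(x, y):
    """Means, population variances and population covariance of two samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return mean_x, mean_y, var_x, var_y, cov

def pearson(x, y):
    """Pearson correlation: strength of linear association only."""
    _, _, var_x, var_y, cov = _moments(x, y)
    return cov / math.sqrt(var_x * var_y)

def concordance(truth, estimate):
    """Lin's concordance correlation: penalises location and scale shifts."""
    mean_t, mean_e, var_t, var_e, cov = _moments(truth, estimate)
    return 2.0 * cov / (var_t + var_e + (mean_t - mean_e) ** 2)

true_n = [1.0, 2.0, 3.0]
shifted = [11.0, 12.0, 13.0]          # perfectly linear, but offset by 10
print(pearson(true_n, shifted))       # close to 1: the shift is invisible
print(concordance(true_n, shifted))   # close to 0: the shift is penalised
```

An estimator that overestimates every abundance by a constant amount therefore scores perfectly on the Pearson coefficient but poorly on the concordance coefficient.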
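We compared our estimated correlation matrix to the true value using the correlation matrix distance (Herdin et al. 2005), which is given by the following formula:
\[ \text {CMD}(\mathbf {X_1}, \mathbf {X_2}) = 1 - \frac{\text {tr}(\mathbf {X_1}\mathbf {X_2})}{\Vert \mathbf {X_1}\Vert _{f}\Vert \mathbf {X_2}\Vert _{f}}, \]
where \(\mathbf {X_1}\) and \(\mathbf {X_2}\) are two correlation matrices, \(\text {tr}(\mathbf {X_1}\mathbf {X_2})\) is the trace of the product of these two matrices, and \(\Vert .\Vert _{f}\) denotes the Frobenius norm. The CMD is zero when the two matrices are equal up to a scaling factor and approaches one as they become maximally different.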
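A minimal sketch of the correlation matrix distance, computed as one minus the trace of the matrix product divided by the product of the Frobenius norms, is given below. Matrices are represented as plain nested lists, and the function names and example matrices are hypothetical.

```python
import math

def frobenius(x):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in x for v in row))

def trace_of_product(x1, x2):
    """tr(X1 X2) without forming the full matrix product."""
    n = len(x1)
    return sum(x1[i][j] * x2[j][i] for i in range(n) for j in range(n))

def correlation_matrix_distance(x1, x2):
    """CMD = 1 - tr(X1 X2) / (||X1||_f ||X2||_f)."""
    return 1.0 - trace_of_product(x1, x2) / (frobenius(x1) * frobenius(x2))

positive = [[1.0, 0.9], [0.9, 1.0]]
negative = [[1.0, -0.9], [-0.9, 1.0]]
print(correlation_matrix_distance(positive, positive))  # identical matrices
print(correlation_matrix_distance(positive, negative))  # opposite association
```

Identical matrices give a distance of zero (up to floating-point error), while matrices encoding strong associations of opposite sign give a distance close to one.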
Additionally, the coverage probability for each parameter was determined as the proportion of simulations in which the 50% credible interval contained the true parameter value. If the model is well calibrated, we expect the estimated 50% credible interval to contain the true value of the parameter approximately 50% of the time (Appendix D). Each scenario was simulated 100 times. All data were simulated using the R statistical software version 4.0.2 (R Core Team 2020b), and all Bayesian models were implemented using the R2jags package (Su and Yajima 2020).
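The two summary measures used across replicates can be sketched as follows. The exact definition of relative mean bias is not restated in this appendix, so the code assumes the common form, the mean of (estimate − truth)/truth. The central 50% credible interval is taken as the 25th to 75th percentile of the posterior draws, with linear interpolation between order statistics. All function names are hypothetical.

```python
import math

def central_interval(draws, level=0.5):
    """Equal-tailed credible interval from posterior draws."""
    ordered = sorted(draws)

    def quantile(prob):
        # Linear interpolation between neighbouring order statistics
        pos = prob * (len(ordered) - 1)
        lower = int(math.floor(pos))
        upper = min(lower + 1, len(ordered) - 1)
        return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)

def coverage(intervals, truth):
    """Proportion of (low, high) intervals that contain the true value."""
    hits = sum(1 for low, high in intervals if low <= truth <= high)
    return hits / len(intervals)

def relative_bias(estimates, truth):
    """Mean of (estimate - truth) / truth over simulated datasets (assumed form)."""
    return sum((e - truth) / truth for e in estimates) / len(estimates)

# One interval per simulated dataset, then the share that cover the truth
intervals = [central_interval(range(0, 101)), central_interval(range(80, 181))]
print(intervals, coverage(intervals, truth=50.0))
```

With 100 replicates per scenario, a well-calibrated model should give a coverage value near 0.5 for the 50% interval; values well above or below this indicate intervals that are too wide or too narrow.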
1.1 Simulation study results
The results of the small-scale simulation study, which comprised data simulated for five species at 10 sites over five years, are shown in Table 3. The results of the large-scale simulation study, which contained 10 species, 100 sites and 10 years, are shown in Table 4.
1.1.1 MNM model
The large-scale simulation study (Table 4) produced reliable estimates for latent abundance N at every combination of p and \(\lambda \), with CCC values between 0.97 and 0.99. Estimates of N from the small-scale simulation study (Table 3) appear more dependent on the detection probability p, with greater CCC values associated with larger detection probabilities.
From Table 4, the relative bias for the estimate of p shows that when R, T and S are large, the model produces estimates for p which are accurate to two decimal places. When R, T and S are small (Table 3), the relative bias for the estimate of p is larger for small median p, and larger values of p produce more reliable estimates of p.
Estimates for the correlation matrix and \(\varvec{\mu }_{a}\) improve with larger values of \(\lambda \). In both Tables 3 and 4, the relative bias for \(\varvec{\mu }_{a}\) and the CMD decrease when \(\lambda \) is larger. Larger values of R, T and S produce more accurate estimates of the inter-species correlations and \(\varvec{\mu }_{a}\), as can be seen by the decrease in the sizes of the CMD and RB(\(\varvec{\mu }_{a}\)) between Tables 3 and 4.
Coverage probabilities (Appendix D) for this model reveal that both small- and large-scale simulations produce parameters whose true values lie within the 50% credible interval approximately 50% of the time, as expected.
1.1.2 Autoregressive model
In both the small-scale (Table 3) and large-scale (Table 4) simulations, the autoregressive model produced reliable estimates for N, with CCC values above 0.9 for all simulations. Both Tables 3 and 4 show CMD values accurate to two decimal places. Relative bias for p decreases as median p increases. This can be seen for both small (Table 3) and large (Table 4) values of R, T, and S. In Table 3, relative bias for the autocorrelation coefficient \(\phi \) is much larger when abundance is small. In this situation, the estimates for \(\phi \) cannot be relied upon. This issue persists, though less severely, as R, T and S increase in Table 4.
All parameters in this model have coverage probabilities of approximately 50%, as is expected for the 50% credible intervals (Appendix D).
1.1.3 Hurdle model
Similar to the MNM model, when R, T and S are large (Table 4), consistently accurate estimates of latent abundance N are produced, with CCC values between 0.948 and 0.999. In Table 3 we see that CCC values depend more on the detection probability, with more accurate estimates of N produced when detection probability is high. Both Tables 3 and 4 show higher accuracy in estimates of the inter-species correlations when zero-inflation is small, and abundance is large. CMD values are greater when \(\theta = 0.7\) or median \(\lambda = 7\) than for \(\theta = 0.2\) or median \(\lambda = 55\). From both Tables 3 and 4, the Hurdle model sees much smaller relative bias for p when median p is large compared to when median p is small. Relative bias for \(\theta \) decreases when \(\theta \) increases, indicating that \(\theta \) is estimated with more accuracy when zero-inflation is large. Table 4 sees smaller relative bias for \(\theta \) than Table 3, revealing that the strength of zero-inflation \(\theta \) is estimated more accurately when R, T and S are large.
Issues with parameter convergence were encountered when fitting the Hurdle model. When zero-inflation and abundance are large, and detection probability is small, up to 20% of parameters failed to converge. While this convergence issue does not appear to negatively affect the relative biases of parameter estimates, as can be seen in Tables 3 and 4, coverage probability for detection probability p and random effect mean \(\mu _{a}\) is negatively impacted (Appendix D). We also see coverage for N which is larger than 50%. This is to be expected, because zero counts are predicted perfectly.
1.1.4 Hurdle-autoregressive model
In Table 3, CCC values demonstrate that the Hurdle-AR model produces estimates for N which are more accurate when the probability of obtaining a zero count is smaller. However, increasing R, T, and S (Table 4) reduces this dependence on \(\theta \), and all CCC values produced are greater than 0.95.
The small-scale simulation (Table 3) has CMD values and relative biases for p and \(\varvec{\mu }_{a}\) which increase when the probability of obtaining a zero count increases, indicating that the inter-species correlations, p and \(\varvec{\mu }_{a}\) are estimated more accurately when the degree of zero-inflation is low. The same is true for the large-scale simulation (Table 4), though the differences in CMD and relative biases between small \(\theta \) and large \(\theta \) are not as large, revealing that the increase in R, T and S renders the increase in zero-inflation less important in the estimation of these parameters.
Like the AR model, the Hurdle-AR model has difficulty estimating \(\phi \), in this case when the probability of obtaining a zero count is high. This issue is more severe in Table 3, and estimates of \(\phi \) cannot be trusted when R, T and S are small but \(\theta \) is large. As with the AR model, this issue is not as acute in Table 4, as an increase in R, T and S appears to compensate for the problems caused by large zero-inflation.
Estimated abundances
In this section, we provide a comparison of the maximum observed abundance with the maximum abundance estimated from the Hurdle(C) Model, fitted to the NABBS data.
Coverage probabilities
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Mimnagh, N., Parnell, A., Prado, E. et al. Bayesian multi-species N-mixture models for unmarked animal communities. Environ Ecol Stat 29, 755–778 (2022). https://doi.org/10.1007/s10651-022-00542-7