Study of biological communities subject to imperfect detection: bias and precision of community N-mixture abundance models in small-sample situations


Community N-mixture abundance models for replicated counts provide a powerful and novel framework for drawing inferences related to species abundance within communities subject to imperfect detection. To assess the performance of these models, and to compare them to related community occupancy models in situations with marginal information, we used simulation to examine the effects of mean abundance \((\bar{\lambda }\): 0.1, 0.5, 1, 5), detection probability \((\bar{p}\): 0.1, 0.2, 0.5), and number of sampling sites (n site : 10, 20, 40) and visits (n visit : 2, 3, 4) on the bias and precision of species-level parameters (mean abundance and covariate effect) and a community-level parameter (species richness). Bias and imprecision of estimates decreased when any of the four variables \((\bar{\lambda }\), \(\bar{p}\), n site , n visit ) increased. Detection probability \(\bar{p}\) was most important for the estimates of mean abundance, while \(\bar{\lambda }\) was most influential for covariate effect and species richness estimates. For all parameters, increasing n site was more beneficial than increasing n visit . Minimal conditions for obtaining adequate performance of community abundance models were n site  ≥ 20, \(\bar{p}\) ≥ 0.2, and \(\bar{\lambda }\) ≥ 0.5. At lower abundance, the performance of community abundance and community occupancy models as species richness estimators was comparable. We then used additive partitioning analysis to reveal that raw species counts can overestimate β diversity of both species richness and the Shannon index, while community abundance models yielded better estimates. Community N-mixture abundance models thus have great potential for use in community ecology or conservation applications provided that replicated counts are available.


The abundance of organisms is of central interest in ecology (Ehrlich and Roughgarden 1987). However, abundance measurements are almost always affected by imperfect detection; that is, abundance is underestimated when detection probability is less than 1. Detection probability may vary by species, observer, survey method and environment (Royle and Dorazio 2008; Kéry and Schaub 2012; Kéry and Royle 2016). The consequences of imperfect detection can vary widely, and can affect analyses of abundance from local-habitat to regional scales (Lahoz-Monfort et al. 2014; Higa et al. 2015). For example, local population densities can be underestimated, while extinction and colonization rates of populations may be overestimated (Moilanen 2002; Kéry et al. 2013). Some have argued that imperfect detection need not always be considered provided that a study employs a standardized sampling design (Johnson 2008; Banks-Leite et al. 2014). However, if absolute abundance needs to be estimated and/or if detection probability depends on covariates that also affect abundance, then detection probability must be accounted for in any modeling framework for estimating abundance (Kéry 2008; Kéry et al. 2010; Yamaura 2013).

During the last several decades, and especially during the last 15 years, many statistical methods of inference about distribution and abundance have been developed that accommodate imperfect detection (Buckland et al. 2001; Borchers et al. 2002; Williams et al. 2002; Buckland et al. 2004; MacKenzie et al. 2006; Royle and Dorazio 2008; Link and Barker 2009; King et al. 2010; Kéry and Schaub 2012; Royle et al. 2014). Detection probability is typically estimated using a ‘single-species approach’, i.e., probability is estimated for every species individually (Alldredge et al. 2007). However, because the analysis of detection probability needs adequate sample sizes (Buckland et al. 2001; MacKenzie and Royle 2005), rare species are usually difficult to analyze independently. In this situation, researchers have repeatedly suggested overcoming the problem of small sample size for rare species by lumping (or pooling) their data with data for more common species that may be expected to have similar detection probabilities and respond to covariates in a similar fashion (MacKenzie et al. 2005; Buckland et al. 2008). As an alternative, one might analyze multiple or all species together in an analysis that stratifies by species, thereby accounting for the effects of species identity on parameters of abundance and detection (Alldredge et al. 2007), perhaps treating them as random effects, so that some information is shared among them (Kéry and Royle 2008; Zipkin et al. 2009).

Dorazio and Royle (2005) and Dorazio et al. (2006) proposed an approach of modeling a community as an ensemble of elemental species-level models from which community-level variables such as species richness or site similarity can naturally be derived (for a similar approach see also Gelfand et al. 2005; Ovaskainen and Soininen 2011). Using a series of binary detection/non-detection data of all species detected in a community, their community occupancy model estimates binary occupancy (presence/absence) of individual species at each site while correcting for imperfect detection. Species in the same community share hyper-parameters, and parameters of rare species (e.g., their detection probability) can be estimated by combining their own information with that coming from all the other species in the community, i.e., thereby borrowing strength from the ensemble (Link and Sauer 1996; Sauer and Link 2002; Kéry 2010). Furthermore, community occupancy models allow us to estimate the number of species that were not observed in a survey and their unobserved occupancy status while using data augmentation (Royle et al. 2007). Hence, community occupancy models produce a species richness estimator that accounts (1) for species that occurred in at least one sampling site but were missed in any of the other sites (i.e., were detected at least once), (2) for species that occurred in at least one sampling site but were missed in all other sampling sites (i.e., were never detected), and (3) for species that did occur in the meta-community (i.e., in the wider region that is sampled near the studied sites), but did not occur (and therefore were not detected) at any of the sampling sites (Kéry and Royle 2009; Kéry 2011; Iknayan et al. 2014). This modeling framework has recently been extended for abundance as a state variable based on count data (Yamaura et al. 2012; Chandler et al. 2013; Barnagaud et al. 2014; Dorazio and Connor 2014). 
In these models, which estimate the abundance of species (hereafter, community abundance or community N-mixture models), the occurrence (or occupancy) of a species is naturally a function of its local abundance (i.e., a species occurs if its local abundance is greater than zero), and community-level species richness and total abundance are obtained as derived parameters. We can assume that a studied community is composed of multiple (functional) groups in which species may have similar parameters, which are summarized by group-level hyper-parameters (Sauer and Link 2002; Ruiz-Gutiérrez et al. 2010; Yamaura et al. 2012; Chen et al. 2013; Barnagaud et al. 2014; Pacifici et al. 2014).

A large number of other models and methods have been developed over the years to study biological communities that are subject to imperfect detection. These nonparametric methods estimate the number of unobserved species and compare community composition (Gotelli and Colwell 2001; Williams et al. 2002; Chao et al. 2005, 2009). Compared with these methods, community abundance models have several desirable properties (Dorazio et al. 2011). First, abundance and the detection process are treated separately; therefore, common species with low detection probability and rare species with high detection probability are treated differently. This separation yields less-biased estimates of diversity measures (Broms et al. 2015). Second, community abundance models are able to estimate local (site-specific) species richness (α diversity) as well as overall species richness (γ diversity) and the size of the regional species pool; γ diversity can be much smaller than the size of the regional species pool, depending on the total area of the sampling plots used. Turnover of community composition among sites (β diversity) can be calculated by subtracting the mean α diversity from γ diversity using the framework of additive partitioning (Veech et al. 2002; Crist et al. 2003) or by computing indices such as the Jaccard index that are based on pairwise comparisons of species occurrence (Dorazio and Royle 2005; Dorazio et al. 2010; Kéry and Royle 2016).

In contrast to community occupancy models, community abundance models are still in their infancy: they have not been widely applied (Iknayan et al. 2014; Dénes et al. 2015), and their performance is essentially unknown. Occupancy models, which are the building blocks of all community occupancy models, are now commonly used for modeling presence/absence of individual species (Guillera-Arroita et al. 2015). Their estimation performance and the sampling designs that maximize it have both been actively examined and discussed for single-species models (e.g., MacKenzie and Royle 2005; Guillera-Arroita et al. 2010, 2014; Rota et al. 2011; Guillera-Arroita and Lahoz-Monfort 2012; Wintle et al. 2012; Welsh et al. 2013). It would be natural to assume that what was learned for single-species models could be carried over to community models, and this was shown by Sanderlin et al. (2014) for the community occupancy model. In contrast, N-mixture models (Royle 2004), which are the building blocks of community abundance models and estimate abundance of a single species from repeated count data, have received much less attention. Several studies have compared estimates between N-mixture and other models that accommodate imperfect detection using field data (Kéry et al. 2005; Hunt et al. 2012; Couturier et al. 2013). In addition, some simulation studies related to their performance have also been conducted (Kéry 2008; McIntyre et al. 2012; Yamaura 2013; Dennis et al. 2015). However, estimation performance of community N-mixture abundance models has not been examined to date.

The objective of this study is to examine the bias and precision of community abundance models under various conditions of abundance, detectability and different combinations of the number of survey visits and sampling sites. We also compared the performance of community abundance models as a species richness estimator to that of community occupancy models under conditions where abundance is low and where count data converge towards binary detection/non-detection records. This comparison included the computation time of both models. Finally, we divided γ diversity of species richness and the Shannon index into α and β diversity in the estimation procedure of community abundance models, and examined whether community abundance models produced unbiased estimates of the diversity measures. Because the true values of species richness and all parameters of individual species are known, we can gauge the differences between true and estimated values, i.e., the bias and precision of all estimates. Following Yamaura (2013), we focused on small-sample situations with a limited number of sampling sites as a worst-case scenario. If we know the minimal conditions under which the model performs adequately, then we can be assured that they perform even better with larger samples.


A brief outline of community abundance models

Submodel of the ecological process

In community abundance models, we assume that a community is assembled as an ensemble of independent Poisson processes for each individual species. The abundance of species i at site j, N ij , is a Poisson random variable:

$$N_{ij} \sim {\text{Poisson}}(\lambda_{ij} ) \tag{1}$$

where λ ij is the expected (or mean) abundance (Royle et al. 2005), and λ ij can be expressed as a function of site-level covariates (\(\varvec{x}_{j}^{{\prime }}\)) typically using a log-link:

$$\log \left( {\lambda_{ij} } \right) = \beta_{0i} + \varvec{x}_{j}^{{\prime }}\varvec{\beta}_{i} \tag{2}$$

Here intercepts β 0i and covariate coefficients (β i ) are assumed to follow separate normal distributions, e.g., \(\beta_{0i} \sim {\text{Normal}}(\mu_{{\beta_{0} }} ,\sigma_{{\beta_{0} }}^{2} )\). Under this model the community-level hyper-parameters, i.e., the mean and the standard deviation of these normal distributions, are shared by all species in the community; they describe the average of the community and the among-species heterogeneity, respectively. We can also use separate normal distributions for individual species (functional) groups, and examine group-specific responses to covariates (Yamaura et al. 2012; Chen et al. 2013; Barnagaud et al. 2014). Thanks to this sharing of hyper-parameters among species, we can obtain better estimates of the parameters of rare species and even those of unobserved species by “borrowing strength” (i.e., sharing information) among similar but more common species (Zipkin et al. 2009; Ovaskainen and Soininen 2011).

Submodel of the detection process

Counts of detected individuals are one of the most convenient data types to collect, and the N-mixture (or binomial mixture) model is a natural model for abundance of a single species based on count data (Royle 2004; Kéry et al. 2005). The main idea is that if every individual at a given point in space and time (i.e., every member of N ij ) has the same detection probability and is detected independently, then the number of individuals detected will be a binomial random variable. When individual sites are visited multiple times and repeated counts can be obtained within a short period in which abundance does not change (this is called the “closure assumption”), we can describe the detection process as a binomial process with a probability of success (or here, detection) of p i :

$$y_{ijk} \sim {\text{Binomial}}\left( {N_{ij} ,p_{i} } \right) \tag{3}$$

Here, y ijk is the number of detected individuals (i.e., the count) of species i at site j on visit k. In a single-species situation, parameters can easily be estimated by maximum likelihood (Royle 2004; Kéry et al. 2005) or using Bayesian inference (Kéry 2010; Kéry and Schaub 2012). However, in the much more complex multispecies situation we typically have to resort to a Bayesian implementation of the model (Royle and Dorazio 2008; Kéry and Royle 2016). As in the abundance submodel, the species-level detection probability of species i (p i ) is also assumed to be drawn from a normal distribution with community-level hyper-parameters defined on the logit-link scale, i.e., \({\text{logit}}\left( {p_{i} } \right) = q_{i} ,\quad {\text{with }}q_{i} \sim {\text{Normal}}(\mu_{q} ,\sigma_{q}^{2} )\). Although p i is here assumed to be constant across sites and replicate surveys, we can easily relax this assumption, for example, to model covariate effects (Kéry 2008; Yamaura 2013).
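To make the single-species building block concrete, the following is a minimal Python sketch of the marginal likelihood of the repeated counts at one site, obtained by summing the Poisson and binomial components over the unobserved abundance N. The function names and the truncation bound K are assumptions made for this illustration and do not come from the software used in this study.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability of n given mean lam (> 0), computed on the log scale."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))


def binomial_pmf(y, n, p):
    """Binomial probability of y detections out of n individuals."""
    if y < 0 or y > n:
        return 0.0
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def nmix_site_likelihood(counts, lam, p, K=100):
    """Marginal likelihood of the repeated counts at one site for one species.

    The latent abundance N is summed out from max(counts) up to K.
    K is a hypothetical truncation bound; it must be large relative to lam.
    """
    total = 0.0
    for n in range(max(counts), K + 1):
        lik_counts = 1.0
        for y in counts:
            lik_counts *= binomial_pmf(y, n, p)
        total += poisson_pmf(n, lam) * lik_counts
    return total
```

For an all-zero count history over J visits, this sum has the closed form exp(−λ[1 − (1 − p)^J]), which provides a convenient check on the truncation bound.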

Data augmentation to account for unobserved species

Royle et al. (2007) and Dorazio et al. (2006) used data augmentation to estimate the number of unobserved species in a survey, and community composition at each site by accounting for unobserved species. In traditional community analyses such as in ordination methods, we analyze the available detection histories (e.g., series of counts or detection/non-detection over replicate surveys) only for the observed species. In the community occupancy model of Dorazio and Royle, we add (augment) all-zero detection histories for an arbitrary number of hypothetical, unobserved species to the detection histories of observed species and analyze the augmented data set (Dorazio et al. 2006; Royle et al. 2007). We call the resulting community, which is composed of observed and augmented species, a ‘super-community.’ The augmented data set of detection histories is analyzed to estimate the number of unobserved species, where a species can go unobserved either because it does not happen to occur in the sampled areas by chance (but does occur in the wider sampled area: \(\sum\nolimits_{j = 1}^{{n_{site} }} {N_{ij} = 0}\)) or because it does occur \((\sum\nolimits_{j = 1}^{{n_{site} }} {N_{ij} > 0}\)) but went undetected by chance. Under data augmentation the size of the super-community (S) is prescribed, and it should be chosen to be (much) larger than R (which is the unknown species pool size to be estimated). This can easily be achieved in practice by trial-and-error and making sure that the posterior mass of R is concentrated away from the chosen value of S (see below).

To re-formulate the community model using data augmentation, we introduce a binary, partially observed indicator variable, w i , which takes the value of 1 if a species in the super-community is a member of the community of R species that are exposed to sampling and 0 otherwise. This “community membership indicator variable” is known to be 1 for all species that are observed at least once, but its value must be estimated for the augmented species. We assume that w i are mutually independent Bernoulli random variables with an inclusion parameter Ω, i.e., w i  ~ Bernoulli(Ω). We then estimate R as a function of the inclusion variables as \(\sum\nolimits_{i = 1}^{S} {w_{i} }\). Estimating the data augmentation parameter Ω is therefore functionally equivalent to estimating R in the sense that under the data augmentation scheme R is a binomial random variable with sample size S and success parameter Ω. In the analyses, we use a sufficiently large value of S such that all the mass of the posterior distribution of Ω is well away from 1 (Royle and Dorazio 2008; Kéry and Royle 2016). However, larger values of S require a longer computation time and hence, the selected S should not be too high for purely practical reasons.

Detection histories of all augmented species contain only zeroes, but the species that are exposed to sampling could in principle have produced non-zero counts depending on the covariates and their detection probabilities, while the observations for non-exposed species are structural zeroes. Following previous studies with community abundance models (Yamaura et al. 2011, 2012), we formulate this zero-inflation in y ijk by modifying Eq. 3 (but see also the “Discussion”):

$$y_{ijk} \sim {\text{Binomial}}\left( {N_{ij} w_{i} ,p_{i} } \right). \tag{4}$$

That is, for species that are not exposed to sampling (with w i  = 0), the observations are binomial with a sample size of zero and therefore counts are necessarily equal to zero.
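The augmentation step and the zero-inflated observation model can be sketched as follows in Python. This is an illustrative sketch only: the function names and the nested-list layout (species × site × visit) are assumptions, and in the actual analysis the indicators w i are latent variables estimated by MCMC rather than supplied by hand.

```python
import random


def augment_counts(y_obs, S):
    """Pad observed count histories (species x site x visit) with all-zero
    histories so that the super-community contains S species."""
    r = len(y_obs)
    if S < r:
        raise ValueError("S must be at least the number of observed species")
    n_site = len(y_obs[0])
    n_visit = len(y_obs[0][0])
    zeros = [[[0] * n_visit for _ in range(n_site)] for _ in range(S - r)]
    return y_obs + zeros


def draw_count(N, w, p, rng):
    """One draw of y ~ Binomial(N * w, p); w = 0 forces a structural zero."""
    return sum(1 for _ in range(N * w) if rng.random() < p)


rng = random.Random(1)
y_obs = [[[1, 0], [0, 2]]]          # one observed species, two sites, two visits
y_aug = augment_counts(y_obs, S=4)  # three augmented all-zero species
w = [1, 1, 0, 0]                    # hypothetical membership indicators
R = sum(w)                          # community size implied by these indicators
```

Here the second species has w = 1 but an all-zero history, which corresponds to a community member that was never detected.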

We can estimate site-specific species richness and total abundance of communities as derived parameters using the posterior samples of the latent variables (N ij and w i ; see also the “Discussion”). We can similarly estimate the number of species occurring at any site (γ diversity) by tallying up species with at least one individual at one or more sites. We note that this γ diversity of species richness is different from the community size R (O’Hara 2005; Iknayan et al. 2014), and can be much smaller than R when the total area of sampling sites is small and therefore many species in the wider region may simply not occur in the sampled sites (see estimation of species-accumulation curve in Dorazio et al. 2006 and Kéry and Royle 2016).

Site-specific species richness (R j ) can also be obtained in another way. Using a property of the Poisson distribution, we can formulate the probability that at least one individual occurs (Royle and Dorazio 2008): Pr[N ≥ 1] = 1 − exp(−λ). By using species-level parameters of community abundance models (β 0i and β i in Eq. 2), we can obtain point estimates of R j by summing this probability over the r observed species:

$$E[\hat{R}_{j} ] = \sum\limits_{i = 1}^{r} {\left[ {1 - \exp \left( { - \exp \left( {\hat{\beta }_{0i} + \varvec{x}_{j}^{{\prime }} \hat{\varvec{\beta }}_{i} } \right)} \right)} \right]} \tag{5}$$

If data augmentation is used to account for the existence of unobserved species, we can use the following quantity with the aid of an indicator variable, w i :

$$E[\hat{R}_{j} ] = \sum\limits_{i = 1}^{S}\left\{w_{i}\times {\left[ {1 - \exp \left( { - \exp \left( {\hat{\beta }_{0i} + \varvec{x}_{j}^{{\prime }} \hat{\varvec{\beta }}_{i} } \right)} \right)} \right]} \right\} \tag{6}$$
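As an illustration, the expected site-level richness can be computed from species-level point estimates as in the following Python sketch, which assumes a single covariate. The function and argument names are hypothetical; omitting w gives the sum over observed species only, and supplying w gives the data-augmented version.

```python
import math


def expected_richness(beta0, beta1, x_j, w=None):
    """Expected number of species present at a site with covariate value x_j.

    beta0, beta1: per-species intercepts and slopes on the log scale.
    w: optional per-species membership indicators (0/1); defaults to all 1.
    """
    if w is None:
        w = [1] * len(beta0)
    total = 0.0
    for b0, b1, wi in zip(beta0, beta1, w):
        lam = math.exp(b0 + b1 * x_j)          # expected abundance at the site
        total += wi * (1.0 - math.exp(-lam))   # Pr[N >= 1] under the Poisson
    return total
```

With two species that both have expected abundance 0.5 and no covariate effect, the expected richness is 2 × (1 − exp(−0.5)); setting one indicator to 0 halves that value.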

Simulation experiments

We conducted three simulation experiments, and first assessed bias and precision of community abundance models under various conditions that all characterize situations in which there is only little information about the model parameters. We next compared community abundance and occupancy models as species richness estimators. We finally examined the performance of community abundance models when quantifying β diversity.

Simulation experiment 1: assessing bias and precision of community abundance models

Parameter settings, data generation, and estimation procedure

Kéry (2008) and Yamaura (2013) showed that N-mixture models can remove the bias in inferred abundances that arises when detection probability depends on covariates that also affect abundance. Here, for simplicity, we assumed a constant detection probability and only varied the following four factors related to the sampling design and species biology (Yamaura 2013):

  • Number of sampling sites (n site ): 10, 20, 40.

  • Number of visits (n visit ): 2, 3, 4.

  • Mean detection probability (logit transformed, and denoted by μ q ): −2.2, −1.4, 0.0, corresponding to average of p = 0.1, 0.2, 0.5.

  • Mean abundance (log-transformed and denoted by \(\mu_{{\beta_{0} }}\)): −2.3, −0.69, 0.0, 1.61, corresponding to average of λ = 0.1, 0.5, 1, 5 at the average value of the covariate modeled.
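The factor levels listed above can be enumerated as a full factorial grid, with the probability and abundance levels converted to the link scales on which the hyper-parameters are defined. The following Python sketch is illustrative only; the variable names are assumptions, and the computed link-scale values agree with the rounded values quoted in the list only to about two decimal places.

```python
import itertools
import math


def logit(p):
    """Logit transform of a probability."""
    return math.log(p / (1.0 - p))


n_site_levels = [10, 20, 40]
n_visit_levels = [2, 3, 4]
p_levels = [0.1, 0.2, 0.5]          # mean detection probability
lambda_levels = [0.1, 0.5, 1, 5]    # mean abundance at the average covariate value

# Hyper-parameter means on the link scales
mu_q_levels = [logit(p) for p in p_levels]
mu_beta0_levels = [math.log(lam) for lam in lambda_levels]

# Every combination of the four factors
design = list(
    itertools.product(n_site_levels, n_visit_levels, mu_q_levels, mu_beta0_levels)
)
```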

Ranges of these four factors represent different aspects of a “small sample size”; in addition, the estimation performance of N-mixture models can change greatly within these ranges (Yamaura 2013). That is, our study examined the performance of the community abundance models in a worst-case scenario; with larger sample sizes, performance typically improves. Mean abundance and mean detection probability are not “settings” of a study design but characteristics of the analyzed communities (but see also “Discussion”). These are mean values of community-level hyper-parameters, and we fixed the other hyper-parameters, the standard deviations, at 1 (i.e., σ q  = \(\sigma_{{\beta_{0} }}\) = 1). We assumed that every sampling site received the same number of visits, and that the sampling area of individual sites was constant across sites (we can relax this assumption by modifying Eq. 1: Royle et al. 2005). We considered the situation in which only a single site-specific covariate x j was ecologically important; an example might be a continuous measurement of habitat quality. Minimum and maximum values of x j were −1 and 1, respectively, the intermediate values were equally spaced, and their intervals depended on the number of sampling sites. We chose a mean value of the coefficient of this covariate \((\mu_{{\beta_{1} }}\)) of 0 and for its associated standard deviation \((\sigma_{{\beta_{1} }}\)) a value of 1. Furthermore, we set the number of potentially occurring species (i.e., species richness: community size or species pool size) at 40 throughout the simulation.

To simulate a data set, we randomly drew 40 species-level parameters (β 0i , β 1i , q i  = logit[p i ]) from their normal distributions with the given hyper-parameters. Next, we computed the site- and species-specific expected abundance using Eq. 2 (i.e., λ ij  = exp[β 0i  + β 1i  × x j ]), and then drew the realized abundances from Poisson distributions with those expectations (i.e., N ij  ~ Poisson[λ ij ]). Finally, we simulated replicated surveys of each site, and obtained count histories specified by detected species, sites, and visits (y ijk ), which followed a binomial process described by Eq. 3. Count histories and the number of detected species (the observed total species richness r) were of course different among replications because of sampling variability and differences in the combinations of the four factors, but r was always less than or equal to 40. We replicated each combination of four factors (n site , n visit , mean detection μ q , and mean abundance \(\mu_{{\beta_{0} }}\)) ten times, meaning that our experiments had a balanced design with 1080 replicate data sets representing 3 × 3 × 3 × 4 = 108 parameter combinations. See also Chapter 11 in Kéry and Royle (2016) for more information and an R function that can be used to simulate community abundance data.
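The data-generating steps described above can be sketched in Python as follows. This is a standard-library illustration under stated assumptions, not the R function referred to in the text: the function names, default arguments, and the simple Poisson sampler are all choices made for this sketch.

```python
import math
import random


def _rpois_small(lam, rng):
    """Poisson draw by Knuth's multiplication method (suitable for small means)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def rpois(lam, rng):
    """Poisson draw; large means are split into chunks because independent
    Poisson draws add up to a Poisson draw with the summed mean."""
    total = 0
    while lam > 30.0:
        total += _rpois_small(30.0, rng)
        lam -= 30.0
    return total + _rpois_small(lam, rng)


def rbinom(n, p, rng):
    """Binomial draw as a sum of n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_community(n_species=40, n_site=20, n_visit=3, mu_beta0=0.0,
                       mu_beta1=0.0, mu_q=0.0, sd=1.0, seed=1):
    """Simulate true abundances N[i][j] and counts y[i][j][k] for one data set."""
    rng = random.Random(seed)
    # Covariate values equally spaced between -1 and 1
    x = [-1.0 + 2.0 * j / (n_site - 1) for j in range(n_site)]
    N, y = [], []
    for _ in range(n_species):
        b0 = rng.gauss(mu_beta0, sd)                       # species intercept
        b1 = rng.gauss(mu_beta1, sd)                       # species slope
        p = 1.0 / (1.0 + math.exp(-rng.gauss(mu_q, sd)))   # detection probability
        N_i = [rpois(math.exp(b0 + b1 * xj), rng) for xj in x]
        y_i = [[rbinom(n, p, rng) for _ in range(n_visit)] for n in N_i]
        N.append(N_i)
        y.append(y_i)
    # Number of species detected at least once (observed richness r)
    r = sum(1 for y_i in y if any(c > 0 for row in y_i for c in row))
    return x, N, y, r
```

Calling `simulate_community(n_site=10, n_visit=2, mu_beta0=math.log(0.5))` would produce one data set for a low-abundance scenario; counts can never exceed the corresponding true abundance, and r can never exceed the number of simulated species.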

We analyzed all simulated data sets using the above-described community abundance model. To estimate the number of unobserved species (R − r), we added all-zero count histories for 80 − r ‘potential’ species ([80 − r] × n site  × n visit ) to those of the detected species, and the resulting augmented data sets of 80 count histories were then analyzed. This means that we used a constant super-community size of S = 80. To minimize computation time, we used a constant of 40 augmented species in the case of λ = 5 (where the simulated survey data detected almost all 40 species, and the number of 40 augmented species was sufficiently large to estimate R [=40]). We fitted the model using Markov chain Monte Carlo (MCMC), and adopted conventional vague priors (e.g., \(\mu_{{\beta_{0} }}\) ~ Normal [0, \(100^{2}\)], \(\sigma_{{\beta_{0} }}\) ~ Uniform [0,10], Ω ~ Uniform [0,1]). We fitted the community abundance model using R 3.0.2 (R Core Team 2014) and JAGS 3.4.0 (Plummer 2013) via the package R2jags 0.03-11 (Su and Yajima 2013). We discarded as a burn-in the first 10,000 iterations of three chains with different initial values, and ran an additional 100,000 iterations to accumulate a posterior sample to be used for inference. We assumed chain convergence was achieved when the Gelman-Rubin statistic of all parameters \((\mu_{{\beta_{0} }}\), \(\mu_{{\beta_{1} }}\), μ q , \(\sigma_{{\beta_{0} }}\), \(\sigma_{{\beta_{1} }}\), σ q , Ω, N, p i , β 0i , β 1i ) was <1.1; otherwise, we ran additional sets of 100,000 iterations until we achieved chain convergence, using the function autojags in R package R2jags.
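For readers unfamiliar with the convergence criterion, the following Python sketch computes the basic form of the Gelman-Rubin statistic for one parameter from several chains of equal length. It is an illustration under simplifying assumptions: it omits the sampling-variability correction that some software applies, the function name is hypothetical, and it is not the routine used by R2jags.

```python
import math


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of m chains, each a list of n posterior draws (n >= 2, m >= 2).
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance (scaled by chain length)
    B = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    # Mean within-chain variance
    W = sum(
        sum((v - mu) ** 2 for v in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    var_plus = (n - 1) / n * W + B / n   # pooled estimate of posterior variance
    return math.sqrt(var_plus / W)
```

Chains that sample the same distribution give values close to 1, whereas chains stuck in different regions give values well above the 1.1 threshold.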

Bias and precision of estimators under the community abundance models

To assess the performance of community abundance models, we focused on the bias and precision of the estimates of the following parameters: species-level intercepts (β 0i ) and slopes (β 1i ) in the abundance model and overall species richness (R). For β 0i and β 1i , we first calculated the absolute differences between the estimates (posterior mean of the parameter) and true values for each species, i.e., \(\left| {\hat{\beta }_{0i} - \beta_{0i} } \right|\), and then averaged them across the observed species, ignoring unobserved species in this calculation. We then averaged this average over the ten replicate data sets to quantify the bias of this estimate in a given simulation scenario. As a measure of precision, we averaged the standard deviations (SDs) of estimates across the observed species, and again averaged over the ten replicate data sets. For species richness R, we simply calculated absolute differences between estimates (posterior means) and the true value (40), and averaged this value across the ten replicate data sets. We also averaged SDs of estimated R over the ten replicates, and treated this as a measure of imprecision (analogous to a standard error). We plotted these averaged values as a function of the number of sampling sites (n site ) and visits (n visit ) for each combination of μ q and \(\mu_{{\beta_{0} }}\).

In these calculations, we took the absolute differences between the true and estimated values of the intercepts and slopes for each species, because otherwise negative and positive errors would cancel out. In contrast, for species richness R, the posterior means were usually larger than the true value of 40 and there was only a very slight difference in the results between absolute and raw differences. To summarize the relative effects of experimental factors on the bias and imprecision, we conducted an analysis of variance (ANOVA) on the bias or the imprecision of each replication and viewed the four factors in our simulation design as treatments (n site , n visit , μ q , \(\mu_{{\beta_{0} }}\)), with all interactions included. Following White et al. (2014), we evaluated sums of squares (SS) and mean SS rather than the statistical significance of the ANOVA to assess the importance of the effects (Tyler and Hargrove 1997; Fahrig 2001; Fletcher 2006).
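The bias and imprecision summaries for the species-level parameters can be sketched in Python as follows. The data layout (one dictionary per replicate with keys 'est', 'true' and 'sd' for the observed species) and the function names are assumptions made for this illustration.

```python
def mean_abs_error(estimates, truths):
    """Mean absolute difference between posterior means and true values."""
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(estimates)


def summarise_scenario(replicates):
    """Average bias and imprecision over the replicate data sets of one scenario.

    replicates: list of dicts with per-species lists under the hypothetical
    keys 'est' (posterior means), 'true' (true values) and 'sd' (posterior SDs),
    restricted to observed species.
    """
    biases = [mean_abs_error(rep["est"], rep["true"]) for rep in replicates]
    imprecisions = [sum(rep["sd"]) / len(rep["sd"]) for rep in replicates]
    n_rep = len(replicates)
    return sum(biases) / n_rep, sum(imprecisions) / n_rep
```

Because absolute differences are averaged, positive and negative errors for different species do not cancel, which matches the rationale given above.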

Simulation experiment 2: comparing community abundance and occupancy models as species richness estimators

When a Poisson distribution has low expected values \((\bar{\lambda }\)), both the realized abundance and the observed count data will consist mostly of zeroes and ones and hence will be approximately equal to the binary presence/absence and detection/non-detection, respectively. Therefore, we might expect the community occupancy model to achieve convergence faster and perform in a similar fashion to the community abundance model, which may need more MCMC iterations to achieve convergence. To test this expectation, we compared the estimates of species richness \((\hat{R}\)) between community abundance and community occupancy models when \(\bar{\lambda }\) \(({ \exp }[\mu_{{\beta_{0} }} ]\)) was low (=0.1, 0.5). We simulated communities and obtained count histories using the same methods described in the previous section. We then fitted both the community abundance and the community occupancy models to the same data sets, where we reduced the count histories to binary detection/non-detection histories for the occupancy model. Because of the differences in model structure, we could no longer directly compare species-level parameters (e.g., β 0i ) between community occupancy and abundance models but we could still compare the estimates of community properties such as \(\hat{R}\). We fitted the community occupancy model with a single covariate (x j ), and drew species-level parameters (β 0i , β 1i , q i ) from separate independent normal distributions. We used the same settings (e.g., conditions, replications, priors) as were chosen in the first experiment (e.g., a super-community size S = 80).

Simulation experiment 3: partitioning γ diversity into α and β diversity subject to imperfect detection

We applied additive partitioning (Veech et al. 2002; Crist et al. 2003) to the community abundance models for a sampling design under which these models performed well (see “Results”): n site  = 40, n visit  = 4, and mean abundance \((\bar{\lambda } = { \exp }(\mu_{{\beta_{0} }} )\)) = 0.5. We varied only the mean detection probability \((\bar{p}\)), which affects the observed site-level species richness, at three levels: 0.1, 0.2, and 0.5. We conducted these simulations in the same way as in the experiments above, and treated the number of species detected throughout the survey (in any visit at any site) as the observed (i.e., detection-naïve) γ diversity of species richness. Because the number of detected species throughout the survey (r) was always ≥35, we used 40 augmented species in the analysis. The mean site-level detection-naïve species richness is the observed α diversity; hence, we obtained the detection-naïve β diversity estimate by subtracting the mean α diversity from γ diversity, i.e., β = γ − mean(α) (Veech et al. 2002; Crist et al. 2003). We calculated the site-level Shannon index (α diversity of the Shannon index) using the maximum count (over the four visits) of each species, and obtained the overall Shannon index, as a measure of γ diversity, from these maximum counts summed across the sites for each species. We then obtained the β diversity of the Shannon index by subtraction (Crist et al. 2003). As a benchmark, we obtained the true values of these measures in an analogous way from the known true values of abundance in the simulation. We conducted these calculations using the function adipart in R package vegan 2.2-1 (Oksanen et al. 2015). In community abundance models, we obtained estimates of α, β, and γ diversity of species richness and the Shannon index using species- and site-specific abundance estimates. We also obtained the estimates of the expected species richness at each site (Eq. 6), and the corresponding estimates of mean α and β diversity of species richness. We used an alternative location of the data augmentation indicator variable to minimize computation time (see “Discussion”). We conducted the analysis using the function autojags in R package jagsUI 1.3.1 (Kellner 2015), discarded the first 10,000 iterations as burn-in, and then ran additional sets of 100,000 iterations until convergence was achieved (with parallel computing). For each level of the mean detection probability \((\bar{p}\)) we generated and analyzed ten data sets, and averaged the species diversity indices over the replicates.
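The additive partitioning described above, β = γ − mean(α), can be illustrated with a minimal sketch. This is not the vegan adipart routine used in the study: the function names and the toy matrices are hypothetical, site-level α values are averaged with equal weights, and an empty site is given a Shannon index of zero by assumption.

```python
import numpy as np


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over species with non-zero counts."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0  # convention for an empty sample (an assumption of this sketch)
    p = counts[counts > 0] / total
    return float(-(p * np.log(p)).sum())


def partition_richness(abund):
    """Return (mean alpha, beta, gamma) of species richness.

    abund: species x sites array of abundances or maximum counts.
    """
    present = np.asarray(abund) > 0
    gamma = int(present.any(axis=1).sum())     # species found at one or more sites
    alpha = float(present.sum(axis=0).mean())  # unweighted mean site-level richness
    return alpha, gamma - alpha, gamma


def partition_shannon(abund):
    """Return (mean alpha, beta, gamma) of the Shannon index."""
    abund = np.asarray(abund)
    alpha = float(np.mean([shannon(site) for site in abund.T]))
    gamma = shannon(abund.sum(axis=1))         # index of the counts pooled over sites
    return alpha, gamma - alpha, gamma


# Hypothetical toy data: 3 species (rows) x 3 sites (columns).
true_abund = np.array([[2, 1, 0], [1, 0, 3], [0, 1, 1]])
# Detection-naive maximum counts can never exceed the true abundances.
naive_max_count = np.array([[1, 0, 0], [1, 0, 2], [0, 0, 1]])

for label, data in [("true", true_abund), ("naive", naive_max_count)]:
    print(label, partition_richness(data), partition_shannon(data))
```

In this toy case every species is still detected somewhere, so γ is unchanged, but missed detections lower the mean α, and β is inflated as a result.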


Experiment 1: bias and precision of community abundance models

Bias and precision showed similar responses to the four factors (Figs. 1, 2, 3; and S1–3). The effects of mean abundance \((\bar{\lambda }\)), detection probability \((\bar{p} = 1/[1 + { \exp }[ - \mu_{r} ]]\)), number of visits (n visit ), and number of sites (n site ) were as expected; that is, bias and imprecision decreased with increasing values of all these variables (Figs. 1, 2, 3; and S1–3). The results of the ANOVA suggested that variation in \(\bar{p}\) was most important among the four experimental factors to explain variation in bias and imprecision of mean abundance (i.e., the intercepts β 0i ); its sums of squares (SS) and/or mean SS were the largest (Table 1). In contrast, variation in abundance \((\bar{\lambda }\)) was the most important factor for explaining variation in bias and imprecision of the slopes (β 1i ) and species richness (R). When mean abundance \(\bar{\lambda }\) was larger than or equal to 0.5 (i.e., the average species had mean abundance greater than or equal to 0.5), the bias and imprecision in estimates of β 1i and R decreased greatly (Figs. 2, 3; and S2–3). For all parameters, the number of surveyed sites (n site ) was more important than was the number of visits (n visit ), and this also held when we removed the data with n site  = 40 (Table S1). We found evidence for interacting effects for all three parameters (β 0i , β 1i , and R), and specific interaction effects were different between the two species-level parameters (β 0i , β 1i ). Interaction terms important to the community-level property R included those important to species-level parameters (Table 1). Interaction terms between \(\bar{\lambda }\) and n site , and between \(\bar{\lambda }\) and \(\bar{p}\) were the most important, indicating that larger n site (≥20) and \(\bar{p}\) (≥0.2) and large values of \(\bar{\lambda }\) (≥0.5) increased the effectiveness of community abundance models to accurately estimate species richness. 
These conditions also yielded relatively good performance for the intercepts and slope estimators (Figs. 1, 2; and S1–2).
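For readers who wish to reproduce the design levels, the two transformations given in the text, \(\bar{\lambda } = \exp (\mu_{{\beta_{0} }} )\) and \(\bar{p} = 1/[1 + \exp ( - \mu_{r} )]\), can be inverted to obtain the link-scale community means. The sketch below is illustrative only; the function names are hypothetical and the code is not taken from the study.

```python
import math


def mu_beta0_from_mean_abundance(lam_bar):
    """Log-scale community mean: the inverse of lam_bar = exp(mu_beta0)."""
    return math.log(lam_bar)


def mu_r_from_mean_detection(p_bar):
    """Logit-scale community mean: the inverse of p_bar = 1 / (1 + exp(-mu_r))."""
    return math.log(p_bar / (1.0 - p_bar))


def mean_detection_from_mu_r(mu_r):
    """Back-transform a logit-scale mean to a detection probability."""
    return 1.0 / (1.0 + math.exp(-mu_r))


# Design levels of the simulation experiments.
for lam_bar in (0.1, 0.5, 1.0, 5.0):
    print("lambda_bar", lam_bar, "-> mu_beta0", round(mu_beta0_from_mean_abundance(lam_bar), 3))
for p_bar in (0.1, 0.2, 0.5):
    print("p_bar", p_bar, "-> mu_r", round(mu_r_from_mean_detection(p_bar), 3))
```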

Fig. 1

Bias of intercept estimates as a function of the number of sampling sites (s on the right axis = 10, 20, 40) and the number of visits (v on the left axis = 2, 3, 4). One plot is produced for each combination of \(\bar{\lambda }\) (=\({ \exp }[\mu_{{\beta_{0} }} ]\)) and \(\bar{p}\) (=1/[1 + exp[− μ r]]). Each bar indicates the averaged value of ten replicate data sets for each combination of s and v. Fig. S1 shows the corresponding results for the imprecision measure

Fig. 2

Bias of the slope estimates as a function of the number of sampling sites (s on the right axis = 10, 20, 40) and the number of visits (v on the left axis = 2, 3, 4). See Fig. 1 for details and Fig. S2 for the corresponding results for the imprecision measure

Fig. 3

Bias of species richness estimates as a function of the number of sampling sites (s on the right axis = 10, 20, 40) and the number of visits (v on the left axis = 2, 3, 4). See Fig. 1 for details and Fig. S3 for the corresponding results for the imprecision measure

Table 1 ANOVA tables for the contributions to variations in bias and precision by four factors and their interactions

Experiment 2: community abundance and occupancy models as species richness estimators

Because convergence of the MCMC chains took a very long time for the combination of \(\bar{\lambda }\) = 0.1 and \(\bar{p}\) = 0.1 (e.g., more than 2–3 days for a single data set), we excluded this combination from our simulation results. For the other combinations of \(\bar{\lambda }\) and \(\bar{p}\), there were only modest differences in bias and imprecision of the estimates between the two types of community models (Figs. 4, S4). However, the bias and imprecision of estimators under the community abundance model were slightly smaller than those under the community occupancy model for the scenario producing the highest expected counts \((\bar{\lambda }\) = \(\bar{p}\) = 0.5). The time to achieve convergence for the community occupancy model was less than half of that for the community abundance model (Table 2).

Fig. 4

Bias of species richness estimates derived from the community abundance and community occupancy models. Community abundance models (CAMs) and community occupancy models (COMs) have different structures; therefore, only the estimates of species richness (R) are directly comparable. Fig. S4 shows the analogous results for the imprecision measure

Table 2 Time required for convergence by the community abundance model (CAM) and the community occupancy model (COM) for five combinations of mean abundance and mean detection probability

Experiment 3: partitioning of γ diversity into α and β diversity

In the scenario with the lowest detection probability \((\bar{p}\) = 0.1), community abundance models slightly overestimated γ and α diversity of species richness (Fig. 5). In contrast, the observed (i.e., detection-naïve) values of α were much lower than the true values and those of γ diversity were slightly lower, resulting in the overestimation of β diversity. Community abundance models produced unbiased estimates of the α, β, and γ diversity based on the Shannon index, while the observed values again underestimated α diversity and overestimated β diversity. The bias of estimates from community abundance models and also of the observed values decreased for larger \(\bar{p}\), and was negligible even for the observed values once \(\bar{p}\) reached 0.5. The expected α diversity of species richness (obtained from Eq. 6) was almost the same as the posterior mean of the number of species with at least one individual at each site. This equivalence suggests that the two metrics can be used interchangeably as estimates of local species richness.

Fig. 5

Additive partitioning of species richness and Shannon indices under a sampling design of 40 sites and four visits with different mean detection probabilities. For this analysis, we subtracted the mean site-level α diversity from overall γ diversity, and thus obtained β diversity. We fixed mean abundance \((\bar{\lambda }\)), the number of sampling sites and the number of visits, and only varied the mean detection probability \((\bar{p}\)). For each of three values of \(\bar{p},\) we repeated the generation of communities and their analyses ten times, and here show mean diversity indices. The leftmost bars show the true values, the bars for community abundance models (CAMs) show the model estimates, and the rightmost bars show the detection-naïve values. For species richness, we also show the expected α diversity obtained from Eq. 6


Performance of community abundance models in small-sample situations

In this first published performance assessment of the community abundance models of Yamaura et al. (2012), we found that different variables were most influential in explaining the magnitude of the bias and imprecision of estimates of species-level intercepts (β 0i ), slopes (β 1i ), and species richness (R). The average detection probability \((\bar{p}\)) had the most influence on the estimates of β 0i . In contrast, average abundance \((\bar{\lambda }\)) was most influential on the estimates of β 1i and R. It has previously been shown that the precision and accuracy of intercepts in the abundance model (i.e., the estimated abundance) are greatly affected by detection probability in N-mixture models (McIntyre et al. 2012; Yamaura 2013), which is consistent with the results of the present study. Both sets of results emphasize the difficulty of achieving accurate estimates of community abundance when the detection probability is very low (which of course is a manifestation of “the first law of capture-recapture”: p. 246 in Kéry and Royle 2016). Modifying the sampling methods to increase \(\bar{p}\) would be important in such cases, for example, by spending ≥15 min instead of 5 min (Drapeau et al. 1999), or by broadcasting mobbing calls (Senzaki et al. 2015) when counting birds.

In contrast, \(\bar{\lambda }\) was the most important factor explaining the magnitude of bias and imprecision of β 1i and R. This highlights the difficulty in quantifying species richness and changes in community composition when abundances of many species are low. In such cases, an increase in the sampling area would be useful to increase the number of individuals exposed to sampling. For instance, Bibby et al. (2000) recommended adopting line transects instead of point counts when bird densities are low. Indeed, relatively accurate estimates would be expected with \(\bar{\lambda }\) ≥ 1 even for low values of \(\bar{p}\) (Figs. 2, 3 and S2–3). Of course, this would also entail an increase in effort.

For all three parameters that were of primary interest in our simulation (β 0i , β 1i , and R), the number of sampling sites (n site ) was more important than the number of visits (n visit ). This suggests that when we model abundances of multiple species at sites with different environments, at least in the small-sample situations considered here, it would be more beneficial to increase the number of sampling sites than to increase the number of replicate surveys. We note that n site was more important than n visit even after removing the replications where n site  = 40. The trade-off between the number of sites and the number of visits per site has been addressed several times for single-species models. There, an increase in the number of visits (e.g., from 2 to 3 or 4) was typically found to be more important than an increase in the number of sampling sites for improving the estimates under single-species occupancy models (Guillera-Arroita and Lahoz-Monfort 2012) and N-mixture models (McIntyre et al. 2012), and also under community occupancy models (Sanderlin et al. 2014). The results of these previous studies therefore do not concur with ours. In single-species occupancy models, the importance of n visit increased when detection probability was smaller (MacKenzie and Royle 2005; Guillera-Arroita et al. 2010; Guillera-Arroita and Lahoz-Monfort 2012). The differences between previous studies and this study suggest that the importance of n site relative to n visit may increase when multiple species are modeled whose abundance varies among sites. In other words, when sampling sites are heterogeneous with respect to densities of individual species, we need to sample more sites to improve the estimates of community abundance models.

Community abundance and occupancy models as species richness estimators

Our simulation experiments suggest that increases in mean detection probability \(\bar{p}\) (≥0.2) and especially in the number of sampling sites n site (≥20) are useful options at large values of mean abundance \(\bar{\lambda }\) (≥0.5) to increase the performance of community abundance models as a species richness estimator. Community abundance models provide an abundance-based species richness estimator, which is unique in that the abundance and detection probability of each species are treated separately, and species richness is estimated via an ensemble of species-level elemental Poisson models. Estimates of site-specific species richness (R j ) are obtained from the posterior distributions of the number of species with N ij w i  > 0 (but see below).

Our experiments also suggest that, as expected, when mean species-level abundance is small \((\bar{\lambda }\) ≤ 0.5), estimates of species richness by community abundance and community occupancy models are fairly similar. Differences between the two models were only evident when the expected counts were the largest \((\bar{\lambda }\) = \(\bar{p}\) = 0.5). In that case, bias and imprecision were smaller under the community abundance models than under the corresponding community occupancy models. In N-mixture models for a single species, Yamaura (2013) found that the expected abundance has to be 1 or more to achieve accurate estimations in small-sample situations. This is probably why community abundance models attained better estimation results than did community occupancy models. In these cases, we would be able to make the best use of count data to increase the accuracy of species richness estimates.
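One way to see why the two model types behave similarly at low abundance is the standard Poisson relationship between expected abundance and occupancy probability, ψ = 1 − exp(−λ). The sketch below is an illustration under that Poisson assumption, not an analysis from the study; the function names are hypothetical. It also reports the share of occupied sites that hold two or more individuals, which is where counts carry information beyond detection/non-detection.

```python
import math


def occupancy_from_abundance(lam):
    """P(N > 0) for N ~ Poisson(lam)."""
    return -math.expm1(-lam)  # equals 1 - exp(-lam), stable for small lam


def share_of_occupied_sites_with_two_or_more(lam):
    """P(N >= 2 | N >= 1) for N ~ Poisson(lam)."""
    p_at_least_one = occupancy_from_abundance(lam)
    p_exactly_one = lam * math.exp(-lam)
    return (p_at_least_one - p_exactly_one) / p_at_least_one


# Mean abundance levels used in the simulation design.
for lam in (0.1, 0.5, 1.0, 5.0):
    psi = occupancy_from_abundance(lam)
    share = share_of_occupied_sites_with_two_or_more(lam)
    print(f"lambda={lam}: psi={psi:.3f}, P(N>=2 | N>=1)={share:.3f}")
```

When λ is small, nearly all occupied sites hold a single individual, so a count and a detection/non-detection record contain almost the same information.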

Location of an indicator variable in the hierarchical model formulation

In our data augmentation scheme, we originally inserted the data augmentation variable (w i ) into the detection model (Eq. 4 in this study; Yamaura et al. 2011). However, in community occupancy models, this variable is usually inserted into the ecological model (Royle et al. 2007; Royle and Dorazio 2008). If we do the same in community abundance models, the indicator variable is inserted into Eq. 1, and the series of equations defining the likelihood of the hierarchical model becomes the following:

$$w_{i} \sim {\text{ Bernoulli}}({\Omega}),\tag{7}$$
$$N_{ij} \sim {\text{ Poisson}}(\lambda_{ij} w_{i} ),\tag{8}$$
$$y_{ijk} \sim {\text{ Binomial}}\left( {N_{ij} ,p_{i} } \right).\tag{9}$$

This change of the location of the indicator variable zeroes out the abundance of species that are not part of the sampled community (and therefore have structural zeros in the observed data), and the subsequent estimation appears to be more efficient. Indeed, for the same detection history data under \(\bar{\lambda }\) = 0.5, \(\bar{p}\) = 0.2, n site  = 20, n visit  = 3, this change of the location of the indicator variable decreased computation time to nearly 30 % (232 min vs. 64 min), and the two sets of estimates appeared to be identical up to MC simulation error. We then conducted additional simulations of the first experiment for \(\bar{\lambda }\) = 0.1, \(\bar{p}\) = 0.5, n site  = 10, n visit  = 2 with ten simulated data sets, and fit both parameterizations of the community abundance model. The computation time for the model with the changed location of the indicator variable was again much shorter (61 ± 33 vs. 31 ± 1 min). We note that these computation times did not include the initial 110,000 iterations including burn-in (ca. 36 min in this case), during which chain convergence was not achieved. Long computation times are a challenge in the application of these parameter-rich models; therefore, we recommend the parameterization in Eqs. 7–9 (the three equations above). We note that the same algorithmic equivalence in a community abundance model with data augmentation has independently been discovered by Tobler et al. (2015).
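The generative structure of this parameterization (a species-level indicator, Poisson abundance that is zeroed out for species outside the community, and binomial counts) can be sketched as a forward simulation. This is an illustrative sketch and not the JAGS code used in the study: the function name, the value of Ω, the size of the species pool, and the standard deviations of the species-level effects are all assumptions made for the example.

```python
import numpy as np


def simulate_counts(lam, p, omega, n_visit, rng):
    """Draw (w, N, y) from the hierarchical model.

    lam: species x sites array of expected abundances (lambda_ij)
    p:   per-species detection probabilities (p_i)
    """
    lam = np.asarray(lam, dtype=float)
    p = np.asarray(p, dtype=float)
    n_species, n_site = lam.shape
    w = rng.binomial(1, omega, size=n_species)   # w_i ~ Bernoulli(Omega)
    N = rng.poisson(lam * w[:, None])            # N_ij ~ Poisson(lambda_ij * w_i)
    y = rng.binomial(N[:, :, None], p[:, None, None],
                     size=(n_species, n_site, n_visit))  # y_ijk ~ Binomial(N_ij, p_i)
    return w, N, y


rng = np.random.default_rng(1)
n_species, n_site, n_visit = 60, 20, 3           # hypothetical species pool and design
x = rng.normal(size=n_site)                      # a single site covariate
beta0 = rng.normal(np.log(0.5), 1.0, n_species)  # intercepts; the SD of 1 is assumed
beta1 = rng.normal(0.0, 1.0, n_species)          # slopes; mean 0 and SD 1 are assumed
lam = np.exp(beta0[:, None] + beta1[:, None] * x[None, :])
logit_p = rng.normal(np.log(0.2 / 0.8), 1.0, n_species)  # logit-scale mean for p = 0.2
p = 1.0 / (1.0 + np.exp(-logit_p))

w, N, y = simulate_counts(lam, p, omega=0.7, n_visit=n_visit, rng=rng)
true_richness = int((N.sum(axis=1) > 0).sum())
observed_richness = int((y.sum(axis=(1, 2)) > 0).sum())
print(true_richness, observed_richness)
```

Because the abundance of a species with w i = 0 is exactly zero at every site, no latent abundances need to be sampled away from zero for such species, which is one plausible reason for the shorter run times reported above.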

Partitioning γ diversity into α and β diversity subject to imperfect detection

Quantifying β diversity is an important component of community analysis (Anderson et al. 2011; Legendre 2014), and our results using additive partitioning showed that imperfect detection can be confounded with the turnover of species composition among sites. Community abundance models were successful at resolving this confounding of imperfect detection and species turnover, and yielded more accurate estimates of α, β, and γ diversity. The confounding of imperfect detection with ecological parameters has been observed in single-species situations for population extinction and colonization rates (Moilanen 2002; Kéry 2004; Kéry et al. 2006; 2013). Beck et al. (2013) also showed that most metrics measuring community differences are susceptible to incomplete sampling, and called for the development of robust metrics. In our simulation experiments, we assumed that the detection probability of individual species was constant among the sampling sites, but this assumption cannot always be accepted in field surveys. For example, open habitats can have higher detection probability than closed habitats (e.g., Ruiz-Gutiérrez et al. 2010), and detection-naïve diversity measures can be confounded with covariates (McNew and Handel 2015). In such cases, our community abundance model can easily be expanded to relax this assumption (Kéry 2008; Yamaura 2013). Hierarchical community abundance and occupancy models represent flexible and powerful methods that can be used to deal with these sampling issues, and open up a new avenue for studying biological communities in varied situations.


  1. Alldredge MW, Pollock KH, Simons TR, Shriner SA (2007) Multiple-species analysis of point count data: a more parsimonious modelling framework. J Appl Ecol 44:281–290

  2. Anderson MJ, Crist TO, Chase JM, Vellend M, Inouye BD, Freestone AL, Sanders NJ, Cornell HV, Comita LS, Davies KF, Harrison SP, Kraft NJB, Stegen JC, Swenson NG (2011) Navigating the multiple meanings of β diversity: a roadmap for the practicing ecologist. Ecol Lett 14:19–28

  3. Banks-Leite C, Pardini R, Boscolo D, Cassano CR, Püttker T, Barros CS, Barlow J (2014) Assessing the utility of statistical adjustments for imperfect detection in tropical conservation science. J Appl Ecol 51:849–859

  4. Barnagaud J-Y, Barbaro L, Papaïx J, Deconchat M, Brockerhoff EG (2014) Habitat filtering by landscape and local forest composition in native and exotic New Zealand birds. Ecology 95:78–87

  5. Beck J, Holloway JD, Schwanghart W (2013) Undersampling and the measurement of beta diversity. Methods Ecol Evol 4:370–382

  6. Bibby CJ, Burgess ND, Hill DA, Mustoe SH (2000) Bird census techniques, 2nd edn. Academic Press, San Diego

  7. Borchers DL, Buckland ST, Zucchini W (2002) Estimating animal abundance: closed populations. Springer, London

  8. Broms KM, Hooten MB, Fitzpatrick RM (2015) Accounting for imperfect detection in Hill numbers for biodiversity studies. Methods Ecol Evol 6:99–108

  9. Buckland ST, Anderson DR, Burnham KP, Laake JL, Borchers DL, Thomas L (2001) Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford

  10. Buckland ST, Anderson DR, Burnham KP, Laake JL, Borchers DL, Thomas L (2004) Advanced distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford

  11. Buckland ST, Marsden SJ, Green RE (2008) Estimating bird abundance: making methods work. Bird Conserv Int 18:S91–S108

  12. Chandler RB, King DI, Raudales R, Trubey R, Chandler C, Chávez VJA (2013) A small-scale land-sparing approach to conserving biological diversity in tropical agricultural landscapes. Conserv Biol 27:785–795

  13. Chao A, Chazdon RL, Colwell RK, Shen T-J (2005) A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecol Lett 8:148–159

  14. Chao A, Colwell RK, Lin C-W, Gotelli NJ (2009) Sufficient sampling for asymptotic minimum species richness estimators. Ecology 90:1125–1133

  15. Chen G, Kéry M, Plattner M, Ma K, Gardner B (2013) Imperfect detection is the rule rather than the exception in plant distribution studies. J Ecol 101:183–191

  16. Couturier T, Cheylan M, Bertolero A, Astruc G, Besnard A (2013) Estimating abundance and population trends when detection is low and highly variable: a comparison of three methods for the Hermann’s tortoise. J Wildl Manage 77:454–462

  17. Crist TO, Veech JA, Gering JC, Summerville KS (2003) Partitioning species diversity across landscapes and regions: a hierarchical analysis of α, β, and γ diversity. Am Nat 162:734–743

  18. Dénes FV, Silveira LF, Beissinger SR (2015) Estimating abundance of unmarked animal populations: accounting for imperfect detection and other sources of zero inflation. Methods Ecol Evol 6:543–556

  19. Dennis EB, Morgan BJT, Ridout MS (2015) Computational aspects of N-mixture models. Biometrics 71:237–246

  20. Dorazio RM, Connor EF (2014) Estimating abundances of interacting species using morphological traits, foraging guilds, and habitat. PLoS ONE 9:e94323

  21. Dorazio RM, Royle JA (2005) Estimating size and composition of biological communities by modeling the occurrence of species. J Am Stat Assoc 100:389–398

  22. Dorazio RM, Royle JA, Söderström B, Glimskär A (2006) Estimating species richness and accumulation by modeling species occurrence and detectability. Ecology 87:842–854

  23. Dorazio RM, Kéry M, Royle JA, Plattner M (2010) Models for inference in dynamic metacommunity systems. Ecology 91:2466–2475

  24. Dorazio RM, Gotelli NJ, Ellison AM (2011) Modern methods of estimating biodiversity from presence-absence surveys. In: Grillo O, Venora G (eds) Biodiversity loss in a changing planet. InTech, New York, pp 277–302

  25. Drapeau P, Leduc A, McNeil R (1999) Refining the use of point counts at the scale of individual points in studies of bird-habitat relationships. J Avian Biol 30:367–382

  26. Ehrlich PR, Roughgarden J (1987) The science of ecology. Macmillan Publishing, New York

  27. Fahrig L (2001) How much habitat is enough? Biol Conserv 100:65–74

  28. Fletcher RJ (2006) Emergent properties of conspecific attraction in fragmented landscapes. Am Nat 168:207–219

  29. Gelfand AE, Schmidt AM, Wu S, Silander JA, Latimer A, Rebelo AG (2005) Modelling species diversity through species level hierarchical modelling. J Roy Stat Soc: Ser C (Appl Stat) 54:1–20

  30. Gotelli NJ, Colwell RK (2001) Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecol Lett 4:379–391

  31. Guillera-Arroita G, Lahoz-Monfort JJ (2012) Designing studies to detect differences in species occupancy: power analysis under imperfect detection. Methods Ecol Evol 3:860–869

  32. Guillera-Arroita G, Ridout MS, Morgan BJT (2010) Design of occupancy studies with imperfect detection. Methods Ecol Evol 1:131–139

  33. Guillera-Arroita G, Lahoz-Monfort JJ, MacKenzie DI, Wintle BA, McCarthy MA (2014) Ignoring imperfect detection in biological surveys is dangerous: a response to ‘Fitting and interpreting occupancy models’. PLoS ONE 9:e99571

  34. Guillera-Arroita G, Lahoz-Monfort JJ, Elith J, Gordon A, Kujala H, Lentini PE, McCarthy MA, Tingley R, Wintle BA (2015) Is my species distribution model fit for purpose? Matching data and models to applications. Global Ecol Biogeogr 24:276–292

  35. Higa M, Yamaura Y, Koizumi I, Yabuhara Y, Senzaki M, Ono S (2015) Mapping large-scale bird distributions using occupancy models and citizen data with spatially biased sampling effort. Divers Distrib 21:46–54

  36. Hunt JW, Weckerly FW, Ott JR (2012) Reliability of occupancy and binomial mixture models for estimating abundance of golden-cheeked warblers (Setophaga chrysoparia). Auk 129:105–114

  37. Iknayan KJ, Tingley MW, Furnas BJ, Beissinger SR (2014) Detecting diversity: emerging methods to estimate species diversity. Trends Ecol Evol 29:97–106

  38. Johnson DH (2008) In defense of indices: the case of bird surveys. J Wildl Manage 72:857–868

  39. Kellner K (2015) jagsUI: a wrapper around rjags to streamline JAGS analyses. Accessed 17 Apr 2014

  40. Kéry M (2004) Extinction rate estimates for plant populations in revisitation studies: importance of detectability. Conserv Biol 18:570–574

  41. Kéry M (2008) Estimating abundance from bird counts: binomial mixture models uncover complex covariate relationships. Auk 125:336–345

  42. Kéry M (2010) Introduction to WinBUGS for ecologists: a Bayesian approach to regression, ANOVA, mixed models and related analysis. Academic Press, San Diego

  43. Kéry M (2011) Species richness and community dynamics: a conceptual framework. In: O’Connell AF, Nichols JD, Karanth KU (eds) Camera traps in animal ecology. Springer, Tokyo, pp 207–231

  44. Kéry M, Royle JA (2008) Hierarchical Bayes estimation of species richness and occupancy in spatially replicated surveys. J Appl Ecol 45:589–598

  45. Kéry M, Royle JA (2009) Inference about species richness and community structure using species-specific occupancy models in the National Swiss Breeding Bird Survey MHB. In: Thomson D, Cooch E, Conroy M (eds) Modeling demographic processes in marked populations. Springer, US, pp 639–656

  46. Kéry M, Royle JA (2016) Applied hierarchical modeling in ecology: analysis of distribution, abundance and species richness using R and BUGS, vol 1. Prelude and static models. Academic Press, San Diego

  47. Kéry M, Schaub M (2012) Bayesian population analysis using WinBUGS: a hierarchical perspective. Academic Press, San Diego

  48. Kéry M, Royle JA, Schmid H (2005) Modeling avian abundance from replicated counts using binomial mixture models. Ecol Appl 15:1450–1461

  49. Kéry M, Spillmann JH, Truong C, Holderegger R (2006) How biased are estimates of extinction probability in revisitation studies? J Ecol 94:980–986

  50. Kéry M, Gardner B, Monnerat C (2010) Predicting species distributions from checklist data using site-occupancy models. J Biogeogr 37:1851–1862

  51. Kéry M, Guillera-Arroita G, Lahoz-Monfort JJ (2013) Analysing and mapping species range dynamics using occupancy models. J Biogeogr 40:1463–1474

  52. King R, Morgan BJT, Gimenez O, Brooks SP (2010) Bayesian analysis for population ecology. CRC Press, Boca Raton

  53. Lahoz-Monfort JJ, Guillera-Arroita G, Wintle BA (2014) Imperfect detection impacts the performance of species distribution models. Global Ecol Biogeogr 23:504–515

  54. Legendre P (2014) Interpreting the replacement and richness difference components of beta diversity. Global Ecol Biogeogr 23:1324–1334

  55. Link WA, Barker RJ (2009) Bayesian Inference: with ecological applications. Academic Press, San Diego

  56. Link WA, Sauer JR (1996) Extremes in ecology: avoiding the misleading effects of sampling variation in summary analyses. Ecology 77:1633–1640

  57. MacKenzie DI, Royle JA (2005) Designing occupancy studies: general advice and allocating survey effort. J Appl Ecol 42:1105–1114

  58. MacKenzie DI, Nichols JD, Sutton N, Kawanishi K, Bailey LL (2005) Improving inferences in population studies of rare species that are detected imperfectly. Ecology 86:1101–1113

  59. MacKenzie DI, Nichols JD, Royle JA, Pollock KH, Bailey LL, Hines JE (2006) Occupancy estimation and modeling: inferring patterns and dynamics of species occurrence. Academic Press, Amsterdam

  60. McNew LB, Handel CM (2015) Evaluating species richness: biased ecological inference results from spatial heterogeneity in detection probabilities. Ecol Appl 25:1669–1680

  61. McIntyre AP, Jones JE, Lund EM, Waterstrat FT, Giovanini JN, Duke SD, Hayes MP, Quinn T, Kroll AJ (2012) Empirical and simulation evaluations of an abundance estimator using unmarked individuals of cryptic forest-dwelling taxa. For Ecol Manage 286:129–136

  62. Moilanen A (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96:516–530

  63. O’Hara RB (2005) Species richness estimators: how many species can dance on the head of a pin? J Anim Ecol 74:375–386

  64. Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara RB, Simpson GL, Solymos P, Stevens MHH, Wagner H (2015) vegan: community ecology R package. Accessed 17 Apr 2014

  65. Ovaskainen O, Soininen J (2011) Making more out of sparse data: hierarchical modeling of species communities. Ecology 92:289–295

  66. Pacifici K, Zipkin EF, Collazo JA, Irizarry JI, DeWan A (2014) Guidelines for a priori grouping of species in hierarchical community models. Ecol Evol 4:877–888

  67. Plummer M (2013) JAGS: Just another Gibbs sampler.

  68. R Core Team (2014) R: a language and environment for statistical computing: R Foundation for Statistical Computing. Vienna, Austria. Accessed 28 Apr 2014

  69. Rota CT, Fletcher RJ, Evans JM, Hutto RL (2011) Does accounting for imperfect detection improve species distribution models? Ecography 34:659–670

  70. Royle JA (2004) N-mixture models for estimating population size from spatially replicated counts. Biometrics 60:108–115

  71. Royle JA, Dorazio RM (2008) Hierarchical modeling and inference in ecology: the analysis of data from populations, metapopulations and communities. Academic Press, Amsterdam

  72. Royle JA, Nichols JD, Kéry M (2005) Modelling occurrence and abundance of species when detection is imperfect. Oikos 110:353–359

  73. Royle JA, Dorazio RM, Link WA (2007) Analysis of multinomial models with unknown index using data augmentation. J Comput Graph Stat 16:67–85

  74. Royle JA, Chandler RB, Sollmann R, Gardner B (2014) Spatial capture-recapture. Elsevier, Amsterdam

  75. Ruiz-Gutiérrez V, Zipkin EF, Dhondt AA (2010) Occupancy dynamics in a tropical bird community: unexpectedly high forest use by birds classified as non-forest species. J Appl Ecol 47:621–630

  76. Sanderlin JS, Block WM, Ganey JL (2014) Optimizing study design for multi-species avian monitoring programmes. J Appl Ecol 51:860–870

  77. Sauer JR, Link WA (2002) Hierarchical modeling of population stability and species group attributes from survey data. Ecology 83:1743–1751

  78. Senzaki M, Yamaura Y, Nakamura F (2015) The usefulness of top predators as biodiversity surrogates indicated by the relationship between the reproductive outputs of raptors and other bird species. Biol Conserv 191:460–468

  79. Su Y-S, Yajima M (2013) R2jags: a package for running jags from R. Accessed 5 Jan 2014

  80. Tobler MW, Hartley AZ, Carrillo-Percastegui SE, Powell GVN (2015) Spatiotemporal hierarchical modelling of species richness and occupancy using camera trap data. J Appl Ecol 52:413–421

  81. Tyler JA, Hargrove WW (1997) Predicting spatial distribution of foragers over large resource landscapes: a modeling analysis of the Ideal Free Distribution. Oikos 79:376–386

  82. Veech JA, Summerville KS, Crist TO, Gering JC (2002) The additive partitioning of species diversity: recent revival of an old idea. Oikos 99:3–9

  83. Welsh AH, Lindenmayer DB, Donnelly CF (2013) Fitting and interpreting occupancy models. PLoS ONE 8:e52015

  84. White JW, Rassweiler A, Samhouri JF, Stier AC, White C (2014) Ecologists should not use statistical significance tests to interpret simulation model results. Oikos 123:385–388

  85. Williams BK, Nichols JD, Conroy MJ (2002) Analysis and management of animal populations. Academic Press, San Diego

  86. Wintle BA, Walshe TV, Parris KM, McCarthy MA (2012) Designing occupancy surveys and interpreting non-detection when observations are imperfect. Divers Distrib 18:417–424

  87. Yamaura Y (2013) Confronting imperfect detection: behavior of binomial mixture models under varying circumstances of visits, sampling sites, detectability, and abundance, in small-sample situations. Ornithol Sci 12:73–88

  88. Yamaura Y, Royle JA, Kuboi K, Tada T, Ikeno S, Makino S (2011) Modelling community dynamics based on species-level abundance models from detection/nondetection data. J Appl Ecol 48:67–75

  89. Yamaura Y, Royle JA, Shimada N, Asanuma S, Sato T, Taki H, Makino S (2012) Biodiversity of man-made open habitats in an underused country: a class of multispecies abundance models for count data. Biodivers Conserv 21:1365–1380

  90. Zipkin EF, DeWan A, Royle JA (2009) Impacts of forest fragmentation on species richness: a hierarchical approach to community modelling. J Appl Ecol 46:815–822


Y. Yamaura was supported by a KAKENHI Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (Grant Nos. 23780153 and 26292074). M. Kéry was supported by the Swiss National Science Foundation (Grant No. 31_146125). We are grateful to G. Guillera-Arroita for her helpful suggestions on our initial manuscript. Our manuscript was greatly improved by comments from M. Tingley, an anonymous reviewer, and R. Russell.

Author information



Corresponding author

Correspondence to Yuichi Yamaura.

Additional information

Yuichi Yamaura is the recipient of the 18th Denzaburo Miyadi Award.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 656 kb)

Supplementary material 2 (PDF 16 kb)

Rights and permissions

This article is published under an open access license.

About this article

Cite this article

Yamaura, Y., Kéry, M. & Royle, J.A. Study of biological communities subject to imperfect detection: bias and precision of community N-mixture abundance models in small-sample situations. Ecol Res 31, 289–305 (2016).

Keywords

  • β (beta) diversity
  • Count data
  • Data augmentation
  • False negative
  • Species richness