Parameter-expanded data augmentation for Bayesian analysis of capture–recapture models

  • EURING Proceedings
Journal of Ornithology

Abstract

Data augmentation (DA) is a flexible tool for analyzing closed and open population models of capture–recapture data, especially models that include sources of heterogeneity among individuals. The essential concept underlying DA, as we use the term, is to add “observations” to create a dataset composed of a known number of individuals. This new (augmented) dataset, which includes the unknown number of individuals N in the population, is then analyzed using a new model that includes a reformulation of the parameter N of the conventional model of the observed (unaugmented) data. In the context of capture–recapture models, we add a set of “all zero” encounter histories, which are not, in practice, observable. The model of the augmented dataset is a zero-inflated version of either a binomial or a multinomial base model. Thus, our use of DA provides a general approach for analyzing both closed and open population models of all types. In doing so, this approach provides a unified framework for the analysis of a wide range of models that the classical literature treats as unrelated “black boxes” and named procedures. As a practical matter, analysis of the augmented dataset by MCMC is greatly simplified compared to other methods that require specialized algorithms. For example, complex capture–recapture models of an augmented dataset can be fitted with popular MCMC software packages (WinBUGS or JAGS) by providing a concise statement of the model’s assumptions, which usually takes only a few lines of pseudocode. In this paper, we review the basic technical concepts of data augmentation, and we provide examples of analyses of closed-population models (M0, Mh, distance sampling, and spatial capture–recapture models) and open-population models (Jolly–Seber) with individual effects.
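For Model M0, the zero-inflated binomial model of the augmented data has standard full conditionals, so it can be fitted with a simple Gibbs sampler. The sketch below is a minimal illustration in Python with NumPy, not the WinBUGS/JAGS code used in the paper. It assumes uniform priors on p and ψ; the function names `augment` and `gibbs_m0` and the simulated data are hypothetical.

```python
import numpy as np


def augment(y_obs, M):
    """Pad the n x J matrix of observed encounter histories with M - n all-zero rows."""
    n, J = y_obs.shape
    if M < n:
        raise ValueError("M must be at least the number of observed individuals n")
    return np.vstack([y_obs, np.zeros((M - n, J), dtype=y_obs.dtype)])


def gibbs_m0(y_obs, M, n_iter=5000, seed=0):
    """Gibbs sampler for Model M0 fitted to the augmented data (zero-inflated binomial).

    z_i ~ Bernoulli(psi), psi ~ U(0, 1)             (membership of each of the M rows)
    y_i | z_i ~ Binomial(J, z_i * p), p ~ U(0, 1)   (y_i = detections of row i)
    Returns an (n_iter, 3) array of draws of (N, p, psi), where N = sum(z).
    """
    rng = np.random.default_rng(seed)
    y_aug = augment(y_obs, M)
    J = y_aug.shape[1]
    counts = y_aug.sum(axis=1)      # detections per row of the augmented data
    observed = counts > 0           # detected rows are certainly members (z_i = 1)
    z = observed.astype(int)
    psi = 0.5
    draws = np.empty((n_iter, 3))
    for t in range(n_iter):
        # p | y, z ~ Beta: successes and trials come only from rows with z_i = 1
        n_members = z.sum()
        detections = counts[z == 1].sum()
        p = rng.beta(1 + detections, 1 + n_members * J - detections)
        # z_i | y_i = 0, p, psi for undetected rows: member-but-missed vs. non-member
        missed = psi * (1 - p) ** J
        prob_member = missed / (missed + (1 - psi))
        z = np.where(observed, 1, rng.binomial(1, prob_member, size=M))
        # psi | z ~ Beta(1 + N, 1 + M - N)
        N = z.sum()
        psi = rng.beta(1 + N, 1 + M - N)
        draws[t] = (N, p, psi)
    return draws


if __name__ == "__main__":
    # Hypothetical simulated data: N = 100 individuals, J = 5 occasions, p = 0.3
    sim = np.random.default_rng(1)
    y_full = sim.binomial(1, 0.3, size=(100, 5))
    y_obs = y_full[y_full.sum(axis=1) > 0]   # undetected individuals never appear in the data
    draws = gibbs_m0(y_obs, M=400, n_iter=4000)
    print("n =", y_obs.shape[0], "posterior mean of N =", draws[1000:, 0].mean())
```

Only the all-zero rows have uncertain membership, so N is recovered as the sum of the latent indicators z; M should be large enough that posterior draws of N do not pile up near M.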

References

  • Bled F, Royle JA, Cam E (2010) Hierarchical modeling of an invasive spread: case of the Eurasian collared-dove Streptopelia decaocto in the USA. Ecol Appl (in press)

  • Bonner S, Schwarz C (2006) An extension of the Cormack Jolly Seber model for continuous covariates with application to Microtus pennsylvanicus. Biometrics 62:142–149

  • Borchers DL, Efford MG (2008) Spatially explicit maximum likelihood methods for capture–recapture studies. Biometrics 64:377–385

  • Burnham KP, Overton WS (1978) Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika 65:625–633

  • Converse SJ, Royle JA (2010) Dealing with incomplete and variable detectability in multi-year, multi-site monitoring of ecological populations. In: Design and analysis of long-term ecological monitoring studies (in press)

  • Cooch E, White G (2001) Using MARK: a gentle introduction. Cornell University, Ithaca

  • Coull BA, Agresti A (1999) The use of mixed logit models to reflect heterogeneity in capture–recapture studies. Biometrics 55:294–301

  • Crosbie SF, Manly BFJ (1985) Parsimonious modelling of capture–mark-recapture studies. Biometrics 41:385–398

  • Dorazio RM, Royle JA (2003) Mixture models for estimating the size of a closed population when capture rates vary among individuals. Biometrics 59:350–363

  • Dorazio RM, Royle JA (2005) Estimating size and composition of biological communities by modeling the occurrence of species. J Am Stat Assoc 100:389–398

  • Dorazio RM, Royle JA, Soderstrom B, Glimskar A (2006) Estimating species richness and accumulation by modeling species occurrence and detectability. Ecology 87:842–854

  • Dorazio RM, Kéry M, Royle JA, Plattner M (2010) Models for inference in dynamic metacommunity systems. Ecology 91:2466–2475

  • Dupuis JA, Schwarz CJ (2007) A Bayesian approach to the multistate Jolly-Seber capture–recapture model. Biometrics 63:1015–1022

  • Durban JW, Elston DA (2005) Mark-recapture with occasion and individual effects: abundance estimation through Bayesian model selection in a fixed dimensional parameter space. J Agric Biol Environ Stat 10:291–305

  • Efford M (2004) Density estimation in live-trapping studies. Oikos 106:598–610

  • Gardner B, Royle JA, Wegan MT, Rainbolt RE, Curtis PD (2010) Estimating black bear density using DNA data from hair snares. J Wildl Manag 74:318–325

  • Gardner B, Reppucci J, Lucherini M, Royle JA (2010b) Spatially-explicit inference for open populations: estimating demographic parameters from camera-trap studies. Ecology 91:3376–3383

  • Gimenez O, Rossi V, Choquet R, Dehais C, Doris B, Varella H, Vila JP, Pradel R (2007) State-space modelling of data on marked individuals. Ecol Model 206:431–438

  • Gimenez O, Choquet R (2010) Individual heterogeneity in studies on marked animals using numerical integration: capture–recapture mixed models. Ecology 91:148–154

  • Green PJ (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711–732

  • Johnson DH (1999) The insignificance of statistical significance testing. J Wildl Manag 63:763–772

  • Jolly G (1965) Explicit estimates from capture–recapture data with both death and immigration–stochastic model. Biometrika 52:225–247

  • Karanth KU (1995) Estimating tiger (Panthera tigris) populations from camera-trap data using capture–recapture models. Biol Conserv 71:333–338

  • Karanth KU, Nichols JD (1998) Estimation of tiger densities in India using photographic captures and recaptures. Ecology 79:2852–2862

  • Karanth K, Nichols JD, Kumar N, Hines JE (2006) Assessing tiger population dynamics using photographic capture–recapture sampling. Ecology 87:2925–2937

  • Kéry M, Royle JA (2009) Inference about species richness and community structure using species-specific occupancy models in the National Swiss Breeding Bird Survey MHB. In: Thomson DL, Cooch EG, Conroy MJ (eds) Modeling demographic processes in marked populations. Springer, New York, pp 639–656

  • Kéry M, Royle JA, Plattner M, Dorazio RM (2009) Species richness and occupancy estimation in communities subject to temporary emigration. Ecology 90:1279–1290

  • King R, Brooks SP (2001) On the Bayesian analysis of population size. Biometrika 88:317–336

  • King R, Brooks SP (2008) On the Bayesian estimation of a closed population size in the presence of heterogeneity and model uncertainty. Biometrics 64:816–824

  • King R, Brooks SP, Coulson T (2008) Analysing complex capture–recapture data in the presence of individual and temporal covariates and model uncertainty. Biometrics 64:1187–1195

  • Langtimm CA, Dorazio RM, Stith BM, Doyle TJ (2010) A new aerial survey design to monitor manatee abundance for Everglades restoration. J Wildl Manag (in press)

  • Lebreton JD, Burnham KP, Clobert J, Anderson DR (1992) Modeling survival and testing biological hypotheses using marked animals: a unified approach with case studies. Ecol Monogr 62:67–118

  • Link WA (2003) Nonidentifiability of population size from capture–recapture data with heterogeneous detection probabilities. Biometrics 59:1123–1130

  • Link WA, Barker RJ (2010) Bayesian inference: with ecological applications. Academic, New York

  • Liu JS, Wu YN (1999) Parameter expansion for data augmentation. J Am Stat Assoc 94:1264–1274

  • Lunn D, Spiegelhalter D, Thomas A, Best N (2009) The BUGS project: evolution, critique and future directions (with discussion). Stat Med 28:3049–3082

  • MacKenzie DI, Nichols JD, Lachman GB, Droege S, Royle JA, Langtimm CA (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83:2248–2255

  • MacKenzie DI, Nichols JD, Hines JE, Knutson MG, Franklin AB (2003) Estimating site occupancy, colonization, and local extinction when a species is detected imperfectly. Ecology 84:2200–2207

  • Nichols JD, Karanth KU (2002) Statistical concepts: assessing spatial distributions. In: Monitoring tigers and their prey: a manual for researchers, managers, and conservationists in tropical Asia. Centre for Wildlife Studies, pp 29–38

  • Patil A, Huard D, Fonnesbeck CJ (2010) PyMC 2.0: Bayesian stochastic modelling in python. J Stat Softw (in press)

  • Pledger S (2005) The performance of mixture models in heterogeneous closed population capture–recapture. Biometrics 61:868–873

  • Pledger S, Pollock KH, Norris JL (2003) Open capture–recapture models with heterogeneity: I. Cormack-Jolly-Seber model. Biometrics 59:786–794

  • Pollock K (1982) A capture–recapture design robust to unequal probability of capture. J Wildl Manag 46:757–760

  • Royle JA (2006) Site occupancy models with heterogeneous detection probabilities. Biometrics 62:97–102

  • Royle JA (2008) Modeling individual effects in the Cormack-Jolly-Seber model: a state-space formulation. Biometrics 64:364–370

  • Royle JA (2009) Analysis of capture–recapture models with individual covariates using data augmentation. Biometrics 65:267–274

  • Royle JA, Kéry M (2007) A Bayesian state-space formulation of dynamic occupancy models. Ecology 88:1813–1823

  • Royle JA, Dorazio RM, Link WA (2007) Analysis of multinomial models with unknown index using data augmentation. J Comput Graph Stat 16:67–85

  • Royle JA, Dorazio RM (2008) Hierarchical modeling and inference in ecology: the analysis of data from populations, metapopulations and communities. Academic, San Diego

  • Royle JA, Gardner B (2010) Hierarchical spatial capture–recapture models for estimating density from trapping arrays. In: O'Connell AF, Nichols JD, Karanth KU (eds) Camera traps in animal ecology: methods and analyses. Springer, Berlin

  • Royle JA, Young KV (2008) A hierarchical model for spatial capture–recapture data. Ecology 89:2281–2289

  • Royle JA, Karanth KU, Gopalaswamy AM, Kumar NS (2009) Bayesian inference in camera trap studies using a class of spatial capture–recapture models. Ecology 90:3233–3244

  • Schofield MR, Barker RJ (2008) A unified capture–recapture framework. J Agric Biol Environ Stat 13:458–477

  • Schwarz C, Arnason A (1996) A general methodology for the analysis of capture–recapture experiments in open populations. Biometrics 52:860–873

  • Seber G (1965) A note on the multiple-recapture census. Biometrika 52:249–259

  • Tanner MA (1996) Tools for statistical inference: methods for the exploration of posterior distributions and likelihood functions, 3rd edn. Springer, New York

  • Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation. J Am Stat Assoc 82:528–540

  • Williams BK, Nichols JD, Conroy MJ (2002) Analysis and management of animal populations. Academic, San Diego

  • Wright JA, Barker RJ, Schofield MR, Frantz AC, Byrom AE, Gleeson DM (2009) Incorporating genotype uncertainty into mark-recapture–type models for estimating abundance using DNA samples. Biometrics 65:833–840

Acknowledgments

We thank Beth Gardner and Elise Zipkin for reviewing drafts of this manuscript. We thank Ullas Karanth (camera-trapping data) and Jim Nichols (Microtus data) for making data from their research available for our use. Use of trade, product, or firm names does not imply endorsement by the U.S. Government.

Author information

Correspondence to J. Andrew Royle.

Additional information

Communicated by M. Schaub.

Appendix: Technical conditions for PX-DA

In applications of PX-DA to capture–recapture problems, the conventional (multinomial) model of the complete data must be expanded to account for, and to estimate, the number of non-sampling zeros (i.e., the all-zero histories of individuals that are not members of the population) in the augmented dataset of known size M. The expanded model includes an additional parameter ψ, the zero-inflation parameter, which is the probability that any one of the M individuals in the augmented dataset is a member of the population. Some basic conditions must be satisfied for this expanded model to be innocuous with respect to the original inference problem (Liu and Wu 1999). In this appendix, we show that the expanded model of the augmented data satisfies these conditions.

We begin with a few definitions. Let f denote any distribution (posterior, prior, etc.) for the observed data y, and let p denote the same for the augmented data, which we denote by (y, w). (In our applications, w is typically a vector of zeros that correspond to the all-zero capture histories.) Note that p also appears as an argument, where it denotes the detection probability. Naturally, we desire that the posterior for the augmented data should be equivalent to the posterior based on the observed data. From Liu and Wu (1999), \(p(N, p \mid y, w) = f(N, p \mid y)\) provided that two conditions hold. The first is that the prior \(p(N, p, \psi)\) “agrees with” \(f(N, p)\) in the following sense:

$$ \int p(N,p,\psi)d\psi = f(N,p) $$

In other words, integrating the extra parameter ψ out of the prior of the model of the augmented data must yield the prior of the model of the observed data. In the RDL formulation (that of Royle, Dorazio and Link; Royle et al. 2007), this integration yields a \(\hbox{U}(0,M) \times \hbox{U}(0,1)\) prior for the parameters (N, p), where U(0, M) denotes the discrete uniform distribution on \(\{0, 1, \ldots, M\}\) and M is a fixed integer chosen large enough that it does not truncate the posterior of N. This satisfies Liu and Wu’s first condition. The second condition is that the expanded model of the augmented data, \(p(y, w \mid N, p, \psi)\), preserves the model of the observed data, \(f(y \mid N, p)\). This condition is satisfied automatically because our model of the augmented data can be regarded as arising from the choice of prior on N (see “PX-DA for Model M0”), not from a model that is structurally distinct from the observed-data model. Therefore, our choice of prior is sufficient to guarantee that we have satisfied the conditions of Liu and Wu (1999) and that the extra parameter ψ used in modeling the augmented data is innocuous to inference about N.
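Both conditions can be checked numerically for Model M0. The sketch below is an illustration, not the authors' code; it assumes uniform priors on p and ψ and summarizes the data by the number of captured individuals n, the number of occasions J and the total number of detections (all function names are hypothetical). `prior_of_N` integrates ψ out of the prior of N = Σz_i with a Beta integral. The two posterior functions integrate p (and ψ) out analytically and compare f(N | y) from the observed-data model with p(N | y, w) from the zero-inflated model of the augmented data. Exact rational arithmetic makes the equality test exact rather than approximate.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """Exact Beta function B(a, b) for positive integers a, b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def prior_of_N(M):
    """Marginal prior of N = sum(z) when z_i ~ Bernoulli(psi) and psi ~ U(0, 1)."""
    return [comb(M, k) * beta_fn(k + 1, M - k + 1) for k in range(M + 1)]


def normalize(weights):
    total = sum(weights.values())
    return {N: w / total for N, w in weights.items()}


def posterior_N_observed(n, J, detections, M):
    """f(N | y) for M0 with N ~ U{0, ..., M} and p ~ U(0, 1), p integrated out.

    Observed-data likelihood: N!/(N - n)! * p^detections * (1 - p)^(N J - detections).
    """
    return normalize({
        N: Fraction(factorial(N), factorial(N - n)) * beta_fn(detections + 1, N * J - detections + 1)
        for N in range(n, M + 1)
    })


def posterior_N_augmented(n, J, detections, M):
    """p(N | y, w) from the zero-inflated model of data augmented to M rows.

    The n detected rows have z_i = 1; N - n of the M - n all-zero rows are members.
    """
    return normalize({
        N: comb(M - n, N - n)                                  # which all-zero rows are members
        * beta_fn(N + 1, M - N + 1)                            # psi integrated out
        * beta_fn(detections + 1, N * J - detections + 1)      # p integrated out
        for N in range(n, M + 1)
    })


if __name__ == "__main__":
    M = 30
    print(all(q == Fraction(1, M + 1) for q in prior_of_N(M)))  # prints True
    # Hypothetical summary data: n = 5 individuals, J = 4 occasions, 8 detections
    print(posterior_N_observed(5, 4, 8, M) == posterior_N_augmented(5, 4, 8, M))  # prints True
```

The same result follows by hand: the augmented-data weight is \(\binom{M-n}{N-n}\frac{N!\,(M-N)!}{(M+1)!} = \frac{(M-n)!}{(M+1)!}\cdot\frac{N!}{(N-n)!}\), which is proportional to the factor \(N!/(N-n)!\) in the observed-data likelihood, so the two posteriors of N coincide after normalization.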

About this article

Cite this article

Royle, J.A., Dorazio, R.M. Parameter-expanded data augmentation for Bayesian analysis of capture–recapture models. J Ornithol 152 (Suppl 2), 521–537 (2012). https://doi.org/10.1007/s10336-010-0619-4

  • DOI: https://doi.org/10.1007/s10336-010-0619-4
