Journal of Ornithology, Volume 152, Supplement 2, pp 521–537

Parameter-expanded data augmentation for Bayesian analysis of capture–recapture models

EURING Proceedings

DOI: 10.1007/s10336-010-0619-4

Cite this article as:
Royle, J.A. & Dorazio, R.M. J Ornithol (2012) 152(Suppl 2): 521. doi:10.1007/s10336-010-0619-4

Abstract

Data augmentation (DA) is a flexible tool for analyzing closed and open population models of capture–recapture data, especially models that include sources of heterogeneity among individuals. The essential concept underlying DA, as we use the term, is based on adding “observations” to create a dataset composed of a known number of individuals. This new (augmented) dataset, which includes the unknown number of individuals N in the population, is then analyzed using a new model that includes a reformulation of the parameter N in the conventional model of the observed (unaugmented) data. In the context of capture–recapture models, we add a set of “all zero” encounter histories which are not, in practice, observable. The model of the augmented dataset is a zero-inflated version of either a binomial or a multinomial base model. Thus, our use of DA provides a general approach for analyzing both closed and open population models of all types. In doing so, this approach provides a unified framework for the analysis of a huge range of models that are treated as unrelated “black boxes” and named procedures in the classical literature. As a practical matter, analysis of the augmented dataset by MCMC is greatly simplified compared to other methods that require specialized algorithms. For example, complex capture–recapture models of an augmented dataset can be fitted with popular MCMC software packages (WinBUGS or JAGS) by providing a concise statement of the model’s assumptions that usually involves only a few lines of pseudocode. In this paper, we review the basic technical concepts of data augmentation, and we provide examples of analyses of closed-population models (M0, Mh, distance sampling, and spatial capture–recapture models) and open-population models (Jolly–Seber) with individual effects.

Keywords

Hierarchical models · Individual covariates · Individual heterogeneity · Markov chain Monte Carlo · Occupancy models

Introduction

Capture–recapture models with individual effects have received considerable attention in recent years. Much of this attention has focused on closed-population models, but the basic ideas have been applied in a limited sense to certain open-population models. As a working definition, models with “individual effects” are models in which parameters are functions of something that varies by individual. Several types of individual effects arise in practice. An important class of models includes those with latent (unobservable) sources of individual heterogeneity. So-called “Model Mh” is a standard closed-population model (e.g., Burnham and Overton 1978; Dorazio and Royle 2003; Link 2003; Pledger 2005), and similar ideas have been applied to open populations (Pledger et al. 2003; Gimenez et al. 2007; Royle 2009; Gimenez and Choquet 2010). Another useful class of models includes the individual covariate models, in which covariates are measured on each individual in the sample. Various extensions of these models accommodate imperfect information about the covariate. These include models with measurement error, time-varying covariates (Bonner and Schwarz 2006; King et al. 2008), and spatial capture–recapture (e.g., Borchers and Efford 2008; Royle and Young 2008; Royle et al. 2009). Classical “multi-state” models are also individual effects models, with a state variable that is a categorical covariate.

An important technical linkage between all of these different models is that the dimension of the parameter space is itself an unknown variable. For example, consider a version of Model Mh in which the detection probability \(p_{i}\) of the \(i\)th individual is modeled on the logit scale:
$$ \hbox{logit}(p_{i}) = \eta_{i} $$
and \(\eta_{i} \sim \hbox{Normal}(\mu,\sigma^2)\) (Coull and Agresti 1999). Under this model, the number of individual effects equals the unknown population size N. If we attempt to fit this model using Bayesian methods, such as Markov chain Monte Carlo (MCMC), each update of N changes the number of parameters in the model because \(\eta_{i}\) parameters are created or deleted depending on whether N increases or decreases from its previous value. This is a challenging MCMC problem that has been resolved using specialized algorithms such as reversible-jump MCMC (King and Brooks 2001; Schofield and Barker 2008) or by regarding the problem as one of model selection (Durban and Elston 2005). These methods are difficult to implement and thus appear to be inaccessible to practitioners.

In this paper, we describe a general method for analyzing capture–recapture models with individual effects that uses data augmentation and model expansion to solve the problem of a variable-dimension parameter space. In this approach the observed dataset is augmented with an arbitrarily large number of all-zero encounter histories. The model of this augmented dataset is a zero-inflated version of the conventional, multinomial model for the observed data (Royle et al. 2007). Importantly, this new model may conveniently be fitted using WinBUGS, OpenBUGS (Lunn et al. 2009) or other MCMC-computing engines, such as PyMC (Patil et al. 2010). This approach, which was developed by Royle et al. (2007) and extended by Royle and Dorazio (2008), solves the problem of a variable-dimension parameter space in some generality across classes of capture–recapture models, wherein N is the object of inference (or otherwise important). The relative ease of implementation of our approach is important because the development of MCMC algorithms represents a distraction from the main task at hand (model development) and should never be the focus of effort for most researchers.

In the following sections, we review the technical formulation of data augmentation and illustrate its use in fitting closed-population models, including Models M0 and Mh, distance sampling, and spatially explicit capture–recapture models. We then describe how data augmentation may be used in the analysis of open populations. Along the way, we discuss the duality between models for estimation of population size and so-called “occupancy models.” In particular, the Jolly–Seber (JS) type model under data augmentation is precisely equivalent to a type of metapopulation “occupancy” model that allows for local extinction and colonization. Specifically, the JS model is a constrained occupancy model wherein an individual cannot be recruited once it has died (whereas in metapopulation type models, sites or patches can be recolonized).

Parameter-expanded data augmentation (PX-DA)

The general concept behind data augmentation is to augment the observed data with “latent data” with the intent of simplifying the analysis, usually by MCMC. Here, latent data may include missing observations, parameter values, or values of sufficient statistics (Tanner and Wong 1987; Tanner 1996). Royle et al. (2007; henceforth RDL) developed a general form of DA to simplify the analysis of capture–recapture models wherein N, the multinomial sample size or index parameter, is unknown. RDL’s development was motivated by the analysis of models with individual effects in which the dimension of the parameter space (i.e., the number of unknowns) is itself unknown. RDL’s idea was to fix the dimension of the parameter space by embedding the “complete data” (wherein N is known) into a larger dataset of fixed dimension and to analyze this larger, augmented dataset with a new model, one which provides a reparameterization of N in the conventional model of the observed (unaugmented) data. This new model of the augmented data is a zero-inflated version of the conventional known-N model and is easily fitted using standard methods of MCMC sampling (e.g., Gibbs sampling). In many cases, even classical (i.e., non-Bayesian) methods may be used to fit the model of augmented data.

Because inference is based on a new model that results from augmenting the data, RDL’s proposal is not simply data augmentation. Rather, it is what Liu and Wu (1999) refer to more formally as parameter-expanded data augmentation (PX-DA), because the conventional model of the complete data must be expanded to account for—and to estimate—the number of non-sampling zeros in the augmented data. Some basic conditions must be satisfied for the model expansion to be innocuous with respect to the inference problem (Liu and Wu 1999). In the Appendix, we show that RDL’s expanded model of the augmented data satisfies these conditions.

We use the term PX-DA in this paper to clarify the methodological context of RDL and also distinguish it from other uses of the term “data augmentation” in capture–recapture models, such as Schofield and Barker (2008) and Wright et al. (2009), in which the dataset is augmented with all-zero encounter histories, but not in a manner that yields a new model upon which inference is based.

Computational benefits

An important benefit of RDL’s use of PX-DA is that it solves the problem of a variable dimension parameter space in some generality across classes of capture–recapture models. No specialized algorithms, such as reversible-jump MCMC (Green 1995), are needed to fit these models because the dimension of the parameter space is fixed. Furthermore, by augmenting the data with all-zero capture histories, we eliminate the need to evaluate π(0), the probability that an individual in the population is not captured in any sample. In complex capture–recapture models (e.g., those with individual effects), π(0) can be difficult to calculate because it is a function of individual attributes such as distance, age, sex, or other things that influence detection or survival probabilities. These attributes are unobserved for the missing component of the population (that is, for the uncaptured individuals), which leads to what statisticians refer to as nonignorable missingness. The adjective, nonignorable, implies that the probability that an observation is “missing” depends on what is missing. A good example is distance sampling—observations far away from the observer are less likely to appear in the dataset. Thus, individuals that are not in the sample (and their distances) will tend to be farther away than those in the sample. It is certainly possible to conduct a Bayesian analysis of capture–recapture data without DA; however, such analyses require explicit calculation of π(0), which is generally difficult in complex models. See Dorazio and Royle (2005) for an especially challenging problem wherein π(0) is computed.
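To make the difficulty with π(0) concrete, the following minimal sketch approximates π(0) for a heterogeneity model in which logit(p) has a Normal(μ, σ²) distribution, so that π(0) is the average of (1 − p)^J over that distribution. The function name `pi0_logit_normal`, the midpoint rule, and the grid settings are illustrative assumptions and are not taken from the analyses in this paper.

```python
import math


def pi0_logit_normal(mu, sigma, J, n_grid=2001, width=8.0):
    """Approximate pi(0) = E[(1 - p)^J] when logit(p) ~ Normal(mu, sigma^2).

    Uses a midpoint rule on [mu - width*sigma, mu + width*sigma].
    All argument names are illustrative, not from the paper.
    """
    if sigma <= 0:
        # No heterogeneity: pi(0) has the closed form (1 - p)^J.
        p = 1.0 / (1.0 + math.exp(-mu))
        return (1.0 - p) ** J
    lo = mu - width * sigma
    h = 2.0 * width * sigma / n_grid
    total = 0.0
    for k in range(n_grid):
        eta = lo + (k + 0.5) * h
        p = 1.0 / (1.0 + math.exp(-eta))
        dens = math.exp(-0.5 * ((eta - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += (1.0 - p) ** J * dens * h
    return total


# Example call with arbitrary values of mu, sigma and J.
example = pi0_logit_normal(mu=-3.0, sigma=1.0, J=12)
```

Even in this simple case an integral must be evaluated numerically for every candidate value of (μ, σ); with several individual covariates the integral becomes multidimensional, which is the calculation that PX-DA avoids.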

PX-DA versus RJ-MCMC

Data augmentation can be used alone (that is, without model expansion) to conduct a Bayesian analysis of complex, capture–recapture data. In this approach, capture histories of the n observed individuals are augmented with only N − n all-zero capture histories (and possibly with N individual-level parameters) to account for unobserved individuals in the population. While this approach avoids the difficult calculation of π(0), it also requires a specialized algorithm, such as reversible-jump MCMC (RJ-MCMC), to solve the problem of the variable-dimension parameter space because N is an unknown parameter in the model. There are several examples of capture–recapture analyses using RJ-MCMC (King and Brooks 2008; Schofield and Barker 2008; Wright et al. 2009).

A practical impediment to applications of RJ-MCMC is that implementations are model specific. For every set of candidate models, a set of full conditionals and proposal distributions must be derived for “jumping” among models. In the context of capture–recapture models, the RJ-MCMC algorithm must specify how to add or subtract parameters when different values of N (vis-a-vis different models) are visited by the sampler. To our knowledge, RJ-MCMC algorithms suitable for capture–recapture models have not been implemented in popular software programs for Bayesian analysis. Therefore, applications of RJ-MCMC are currently inaccessible to many ecologists because the details of implementation require a level of statistical and computational expertise not common among them. In contrast, applications of PX-DA are relatively simple to implement in popular software programs, such as WinBUGS or JAGS (Royle and Dorazio 2008; Link and Barker 2010). We provide several examples of such applications in later sections of this paper.

Link and Barker (2010, p. 210) assert that RDL’s use of data augmentation (i.e., PX-DA) “is equivalent to a reversible jump algorithm \(\ldots\) with a discrete uniform prior for N.” A referee also suggested that PX-DA is nothing more than a specific manifestation of the RJ-MCMC algorithm. However, our development of PX-DA for use in capture–recapture problems did not originate from considerations of RJ-MCMC (Royle et al. 2007). RJ-MCMC refers to a specific class of MCMC algorithms that are designed to analyze models with variable-dimension parameter spaces (Green 1995). Conversely, PX-DA is not an algorithm for doing MCMC. Rather, it is a reformulation of the model intended to simplify analysis by providing a straightforward, conventional implementation of MCMC. In our case, capture–recapture models reformulated by PX-DA can be analyzed seamlessly using Gibbs sampling directly from full-conditional distributions or using the Metropolis–Hastings algorithm with default proposal distributions. Consequently, PX-DA is useful because of the hierarchical construction of the model that is produced and the resulting simplicity of MCMC implementations for that model, not because it corresponds to an implementation of RJ-MCMC. Furthermore, in many cases, the expanded model of the augmented data is also relatively easy to fit using classical methods, such as maximum likelihood.

To summarize, we use PX-DA to facilitate Bayesian analysis of various classes of capture–recapture models in which N is unknown and is important to inference. Other investigators have also proposed DA for fitting these models, but their analyses are made conditional on N and therefore require RJ-MCMC to deal with the problem of a variable-dimension parameter space.

PX-DA for model M0

Consider J independent samples of a closed population of size N. If N were known, then the model would simply be a generalized linear model (GLM) of binomial responses. That is, \(y_{i}\), the number of encounters of individual i in J samples, could be modeled as follows:
$$ y_{i} \sim \hbox{Bin}(J,p) \; \; \; \hbox{ for}\, i=1,2,\ldots,N $$
where p denotes the probability of capture during each sampling occasion. Assuming individuals of the population are observed independently, the joint probability of these data is
$$ \begin{aligned} \Pr(y_1, \ldots, y_N | p) &= \prod_{i=1}^N \hbox{Bin}(y_i | J, p) \\ &= \prod_{k=0}^J \pi(k)^{n_k} \end{aligned} $$
where \(\pi(k) = \hbox{Bin}(k | J,p)\) and where \(n_k = \sum_{i=1}^N I(y_i = k)\) denotes the number of individuals captured k times in J surveys.
The essential problem in capture–recapture, however, is that N is not known because the number of uncaptured/missing individuals [i.e., those in the zero cell that occur with probability π(0)] is unknown. Consequently, the observed capture frequencies \(n_{k}\) are no longer independent and their joint distribution is multinomial:
$$ n_1, n_2, \ldots, n_J \sim \hbox{Multin}(N, \pi(1), \pi(2), \ldots, \pi(J)) \tag{1} $$
Note that in our notation, the number of uncaptured/missing individuals is denoted by \(n_{0} = N - n\), where \(n = \sum_{k=1}^J n_k\) denotes the total number of distinct individuals seen in the J samples.
Now suppose we were to adopt a Bayesian mode of inference for the observed-data model in Eq. 1. This would require prior distributions for N and p. It seems natural and innocuous to assume a discrete uniform prior on N, say over all integers in the interval [0, M] for some \(M \gg N\):
$$ N \sim \hbox{U}(0,M) $$
Similarly, we assume a uniform prior for p (\(p \sim \hbox{U}(0,1)\)); however, this prior is not directly relevant to our subsequent discussion. The \(\hbox{U}(0,M)\) prior for N is innocuous in the sense that the posterior associated with this prior is equal to the likelihood for sufficiently large M. But, more importantly and relevant for our purposes, the discrete uniform prior implies a super-population of M “pseudo-individuals” that potentially could belong to the (real) population of size N. In particular, this prior implies that there exist at most M − n potential zeros in the dataset. We will show that a proportion of these zeros are “sampling zeros” which correspond to uncaptured individuals from the population and that the multinomial model of the observed data (Eq. 1) can be expanded to allow the number of sampling zeros in the augmented data to be estimated. Therefore, our use of DA and model expansion for solving inference problems with unknown N can be thought of as originating from the choice of uniform prior on N.
One way of inducing the \(\hbox{U}(0,M)\) prior on N is by assuming the following hierarchical prior:
$$ \begin{aligned} N &\sim \hbox{Bin}(M, \psi) \\ \psi &\sim \hbox{U}(0,1) \end{aligned} \tag{2} $$
which includes a new model parameter ψ. This parameter denotes the probability that an individual in the super-population of size M is a member of the population of N individuals exposed to sampling. Our modeling assumptions, specifically Eqs. 1 and 2, may be combined to yield a reparameterization of the conventional model that is appropriate for the augmented dataset of known size M:
$$ n_1, n_2, \ldots, n_J \sim \hbox{Multin}(M, \psi \pi(1), \psi \pi(2), \ldots, \psi \pi(J)) \tag{3} $$
This arises by removing N from Eq. 1 by integrating over the binomial prior distribution for N. Note that the M − n unobserved individuals in the augmented dataset have probability ψ π(0) + 1 − ψ, indicating that some of these individuals are sampling zeros (and belong to the population of size N) while others are simply “structural zeros.”
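The statement that the hierarchical prior in Eq. 2 induces a discrete uniform prior on N can be checked exactly. Integrating the Bin(M, ψ) probability of N = k over ψ ~ U(0,1) gives a binomial coefficient times a beta integral, k!(M − k)!/(M + 1)!. The sketch below evaluates this product with exact rational arithmetic; the function name `marginal_prior_N` and the value M = 20 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial


def marginal_prior_N(k, M):
    """Pr(N = k) when N ~ Bin(M, psi) and psi ~ U(0, 1), computed exactly.

    The integral of psi^k (1 - psi)^(M - k) over (0, 1) is the beta
    function B(k + 1, M - k + 1) = k! (M - k)! / (M + 1)!.
    """
    beta_integral = Fraction(factorial(k) * factorial(M - k), factorial(M + 1))
    return comb(M, k) * beta_integral


M = 20  # illustrative size of the super-population
probs = [marginal_prior_N(k, M) for k in range(M + 1)]
# Every value of N in {0, 1, ..., M} receives the same prior mass 1 / (M + 1).
```

Each of the M + 1 possible values of N receives mass 1/(M + 1), which is the discrete uniform prior on the integers 0 through M.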
In Eq. 3, N has been eliminated as a formal parameter of the model by marginalization (integration) and replaced with the new parameter ψ. However, the full likelihood containing both N and ψ can be analyzed (see Royle et al. 2007). Furthermore, we can easily prove that the number of sampling zeros has a binomial distribution with index M − n and success parameter ψ π(0)/(ψ π(0) + 1 − ψ). Therefore, estimating ψ is closely related to estimating N. We could, for example, compute the expected value of N given n and estimates of ψ and p as follows:
$$ \hbox{E}(N | n, \psi, p) = n + (M-n) \frac{\psi \pi(0)}{\psi \pi(0) + 1-\psi}, \tag{4} $$
regardless of whether a Bayesian or classical mode of inference is adopted.
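As a small worked illustration of Eq. 4 for model M0, where π(0) = (1 − p)^J, the sketch below computes the expected value of N from n, M, and given values of ψ and p. The function name `expected_N` and the numerical inputs are hypothetical and are used only to show the arithmetic.

```python
def expected_N(n, M, psi, p, J):
    """Eq. 4 under model M0: n observed individuals plus the expected
    number of sampling zeros among the M - n all-zero histories."""
    pi0 = (1.0 - p) ** J                          # Pr(never captured | in population)
    w = psi * pi0 / (psi * pi0 + 1.0 - psi)       # Pr(sampling zero | all-zero history)
    return n + (M - n) * w


# Hypothetical inputs: 44 observed individuals, M = 444, J = 12 occasions.
value = expected_N(n=44, M=444, psi=0.2, p=0.1, J=12)
```

The two limiting cases are a useful check: ψ = 0 returns n (every added zero is structural) and ψ = 1 returns M (every added zero is a sampling zero).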
However, our main reason for deriving Eq. 3 is to establish the theoretical justification for a simpler, yet equivalent, model of the augmented data, which we now present and analyze using Bayesian methods. Because M, the number of individuals in the augmented population, is known, the multinomial dependence among counts of recaptured individuals is not really necessary provided an equivalent model of independent observations is used. Consider an example of augmented data in Table 1, which contains n = 6 individuals observed on J = 5 sampling occasions. Note that the augmented data (by construction) contain N binomial counts plus M − N excess, structural zeros. Thus, it is clear from this table and from the cell probabilities in Eq. 3 that an appropriate model for the augmented data is a set of independent, zero-inflated binomial observations. We can specify this model hierarchically by introducing a set of binary latent variables, \(z_{1},z_{2},\ldots, z_{M}\), to indicate whether each individual is (z = 1) or is not (z = 0) a member of the population of N individuals exposed to sampling. Thus, our model of the augmented dataset is
$$ \begin{aligned} y_{i}|{z_{i}=1} &\sim \hbox{Bin}(J, p) \\ y_{i}|{z_{i}=0} &\sim \delta(0) \\ z_{i} &\mathop{\sim}\limits^{iid} \hbox{Bern}(\psi) \\ \psi &\sim \hbox{U}(0,1) \\ p &\sim \hbox{U}(0,1) \\ \end{aligned} $$
for \(i=1, \ldots, M\). Note that the uniform prior for ψ is needed to preserve the \(\hbox{U}(0,M)\) prior for N, even though N is not a formal parameter of this model. If we were to remove the latent \(z_{i}\) parameters from this model by integration, the joint probability of the data would be
$$ \Pr(y_1, \ldots, y_M | p, \psi) = \prod_{i=1}^M \left[ \psi \, \hbox{Bin}(y_i | J, p) + I(y_i=0) (1-\psi) \right] $$
Thus, intuitively, we might think of estimating ψ and p from this model and using these estimates to compute N using Eq. 4. However, removing the \(z_{i}\) parameters is actually counterproductive because they also can be used to estimate N. Under the assumptions of the zero-inflated model, \(z_{i} \mathop{\sim}\limits^{iid} \hbox{Bern}(\psi)\); therefore, N is a function of these parameters:
$$ N = \sum_{i=1}^{M} z_{i}. $$
Table 1

An augmented dataset with n = 6 observed individuals and J = 5 samples

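| indiv i | Occasion 1 | Occasion 2 | Occasion 3 | Occasion 4 | Occasion 5 | \(y_i\) | \(z_i\) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 1 | 0 | 2 | 1 |
| 2 | 0 | 1 | 0 | 0 | 1 | 2 | 1 |
| 3 | 1 | 0 | 0 | 1 | 0 | 2 | 1 |
| 4 | 1 | 0 | 1 | 0 | 1 | 3 | 1 |
| 5 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| n = 6 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| n + 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| N | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| N + 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| M | 0 | 0 | 0 | 0 | 0 | 0 | 0 |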

We emphasize that M is fixed in the new model of the augmented data. As a result, MCMC is a relatively simple proposition using standard Gibbs sampling. We have to update \(\{z_{i}\}\), p, and ψ. Updating each \(z_{i}\) parameter is easily done by taking a random draw from a Bernoulli distribution (Royle et al. 2007). Moreover, this is true for every class of capture–recapture models. Thus, PX-DA casts every capture–recapture model as a zero-inflated version of the original multinomial model, which is the main reason for the versatility of PX-DA. In contrast, analysis by RJ-MCMC is based on analyzing the complete-data likelihood (i.e., the original multinomial model) associated with an augmented set of “complete data” whose size N is unknown (King and Brooks 2008; Schofield and Barker 2008). In PX-DA, these complete data are included in a larger dataset of known size M, and an entirely different model (the zero-inflated multinomial) is therefore needed in the analysis.
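The three updates described above can be sketched as a short Gibbs sampler for model M0. This is a minimal illustration under the U(0,1) priors for p and ψ, not the implementation used for the analyses in this paper. The full conditionals used are assumptions derived from the zero-inflated binomial model: for an all-zero history, \(z_{i}\) is Bernoulli with probability ψ(1 − p)^J / (ψ(1 − p)^J + 1 − ψ); p given the rest is Beta(1 + Σ y_i z_i, 1 + Σ z_i (J − y_i)); and ψ given the rest is Beta(1 + N, 1 + M − N). The function name `gibbs_m0` and its arguments are hypothetical.

```python
import random


def gibbs_m0(y_obs, J, nz, n_iter=2000, burn=500, seed=1):
    """Gibbs sampler for model M0 under PX-DA; returns posterior draws of N.

    y_obs : capture frequencies (each >= 1) of the n observed individuals
    nz    : number of all-zero histories added, so M = n + nz
    """
    rng = random.Random(seed)
    n = len(y_obs)
    M = n + nz
    y = list(y_obs) + [0] * nz
    z = [1] * n + [0] * nz           # captured individuals always have z = 1
    p, psi = 0.5, 0.5
    total_captures = sum(y)
    draws_N = []
    for it in range(n_iter):
        # (1) Update z_i for all-zero histories with a Bernoulli draw.
        q0 = (1.0 - p) ** J
        prob = psi * q0 / (psi * q0 + 1.0 - psi)
        for i in range(M):
            if y[i] == 0:
                z[i] = 1 if rng.random() < prob else 0
        N = sum(z)
        # (2) Update p: N individuals, each with J Bernoulli capture trials.
        p = rng.betavariate(1 + total_captures, 1 + J * N - total_captures)
        # (3) Update psi from the M membership indicators.
        psi = rng.betavariate(1 + N, 1 + M - N)
        if it >= burn:
            draws_N.append(N)
    return draws_N


# Capture frequencies of the six observed individuals in Table 1 (J = 5).
draws = gibbs_m0([2, 2, 2, 3, 1, 1], J=5, nz=50, n_iter=300, burn=100)
```

Because M is fixed, no step creates or deletes parameters: the vector z always has length M, and N is recovered at each iteration as the sum of its elements.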

Model M0 in BUGS

The augmented data are given by the vector of frequencies \(y_{1},\ldots,y_{n},0, 0, \ldots, 0\). The zero-inflated model of the augmented data combines the model of the latent variables, \(z_{i} \sim \hbox{Bern}(\psi)\), with the binomial model conditional on \(z_{i} = 1\):
$$ \begin{aligned} y_{i} | z_{i} = 0 &\sim \delta(0) \\ y_{i}|z_{i} = 1 &\sim \hbox{Bin}(J,p) \end{aligned} $$
The instructions for describing this model in the BUGS language and for estimating N are given in Panel 1. We note also that the OpenBUGS ecology examples module contains a closed population example analyzed using PX-DA.
Panel 1

Model M0 described in the BUGS model specification language

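    model {
        # Prior distributions
        p ~ dunif(0, 1)
        psi ~ dunif(0, 1)

        # nind = number of individuals captured at least once
        # nz = number of all-zero encounter histories added for PX-DA
        for (i in 1:(nind + nz)) {
            z[i] ~ dbern(psi)        # z[i] = 1 if individual i is in the population
            mu[i] <- z[i] * p        # capture probability is zero when z[i] = 0
            y[i] ~ dbin(mu[i], J)    # number of captures in J samples
        }

        # Population size is the number of individuals with z[i] = 1
        N <- sum(z[1:(nind + nz)])
    }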
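Before a model such as the one in Panel 1 can be fitted, the observed capture frequencies must be padded with all-zero histories and the sampler needs starting values. The sketch below prepares both as plain Python dictionaries. The function name `augment` and the dictionary layout are assumptions for illustration; how the values are passed to WinBUGS, OpenBUGS, or JAGS depends on the interface used and is not shown.

```python
def augment(y_obs, J, nz):
    """Build the augmented data and starting values for model M0 under PX-DA.

    y_obs : capture frequencies of the individuals captured at least once
    J     : number of sampling occasions
    nz    : number of all-zero encounter histories to add
    """
    if any(not (1 <= yi <= J) for yi in y_obs):
        raise ValueError("observed frequencies must lie between 1 and J")
    nind = len(y_obs)
    data = {"y": list(y_obs) + [0] * nz, "nind": nind, "nz": nz, "J": J}
    # Start captured individuals at z = 1: a state with z = 0 and y > 0 has
    # zero likelihood, so it is not a valid starting point for the sampler.
    inits = {"z": [1] * nind + [0] * nz, "p": 0.5, "psi": 0.5}
    return data, inits


data, inits = augment([2, 2, 2, 3, 1, 1], J=5, nz=4)
```

After fitting, it is worth confirming that the posterior of N places negligible mass near M = nind + nz; otherwise nz should be increased and the model refitted.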

Example: tiger data

To illustrate the results of such an analysis, we use a dataset collected by K.U. Karanth and colleagues from their long-term study of tiger populations (e.g., Karanth 1995; Karanth and Nichols 1998; Karanth et al. 2006). The specific data used here are encounter histories on 44 individuals sampled using an array of camera traps on the Nagarahole reserve, India, in 2006. There were 120 unique camera trap locations (Fig. 1) and J = 12 nightly sample intervals. In fact, the 120 traps were operated in blocks of 30 rotated every 12 nights. Here, we regard them as 120 traps operated simultaneously for 12 nights. The data were augmented with M − n = 400 (M = 444) all-zero encounter histories. The posterior distributions of the model parameters are shown in Fig. 2. In particular, the posterior mean of N under this model is 77.17 and a 95% posterior interval is (57,109). We revisit these data later in the context of more complex models.
Fig. 1

Camera-trapping study area, Nagarahole reserve, India

Fig. 2

Posterior frequency histograms for parameters of model M0 fitted to the tiger camera-trapping data

Heuristic development of PX-DA based on occupancy models

There is a formal duality between “occupancy” models and closed-population models that provides a nice heuristic motivation for PX-DA. This duality was developed in Royle et al. (2007) and in Royle and Dorazio (2008) for both closed- and open-population models. The basic idea originates from J.D. Nichols (see Nichols and Karanth 2002).

In so-called occupancy models (MacKenzie et al. 2002), the sampling situation is that M sites, or patches, are sampled multiple times to assess whether a species occurs at each site. This yields encounter data such as that illustrated in the left panel of Table 2. The important problem is that a species may occur at a site, but go undetected, yielding the “all-zero” encounter histories. However, some of the all-zeros may well correspond to sites where the species in fact does not occur. Thus, while the zeros are observed there are too many of them and, in a sense, the inference problem is to allocate the zeros into “structural” and “sampling” zeros. More formally, inference is focused on the parameter ψ, the probability that a site is occupied, which is the complement of the zero-inflation parameter as it relates to data augmentation. In contrast, in classical closed population studies, we observe a dataset as in the middle panel of Table 2 where no zeros are observed. The inference problem is, essentially, to estimate how many sampling zeros there are in a “complete” dataset of N individuals. The inference objective (how many sampling zeros?) is precisely the same for both types of problems if an upper limit M is specified. In occupancy models, this is set by design (i.e., the number of sites surveyed) whereas, in capture–recapture models, it arises from the uniform prior on N. This was recognized by Nichols who, given an occupancy dataset, discarded the all-zeros and estimated N using a closed population model, forming an estimator of ψ by \(\hat{N}/M\). The idea of DA as applied to closed population models is to reverse the problem—take a closed population dataset, and add too many zeros to create the dataset shown in the right panel of Table 2—and then analyze the augmented dataset using an occupancy type model, which is formally justified under the \(\hbox{U}(0,M)\) prior for N, as noted earlier.
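The duality can be made concrete by writing down the likelihood that both problems share when detection probability is constant: a zero-inflated binomial in which ψ is the occupancy probability (for sites) or the inclusion probability (for augmented individuals). The sketch below is a minimal illustration; the function name `zib_loglik` and the example values are hypothetical.

```python
import math


def zib_loglik(y, J, p, psi):
    """Log-likelihood of a zero-inflated binomial model.

    y   : detection frequencies for M sites (occupancy) or for M augmented
          individuals (capture-recapture with PX-DA), each between 0 and J
    p   : detection probability, 0 < p < 1
    psi : occupancy / inclusion probability, 0 < psi <= 1
    """
    ll = 0.0
    for yi in y:
        binom = math.comb(J, yi) * p ** yi * (1.0 - p) ** (J - yi)
        structural = (1.0 - psi) if yi == 0 else 0.0
        ll += math.log(psi * binom + structural)
    return ll


# The same function serves both interpretations of the data: six rows with
# detections (Table 1 frequencies) plus ten all-zero rows, J = 5.
y_aug = [2, 2, 2, 3, 1, 1] + [0] * 10
ll_half = zib_loglik(y_aug, J=5, p=0.3, psi=0.5)
ll_full = zib_loglik(y_aug, J=5, p=0.3, psi=1.0)
```

Setting ψ = 1 recovers the ordinary binomial likelihood, in which every all-zero row is treated as a sampling zero; values of ψ below 1 allow some of those rows to be structural zeros.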
Table 2

Hypothetical occupancy dataset (left), capture–recapture data in standard form (center), and capture–recapture data augmented with all-zero capture histories (right)

Occupancy data

Capture–recapture

Augmented C-R

Site

k = 1

k = 2

k = 3

Ind

k = 1

k = 2

k = 3

Ind

k = 1

k = 2

k = 3

1

0

1

0

1

0

1

0

1

0

1

0

2

1

0

1

2

1

0

1

2

1

0

1

3

0

1

0

.

0

1

0

3

1

0

1

4

1

0

1

.

1

0

1

4

1

0

1

5

0

1

1

.

0

1

1

5

1

0

1

.

0

1

1

.

0

1

1

.

0

1

1

.

1

1

1

.

1

1

1

.

0

1

1

.

1

1

1

.

1

1

1

.

1

1

1

 

1

1

1

.

1

1

1

.

1

1

1

n

1

1

1

n

1

1

1

n

1

1

1

.

0

0

0

    

.

0

0

0

.

0

0

0

    

.

0

0

0

 

0

0

0

     

0

0

0

 

0

0

0

     

0

0

0

 

0

0

0

     

0

0

0

 

0

0

0

    

N

0

0

0

.

0

0

0

    

.

0

0

0

.

0

0

0

     

0

0

0

M

0

0

0

    

.

0

0

0

        

.

.

.

.

        

.

.

.

.

        

.

.

.

.

        

M

0

0

0

Closed-population models

PX-DA is useful primarily for its generality and extensibility across different classes of models with unknown N. The main MCMC problem is focused on imputing the missing z values as a step in the MCMC algorithm. For example, in closed population models with individual effects the MCMC algorithm will typically involve two main steps: (1) imputing \(z_{i}\) for each \(i=1,2,\ldots,M\) and then (2) imputing any of the missing individual effects, say \(x_{i}\), which can be achieved easily in most cases using simple Gibbs sampling or the Metropolis–Hastings algorithm. In addition to these two steps, we require updates of the basic structural parameters of the model (e.g., hyper-parameters). An important practical benefit of PX-DA is that it allows one to formulate and fit a large variety of models using WinBUGS and other MCMC computing engines. Royle and Dorazio (2008) provided several illustrations of data augmentation applied to closed populations, and some of those examples are repeated here.
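Step (2) can be sketched for one common case: a latent individual effect on the logit scale with a Normal(μ, σ²) distribution and binomial capture frequencies. The sketch below is a minimal illustration under those assumptions, not the algorithm used by WinBUGS. For individuals with z = 0 the data carry no information about the effect, so it is drawn from its Normal distribution; for individuals with z = 1 a random-walk Metropolis–Hastings step is used. The function names and the step size are hypothetical.

```python
import math
import random


def log_cond(eta, y, J, mu, sigma):
    """Log full conditional (up to a constant) of eta for an individual with z = 1."""
    log_p = -math.log1p(math.exp(-eta))      # log of inverse-logit(eta)
    log_q = -math.log1p(math.exp(eta))       # log of 1 - inverse-logit(eta)
    return y * log_p + (J - y) * log_q - 0.5 * ((eta - mu) / sigma) ** 2


def update_effects(eta, y, z, J, mu, sigma, rng, step=0.5):
    """One sweep over all M individual effects; returns the updated list."""
    new = list(eta)
    for i in range(len(eta)):
        if z[i] == 0:
            # Not in the population: y_i is zero regardless of eta_i,
            # so the full conditional is the Normal(mu, sigma^2) model itself.
            new[i] = rng.gauss(mu, sigma)
            continue
        proposal = eta[i] + rng.gauss(0.0, step)
        diff = log_cond(proposal, y[i], J, mu, sigma) - log_cond(eta[i], y[i], J, mu, sigma)
        if rng.random() < math.exp(min(0.0, diff)):
            new[i] = proposal
    return new


rng = random.Random(0)
eta = update_effects([0.0] * 4, y=[3, 0, 0, 1], z=[1, 1, 0, 1], J=5,
                     mu=-1.0, sigma=1.0, rng=rng)
```

Note that the vector of effects always has length M, which is what keeps the dimension of the parameter space fixed from one iteration to the next.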

Model Mh: individual heterogeneity

Models with heterogeneity have a long history in population size estimation. The approach taken here is to assume that detection probability is individual specific, having distribution g with density function g(p). Standard choices are the logit-normal distribution (Coull and Agresti 1999), the beta distribution (Dorazio and Royle 2003), and various others. As with model M0, we have the Bernoulli model for the zero-inflation indicator variables: \(z_{i} \sim \hbox{Bern}(\psi)\) and the model of the observations expressed conditional on the latent variables \(z_{i}\). For \(z_{i} = 1\), we have a binomial model with individual-specific \(p_{i}\):
$$ y_{i}|{z_{i} = 1} \sim \hbox{Bin}(J,p_{i}) $$
and otherwise \(y_{i} |{z_{i} = 0} \sim \delta(0)\). Further, we prescribe a distribution for \(p_{i}\). Here we assume
$$ \hbox{logit}(p_{i}) \sim {\hbox{Normal}}(\mu,\sigma^2) $$
The basic WinBUGS declaration for this model is given in Panel 2. For this analysis, we assumed a Uniform(0,1) prior for \(p_{0} = \hbox{logit}^{-1}(\mu)\). An application of model Mh for estimating species richness, analyzed by PX-DA, can be found in the ecology examples module of OpenBUGS. We note that heterogeneity models formulated under PX-DA are easily analyzed by conventional likelihood methods as zero-inflated binomial mixtures (Royle 2006).
Panel 2

Model Mh assuming that \(\hbox{logit}(p)\) has a normal distribution

    model {
        # Prior distributions
        p0 ~ dunif(0, 1)
        mup <- log(p0 / (1 - p0))    # mean of logit(p)
        taup ~ dgamma(0.1, 0.1)      # precision of logit(p)
        psi ~ dunif(0, 1)

        for (i in 1:(nind + nz)) {
            z[i] ~ dbern(psi)            # zero-inflation variables
            lp[i] ~ dnorm(mup, taup)     # individual effect
            logit(p[i]) <- lp[i]
            mu[i] <- z[i] * p[i]
            y[i] ~ dbin(mu[i], J)        # observation model
        }

        N <- sum(z[1:(nind + nz)])
    }

This model was fitted to the Nagarahole reserve tiger data producing the posterior distribution for N shown in Fig. 3. Posterior summaries of parameters are given in Table 3. We used M = 744 for this analysis and we note that the posterior mass is located well away from this upper bound, indicating that sufficient data augmentation was used.
Fig. 3

Posterior of N for tiger data under logit-normal version of model Mh

Table 3

Parameter estimates for Model Mh (logit-normal model for p) fitted to the Nagarahole tiger camera-trapping data

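| Parameter | Mean | SD | 2.5% | Median | 97.5% |
|---|---|---|---|---|---|
| N | 113.641 | 48.280 | 65.000 | 99.000 | 248.000 |
| μ | −3.175 | 0.597 | −4.673 | −3.021 | −2.393 |
| p0 | 0.046 | 0.020 | 0.009 | 0.046 | 0.084 |
| ψ | 0.154 | 0.066 | 0.082 | 0.135 | 0.334 |
| σ | 0.673 | 0.342 | 0.245 | 0.580 | 1.484 |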

Spatial capture–recapture

Density estimation in capture–recapture studies has a long history in the literature. Recent work using so-called spatial capture–recapture (SCR) models has resolved long-standing problems with the use of ordinary closed-population models for such purposes (see Efford 2004; Borchers and Efford 2008; Royle and Young 2008; Royle et al. 2009). The basic idea with these models is to introduce a latent variable, s, for each individual that represents, conceptually, that individual’s home range or “activity” center. Then, encounter probability is expressed as a function of that latent variable. We consider a typical camera-trapping problem here. Suppose an array of J camera traps is used to observe a population of individuals over some period of time (say K nights). The array of traps produces individual- and trap-specific encounter frequencies yij for individual i and trap j. For example, for the tiger data, we have K = 12 nightly intervals and the data may be processed into the frequency of nightly encounters out of 12. The model is described by the Bernoulli model for the DA indicator variables \(z_{i} \sim \hbox{Bern}(\psi)\) followed by the conditional binomial model:
$$ y_{ij}|{z_{i} = 1} \sim \hbox{Binomial}(K,p_{ij}) $$
As with previous models, \(y_{ij} |{z_{i} = 0} \sim \delta(0)\). In SCR models, the trap- and individual-specific encounter probability is assumed to be a function of distance between individuals and traps:
$$ \hbox{logit}(p_{ij}) = a + b*dist({\bf s}_{i},{\bf x}_{j})^{2} $$
where \(dist({\bf s}_{i},{\bf x}_{j})\) is the unobserved distance between the location of trap j, xj, and individual location si. Conceptually, we view si as the center of activity or home-range center for individual i. These are latent variables, or random individual effects. The model is extended then by specifying a simple type of point process model for these latent variables. In particular, we assume
$$ {\bf s}_{i} \sim {\hbox{Unif}}({\mathcal{S}}) \; \; \; \hbox{ for }i=1,2,\ldots,N $$
where \(\mathcal{S}\) denotes the region where the closed population is located. Evidently, spatial capture–recapture models are closely related to heterogeneity models as well as to individual covariate models. Indeed, in a sense, they are conceptual intermediates because the covariate is latent, as in Model Mh, but some information is obtained by where individuals are captured.
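
The structure of this model can be illustrated by simulating from it. The following Python sketch is not part of the original analysis: it assumes a rectangular state-space, a hypothetical 3 × 3 trap grid, and arbitrary values of a and b (with b < 0, so that encounter probability declines with distance), and it uses the logit form of the distance model given above.

```python
import math
import random

def simulate_scr(N, traps, K, a, b, xlim, ylim, seed=0):
    """Simulate trap-specific encounter frequencies y[i][j] for N individuals."""
    rng = random.Random(seed)
    centers, y = [], []
    for _ in range(N):
        # Activity center s_i ~ Uniform(S); S is assumed to be a rectangle here
        s = (rng.uniform(*xlim), rng.uniform(*ylim))
        row = []
        for (tx, ty) in traps:
            d2 = (s[0] - tx) ** 2 + (s[1] - ty) ** 2
            p = 1.0 / (1.0 + math.exp(-(a + b * d2)))  # inverse logit
            # Binomial(K, p) drawn as a sum of K Bernoulli trials
            row.append(sum(rng.random() < p for _ in range(K)))
        centers.append(s)
        y.append(row)
    return centers, y

# Hypothetical design: 9 traps on a unit grid, state-space buffered by 2 units
traps = [(tx, ty) for tx in range(3) for ty in range(3)]
centers, y = simulate_scr(N=50, traps=traps, K=12, a=-2.0, b=-1.5,
                          xlim=(-2, 4), ylim=(-2, 4))

# Only individuals captured at least once appear in a real dataset
observed = [row for row in y if sum(row) > 0]
```

Individuals whose activity centers lie far from every trap yield all-zero rows, which is exactly the part of the population that data augmentation restores.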

In Bayesian analysis of these models, it is necessary to prescribe \(\mathcal{S}\) because the analysis proceeds by simulating realizations of \(s_{1},\ldots,s_{N}\) from the required posterior distribution. Individuals have to reside somewhere. In developing the problem this way, N is sensitive to the choice of \(\mathcal{S}\)—as its area increases, so too does N and vice versa. However, as \(\mathcal{S}\) becomes large, then p(si) diminishes to zero rapidly (under well-behaved models), and additional increases in \(\mathcal{S}\) are inconsequential. In particular, N will continue to increase, but density will become invariant to the size of \(\mathcal{S}\), which is a consequence of the model (which implies a constant density of individuals). This same phenomenon is relevant to the likelihood-based analyses of Borchers and Efford (2008). In particular, they parameterize the model in terms of density, and observe that it is invariant as long as integration occurs over a sufficiently large state-space.

Using data augmentation, SCR models are easily implemented in the BUGS language. For example, the specification shown in Panel 3 was used to analyze the tiger data. Note that this model uses the complementary log–log link to relate pij to distance (see Royle et al. 2009):
$$ \hbox{cloglog}(p_{ij}) = a + b*dist(s_{i},x_{j})^{2} $$
for b < 0 (Panel 3 codes the same relationship as a − b*dist2 with a positive b). A myriad of extensions of SCR models are straightforward using PX-DA and require scarcely more than an additional line or two of BUGS model description. For example, Gardner et al. (2010a) consider models with individual-level covariates, Gardner et al. (2010b) extend SCR models to open populations, and Mollet et al. (in review) apply the concept to unstructured “search-encounter” situations in which animal scat is collected and analyzed to establish identity from DNA.
Panel 3

Description of spatial capture–recapture model in the BUGS language

a ~ dnorm(0,.1)
b ~ dunif(0,10)
psi ~ dunif(0,1)
for(i in 1:(nind+nz)){
   z[i] ~ dbern(psi)               # DA variables
   sx[i] ~ dunif(Xl,Xu)            # activity center coordinates
   sy[i] ~ dunif(Yl,Yu)
   for(j in 1:ntraps){
      dist2[i,j] <- pow(sx[i]-grid[j,1],2) + pow(sy[i]-grid[j,2],2)
      cloglog(p[i,j]) <- a - b*dist2[i,j]
      muy[i,j] <- z[i]*p[i,j]
      y[i,j] ~ dbin(muy[i,j],12)   # K = 12 nightly intervals
   }
}
N <- sum(z[1:(nind+nz)])
D <- N/area

This implementation assumes a complementary log-log link relating pij to distance-squared
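
The model in Panel 3 expects an augmented data array with nind + nz rows, together with the area of the state-space. A minimal Python sketch of that data preparation follows; the function names, the toy encounter frequencies, and the coordinate limits are hypothetical and serve only to show the shape of the inputs.

```python
def augment_scr_data(y_obs, nz):
    """Append nz all-zero encounter histories to the observed nind x ntraps array."""
    ntraps = len(y_obs[0])
    y_aug = [list(row) for row in y_obs] + [[0] * ntraps for _ in range(nz)]
    # Starting values for the DA indicators: observed individuals must have z = 1
    z_init = [1] * len(y_obs) + [0] * nz
    return y_aug, z_init

def state_space_area(Xl, Xu, Yl, Yu):
    """Area of the rectangular state-space implied by the uniform priors on sx, sy."""
    return (Xu - Xl) * (Yu - Yl)

y_obs = [[0, 2, 1], [1, 0, 0]]   # toy data: 2 individuals, 3 traps
y_aug, z_init = augment_scr_data(y_obs, nz=4)
area = state_space_area(Xl=0.0, Xu=10.0, Yl=0.0, Yu=8.0)
```

Initializing z at 1 for every observed individual avoids starting the sampler in a state where a nonzero encounter frequency has probability zero.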

We reconsider the tiger data under this SCR model. To emphasize the technical problem inherent in spatial capture–recapture data, we note that the nominal area of the trap array as determined by the minimum area rectangle is 772 \(\hbox{km}^2\). We might consider using this area and estimates of N based on Model M0 or Model Mh to estimate density. However, the precise area for which N applies is not known under those models. Therefore, whether we use the minimum area rectangle, a convex hull, a circle, or some other shape largely determines the estimated density under any particular closed-population model that fails to accommodate the spatial structure. In the present case, we defined \(\mathcal{S}\) to be a rectangle with area 1866 \(\hbox{km}^2\); so N is the number of activity centers in this polygon. The rectangle buffers the study area by approximately 7.5 km from its minimum and maximum coordinates. For the analysis, coordinates were scaled to units of 5 km (thus affecting the units of b). Posterior summaries are given in Table 4, where we see that N is substantially larger than that obtained under either of models M0 or Mh. Density D is expressed in units of tigers/100 \(\hbox{km}^2\), and the estimate of D = 12.771 is consistent with previous analyses (e.g., Royle et al. 2009; Royle and Gardner 2010).
Table 4

Parameter estimates for the spatial capture–recapture model fitted to the Nagarahole tiger camera trapping data

Parameter  Mean     SD       2.5%     50%      97.5%
N          185.377  36.198   128.000  181.000  271.000
D          12.771   2.494    8.818    12.469   18.669
ψ          0.418    0.084    0.280    0.408    0.614
σ          0.336    0.084    0.214    0.322    0.538
λ0         0.015    0.004    0.008    0.015    0.025

We close this section by summarizing our analyses of the tiger data. We expect the estimator of N under model M0 to be biased in the presence of heterogeneity in detection probability. Such heterogeneity must be present given the spatial context, so it is not even sensible to test for it formally (Johnson 1999). We might approximate the existing heterogeneity using a standard model Mh (but see Link 2003). The basic problem with that approach is that we do not know the area for which the parameter N of that model applies. Finally, we can use a formal spatial model: a closed-population model, extended hierarchically with a point process model to describe the juxtaposition of individuals with traps, thereby accounting implicitly for heterogeneity induced by the spatial organization of traps and individuals. This model fixes the state-space of the underlying point process, so that the area inhabited by the N individuals is well defined. The spatial model then provides an estimate of N that applies to that prescribed state-space, and hence density can be computed. In addition, the number or density of individuals in any subset of \(\mathcal{S}\) can be obtained by summarizing the posterior samples of the individual activity centers.
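
The last point, summarizing activity centers within a subset of the state-space, amounts to a simple count per MCMC iteration. The Python sketch below assumes the posterior output has been arranged as a list of iterations, each holding one (z, sx, sy) tuple per augmented individual; that layout and the toy draws are assumptions made for illustration, not the output format of any particular software.

```python
def count_in_region(draws, xmin, xmax, ymin, ymax):
    """Per-iteration number of real individuals (z = 1) centered in a rectangle."""
    counts = []
    for iteration in draws:
        counts.append(sum(
            1 for (z, sx, sy) in iteration
            if z == 1 and xmin <= sx <= xmax and ymin <= sy <= ymax
        ))
    return counts

# Two toy MCMC iterations for an augmented population of size M = 3
draws = [
    [(1, 0.5, 0.5), (1, 2.0, 2.0), (0, 0.2, 0.2)],
    [(1, 0.1, 0.9), (1, 0.8, 0.3), (0, 0.5, 0.5)],
]
counts = count_in_region(draws, xmin=0.0, xmax=1.0, ymin=0.0, ymax=1.0)
posterior_mean = sum(counts) / len(counts)
```

Dividing each count by the area of the rectangle gives posterior draws of local density, which can be summarized in the same way as D in Table 4.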

Individual-covariate models and distance sampling

Models containing the effects of individual-level covariates represent a widely used class of capture–recapture models. These models are similar in structure to Model Mh, except that the individual effects are observed for the n individuals that appear in the sample. PX-DA can be applied directly to this class of models. In particular, we have a basic zero-inflated binomial model of the form:
$$ \begin{aligned} z_{i} &\sim \hbox{Bern}(\psi) \\ y_{i}|{z_{i} = 1} &\sim \hbox{Bin}(J,p_{i}) \\ y_{i} |{ z_{i} = 0} &\sim \delta(0) \end{aligned} $$
We assume that pi is functionally related to a covariate xi. Here, we consider logit-linear functions of the form:
$$ \hbox{logit}(p_{i}) = a + b*x_{i}. $$
Because the individual covariate is unobserved for the N − n uncaptured individuals, we require a model to describe variation among individuals so that the sample can be extrapolated to the population. We consider a model of the form:
$$ x_{i} \sim \hbox{Normal}(\mu,\sigma^{2}) $$
As with the previous models, implementation is trivial in the BUGS language and similar MCMC black boxes. The BUGS specification is basically equivalent to that for model Mh, but we require the distribution of the covariate to be specified, along with priors for the parameters of that distribution.
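
To make the role of the covariate model concrete, the following Python sketch simulates a population under the model above and then builds the augmented dataset, in which the covariate is missing for every added all-zero history. The parameter values and the function name are hypothetical; missing covariates are represented by None here, standing in for the NA values that would be passed to BUGS.

```python
import math
import random

def simulate_and_augment(N, J, a, b, mu, sigma, M, seed=0):
    """Simulate an individual-covariate model and pad the sample to size M."""
    rng = random.Random(seed)
    y_obs, x_obs = [], []
    for _ in range(N):
        x = rng.gauss(mu, sigma)                     # x_i ~ Normal(mu, sigma^2)
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))     # logit(p_i) = a + b * x_i
        y = sum(rng.random() < p for _ in range(J))  # y_i ~ Bin(J, p_i)
        if y > 0:                                    # only captured animals are seen
            y_obs.append(y)
            x_obs.append(x)
    n = len(y_obs)
    if M < n:
        raise ValueError("M must be at least the number of observed individuals")
    y_aug = y_obs + [0] * (M - n)
    x_aug = x_obs + [None] * (M - n)                 # covariate unobserved (NA)
    return y_aug, x_aug, n

y_aug, x_aug, n = simulate_and_augment(N=100, J=5, a=-1.0, b=0.8,
                                       mu=0.0, sigma=1.0, M=300)
```

During MCMC, the missing covariate values are updated from the assumed normal distribution, which is how the sample is extrapolated to the uncaptured part of the population.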

Examples of individual-covariate models can be found in Royle and Dorazio (2008, Chap. 6), including a standard closed population with the individual covariate, “body mass”, which is thought to influence detectability. Other examples are provided by models of aerial survey data wherein detection probabilities of observers are specified as a function of the individual covariate, “group size” (Royle and Dorazio 2008; Royle 2009; Langtimm et al. 2010).

The model underlying distance sampling is precisely the same model as that which applies to the individual-covariate models, except that observations are made at only J = 1 sampling occasion. We won’t elaborate on most of the model details but, as before, they include a set of M zero-inflation variables zi and the binomial model expressed conditional on z (binomial for z = 1, and fixed zeros for z = 0). In distance sampling, we pay for having only a single sample (i.e., J = 1) by requiring constraints on the model of detection probability. A standard model is
$$ \log(p_{i}) = b * x_{i}^{2} $$
for b < 0, where xi denotes the distance at which the ith individual is detected relative to some reference location where perfect detectability (p = 1) is assumed. This function corresponds to the “half-normal” detection function (i.e., with b = −1/σ², so that p = exp(−x²/σ²)). As with previous examples, we require a distribution for the individual covariate xi. The customary choice is
$$ x_{i} \sim \hbox{Uniform}(0,B) $$
wherein B > 0 is a known constant. Specification of this distance sampling model in the BUGS language is shown in Panel 4. The OpenBUGS ecology module provides a distance sampling example analyzed by PX-DA using the famous Impala data.
Panel 4

Distance sampling model in WinBUGS, using a “half-normal” detection function

b ~ dunif(0,10)                  # b > 0 here; it enters the detection model with a minus sign
psi ~ dunif(0,1)
for(i in 1:(nind+nz)){
   z[i] ~ dbern(psi)             # DA Variables
   x[i] ~ dunif(0,B)             # B = strip width
   logp[i] <- -((x[i]*x[i])*b)   # negative exponent keeps p[i] between 0 and 1
   p[i] <- exp(logp[i])          # DETECTION MODEL
   mu[i] <- z[i]*p[i]
   y[i] ~ dbern(mu[i])           # OBSERVATION MODEL
}
N <- sum(z[1:(nind+nz)])
D <- N/striparea                 # striparea = area of transects, supplied as data
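
A short numerical illustration of the half-normal detection function may help. The Python sketch below uses the parameterization of the detection equation, log(p) = b x² with b < 0, and evaluates the average detection probability over the uniform distance distribution in closed form; the values of b and B are arbitrary and chosen only for illustration.

```python
import math

def half_normal_p(x, b):
    """Detection probability p(x) = exp(b * x^2), with b < 0."""
    if b >= 0:
        raise ValueError("b must be negative")
    return math.exp(b * x * x)

def mean_detection(b, B):
    """Average of p(x) over x ~ Uniform(0, B), using the error function."""
    c = math.sqrt(-b)
    return math.sqrt(math.pi) * math.erf(c * B) / (2.0 * c * B)

b, B = -0.5, 3.0
p_bar = mean_detection(b, B)

# Cross-check the closed form against a midpoint-rule approximation
n_grid = 10000
h = B / n_grid
p_bar_numeric = sum(half_normal_p((k + 0.5) * h, b) for k in range(n_grid)) * h / B
```

Under this model the expected number of detected individuals is N multiplied by this average, which is why the single-occasion design needs a constrained detection function with p = 1 at distance zero.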

Open-population models

Open populations are susceptible to the demographic processes of mortality and recruitment over time and thus are parameterized in terms of survival probabilities ϕ and various parameters that describe recruitment. Cooch and White (2001) recognize at least 5 distinct parameterizations of the recruitment process, which we will not summarize here. We refer to models for open populations as JS-type models to encompass any possible parameterization of the basic demographic processes, following the initial technical developments of Jolly (1965) and Seber (1965).

In keeping with the basic strategy of our previous examples, our approach to an open-population problem is first to describe the model conditional on N and then to zero-inflate the known-N data up to a fixed size M. As with closed-population models, Bayesian analysis of the resulting model for the augmented data is straightforward (Royle and Dorazio 2008, Chap. 10). In an open population, we need to think a little bit about the operative definition of N because, in fact, N varies with time as a result of population dynamics. Perhaps naturally, we might think to define N as the total number of individuals ever alive during all sampling periods, which is the formulation of the JS model put forth by Crosbie and Manly (1985) and Schwarz and Arnason (1996). We adopt that approach here. As in the closed-population case, we assume a discrete uniform prior for N:
$$ N \sim \hbox{U}(0,M) $$
which, as before, implies a super-population of “pseudo-individuals” that leads us to zero-inflating the augmented data. Note that we perhaps should call this a “super-super-population” given that Schwarz and Arnason also used the term super-population to refer to the actual population of individuals that were ever alive; however, we will retain our consistent definition of super-population as being the known size of the augmented population of individuals.

Other recent implementations of JS models exist that also condition on N (Dupuis and Schwarz 2007; Schofield and Barker 2008). Unlike these implementations, we use PX-DA to fix the dimension of the parameter space, and analyze a model that is unconditional on N. Conversely, Dupuis and Schwarz (2007) and Schofield and Barker (2008) retain N in the model and devise specific MCMC algorithms to solve the problem of a variable-dimension parameter space. Both authors use DA to formulate the model for missing state variables but not to fix the dimension of the parameter space. That is, they augment up to N, not M [see “Parameter-expanded data augmentation (PX-DA)”]. Durban and Elston (2005) devised an MCMC implementation for closed capture–recapture models in which they also fixed the dimension of the dataset by adding a collection of all-zero encounter histories. However, they did not exploit the fact that the model for the augmented data can be simplified to yield a convenient MCMC implementation. Finally, the analysis of open populations presented by Link and Barker (2010, Chap. 11, p. 263) uses PX-DA and exploits the resulting model for the augmented data as in Royle and Dorazio (2008, Chap. 10) and described subsequently.

Data structure

The data structure for a JS-type model is an array of individuals × primary periods, which we henceforth refer to as “years.” A sample dataset is shown in Table 5. These data are frequencies of captures of Microtus over 6 years and were obtained from J.D. Nichols (the data appear in Williams et al. 2002). The data were obtained using J = 5 sample occasions during each of 6 primary periods (roughly 2-week intervals), consistent with the “robust design” (Pollock 1982).
Table 5

JS datasets under the Robust design with T = 6 years and J = 5 secondary samples

Left: a dataset like we might observe, absent “all zero” encounter histories. Right: a dataset conditional on N which has some all-zero encounter histories during each primary period

Hierarchical model development

In open-population models, two distinct processes combine to produce individual encounter histories: (1) the detection/encounter process, which describes how individuals appear in the sample, and (2) population dynamics (survival/recruitment), which dictate when individuals can appear in the sample. In what follows, we provide the hierarchical formulation of the JS-type model conditional on N that was given in Royle and Dorazio (2008, Chap. 10). This formulation begins by defining two random variables. One is y(i,t), the observed number of captures of individual i out of J samples in year t. The other is a latent state variable, z(i,t), for the “alive state” of individual i in year t (i.e., z = 1 if alive; 0 if not alive).

Observation Model—The observation model is the same as it is in the closed-population models. For a population sampled J times in each period, then
$$ y(i,t) \mid (z(i,t)=1) \sim \hbox{Bin}(J,p) $$
Otherwise, if z = 0, then y(i,t) is 0 with probability 1. If we set J = 1, this is the standard Jolly-Seber case. Identifiability issues arise and one should study that situation to understand better what can be estimated (see Cooch and White 2001).
State Process Model—It remains to specify a model for the state variable z(i,t) that accommodates the basic demographic processes of survival and recruitment. The standard model for survival in discrete time is a first-order Markov model of the form
$$ z(i,t) \mid z(i,t-1) \sim \hbox{Bern}( \phi_{t} z(i,t-1) ) $$
Thus, if an individual is alive at time t − 1, then z(i,t) = 1 with probability ϕt; otherwise, z(i,t) = 0.
The recruitment component of this model merits more discussion. Cooch and White (2001) recognize at least 5 distinct parameterizations of the recruitment process. By and large, they are equivalent in information content but subject to differing interpretations of parameters. The Schwarz-Arnason (1996) parameterization makes use of so-called “entrance probabilities” in which the time period of recruitment for individual i, say ri, is a multinomial trial. An analogous representation (see Royle and Dorazio 2008, Chap. 10), which is convenient for the “known-N” situation, is based on “conditional entrance probabilities”: the probability of entry into the population given “not previously entered.” This expresses the notion that the pool of N available individuals is depleted over time, as in removal sampling of individuals from a single population, but the population from which removal is occurring is the population of individuals available to be recruited; once removed, these individuals are added to the population of alive individuals. To express this model, let r(i,t) denote a binary covariate that indicates whether the ith individual is “recruitable” (r = 1) or not (r = 0) at the beginning of primary sampling period t. In a Bayesian analysis, the model is formulated conditional on the past latent states z(i,t) for each individual. Thus, given these variables, the r(i,t) variable is a deterministic function of this state history. In particular, for a T = 3 study,
$$ \begin{aligned} r(i,1) &= 1 \\ r(i,2) &= (1-z(i,1)) \\ r(i,3) &= (1-z(i,1))(1-z(i,2)) \end{aligned} $$
For t > 1, it is clear that r(i,t) evaluates to 0 if an individual was ever alive before period t, whereas, if an individual was never alive, then r(i,t) = 1. Thus, the state process model can be expressed naturally as a dependent sequence of Bernoulli random variables by using the covariate r(i,t). For T = 3, the model is
$$ \begin{aligned} z(i,1) &\sim \hbox{Bern}(\gamma_{1} r(i,1)) \\ z(i,2) &\sim \hbox{Bern}( \phi_{1}z(i,1) + \gamma_{2} r(i,2)) \\ z(i,3) &\sim \hbox{Bern}( \phi_{2}z(i,2) +\gamma_{3} r(i,3)) \end{aligned} $$
Note that conditioning of z(i,t) on the past is implicit here. In addition, conditional on N, γ3 = 1 so that all of the individuals available at time t = 3 are recruited. Specification of the basic JS-type model using this parameterization of entrance probabilities is straightforward in the BUGS language. For N known (say Nsuper), and in the case of robust design data, this is shown in Panel 5.
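
The dependent Bernoulli sequence can be made concrete with a small simulation. The Python sketch below is an illustration only: it follows the indexing of the BUGS panels (survival probability indexed by the current period, with the first element unused), uses 0-based periods, and assumes arbitrary parameter values with the final conditional entrance probability set to 1.

```python
import random

def simulate_states(N, phi, gamma, seed=1):
    """Simulate alive states z[i][t] for N individuals over T = len(gamma) periods."""
    rng = random.Random(seed)
    T = len(gamma)
    z = [[0] * T for _ in range(N)]
    for i in range(N):
        r = 1  # every individual is recruitable at the start
        for t in range(T):
            if t == 0:
                prob = gamma[0] * r
            else:
                r = r * (1 - z[i][t - 1])  # 0 once the individual has ever been alive
                prob = phi[t] * z[i][t - 1] + gamma[t] * r
            z[i][t] = 1 if rng.random() < prob else 0
    return z

phi = [0.0, 0.7, 0.7, 0.7]    # phi[0] is a placeholder and is never used
gamma = [0.3, 0.3, 0.5, 1.0]  # last value 1: all remaining individuals enter
z = simulate_states(N=200, phi=phi, gamma=gamma)
```

Because r drops to 0 permanently after the first period alive, a dead individual has entry probability 0 in every later period, and setting the final conditional entrance probability to 1 guarantees that each of the N individuals is alive at least once.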
Under this model, we can define recruitment and annual population size as functions of the latent state variables z(i,t), according to:
$$ \begin{aligned} R_{t} &= \sum_{i} (1-z(i,t-1))z(i,t) \\ N_{t} &= \sum_{i} z(i,t) \end{aligned} $$
Also, the relationship between the entrance probabilities of Schwarz and Arnason (1996) and the “conditional” entrance probabilities is given by
$$ \begin{aligned} \pi_{1} &= \gamma_{1} \\ \pi_{2} &= (1-\gamma_{1})\gamma_{2} \\ \pi_{3} &= (1-\gamma_{1})(1-\gamma_{2})\gamma_{3} \end{aligned} $$
(again, note that γ3 = 1 in the T = 3 case). In a Bayesian analysis of this model, we can put a Dirichlet prior distribution on the parameters {πt} or use independent \(\hbox{U}(0,1)\) priors on the γ parameters as we have done in the BUGS specification for convenience. We note that such a prior does not correspond to a uniform prior on the entrance probabilities (Royle and Dorazio 2008, p. 336) and so this may not be desirable.
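
The mapping between the two sets of entrance probabilities, and its inverse, can be written in a few lines. The Python sketch below is illustrative; the function names are hypothetical and the numerical values are arbitrary.

```python
def gamma_to_pi(gamma):
    """Conditional entrance probabilities -> Schwarz-Arnason entrance probabilities."""
    pi, not_yet_entered = [], 1.0
    for g in gamma:
        pi.append(not_yet_entered * g)
        not_yet_entered *= (1.0 - g)
    return pi

def pi_to_gamma(pi):
    """Inverse mapping: gamma_t = pi_t / (1 - pi_1 - ... - pi_{t-1})."""
    gamma, not_yet_entered = [], 1.0
    for p in pi:
        gamma.append(p / not_yet_entered if not_yet_entered > 0 else 1.0)
        not_yet_entered -= p
    return gamma

gamma = [0.2, 0.5, 1.0]       # T = 3, with the last value fixed at 1
pi = gamma_to_pi(gamma)       # approximately [0.2, 0.4, 0.4]
gamma_back = pi_to_gamma(pi)
```

The entrance probabilities sum to 1 whenever the last conditional entrance probability equals 1. The same mapping shows why independent uniform priors on the γ parameters do not induce a uniform prior on the entrance probabilities.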

Unknown N

To deal with the fact that N is unknown, we use PX-DA. In this case, the JS-type model is similar to closed-population models because it can be shown (Royle and Dorazio 2008, Chap. 10) that the conditional entrance probabilities are confounded with the PX-DA parameter ψ. That is, we define new entrance probabilities: \(g_{t} = \gamma_{t} \psi\) which is the probability that an individual in the augmented data is a member of the real population and recruited in year t (alternative parameterizations are possible, see Royle and Dorazio 2008, Chap. 10). Thus, use of PX-DA in the open-population model merely changes the interpretation of the recruitment parameters. In the BUGS language, the model is shown in Panel 6. Beyond changing the names of a few quantities, the model is equivalent to that in Panel 5.
Panel 5

Definition of a basic Jolly-Seber type model with fixed super-population size, Nsuper, and “conditional entrance probabilities” (γt)

for (t in 1:T){
   phi[t] ~ dunif(0,1)
   p[t] ~ dunif(0,1)
   gamma[t] ~ dunif(0,1)
   N[t] <- sum(z[1:Nsuper,t])
}
for (i in 1:Nsuper){
   z[i,1] ~ dbin(gamma[1],1)
   mu[i] <- z[i,1]*p[1]
   y[i,1] ~ dbin(mu[i],J[1])
   r[i,1] <- 1
   for (t in 2:T){
      survived[i,t] <- phi[t]*z[i,t-1]
      r[i,t] <- r[i,(t-1)]*(1-z[i,t-1])
      muz[i,t] <- survived[i,t] + gamma[t]*r[i,t]
      z[i,t] ~ dbin(muz[i,t],1)
      muy[i,t] <- z[i,t]*p[t]
      y[i,t] ~ dbin(muy[i,t],J[t])
   }
}

Panel 6

Definition of a basic Jolly-Seber type model with unknown super-population size, Nsuper, using PX-DA

for (t in 1:T){
   phi[t] ~ dunif(0,1)
   p[t] ~ dunif(0,1)
   g[t] ~ dunif(0,1)
   N[t] <- sum(z[1:M,t])          # number alive in year t
}
for (i in 1:M){
   z[i,1] ~ dbin(g[1],1)
   mu[i] <- z[i,1]*p[1]
   y[i,1] ~ dbin(mu[i],J[1])
   r[i,1] <- 1
   for (t in 2:T){
      survived[i,t] <- phi[t]*z[i,t-1]
      r[i,t] <- r[i,(t-1)]*(1-z[i,t-1])
      muz[i,t] <- survived[i,t] + g[t]*r[i,t]
      z[i,t] ~ dbin(muz[i,t],1)
      muy[i,t] <- z[i,t]*p[t]
      y[i,t] ~ dbin(muy[i,t],J[t])
   }
}
for (i in 1:M){                   # Compute Nsuper
   Nind[i] <- sum(z[i,1:T])
   Nalive[i] <- 1-equals(Nind[i],0)
}
Nsuper <- sum(Nalive[1:M])
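
The derived quantities in this panel (annual population size and Nsuper), together with the number of recruits defined earlier, are simple functions of the latent state matrix. The Python sketch below computes them for a single toy realization of z; in practice the same calculation would be applied to each MCMC draw. Treating every individual alive in the first period as a recruit of that period is a convention assumed here, since no earlier state exists.

```python
def derived_quantities(z):
    """Annual abundance N_t, recruits R_t, and Nsuper from an M x T state matrix."""
    T = len(z[0])
    N_t = [sum(row[t] for row in z) for t in range(T)]
    # Recruits in period t: alive now, not alive in the previous period
    R_t = [N_t[0]] + [
        sum((1 - row[t - 1]) * row[t] for row in z) for t in range(1, T)
    ]
    # Nsuper: augmented individuals that were alive in at least one period
    n_super = sum(1 for row in z if any(row))
    return N_t, R_t, n_super

z = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],   # an augmented individual that never enters the population
]
N_t, R_t, n_super = derived_quantities(z)
```

Because individuals cannot re-enter after death, the recruits summed over all periods equal Nsuper.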

Example: Microtus data

We provide an application of the JS model using data that were kindly provided by J.D. Nichols. These data also appear in Williams et al. (2002). We consider a model that includes heterogeneity in survival probability among individuals according to
$$ \hbox{logit}(\phi_{i,t}) \sim \hbox{Normal}(\mu_{t},\sigma^{2}). $$
This formulation is similar to that considered by Gimenez et al. (2007) and Royle (2008); however, in both of those studies, the model was specified conditional-on-capture (i.e., in a CJS model; Lebreton et al. 1992). In such analyses, the parameter σ2 is not directly relevant to population-level heterogeneity, and survival should be less heterogeneous than the population quantity. To include this type of structure in WinBUGS, we need to modify the definition of the parameter ϕ slightly to allow variation by individual and year:

for (i in 1:M){
   eta[i] ~ dnorm(0,tauphi)       # individual effect; tauphi is a precision
   for (t in 2:T){
     ...
      logit(phi[i,t]) <- phiyr[t]+eta[i]   # yr + individual
      survived[i,t] <- phi[i,t]*z[i,t-1]
     ...
   }
}

Results of fitting the JS-type model to the Microtus data, with and without individual heterogeneity in survival, are summarized in Table 6. The posterior mean of σ (2.57), along with the 95% posterior interval of (1.41, 4.38), suggests considerable heterogeneity in survival probability among individuals. We note that estimates of p and population size are little affected whereas estimates of yearly survival are strongly influenced by the presence of heterogeneity. This suggests that inference about the presence of individual heterogeneity in survival might be crucial to obtaining accurate estimates of this important vital rate.
Table 6

Estimates of open-population model parameters for Microtus data under models with (left half of table) and without individual heterogeneity on ϕ

           Heterogeneity (survival) model          Ordinary JS model (SA)
Parameter  Mean    SD      2.5%    50%     97.5%   Mean    SD      2.5%    50%     97.5%
N1         56.37   0.62    56.00   56.00   58.00   56.35   0.62    56.00   56.00   58.00
N2         75.81   1.88    73.00   76.00   80.00   77.97   2.21    74.00   78.00   83.00
N3         55.47   1.33    54.00   55.00   59.00   56.98   1.82    54.00   57.00   61.00
N4         60.08   1.12    59.00   60.00   63.00   61.10   1.48    59.00   61.00   64.00
N5         51.34   0.60    51.00   51.00   53.00   51.84   0.90    51.00   52.00   54.00
N6         78.99   1.47    77.00   79.00   82.00   80.03   1.72    77.00   80.00   84.00
Nsuper     173.72  1.73    171.00  174.00  178.00  173.44  1.63    171.00  173.00  177.00
p1         0.64    0.03    0.58    0.64    0.70    0.64    0.03    0.58    0.64    0.70
p2         0.44    0.03    0.39    0.44    0.50    0.44    0.03    0.38    0.44    0.49
p3         0.44    0.03    0.38    0.44    0.50    0.44    0.03    0.37    0.43    0.50
p4         0.51    0.03    0.45    0.51    0.57    0.51    0.03    0.45    0.51    0.57
p5         0.57    0.03    0.51    0.57    0.63    0.57    0.03    0.51    0.57    0.63
p6         0.53    0.03    0.48    0.53    0.58    0.53    0.03    0.48    0.53    0.58
ϕ1         0.96    0.03    0.87    0.96    0.99    0.85    0.05    0.73    0.85    0.94
ϕ2         0.50    0.10    0.30    0.50    0.70    0.54    0.06    0.42    0.54    0.65
ϕ3         0.56    0.12    0.31    0.57    0.77    0.70    0.06    0.57    0.70    0.82
ϕ4         0.33    0.12    0.12    0.33    0.57    0.57    0.06    0.44    0.57    0.69
ϕ5         0.79    0.11    0.53    0.81    0.96    0.86    0.05    0.75    0.87    0.95
σ          2.57    0.71    1.41    2.49    4.38
γ1         0.15    0.02    0.12    0.15    0.19    0.21    0.03    0.16    0.21    0.26
γ2         0.10    0.02    0.06    0.10    0.13    0.14    0.02    0.10    0.14    0.19
γ3         0.06    0.02    0.03    0.06    0.09    0.09    0.02    0.05    0.09    0.13
γ4         0.08    0.02    0.05    0.08    0.12    0.13    0.03    0.08    0.13    0.18
γ5         0.07    0.02    0.04    0.07    0.11    0.12    0.03    0.07    0.12    0.18
γ6         0.15    0.02    0.11    0.15    0.20    0.27    0.04    0.19    0.26    0.35

Individual effects were modeled as normal random variables with variance σ2 on the logit-survival scale

Relationship to models of occupancy dynamics

As is the case with closed-population models formulated under PX-DA, open-population models also have a close correspondence to “occupancy” models of metapopulation dynamics (in particular, see MacKenzie et al. 2003; the present formulation follows Royle and Kéry 2007, and Royle and Dorazio 2008, Chap. 9). For such models the state variable z(i,t) is the occupancy status of patch (site) i in period (year) t, and this state variable is subject to the dynamic processes of “local extinction” and “local colonization”. The complement of the local extinction probability \(\epsilon_{t}\) is precisely equivalent to a survival probability, i.e., \(\phi_{t} \!=\! 1\!-\! \epsilon_{t}\), which is the probability that an occupied site remains occupied from period t to t + 1:
$$ \phi_{t} = {Pr}(z(i,t \! + \! 1) \! = \! 1 | z(i,t) \! = \!1). $$
Local colonization probability, γt, is the probability that an unoccupied site at t becomes occupied at t  +  1:
$$ \gamma_{t} = {Pr}(z(i,t\!+\!1)\!=\!1|z(i,t)\!=\!0) $$
The dynamic occupancy model is somewhat simpler than the JS model being a first-order, Markov process of the form
$$ z(i,t+1) \sim \hbox{Bern}( \phi_{t}z(i,t) + \gamma_{t}(1-z(i,t)) ) $$
and initial state model:
$$ z(i,1) \sim \hbox{Bern}(\gamma_{1}) $$
This model differs from the JS-type model because a site can suffer a local extinction and then be re-colonized subsequently. Thus, a site is “recruitable” as long as its immediately preceding value is 0, and earlier values of the state variable z need not be considered. Conversely, in JS-type models, once an individual dies it must remain dead. Naturally then, the JS model can be described precisely as a constrained version of the dynamic occupancy model: individuals cannot be “reborn” once they die. In particular, the JS state model arises by constraining the recolonization probability to be 0. Motivated by this, Royle and Dorazio (2008, Chap. 9) and Bled et al. (2010) provide a generalization of the occupancy state model that allows for distinct initial and recolonization probabilities.
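
The constraint can be stated compactly as a rule for the probability that the state equals 1 in the next period. The Python sketch below is an illustration with hypothetical argument names: it takes separate initial-colonization and recolonization probabilities, so that the dynamic occupancy model and the JS state model appear as two settings of the same function.

```python
def prob_occupied_next(z_now, ever_occupied, phi, gamma, gamma_recol):
    """Pr(z = 1 in the next period) given the current state and state history."""
    if z_now == 1:
        return phi                  # survival (complement of local extinction)
    if ever_occupied:
        return gamma_recol          # recolonization after a local extinction
    return gamma                    # first colonization, or recruitment in JS

# Dynamic occupancy model: recolonization uses the same probability as colonization
occ = prob_occupied_next(z_now=0, ever_occupied=True,
                         phi=0.8, gamma=0.3, gamma_recol=0.3)

# JS state model: recolonization probability constrained to 0 (no rebirth)
js = prob_occupied_next(z_now=0, ever_occupied=True,
                        phi=0.8, gamma=0.3, gamma_recol=0.0)
```

Any recolonization value strictly between these two settings corresponds to the generalized occupancy model with distinct initial and recolonization probabilities.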

Jolly-Seber models under PX-DA are multi-state models

Under PX-DA, “individuals” in the augmented dataset belong to one of three types: (1) alive; (2) previously alive but now dead; and (3) have not previously been alive. This is easily described using conventional “multi-state” models with 3 states, and where state 2 is an absorbing state. Clearly, this formulation is equivalent to a multi-state occupancy model as well and provides an alternative and direct implementation in the BUGS language (but one which we omit here).
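
Although the BUGS implementation is omitted, the three-state structure is easy to write down as a transition matrix. The Python sketch below orders the states as in the text (alive, dead, not yet entered) and uses a single survival probability and a single entrance probability for one period; the numerical values are arbitrary and the function names are hypothetical.

```python
def transition_matrix(phi, g):
    """One-period transition matrix over states (alive, dead, not yet entered)."""
    return [
        [phi, 1.0 - phi, 0.0],   # alive -> survives or dies
        [0.0, 1.0, 0.0],         # dead is an absorbing state
        [g, 0.0, 1.0 - g],       # not yet entered -> recruited or still waiting
    ]

def propagate(dist, P):
    """Advance a probability distribution over the three states by one period."""
    return [sum(dist[k] * P[k][j] for k in range(3)) for j in range(3)]

P = transition_matrix(phi=0.8, g=0.25)
start = [0.0, 0.0, 1.0]          # before sampling, nobody has entered
after_one = propagate(start, P)
after_two = propagate(after_one, P)
```

The zero in the first column of the second row encodes the constraint that dead individuals never return, and the zero in the third row reflects that an individual cannot die before it has entered.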

Summary and conclusions

We have shown that PX-DA provides a flexible tool for analyzing closed- and open-population models with individual effects, such as random effects or covariate effects. The utility of PX-DA has been established in many different contexts, and the method has been used to solve problems of extraordinary complexity, for which solutions were unimaginable only a few years ago. For example, PX-DA effectively renders spatial capture–recapture models as ordinary individual-effects models, and PX-DA has been proposed for analysis of both closed (Royle and Young 2008; Royle et al. 2009) and open (Gardner et al. 2010b) spatial capture–recapture models. PX-DA has also been used in the formulation and analysis of metacommunity models based on species-level occurrence models (Dorazio et al. 2006; Kéry and Royle 2009) and to extend those models for open metacommunities (Kéry et al. 2009; Dorazio et al. 2010).

Our use of PX-DA in capture–recapture problems is not without limitations. For example, N is removed as a formal parameter of the model by marginalizing over a specific prior—a binomial mixture with mixing distribution given by the uniform prior assumed for ψ (Eq. 2). This prior implies only vague knowledge of N, which may not always be the case. However, in these instances, binomial mixtures can still be used to specify a flexible class of priors for N by assuming different mixing distributions for ψ. As an example, the beta-binomial prior for N arises by assuming a beta mixing distribution for ψ. Other types of hierarchical priors for N are also possible. We have extended the principles of PX-DA to accommodate variability in N among sub-populations (Converse and Royle 2010); however, it is clear that additional research is needed in the formulation of priors for use in PX-DA.
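
The prior on N implied by a given mixing distribution for ψ can be computed directly. The Python sketch below evaluates the beta-binomial probability mass function and confirms that the uniform mixing distribution, Beta(1, 1), gives the discrete uniform prior on 0, 1, …, M; the value of M and the alternative shape parameters are arbitrary choices for illustration.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(n, M, a=1.0, b=1.0):
    """Pr(N = n) when N | psi ~ Binomial(M, psi) and psi ~ Beta(a, b)."""
    log_choose = lgamma(M + 1) - lgamma(n + 1) - lgamma(M - n + 1)
    return exp(log_choose + log_beta(n + a, M - n + b) - log_beta(a, b))

M = 50
uniform_prior = [beta_binomial_pmf(n, M) for n in range(M + 1)]
informative_prior = [beta_binomial_pmf(n, M, a=2.0, b=5.0) for n in range(M + 1)]
```

With shape parameters other than (1, 1), the implied prior concentrates on a narrower range of N, which is one way to encode more than vague prior knowledge without changing the structure of the augmented-data model.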

The essential concept underlying PX-DA is that excess “observations” are added to a dataset and then the new (augmented) dataset is analyzed using a new model, expanded to accommodate the augmented data. In the context of capture–recapture models, we add the “all zero” encounter histories which are not, in practice, observable. The model for this dataset is naturally a zero-inflated version of either a binomial or a multinomial base model. Thus, PX-DA yields a generally consistent formulation for the analysis of both closed- and open-population models of all types. In addition, in doing so, PX-DA unifies the inference framework across a huge range of models that in the classical literature are treated in a relatively diffuse manner as unrelated “black boxes” and named procedures. We have identified interesting parallels between PX-DA-based capture–recapture models for closed populations and “occupancy models” and between open-population models and “multi-state” occupancy models. These parallels suggest the potential for convergence of many software platforms that have been developed for the analysis of ecological data. This convergence has already been achieved by Bayesian analysis using software such as WinBUGS, OpenBUGS, and other MCMC engines.

Acknowledgments

We thank Beth Gardner and Elise Zipkin for reviewing drafts of this manuscript. We thank Ullas Karanth (camera-trapping data) and Jim Nichols (Microtus data) for making data from their research available for our use. Use of trade, product, or firm names does not imply endorsement by the U.S. Government.

Copyright information

© Dt. Ornithologen-Gesellschaft e.V. (outside the USA) 2010

Authors and Affiliations

  1. USGS Patuxent Wildlife Research Center, Laurel, USA
  2. USGS Southeast Ecological Science Center, Gainesville, USA
  3. Department of Statistics, University of Florida, Gainesville, USA
