1 Introduction

The models presented within this paper focus on the estimation of the size of a closed population along with capture and transition probabilities between discrete states using real ecological capture–recapture data on a population of great crested newts Triturus cristatus. An assumption often made in the modelling of capture–recapture data is that of homogeneity in the probability of capture. When estimating the size of a closed population, one in which the population being sampled remains constant across all capture occasions, violations of this assumption can lead to biased estimates of abundance (Seber 1982; Hwang and Huggins 2005). A number of models have been proposed that relax this homogeneity assumption. In particular, Pollock (1974) and Otis et al. (1978) proposed a set of eight closed population capture–recapture models. These models allow the probability of capture to be affected by three factors: time (capture probabilities vary by occasion); behaviour (the probability of initial capture differs from that of all subsequent recaptures) and heterogeneity (each individual has a different capture probability). These models have been fitted using a variety of methods including maximum likelihood (Otis et al. 1978; Agresti 1994; Norris and Pollock 1996; Coull and Agresti 1999; Pledger 2000), the jackknife (Burnham and Overton 1978, 1979; Pollock and Otto 1983), moment methods based on sample coverage (Chao et al. 1992) and Bayesian methods (Castledine 1981; Gazey and Staley 1986; Smith 1988, 1991; George and Robert 1992; Diebolt and Robert 1993; Ghosh and Norris 2005; King and Brooks 2008). To specifically address the problem of heterogeneity in the capture probabilities, a variety of models have been proposed, including finite mixtures (Diebolt and Robert 1993; Agresti 1994; Norris and Pollock 1996; Pledger 2000) and infinite mixtures (Coull and Agresti 1999; Dorazio and Royle 2003).
Comparisons of examples of the two types of mixture through simulation are presented in Pledger (2005) and Dorazio and Royle (2005). An issue that commonly arises when estimating the size of closed populations is that different individual heterogeneity models, which may be deemed to fit the data equally well, can give rise to very different estimates of abundance (Link 2003, 2006). An extended mixture model which provides a convenient framework for model selection is presented in Morgan and Ridout (2008); see also Holzmann et al. (2006).

We extend the previous models for closed capture–recapture data to account for individual heterogeneity where the “state” of an individual is also recorded as a discrete variable. We consider the case where the discrete state is time-varying; for the time-invariant case, see, for example, King and McCrea (2019) for a review. In standard closed capture–recapture studies, for each capture occasion, individuals from a closed population are sampled, returned to the population, and on subsequent occasions attempts are made to recapture them. Individuals within the population are marked when initially captured. If the marking method used assigns a unique mark to each captured individual (e.g. individual ID tags, natural physical markings), then an individual encounter history can be constructed for each individual observed within the study. This history typically takes the form of a vector of 0s and 1s: with a 0 denoting an individual was not encountered and a 1 denoting an individual was captured. We consider the case where individuals may be observed in different discrete states. For example, states may correspond to “resting” and “foraging” or “breeding” and “not breeding”. Observed histories then correspond to whether an individual is observed or not, and, given that an individual is observed, its corresponding state. For example, an individual with encounter history

$$\begin{aligned} 0 \ 2 \ 1 \ 0 \ 2 \ 2 \end{aligned}$$

was initially captured (and marked) in state 2 on occasion 2, captured in state 1 on occasion 3, missed on occasion 4, then captured on occasions 5 and 6 in state 2. In general, individuals are able to move between the states during the study period and the capture probabilities may be dependent on the state of an individual. For example, if the states correspond to “resting” and “foraging”, the capture probabilities may be very different with a significantly higher capture probability for individuals in the “foraging” state compared to “resting”. Failure to account for the state dependence may result in biased population estimates. There may also be biological interest in the transition rates between the states (Stoklosa et al. 2012). To estimate the number of unobserved individuals (and hence the total population size), a model is fitted to the observed data, permitting the estimation of the associated model parameters and the number of unobserved individuals. The number of individuals in each state on each occasion can then be derived using the parameter estimates. For such models, a number of standard assumptions are made. These include: the population as a whole is closed (individuals cannot leave or join the population during the study period), marks cannot be lost, and individuals are identified without error [see McCrea and Morgan (2014), chapter 3, for further discussion and references therein].

The model we present can be considered a generalisation of the time-dependent multi-state closed population model of Schwarz and Ganter (1995) to a form that additionally includes trap dependence and heterogeneity in the capture probabilities. It may also be seen as a closed population capture–recapture equivalent to the Arnason–Schwarz (AS) model (Arnason 1972, 1973; Brownie et al. 1993; Schwarz et al. 1993; King and Brooks 2003; Lebreton et al. 2009), for open capture–recapture data. Initially developed for multi-site capture–recapture data, but more generally applicable to individuals captured in discrete states, the AS model is a multi-state generalisation of the Cormack–Jolly–Seber (CJS) model (Cormack 1964; Jolly 1965; Seber 1965). The CJS and AS models allow for a time dependence in the capture probabilities, with the AS model additionally able to allow capture probabilities to be state dependent. However, these models condition on the first capture of an individual and so are unable to estimate the total population size directly. Dupuis and Schwarz (2007) consider a similar multi-state extension for the Jolly–Seber model for estimating abundance in open populations, fitted within a Bayesian (data augmentation) framework.

We consider a similar AS-type state dependence in the closed population scenario, within an explicit closed-form likelihood framework where population size is estimated directly through the likelihood. In particular, we compare the performance of the new unconditional multi-state model with (i) the existing single-state models, which ignore the state information and do not include movement between states, and (ii) a conditional approach, in which the population size is not estimated directly through the likelihood. We present the likelihood in terms of a set of minimal sufficient statistics, which permits the fit of the model to be assessed using a Pearson chi-squared test.

The motivation for developing this methodology relates to a study on great crested newts, a species with protected status in Europe. Individuals within the study population are captured weekly throughout the breeding season, with the additional state information referring to the pond in which the individuals are captured. Originally consisting of four ponds, the study site was extended to a total of eight ponds in 2009, with the new ponds being first colonised in the 2010 breeding season. How these new ponds have been colonised, the effectiveness of the traps in capturing individuals and whether there are differences in the capture probabilities between the old and new ponds are of particular interest. Ignoring any differences between the old and new ponds, for instance differing amounts of vegetation which may affect the probability of capture, may lead to poorer estimates of the total population size. For this study, both the total population size and the states themselves are of interest.

In Sect. 2, we review the construction of existing single-state closed population models in terms of sets of sufficient statistics, before introducing the likelihood function for the multi-state model and considering the time-varying population size for each state in Sect. 3. The performance of the multi-state model is compared to a conditional approach and existing heterogeneity models using simulation in Sect. 4 with a particular focus on the bias and precision of the population size estimates and the ability of the new model to estimate state specific parameters and population size. The new model is applied to the data set of great crested newts in Sect. 5. The paper concludes with a general discussion in Sect. 6.

2 Single-State Closed Population Models

Consider a study with T capture occasions labelled \(t=1,\ldots ,T\) and let N denote the total population size, which is to be estimated. Let the set of encounter histories be given by \({\varvec{x}}= \{x_{it}: i=1,\ldots ,N, \ t=1,\ldots ,T\}\), where \(x_{it} = 1\) indicates individual i was captured on occasion t, and \(x_{it} = 0\) indicates individual i was not captured on occasion t. We let n denote the number of observed individuals within the study. Further, we define the set of capture probabilities \({\varvec{p}}= \{p_{it}: i=1,\ldots ,N, \ t=1,\ldots ,T\}\) where \(p_{it}\) denotes the probability individual i is captured on occasion t (we note that for generality the initial capture probability and recapture probabilities may be different). The overall likelihood expression for a closed population model can be expressed in the form,

$$\begin{aligned} L(N, {\varvec{p}}; {\varvec{x}})&\propto \frac{N!}{(N-n)!} \prod _{i=1}^{N}{\hbox {Pr(encounter history for individual } i)}. \end{aligned}$$
(1)

The existing modelling approaches for data of this type differ by the capture probability parameter dependence. Using the notation of Otis et al. (1978), we denote models by \(M_\gamma \) where \(\gamma \) describes the dependence structure placed on the capture probabilities. In general, \(\gamma \subseteq \{t, b, h\}\), where t denotes temporal dependence; b denotes behavioural dependence (or trap response); and h denotes individual heterogeneity. This leads to a total of eight different model dependencies, corresponding to the inclusion/exclusion of each of the different types of dependence, with \(M_0\) denoting the model with a constant capture probability which ignores all three dependencies. We note that given a particular form of heterogeneity, multiple models may be defined in terms of the specific dependence. For example, and of particular interest in this paper, a number of different models have been proposed to include individual heterogeneity. In particular, in the absence of additional individual covariate information, the capture probabilities have been specified as finite or infinite mixtures models. These include (with associated notation):

  • \(M_{h(k)}\): individual capture probabilities come from a mixture model with k components (Pledger 2000);

  • \(M_{h(be)}\): individual capture probabilities specified to be from an underlying beta distribution (Burnham 1972; Dorazio and Royle 2003);

  • \(M_{h(b-be)}\): individual capture probabilities come from a mixture model with two components: one component simply has a fixed capture probability, while the other component is specified to be from some underlying beta distribution (Morgan and Ridout 2008).

For a given model, the corresponding likelihood function can be specified and maximised to obtain the MLEs of the model parameters (including the beta distribution parameters for models \(M_{h(be)}\) and \(M_{h(b-be)}\)). The likelihood function given above is a function of the observed encounter histories \({\varvec{x}}\). However, this likelihood can be expressed more efficiently via the use of sufficient statistics for some of the models. In particular, for models \(M_0\) and \(M_h\), a set of sufficient statistics is the Schnabel census (Schnabel 1938), defined to be \(\{f_1,f_2,\ldots ,f_T\}\), where \(f_j\) denotes the number of individuals captured on a total of j occasions. The Schnabel census denotes the set of minimal sufficient statistics for the heterogeneity models \(M_h\); for model \(M_0\) the minimal sufficient statistics reduce further to \(f=\sum _{j=1}^{T}{j f_j}\), corresponding to simply the total number of encounters over the study. For model \(M_t\), the minimal sufficient statistics are \(\{n_1,\ldots ,n_T\}\), where \(n_t\) denotes the number of individuals captured on occasion \(t=1,\dots ,T\). For model \(M_{tb}\), minimal sufficient statistics are given by \(\{z_1,\dots ,z_T,n_2,\dots ,n_T\}\) where \(z_t\) denotes the number of individuals captured for the first time on occasion t. Finally, for model \(M_b\), the sufficient statistics can be reduced to \(\{n, y, f\}\) where \(n = \sum _{t=1}^T z_t\), corresponding to the number of observed individuals within the study; \(y = \sum _{t=1}^T(t-1) z_t\), corresponding to total number of capture occasions before initial observation summed over all captured individuals; and \(f = \sum _{t=1}^T n_t\), the total number of captures (equivalent to the equation for f given above).
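The summaries defined in the preceding paragraph can be tallied directly from the observed 0/1 encounter histories. The following Python sketch is illustrative only: the function name `single_state_statistics`, the dictionary keys and the three toy histories are assumptions made for this example and do not come from the newt data.

```python
from typing import Dict, List


def single_state_statistics(histories: List[List[int]]) -> Dict[str, object]:
    """Tally summary statistics from observed 0/1 encounter histories.

    Each row is one observed individual, so it contains at least one 1.
    """
    T = len(histories[0])
    n = len(histories)                 # number of observed individuals
    f_j = [0] * (T + 1)                # f_j[j]: individuals captured on exactly j occasions
    n_t = [0] * T                      # n_t[t]: individuals captured on occasion t+1
    z_t = [0] * T                      # z_t[t]: individuals first captured on occasion t+1
    for h in histories:
        f_j[sum(h)] += 1
        z_t[h.index(1)] += 1
        for t, x in enumerate(h):
            n_t[t] += x
    f = sum(n_t)                       # total number of captures
    # y = sum over t of (t - 1) * z_t with 1-based t, i.e. index * z_t with 0-based index
    y = sum(t * z for t, z in enumerate(z_t))
    return {"n": n, "f_j": f_j[1:], "n_t": n_t, "z_t": z_t, "f": f, "y": y}


# Hypothetical toy data with T = 6 occasions.
histories = [
    [0, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
]
stats = single_state_statistics(histories)
print(stats)
```

The two expressions for f given in the text, \(\sum_j j f_j\) and \(\sum_t n_t\), can be checked against each other on any data set as a quick consistency test.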

The use of sufficient statistics allows for an efficient evaluation of the likelihood. In addition, they have the advantage of being able to be used to assess the performance of each of these models through the calculation of the Pearson chi-squared statistic, since the likelihood of the data is of multinomial form.
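To illustrate the efficiency gain, consider model \(M_0\), for which Eq. (1) reduces to \(L(N,p) \propto \frac{N!}{(N-n)!}\, p^{f} (1-p)^{TN-f}\), so that only n, f and T are needed. The sketch below profiles out p (for fixed N the maximiser is \(f/(TN)\)) and searches over integer N. The function names, the search bound `N_max` and the numerical inputs are hypothetical choices for this illustration.

```python
import math
from typing import Tuple


def m0_profile_loglik(N: int, n: int, f: int, T: int) -> float:
    """Profile log-likelihood of model M_0 at population size N (up to a constant).

    For fixed N the capture probability is maximised at p = f / (T * N).
    """
    p = f / (T * N)
    return (math.lgamma(N + 1) - math.lgamma(N - n + 1)
            + f * math.log(p) + (T * N - f) * math.log(1.0 - p))


def m0_mle(n: int, f: int, T: int, N_max: int) -> Tuple[int, float]:
    """Grid search over integer N in [n, N_max]; returns (N_hat, p_hat)."""
    N_hat = max(range(n, N_max + 1), key=lambda N: m0_profile_loglik(N, n, f, T))
    return N_hat, f / (T * N_hat)


# Hypothetical summaries: 70 individuals seen, 130 captures in total, 6 occasions.
N_hat, p_hat = m0_mle(n=70, f=130, T=6, N_max=1000)
print(N_hat, round(p_hat, 3))
```

A grid search is used here purely for transparency; it assumes that not every observed individual was captured on every occasion, so that \(p < 1\) at \(N = n\).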

3 Multi-state Closed Population Model

We now extend the previous closed population models for standard encounter histories to those with individual time-varying discrete state information. In particular, we let \(\mathcal{R}\) denote the set of possible discrete states, which for convenience we label as \(r=1,\dots ,R\). Following the AS analogy to the CJS model, we assume that movement between these states is modelled as a first-order Markov process. We then define the following model parameters:

  • \(p_t(r)\): the probability an individual is initially captured at time \(t=1,\dots ,T\) given that the individual is in state \(r \in \mathcal{R}\) at this time;

  • \(c_t(r)\): the probability an individual is recaptured at time \(t=2,\dots ,T\) given that the individual is in state \(r \in \mathcal{R}\) at this time;

  • \(\psi _t(r,s)\): probability an individual is in state \(s \in \mathcal{R}\) at time \(t+1\), given that an individual is in state \(r \in \mathcal{R}\) at time \(t=1,\dots ,T-1\);

  • \(\alpha (r)\): probability an individual is in state \(r \in \mathcal{R}\) at time \(t=1\).

For notational convenience, we let \({\varvec{p}}= \{p_t(r): t=1,\dots ,T, \ r \in \mathcal{R}\}\), \({\varvec{c}}= \{c_t(r): t=2,\dots ,T, \ r \in \mathcal{R}\}\), \({\varvec{\psi }}= \{\psi _t(r,s): t=1,\dots ,T-1, \ r,s \in \mathcal{R}\}\) and \({\varvec{\alpha }}= \{\alpha (r): r \in \mathcal{R}\}\). We note that by definition, \(\sum _{r \in \mathcal{R}}{\alpha (r)} = 1\). To retain model identifiability, the recapture probabilities are specified as a function of the initial capture probabilities, such that \(\hbox {logit}\ c_t(r) = \left( \hbox {logit}\ p_t(r)\right) + \beta \), where \(\beta \) denotes the trap dependence; \(\beta < 0\) corresponds to a trap shy response; and \(\beta > 0\) a trap happy response [see, for example, Chao (2001); King and Brooks (2008) for the analogous single-state case].
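The logit-scale link between initial capture and recapture probabilities can be sketched in a few lines of Python. The helper names `logit`, `expit` and `recapture_probability`, and the numerical values of p and \(\beta\), are assumptions made for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def recapture_probability(p: float, beta: float) -> float:
    """c such that logit(c) = logit(p) + beta."""
    return expit(logit(p) + beta)


# Hypothetical initial capture probability of 0.3.
for beta in (-1.0, 0.0, 1.0):
    print(beta, round(recapture_probability(0.3, beta), 3))
```

With \(\beta = 0\) the recapture probability equals the initial capture probability; a negative \(\beta\) lowers it (trap shy) and a positive \(\beta\) raises it (trap happy).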

3.1 Likelihood Formulation

The likelihood function is again of the same form as given in Eq. (1), where now the probability of the encounter history includes not only detection/non-detection at each time but also the associated discrete state. In order to evaluate the likelihood efficiently, we follow King and McCrea (2014) and consider all possible partial encounter histories that could be observed corresponding to (i) the beginning of the study to initial capture; (ii) successive captures; and (iii) final capture to the end of the study. This leads to the following sufficient statistics:

  (i) \(z_t(r)\): the number of individuals that are observed for the first time at time \(t=1,\dots ,T\) in state \(r \in \mathcal{R}\);

  (ii) \(n_{t_1,t_2}(r,s)\): the number of individuals that are observed at time \(t_1 =1,\dots ,T-1\) in state \(r \in \mathcal{R}\) and next observed at time \(t_2 = t_1+1,\dots ,T\) in state \(s \in \mathcal{R}\);

  (iii) \(v_t(r)\): the number of individuals that are observed for the last time at time \(t=1,\dots ,T-1\) in state \(r \in \mathcal{R}\).

For notational convenience, we set \({\varvec{z}}= \{z_t(r): t=1,\dots ,T, \ r \in \mathcal{R}\}\), \({\varvec{n}}= \{n_{t_1,t_2}(r,s): t_1 = 1,\dots ,T-1, \ t_2 = t_1+1,\dots ,T, \ r,s \in \mathcal{R}\}\) and \({\varvec{v}}= \{v_t(r): t = 1,\dots ,T-1, \ r \in \mathcal{R}\}\).
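As an illustration, the following Python sketch tallies the three sets of statistics from observed multi-state histories, coding non-capture as 0 and capture in state r as r. The function name `multistate_statistics`, the use of dictionaries keyed by 1-based occasions and states, and the second toy history are assumptions made for this example.

```python
from collections import Counter
from typing import List, Tuple


def multistate_statistics(histories: List[List[int]]) -> Tuple[Counter, Counter, Counter]:
    """Return (z, n, v) keyed by 1-based occasions and state labels.

    z[(t, r)]          first captures at occasion t in state r
    n[(t1, t2, r, s)]  captures at t1 in state r followed next by t2 in state s
    v[(t, r)]          last captures at occasion t < T in state r
    """
    z, n, v = Counter(), Counter(), Counter()
    for h in histories:
        T = len(h)
        caught = [(t, s) for t, s in enumerate(h, start=1) if s != 0]
        if not caught:
            continue                      # never-observed individuals carry no statistics
        z[caught[0]] += 1
        for (t1, r), (t2, s) in zip(caught, caught[1:]):
            n[(t1, t2, r, s)] += 1
        t_last, r_last = caught[-1]
        if t_last < T:                    # a final capture at occasion T contributes probability 1
            v[(t_last, r_last)] += 1
    return z, n, v


# The history 0 2 1 0 2 2 from the Introduction, plus one hypothetical history.
z, n, v = multistate_statistics([[0, 2, 1, 0, 2, 2], [1, 0, 0, 0, 0, 0]])
print(dict(z), dict(n), dict(v))
```

For the history 0 2 1 0 2 2, the sketch records one first capture, \(z_2(2)\), and three successive-capture pairs, \(n_{2,3}(2,1)\), \(n_{3,5}(1,2)\) and \(n_{5,6}(2,2)\), with no contribution to \({\varvec{v}}\) because the final capture occurs on occasion T.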

In order to express the likelihood as a function of the above sufficient statistics, we need to calculate the probabilities for each of the associated partial encounter histories. In deriving these probabilities, we consider similar notation to King and McCrea (2014). We let \(Q_{t_1,t_2}(r,s)\) denote the probability that an individual in state \(r \in \mathcal{R}\) at time \(t_1=1,\dots ,T-1\) is in state \(s \in \mathcal{R}\) at time \(t_2=t_1+1,\dots ,T\) and not observed on any occasions between \(t_1\) and \(t_2\). The form of this probability is dependent on whether an individual has yet to be captured for the first time or has been previously captured on at least one occasion. We let \(Q_{t_1,t_2}^P(r,s)\) denote the former and \(Q_{t_1,t_2}^C(r,s)\) the latter scenario. (If the capture probabilities are not behaviour dependent, this distinction is not required.) Then, it follows immediately that

$$\begin{aligned} Q_{t_1,t_2}^P(r,s)&= \left\{ \begin{array}{ll} \psi _{t_1}(r,s) &{} t_2=t_1+1 \\ \sum _{u \in \mathcal R}{\psi _{t_1}(r,u) (1-p_{t_1+1}(u)) Q_{t_1+1,t_2}^P(u,s)} &{} t_2 = t_1+2,\dots ,T. \\ \end{array} \right. \end{aligned}$$
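The recursion can be evaluated by starting from \(Q_{t_2-1,t_2} = \psi _{t_2-1}\) and working backwards to \(t_1\). The Python sketch below is one possible implementation under assumed conventions: occasions are 1-based in the function arguments, `psi[t-1]` holds \(\psi _t\), `prob[t-1][u]` holds the capture probability for state u at occasion t, and states are indexed from 0. Supplying the initial capture probabilities yields the P version; the same routine can be reused with recapture probabilities. All numerical values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def q_matrix(t1: int, t2: int, psi: List[Matrix], prob: Matrix) -> Matrix:
    """Q_{t1,t2}(r, s): move from state r at t1 to state s at t2, unobserved in between.

    psi[t-1]     transition matrix from occasion t to t+1 (t = 1, ..., T-1)
    prob[t-1][u] capture probability in state u at occasion t (t = 1, ..., T)
    """
    R = len(psi[0])
    Q = [row[:] for row in psi[t2 - 2]]            # base case: Q_{t2-1,t2} = psi_{t2-1}
    for t in range(t2 - 2, t1 - 1, -1):            # t = t2-2, ..., t1
        Q = [[sum(psi[t - 1][r][u] * (1.0 - prob[t][u]) * Q[u][s] for u in range(R))
              for s in range(R)]
             for r in range(R)]
    return Q


# Hypothetical two-state example with T = 4 and time-constant parameters.
P = [[0.7, 0.3], [0.2, 0.8]]
psi = [P, P, P]
p = [[0.15, 0.4]] * 4
Q13 = q_matrix(1, 3, psi, p)
print(Q13)
```

When all capture probabilities are set to zero, the result is simply the product of the transition matrices, so each row sums to one; this gives a convenient check of an implementation.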

\(Q_{t_1,t_2}^C(r,s)\) follows analogously using the appropriate recapture probabilities. We now consider the probabilities associated with each of the above sufficient statistics. We begin by considering the probabilities associated with an individual being observed for the first time [i.e. case (i) and statistic \({\varvec{z}}\)]. Let \(\zeta _t(r)\) denote the probability an individual is initially captured at time \(t=1,\dots ,T\) in state \(r \in \mathcal{R}\). Then, using a probabilistic argument we have

$$\begin{aligned} \zeta _t(r) = \left\{ \begin{array}{ll} p_1(r) \alpha (r) &{} t = 1 \\ p_t(r) \sum _{u \in \mathcal R}{\alpha (u) (1-p_1(u)) Q_{1,t}^P(u,r)} &{} t =2,\dots ,T. \\ \end{array} \right. \end{aligned}$$

We now consider case (ii) (associated with statistic \({\varvec{n}}\)) and the probability of being recaptured, conditional on the previous capture. Let \(O_{t_1,t_2}(r,s)\) denote the probability an individual in state \(r \in \mathcal{R}\) at time \(t_1=1,\dots ,T-1\) is next recaptured in state \(s \in \mathcal{R}\) at time \(t_2 = t_1+1,\dots ,T\). Then, by definition,

$$\begin{aligned} O_{t_1,t_2}(r,s) = Q_{t_1,t_2}^C(r,s) c_{t_2}(s). \end{aligned}$$

The final case (iii) (associated with statistic \({\varvec{v}}\)) considers the probability an individual is not observed again within the study, following their final capture. Let \(\chi _t(r)\) denote the probability an individual in state \(r \in \mathcal{R}\) at time \(t=1,\dots ,T-1\) is not observed again during the study. Then, for all \(r \in \mathcal{R}\), it follows that

$$\begin{aligned} \chi _t(r) = \left\{ \begin{array}{ll}1 &{} t = T \\ \sum _{u \in \mathcal R}{Q_{t,T}^C(r,u) (1-c_T(u))} &{} t = 1,\dots ,T-1. \end{array}\right. \end{aligned}$$

By definition, an individual observed at the last capture time is clearly not able to be observed again within the study, i.e. the probability it is not seen again is 1 (this means that we do not need to consider this term within the likelihood expression).

Finally, in order to permit estimation of the total population size, we let \(\rho \) denote the probability an individual is not observed within the study. From the law of total probability (considering all possible states an individual may be in at the first and last capture time), it follows that

$$\begin{aligned} \rho&= \sum _{r,s \in \mathcal R}{\alpha (r) (1-p_1(r)) Q_{1,T}^P(r,s) (1-p_T(s))}. \end{aligned}$$
(2)
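Since every individual is either first captured on exactly one occasion in exactly one state or never captured, the probabilities \(\zeta _t(r)\) and \(\rho \) must sum to one. The following Python sketch checks this numerically. It uses a forward pass that is algebraically equivalent to the expressions above, tracking \(a_t(r) = \Pr (\text{in state } r \text{ at time } t, \text{ not captured before } t)\), rather than forming \(Q^P\) explicitly. The function name, indexing conventions and parameter values are assumptions for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def first_capture_probs(alpha: List[float], psi: List[Matrix], p: Matrix) -> Tuple[Matrix, float]:
    """Return (zeta, rho), where zeta[t-1][r] is zeta_t(r) with states indexed from 0.

    psi[t-1] is the transition matrix from occasion t to t+1;
    p[t-1][r] is the initial capture probability in state r at occasion t.
    """
    T, R = len(p), len(alpha)
    a = alpha[:]                    # a[r] = Pr(state r at occasion t, not captured before t)
    zeta: Matrix = []
    missed: List[float] = []
    for t in range(T):
        zeta.append([a[r] * p[t][r] for r in range(R)])
        missed = [a[r] * (1.0 - p[t][r]) for r in range(R)]
        if t < T - 1:
            a = [sum(missed[r] * psi[t][r][s] for r in range(R)) for s in range(R)]
    rho = sum(missed)               # not captured on any of the T occasions
    return zeta, rho


# Hypothetical two-state example with T = 6 and time-constant parameters.
alpha = [0.4, 0.6]
psi = [[[0.7, 0.3], [0.2, 0.8]]] * 5
p = [[0.15, 0.4]] * 6
zeta, rho = first_capture_probs(alpha, psi, p)
print(rho, sum(map(sum, zeta)) + rho)
```

Under these conventions the expected number of observed individuals is \(N(1-\rho )\).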

The corresponding unconditional likelihood function, specified as a function of the sufficient statistics, is of the form,

$$\begin{aligned} L(N, {\varvec{p}}, {\varvec{c}}, {\varvec{\psi }}, {\varvec{\alpha }}; {\varvec{n}}, {\varvec{v}}, {\varvec{z}})&\propto \frac{N!}{(N-n)!} \rho ^{N-n} \prod _{t=1}^{T} \prod _{r \in \mathcal R} \left[ \zeta _t(r)^{z_t(r)} \right] \\&\quad \times \prod _{t=1}^{T-1} \prod _{r \in \mathcal R} {\left[ \chi _t(r)^{v_t(r)} \prod _{\tau =t+1}^{T} \prod _{s \in \mathcal R} O_{t,\tau }(r,s)^{n_{t,\tau }(r,s)} \right] }. \end{aligned}$$

The above likelihood allows for temporal effects, behavioural effects and individual heterogeneity (in the form of discrete covariates), which we represent notationally as \(M_{tbh}^{R}\), where the superscript, R, denotes the number of discrete states. Sub-models can be derived by specifying restrictions on the model parameters. In particular, the basic dependence structures can be described with

  • \(M_0^R\): \(p_t(r) = c_t(r) = p\), for all \(r \in \mathcal{R}\) and \(t=1,\dots ,T\);

  • \(M_t^R\): \(p_t(r) = c_t(r) = p_t\), for all \(r \in \mathcal{R}\) and \(t=1,\dots ,T\);

  • \(M_b^R\): \(p_t(r) = p\) and \(c_t(r) = c\), for all \(r \in \mathcal{R}\) and \(t=1,\dots ,T\);

  • \(M_h^R\): \(p_t(r) = c_t(r) = p(r)\), for all \(r \in \mathcal{R}\) and \(t=1,\dots ,T\);

with associated restrictions for models with multiple dependencies. We note that the case of heterogeneous capture probabilities, \(M_h^R\), is fully determined by the additional discrete state information where a capture probability is estimated for each discrete state and remains constant for each state across all capture occasions.
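A minimal sketch of how the unconditional log-likelihood might be evaluated from the sufficient statistics is given below, illustrated under the \(M_h^R\) restriction. It is not the authors' implementation: the function names, the 0-based state indexing, the storage of \({\varvec{n}}\) as a dictionary and the parameter values are all assumptions, and no attempt is made to cache repeated calculations.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def q_matrix(t1: int, t2: int, psi: List[Matrix], prob: Matrix) -> Matrix:
    """Q_{t1,t2}(r, s) for 1-based occasions; psi[t-1] is psi_t, prob[t-1] is for occasion t."""
    R = len(psi[0])
    Q = [row[:] for row in psi[t2 - 2]]
    for t in range(t2 - 2, t1 - 1, -1):
        Q = [[sum(psi[t - 1][r][u] * (1.0 - prob[t][u]) * Q[u][s] for u in range(R))
              for s in range(R)] for r in range(R)]
    return Q


def log_likelihood(N: int, alpha: List[float], psi: List[Matrix], p: Matrix, c: Matrix,
                   z: List[List[int]], n_stat: Dict[Tuple[int, int, int, int], int],
                   v: List[List[int]]) -> float:
    """Unconditional log-likelihood up to an additive constant (requires T >= 2).

    z[t-1][r] and v[t-1][r] use 0-based states; n_stat keys are (t1, t2, r, s)
    with 1-based occasions and 0-based states.
    """
    T, states = len(p), range(len(alpha))
    n_obs = sum(map(sum, z))
    ll = math.lgamma(N + 1) - math.lgamma(N - n_obs + 1)      # log N!/(N-n)!
    for t in range(1, T + 1):                                  # first captures: zeta_t(r)
        Q = q_matrix(1, t, psi, p) if t > 1 else None
        for r in states:
            if z[t - 1][r]:
                zeta = (p[0][r] * alpha[r] if t == 1 else
                        p[t - 1][r] * sum(alpha[u] * (1 - p[0][u]) * Q[u][r] for u in states))
                ll += z[t - 1][r] * math.log(zeta)
    QT = q_matrix(1, T, psi, p)                                # never captured: rho
    rho = sum(alpha[r] * (1 - p[0][r]) * QT[r][s] * (1 - p[T - 1][s])
              for r in states for s in states)
    if N > n_obs:
        ll += (N - n_obs) * math.log(rho)
    for (t1, t2, r, s), count in n_stat.items():               # successive captures: O
        ll += count * math.log(q_matrix(t1, t2, psi, c)[r][s] * c[t2 - 1][s])
    for t in range(1, T):                                      # last captures: chi_t(r)
        Qc = q_matrix(t, T, psi, c)
        for r in states:
            if v[t - 1][r]:
                chi = sum(Qc[r][u] * (1 - c[T - 1][u]) for u in states)
                ll += v[t - 1][r] * math.log(chi)
    return ll


# Hypothetical M_h^2 parameters (c = p, constant over time); state 1 -> index 0, state 2 -> index 1.
alpha = [0.4, 0.6]
psi = [[[0.7, 0.3], [0.2, 0.8]]] * 5
p = [[0.15, 0.4]] * 6
# Statistics for the single history 0 2 1 0 2 2.
z = [[0, 0], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0]]
n_stat = {(2, 3, 1, 0): 1, (3, 5, 0, 1): 1, (5, 6, 1, 1): 1}
v = [[0, 0] for _ in range(5)]
ll = log_likelihood(1, alpha, psi, p, p, z, n_stat, v)
print(ll)
```

With \(N = n = 1\) the combinatorial term and the \(\rho \) term vanish, so the value returned is the log-probability of the single encounter history, which can be verified by summing over the unobserved states by hand.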

Evaluating the likelihood through the sufficient statistics uses recursions similar to those in hidden Markov models (HMMs), but in a more efficient form. In the HMM framework, the likelihood considers each individual encounter history in turn. By working with the sufficient statistics instead, we reduce the number of operations required to calculate the likelihood: the probability associated with each sufficient statistic is calculated once and then used for every partial history that shares it.

The likelihood estimates the total population size N. This is the number of individuals in the population on each capture occasion (since the size of a closed population remains constant). The number of individuals in each state on each occasion can be estimated using a forward–backward-type algorithm. Typically applied to HMMs, the forward–backward algorithm calculates the conditional state probabilities on each occasion for a given observation sequence. We use these conditional probabilities, calculated for the observed capture histories, together with the estimated total population size to obtain estimates of state-dependent abundance. Our approach differs from the typical HMM application since the states are partially observed (uncertainty in the state of an individual only occurs when it is not captured). Further details on this approach are presented in Appendix A of the online supplementary material and are demonstrated in the simulation study and newt application below.

3.2 Conditional and Unconditional Approaches

Bishop et al. (1975) classified population size modelling into conditional and unconditional approaches. The unconditional approach involves maximising the full likelihood, written as a function of the observed capture histories (or associated sufficient statistics) and the number of unobserved individuals, to obtain an MLE of the total population size N. The conditional approach (Sanathanan 1972; Huggins 1989, 1991) involves maximising the conditional likelihood \(L^{(c)}\) (conditional on the number of observed individuals) given by,

$$\begin{aligned} L^{(c)}({\varvec{p}}, {\varvec{c}}, {\varvec{\psi }}, {\varvec{\alpha }}; {\varvec{n}}, {\varvec{v}}, {\varvec{z}})&\propto \frac{n!}{(1-\rho )^n} \prod _{t=1}^T \prod _{r \in \mathcal{R}} \left[ \zeta _t(r)^{z_t(r)} \right] \\&\quad \times \prod _{t=1}^{T-1} \prod _{r \in \mathcal{R}} \left[ \chi _t(r)^{v_t(r)} \prod _{\tau =t+1}^T \prod _{s \in \mathcal{R}} O_{t,\tau }(r,s)^{n_{t,\tau }(r,s)} \right] \end{aligned}$$

to obtain estimates of the capture probabilities. The population size is then estimated using a Horvitz–Thompson-like estimator; see McCrea and Morgan (2014, pp. 33–35) for the single-state case and Schwarz and Ganter (1995) for the multi-state case (although estimation of the total population size was not of interest in that study). Specifically,

$$\begin{aligned} \widehat{N}&= \frac{n}{1-\widehat{\rho }}, \end{aligned}$$
(3)

where \(\widehat{\rho }\) is calculated from Eq. (2) using the MLEs obtained from the conditional likelihood. Fewster and Jupp (2009) demonstrated that, for a wide class of models, the difference between the population size estimates obtained from the conditional and unconditional approaches is of order 1. However, the differences were large when capture probabilities included both behavioural and heterogeneity effects, and in this case they advocated the use of unconditional approaches. Here, we develop the multi-state unconditional approach.
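As a small worked example of the Horvitz–Thompson-like estimator in Eq. (3), the Python sketch below scales the number of observed individuals by the estimated probability of being observed at least once. The function name and the input values are hypothetical.

```python
def horvitz_thompson(n: int, rho_hat: float) -> float:
    """Population size estimate n / (1 - rho_hat) from Eq. (3)."""
    if not 0.0 <= rho_hat < 1.0:
        raise ValueError("rho_hat must lie in [0, 1)")
    return n / (1.0 - rho_hat)


# Hypothetical values: 80 observed individuals and an estimated 20% chance of never being seen.
N_hat = horvitz_thompson(n=80, rho_hat=0.2)
print(round(N_hat, 6))
```

In this illustration, 80 observed individuals and \(\widehat{\rho } = 0.2\) give an estimate of 100 individuals, of which 20 are estimated to have been missed.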

4 Simulation Study

We conduct a simulation study of the proposed \(M_h^{R}\) model (i.e. no temporal or behavioural effects but a constant capture probability for each discrete state). We compare the performance of fitting this true covariate model with the corresponding conditional approach and a range of alternative individual heterogeneity models which ignore the state covariate (models \(M_{h(2)}\), \(M_{h(3)}\), \(M_{h(be)}\) and \(M_{h(b-be)}\)), and models that ignore the individual heterogeneity altogether (models \(M_0\), \(M_t\) and \(M_b\)). Of particular interest are the bias and precision of the population estimates for the different models, especially those that do not account for state dependence in the capture probabilities or permit movement between states. We are also interested in how accurately the population size in each state on each occasion can be estimated using a forward–backward-type algorithm (see Appendix A of the online supplementary material). For each simulation, we assume that there are six encounter occasions (\(T=6\)) and a total population size of 100 individuals (\(N=100\)). We consider four different sets of parameter values for the simulation study, corresponding to different numbers of states (\(R=2\) and \(R=3\)) and different sets of transition matrices. We evaluate two scenarios corresponding to “low” mobility and “high” mobility between states. In particular, for \(R=2\) low mobility corresponds to \(\psi (1,2)=0.3\) and \(\psi (2,1)=0.2\); high mobility to \(\psi (1,2)=0.9\) and \(\psi (2,1)=0.6\). The equilibrium distribution for the low- and high-mobility cases is the same, and we set the initial state distribution to be equal to this equilibrium distribution, such that \(\alpha (1) = 0.4\) (and \(\alpha (2) = 0.6\)). Finally, for each of these cases, the capture probabilities for the different states are set to \(p(1)=0.15\) and \(p(2)=0.4\). For \(R=3\), the transition matrices are specified to be:

$$\begin{aligned} \left( \begin{array}{ccc} 0.76 &{} 0.12 &{} 0.12 \\ 0.1 &{} 0.8 &{} 0.1 \\ 0.15 &{} 0.15 &{} 0.7 \\ \end{array} \right)&\left( \begin{array}{ccc} 0.28 &{} 0.36 &{} 0.36 \\ 0.3 &{} 0.4 &{} 0.3 \\ 0.45 &{} 0.45 &{} 0.1 \\ \end{array} \right) \end{aligned}$$

for low and high mobilities, respectively. Again, these transition matrices are specified such that they have the same equilibrium distribution, and we set the initial state distribution to be equal to this equilibrium distribution, such that \(\alpha (1) = 0.33\), \(\alpha (2) = 0.4\) (and \(\alpha (3) = 0.27\)). The corresponding state-dependent capture probabilities are defined to be \(p(1)=0.15\), \(p(2) = 0.25\) and \(p(3)=0.4\). For each set of parameter values, we simulate a total of 1000 datasets and fit the following models to the data: \(M_0\), \(M_t\), \(M_b\), \(M_{h(2)}\), \(M_{h(3)}\), \(M_{h(be)}\), \(M_{h(b-be)}\), the true model \(M_h^{R}\) and the conditional model \(M_h^{R(c)}\). In the conditional model, N is not estimated directly through the likelihood but is calculated using the Horvitz–Thompson-like estimator in Eq. 3.
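The shared equilibrium distributions can be checked numerically, and a single data set can be generated under \(M_h^{R}\), with a short Python sketch. The function names `stationary` and `simulate_histories`, the use of power iteration, and the random seed are assumptions for illustration; the parameter values are those stated in the text.

```python
import random
from typing import List

Matrix = List[List[float]]


def stationary(P: Matrix, iters: int = 2000) -> List[float]:
    """Equilibrium distribution of a transition matrix by power iteration."""
    R = len(P)
    pi = [1.0 / R] * R
    for _ in range(iters):
        pi = [sum(pi[r] * P[r][s] for r in range(R)) for s in range(R)]
    return pi


def simulate_histories(N: int, T: int, alpha: List[float], P: Matrix,
                       p: List[float], seed: int = 1) -> List[List[int]]:
    """Simulate N histories under M_h^R: 0 = not captured, r = captured in state r."""
    rng = random.Random(seed)
    R = len(alpha)
    histories = []
    for _ in range(N):
        state = rng.choices(range(R), weights=alpha)[0]
        h = []
        for t in range(T):
            if t > 0:
                state = rng.choices(range(R), weights=P[state])[0]
            h.append(state + 1 if rng.random() < p[state] else 0)
        histories.append(h)
    return histories


# Transition matrices from the text.
P2_low, P2_high = [[0.7, 0.3], [0.2, 0.8]], [[0.1, 0.9], [0.6, 0.4]]
P3_low = [[0.76, 0.12, 0.12], [0.1, 0.8, 0.1], [0.15, 0.15, 0.7]]
P3_high = [[0.28, 0.36, 0.36], [0.3, 0.4, 0.3], [0.45, 0.45, 0.1]]
for P in (P2_low, P2_high, P3_low, P3_high):
    print([round(x, 3) for x in stationary(P)])

# One simulated two-state, low-mobility data set; only rows with a nonzero entry are observed.
data = simulate_histories(100, 6, [0.4, 0.6], P2_low, [0.15, 0.4])
observed = [h for h in data if any(h)]
print(len(observed))
```

For the three-state matrices, power iteration gives the exact equilibrium \((1/3,\ 2/5,\ 4/15)\), which rounds to the values \(0.33\), \(0.4\) and \(0.27\) quoted above.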

Figure 1 shows boxplots of the bias of the model parameters for the true model \(M_h^2\) (\(R=2\)). Figure 2 displays boxplots of population size estimates from all the models when \(R=2\). The corresponding plots for the three-state case are given in Figs. 3 and 4.

Fig. 1 Boxplots of the bias of the model parameters for the true model \(M_h^2\) for the simulated cases of low mobility (left) and high mobility (right). Parameter values used are given in the text.

Fig. 2 Estimates of population size (N) from nine models for the simulated cases of low mobility (left) and high mobility (right) between two discrete capture states. Parameter values used are given in the text.

Fig. 3 Boxplots of the bias of the model parameters for the true model \(M_h^3\) for the simulated cases of low mobility (left) and high mobility (right). Parameter values used are given in the text.

Fig. 4 Estimates of population size (N) from nine models for the simulated cases of low mobility (left) and high mobility (right) between three discrete capture states. Parameter values used are given in the text.

For the true model, in all cases considered, the estimates of N do not show any bias. In the two-state scenario, the remaining model parameters are estimated well with little difference in variation between low and high mobilities. In the three-state scenario, the remaining model parameters are estimated well in the case of low mobility. In the scenario of high mobility for the three-state case, some of the remaining model parameter estimates appear to exhibit some bias and there is very large variation in all of the parameter estimates. This appears to be due to an “averaging” or “mixing” effect across the states: there is greater uncertainty about the state of an individual when it is not captured, leading to greater uncertainty in the parameter estimates. The traditional models without any individual heterogeneity, \(M_0\), \(M_t\) and \(M_b\), show a strong negative bias for the case of low mobility for both the two- and three-state scenarios. For these three models, the variability in the estimates of N is similar for low and high mobilities for both two and three states. For the low-mobility scenarios, the heterogeneity models \(M_{h(2)}\), \(M_{h(3)}\), \(M_{h(be)}\) and \(M_{h(b-be)}\) all estimate N well, but they produce a large number of extremely large estimates. For the high-mobility scenario, the heterogeneity models \(M_{h(2)}\), \(M_{h(3)}\), \(M_{h(be)}\) and \(M_{h(b-be)}\) appear to be positively biased. This is due to underestimation of the capture or mixture probabilities, or both, caused by the mixing effect described above. When there are three states, the heterogeneity models have higher precision in estimating N when there is high mobility. In comparison with the existing heterogeneity models, the new multi-state model has the greatest precision of all the heterogeneity models considered for low and high mobility in both the two- and three-state cases.
The conditional model displays very similar results to the unconditional approach, which appears to agree with the findings of Fewster and Jupp (2009). The new multi-state model shows no bias in estimating the population size in each state on each occasion in all cases except the three-state high-mobility scenario. In the case of low mobility, the estimated number in states with higher capture probabilities shows less variation, and this is expected since more individuals in the state are captured leading to less uncertainty about the number in the state. For the three-state high-mobility case, the estimates are biased to varying degrees, and this is again due to the mixing issue described above and the high uncertainty of the model parameters. Plots of the bias of this state-dependent population size are given in Appendix B of the online supplementary material.

5 Application: Great Crested Newts

These data are collected from a study site on the University of Kent campus and are included as electronic supplementary material to this article. Data have been collected on the population of newts breeding at the site since 2002. Within this study, capture occasions occur weekly throughout the breeding season, with individuals being identified through unique physical markings. The additional state information corresponds to the pond in which the newts are captured. Originally the site consisted of four ponds but was extended in 2009 to a total of eight ponds, four “old” ponds (state 1) and four “new” ponds (state 2), with the new ponds first being colonised in 2010. Of specific interest is whether there were any differences between the old and new ponds in terms of capture and transition probabilities when they were first colonised and whether any differences have remained. In order to assess this, we compare results from the 2010 and 2013 data sets. In order to make the assumption of closure reasonable, we take a subset of six weeks (\(T=6\)) from the middle of each of the 2010 and 2013 breeding seasons, during which it can be assumed that all breeding newts have arrived at the breeding ponds and have not yet started to leave the area.

In the 2010 season, a total of 33 unique individuals were encountered over the study period considered, while in 2013 there were 44 unique individuals encountered. We fit a range of models to the data, initially considering the new multi-state model with heterogeneity, model \(M_h^2\). Table 1 provides the MLEs and 95% nonparametric bootstrap confidence intervals (CIs) for the parameters of the \(M_h^2\) model for the 2010 and 2013 data. Figure 5 shows the estimated population size in each state for each occasion for model \(M_h^2\) for the 2010 data and model \(M_t^2\) for the 2013 data (see model selection discussion below). The 95% CIs are calculated using a nonparametric bootstrap with 9999 bootstrap resamples to avoid intervals outside permissible boundary ranges.
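
As an illustration of the interval construction described above, the following is a minimal Python sketch of a percentile nonparametric bootstrap. Individuals (capture histories) are resampled with replacement, the estimator is recomputed on each resample, and the empirical 2.5% and 97.5% quantiles form the interval. The function names, the toy estimator and the toy data are hypothetical. In the analysis itself the estimator would be the MLE from the multi-state model, which is not reproduced here, and the percentile construction is an assumption rather than a method stated in the text.

```python
import random

def percentile_bootstrap_ci(histories, estimator, n_boot=9999, level=0.95, seed=1):
    """Percentile bootstrap CI obtained by resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(histories)
    estimates = []
    for _ in range(n_boot):
        resample = [histories[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    alpha = 1.0 - level
    # With n_boot = 9999 these are the 250th and 9750th ordered values.
    lo_idx = round((alpha / 2) * (n_boot + 1)) - 1
    hi_idx = round((1 - alpha / 2) * (n_boot + 1)) - 1
    return estimates[lo_idx], estimates[hi_idx]

def prop_recaptured(histories):
    """Toy estimator (hypothetical): proportion of individuals caught at least twice."""
    caught_twice = sum(1 for h in histories if sum(1 for s in h if s != 0) >= 2)
    return caught_twice / len(histories)

# Hypothetical histories over T = 6 occasions:
# 0 = not captured, 1 = captured in old ponds, 2 = captured in new ponds.
toy_histories = [
    (1, 0, 1, 0, 0, 2),
    (0, 0, 1, 0, 0, 0),
    (2, 2, 0, 2, 0, 0),
    (0, 1, 0, 0, 0, 0),
    (0, 0, 0, 2, 2, 0),
    (1, 1, 0, 0, 2, 2),
]
lower, upper = percentile_bootstrap_ci(toy_histories, prop_recaptured)
```

Because each bootstrap estimate is itself a permissible value of the estimator, the interval endpoints from this construction cannot fall outside the parameter's permissible range.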

Table 1 MLEs and 95% nonparametric bootstrap confidence intervals for the parameters of the \(M_h^2\) model for the great crested newt study 2010 and 2013 data.
Fig. 5

Point estimates and 95% nonparametric bootstrap confidence intervals for state- and occasion-dependent population size for model \(M_h^2\) (left) and \(M_t^2\) (right) for the great crested newt study 2010 and 2013 data, respectively.

The MLEs of the capture probabilities indicate that in 2010 the old ponds had a higher capture probability than the new ponds. However, by 2013 this difference seems to have disappeared, with similar capture probabilities for both old and new ponds (see below for discussion of model selection). The old ponds had more vegetation around the traps, which meant that the newts had a greater chance of entering them than in the new ponds, where traps were more exposed. In addition, for 2010 the transition probabilities indicate a general movement trend away from the old ponds to the new ponds, but once a newt reaches the new ponds it demonstrates high fidelity to them. This movement can be clearly seen in Fig. 5 (left-hand plot). Previous analyses suggested that new recruits (first-time breeders) used the new ponds more frequently than newts returning to the ponds (Lewis 2010). By 2013, the newts show high fidelity to both the old and new ponds. Finally, we note that in 2010 the newts appear to be evenly distributed between the old and new ponds at the beginning of the study period, but by 2013 the proportion of newts in the new ponds has increased (though the confidence intervals are reasonably wide).

Interestingly, the results imply that only a single individual was missed during the study period in 2010 and two were missed during the 2013 study period. These estimates are in keeping with the ecological understanding of the population. It is believed that capture probability over the breeding season as a whole is very high. This was supported in 2005, when a drift fence was set up and confirmed that all individuals had been captured at least once. A period of six weeks has been selected here in the central part of the breeding season, to accommodate the assumption of closure. In 2010, outside the selected six-week period, seven individuals were seen before but not during the study period, and one individual was seen after but not during it. No newts were captured both before and after. Of those seen only before, all were seen quite early in the season, while the one individual seen only after was seen immediately after the six-week period. In 2013, five individuals were seen before, but not during, the closed period (all were seen very early in the season) and one individual was captured before and after the study period, but not during.

We now consider in further detail the issue of model selection. We fit the additional models \(M_0^2\), \(M_t^2\) and \(M_{th}^2\) and compare them using the AIC statistic. For model \(M_{th}^2\), we specify the additional time dependence to be additive, so that \(\hbox {logit}\ p_t(2) = \left( \hbox {logit}\ p_t(1)\right) + \eta \). Table 2 provides the corresponding \(\Delta \)AIC values, estimates and 95% nonparametric bootstrap confidence intervals of N, and the chi-squared goodness-of-fit parametric bootstrap p-value for each model for both years of data. The model \(M_h^2\) is deemed optimal for the 2010 data (only state-dependent capture probabilities) and \(M_t^2\) for the 2013 data (only time-dependent capture probabilities). The corresponding parameter estimates for model \(M_t^2\) for 2013 are provided in Table 3. However, we note that in both cases the model \(M_{th}^2\) has a \(\Delta \)AIC \(<2\), indicating that there is little difference in support for the model deemed optimal and the model with both time- and state-dependent capture probabilities.
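
The additive structure of model \(M_{th}^2\) and the \(\Delta \)AIC comparison can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the numerical inputs in the usage lines are arbitrary placeholders, not values from Tables 2 or 3.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))

def state2_capture_probs(p_state1, eta):
    """Return p_t(2) for each occasion t, given p_t(1) and the additive
    state effect eta, using logit p_t(2) = logit p_t(1) + eta."""
    return [expit(logit(p) + eta) for p in p_state1]

def delta_aic(max_log_liks, n_params):
    """AIC = -2 log L + 2k for each model; return differences from the minimum."""
    aic = {m: -2.0 * max_log_liks[m] + 2.0 * n_params[m] for m in max_log_liks}
    best = min(aic.values())
    return {m: a - best for m, a in aic.items()}

# Placeholder inputs (not estimates from the newt data).
p_old = [0.2, 0.5, 0.8]
p_new = state2_capture_probs(p_old, eta=-1.0)  # negative eta lowers every p_t(2)
diffs = delta_aic({"A": -100.0, "B": -99.5}, {"A": 3, "B": 4})
```

A single parameter \(\eta \) shifts all occasion-specific capture probabilities on the logit scale, so the state effect costs one additional parameter regardless of \(T\).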

Table 2 \(\Delta \)AIC values, MLEs and 95% nonparametric bootstrap CIs for N (denoted \(\hat{N}\)) and corresponding chi-squared goodness-of-fit parametric bootstrap p-values for four multi-state models for the great crested newt study 2010 and 2013 data.
Table 3 MLEs and 95% nonparametric bootstrap confidence intervals for the parameters of the \(M_t^2\) model for the great crested newt study 2013 data.

All models fitted to the data suggest high fidelity to the old ponds in 2013 and to the new ponds in both years, an increase in the proportion of individuals arriving at the new ponds in 2013, and similar estimates for the total population size. The difference in choice of pond on arrival can be seen in Fig. 5. In comparison with 2010 (model \(M_h^2\)), model \(M_t^2\) for the 2013 data shows the distribution of newts between the old and new ponds converging to a near equal split between the two. Pearson’s chi-squared goodness-of-fit test (with parametric bootstrap p-values from 9999 bootstraps) does not indicate a lack of fit for the models fitted to the 2013 data. However, for the 2010 data, it is suggestive of a lack of fit. In conducting the goodness-of-fit test we do not pool small cells together. Fewer individuals are observed in 2010, leading to an increase in the number of small cells observed.
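
The idea behind a parametric bootstrap p-value for Pearson’s chi-squared statistic can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used for Table 2: the fitted cell probabilities are held fixed across bootstrap samples, whereas a full parametric bootstrap would typically refit the model to each simulated data set, and the cells here are generic multinomial categories rather than the model's sufficient statistics. All names and numbers are hypothetical.

```python
import random

def pearson_chi2(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E (requires E > 0)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def parametric_bootstrap_pvalue(observed, cell_probs, n_boot=9999, seed=1):
    """Simulate multinomial counts from the fitted cell probabilities and
    return the proportion of simulated statistics at least as large as the
    observed one (counting the observed data set itself)."""
    rng = random.Random(seed)
    n = sum(observed)
    expected = [n * p for p in cell_probs]
    stat_obs = pearson_chi2(observed, expected)
    cells = list(range(len(cell_probs)))
    exceed = 0
    for _ in range(n_boot):
        sim = [0] * len(cell_probs)
        for c in rng.choices(cells, weights=cell_probs, k=n):
            sim[c] += 1
        if pearson_chi2(sim, expected) >= stat_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)

# Hypothetical counts and fitted probabilities for four cells.
p_value = parametric_bootstrap_pvalue([12, 9, 7, 5], [0.4, 0.3, 0.2, 0.1], n_boot=999)
```

Simulating the reference distribution avoids relying on the asymptotic chi-squared approximation, which is unreliable when many cells have small expected counts.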

For comparative purposes, the estimates of population size and 95% nonparametric bootstrap confidence intervals resulting from alternative standard models are displayed in Table 4. The models without any individual heterogeneity component (\(M_0\), \(M_t\) and \(M_b\)) all provide similar estimates for N. However, we note that for 2010, the estimate of N is a boundary estimate for models \(M_t\) and \(M_b\). Further, for both 2010 and 2013, the three-group binomial mixture model \(M_{h(3)}\) and the binomial beta-binomial model \(M_{h(b-be)}\) also lead to boundary estimates for model parameters, essentially reducing the model to the two-group binomial mixture model \(M_{h(2)}\). For the 2013 data, the estimates are generally similar to those obtained for the multi-state model. However, for the smaller dataset in 2010, the models without any individual heterogeneity appear to underestimate the population size, while the mixture models provide larger estimates (and very wide confidence intervals).

Table 4 MLEs and 95% nonparametric bootstrap confidence intervals for N (denoted \(\hat{N}\)) for seven single-state models for the great crested newt study 2010 and 2013 data.

6 Discussion

We have focussed on deriving multi-state closed capture–recapture models, where additional individual time-varying discrete covariates are observed. The models derived can be viewed as a closed population analogue of the AS model, assuming a first-order Markovian process for the transitions between states. The construction of an explicit closed-form (unconditional) likelihood via a set of sufficient statistics permits efficient evaluation of the likelihood and the application of standard goodness-of-fit techniques, in the form of Pearson’s chi-squared tests. This can lead to generally small cell entries in the goodness-of-fit test, with different approaches for pooling cells and their interpretation a focus of current research. Similarities of the closed multi-state model to the AS model also permit other extensions to be immediately applied. For example, in many cases, state may be only partially observed, including failure to observe a state when an individual is observed, or observing states with error (King and McCrea 2014). In the case where no states are known with certainty, the model reduces to a multi-event-type model (Pradel 2005) corresponding to a finite mixture model which allows for transitions between states. Conditional on the observed number of individuals, these multi-event models can be fitted within E-SURGE (Choquet et al. 2009) to estimate the model parameters (though this package does not have the associated Horvitz–Thompson-like estimator incorporated into it). We note that in the limiting case where no states are observed upon recapture and there are no transitions between states, the model reduces to the mixture models proposed by Pledger (2000). Further, the modelling approach can be applied to the case of continuous individual time-varying covariates by considering an approximate (discretised) likelihood of multi-state form (Langrock and King 2013).
The movement between the different states can also be generalised by removing the first-order Markov assumption, where the dwell-time distribution (the time spent in each state) is geometric, and instead imposing a more general dwell-time distribution, for example a shifted Poisson or negative-binomial distribution (King and Langrock 2016).
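
To make the geometric dwell-time statement explicit, the following short derivation may help. The notation \(\psi_{ss}\) for the probability of remaining in state \(s\) between consecutive occasions, and \(D_s\) for the number of consecutive occasions spent in state \(s\), is introduced here for illustration only and may differ from the notation used elsewhere in the paper. Ignoring truncation at the end of the study period, a first-order Markov chain implies

\[ P(D_s = d) = \psi_{ss}^{\,d-1}\,(1-\psi_{ss}), \qquad d = 1, 2, \ldots, \qquad E(D_s) = \frac{1}{1-\psi_{ss}}, \]

so the most likely dwell time is always a single occasion. More general dwell-time distributions, such as the shifted Poisson, relax this restriction.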

The proposed multi-state closed population model shows better accuracy and precision in estimating N compared to competing models where the additional discrete state information is ignored. Further, additional insight can be obtained with regard to the states, which may themselves be of interest. Most notably, transition probabilities can be estimated (and hence the stable equilibrium distribution of the population over the states) and the relationship between state and capture probabilities evaluated. For the newt data analysis conducted, particular interest lies in the potential transition of newts from the old ponds to the new ponds installed in 2009, with an interest also in the total population size, not least with regard to the completeness of the data collection process and observing all individuals present. The analyses concluded that the survey data collection process appears to be close to a complete census of individuals present at the site, which is unusual for capture–recapture studies. Further, there was a general transition of newts from the old ponds to the new ponds between 2010 and 2013, but with little movement within the season. Finally, it appeared that there were initial differences between the capture probabilities in new and old ponds in 2010, but once the new ponds had become established, the state dependence was no longer evident by 2013.
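
As a sketch of how a stable equilibrium distribution follows from estimated transition probabilities, the Python snippet below iterates \(\pi \leftarrow \pi P\) until convergence. The function name and the transition matrix are hypothetical; the matrix entries are arbitrary illustrative values, not estimates from the newt data. The iteration assumes a chain in which every state can be reached from every other and which does not cycle deterministically.

```python
def stationary_distribution(P, tol=1e-12, max_iter=100000):
    """Power iteration for pi = pi P, where P[i][j] = Pr(move from state i to j)."""
    k = len(P)
    pi = [1.0 / k] * k  # start from a uniform distribution over states
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")

# Hypothetical two-state matrix (state 1 = old ponds, state 2 = new ponds).
P_example = [[0.7, 0.3],
             [0.1, 0.9]]
pi_example = stationary_distribution(P_example)
```

For two states the result can be checked by hand: the equilibrium proportion in state 1 is the probability of moving from state 2 to 1 divided by the sum of the two off-diagonal probabilities, here \(0.1/(0.3+0.1) = 0.25\).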

In the presence of an underlying multi-state system process for closed populations, an unconditional likelihood can be derived and MLEs of the model parameters obtained, extending the previous conditional approaches. In the absence of the observed discrete covariate data, existing heterogeneity models appear to perform adequately; however, including the covariate information does improve the precision of the population estimate, as would be expected. The model developed here can be extended to the open population case, permitting the estimation of both recruitment and departure times from the study population along with state-dependent capture and transition probabilities, i.e. to stopover models where departure times can additionally depend on time since recruitment. Developing these models using classical methods is a focus of current research.