Estimation of population size when capture probability depends on individual states

We develop a multi-state model to estimate the size of a closed population from ecological capture-recapture studies. We consider the case where capture-recapture data are not of a simple binary form, but where the state of an individual is also recorded upon every capture as a discrete variable. The proposed multi-state model can be regarded as a generalisation of the commonly applied set of closed population models to a multi-state form. The model permits individuals to move between the different discrete states, whilst allowing heterogeneity within the capture probabilities. A closed-form expression for the likelihood is presented in terms of a set of sufficient statistics. The link between existing models for capture heterogeneity are established, and simulation is used to show that the estimate of population size can be biased when movement between states is not accounted for. The proposed unconditional approach is also compared to a conditional approach to assess estimation bias. The model derived in this paper is motivated by a real ecological data set on great crested newts, Triturus cristatus.


Introduction
The models presented within this paper focus on the estimation of the size of a closed population using real ecological capture-recapture data on a population of great crested newts 1 arXiv:1708.00348v1 [stat.AP] 1 Aug 2017 Triturus cristatus. An assumption often made in the modelling of capture-recapture data is that of homogeneity in the probability of capture. When estimating the size of a closed population, one in which the population being sampled remains constant across all capture occasions, ignoring this assumption can lead to biased estimates of abundance (Seber, 1982;Hwang and Huggins, 2005). A number of models have been proposed that relax this homogeneity assumption. In particular, Pollock (1974) and Otis et al. (1978) proposed a set of eight closed population capture-recapture models. These models allow the probability of capture to be affected by three factors: time (capture probabilities vary by occasion); behaviour (probability of initial capture is different to all subsequent recaptures) and heterogeneity (each individual has a different capture probability). These models have been fitted using a variety of methods including maximum likelihood (Otis et al., 1978;Agresti, 1994;Norris and Pollock, 1996;Coull and Agresti, 1999;Pledger, 2000), the jackknife (Burnham andOverton, 1978, 1979;Pollock and Otto, 1983), moment methods based on sample coverage (Chao et al., 1992) and Bayesian methods (Casteldine, 1981;Gazey and Staley, 1986;Smith, 1988Smith, , 1991George and Robert, 1992;Diebolt and Robert, 1993;Ghosh and Norris, 2005;King and Brooks, 2008). To specifically address the problem of heterogeneity in the capture probabilities a variety of models have been proposed, including finite mixtures (Diebolt and Robert, 1993;Agresti, 1994;Norris and Pollock, 1996;Pledger, 2000) and infinite mixtures (Coull and Agresti, 1999;Dorazio and Royle, 2003). A comparison of examples of the two types of mixture through simulation are presented in Pledger (2005) and Dorazio and Royle (2005). An issue that commonly arises when estimating the size of closed populations is that different individual heterogeneity models which may be deemed to fit the data equally well can give rise to very different estimates of the abundance (Link, 2003(Link, , 2006. An extended mixture model which provides a convenient framework for model selection is presented in Morgan and Ridout (2008); see also Holzmann et al. (2006).
We extend the previous models for closed capture-recapture data to account for individual heterogeneity when the "state" of an individual is also recorded as a discrete variable. In standard closed capture-recapture studies, for each capture occasion, individuals from a closed population are sampled, returned to the population, and on subsequent occasions attempts are made to recapture them. Individuals within the population are marked when initially captured. If the marking method used assigns a unique mark to each captured 2 individual (e.g. individual ID tags, natural physical markings) then an individual encounter history can be constructed for each individual observed within the study. This history typically takes the form of a vector of 0s and 1s: with a 0 denoting an individual was not encountered and a 1 denoting an individual was captured. We consider the case where individuals may be observed in different discrete states. For example, states may correspond to "resting" and "foraging" or "breeding" and "not breeding". Observed histories then correspond to whether an individual is observed or not, and, given that an individual is observed, its corresponding state. For example, suppose that there are three states labelled 1, 2, and 3. The encounter history 1 0 0 3 2 0 represents that the given individual was observed (and marked) on occasion 1 in state 1, observed on occasion 4 in state 3 and occasion 5 in state 2 and unobserved on occasions 2, 3 and 6. In general, individuals are able to move between the states during the study period and the capture probabilities may be dependent on the state of an individual. For example, if the states correspond to "resting" and "foraging", the capture probabilities may be very different with a significantly higher capture probability for individuals in the "foraging" state compared to "resting". Failure to account for the state dependence may result in biased population estimates. To estimate the number of unobserved individuals (and hence the total population size) a model is fitted to the observed data, permitting the estimation of the associated model parameters and the number of unobserved individuals. For such models, a number of standard assumptions are made. These include the population as a whole is closed; marks can not be lost and individuals are identified without error (see McCrea and Morgan (2014), chapter 3, for further discussion and references therein).
The model we present can be considered a generalisation of the time-dependent multistate closed population model of Schwarz and Ganter (1995) to a form that additionally includes trap-dependence and heterogeneity in the capture probabilities. It may also be seen as a closed-form capture-recapture equivalent to the Arnason-Schwarz (AS) model (Arnason, 1972(Arnason, , 1973Brownie et al., 1993;Schwarz et al., 1993;King and Brooks, 2003;Lebreton et al., 2009), for open capture-recapture data. Initially developed for multi-site capture-recapture data, but more generally applicable to individuals captured in discrete states, the AS model is a multi-state generalisation of the Cormack-Jolly-Seber (CJS) model (Cormack, 1964;Jolly, 3 1965;Seber, 1965). The CJS and AS models allow for a time-dependence in the capture probabilities, with the AS model additionally able to allow capture probabilities to be statedependent. However, these models condition on the first capture of an individual and so are unable to estimate the total population size directly. Dupuis and Schwarz (2007) consider a similar multi-state extension for the Jolly-Seber model for estimating abundance in open populations, fitted within a Bayesian (data augmentation) framework.
We consider a similar AS-type state-dependence in the closed population scenario, within an explicit closed-form likelihood framework where population size is estimated directly through the likelihood. In particular, we compare the performance of the new unconditional multi-state model to the existing single-state models and a conditional approach where the population size is not estimated directly through the likelihood. We present the likelihood in terms of a set of minimal sufficient statistics which permits the fit of the model to be assessed using a Pearson chi-squared test.
The motivation for developing this methodology relates to a study on great crested newts.
A species with protected status in Europe, individuals within the study population are captured weekly throughout the breeding season, with the additional state information referring to the pond in which the individuals are captured. Originally consisting of four ponds, the study site was extended to a total of eight ponds in 2009 with the new ponds being first colonised in the 2010 breeding season. How these new ponds have been colonised, the effectiveness of the traps to capture individuals and whether there are differences in the capture probabilities between the old and new ponds are of particular interest. Ignoring any differences between the old and new ponds, for instance differing amounts of vegetation which may be affecting the probability of capture, may lead to poorer estimates of the total population size, and for this study the states themselves are of interest.
In Section 2 we review the construction of existing single-state closed population models in terms of sets of sufficient statistics, before introducing the likelihood function for the multi-state model in Section 3. The performance of the multi-state model is compared to a conditional approach and the existing heterogeneity models using simulation in Section 4 with a particular focus on the bias and precision of the population size estimates and the ability of the new model to estimate state specific parameters. The new model is applied to the data set of great crested newts in Section 5. The paper concludes with a general 4 discussion in Section 6.
2 Single-state closed population models Consider a study with T capture occasions labelled t = 1, . . . , T and let N denote the total population size, which is to be estimated. Let the set of encounter histories be given by Pr(encounter history for individual i). (1) The existing modelling approaches for data of this type differ by the capture probability parameter dependence. Using the notation of Otis et al. (1978), we denote models by M γ where γ describes the dependence structure placed on the capture probabilities. In general, γ ⊆ {t, b, h}, where t denotes temporal dependence; b denotes behavioural dependence (or trap response); and h denotes individual heterogeneity. This leads to a total of eight different model dependencies, corresponding to the inclusion/exclusion of each of the different types of dependence, with M 0 denoting the model with a constant capture probability which ignores all three dependencies. We note that given a particular form of heterogeneity, multiple models may be defined in terms of the specific dependence. For example, and of particular interest in this paper, a number of different models have been proposed to include individual heterogeneity. In particular, in the absence of additional individual covariate information, the capture probabilities have been specified as finite or infinite mixtures models. These include (with associated notation): M h(k) : individual capture probabilities come from a mixture model with k components (Pledger, 2000); M h(be) : individual capture probabilities specified to be from an underlying beta distribution (Burnham, 1972;Dorazio and Royle, 2003); : individual capture probabilities come from a mixture model with two components: one component simply has a fixed capture probability while the other component is specified to be from some underlying beta distribution (Morgan and Ridout, 2008).
For a given model, the corresponding likelihood function can be specified and maximised to obtain the MLEs of the model parameters (including the beta distribution parameters for models M h(be) and M h(b−be) ). The likelihood function given above is a function of the observed encounter histories x. However, this likelihood can be expressed more efficiently via the use of sufficient statistics for some of the models. In particular, for models M 0 and M h dependencies, a set of sufficient statistics is the Schnabel census (Schnabel, 1938), the study; y = T t=1 (t − 1)z t , corresponding to total number of capture events before initial observation summed over all captured individuals; and f = T t=1 n t , the total number of captures (equivalent to the equation for f given above).
The use of sufficient statistics allows for an efficient evaluation of the likelihood. In addition, they have the advantage of being able to be used to assess the performance of each of these models through the calculation of the Pearson chi-squared statistic, since the likelihood of the data is of multinomial form. The advantage of using sufficient statistics, compared to a bootstrap method, is that the likelihood and goodness-of-fit test require only one evaluation of the likelihood which may be advantageous for large populations. 6 We now extend the previous closed population models for standard encounter histories to those with individual time-varying discrete state information. In particular, we let R denote the set of possible discrete states, which for convenience we label as r = 1, . . . , R. Following the AS analogy to the CJS model, we assume that movement between these states is modelled as a first-order Markov process. We then define the following model parameters: p t (r): the probability an individual is initially captured at time t = 1, . . . , T given that the individual is in state r ∈ R at this time; c t (r): the probability an individual is recaptured at time t = 2, . . . , T given that the individual is in state r ∈ R at this time; r ∈ R}. We note that by definition, r∈R α(r) = 1. To retain model identifiability the recapture probabilities are specified as a function of the initial capture probabilities, such that logitc t (r) = logitp t (r) + β, where β denotes the trap dependence; β < 0 corresponds to a trap shy response; and β > 0 a trap happy response (see for example Chao (2001); King and Brooks (2008) for the analogous single-state case).

Likelihood formulation
The likelihood function is again of the same form as given in equation (1), where now the probability of the encounter history includes not only detection/non-detection at each time but also the associated discrete state. In order to evaluate the likelihood efficiently, we follow King and McCrea (2014) and consider all possible partial encounter histories that could be observed corresponding to (i) the beginning of the study to initial capture; (ii) successive captures; and (iii) final capture to the end of the study. This leads to the following sufficient statistics: 7 (i) z t (r): the number of individuals that are observed for the first time at time t = 1, . . . , T in state r ∈ R; (ii) n t 1 ,t 2 (r, s): the number of individuals that are observed at time t 1 = 1, . . . , T − 1 in state r ∈ R and next observed at time t 2 = t 1 + 1, . . . , T in state s ∈ R; (iii) v t (r): the number of individuals that are observed for the last time at time t = 1, . . . , T − 1 in state r ∈ R.
For notational convenience we set z = {z t (r) : t = 1, . . . , T, r ∈ R}, n = {n t 1 ,t 2 (r, s) : In order to express the likelihood as a function of the above sufficient statistics, we need to calculate the probabilities for each of the associated partial encounter histories. In deriving these probabilities, we consider similar notation to King and McCrea (2014). We let Q t 1 ,t 2 (r, s) denote the probability that an individual in state r ∈ R at time t 1 = 1, . . . , T − 1 is in state s ∈ R at time t 2 = t 1 + 1, . . . , T and not observed at any times between t 1 and t 2 .
The form of this probability is dependent on whether an individual has yet to be captured for the first time or has been previously captured on at least one occasion. We let Q P t 1 ,t 2 (r, s) denote the former and Q C t 1 ,t 2 (r, s) the latter scenario. (If the capture probabilities are not behaviour dependent this distinction is not required.) Then it follows immediately that Q C t 1 ,t 2 (r, s) follows analogously using the appropriate recapture probabilities. We now consider the probabilities associated with each of the above sufficient statistics. We begin by considering the probabilities associated with an individual being observed for the first time (i.e. case (i) and statistic z). Let ζ t (r) denote the probability an individual is initially captured at time t = 1, . . . , T in state r ∈ R. Then using a probabilistic argument we have, We now consider case (ii) (associated with statistic n) and the probability of being recaptured, conditional on the previous capture time. Let O t 1 ,t 2 (r, s) denote the probability 8 an individual in state r ∈ R at time t 1 = 1, . . . , T − 1 is next recaptured in state s ∈ R at time t 2 = t 1 + 1, . . . , T . Then, by definition, The final case (iii) (associated with statistic v) considers the probability an individual is not observed again within the study, following their final capture. Let χ t (r) denote the probability an individual in state r ∈ R at time t = 1, . . . , T − 1 is not observed again during the study. Then, for all r ∈ R it follows that, By definition an individual observed at the last capture time is clearly not able to be observed again within the study, i.e. the probability it is not seen again is 1 (this means that we do not need to consider this term within the likelihood expression).
Finally, in order to permit estimation of the total population size, we let ρ denote the probability an individual is not observed within the study. From the law of total probability (considering all possible states an individual may be in at the first and last capture time) it follows that ρ = r,s∈R α(r)(1 − p 1 (r))Q P 1,T (r, s)(1 − p T (s)).
The corresponding likelihood function, specified as a function of the sufficient statistics is of the form, The above likelihood allows for temporal effects, behavioural effects and individual heterogeneity (in the form of discrete covariates), which we represent notationally as M R tbh , where the superscript, R, denotes the number of discrete states. Sub-models can be derived by specifying restrictions on the model parameters. In particular the basic dependence structures can be described with M R 0 : p t (r) = c t (r) = p, for all r ∈ R and t = 1, . . . , T ; 9 M R t : p t (r) = c t (r) = p t , for all r ∈ R and t = 1, . . . , T ; M R b : p t (r) = p and c t (r) = c, for all r ∈ R and t = 1, . . . , T ; M R h : p t (r) = c t (r) = p(r), for all r ∈ R and t = 1, . . . , T ; with associated restrictions for models with multiple dependencies. We note that the case of heterogeneous capture probabilities is fully determined by the additional discrete state information where a capture probability is estimated for each discrete state and remains constant for each state across all capture occasions.
Evaluating the likelihood through the sufficient statistics uses recursions similar to those in hidden Markov models (HMMs) but in more efficient forms. In the HMM framework the likelihood considers each individual encounter history in turn. By using more efficient sufficient statistics we are able to reduce the number of operations required to calculate the likelihood. This is achieved by using the probabilities associated to each of the sufficient statistics for multiple partial histories. Bishop et al. (1975) classified population size modelling into both conditional and unconditional approaches. The unconditional approach involves maximising the full likelihood, written as a function of the observed capture histories (or associated sufficient statistics) and the number of unobserved individuals to obtain an MLE of the total population size N .

Conditional and Unconditional approaches
The conditional approach (Sanathanan, 1972) involves maximising the conditional likelihood (conditional on the number of observed individuals) to obtain estimates of the capture probabilities. The population size is then estimated using a Horvitz-Thompson-like estimator; see McCrea and Morgan (2014, pp.33-35) for the single-state case and Schwarz and Ganter (1995) for the multi-state case (though here estimation of the total population was not of interest for the given study). Specifically: whereρ is calculated using Equation 2 but using the MLEs obtained using the conditional likelihood. Fewster and Jupp (2009) demonstrated that, for a wide class of models, the difference between the population size estimates obtained from the conditional and unconditional approaches is of order 1. However the differences were large when capture probabilities included both a behavioural and heterogeneity effect, and in this case advocated the use of unconditional approaches. Here we develop the multi-state unconditional approach.

State uncertainty
Within the model derivation we have assumed states are known with certainty. In practice this may not be the case and the likelihood can be extended to incorporate state-uncertainty akin to the approach adopted by King and McCrea (2014). This involves introducing further parameters corresponding to state assignment probability terms. In the case where no states are known with certainty, the model reduces to a multi-event-type model (Pradel, 2005) corresponding to a finite mixture model which allows for transitions between states. Conditional on the observed number of individuals, these multi-event models can be fitted within E-SURGE (Choquet et al., 2009) to estimate the model parameters (though this package does not have the associated Horvitz-Thompson-like estimator incorporated into it). We further note that the limiting case where no states are observed upon recapture and there are no transitions between states, the model reduces to the mixture models proposed by Pledger (2000), where α represents the associated mixture probabilities.

Simulation study
We conduct a simulation study of the proposed M R h model (i.e. no temporal or behavioural effects but a constant capture probability for each discrete state). We compare the performance of fitting this true covariate model with the corresponding conditional approach and a range of alternative individual heterogeneity models which ignore the state covariate (models M h(2) , M h(3) , M h(be) and M h(b−be) ), and models that ignore the individual heterogeneity all together (models M 0 , M t and M b ). Of particular interest is the bias and precision of the population estimates for the different models, especially those that do not account for state-dependence in the capture probabilities. For each simulation we assume that there are six encounter occasions (T = 6) and a total population size of 100 (N = 100). We consider four different sets of parameter values for the simulation study, corresponding to different numbers of states (R = 2 and R = 3) and different sets of transition matrices. We evaluate two scenarios corresponding to "low" mobility and "high" mobility between states.
In particular, for R = 2 low mobility corresponds to ψ(1, 2) = 0.3 and ψ(2, 1) = 0.2; high mobility to ψ(1, 2) = 0.9 and ψ(2, 1) = 0.6. The equilibrium distribution for the low and high mobility cases are the same, and we set the initial state distribution to be equal to this equilibrium distribution, such that α(1) = 0.4 (and α(2) = 0.6). Finally, for each of these cases, the capture probabilities for the different states are set to p(1) = 0.15 and p(2) = 0.4. For R = 3, the transition matrices are specified to be:  For the true model, in all cases considered, the estimates of N do not show any bias.
In the two-state scenario the remaining model parameters are estimated well with no obvious differences in variation between low and high mobility. In the three-state scenario the remaining model parameters are estimated accurately in the case of low mobility. In the scenario of high mobility for the three-state case some of the remaining model parameter estimates appear to exhibit some bias and there is very large variation in all of the parameter estimates. This appears to be due to an "averaging" or "mixing" effect across the When there are two-states the heterogeneity models have similar precision for both low and high mobility. In the three-state scenario for low mobility the heterogeneity models estimate N reasonably well. For high mobility they display a tendency to overestimate N but only marginally. When there are three-states the heterogeneity models have higher precision when there is high mobility. In comparison to the existing heterogeneity models, the new multi-state model has the greatest precision of all the heterogeneity models considered for low and high mobility in both the two-and three-state cases. The conditional model displays very similar results to the unconditional approach. [ The MLEs of the capture probabilities indicate that in 2010 the old ponds had a higher capture probability than for the new ponds. However, by 2013 the higher capture probability for the old ponds seems to have disappeared with similar capture probabilities for both old and new ponds (see below for discussion of model selection). The old ponds had more 14 vegetation around the traps which meant that the newts had a greater chance of entering them than in the new ponds, where traps were more exposed. In addition, for 2010 the transition probabilities indicate a general movement trend away from the old ponds to the new ponds, but once a newt reaches the new pond demonstrates a high fidelity to the new ponds. Previous analyses suggested that new recruits (first time breeders) used the new ponds more frequently than newts returning to the ponds (Lewis, 2012). By 2013 the newts show high fidelity to both the old and new ponds. Finally we note that in 2010 the newts appear to be evenly distributed between the two ponds at the beginning of the study period, but by 2013, the proportion of newts increases in the new ponds by 2013 (though the confidence intervals are reasonably wide).
Interestingly, the results imply that only a single individual was missed during the study period in 2010 and two were missed during the 2013 study period. Whilst the total population size is not the main focus of this study, these estimates are in keeping with the ecological understanding of the population. It is believed that capture probability over the breeding season as a whole is very high. This was confirmed in 2005 when a drift fence was set up confirming that all individuals had been captured at least once. A period of six weeks has been selected here in the central part of the breeding season, to accommodate the assumption of closure. Outside of the selected six week period, in 2010, 7 individuals were seen before the selected period, but not during the study and one individual after but not during the study period. No newts were captured both before and after. Of those seen only before, all were seen quite early in the season, whilst the one individual seen after the study period is seen immediately after the 6 week period. In 2013, 5 individuals were seen before, but not during, the closed period (all were seen very early in the season) and one individual is recaptured before and after the study period, but not during.
We now consider in further detail the issue of model selection. We fit the additional models M 2 0 , M 2 t and M 2 th and compare them using the AIC statistic. For model M 2 th we specify the additional time dependence to be additive, so that logit p t (2) = logit p t (1) + η.  Table 3. However, we note that in both cases the model M 2 th has a ∆AIC < 2, indicating that there is little difference in support for the model deemed optimal and the model with both time and state dependent capture probabilities.
[ All models fitted to the data suggest high fidelity to the old ponds in 2013 and the new ponds in both years with an increase in the proportion of individuals arriving at the new ponds in 2013 with similar estimates for the total population size,. The Pearson's chi-squared goodness-of-fit test does not indicate a lack of fit for the models fitted to the 2013 data. However, for the 2010 data, it is suggestive of a lack of fit. In conducting the goodness-of-fit test we do not pool small cells together so that the p-values will generally be an underestimate (i.e. we are more likely to get a significant result than if an exact test is used). In addition, fewer individuals are observed in 2010 leading to an increase in the number of small cells observed, and hence an expectation of smaller p-values.
For comparative purposes, the estimates of population size resulting from alternative standard models are displayed in Tables 6 and 7  [ We have focussed on deriving multi-state closed capture-recapture models, where additional individual time-varying discrete covariates are observed. The models derived can be viewed as a closed population analogy to the AS model, assuming a first-order Markovian process for the transitions between states. The construction of an explicit closed-form (unconditional) likelihood via a set of sufficient statistics permits efficient evaluation of the likelihood and standard goodness-of-fit techniques, in the form of Pearson's chi-shared tests, to be applied.
This can lead to generally small cell entries in the goodness-of-fit test, with different approaches for pooling cells and their interpretation a focus of current research. Similarities of the closed multi-state model to the AS model also permit other extensions to be immediately applied. For example, in many cases, state may be only partially observed, including failure to observe a state when an individual is observed, or observing states with error (King and McCrea, 2014). Further, the modelling approach can be applied to the case of continuous individual time-varying covariates by considering an approximate (discretised) likelihood of multi-state form (Langrock and King, 2013). The movement between the different states can also be generalised by removing the first-order Markov assumption, where the dwell-time distribution (the time spent in each state) is geometric, and instead imposing a more general dwell-time distribution, for example a shifted Poisson or negative-binomial distribution (King and Langrock, 2016).
The proposed multi-state closed population model shows better accuracy and precision in estimating N compared to competing models where the additional discrete state information is ignored. Further, additional insight can be obtained with regard to the states, which may themselves be of interest. Most notably, transition probabilities can be estimated (and hence the stable equilibrium distribution of the population over the states) and the relationship between state and capture probabilities evaluated. For the newt data analysis conducted, particular interest lay in the potential transition of newts from the old ponds to the new ponds installed in 2009 with a interest also in the total population size, not least with regard to the completeness of the data collection process and observing all individuals present.
The analyses concluded that the data survey collection process appears to be close to a complete census of individuals present at the site which is unusual for capture-recapture studies. Further, that there was a general transition of newts from the old ponds to the new ponds between 2010 and 2013, but with little movement within the season. Finally, it appeared that there were initial differences between the capture probabilities in new and old ponds, in 2010, but once the new ponds had become established, the state-dependence was no longer significant by 2013.
In the presence of an underlying multi-state system process for closed populations, an unconditional likelihood can be derived and MLEs of the model parameters obtained, extending the previous conditional approaches. In the absence of the observed discrete covariate data, existing heterogeneity models appear to perform adequately, however, including the covariate information does improve the precision of the population estimate, as would be expected.
The model developed here can be extended to the open population case, permitting both recruitment and departure from the study population, i.e. to stopover models. This is the focus of current research.