1 Introduction

Capture–recapture surveys are widely used in the study of wildlife populations to understand the population dynamics necessary for management and conservation. These surveys involve repeatedly sampling the population over a series of capture occasions. Observations at each capture occasion may take the form of physical captures and/or visual sightings of animals. Individuals are uniquely identified using, for example, a ring or tag applied at initial capture, or via natural markings. The observed data correspond to the set of capture histories for the individuals observed in the study, detailing whether or not each is observed at each capture occasion. Capture–recapture studies may be assumed to be closed or open, depending on whether the population remains constant throughout the study or may change due to births, deaths and/or migration, respectively. The parameters of interest typically differ between such studies: closed population models primarily focus on abundance estimation, while open population models often focus on the estimation of survival probabilities, although these may also extend to abundance. For a review of these data and associated models see, for example, McCrea and Morgan (2015) and Seber and Schofield (2019).

Incorporating heterogeneity in capture–recapture models can be important for modelling the underlying system processes. Omitting such sources of heterogeneity can lead to biased and/or misleading results (Rosenberg et al. 1995; Schwarz 2001; Chao et al. 2001; White and Cooch 2017). Otis et al. (1978) described three main sources of heterogeneity: temporal, behavioural and individual. In this paper we focus on individual heterogeneity, which can often be incorporated via observable characteristics, such as gender, breeding status or condition. King and McCrea (2019) categorised observed individual covariates into a \(2\times 2\) cross-classification: continuous/discrete-valued and time-varying/time-invariant. We focus on the more challenging case of continuous-valued covariates. Missing values often arise for such covariate information due to, for example, imperfect data collection or simply the structure of the experimental design. In particular, for stochastic time-varying covariates, if an individual is not observed, the corresponding covariate value is also unknown at that time. In general, for continuous-valued covariates, the observed data likelihood is only expressible as an analytically intractable integral, leading to model-fitting challenges. Previous model-fitting approaches include Bayesian data augmentation (Bonner and Schwarz 2006; King and Brooks 2008; Bonner et al. 2010); an approximate discrete hidden Markov model (Langrock and King 2013); and a two-step multiple imputation approach (Worthington et al. 2015). However, these approaches are computationally expensive and do not scale to large datasets.

Alternatively, unobservable individual heterogeneity can be modelled via individual random effects. These may be specified either as a finite mixture model (Pledger 2000; Pledger et al. 2003) or as an infinite mixture model (Coull and Agresti 1999; Dorazio and Royle 2003; King and Brooks 2008). However, we note that identifiability issues may arise in terms of the distributional assumption of this heterogeneity, with different models leading to the same distribution for the observed data (Link 2003, 2006), indicating that some sensitivity analyses are advisable in practice. For the finite case, a closed form expression for the likelihood is available, summing over the mixture components; for the infinite case the necessary integration is generally analytically intractable (though see Dorazio and Royle (2003) for a special case for closed models assuming a Beta distribution). Again, within this paper we focus on the case of continuous-valued individual random effects. A variety of approaches have been applied to fit random effect individual heterogeneity models to data, including conditional likelihood (Huggins and Hwang 2011); Bayesian data augmentation (King and Brooks 2008; Royle et al. 2007; Royle 2008); numerical integration (Coull and Agresti 1999; Gimenez and Choquet 2010; White and Cooch 2017); and combined numerical integration and data augmentation (Bonner and Schofield 2014; King et al. 2016). Observed and unobserved heterogeneity can be jointly considered via a mixed model specification, including both covariate information and additional random effects, with similar model-fitting tools applied [see, for example, King et al. (2006), Gimenez and Choquet (2010), Stoklosa et al. (2011)].

We develop a computationally efficient Laplace approximation for the analytically intractable integral in the likelihood function in the presence of individual heterogeneity for capture–recapture data, which we subsequently maximise numerically to obtain the MLEs of the parameters. We apply the approach to fit (i) a closed population model with individual random effects (using a higher order Laplace approximation for improved accuracy) and (ii) an open population model with time-varying continuous covariate information. For the open population model, we use the automatic differentiation tools in the R package Template Model Builder (TMB; Kristensen et al. 2016) to approximate the likelihood. TMB builds on the approach of the Automatic Differentiation Model Builder (ADMB) package, where the objective function is written in C++. The approach is scalable in both the number of random effects and the number of parameters, i.e. it is designed to be fast for handling many random effects (\(\approx 10^6\)) and parameters (\(\approx 10^3\)) (Kristensen et al. 2016), since the most challenging computation (the second derivatives) is no longer expensive due to the automatic differentiation functionality within TMB.

In Sect. 2 we introduce the notation and individual heterogeneity models. In Sect. 3 we describe the associated Laplace approximation for the closed and open capture–recapture models we consider. In Sect. 4 we conduct a simulation study of these systems, before analysing two real datasets in Sect. 5. Finally we conclude with a discussion in Sect. 6.

2 Capture–Recapture Models

We begin with a brief description of the general notation for capture–recapture studies and model parameters, before describing the specific models for closed and open populations that we consider in detail with their associated observed data likelihoods.

2.1 Notation

We let \(t=1,\ldots ,T\) denote the capture occasions within the study; and \(i=1,\ldots ,n\) the observed individuals over all the capture occasions. Then, for each individual \(i=1,\ldots ,n\) and capture occasion, \(t=1,\ldots ,T\) we let,

$$\begin{aligned} x_{it} = {\left\{ \begin{array}{ll} 1 & \quad \text {if individual } i \text { is observed at time } t; \\ 0 & \quad \text {if individual } i \text { is unobserved at time } t. \end{array}\right. } \end{aligned}$$

The capture history associated with individual \(i=1,\ldots ,n\) is denoted by \(\varvec{x}_{i}=\{x_{it}:t=1,\ldots ,T\}\); and the set of capture histories \(\varvec{x} = \{x_{it}:i=1,\ldots ,n; t=1,\ldots ,T\}\). Finally, we let \(f_i\) and \(\ell _i\) denote the first and last time individual i is observed, respectively, for \(i=1,\ldots ,n\).
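For concreteness, a minimal R sketch (R being the environment we use throughout, via the TMB package) computing \(f_i\) and \(\ell _i\) from a capture-history matrix; the matrix below is purely illustrative and not part of any dataset in this paper:

```r
## Illustrative capture-history matrix: n = 4 individuals, T = 5 occasions
x <- matrix(c(0, 1, 0, 1, 0,
              1, 0, 0, 0, 1,
              0, 0, 1, 0, 0,
              1, 1, 1, 0, 1),
            nrow = 4, byrow = TRUE)

## First (f_i) and last (l_i) occasions at which each individual is observed
f <- apply(x, 1, function(h) min(which(h == 1)))
l <- apply(x, 1, function(h) max(which(h == 1)))
```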

The associated model parameters depend on the specific capture–recapture model being fitted. We describe the general set of parameters and indicate whether they apply to the closed and/or open population models considered within this paper:

  • N = the total population size (closed);

  • \(p_{it} = \mathbb {P}(\text {individual } i \text { is observed at time } t \mid \text {available for capture at time } t)\) (open and closed);

  • \(\phi _{it} = \mathbb {P}(\text {individual } i \text { is alive at time } t+1 \mid \text {alive at time } t)\) (open).

We let \(\varvec{p}= \{p_{it}:i=1,\ldots ,N; t=1,\ldots ,T\}\), and similarly for \(\varvec{\phi }\).

We consider the individual heterogeneity model component denoted by \(\varvec{y}= \{\varvec{y}^{\mathrm{obs}},\varvec{y}^{\mathrm{mis}}\}\), where \(\varvec{y}^{\mathrm{obs}}\) and \(\varvec{y}^{\mathrm{mis}}\) denote the observed and unobserved individual heterogeneity components, respectively. For example, \(\varvec{y}^{\mathrm{obs}}\) may correspond to observed covariate values; while \(\varvec{y}^{\mathrm{mis}}\) the missing covariate values or random effect terms. In this paper we assume that \(\varvec{y}^{\mathrm{mis}}\) is continuous-valued. We let \(\varvec{\psi }\) denote the associated individual heterogeneity model parameters; and \(\varvec{\theta }\) the model parameters to be estimated.

The corresponding data are given by \(\{\varvec{x}, \varvec{y}^{\mathrm{obs}}\}\), with associated observed data likelihood,

$$\begin{aligned} f(\varvec{x}, \varvec{y}^{\mathrm{obs}}|\varvec{\theta }, \varvec{\psi }) = \int _{\varvec{y}^{\mathrm{mis}}} f(\varvec{x}| \varvec{\theta }, \varvec{y}) f(\varvec{y}| \varvec{\psi }) \, d \varvec{y}^{\mathrm{mis}}, \end{aligned}$$
(1)

where \(f(\varvec{x}|\varvec{\theta }, \varvec{y})\) denotes the complete data likelihood; and \(f(\varvec{y}| \varvec{\psi })\) the random effect or covariate model component. This likelihood is, in general, analytically intractable. (If there are discrete-valued elements of \(\varvec{y}^{\mathrm{mis}}\), the corresponding integral becomes a summation.) We now describe a closed and an open capture–recapture model, considering a continuous-valued random effects model and a time-varying continuous-valued individual covariate model, respectively.

2.2 Closed \(M_h\)-type Models: Unobserved Heterogeneity

We consider the closed models, \(M_{tbh}\), proposed by Otis et al. (1978), where the subscripts correspond to temporal, behavioural and individual heterogeneity, respectively, and denote which forms of heterogeneity are present in a given model. Additional heterogeneity can be modelled via observed covariates (Stoklosa et al. 2011), but we do not consider this case here. The total population size, N, is the parameter of primary interest. The recapture probabilities, \(p_{it}\), are expressed such that \(h(p_{it}) = \alpha _t + \lambda S_{it} + \epsilon _i\), where \(S_{it}=0\) if \(t \le f_i\); \(S_{it}=1\) if \(t > f_i\); and h denotes a link function whose inverse constrains the recapture probabilities to the interval [0, 1]. Within this paper we assume a logistic relationship, so that \(h \equiv \) logit. The \(\alpha _t\) terms correspond to the temporal component; \(\lambda \) the behavioural component; and \(\epsilon _i\) the individual random effect term, which we assume to be of the form \(\epsilon _i \sim \text {N}(0,\sigma ^2)\). The individual heterogeneity sub-models \(M_h\), \(M_{th}\), \(M_{bh}\) are obtained by setting restrictions on the parameters. For example, in the absence of a behavioural effect \(\lambda = 0\); and when there are no temporal effects \(\alpha _t = \alpha \) for all \(t=1,\ldots ,T\). The conditional likelihood, given the capture probabilities, \(\varvec{p}= \{p_{it}: i=1,\ldots ,N, t=1,\ldots ,T\}\), is of multinomial form (omitting the constant coefficient terms):

$$\begin{aligned} f(\varvec{x}| N, \varvec{p}) = \frac{N!}{(N-n)!} \prod _{i=1}^n \prod _{t=1}^{T} p_{it}^{x_{it}}(1-p_{it})^{(1-x_{it})}. \end{aligned}$$

However, the capture probabilities are specified via a random effect component. The model parameters are \(\varvec{\theta }= \{N, \varvec{\alpha }, \lambda \}\), with individual heterogeneity model parameter \(\varvec{\psi }= \{\sigma ^2\}\). Thus, assuming there is no additional observed individual covariate information within the study, we have \(\varvec{y}^{\mathrm{obs}} = \emptyset \) and \(\varvec{y}^{\mathrm{mis}} = \{\epsilon _i: i=1,\ldots ,N\}\). We let \(\varvec{x}_0 = \{x_{01},\ldots ,x_{0T}\}\) denote the unobserved capture history with all entries equal to zero (i.e. \(x_{0t} = 0\) for all \(t=1,\ldots ,T\)), corresponding to each unobserved individual \(i=n+1,\ldots ,N\). Further, the associated random effect of an unobserved individual is denoted by \(\epsilon _0\), with associated capture probability at time t given by \(h(p_{0t}) = \alpha _t + \epsilon _0\). The observed data likelihood of Eq. (1) can be expressed as,

$$\begin{aligned}& f(\varvec{x} | \varvec{\theta }, \sigma ^2) \propto \frac{N!}{(N-n)!} \prod _{i=1}^n \left[ \int f(\varvec{x}_i| \varvec{\theta }, \epsilon _i) f(\epsilon _i|\sigma ^2) \, \mathrm{d}\epsilon _i\right] \nonumber \\&\quad \times \left[ \int f(\varvec{x}_0| \varvec{\theta }, \epsilon _0) f(\epsilon _0|\sigma ^2) \, \mathrm{d}\epsilon _0 \right] ^{N-n}, \end{aligned}$$
(2)

where \(f(\epsilon _i|\sigma ^2)\) denotes the density function for the unobserved heterogeneity process; and \(f(\varvec{x}_i|\varvec{\theta }, \epsilon _i)\) the probability of the associated capture history, such that

$$\begin{aligned} f(\varvec{x}_i|\varvec{\theta }, \epsilon _i) = \prod _{t=1}^{T} p_{it}^{x_{it}}(1-p_{it})^{(1-x_{it})}. \end{aligned}$$

We note that the likelihood can be written more efficiently by further considering only unique observed capture histories, but for notational simplicity we retain the product over all observed individuals within this expression.
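To make Eq. (2) concrete, the following R sketch evaluates the observed data negative log-likelihood for model \(M_h\) (i.e. \(\alpha _t = \alpha \), \(\lambda = 0\)) by brute-force one-dimensional numerical integration with stats::integrate; the function and its parameterisation are ours for illustration, not the implementation used in this paper:

```r
## Negative log-likelihood for model M_h, Eq. (2), with logit(p_it) = alpha + eps_i.
## x: n x T capture-history matrix; theta = (log(N - n), alpha, log(sigma)),
## so that N > n by construction (N is treated as continuous via lgamma).
Mh_negloglik <- function(theta, x) {
  n <- nrow(x); T <- ncol(x)
  N <- n + exp(theta[1])
  alpha <- theta[2]; sigma <- exp(theta[3])
  ## Integral for one history h: int f(h | alpha, eps) f(eps | sigma^2) d eps
  int_i <- function(h) {
    integrand <- function(eps) {          # vectorised over eps
      p <- plogis(alpha + eps)
      sapply(seq_along(eps), function(k)
        prod(p[k]^h * (1 - p[k])^(1 - h))) * dnorm(eps, 0, sigma)
    }
    integrate(integrand, -Inf, Inf)$value
  }
  L_obs <- sum(log(apply(x, 1, int_i)))   # observed individuals
  L_0   <- log(int_i(rep(0, T)))          # all-zero capture history
  -(lgamma(N + 1) - lgamma(N - n + 1) + L_obs + (N - n) * L_0)
}

## MLEs via numerical minimisation, e.g.:
## fit <- nlm(Mh_negloglik, p = c(log(50), 0, 0), x = x)
```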

Previous approaches for dealing with the intractable likelihood include the use of Gauss–Hermite quadrature to estimate the integral (Coull and Agresti 1999). White and Cooch (2017) evaluated this approach further via simulation for different parameter values, and concluded that the results were generally unbiased except for relatively low recapture probabilities and/or few capture occasions. Further, they demonstrated that for larger individual heterogeneity variances a greater number of quadrature points is required to retain accuracy. Alternatively, a Bayesian data augmentation approach has been applied, treating the individual heterogeneity terms as additional parameters (or auxiliary variables) and calculating the joint posterior distribution over both parameters and auxiliary variables. However, in this approach the number of auxiliary variables is itself unknown (it is equal to the unknown parameter N), leading to the use of a reversible jump algorithm (King and Brooks 2008) or a super-population approach (Durban and Elston 2005; Royle et al. 2007). To address these issues, King et al. (2016) proposed a computationally efficient semi-complete data likelihood model-fitting approach, combining data augmentation for the individual heterogeneity terms of observed individuals with a numerical integration scheme for the likelihood component corresponding to the unobserved individuals. A similar approach was proposed by Bonner and Schofield (2014) for the case of individual continuous covariates for closed populations (assumed constant within the study period), using a Monte Carlo approach to perform the numerical integration necessary for the likelihood component associated with the unobserved individuals.
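For reference, a sketch of the Gauss–Hermite quadrature approximation of the same per-individual integral: the nodes and weights are computed here from scratch via the Golub–Welsch eigenvalue method (equivalent routines are available in packages such as statmod), and the change of variable \(\epsilon = \sqrt{2}\sigma z\) maps the \(\text {N}(0,\sigma ^2)\) integral to the Hermite weight function \(e^{-z^2}\):

```r
## Gauss-Hermite nodes/weights for weight function exp(-z^2) (Golub-Welsch):
## eigendecomposition of the symmetric tridiagonal Jacobi matrix
gauss_hermite <- function(Q) {
  J <- matrix(0, Q, Q)
  k <- seq_len(Q - 1)
  J[cbind(k, k + 1)] <- J[cbind(k + 1, k)] <- sqrt(k / 2)
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

## GHQ approximation of int f(h | alpha, eps) N(eps; 0, sigma^2) d eps
ghq_int <- function(h, alpha, sigma, gh) {
  eps <- sqrt(2) * sigma * gh$nodes      # change of variable
  vals <- sapply(eps, function(e) {
    p <- plogis(alpha + e)
    prod(p^h * (1 - p)^(1 - h))
  })
  sum(gh$weights * vals) / sqrt(pi)
}

## gh <- gauss_hermite(50)
## ghq_int(c(0, 1, 0, 1, 0), alpha = -1, sigma = 2, gh)
```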

2.3 Open Capture–Recapture Models: Observed Heterogeneity

We consider Cormack–Jolly–Seber-type (CJS) models (Cormack 1964; Jolly 1965; Seber 1965) for open populations, which permit entries and exits from the population over the study period. We focus on the case where the survival probabilities are modelled as a function of individual time-varying continuous covariates, to explain the individual and temporal variability.

Notationally, we let \(y_{it}\) denote the covariate value associated with individual \(i=1,\ldots , n\) at time \(t=f_i, \ldots ,T\); and set \(\varvec{y}_i = \{y_{it}:t=f_i,\ldots ,T\}\). The survival probability is specified as a function of the covariate values such that \(h(\phi (y_{it})) = \beta _0 + \beta _1 y_{it}\), for all \(t=f_i,\ldots ,T-1\) and \(i=1,\ldots ,n\), where h again denotes a link function (with inverse mapping to the interval [0, 1]); and \(\beta _0\) and \(\beta _1\) denote the corresponding regression parameters. Similarly, we may specify the recapture probabilities to be a function of the covariate such that \(h(p(y_{it})) = \gamma _0 + \gamma _1 y_{it}\), for \(t=f_i+1,\ldots ,T\) and \(i=1,\ldots ,n\), assuming the same functional form for h for simplicity; and where \(\gamma _0\) and \(\gamma _1\) are the associated regression parameters. For notational convenience, we let \(\varvec{\beta }=\{\beta _0, \beta _1\}\) and \(\varvec{\gamma }=\{\gamma _0, \gamma _1\}\). Further, we define the stochastic model for the covariate values, assuming a first-order Markovian process, such that, for \(t=f_i,\ldots ,T-1\) and \(i=1,\ldots ,n\),

$$\begin{aligned} y_{it+1} | y_{it} \sim N(y_{it} + \mu _t, \sigma ^2_y). \end{aligned}$$
(3)

Clearly, suitable covariate models will be dependent on the given covariate(s) and biological knowledge. We note that in the case where the covariate value is not observed at initial capture, we also need to specify a distribution for the initial covariate values. However, for simplicity, we assume that the covariate values at initial capture are known for each individual, as is the case in our case study; the approach is easily generalisable.
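To illustrate the data-generating process just described, a short R sketch simulating the covariate random walk of Eq. (3) together with the survival and recapture processes under the logit link; all parameter values are arbitrary and for illustration only:

```r
set.seed(1)
n <- 200; T <- 6
beta <- c(0.2, 0.1); gamma <- c(-0.5, 0.05)   # illustrative values
mu <- rep(0.5, T - 1); sigma_y <- 1

y <- matrix(NA, n, T); x <- matrix(0, n, T)
f <- sample(1:(T - 1), n, replace = TRUE)     # first capture occasions
for (i in 1:n) {
  y[i, f[i]] <- rnorm(1, 20, 2)               # covariate known at first capture
  x[i, f[i]] <- 1
  alive <- TRUE
  for (t in f[i]:(T - 1)) {
    y[i, t + 1] <- rnorm(1, y[i, t] + mu[t], sigma_y)                  # Eq. (3)
    alive <- alive && (runif(1) < plogis(beta[1] + beta[2] * y[i, t])) # survival
    if (alive && runif(1) < plogis(gamma[1] + gamma[2] * y[i, t + 1]))
      x[i, t + 1] <- 1                                                 # recapture
  }
}
## In an analysis, y[i, t] is "missing" whenever x[i, t] == 0 for t > f[i]
```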

For capture–recapture studies we do not observe all the individual covariate values. Assuming that the covariate model is stochastic, if an individual is unobserved the associated covariate value is, by definition, also unknown; further if an individual is observed, the covariate value may still not be recorded. Finally, we let \(\varvec{\zeta }^1_i\) denote the set of times for which the covariate values are observed for individual i and \(\varvec{\zeta }^0_i\) the set of times following initial capture for which the covariate value is unknown for individual i.

To express the likelihood, we let \(\varvec{y}^{\mathrm{obs}}_i = \{y_{it}: t \in \varvec{\zeta }^1_i\}\) and \(\varvec{y}^{\mathrm{mis}}_i = \{y_{it}: t \in \varvec{\zeta }^0_i\}\) denote the observed and unobserved covariate values associated with individual \(i=1,\ldots ,n\). The full set of observed and unobserved covariate values are \(\varvec{y}^{\mathrm{obs}} = \{\varvec{y}_{i}^{\mathrm{obs}} : i=1,\ldots ,n \}\) and \(\varvec{y}^{\mathrm{mis}} = \{\varvec{y}_{i}^{\mathrm{mis}} : i=1,\ldots ,n\}\). The model parameters are \(\varvec{\theta }= \{\varvec{\beta }, \varvec{\gamma }\}\), with covariate parameters \(\varvec{\psi }= \{\mu _1,\ldots ,\mu _{T-1},\sigma ^2_y\}\). The observed data likelihood in Eq. (1) is given by,

$$\begin{aligned} f(\varvec{x}, \varvec{y}^{\mathrm{obs}} |\varvec{\theta }, \varvec{\psi })&= \prod _{i=1}^{n} \left[ \int f(\varvec{x}_i|\varvec{y}_i, \varvec{\theta }) f(\varvec{y}_i|\varvec{\psi }) \, d\varvec{y}^{\mathrm{mis}}_i \right] , \end{aligned}$$

and is again analytically intractable. The term \(f(\varvec{x}_i| \varvec{y}_i, \varvec{\theta })\) denotes the complete data likelihood component corresponding to the probability of the capture history \(\varvec{x}_i\); and \(f(\varvec{y}_i|\varvec{\psi })\) the joint probability density function of the covariate values associated with individual i. The probability of a given capture history, conditional on initial capture and all covariate values \(\varvec{y}_i\), is given by,

$$\begin{aligned} f(\varvec{x}_i|\varvec{y}_i, \varvec{\theta }) = \left[ \prod _{t=f_i}^{\ell _i-1} \phi (y_{it}) \right] \left[ \prod _{t=f_i+1}^{\ell _i} p(y_{it})^{x_{it}}(1-p(y_{it}))^{1-x_{it}} \right] \chi _{i\ell _i}, \end{aligned}$$

where \(\chi _{i \ell _i}\) denotes the probability that individual i is not recaptured again after time \(\ell _i\). We can express \(\chi _{i t}\) via the recursion, \( \chi _{i t} = 1 - \phi (y_{it}) + \phi (y_{it})(1-p(y_{i t+1}))\chi _{i t+1}, \) with \(\chi _{i T} = 1\), for all \(i=1,\ldots ,n\). The covariate model component of the observed data likelihood, conditioning on the initial covariate value (which we assume to be known, though this can easily be relaxed by the inclusion of an initial state distribution), is given by,

$$\begin{aligned} f(\varvec{y}_i|\varvec{\psi }) = \prod _{t=f_i}^{T-1} f(y_{i t+1}|y_{it},\varvec{\psi }), \end{aligned}$$

where \(f(y_{i t+1}|y_{it},\varvec{\psi })\) denotes the associated density for the given covariate model.
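A direct R translation of the complete data likelihood component for a single individual, including the backward recursion for \(\chi _{it}\); the function name and the logit parameterisation follow the illustration above and are ours:

```r
## log f(x_i | y_i, theta) for one individual, conditional on first capture,
## given the *complete* covariate trajectory yi (length T, as is xi)
loglik_history <- function(xi, yi, beta, gamma) {
  T <- length(xi)
  f <- min(which(xi == 1)); l <- max(which(xi == 1))
  phi <- plogis(beta[1] + beta[2] * yi)     # phi(y_it)
  p   <- plogis(gamma[1] + gamma[2] * yi)   # p(y_it)
  ## chi_it: probability of never being seen after t, computed backwards
  chi <- numeric(T); chi[T] <- 1
  if (T > 1) for (t in (T - 1):1)
    chi[t] <- 1 - phi[t] + phi[t] * (1 - p[t + 1]) * chi[t + 1]
  ll <- log(chi[l])
  if (l > f) {
    s <- (f + 1):l
    ll <- ll + sum(log(phi[f:(l - 1)])) +
          sum(xi[s] * log(p[s]) + (1 - xi[s]) * log(1 - p[s]))
  }
  ll
}
```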

Previous approaches for dealing with missing covariate values include both classical and Bayesian model-fitting approaches. In particular, Catchpole et al. (2004) derived a conditional likelihood approach, conditioning on only the observed covariate values. This approach is computationally fast but leads to a (potentially substantial) reduction in the precision of the parameter estimates due to the amount of discarded information (Bonner et al. 2010); it also cannot be applied when the observation process parameters (i.e. the capture probabilities) are covariate dependent. To use all the available information, Worthington et al. (2015) consider a two-step multiple imputation approach, which involves initially fitting a model to only the observed covariate values and imputing the unobserved covariates, before conditioning on these imputed values and using a complete-data likelihood approach for the associated capture histories. Alternatively, Langrock and King (2013) numerically approximate the likelihood by finely discretising the integrals, evaluating the resulting sums via a hidden Markov model. The approximation can be made arbitrarily accurate by increasing the level of discretisation; however, there is a trade-off between accuracy and computational expense. Finally, within a Bayesian framework, a data augmentation approach has been applied, treating the missing covariate values as auxiliary variables and sampling from the joint posterior distribution of the parameters and auxiliary variables (Bonner and Schwarz 2006; King et al. 2008).

3 Laplace Approximation

The Laplace approximation is a closed-form approximation of an integral and can be regarded as an alternative form of Gauss–Hermite quadrature (Liu 1994). The underlying idea of Laplace integration is to approximate the negative log-likelihood by a second (or higher) order Taylor expansion. When the negative log-likelihood is well approximated by a Gaussian curve, the Laplace approximation can be shown to have high precision. In particular, Liu (1994) showed the error of the approximation to be of order \(O(m^{-1})\), where m denotes the number of observations. Including further leading terms of the expansion can improve the accuracy to order \(O(m^{-2})\) (Wong and Li 1992; Breslow and Lin 1995; Raudenbush et al. 2000). An analytical expression for the Laplace approximation is generally only feasible when the dimension of the integral is small (for example, of dimension 2 or 3), due to the computational complexity of the higher order derivatives. However, numerical approximations of the required derivatives can be obtained using automatic differentiation. In particular, we use the Template Model Builder (TMB) automatic differentiation tool developed by Kristensen et al. (2016), which enables a computationally efficient implementation of the Laplace approximation. We apply the Laplace approximation to individual heterogeneity capture–recapture models, using TMB to calculate the derivatives numerically when these are analytically intractable. We note that Laplace approximations have been used previously for capture–recapture models, but within a Bayesian context for approximating the marginal posterior densities of the parameters of interest (Smith 1991; Chavez-Demoulin 1999).
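As a self-contained illustration of the accuracy of the method, the following R snippet applies the second-order Laplace approximation to \(\int _0^\infty \exp \{-(x - m\log x)\}\,dx = m!\); the result is Stirling's formula, with relative error of order \(O(m^{-1})\) (approximately \(1/(12m)\) here):

```r
## Second-order Laplace approximation of int exp(-g(x)) dx, where
## g(x) = x - m*log(x); the exact value of the integral is m! = gamma(m + 1)
m <- 10
g     <- function(x) x - m * log(x)
x_hat <- m                       # minimiser of g: g'(x) = 1 - m/x = 0
g2    <- m / x_hat^2             # g''(x) = m/x^2 evaluated at x_hat
laplace2 <- exp(-g(x_hat)) * sqrt(2 * pi / g2)   # Stirling's approximation
laplace2 / gamma(m + 1)          # approx 0.9917, relative error ~ 1/(12 m)
```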

3.1 Closed \(M_h\)-type Models

We consider the general \(M_{tbh}\) model with corresponding likelihood specified in Eq. (2). The integrand of the likelihood can be rewritten in exponential form, such that,

$$\begin{aligned}&f(\varvec{x}| \varvec{\theta }, \sigma ^2) \propto \frac{N!}{(N-n)!} \prod _{i=1}^{n} \left[ \int \exp \{ -g(\varvec{x}_{i},\epsilon _i|\varvec{\theta }, \sigma ^2)\} \, \mathrm{d}\epsilon _i\right] \nonumber \\&\quad \times \left[ \int \exp \{ -g(\varvec{x}_0, \epsilon _0|\varvec{\theta }, \sigma ^2)\} \, \mathrm{d}\epsilon _0\right] ^{N-n}, \end{aligned}$$
(4)

where,

$$\begin{aligned} g(\varvec{x}_{i}, \epsilon _i|\varvec{\theta }, \sigma ^2) = -\log f(\varvec{x}_{i}|\varvec{\theta }, \epsilon _i) - \log f(\epsilon _i|\sigma ^2), \end{aligned}$$

for \(i \in \{0,1,\ldots ,n\}\). Dropping the subscript i for brevity, let \(\hat{\epsilon }\) denote the value of \(\epsilon \) that minimises \(g(\varvec{x}, \epsilon |\varvec{\theta }, \sigma ^2)\) given model parameters \(\varvec{\theta }\) and heterogeneity parameter \(\sigma ^2\), so that \(g'(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2) = 0\) and \(g''(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2) > 0\). A second-order Taylor series expansion about \(\hat{\epsilon }\) is given by,

$$\begin{aligned} g(\varvec{x}, \epsilon |\varvec{\theta }, \sigma ^2) \approx g(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2) + \frac{g''(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2)(\epsilon -\hat{\epsilon })^2}{2}. \end{aligned}$$

Laplace’s method approximates the integrands in Eq. (4) using the properties of normal density functions such that the contribution to the observed data likelihood takes the form,

$$\begin{aligned} \int \exp \{ -g(\varvec{x}, \epsilon |\varvec{\theta }, \sigma ^2)\} \, \mathrm{d}\epsilon&\approx \exp \{ -g(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2)\} \int \exp \left\{ -\frac{(\epsilon -\hat{\epsilon })^2}{2g''(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2)^{-1}} \right\} \, \mathrm{d}\epsilon \\&= \exp \{ -g(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2)\} \sqrt{\frac{2\pi }{g''(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2)}}. \end{aligned}$$
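A generic R sketch of this second-order approximation for a single one-dimensional integral, locating \(\hat{\epsilon }\) numerically and approximating \(g''\) by a central finite difference; in our actual implementation the derivatives are obtained analytically or via automatic differentiation, and all names below are ours:

```r
## Second-order Laplace approximation of int exp(-g(eps)) d eps.
## g_fun: function of scalar eps returning g(x, eps | theta, sigma^2)
laplace2 <- function(g_fun, lower = -20, upper = 20, h = 1e-4) {
  opt   <- optimize(g_fun, c(lower, upper))   # eps_hat minimises g
  e_hat <- opt$minimum
  g2 <- (g_fun(e_hat + h) - 2 * g_fun(e_hat) + g_fun(e_hat - h)) / h^2
  exp(-opt$objective) * sqrt(2 * pi / g2)
}

## Example: model M_h contribution for history hh, with logit(p) = alpha + eps
## hh <- c(0, 1, 0, 1, 0); alpha <- -1; sigma <- 2
## g_fun <- function(eps) {
##   p <- plogis(alpha + eps)
##   -sum(hh * log(p) + (1 - hh) * log(1 - p)) - dnorm(eps, 0, sigma, log = TRUE)
## }
## laplace2(g_fun)
```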

To improve the accuracy of the approximation, we can also consider a higher-order Laplace approximation involving higher-order derivatives (Raudenbush et al. 2000). In particular, we use a fourth-order Taylor expansion to obtain the fourth-order Laplace approximation. Applying the one-dimensional Laplace approximation to the integral yields,

$$\begin{aligned} \int \exp \{ -g(\varvec{x}, \epsilon |\varvec{\theta }, \sigma ^2)\} \, \mathrm{d}\epsilon&\approx \exp \{- g(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2)\} \sqrt{\frac{2\pi }{g''(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2)}} \nonumber \\&\quad \times \left[ 1+\frac{5\left( g^{(3)}(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2)\right) ^2}{24\left( g''(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2)\right) ^3} - \frac{3g^{(4)}(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2)}{24\left( g''(\varvec{x}, \hat{\epsilon }|\varvec{\theta }, \sigma ^2)\right) ^2} \right] , \end{aligned}$$
(5)

where \(g^{(3)}(.)\) and \(g^{(4)}(.)\) denote the third and fourth derivatives with respect to the random effect term \(\epsilon \). A closed form expression for the fourth-order Laplace approximation is presented in Web Appendix A of the Supplementary Material where we assume the recapture probabilities are specified using the logistic function, so that \(\text{ logit } (p_{it}) = \alpha _t + \lambda S_{it} + \epsilon _i\).
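A corresponding sketch of the fourth-order correction in Eq. (5), with the second, third and fourth derivatives approximated by finite differences on a five-point stencil; again, our implementation uses the analytic expressions of Web Appendix A, so this is illustrative only:

```r
## Fourth-order Laplace approximation, Eq. (5): the second-order result
## multiplied by the correction term built from g'', g^(3), g^(4) at eps_hat
laplace4 <- function(g_fun, lower = -20, upper = 20, h = 1e-2) {
  e_hat <- optimize(g_fun, c(lower, upper))$minimum
  gv <- g_fun(e_hat + (-2:2) * h)   # five-point stencil (h not finely tuned)
  g2 <- (gv[4] - 2 * gv[3] + gv[2]) / h^2
  g3 <- (gv[5] - 2 * gv[4] + 2 * gv[2] - gv[1]) / (2 * h^3)
  g4 <- (gv[5] - 4 * gv[4] + 6 * gv[3] - 4 * gv[2] + gv[1]) / h^4
  correction <- 1 + 5 * g3^2 / (24 * g2^3) - 3 * g4 / (24 * g2^2)
  exp(-g_fun(e_hat)) * sqrt(2 * pi / g2) * correction
}
```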

3.2 CJS Model with Individual Time-Varying Continuous Covariates

For the open population CJS model with individual time-varying continuous covariates, the observed data likelihood in Eq. (1) can be expressed in the form,

$$\begin{aligned} f(\varvec{x}, \varvec{y}^{\mathrm{obs}}|\varvec{\theta }, \varvec{\psi })&= \prod _{i=1}^{n} \left[ \int \exp \{-g(\varvec{x}_i, \varvec{y}_i| \varvec{\theta }, \varvec{\psi })\} \, d\varvec{y}^{\mathrm{mis}}_i \right] , \end{aligned}$$

such that for \(i=1,\ldots ,n\),

$$\begin{aligned} g(\varvec{x}_i, \varvec{y}_i | \varvec{\theta }, \varvec{\psi })&= -\sum _{t=f_i}^{\ell _i-1} \log \phi (y_{it}) - \sum _{t=f_i+1}^{\ell _i} \{x_{it} \log p(y_{it}) + (1-x_{it}) \log (1-p(y_{it}))\} \\&\quad - \log \chi _{i\ell _i} - \sum _{t=f_i}^{T-1} \log f(y_{it+1}|y_{it}, \varvec{\psi }). \end{aligned}$$

Applying the multivariate second-order Laplace approximation in k dimensions, where \(k = |\varvec{\zeta }^0_i|\) denotes the number of missing covariate values for individual i, we obtain,

$$\begin{aligned} \int \exp \{-g(\varvec{x}_i, \varvec{y}_i| \varvec{\theta }, \varvec{\psi })\} \, d\varvec{y}^{\mathrm{mis}}_i \approx \exp \{ -g(\varvec{x}_i, \varvec{y}^{\mathrm{obs}}_i, \hat{\varvec{y}}^{\mathrm{mis}}_i |\varvec{\theta }, \varvec{\psi })\} (2\pi )^{k/2}|g''(\varvec{x}_i, \varvec{y}^{\mathrm{obs}}_i, \hat{\varvec{y}}^{\mathrm{mis}}_i |\varvec{\theta }, \varvec{\psi })|^{-1/2}, \end{aligned}$$

where \(\hat{\varvec{y}}^{\mathrm{mis}}_i\) denotes the value of \(\varvec{y}^{\mathrm{mis}}_i\) that minimises \(g(\varvec{x}_i, \varvec{y}_i| \varvec{\theta }, \varvec{\psi })\), and \(g''\) the corresponding Hessian matrix. There is no closed form for the derivatives (due to the \(\chi \) term), and so we use the numerical automatic differentiation functionality provided in TMB to evaluate the Laplace approximation for this model.
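For orientation, the R-side driver for such a TMB implementation takes roughly the following form, where the template file and data/parameter names (cjs_cov.cpp, dat, pars, y_mis) are placeholders rather than the exact names in our repository; declaring the missing covariates via the random argument invokes TMB's built-in Laplace approximation over them:

```r
library(TMB)
compile("cjs_cov.cpp")            # C++ template implementing g(x_i, y_i | theta, psi)
dyn.load(dynlib("cjs_cov"))

obj <- MakeADFun(data = dat, parameters = pars,
                 random = "y_mis",  # Laplace-approximate over missing covariates
                 DLL = "cjs_cov")
opt <- nlminb(obj$par, obj$fn, obj$gr)  # maximise the approximate likelihood
sdreport(obj)                           # standard errors via the delta method
```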

4 Simulation Study

To assess the performance of the Laplace approximation for both closed and open capture–recapture models, we conduct a simulation study using 1000 datasets for each simulation scenario. For the closed population \(M_{h}\)-type models, we apply both the second- and fourth-order Laplace approximations (LA2 and LA4, respectively) and compare with a Gauss–Hermite quadrature (GHQ) approximation. For the open population CJS model with covariates, we compare the Laplace approximation with an HMM approximation (Langrock and King 2013). All methods are implemented in TMB when calculating the MLEs of the parameters, for computational comparability.

Appendix B of the Supplementary Material provides a detailed discussion of the simulation study and associated results. For the closed population \(M_h\)-type models, the MLEs of the parameters N and \(\sigma \) appear to be slightly negatively biased; however, this bias was consistently smaller for LA4 and GHQ than for LA2. Further, the coverage probabilities for the LA2 approximation were also consistently smaller than for the alternative approaches. See Table 1 of Appendix B for further details. We conducted a further simulation study focusing on model \(M_h\) to investigate the performance of the Laplace approximations and GHQ for increasing values of the individual heterogeneity variance, \(\sigma ^2\), as White and Cooch (2017) showed that as \(\sigma ^2\) increases the model-fitting becomes more challenging. Within our study, the LA2 approximation again performs relatively poorly, while the performance of the LA4 and GHQ approaches is very similar in terms of both relative bias and coverage probabilities (see Table 2 of Appendix B). Of further note, for the GHQ approach the width of the confidence interval for N increases as \(\sigma \) increases. This relationship is not observed for LA4, which produces confidence intervals of similar width across the different values of \(\sigma \); yet both approximations return very similar coverage probabilities for the parameters. For the CJS model we consider a range of scenarios, based on sample size (\(n=200,400\)), capture probabilities (\(p=0.5,0.75\)) and length of study (\(T=4,6\)). We assume that there is a single covariate and consider two possible covariate models over time, corresponding to (i) a random walk; and (ii) a random walk with time-dependent effects. The second-order Laplace approach is compared with the alternative HMM approximation (Langrock and King 2013). For all scenarios, both approaches perform consistently well, and similarly, in terms of estimating the model parameters (see Tables 3 and 4 of Appendix B).

Finally, we consider the comparative computational expense. The Laplace approximations are consistently faster than the competing methods across all models; as the model complexity increases, this difference is accentuated. For example, for model \(M_h\) the Laplace approximations are nearly twice as fast at evaluating the log-likelihood function as the given implementation of GHQ, but for model \(M_{th}\) this increases to 8–10 times faster. For CJS-type models the differential in computational expense is even more substantial: the Laplace approximation was often (at least) one order of magnitude faster, and, dependent on the size of the data and the implementation of the HMM approximation, this increases to two orders of magnitude. For further discussion, and more specific details regarding the computational comparisons, see Appendix B.

5 Examples

We consider two case studies, corresponding to the closed \(M_h\)-type models (St. Andrews golf tees data) and the open CJS model with individual time-varying continuous covariates (meadow voles). We again compare the Laplace approximation with alternative approaches. All the code used in these examples is provided in the GitHub repository detailed in the information regarding additional Supplementary Material.

5.1 St. Andrews Golf Tees

We consider the St. Andrews golf tees data from Borchers et al. (2002). The dataset consists of \(N=250\) tees differing in size, colour and visibility, resulting in individual capture heterogeneity. A total of \(T=8\) independent observers (i.e. capture occasions) were assigned to predefined transects and recorded all golf tees observed. A total of 546 observations were recorded, with \(n=162\) unique tees observed in the study (additional size/colour data were not recorded).

Borchers et al. (2002) showed that omitting individual heterogeneity leads to substantial underestimation of the true population size; thus we consider the set of closed population models with individual heterogeneity present. Table 1 provides the estimated population size and \(95\%\) non-parametric bootstrap confidence interval for the four individual heterogeneity \(M_h\)-type models fitted using the Laplace approximations and the GHQ approach. In general, the higher order Laplace approximation (LA4) and GHQ give relatively similar maximum likelihood estimates for N, varying from approximately 242 to 262; while the lower order Laplace approximation (LA2) gives somewhat differing estimates, as previously observed within the simulation study in Sect. 4. The higher order Laplace approximation consistently produces slightly larger estimates of N than the GHQ approach across all individual heterogeneity models. For example, the estimates of N under the \(M_h\) model are 251.3 and 242.4 for the LA4 and GHQ approaches, respectively.

The bootstrap confidence intervals for the GHQ approach are noticeably wider than for the LA4 approximation, with a consistently and substantially larger upper limit (we note that the lower limit is bounded below by the number of observed individuals). In particular, the highest upper bound for the LA4 approximation is approximately 310, with widths ranging from 90 to 100; while the lowest upper bound for the GHQ approach is 358, with widths generally double those of LA4. King et al. (2008) report a similar uncertainty interval to the LA4 approximation, using a Bayesian approach with a model-averaged 95% credible interval of [194, 288] over the same set of models. Further, the estimate of \(\sigma \) for each of these models is relatively large (approximately 2 for all fitted models). We note that, as observed in the second simulation study (see Sect. 4 and Appendix B), as \(\sigma \) increases the width of the 95% confidence interval for N using the GHQ approach becomes increasingly larger than for the Laplace approximations, while providing comparable coverage probabilities.

Finally, we compare the computational times. In general, the computational speed is fast in absolute terms, of the order of milliseconds for each of the methods. However, comparing the Laplace approximations to GHQ, the Laplace approaches are approximately twice as fast for model \(M_h\), increasing to five times as fast for model \(M_{tbh}\) (Table 1).

Table 1 The estimates of the total population of St. Andrews golf tees and associated non-parametric bootstrap \(95\%\) confidence interval (in brackets) for the four individual heterogeneity models using second-order (LA2) and fourth-order (LA4) Laplace approximations and Gauss–Hermite quadrature (GHQ) with 50 quadrature points

5.2 Meadow Voles

We consider capture–recapture data collected on meadow voles (Microtus pennsylvanicus) at Patuxent Wildlife Research Center, Laurel, Maryland, over \(T=4\) capture occasions from 1981 to 1982 (Nichols et al. 1992). A total of 512 voles were observed over the study period. When an individual was observed, its body mass was also recorded. We follow Bonner and Schwarz (2006) in classifying individuals as immature (body mass \(\le 22\) g) or mature (body mass \(> 22\) g) and consider data only for the mature voles observed for the first time prior to the final capture occasion. This provides a total of \(n=199\) unique capture histories, corresponding to a total of 450 observed sightings and associated body mass recordings. We note that approximately 40% of body mass values were missing following initial capture.

We follow Bonner and Schwarz (2006) and consider the model where the survival and recapture probabilities are dependent on body mass. In particular, we let \(y_{it}\) denote the body mass of individual \(i=1,\ldots ,n\) at time \(t=f_i,\ldots ,T\) and set,

$$\begin{aligned} \text {logit}(\phi _{it}) = \beta _0 + \beta _1 y_{it} \qquad \text {and} \qquad \text {logit}(p_{it+1}) = \gamma _0 + \gamma _1 y_{it+1}. \end{aligned}$$

Similarly, we specify the model for body mass to be of the form,

$$\begin{aligned} y_{it+1} | y_{it} \sim \text {N}(y_{it} + \mu _t, \sigma ^2), \end{aligned}$$

for \(i=1,\ldots ,n\) and \(t=f_i,\ldots ,T-1\). The set of model parameters is \(\varvec{\theta }= \{\beta _0,\beta _1,\gamma _0,\gamma _1\}\), with heterogeneity parameters \(\varvec{\psi }= \{\mu _1,\mu _2,\mu _3,\sigma ^2\}\). All covariate values were recorded when an individual was observed, so we do not need to specify an initial distribution for body mass.

We compared the Laplace approximation, the HMM approximation using 20 intervals (Langrock and King 2013), the two-step multiple imputation approach (Worthington et al. 2015) and the Bayesian data augmentation approach (as fitted by Bonner and Schwarz 2006). Figure 1 provides the parameter estimates of the fitted CJS model for the different approaches, in terms of the MLE/posterior mean and associated 95% confidence/credible interval. All approaches provide generally similar results; in particular, we note that the results for the Laplace approximation and the HMM approximation are very similar. There are, however, some differences in the MLEs obtained via the two-step multiple imputation approach for the parameters associated with the recapture probability, together with noticeably larger confidence intervals (the latter is also true for the estimation of \(\sigma \)). Some differences are not unexpected, since the covariate model parameters are estimated independently in the first step by fitting the given covariate model to the observed covariate values only, ignoring the capture observations.

Finally, we compared the associated computational times for each of the different approaches. To obtain the MLEs of the parameters, the Laplace approximation takes \(<1\) s; the multiple imputation approach approximately 3 s; and the HMM approximation approximately 9 s for these data. Thus, using 999 bootstrap replicates to obtain the 95% confidence intervals, the computation times are of the order of approximately 15 min, 45 min and 2.5 h for the Laplace approximation, imputation approach and HMM approximation, respectively. The computational cost of the Bayesian data augmentation approach, using a Markov chain Monte Carlo algorithm, depends on the updating algorithm used and the number of iterations required for the Markov chain to converge to its stationary distribution and to obtain reliable posterior summary statistics with small Monte Carlo error; it will thus be comparatively slow, particularly relative to the fast Laplace approximation.

Fig. 1

Comparison of the parameter estimates (MLE or posterior mean) and associated 95% uncertainty intervals (non-parametric confidence interval or symmetric credible interval) for the Laplace approximation, HMM approximation, multiple imputation and Bayesian data augmentation approaches

6 Discussion

In this paper, we describe a Laplace approximation for estimating the analytically intractable observed data likelihood for capture–recapture models in the presence of individual heterogeneity. The complexity of the likelihood function determines the order of the Taylor expansion that can be calculated analytically. For closed population \(M_h\)-type models a fourth-order expression can be derived analytically, whereas for the CJS model with continuous time-varying covariates we use automatic differentiation (via TMB) to obtain the necessary numerical derivatives, as they are analytically intractable. The Laplace approximations for these models provide a reliable and efficient mechanism for evaluating the likelihood function, and hence obtaining the maximum likelihood estimates and associated confidence intervals.

In particular, comparing this approach to the current “gold standards”, namely GHQ for \(M_h\)-type models and an HMM approximation for the CJS model with continuous covariates, the Laplace approximation consistently performs at least as well but at substantially lower computational cost. However, we note that for the \(M_h\)-type models the fourth-order Laplace approximation is required to achieve this performance. Further, for the scenarios considered, the coverage probabilities were essentially identical between the fourth-order Laplace approximation and the GHQ approach, yet the widths of the associated 95% confidence intervals were substantially larger for the quadrature approach as the individual heterogeneity variability (i.e. \(\sigma ^2\)) increased. Wider confidence intervals with increasing variability are also discussed by White and Cooch (2017) when using the quadrature approach. Understanding this apparent difference is the focus of current research. In addition, the Laplace approximation is scalable to higher dimensions, and thus this approach is potentially a very attractive avenue to pursue for more complex models (for example, in the presence of multiple continuous individual covariates), particularly as GHQ and the HMM approximation generally suffer from the curse of dimensionality (Langrock and King 2013). Huber et al. (2004) explain that summation-based approaches such as GHQ and the HMM approximation may exhibit larger bias as the dimension of the integrals increases. This behaviour arises because GHQ and the HMM approximation are based on pre-specified, fixed quadrature points; these fixed grids can easily become too coarse, particularly in higher dimensions, such that the peak of the log-likelihood is missed.

Computational efficiency continues to be an important consideration as models become more complex and/or datasets/studies increase in size. For example, Warton et al. (2015) describe and compare several approaches in terms of speed and accuracy for fitting joint ecological mixed models, including Laplace approximations, adaptive quadrature, data imputation approaches and variational approximations. Within the applications they considered, the Laplace approximation was again a very computationally efficient approach, but less accurate for small samples; adaptive quadrature appeared a reasonable compromise between accuracy and speed. The Expectation–Maximisation (EM) algorithm and Bayesian data augmentation were accurate but relatively slow. Variational approximations appeared to be as computationally efficient as Laplace approximations whilst also providing moderately accurate estimates. Further, Niku et al. (2019) showed that variational approximations coded in TMB had smaller bias for negative binomial generalised linear latent variable models compared with the second-order Laplace approximation. Exploring the use of variational approximations for capture–recapture models is an area of current research, as is investigating the robustness of the Laplace approximation for models where the individual heterogeneity component is non-Gaussian.

7 Supplementary Material

Web Appendices referenced in Sects. 3.1, 4 and 5.1 are available with this paper in the online Supplementary Material. The code used in this paper can be accessed at https://github.com/riki-herliansyah/capture-recapture.