For designed distance sampling experiments, we wish to test for treatment effects on counts, while accounting for variation in detectability. For this, we need plot-based models.
Plot Count Models
Exact Distance Data
So far, we have ignored the spatial information in our data. Denote the unknown number of animals on plot k (\(k=1,\dots ,K\)) by \(N_k\), and the number of animals detected on plot k by \(n_k\), where \(\sum _{k=1}^K n_k = n\). Here we define plot k to mean the strip of half-width w and length \(L_k\) centred on line k (line transect sampling) or the circle of radius w centred on point k (point transect sampling). We denote the area of plot k by \(a_k\), so that \(a_k=2wL_k\) (line transect sampling) or \(a_k=\pi w^2\) (point transect sampling).
We consider two models for the plot counts \(n_k\): the multinomial, which involves extending the binomial approach of Sect. 2.1, and the Poisson.
Two new issues arise when we move to plot count models. The first is that we need to decide whether we are interested in inference about total abundance N, as in Sect. 2.1, or whether we only wish to compare densities among the plots, as occurs in a designed distance sampling experiment. In the latter case, we wish to restrict inference to the plots, for example to compare a treatment with a control. The second new issue is that we may have covariates at the plot level or higher, but we may also have individual covariates (i.e. covariates whose values are recorded for each individual detection). If the detection function depends only on plot-level covariates, we can condition on the covariates; adding a component to the likelihood for these covariates would bring no advantage for estimating the probability of detection. However, if the detection function also (or instead) depends on individual covariates, then the detections on a given plot are biased towards those taking covariate values that increase the probability of detection. Thus we need to include a component in the likelihood corresponding to the distribution of individual covariates.
In Sect. 2.1, for model-based conventional distance sampling, we wrote the full likelihood as \({\mathcal {L}}_{n,y} = {\mathcal {L}}_n\times {\mathcal {L}}_y\). We can again take \({\mathcal {L}}_y\) as given in (1). In place of \({\mathcal {L}}_n\), we write \({\mathcal {L}}_{\{n_k\}}\) where \(\{n_k\}\) is the set of plot counts \(n_k\), \(k=1,\dots ,K\). Extending the binomial likelihood of (3), we obtain the multinomial model:
$$\begin{aligned} {\mathcal {L}}_{\{n_k\}} = \frac{N!}{\prod _{k=1}^{K}n_k!(N-n)!}\left[ 1- \sum _{k=1}^K \alpha _k P_k \right] ^{N-n} \prod _{k=1}^K (\alpha _k P_k)^{n_k}, \end{aligned}$$
(21)
where \(\alpha _k\) is the probability that an animal is located on plot k, and \(P_k\) is the probability that an animal is detected, given that it is on plot k. (When \(K=1\), (21) reduces to (3).) Under a uniform density model, \(\alpha _k\) is simply the area of plot k divided by the total study area. To model how density varies through the study area, we can express \(\alpha _k\) as a function of plot covariates \(\mathbf{x}_k\): \(\alpha _k \equiv \pi _x(\mathbf{x}_k)\). For model-based conventional distance sampling (with no covariates other than distance in the detection function), \(P_k = P_\mathrm{a} = \int _0^w g(y)\pi _y(y)\,\mathrm{d}y\), the same for every plot. If the detection function is a function of plot covariates \(\mathbf{x}_k\) but not of individual covariates (other than distance y), then
$$\begin{aligned} P_k = \int _0^w g(y,\mathbf{x}_k)\pi _y(y)\,\mathrm{d}y, \end{aligned}$$
(22)
and if it is also a function of individual covariates \(\mathbf{z}\) (i.e. model-based multiple-covariate distance sampling), then we must specify a model for the probability density function \(\pi _z (\mathbf{z})\) of \(\mathbf{z}\) in the population, and take the expectation over this density:
$$\begin{aligned} P_k = \int _0^w \int _\mathbf{z} g(y,\mathbf{z},\mathbf{x}_k)\pi _z(\mathbf{z})\pi _y(y)\,\mathrm{d}\mathbf{z}\,\mathrm{d}y. \end{aligned}$$
(23)
In the latter case, to complete the full likelihood, instead of taking \({\mathcal {L}}_y\), we would multiply \({\mathcal {L}}_{\{n_k\}}\) by \({\mathcal {L}}_{z}\times {\mathcal {L}}_{y|z}\), where \({\mathcal {L}}_z\) is from (10) and \({\mathcal {L}}_{y|z}\) is from (11).
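As a minimal sketch of how \(P_\mathrm{a}\) and the plot-specific \(P_k\) of (22) might be evaluated, the code below assumes a half-normal detection function \(g(y)=\exp (-y^2/(2\sigma ^2))\), with \(\pi _y(y)=1/w\) for line transects and \(\pi _y(y)=2y/w^2\) for point transects. The half-normal form, the function names and the values of \(\sigma \) are illustrative assumptions, not part of the text; a plot-level covariate would enter through a plot-specific scale \(\sigma _k\).

```python
import math

def p_a_line(sigma, w):
    """P_a for a half-normal g(y) with uniform pi_y(y) = 1/w on [0, w]."""
    integral_g = sigma * math.sqrt(math.pi / 2.0) * math.erf(w / (sigma * math.sqrt(2.0)))
    return integral_g / w

def p_a_point(sigma, w):
    """P_a for a half-normal g(y) with triangular pi_y(y) = 2y/w^2 on [0, w]."""
    return 2.0 * sigma**2 * (1.0 - math.exp(-w**2 / (2.0 * sigma**2))) / w**2

def p_a_numeric(g, pi_y, w, steps=20000):
    """Midpoint-rule approximation to the integral of g(y) * pi_y(y) over [0, w]."""
    h = w / steps
    return sum(g((i + 0.5) * h) * pi_y((i + 0.5) * h) for i in range(steps)) * h

# Hypothetical plot-level scale parameters sigma_k (same units as w), as in (22).
for sigma_k in (0.3, 0.6):
    print(sigma_k, p_a_line(sigma_k, w=1.0), p_a_point(sigma_k, w=1.0))
```

The closed forms are specific to the half-normal assumption; `p_a_numeric` is included so that other choices of g(y) and \(\pi _y(y)\) can be handled by numerical integration.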
If we wish to restrict inference to the plots, as occurs in designed experiments using distance sampling, then we can replace (21) by
$$\begin{aligned} {\mathcal {L}}_{\{n_k\}} = \frac{N_c!}{\prod _{k=1}^{K}n_k!(N_c-n)!}\left[ 1- \sum _{k=1}^K \alpha _k P_k \right] ^{N_c-n} \prod _{k=1}^K (\alpha _k P_k)^{n_k}, \end{aligned}$$
(24)
where \(N_c=\sum _{k=1}^{K} N_k\) is total abundance on the plots.
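The multinomial likelihoods (21) and (24) share the same form, so a single function can evaluate either on the log scale. The sketch below is illustrative only: the function names are hypothetical, \(\alpha _k\) and \(P_k\) are treated as known, every \(\alpha _k P_k\) is assumed to be strictly positive with \(\sum _k \alpha _k P_k < 1\), and the abundance is profiled over integers purely to show how the likelihood behaves.

```python
import math

def multinomial_loglik(N, counts, alpha, P):
    """Log of (21): counts[k] = n_k, alpha[k] = alpha_k, P[k] = P_k.

    Passing N_c in place of N gives the log of (24).
    """
    n = sum(counts)
    if N < n:
        return float("-inf")  # cannot detect more animals than exist
    p_cell = [a * p for a, p in zip(alpha, P)]   # alpha_k * P_k
    p_missed = 1.0 - sum(p_cell)                 # probability of not being detected
    loglik = math.lgamma(N + 1) - math.lgamma(N - n + 1)
    loglik -= sum(math.lgamma(c + 1) for c in counts)
    loglik += sum(c * math.log(q) for c, q in zip(counts, p_cell))
    if N > n:
        loglik += (N - n) * math.log(p_missed)
    return loglik

def profile_abundance(counts, alpha, P, N_max):
    """Integer N in [n, N_max] that maximizes the log-likelihood (known alpha, P)."""
    n = sum(counts)
    return max(range(n, N_max + 1),
               key=lambda N: multinomial_loglik(N, counts, alpha, P))
```

In practice \(\alpha _k\) and \(P_k\) depend on unknown parameters, and this component would be maximized jointly with the distance component of the likelihood rather than profiled in isolation.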
Poisson models offer a simpler alternative when inference is restricted to the plots:
$$\begin{aligned} {\mathcal {L}}_{\{n_k\}} = \prod _{k=1}^K \frac{\lambda _k^{n_k}\exp [-\lambda _k]}{n_k!}, \end{aligned}$$
(25)
where for model-based conventional distance sampling,
$$\begin{aligned} \lambda _k=E(n_k)=E(N_k)\times P_\mathrm{a}=\exp \left( \sum _{q=1}^Q x_{qk}\beta _q +\log _e(a_k P_\mathrm{a})\right) \end{aligned}$$
(26)
so that the vector \(\mathbf{x}_k\), with qth element \(x_{qk}\), represents covariates recorded at the plot level.
Equation (26) defines a generalized linear model with log link function and an offset term of \(\log _e(a_k P_\mathrm{a})\). The complication is that \(P_\mathrm{a}\) is an unknown parameter. This suggests a two-stage approach: maximize \({\mathcal {L}}_y\), to give an estimate of \(P_\mathrm{a}\), and then substitute our estimate \(\hat{P}_\mathrm{a}\) into the offset, and maximize \({\mathcal {L}}_{\{n_k\}}\) using standard generalized linear modelling software. Melville and Welsh (2014) adopted this strategy. The method fails to take account of the uncertainty in \(\hat{P}_\mathrm{a}\) at the second stage. One way to propagate the uncertainty from stage 1 into stage 2 is to use a bootstrap (Buckland et al. 2009). Williams et al. (2011) described a less computer-intensive and more stable method, but given its complexity, and the fact that the two-stage approach does not in general give maximum likelihood estimates of the parameters, a full likelihood approach, in which \(P_\mathrm{a}\) is just treated as another parameter to estimate, seems preferable:
$$\begin{aligned} {\mathcal {L}}_{\{n_k\},y} = {\mathcal {L}}_{\{n_k\}}\times {\mathcal {L}}_y = \prod _{k=1}^K \frac{\lambda _k^{n_k}\exp [-\lambda _k]}{n_k!} \times \prod _{i=1}^n\frac{g(y_i) \pi _y(y_i)}{P_\mathrm{a}}. \end{aligned}$$
(27)
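The full likelihood (27) can be written down directly in code. The sketch below evaluates its logarithm for line transect data under an assumed half-normal detection function with \(\pi _y(y)=1/w\); the function names and the toy data are invented for illustration and do not come from any survey. Any general-purpose numerical optimizer could then be applied to the negative of this function to estimate \(\beta \) and \(\sigma \) in a single step.

```python
import math

def half_normal(y, sigma):
    """Assumed detection function g(y) = exp(-y^2 / (2 sigma^2))."""
    return math.exp(-y * y / (2.0 * sigma * sigma))

def p_a_line(sigma, w):
    """P_a for the half-normal with uniform pi_y(y) = 1/w on [0, w]."""
    return sigma * math.sqrt(math.pi / 2.0) * math.erf(w / (sigma * math.sqrt(2.0))) / w

def full_loglik(beta, sigma, counts, X, areas, distances, w):
    """Log of (27): Poisson plot counts times the density of observed distances."""
    pa = p_a_line(sigma, w)
    ll = 0.0
    for n_k, x_k, a_k in zip(counts, X, areas):
        # Linear predictor plus offset log(a_k * P_a), as in (26).
        log_lam = sum(x * b for x, b in zip(x_k, beta)) + math.log(a_k * pa)
        ll += n_k * log_lam - math.exp(log_lam) - math.lgamma(n_k + 1)
    for y in distances:
        # Contribution g(y_i) * pi_y(y_i) / P_a of each detection.
        ll += math.log(half_normal(y, sigma) / (w * pa))
    return ll

# Made-up toy data: three plots, intercept-only model, six detections in total.
counts = [2, 1, 3]
X = [[1.0], [1.0], [1.0]]
areas = [1.0, 1.0, 2.0]
distances = [0.05, 0.2, 0.3, 0.45, 0.6, 0.8]
print(full_loglik([0.5], 0.5, counts, X, areas, distances, w=1.0))
```

Because \(P_\mathrm{a}\) is recomputed from \(\sigma \) inside the function, the offset is not fixed in advance, which is the difference from the two-stage approach.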
If we have individual covariates \(\mathbf{z}\) in the detection function, then the full likelihood is \({\mathcal {L}}_{\{n_k\},z,y} = {\mathcal {L}}_{\{n_k\}}\times {\mathcal {L}}_z\times {\mathcal {L}}_{y|z}\) where \({\mathcal {L}}_z\) is given by (10) and \({\mathcal {L}}_{y|z}\) by (11). Further, \(P_k\), the probability of detection on plot k, now varies by plot, so that our model for plot counts becomes
$$\begin{aligned} \lambda _k=E(n_k)=E(N_k)\times P_k=\exp \left( \sum _{q=1}^Q x_{qk}\beta _q +\log _e(a_k P_k)\right) . \end{aligned}$$
(28)
When the detection function depends on plot-level covariates but not on individual covariates, then we do not need to specify a distribution for these covariates; instead, we simply form the likelihood conditional on the covariate values, as for the multinomial model.
Note that if we adopt a spatial non-homogeneous Poisson process distance sampling model, we can write
$$\begin{aligned} \lambda _k = \int _k D(l) g(y(l))\,\mathrm{d}l, \end{aligned}$$
(29)
where the integral is over plot k (compare with \(\mu _A\) of (5)). In this case, for plot k, \(\pi _y(y)\) is the integral of D(l) across the sections of plot at distance y from the line or point (two parallel incremental strips for line transect sampling, and an incremental annulus for point transect sampling), divided by the integral across the whole plot. In practice, if plots are small, we are likely to approximate this by assuming that \(\pi _y(y)\) is uniform (line transects) or triangular (point transects). If a fully spatial model is preferred, it would be sensible to record location \(l_i\) of detection i, and not just its distance \(y_i\) from the line or point. A point process likelihood of the form given by Hedley and Buckland (2004) might then be used.
We can specify models for \(\lambda _k\) of the form of (26), where the covariates \(\mathbf{x}_k\) (assumed to be at the plot level or higher) might define the design in the case of a designed distance sampling experiment, or might be spatial covariates for a spatial model, or might simply be any explanatory variables that are potentially useful for modelling animal density. Generally, we would define \(x_{1k}=1\) for all k, so that \(\beta _1\) is an intercept term. We can also replace linear terms by smooth terms to give greater flexibility (e.g. Hedley and Buckland 2004). Instead of maximizing the two likelihood components separately, we can maximize the full likelihood, or use Bayesian methods to draw inference on all unknown parameters. Thus the parameter \(P_k\) in the offset, which for the two-stage approach was estimated in stage 1 then treated as known in stage 2, is now a function of the detection function parameters (below), and estimated along with all other parameters in a single step. Oedekoven et al. (2014) proposed the above approach, with the inclusion of a random effect for location in the model for \(\lambda _k\) (see below).
If counts are summed across repeat visits to a plot, the term \(a_k P_k\) inside the offset is multiplied by the effort, where effort is defined to be the number of repeat visits (equivalently, the log of the effort is added to the offset); and if counts are summed across replicate plots, plot size \(a_k\) is the combined size of the plots whose counts have been combined.
The product \(a_k P_k\) is the effective area surveyed on plot k. The \(P_k\) are defined exactly as for the multinomial models.
Note that neither population size N nor plot abundances \(N_k\) appear as parameters in the Poisson likelihood. For designed distance sampling experiments, we only wish to compare densities in the plots, and have no interest in estimating N in a wider study area. However, with this approach, we can still draw inference on abundance. For spatial distance sampling models, we can predict density throughout the study area, and so can use numerical integration under the fitted density surface to estimate abundance either for the full study area or for any subset of it. We can also estimate plot abundance \(N_k\) by \(\hat{N}_k = \hat{\lambda }_k/\hat{P}_k\). By contrast, the multinomial model allows direct inference for both population size N and plot abundances \(N_k = N\pi _x(\mathbf{x}_k)\), and, as with the Poisson model, the effect of the covariates \(\mathbf{x}\) on abundance or density can be investigated through the parameters of the model for \(\pi _x(\mathbf{x})\).
Grouped Distance Data
For the case without covariates, let \(m_{jk}\) be the number of detections in distance interval j on plot k, with \(\sum \limits _{j=1}^J m_{jk} = n_k\). Adopting a Poisson model for these counts, and given \(E(n_k)=\lambda _k\), then \(E(m_{jk})=\lambda _k f_j\) for \(j=1,\dots ,J\), where \(f_j\) is given in (7).
We can now write the full likelihood as
$$\begin{aligned} \prod _{k=1}^K \prod _{j=1}^J \frac{(\lambda _k f_j)^{m_{jk}} \exp (-\lambda _k f_j)}{m_{jk}!}. \end{aligned}$$
(30)
For detection function covariates \(\mathbf{z}\) recorded at the plot level, or at the stratum level if the design is stratified (but not at the individual level), we can define
$$\begin{aligned} f_{jk} = \int _{c_{j-1}}^{c_j} {f_{y|z}(y|\mathbf{z}_k)}\,\mathrm{d}y=\frac{\int \limits _{c_{j-1}}^{c_j}{g\left( y,\mathbf{z}_k\right) \pi _y(y)\mathrm{d}y}}{P_\mathrm{a}(\mathbf{z}_k)} \end{aligned}$$
(31)
giving the full likelihood
$$\begin{aligned} \prod _{k=1}^K \prod _{j=1}^J \frac{(\lambda _k f_{jk})^{m_{jk}} \exp (-\lambda _k f_{jk})}{m_{jk}!}. \end{aligned}$$
(32)
Note that when using the Poisson model for the expected abundances for this approach, we do not have separate components for \({\mathcal {L}}_{\{n_k\}}\) and \({\mathcal {L}}_m\). Oedekoven et al. (2014) adopted a different strategy which is essentially the grouped data equivalent of (27), and which does have separate components for \({\mathcal {L}}_{\{n_k\}}\) (using a generalized version of (25)) and \({\mathcal {L}}_m\) (using (6)).
Plot Abundance Models
Royle et al. (2004) adopted what appears to be a different strategy for grouped distance data. Again \(m_{jk}\) is the count of detected animals in distance interval j on plot k, with \(\sum \limits _{j=1}^J m_{jk} = n_k\). We define the proportion \(P_j\) of plot abundance that was observed within distance interval j:
$$\begin{aligned} P_j = \int \limits _{c_{j-1}}^{c_j}{g(y)\pi _y(y)\mathrm{d}y}, \end{aligned}$$
(33)
where g(y) and \(\pi _y(y)\) represent the detection function and the distribution of distances from the line or point in the population as before. The sum of proportions \(P_j\) over all J distance intervals gives the average detection probability \(P_\mathrm{a}\), i.e. \(\sum \limits _{j=1}^J{P_j} = P_\mathrm{a}\), where \(P_\mathrm{a} = \int _{0}^w{g(y)\pi _y(y)\mathrm{d}y}\), as in (2).
The \(P_j\) represent the proportion of plot abundance \(N_k\) that was both located in distance interval j and detected, while the \(f_j\) represent the proportion of detected animals \(n_k\) that were located in distance interval j. Hence we have the relationship \(f_j = P_j/P_\mathrm{a}\).
As we do not observe the true abundances on the plot, we set \(E(N_k) = \kappa _k\) and model these using a log-linear Poisson model:
$$\begin{aligned} \kappa _k = \exp \left( \sum _{q=1}^Q{x_{qk}\beta _q}\right) . \end{aligned}$$
(34)
The observed counts in distance interval j are then modelled as a Poisson random variable, \(m_{jk} \sim \mathrm{Poisson}(\kappa _k \times P_j)\). The likelihood for this model is
$$\begin{aligned} {\mathcal {L}}_{{\{n_k\}},m}=\prod \limits _{k=1}^{K}{\prod \limits _{j=1}^{J}{\frac{\left( \kappa _k P_j\right) ^{m_{jk}}\exp \left( -\kappa _k P_j\right) }{m_{jk}!}}}. \end{aligned}$$
(35)
By noting that \(\lambda _k = \kappa _k P_\mathrm{a}\), we see that the above model for \(\kappa _k\) is equivalent to our model for \(\lambda _k\), provided plot sizes are all the same, and arbitrarily set as \(a_k=1\):
$$\begin{aligned} \lambda _k = \kappa _k P_\mathrm{a} = \exp \left( \sum _{q=1}^Q{x_{qk}\beta _q} + \log _e(P_\mathrm{a})\right) . \end{aligned}$$
(36)
Thus the Poisson rate corresponding to count \(m_{jk}\) is \(\lambda _k f_j = \lambda _k P_j/P_\mathrm{a} = \kappa _k P_j\), so that the likelihood of (35) is equivalent to that of (30). Here, we use (30), as it allows plot area \(a_k\) to vary.
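The equivalence of (30) and (35) when \(a_k=1\) can be checked numerically. The sketch below assumes a half-normal detection function with uniform \(\pi _y(y)=1/w\) (line transects), with the first cutpoint at zero and the last at w; the cutpoints, the values of \(\kappa _k\) and the counts \(m_{jk}\) are made-up illustrative values, and the function names are hypothetical.

```python
import math

def cumulative_g(c, sigma):
    """Integral of the assumed half-normal g(y) from 0 to c."""
    return sigma * math.sqrt(math.pi / 2.0) * math.erf(c / (sigma * math.sqrt(2.0)))

def interval_probs(cuts, sigma):
    """P_j of (33) with pi_y(y) = 1/w, where cuts[0] = 0 and w = cuts[-1]."""
    w = cuts[-1]
    return [(cumulative_g(c1, sigma) - cumulative_g(c0, sigma)) / w
            for c0, c1 in zip(cuts[:-1], cuts[1:])]

def poisson_loglik(rate, count):
    """Log of the Poisson probability of `count` given mean `rate`."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)

def grouped_loglik(plot_means, cell_probs, m):
    """Sum over plots k and intervals j of log Poisson(m[k][j]; plot_means[k] * cell_probs[j])."""
    return sum(poisson_loglik(mean_k * p_j, m_kj)
               for mean_k, row in zip(plot_means, m)
               for p_j, m_kj in zip(cell_probs, row))

cuts = [0.0, 0.25, 0.5, 1.0]            # cutpoints c_0, ..., c_J
P = interval_probs(cuts, sigma=0.5)     # P_j
P_a = sum(P)                            # average detection probability
f = [p / P_a for p in P]                # f_j = P_j / P_a
kappa = [4.0, 9.0]                      # expected plot abundances E(N_k)
lam = [k * P_a for k in kappa]          # lambda_k = kappa_k * P_a, with a_k = 1
m = [[2, 1, 0], [3, 2, 1]]              # counts m_jk, one row per plot

loglik_35 = grouped_loglik(kappa, P, m)  # rates kappa_k * P_j, as in (35)
loglik_30 = grouped_loglik(lam, f, m)    # rates lambda_k * f_j, as in (30)
print(loglik_30, loglik_35)
```

The two values agree up to floating-point rounding because each cell rate satisfies \(\lambda _k f_j = \kappa _k P_j\).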
One of the approaches of Hedley and Buckland (2004) combines a plot abundance model with a two-stage modelling strategy: a Horvitz–Thompson-like estimator is used to estimate abundance \(N_k\) on plot k, and these estimates \(\hat{N}_k\) are taken as the responses for a spatial model. When individual covariates are recorded, this offers a simpler, if conceptually less appealing, alternative to the use of (23) in a plot count model.