Likelihood ratio confidence interval for the abundance under binomial detectability models

Liu, Yang; Liu, Yukun; Fan, Yan; Geng, Han

doi:10.1007/s00184-018-0655-2

Metrika, volume 81, pages 549–568 (2018)

Published: 30 March 2018

Abstract

Binomial detectability models are often used to estimate the size or abundance of a finite population in biology, epidemiology, demography and reliability. Special cases include incompletely observed multinomial models, capture–recapture models, and distance sampling models. The most commonly used confidence interval for the abundance is the Wald-type confidence interval, which is based on the asymptotic normality of a reasonable point estimator of the abundance. However, the Wald-type confidence interval may have poor coverage accuracy, and its lower limit may be less than the number of observations. In this paper, we rigorously establish that the likelihood ratio test statistic for the abundance under binomial detectability models has a limiting chi-square distribution with one degree of freedom. This provides a solid theoretical justification for the use of the proposed likelihood ratio confidence interval. Our simulations indicate that, in comparison to the Wald-type confidence interval, the likelihood ratio confidence interval not only has a more accurate coverage rate, but also exhibits more stable performance in a variety of binomial detectability models. The proposed interval is further illustrated by analyzing three real data sets.

References

Alho JM (1990) Logistic regression in capture–recapture models. Biometrics 46:623–635
Barnard J, Emam K, Zubrow D (2003) Using capture–recapture models for the reinspection decision. Softw Qual Prof 5:11–20
Borchers DL, Zucchini W, Fewster RM (1998) Mark-recapture models for line transect surveys. Biometrics 54:1207–1220
Borchers DL, Buckland ST, Zucchini W (2002) Estimating animal abundance: closed populations. Springer, London
Borchers DL, Stevenson BC, Kidney D, Thomas L, Marques TA (2015) A unifying model for capture-recapture and distance sampling surveys of wildlife populations. J Am Stat Assoc 110:195–204
Buckland ST, Anderson DR, Burnham KP, Laake JL, Borchers DL, Thomas L (2001) Introduction to distance sampling. Oxford University Press, Oxford
Chao A (1987) Estimating the population size for capture–recapture data with unequal catchability. Biometrics 43:783–791
Chao A, Chu W, Hsu CH (2000) Capture–recapture when time and behavioral response affect capture probabilities. Biometrics 56:427–433
Chao A, Tsay PK, Lin SH, Shau WY, Chao DY (2001) The applications of capture–recapture models to epidemiological data. Stat Med 20:3123–3157
Chen SX, Lloyd CJ (2002) Estimation of population size from biased samples using non-parametric binary regression. Stat Sin 12:505–518
Cormack RM (1992) Interval estimation for mark-recapture studies of closed populations. Biometrics 48:567–576
Evans MA, Bonett DG (1994) Bias reduction for multiple recapture estimators of closed population size. Biometrics 50:388–395
Evans MA, Kim H, O’Brien TE (1996) An application of profile-likelihood based confidence interval to capture–recapture estimators. J Agric Biol Environ Stat 1(1):131–140
Fancy SG, Snetsinger TJ, Jacobi JD (1997) Translocation of the Palila, an endangered Hawaiian honeycreeper. Pac Conserv Biol 3:39–46
Fewster RM, Jupp PE (2009) Inference on population size in binomial detectability models. Biometrika 96:805–820
Gneiting T, Raftery AE (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102:359–378
Heinze D, Broome L, Mansergh I (2004) A review of the ecology and conservation of the mountain pygmy-possum Burramys parvus. In: Goldingay RL, Jackson SM (eds) The biology of Australian Possums and Gliders. Baulkham Hills, Surrey Beatty & Sons, pp 254–267
Hjort NL, Pollard D (2011) Asymptotics for minimisers of convex processes. arXiv:1107.3806v1
Hogan H (2000) Accuracy and coverage evaluation 2000: decomposition of dual system estimate components. U.S. Census Bureau, Washington
Huggins RM (1989) On the statistical analysis of capture experiments. Biometrika 76:133–140
Huggins R, Hwang WH (2007) Non-parametric estimation of population size from capture–recapture data when the capture probability depends on a covariate. J R Stat Soc Ser C 56:429–443
Liu Y, Li P, Qin J (2017) Maximum empirical likelihood estimation for abundance in a closed population from capture–recapture data. Biometrika 104(3):527–543
Marques FFC, Buckland ST (2004) Covariate models for the detection function. In: Buckland ST, Anderson DR, Burnham KP, Laake JL, Borchers DL, Thomas L (eds) Advanced distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford
Marques TA, Thomas L, Fancy SG, Buckland ST (2007) Improving estimates of bird density using multiple-covariate distance sampling. Auk 124(4):1229–1243
Otis DL, Burnham KP, White GC, Anderson DR (1978) Statistical inference from capture data on closed animal populations. Wildl Monogr 62:1–135
Pollock KH (2000) Capture–recapture models. J Am Stat Assoc 95:293–296
Qin J, Lawless J (1994) Empirical likelihood and general estimating equations. Ann Stat 22(1):300–325
Sanathanan L (1972) Estimating the size of a multinomial population. Ann Math Stat 43:142–152
Serfling RJ (1980) Approximation theorems of mathematical statistics. Wiley, New York
Stoklosa J, Hwang WH, Wu SH, Huggins R (2011) Heterogeneous capture–recapture models with covariates: a partial likelihood approach for closed populations. Biometrics 67:1659–1665

Acknowledgements

We are grateful to the editor and two anonymous referees for their insightful and constructive comments, which led to an improved presentation of this article. The research was supported by the National Natural Science Foundation of China (Grant Nos. 11501354, 11771144, 11371142, and 11501208), the Program of Shanghai Subject Chief Scientist (14XD1401600) and the 111 Project (B14019).

Author information

Authors and Affiliations

School of Statistics, East China Normal University, Shanghai, China
Yang Liu, Yukun Liu & Han Geng
School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, China
Yan Fan

Corresponding author

Correspondence to Yukun Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix

1.1 A. Regularity conditions on $k(x;\theta )$

We assume that $k(x;\theta )$ satisfies the following regularity conditions, which are from §4.2.2 of Serfling (1980).

(R1):

Let $\varTheta $ be the parameter space of $\theta $ and $\theta _0$ be its true value. Suppose $\varTheta $ is an open set and $\theta _0$ belongs to $\varTheta $.

(R2):

For each $\theta \in \varTheta $, the derivatives

$$\begin{aligned} \frac{\partial \log k(x; \theta )}{\partial \theta }, \; \frac{\partial ^2 \log k(x; \theta )}{\partial \theta ^2},\; \frac{\partial ^3 \log k(x; \theta )}{\partial \theta ^3} \end{aligned}$$

exist for all x.

(R3):

There exist functions g(x), h(x) and H(x) such that for $\theta $ in a neighborhood of $\theta _0$,

$$\begin{aligned} \Big | \frac{\partial \log k(x; \theta )}{\partial \theta } \Big | \le g(x), \; \Big | \frac{\partial ^2 \log k(x; \theta )}{\partial \theta ^2} \Big | \le h(x), \; \Big | \frac{\partial ^3 \log k(x; \theta )}{\partial \theta ^3} \Big | \le H(x) \end{aligned}$$

hold for all x and

$$\begin{aligned} \int g(x) dx<\infty , \; \int h(x) dx<\infty , \; \int H(x) k(x; \theta )dx <\infty . \end{aligned}$$

(R4):

For each $\theta \in \varTheta $, $0< \int \left\{ \partial \log k(x; \theta )/\partial \theta \right\} ^2 k(x; \theta )dx <\infty $.

1.2 B. Technical preparations

We make technical preparations for the proof of Theorem 1. For any positive real number x greater than n, define the digamma function $ \psi _0(x) = d \log \{ \varGamma (x)\}/dx $ and $ S_1(x, n) = \psi _0(x+1) - \psi _0(x-n+1). $ For $a=1,2,\dots $, we define the polygamma functions

$$\begin{aligned} \psi _a(x) =&\frac{d^{a+1}\log \{ \varGamma (x)\} }{dx^{a+1} } = \frac{d^{a}\psi _0(x) }{dx^{a} } =(-1)^{a+1}a!\sum _{k=0}^{\infty } \frac{1}{(x+k)^{a+1}}, \end{aligned} \tag{6}$$

$$\begin{aligned} S_{a}(x, u) =&\psi _{a-1}(x+1) - \psi _{a-1}(x-u+1) =(-1)^{a-1}(a-1)!\sum _{k=x-u+1}^{x} k^{-a}. \end{aligned} \tag{7}$$

It is clear that $\psi _1(x) = d\psi _0(x)/dx$ and therefore $S_{2}(x, n) = d S_{1}(x, n)/dx.$
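The finite-sum form of $S_a$ can be checked numerically. The following is a minimal sketch, not part of the original derivation: it uses only the Python standard library, compares the sums for $a=1,2$ with central finite differences of $\log \{\varGamma (x+1)/\varGamma (x-n+1)\}$, and the helper names `S`, `log_falling`, `d1` and `d2` are hypothetical names chosen for this illustration.

```python
import math

def S(a, x, u):
    """S_a(x, u) from the finite sum in Eq. (7), for integers x > u >= 0."""
    coef = (-1) ** (a - 1) * math.factorial(a - 1)
    return coef * sum(k ** (-a) for k in range(x - u + 1, x + 1))

def log_falling(x, n):
    """log{Gamma(x+1) / Gamma(x-n+1)}, whose x-derivative is S_1(x, n)."""
    return math.lgamma(x + 1) - math.lgamma(x - n + 1)

def d1(f, x, h=1e-2):
    """Central finite-difference approximation to the first derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-2):
    """Central finite-difference approximation to the second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

N, n = 100, 60  # illustrative values only
f = lambda x: log_falling(x, n)
print(S(1, N, n), d1(f, N))  # the two numbers should agree closely
print(S(2, N, n), d2(f, N))  # likewise for the second derivative
```

The step size `h` is a compromise between truncation and rounding error; the agreement is approximate, not exact.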

Since $x^{-1}$ and $x^{-2}$ are both monotone decreasing functions for $x>0$, it follows from Eqs. (6) and (7) that

$$\begin{aligned}&\log \{ (N+1)/(N+1-n)\}<S_1(N,n)<\log \{ N/(N-n) \}, \\&-n/\{N(N-n)\}<S_2(N,n)< -n/\{ (N+1)(N+1-n)\}. \end{aligned}$$
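These sandwich bounds come from comparing each sum with the integral of $x^{-1}$ or $x^{-2}$ over adjacent unit intervals. As a quick sanity check, and only as an illustrative sketch with hypothetical helper names (`S1`, `S2`, `bounds_hold`), the inequalities can be verified over a grid of integer pairs with $1 \le n < N$:

```python
import math

def S1(N, n):
    """S_1(N, n) as the finite sum of 1/k for k = N-n+1, ..., N."""
    return sum(1.0 / k for k in range(N - n + 1, N + 1))

def S2(N, n):
    """S_2(N, n) as minus the finite sum of 1/k^2 for k = N-n+1, ..., N."""
    return -sum(1.0 / k ** 2 for k in range(N - n + 1, N + 1))

def bounds_hold(N, n):
    """Check both pairs of strict inequalities for one (N, n) with 1 <= n < N."""
    lo1 = math.log((N + 1) / (N + 1 - n))
    hi1 = math.log(N / (N - n))
    lo2 = -n / (N * (N - n))
    hi2 = -n / ((N + 1) * (N + 1 - n))
    return lo1 < S1(N, n) < hi1 and lo2 < S2(N, n) < hi2

# Grid of illustrative integer pairs; the range is an arbitrary choice.
pairs = [(N, n) for N in range(2, 200) for n in range(1, N)]
print(all(bounds_hold(N, n) for N, n in pairs))
```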

Note that the number n of detected observations follows a binomial distribution $B(N_0,p(\theta _0))$. By the central limit theorem,

$$\begin{aligned} \sqrt{N_0}\left\{ \frac{n}{N_0}-p(\theta _0) \right\} {\overset{d}{\longrightarrow \; }}\; N\left( 0, \; p(\theta _0) \{1-p(\theta _0) \}\right) , \end{aligned}$$

as $N_0 \rightarrow \infty $. Therefore, it follows that

$$\begin{aligned} S_1(N_0,n) =&\log \{ N_0/\left( N_0-n\right) \} + O_p\left( N_0^{-1}\right) \\ =&- \log \{1-p(\theta _0)\} + \frac{\left( n/N_0\right) -p\left( \theta _0\right) }{ 1-p\left( \theta _0\right) } + O_p\left( N_0^{-1}\right) , \\ S_2\left( N_0,n\right) =&- \frac{n}{N_0\left( N_0-n\right) }+ O_p\left( N_0^{-2}\right) \\ =&- \frac{p\left( \theta _0\right) }{N_0\{ 1-p(\theta _0) \}}+ O_p\left( N_0^{-3/2}\right) . \end{aligned}$$
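The second line of each expansion is a first-order Taylor step that the text leaves implicit. As a sketch, write $t = n/N_0$ (a shorthand introduced here only for illustration), so that $t - p(\theta _0) = O_p(N_0^{-1/2})$ by the central limit theorem above. Then

$$\begin{aligned} \log \{ N_0/(N_0-n)\}&= -\log (1-t) = -\log \{1-p(\theta _0)\} + \frac{t-p(\theta _0)}{1-p(\theta _0)} + O_p\left( N_0^{-1}\right) , \\ \frac{n}{N_0(N_0-n)}&= \frac{1}{N_0}\, \frac{t}{1-t} = \frac{1}{N_0}\left[ \frac{p(\theta _0)}{1-p(\theta _0)} + O_p\left( N_0^{-1/2}\right) \right] . \end{aligned}$$

The first line of each expansion follows from the sandwich bounds: for $S_1$, the gap between the upper and lower bounds is $\log [1 + n/\{(N_0-n)(N_0+1)\}] = O_p(N_0^{-1})$, and the corresponding gap for $S_2$ is $O_p(N_0^{-2})$.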

The following lemma from Hjort and Pollard (2011) can ease much of the technical burden in our proof of Theorem 1.

Lemma 2

Assume that $\theta ^{{\mathrm {\scriptscriptstyle \top }}}=(\theta _{1}^{{\mathrm {\scriptscriptstyle \top }}}, \theta _{2}^{{\mathrm {\scriptscriptstyle \top }}}) $ where $\theta _1$ and $\theta _2$ are r- and s-dimensional vectors, respectively. Let $\theta _0^{{\mathrm {\scriptscriptstyle \top }}}=(\theta _{10}^{{\mathrm {\scriptscriptstyle \top }}}, \theta _{20}^{{\mathrm {\scriptscriptstyle \top }}})$ be its true value, and $\gamma =(\gamma _{1}^{{\mathrm {\scriptscriptstyle \top }}}, \gamma _{2}^{{\mathrm {\scriptscriptstyle \top }}})^{{\mathrm {\scriptscriptstyle \top }}} = \sqrt{n}(\theta -\theta _0)$ where n is the sample size. Suppose for $\theta = \theta _0+O_p(n^{-1/2})$, it holds that

$$\begin{aligned} H(\theta ) = C_n + a_n^{{\mathrm {\scriptscriptstyle \top }}} \gamma - \frac{1}{2} \gamma ^{{\mathrm {\scriptscriptstyle \top }}}A\gamma + \varepsilon _n(\theta ) \end{aligned}$$

where $a_n=O_p(1)$, A is a positive definite matrix, $C_n$ does not depend on $\theta $, and $\varepsilon _n(\theta )= o_p(1)$ for any fixed $\theta $. According to $\theta =(\theta _{1}^{{\mathrm {\scriptscriptstyle \top }}}, \theta _{2}^{{\mathrm {\scriptscriptstyle \top }}})^{{\mathrm {\scriptscriptstyle \top }}}$, we partition A into $A = (A_{ij})_{1\le i, j\le 2}$, and partition $a_n^{{\mathrm {\scriptscriptstyle \top }}}$ into $(a_{n1}^{{\mathrm {\scriptscriptstyle \top }}}, a_{n2}^{{\mathrm {\scriptscriptstyle \top }}})$. As $n\rightarrow \infty $, if $a_n{\overset{d}{\longrightarrow \; }}N(0, A)$, then

(a) the maximizer $\hat{\theta }$ of $H(\theta )$ satisfies $ \sqrt{n}(\hat{\theta }-\theta _0) = A^{-1}a_n + o_p(1) {\overset{d}{\longrightarrow \; }}N(0, A^{-1})$,
(b) $2\{ \mathop {\max }\nolimits _{\theta } H(\theta ) - H(\theta _0) \} = a_n^{{\mathrm {\scriptscriptstyle \top }}} A^{-1} a_n + o_p(1) {\overset{d}{\longrightarrow \; }}\chi _{r+s}^2 $, and
(c) $2\{ \mathop {\max }\nolimits _{\theta } H(\theta )- \mathop {\max }\nolimits _{\theta _2} H(\theta _{10}, \theta _2)\} = a_n^{{\mathrm {\scriptscriptstyle \top }}} A^{-1} a_n - a_{n2}^{{\mathrm {\scriptscriptstyle \top }}} A_{22}^{-1} a_{n2}+o_p(1) {\overset{d}{\longrightarrow \; }}\chi _{r}^2$.
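The form of these limits can be read off from the quadratic part of the expansion. The following is only a heuristic sketch that ignores the remainder $\varepsilon _n(\theta )$; controlling that remainder is the substance of the lemma. Maximizing the quadratic over $\gamma $ gives

$$\begin{aligned} \hat{\gamma } = \mathop {\arg \max }\limits _{\gamma } \left( a_n^{{\mathrm {\scriptscriptstyle \top }}} \gamma - \frac{1}{2} \gamma ^{{\mathrm {\scriptscriptstyle \top }}}A\gamma \right) = A^{-1}a_n, \qquad \mathop {\max }\limits _{\gamma } \left( a_n^{{\mathrm {\scriptscriptstyle \top }}} \gamma - \frac{1}{2} \gamma ^{{\mathrm {\scriptscriptstyle \top }}}A\gamma \right) = \frac{1}{2} a_n^{{\mathrm {\scriptscriptstyle \top }}} A^{-1} a_n. \end{aligned}$$

Since $\theta = \theta _0$ corresponds to $\gamma = 0$, doubling the gain over $H(\theta _0)$ gives $a_n^{{\mathrm {\scriptscriptstyle \top }}} A^{-1} a_n$, which is asymptotically $\chi ^2_{r+s}$ when $a_n {\overset{d}{\longrightarrow \; }}N(0, A)$. Repeating the argument with $\gamma _1 = 0$ fixed gives the restricted maximum $\frac{1}{2} a_{n2}^{{\mathrm {\scriptscriptstyle \top }}} A_{22}^{-1} a_{n2}$, and the difference of the two doubled maxima is the statistic in (c).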

1.3 C. Proof of Theorem 1

Using a similar argument to that in the proofs of Lemma 1 and Theorem 1 of Qin and Lawless (1994), we have $\hat{N} = N_0 +O_p(N_0^{1/2}) $ and $ \hat{\theta } - \theta _0 = O_p(N_0^{-1/2})$. Since the results in Theorem 1 are about the properties of $( \hat{N}, \hat{\theta })$, our proof begins by studying the behavior of $\ell (N, \theta )$ for $(N, \theta )$ such that $( (N-N_0)/N_0, \theta -\theta _0) = O_p(N_0^{-1/2})$.

Let $\alpha =(\alpha _1, \alpha _2^{{\mathrm {\scriptscriptstyle \top }}})^{{\mathrm {\scriptscriptstyle \top }}}$ with $\alpha _1 =N_0^{-1/2}(N-N_0)$ and $\alpha _2 = N_0^{1/2}(\theta -\theta _0)$. Define $H(\alpha ) = \ell ( N_0 +N_0^{1/2} \alpha _1, \theta _0+N_0^{-1/2}\alpha _2)$. The likelihood ratio function of $(N, \theta )$ can be expressed as

$$\begin{aligned} R(N, \theta ) = 2\{ H(\alpha ) - H(0) \}. \end{aligned}$$

By the second-order Taylor expansion, we have

$$\begin{aligned} H(\alpha ) = H(0) + \alpha ^{\mathrm {\scriptscriptstyle \top }}u + \frac{1}{2} \alpha ^{{\mathrm {\scriptscriptstyle \top }}} V \alpha + o_p(1), \end{aligned}$$

where $u \equiv (u_1, u_2^{\mathrm {\scriptscriptstyle \top }})^{\mathrm {\scriptscriptstyle \top }}=\partial H(0)/\partial \alpha $ and V is the leading term of $ \partial ^2 H(0)/(\partial \alpha \partial \alpha ^{\mathrm {\scriptscriptstyle \top }})$.

To proceed, we need the expressions of u and V. It can be seen that

$$\begin{aligned} \frac{\partial \ell (N, \theta )}{\partial N}&= S_1(N,n) + \log \{1-p(\theta )\},\\ \frac{\partial \ell (N, \theta )}{\partial \theta }&= \frac{n-N p(\theta )}{p(\theta )\{1-p(\theta )\} } \frac{d p(\theta )}{d\theta } + \sum _{i=1}^n \frac{\partial \log \{ k(x_i; \theta )\} }{ \partial \theta }. \end{aligned}$$

According to the properties of the digamma functions, we further have

$$\begin{aligned} u_1&=\frac{\partial H(0)}{\partial \alpha _1}= N_0^{1/2} \frac{\partial \ell (N_0, \theta _0)}{\partial N} \\&=N_0^{1/2} \left[ S_1(N_0,n) + \log \{1-p(\theta _0)\} \right] \\&=N_0^{1/2} \frac{ (n/N_0) -p(\theta _0) }{ 1-p(\theta _0) }+ O_p\left( N_0^{-1/2}\right) , \end{aligned}$$

and

$$\begin{aligned} u_2&= \frac{\partial H(0)}{\partial \alpha _2}= N_0^{-1/2} \frac{\partial \ell (N_0, \theta _0)}{\partial \theta } \\&= N_0^{-1/2} \left[ \frac{n- N_0p(\theta _0)}{ 1-p(\theta _0) } \frac{d \log \{ p(\theta _0)\} }{d\theta } + \sum _{i=1}^n \frac{\partial \log \{ k(x_i;\theta _0)\} }{ \partial \theta } \right] \\&=N_0^{1/2} \frac{(n/N_0) - p(\theta _0)}{ 1-p(\theta _0) } p_1(\theta _0) + \{p(\theta _0)\}^{1/2} n^{-1/2} \sum _{i=1}^n \frac{\partial \log \{ k(x_i;\theta _0)\} }{ \partial \theta } \\&\quad + O_p\left( N_0^{-1/2}\right) . \end{aligned}$$

By the central limit theorem, it can be shown that $u{\overset{d}{\longrightarrow \; }}N(0, \varSigma )$.
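The limiting covariance can be checked term by term. This is a sketch under two assumptions read off from the displays in this appendix rather than stated in it: that $p_1(\theta ) = d\log p(\theta )/d\theta $, and that $I(\theta _0)$ denotes the Fisher information of $k(x;\theta )$ at $\theta _0$. Write $Z = N_0^{1/2}\{(n/N_0) - p(\theta _0)\}/\{1-p(\theta _0)\}$ and $W = n^{-1/2}\sum _{i=1}^n \partial \log \{k(x_i;\theta _0)\}/\partial \theta $, two symbols introduced here for illustration. Then $u_1 = Z + o_p(1)$ and $u_2 = Z p_1(\theta _0) + \{p(\theta _0)\}^{1/2} W + o_p(1)$, where $Z$ and $W$ are asymptotically uncorrelated because the score of $k$ has mean zero given $n$. Consequently,

$$\begin{aligned} \mathrm{Var}(Z)&\rightarrow \frac{p(\theta _0)}{1-p(\theta _0)}, \qquad \mathrm{Var}(W) \rightarrow I(\theta _0), \\ \mathrm{Cov}(u_2, u_1)&\rightarrow \frac{p(\theta _0)}{1-p(\theta _0)} p_1(\theta _0), \\ \mathrm{Var}(u_2)&\rightarrow \frac{p(\theta _0)}{1-p(\theta _0)} p_1(\theta _0) \{p_1(\theta _0)\}^{{\mathrm {\scriptscriptstyle \top }}} + p(\theta _0) I(\theta _0). \end{aligned}$$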

Write $V=(V_{ij})_{1\le i,j\le 2}$. It can be seen that $V_{11}$ is the leading term of

$$\begin{aligned} \frac{\partial ^2 H(0)}{\partial \alpha _1^2} = N_0 \frac{\partial ^2 \ell (N_0, \theta _0)}{\partial N^2} = N_0 S_2(N_0,n) = - \frac{p(\theta _0)}{1-p(\theta _0)} +O_p\left( N_0^{-1/2}\right) , \end{aligned}$$

where we have used the approximation of $S_2(N_0, n)$ derived in Appendix B. This implies that

$$\begin{aligned} V_{11} = - \frac{p(\theta _0)}{1-p(\theta _0)}. \end{aligned}$$

With tedious algebra, we similarly have

$$\begin{aligned} V_{21} = - \frac{p(\theta _0)}{1-p(\theta _0)} p_1(\theta _0), \quad V_{22} = - \frac{p(\theta _0) }{ 1-p(\theta _0) } p_1(\theta _0) \{p_1(\theta _0) \}^{{\mathrm {\scriptscriptstyle \top }}} - p(\theta _0) I(\theta _0). \end{aligned}$$

Since $u{\overset{d}{\longrightarrow \; }}N(0, \varSigma )$ and $V = -\varSigma $, Theorem 1 is proved by applying Lemma 2. $\square $
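The chi-square limit with one degree of freedom stated in the abstract is what calibrates a likelihood ratio interval for $N$. The sketch below illustrates the idea and is not the authors' code. It assumes the simplest capture–recapture special case (often called model M0): `T` capture occasions, a common per-occasion capture probability $\theta $, `n` distinct individuals detected and `s` captures in total. Under that assumption, the maximizer in $\theta $ for fixed $N$ is $s/(NT)$. The function names, the integer grid search and the default `n_max` are hypothetical choices made for this illustration.

```python
import math

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 df

def profile_loglik(N, n, s, T):
    """Profile log-likelihood of N under the assumed M0 model.

    Terms that do not depend on N are dropped.
    """
    theta = s / (N * T)  # maximizer in theta for this fixed N
    ll = math.lgamma(N + 1) - math.lgamma(N - n + 1)
    ll += s * math.log(theta)
    if N * T > s:  # avoid log(0) when every individual is caught every time
        ll += (N * T - s) * math.log1p(-theta)
    return ll

def lr_interval(n, s, T, n_max=None, cutoff=CHI2_1_95):
    """Integer MLE of N and likelihood ratio interval by grid search."""
    if n_max is None:
        n_max = 20 * n  # assumed search limit; enlarge if the upper end hits it
    grid = range(n, n_max + 1)
    ll = [profile_loglik(N, n, s, T) for N in grid]
    best = max(ll)
    n_hat = grid[ll.index(best)]
    inside = [N for N, v in zip(grid, ll) if 2.0 * (best - v) <= cutoff]
    return n_hat, inside[0], inside[-1]

# Illustrative numbers only (not from the paper's data sets).
n_hat, lower, upper = lr_interval(n=60, s=100, T=5)
print(n_hat, lower, upper)
```

Because the grid starts at `n`, the lower limit can never fall below the number of detected individuals, unlike a Wald-type limit.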

About this article

Liu, Y., Liu, Y., Fan, Y. et al. Likelihood ratio confidence interval for the abundance under binomial detectability models. Metrika 81, 549–568 (2018). https://doi.org/10.1007/s00184-018-0655-2

Received: 21 September 2017
Published: 30 March 2018
Issue Date: July 2018
DOI: https://doi.org/10.1007/s00184-018-0655-2
