
On the Turing estimator in capture–recapture count data under the geometric distribution

Metrika

Abstract

We introduce an estimator for an unknown population size in a capture–recapture framework where the count of identifications follows a geometric distribution. The geometric distribution can be thought of as a Poisson count adjusted for exponentially distributed heterogeneity. On this basis, a new Turing-type estimator under the geometric distribution is obtained. This estimator can be used in many real-life capture–recapture situations in which the geometric distribution is more appropriate than the Poisson. The proposed estimator behaves comparably to the maximum likelihood estimator on both simulated and real data. Its asymptotic variance is obtained by applying a conditional technique, and its empirical behavior is investigated through a large-scale simulation study. Comparisons with other well-established estimators are provided. Empirical applications in which the population size is known are also included to further corroborate the simulation results.
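As a minimal sketch, the estimator can be computed from frequency counts. The form used here, \(\widehat{N}_{\textit{TG}} = n/(1-\sqrt{f_1/S})\), is the one that appears in the appendix, where \(n\) is the number of observed units, \(f_1\) the number of units identified exactly once and \(S\) the total number of identifications. We assume \(S=\sum_x x f_x\), with \(f_x\) the number of units identified exactly \(x\) times. The square root is natural under the geometric model \(p_x = p(1-p)^x\): there \(E(f_1)=Np(1-p)\) and \(E(S)=N(1-p)/p\), so \(f_1/S \approx p^2 = p_0^2\). The helper names `summary_stats` and `turing_geometric` are hypothetical.

```python
import math
from collections.abc import Mapping


def summary_stats(freq: Mapping[int, int]) -> tuple[int, int, int]:
    """Return (n, f1, S) from frequency counts {x: f_x} (hypothetical helper).

    n  : number of distinct units identified at least once
    f1 : number of units identified exactly once
    S  : total number of identifications, sum over x of x * f_x
    """
    n = sum(fx for x, fx in freq.items() if x >= 1)
    f1 = freq.get(1, 0)
    s = sum(x * fx for x, fx in freq.items() if x >= 1)
    return n, f1, s


def turing_geometric(n: int, f1: int, s: int) -> float:
    """Turing-type estimate N_hat = n / (1 - sqrt(f1 / S)) under the geometric model."""
    if s <= 0:
        raise ValueError("S must be positive")
    kappa0_hat = math.sqrt(f1 / s)  # estimated probability of never being identified
    if kappa0_hat >= 1.0:
        # f1 == S means every observed unit was identified exactly once
        raise ValueError("estimate undefined when f1 / S >= 1")
    return n / (1.0 - kappa0_hat)


# Usage with hypothetical frequency counts
n, f1, s = summary_stats({1: 3, 2: 1})
print(turing_geometric(n, f1, s))
```

For instance, the expected counts for a hypothetical population with \(N=1000\) and \(p=0.4\) (\(n=600\), \(f_1=240\), \(S=1500\)) give an estimate of approximately 1000.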


Figs. 1–7



Acknowledgements

This work was developed under the PRIN2015-supported project “Environmental processes and human activities: capturing their interactions via statistical methods (EPHASTAT)”, funded by MIUR (Italian Ministry of Education, University and Scientific Research). Antonello Maruotti is grateful to the “Centro di Ateneo per la Ricerca e l’Internalizzazione” (LUMSA) for financial support.

Author information


Corresponding author

Correspondence to Antonello Maruotti.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Appendix: Proof of Proposition 1


By the conditional variance technique (the law of total variance), we have

$$\begin{aligned} \textit{Var}(\hat{N}_{\textit{TG}}) = \textit{Var}_{n} \left\{ E(\widehat{N}_{\textit{TG}}|n) \right\} + E_{n} \left\{ \textit{Var}(\widehat{N}_{\textit{TG}}|n) \right\} . \end{aligned}$$
(8)
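The decomposition (8) is the law of total variance. As an illustration only, the exact check below uses a small hypothetical discrete distribution (unrelated to capture–recapture data) and rational arithmetic, so both sides can be compared exactly.

```python
from fractions import Fraction

# Hypothetical toy model (not capture-recapture data):
# marginal distribution of the conditioning variable n ...
p_cond = {1: Fraction(1, 3), 2: Fraction(2, 3)}
# ... and conditional distributions of Y given n, as {value: probability}.
p_y_given = {
    1: {0: Fraction(1, 2), 4: Fraction(1, 2)},
    2: {1: Fraction(1, 4), 3: Fraction(3, 4)},
}


def mean(dist):
    """Expectation of a discrete distribution given as {value: probability}."""
    return sum(prob * value for value, prob in dist.items())


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    m = mean(dist)
    return sum(prob * (value - m) ** 2 for value, prob in dist.items())


# Left-hand side of (8): variance of the marginal distribution of Y.
marginal_y = {}
for k, pk in p_cond.items():
    for y, py in p_y_given[k].items():
        marginal_y[y] = marginal_y.get(y, 0) + pk * py
lhs = variance(marginal_y)

# Right-hand side of (8): Var_n{E(Y|n)} + E_n{Var(Y|n)}.
cond_mean_dist = {}
for k, pk in p_cond.items():
    m = mean(p_y_given[k])
    cond_mean_dist[m] = cond_mean_dist.get(m, 0) + pk
between = variance(cond_mean_dist)
within = sum(pk * variance(p_y_given[k]) for k, pk in p_cond.items())

print(lhs, between + within)  # prints: 17/9 17/9
```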

Starting from the first term on the right-hand side of (8), the delta method gives \(E(\widehat{N}_{\textit{TG}}|n)\approx \frac{n}{1-\kappa _0}\) and, accordingly,

$$\begin{aligned} \textit{Var}_{n} \left\{ E(\widehat{N}_{\textit{TG}}|n) \right\}\approx & {} \textit{Var}_{n} \left\{ \frac{n}{1-{\kappa }_0} \right\} = \frac{1}{(1-{\kappa }_0)^{2}} \textit{Var}(n) = \frac{N(1-{\kappa }_0){\kappa }_0}{(1-{\kappa }_0)^{2}}. \end{aligned}$$
(9)

Since \(E(n) = N(1-{\kappa }_0)\), replacing \(N(1-{\kappa }_0)\) by \(n\) and \({\kappa }_0\) by \(\widehat{\kappa }_{0(TG)} = \sqrt{\frac{f_{1}}{S}} \), the variance in (9) can be estimated as

$$\begin{aligned} \widehat{\textit{Var}}_{n}\left\{ E(\widehat{N}_{\textit{TG}}|n) \right\} = \frac{n \sqrt{\frac{f_{1}}{S}}}{\left( 1- \sqrt{\frac{f_{1}}{S}}\right) ^{2} }. \end{aligned}$$
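For illustration, the sketch below compares this plug-in estimate with the population quantity in (9), evaluated at the expected counts of a hypothetical geometric population with \(N=1000\) and \(\kappa_0 = p = 0.4\), namely \(n=600\), \(f_1=240\) and \(S=1500\). The function names are hypothetical.

```python
import math


def var_between_hat(n: int, f1: int, s: int) -> float:
    """Plug-in estimate n * k / (1 - k)^2 with k = sqrt(f1 / S)."""
    k = math.sqrt(f1 / s)
    return n * k / (1.0 - k) ** 2


def var_between_pop(N: int, kappa0: float) -> float:
    """Population quantity from (9): N (1 - kappa0) kappa0 / (1 - kappa0)^2."""
    return N * (1.0 - kappa0) * kappa0 / (1.0 - kappa0) ** 2


# Expected counts for a hypothetical population with N = 1000, kappa0 = p = 0.4
print(var_between_hat(600, 240, 1500), var_between_pop(1000, 0.4))
```

At these expected counts both expressions equal \(N\kappa_0/(1-\kappa_0) \approx 666.67\).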

Additionally,

$$\begin{aligned} \textit{Var}(\widehat{N}_{\textit{TG}}|n)= & {} \textit{Var}\left( \frac{n}{1-\sqrt{\frac{f_{1}}{S}}}|n \right) = n^{2} \textit{Var}\left( \frac{1}{1-\sqrt{\frac{f_{1}}{S}}}|n\right) . \end{aligned}$$

The variance \(\textit{Var}\Big ( \frac{1}{1-\sqrt{\frac{f_{1}}{S}}}\Big )\) can be approximated by the delta method. Let \(y= \frac{f_{1}}{S}\) and \(h(y)=\frac{1}{1-\sqrt{y}}\). Then

$$\begin{aligned} h'(y)=-(1-y^{1/2})^{-2} \left( -\frac{1}{2}y^{-1/2} \right) =\frac{1}{2\sqrt{y}(1-\sqrt{y})^{2}}. \end{aligned}$$
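A quick numerical check of \(h'(y)\), comparing the closed form with a central finite difference at a few hypothetical values of \(y\):

```python
import math


def h(y: float) -> float:
    """h(y) = 1 / (1 - sqrt(y)), defined for 0 <= y < 1."""
    return 1.0 / (1.0 - math.sqrt(y))


def h_prime(y: float) -> float:
    """Closed-form derivative 1 / (2 sqrt(y) (1 - sqrt(y))^2), for 0 < y < 1."""
    r = math.sqrt(y)
    return 1.0 / (2.0 * r * (1.0 - r) ** 2)


def central_difference(f, y: float, step: float = 1e-6) -> float:
    """Numerical derivative (f(y + step) - f(y - step)) / (2 step)."""
    return (f(y + step) - f(y - step)) / (2.0 * step)


for y in (0.04, 0.16, 0.49):  # hypothetical test points
    print(y, h_prime(y), central_difference(h, y))
```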

Furthermore,

$$\begin{aligned} \textit{Var}\left( \frac{1}{1-\sqrt{\frac{f_{1}}{S}}}|n\right)\approx & {} \left( \frac{1}{2\sqrt{y}(1-\sqrt{y})^{2}} \right) ^{2}{} \textit{Var}\left( \frac{f_{1}}{S} \right) \\= & {} \left( \frac{1}{4\frac{f_{1}}{S}\left( 1-\sqrt{\frac{f_{1}}{S}}\right) ^{4}} \right) \textit{Var}\left( \frac{f_{1}}{S} \right) . \end{aligned}$$

As a next step, we apply the conditional variance technique again to approximate \(\textit{Var}\left( \frac{f_{1}}{S} \right) \):

$$\begin{aligned} \textit{Var}\left( \frac{f_{1}}{S} \right)= & {} \textit{Var}_{f_{1}}\left\{ E\left( \frac{f_{1}}{S} |f_{1}\right) \right\} +E_{f_{1}}\left\{ \textit{Var} \left( \frac{f_{1}}{S}|f_{1} \right) \right\} . \end{aligned}$$
(10)

With the approximation \(E\left( \frac{f_{1}}{S}|f_{1} \right) = f_{1}E\left( \frac{1}{S}|f_{1}\right) \approx \frac{f_{1}}{S}\), and estimating \(p_{1}\) by \(f_{1}/N\), we have

$$\begin{aligned} \textit{Var}_{f_{1}}\left\{ E\left( \frac{f_{1}}{S} |f_{1}\right) \right\}\approx & {} \textit{Var}_{f_{1}}\left( \frac{f_{1}}{S} \right) = \frac{1}{S^{2}} \textit{Var}(f_{1}) = \frac{1}{S^{2}} Np_{1}(1-p_{1}) \nonumber \\= & {} \frac{1}{S^{2}}\left( N\frac{f_{1}}{N} \left( 1-\frac{f_{1}}{N}\right) \right) = \frac{f_{1}}{S^{2}}\left( 1-\frac{f_{1}}{N}\right) . \end{aligned}$$
(11)

Similarly, estimating \(E_{f_{1}} \left\{ \textit{Var}\left( \frac{f_{1}}{S}|f_{1} \right) \right\} \) by \(\textit{Var}\left( \frac{f_{1}}{S}|f_{1} \right) \), we have

$$\begin{aligned} E_{f_{1}} \left\{ \textit{Var}\left( \frac{f_{1}}{S}|f_{1} \right) \right\}\approx & {} \textit{Var}\left( \frac{f_{1}}{S}|f_{1} \right) = f_{1}^{2} \textit{Var}\left( \frac{1}{S}\right) \end{aligned}$$

Using the delta method with \(S = N\bar{X}\), we obtain

$$\begin{aligned} \textit{Var}\left( \frac{1}{S}\right)\approx & {} \frac{1}{S^{4}} \textit{Var}(N\bar{X}) = \frac{1}{S^{4}}N^{2} \textit{Var}(\bar{X}) = \frac{1}{S^{4}}N^{2} \frac{\textit{Var}(X)}{N}. \end{aligned}$$

Since \(X\sim \textit{Geo}(p)\) on \(\{0,1,2,\ldots \}\), we have \(E(X)=\frac{1-p}{p}\) and \(\textit{Var}(X)=\frac{1-p}{p^{2}}\). Hence,

$$\begin{aligned} \textit{Var}\left( \frac{1}{S}\right)\approx & {} \frac{1}{S^{4}}N^{2} \frac{ \left( \frac{1-p}{p^{2}}\right) }{N} = \frac{1}{S^{4}}N^{2} \frac{ \left( \frac{E(X)}{p}\right) }{N} =\frac{1}{S^{4}}N^{2} \frac{ \left( \frac{E(S/N)}{p}\right) }{N} \approx \frac{1}{pS^{3}}. \end{aligned}$$

Let us note that

$$\begin{aligned} E\left( \frac{S}{N}\right) = \frac{1-p}{p}; \quad \frac{S}{N} \approx \frac{1-p}{p} \quad \mathrm{or} \quad p(S+N)\approx & {} N \quad \mathrm{or} \quad \frac{1}{p} \approx \frac{S+N}{N}.\qquad \end{aligned}$$
(12)
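The approximation \(\textit{Var}(1/S)\approx 1/(pS^3)\) can be checked numerically. The sketch computes the geometric moments by summing the probability mass function \(p(1-p)^x\), which confirms \(E(X)=(1-p)/p\) and \(\textit{Var}(X)=(1-p)/p^2\). It then evaluates both the delta-method expression and the simplified form, with \(1/p\) replaced by \((S+N)/N\) as in (12), at \(S = N E(X)\). The values \(N=1000\) and \(p=0.4\) are hypothetical.

```python
def geometric_moments(p: float, tol: float = 1e-16) -> tuple[float, float]:
    """Mean and variance of X ~ Geo(p) on {0, 1, 2, ...} with pmf p (1 - p)^x,
    computed by summing the pmf until the terms become negligible."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    mean = second = 0.0
    x, prob = 0, p
    while prob > tol:
        mean += x * prob
        second += x * x * prob
        x += 1
        prob *= 1.0 - p
    return mean, second - mean ** 2


def var_inv_s_delta(N: int, p: float) -> float:
    """Delta method: Var(1/S) ~ N Var(X) / S^4, evaluated at S = N E(X)."""
    mean, var_x = geometric_moments(p)
    s = N * mean
    return N * var_x / s ** 4


def var_inv_s_simplified(N: int, p: float) -> float:
    """Simplified form 1 / (p S^3), with 1/p replaced by (S + N) / N as in (12)."""
    mean, _ = geometric_moments(p)
    s = N * mean
    return (s + N) / (N * s ** 3)


print(geometric_moments(0.4))
print(var_inv_s_delta(1000, 0.4), var_inv_s_simplified(1000, 0.4))
```

The two forms agree at \(S = NE(X)\) because \(\textit{Var}(X) = E(X)\{E(X)+1\}\) for the geometric distribution.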

Hence,

$$\begin{aligned} \widehat{\textit{Var}}\left( \frac{f_{1}}{S}|f_{1} \right) =\frac{f_{1}^{2}}{S^{3}}\left( \frac{S+N}{N} \right) . \end{aligned}$$

Substituting (11) and (12) into (10) leads to

$$\begin{aligned} \widehat{\textit{Var}} \left( \frac{f_{1}}{S} \right)= & {} \frac{1}{S^{2}}\left\{ f_{1}\left( 1-\frac{f_{1}}{N} \right) \right\} +\frac{f_{1}^{2}}{S^{3}}\left( \frac{S+N}{N} \right) \\= & {} \frac{f_{1}}{S^{2}} \left\{ \frac{N-f_{1}}{N}+\frac{f_{1}}{S}\left( \frac{S+N}{N} \right) \right\} \nonumber \\= & {} \frac{f_{1}}{S^{2}}\left\{ \frac{NS-Sf_{1}+f_{1}S+f_{1}N}{NS} \right\} =\frac{f_{1}}{S^{2}} \left\{ \frac{N(S+f_{1})}{NS} \right\} =\frac{f_{1}S+f_{1}^{2}}{S^{3}}. \end{aligned}$$
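The simplification can be verified numerically: the between-\(f_1\) term (11) and the within-\(f_1\) term add up to \((f_1S+f_1^2)/S^3\) for any \(N\), because the two terms in \(f_1^2/(NS^2)\) cancel. Function names and input values are hypothetical.

```python
def var_ratio_two_terms(f1: int, s: int, N: int) -> float:
    """Sum of the between-f1 term (11) and the within-f1 term f1^2 (S + N) / (N S^3)."""
    between = f1 * (1.0 - f1 / N) / s ** 2
    within = f1 ** 2 * (s + N) / (N * s ** 3)
    return between + within


def var_ratio_closed_form(f1: int, s: int) -> float:
    """Simplified result (f1 S + f1^2) / S^3, which does not involve N."""
    return (f1 * s + f1 ** 2) / s ** 3


# Hypothetical inputs: f1 = 240, S = 1500, several population sizes N
for N in (500, 1000, 10 ** 6):
    print(N, var_ratio_two_terms(240, 1500, N), var_ratio_closed_form(240, 1500))
```

Because \(N\) drops out, the final expression can be evaluated from the observed \(f_1\) and \(S\) alone.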

Combining the two results, we obtain

$$\begin{aligned} \widehat{\textit{Var}} \left( \frac{1}{1-\sqrt{\frac{f_{1}}{S}}} \right)= & {} \left\{ \frac{1}{\frac{4f_{1}}{S} \left( 1-\sqrt{\frac{f_{1}}{S}} \right) ^{4}} \right\} \left\{ \frac{f_{1}S+f_{1}^{2}}{S^{3}} \right\} \nonumber \\= & {} \left\{ \frac{S}{4f_{1}\left( 1-\sqrt{\frac{f_{1}}{S}} \right) ^{4}} \right\} \left\{ \frac{f_{1}S+f_{1}^{2}}{S^{3}} \right\} = \frac{Sf_{1}+f_{1}^{2}}{4f_{1}S^{2}\left( 1-\sqrt{\frac{f_{1}}{S}} \right) ^{4}} \nonumber \\= & {} \frac{S+f_{1}}{4S^{2}\left( 1-\sqrt{\frac{f_{1}}{S}} \right) ^{4}} . \end{aligned}$$
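The appendix approximates \(\textit{Var}(\widehat{N}_{\textit{TG}}|n)\) by \(n^2\) times the last expression. Assuming, as for the first term, that \(E_n\{\textit{Var}(\widehat{N}_{\textit{TG}}|n)\}\) is estimated by this plug-in value, the estimated variance in (8) would be the sum computed below. This is a sketch of that combination under the stated assumption, not a formula quoted from the article; `var_turing_geometric` is a hypothetical name.

```python
import math


def var_turing_geometric(n: int, f1: int, s: int) -> float:
    """Sketch: plug-in Var(N_hat_TG) as the sum of the two terms of (8)."""
    k = math.sqrt(f1 / s)  # kappa0_hat = sqrt(f1 / S)
    if not 0.0 < k < 1.0:
        # the delta-method derivative h'(y) is undefined at y = 0 and the
        # estimator itself is undefined when f1 / S = 1
        raise ValueError("need 0 < f1 < S")
    between = n * k / (1.0 - k) ** 2                               # estimate of Var_n{E(N_hat | n)}
    within = n ** 2 * (s + f1) / (4.0 * s ** 2 * (1.0 - k) ** 4)   # n^2 times the last display
    return between + within


# Expected counts for a hypothetical population with N = 1000 and p = 0.4
v = var_turing_geometric(600, 240, 1500)
print(v, math.sqrt(v))  # estimated variance and standard error
```

At these hypothetical inputs the two terms are about 666.7 and 537.0, for an estimated variance of about 1203.7.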


About this article


Cite this article

Anan, O., Böhning, D. & Maruotti, A. On the Turing estimator in capture–recapture count data under the geometric distribution. Metrika 82, 149–172 (2019). https://doi.org/10.1007/s00184-018-0695-7
