Estimating the number of shared species by a jackknife procedure

Environmental and Ecological Statistics

Abstract

A sequence of jackknife estimators is developed to estimate the number of shared species in two communities. The estimators have simple and explicit formulae. A sequential testing criterion is also developed to determine a proper order for these jackknife estimators. The performance of the estimators is evaluated using simulated data and empirical data from two forests in Malaysia, in which 209 species are shared by both forests. Results for the empirical data and the simulated scenarios (with sampling fractions ranging from 0.5 to 20%) show that the jackknife estimator, compared with other existing estimators, has a smaller bias and provides more reliable interval estimation in most cases. Additionally, two avian datasets from Taiwan and Hong Kong are used to demonstrate the proposed method. To extend the proposed method to three communities, we also list the first six orders of the jackknife estimators explicitly.

Acknowledgments

The authors are grateful to Professor Fangliang He for valuable discussions and for providing the Lambir forest plot data. The authors thank the referees and editor for their useful comments. We also thank Roman Gulati for his generous editing assistance. This work was supported by the Ministry of Science and Technology of Taiwan.

Author information

Corresponding author

Correspondence to Wen-Han Hwang.

Additional information

Handling Editor: Pierre Dutilleul.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (docx 1065 KB)

Appendix: A general result of the jackknife estimators \(\hat{S}_k \)

Define a 2-dimensional array of coefficients \(d_{t,u} \) as:

$$\begin{aligned} \left\{ {{\begin{array}{ll} {d_{1,1} =1} &{} \\ {d_{t,t} =-td_{t-1,t-1}} &{} {\forall t\ge 2;} \\ {d_{t,1} =2^{t}-1} &{} {\forall t\ge 2;} \\ {d_{t,u} =d_{t-1,u} +u\left( {d_{t-1,u} -d_{t-1,u-1}} \right) } &{} {\forall t\ge 2\hbox { and }2\le u<t} \\ {d_{t,u} =0} &{} {\hbox {otherwise}.} \\ \end{array}}} \right. \end{aligned}$$

These coefficients simplify the expressions for the jackknife estimators. The resulting formulae are summarized in the following theorem.
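As a quick illustration (not part of the original appendix), the recursion for \(d_{t,u}\) can be tabulated in a few lines of Python. The function name `d_table` and the list-of-rows layout are choices made for this sketch only.

```python
def d_table(T):
    """Rows [d_{t,1}, ..., d_{t,t}] for t = 1..T, following the Appendix recursion."""
    d = {(1, 1): 1}                               # d_{1,1} = 1
    for t in range(2, T + 1):
        d[(t, 1)] = 2**t - 1                      # d_{t,1} = 2^t - 1
        for u in range(2, t):                     # interior entries, 2 <= u < t
            d[(t, u)] = d[(t - 1, u)] + u * (d[(t - 1, u)] - d[(t - 1, u - 1)])
        d[(t, t)] = -t * d[(t - 1, t - 1)]        # d_{t,t} = -t d_{t-1,t-1}
    # Entries with u > t are zero by definition and are simply not stored.
    return [[d[(t, u)] for u in range(1, t + 1)] for t in range(1, T + 1)]


for row in d_table(4):
    print(row)
# [1]
# [3, -2]
# [7, -12, 6]
# [15, -50, 60, -24]
```

The second and third rows, (3, -2) and (7, -12, 6), are the coefficients that appear in \(\hat{S}_4\) and \(\hat{S}_6\) of Corollary 1.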

Theorem 1

For each nonnegative integer \(\nu \), we have:

$$\begin{aligned} \hat{S}_{2\nu ,X}= & {} D+\sum _{t=1}^{\nu +1} {d_{\nu +1,t} \frac{n_1 -t}{n_1}f_{t+} +} \sum _{u=1}^\nu {d_{\nu ,u} \frac{n_2 -u}{n_2}f_{+u}} \nonumber \\&+\,\sum _{t=1}^{\nu +1} {\sum _{u=1}^\nu {d_{\nu +1,t} d_{\nu ,u} \frac{(n_1 -t)(n_2 -u)}{n_1 n_2}f_{tu}}} \end{aligned}$$
(7)

and

$$\begin{aligned} \hat{S}_{2\nu ,Y}= & {} D+\sum _{t=1}^\nu {d_{\nu ,t} \frac{n_1 -t}{n_1}f_{t+}} + \sum _{u=1}^{\nu +1} {d_{\nu +1,u} \frac{n_2 -u}{n_2}f_{+u}} \nonumber \\&+\,\sum _{t=1}^\nu {\sum _{u=1}^{\nu +1} {d_{\nu ,t} d_{\nu +1,u} \frac{(n_1 -t)(n_2 -u)}{n_1 n_2}f_{tu}}} . \end{aligned}$$
(8)

Therefore, \(\hat{S}_{2\nu +1} =(n_1 \hat{S}_{2\nu ,X} +n_2 \hat{S}_{2\nu ,Y})/(n_1 +n_2)\) is a linear combination of the frequencies \(f_{tu}\). Furthermore, the \((2\nu +2)\)-th order jackknife estimator is:

$$\begin{aligned} \hat{S}_{2\nu +2}= & {} D+\sum _{t=1}^{\nu +1} {d_{\nu +1,t} \frac{n_1 -t}{n_1}f_{t+}} + \sum _{u=1}^{\nu +1} {d_{\nu +1,u} \frac{n_2 -u}{n_2}f_{+u}} \nonumber \\&+\,\sum _{t=1}^{\nu +1} {\sum _{u=1}^{\nu +1} {d_{\nu +1,t} d_{\nu +1,u} \frac{(n_1 -t)(n_2 -u)}{n_1 n_2}f_{tu}}} . \end{aligned}$$
(9)

The proof is by mathematical induction; because the algebra is lengthy, it is given in the Supplementary Materials. The next corollary simplifies the formulae further.
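Equations (7) to (9) share one linear form, so they can be evaluated with a single helper. The sketch below is a minimal pure-Python illustration, not the authors' implementation. It rests on assumptions that this excerpt does not state: the table `F` holds \(f_{tu}\) at `F[t-1][u-1]` for every observed pair of frequencies, the marginals \(f_{t+}\) and \(f_{+u}\) are its row and column sums, \(D\) is its grand total, and \(n_1\), \(n_2\) are supplied by the caller. All function names are hypothetical.

```python
def d_row(t):
    """[d_{t,1}, ..., d_{t,t}] from the Appendix recursion (empty for t = 0)."""
    if t == 0:
        return []
    row = [1]                                     # d_{1,1} = 1
    for s in range(2, t + 1):
        new = [2**s - 1]                          # d_{s,1} = 2^s - 1
        for u in range(2, s):                     # 2 <= u < s
            new.append(row[u - 1] + u * (row[u - 1] - row[u - 2]))
        new.append(-s * row[-1])                  # d_{s,s} = -s d_{s-1,s-1}
        row = new
    return row


def _weights(order, n, size):
    """w[t-1] = d_{order,t} (n - t) / n, zero-padded or truncated to `size`."""
    w = [0.0] * size
    for t, c in enumerate(d_row(order)[:size], start=1):
        w[t - 1] = c * (n - t) / n
    return w


def _linear_form(F, n1, n2, p, q):
    """Common form of (7)-(9): d_{p,.} weights sample 1, d_{q,.} weights sample 2."""
    rows, cols = len(F), len(F[0])
    a, b = _weights(p, n1, rows), _weights(q, n2, cols)
    f_row = [sum(F[t]) for t in range(rows)]                          # assumed f_{t+}
    f_col = [sum(F[t][u] for t in range(rows)) for u in range(cols)]  # assumed f_{+u}
    total = sum(f_row)                                                # assumed D
    total += sum(a[t] * f_row[t] for t in range(rows))
    total += sum(b[u] * f_col[u] for u in range(cols))
    total += sum(a[t] * b[u] * F[t][u] for t in range(rows) for u in range(cols))
    return total


def jackknife_shared(F, n1, n2, k):
    """Order-k estimate (k >= 1): eq. (9) for even k, weighted (7) and (8) for odd k."""
    if k % 2 == 0:
        m = k // 2                                # k = 2*nu + 2, so m = nu + 1
        return _linear_form(F, n1, n2, m, m)
    nu = (k - 1) // 2                             # k = 2*nu + 1
    s_x = _linear_form(F, n1, n2, nu + 1, nu)     # eq. (7)
    s_y = _linear_form(F, n1, n2, nu, nu + 1)     # eq. (8)
    return (n1 * s_x + n2 * s_y) / (n1 + n2)


# Made-up frequency table, for illustration only (not data from the paper).
F_toy = [[4, 2, 0],
         [1, 3, 1],
         [0, 1, 5]]
for k in (1, 2, 3, 4):
    print(k, round(jackknife_shared(F_toy, 10, 10, k), 2))
```

Because every order is a fixed linear combination of the frequency counts, the table has to be built only once; changing the order changes only the weight vectors.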

Corollary 1

Define \(\lambda _j =(n_j -h)/(n_1 +n_2)\) for \(j=1,2\) and any finite number \(h\). When the sample sizes \(n_1 \) and \(n_2 \) are sufficiently large, the explicit forms of the jackknife estimators \(\hat{S}_k \) for \(k=1,\ldots ,6\) are asymptotically as follows:

$$\begin{aligned} \hat{S}_1= & {} D+\lambda _1 f_{1+} +\lambda _2 f_{+1} ; \\ \hat{S}_2= & {} D+f_{1+} +f_{+1} +f_{11} ; \\ \hat{S}_3= & {} D+(1+2\lambda _1)f_{1+} -2\lambda _1 f_{2+} +(1+2\lambda _2)f_{+1} -2\lambda _2 f_{+2} \\&+\,3f_{11} -2\lambda _2 f_{12} -2\lambda _1 f_{21} ; \\ \hat{S}_4= & {} D+3f_{1+} -2f_{2+} +3f_{+1} -2f_{+2} +9f_{11} -6f_{12} -6f_{21} +4f_{22} ; \\ \hat{S}_5= & {} D+(3+4\lambda _1)f_{1+} -2(1+5\lambda _1)f_{2+} +6\lambda _1 f_{3+} \\&+\,(3+4\lambda _2)f_{+1} -2(1+5\lambda _2)f_{+2} +6\lambda _2 f_{+3} \\&+\,21f_{11} +(22\lambda _1 -36)f_{12} +(22\lambda _2 -36)f_{21} +24f_{22} \\&+\,18\lambda _1 f_{31} +18\lambda _2 f_{13} -12\lambda _1 f_{32} -12\lambda _2 f_{23} ; \\ \hat{S}_6= & {} D+7f_{1+} -12f_{2+} +6f_{3+} +7f_{+1} -12f_{+2} +6f_{+3} \\&+\,49f_{11} -84f_{12} +42f_{13} -84f_{21} +144f_{22} -72f_{23} \\&+\,42f_{31} -72f_{32} +36f_{33} . \end{aligned}$$
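As a worked check of the first line, take \(\nu =0\) in (7) and (8). Sums over an empty index range vanish and \(d_{1,1} =1\), so only one marginal term survives in each expression:

$$\begin{aligned} \hat{S}_{0,X}= & {} D+\frac{n_1 -1}{n_1}f_{1+} , \qquad \hat{S}_{0,Y} =D+\frac{n_2 -1}{n_2}f_{+1} , \\ \hat{S}_1= & {} \frac{n_1 \hat{S}_{0,X} +n_2 \hat{S}_{0,Y}}{n_1 +n_2} =D+\frac{n_1 -1}{n_1 +n_2}f_{1+} +\frac{n_2 -1}{n_1 +n_2}f_{+1} . \end{aligned}$$

This is \(D+\lambda _1 f_{1+} +\lambda _2 f_{+1}\) with \(h=1\), which shows where the constant \(h\) in the definition of \(\lambda _j\) comes from. The higher odd orders arise in the same way from the weights \(n_1 /(n_1 +n_2)\) and \(n_2 /(n_1 +n_2)\).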

About this article

Cite this article

Chuang, CJ., Shen, TJ. & Hwang, WH. Estimating the number of shared species by a jackknife procedure. Environ Ecol Stat 22, 759–778 (2015). https://doi.org/10.1007/s10651-015-0318-7
