Nonparametric prediction in species sampling

  • Anne Chao
  • Tsung-Jen Shen


Consider a continuous-time stochastic model in which species arrive in the sample according to independent Poisson processes with heterogeneous species discovery rates. Based on an initial survey, we address the problem of predicting the number of new species that would be discovered by additional sampling. When the sampling time or sample size of the additional sample tends to infinity, this problem reduces to predicting the number of species left undetected by the original sample, or equivalently, to estimating species richness. The topic has applications in a wide range of disciplines. We propose a simple prediction method and apply it to two datasets. One dataset consists of capture counts of Malayan butterflies; the other consists of identification records of organic pollutants in a water environment. Simulation results are reported to assess the performance of the proposed method and to compare it with existing estimators.
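To make the prediction problem concrete, the following is a minimal sketch of two classical estimators that work from frequency counts. It is not the method proposed in this article, which the abstract does not spell out. Under the Poisson model described above, the Good-Toulmin formula predicts the number of new species found when sampling effort is extended by a fraction t of the original effort, and the classical Chao1 lower bound addresses the limiting case of the number of undetected species. The function names and the input format (a list of per-species observed counts) are assumptions made for illustration.

```python
from collections import Counter


def frequency_counts(abundances):
    """Return a mapping k -> f_k, the number of species observed exactly k times."""
    return Counter(a for a in abundances if a > 0)


def good_toulmin_new_species(abundances, t):
    """Good-Toulmin prediction of the number of new species discovered when
    sampling effort is extended by a fraction t of the original effort.

    The alternating series sum_k (-1)^(k+1) * t^k * f_k is generally
    considered reliable only for 0 <= t <= 1.
    """
    f = frequency_counts(abundances)
    return sum((-1) ** (k + 1) * t ** k * fk for k, fk in f.items())


def chao1_undetected(abundances):
    """Classical Chao1 lower bound on the number of undetected species,
    f1^2 / (2 * f2), with the bias-corrected form f1 * (f1 - 1) / 2
    used when no species was observed exactly twice."""
    f = frequency_counts(abundances)
    f1, f2 = f.get(1, 0), f.get(2, 0)
    if f2 > 0:
        return f1 * f1 / (2 * f2)
    return f1 * (f1 - 1) / 2
```

For a hypothetical survey with per-species counts `[1, 1, 1, 1, 2, 2, 3, 5, 8]`, calling `good_toulmin_new_species(counts, 1)` predicts the number of new species expected from doubling the effort, and `chao1_undetected(counts)` gives a lower bound on the number of species never seen. Both use only the frequency counts, which is what makes them nonparametric in the discovery rates.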

Key Words

Discovery rates, Frequency counts, Species abundance, Species richness





Copyright information

© International Biometric Society 2004

Authors and Affiliations

  1. Institute of Statistics, National Tsing Hua University, Taiwan
