Environmental and Ecological Statistics, Volume 10, Issue 4, pp 429–443

Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample

  • Anne Chao
  • Tsung-Jen Shen


A biological community usually has a large number of species with relatively small abundances. When a random sample of individuals is selected and each individual is classified according to species identity, some rare species may not be discovered. This paper is concerned with the estimation of Shannon’s index of diversity when the number of species and the species abundances are unknown. The traditional estimator, which ignores the missing species, underestimates the index when there is a non-negligible number of unseen species. We provide a different approach based on unequal probability sampling theory, because species have different probabilities of being discovered in the sample. No parametric forms are assumed for the species abundances. The proposed estimation procedure combines the Horvitz–Thompson (1952) adjustment for missing species with the concept of sample coverage, which is used to properly estimate the relative abundances of the species discovered in the sample. Simulation results show that the proposed estimator works well under various abundance models, even when a relatively large fraction of the species is missing. Three real data sets, two from biology and one from numismatics, are given for illustration.
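The abstract does not state the formula, so the following is only a minimal sketch of the form in which this estimator is commonly written. It assumes sample coverage is estimated as 1 − f1/n, where f1 is the number of species seen exactly once and n is the sample size; that each observed relative abundance is shrunk by this coverage; and that each term is divided by the probability that the species appears in the sample (the Horvitz–Thompson adjustment). The function names and the handling of the all-singletons case are illustrative assumptions, not taken from the paper.

```python
import math


def plugin_shannon(counts):
    """Traditional estimator: plug observed relative abundances into -sum p*log(p)."""
    counts = [x for x in counts if x > 0]
    n = sum(counts)
    return -sum((x / n) * math.log(x / n) for x in counts)


def chao_shen_shannon(counts):
    """Coverage-adjusted, Horvitz-Thompson-weighted estimate of Shannon's index.

    counts: number of individuals observed for each species in the sample.
    """
    counts = [x for x in counts if x > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("the sample contains no individuals")

    # f1 = number of species observed exactly once (singletons).
    f1 = sum(1 for x in counts if x == 1)
    if f1 == n:
        # Assumption: avoid a zero coverage estimate when every
        # individual belongs to a different species.
        f1 = n - 1

    # Estimated sample coverage: the fraction of the community's total
    # abundance that belongs to species seen in the sample.
    coverage = 1.0 - f1 / n

    total = 0.0
    for x in counts:
        p = coverage * x / n                 # coverage-adjusted relative abundance
        inclusion = 1.0 - (1.0 - p) ** n     # probability the species is seen at all
        total -= p * math.log(p) / inclusion
    return total


if __name__ == "__main__":
    sample = [5, 3, 1, 1]  # hypothetical counts for four observed species
    print("plug-in estimate: ", plugin_shannon(sample))
    print("adjusted estimate:", chao_shen_shannon(sample))
```

For the hypothetical sample above, the adjusted estimate is larger than the plug-in estimate, which is the direction of correction the abstract describes for samples with unseen species.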

biodiversity, entropy, Horvitz–Thompson estimator, jackknife, sample coverage, species, unequal probability sampling




  1. Ashbridge, J. and Goudie, I.B.J. (2000) Coverage-adjusted estimators for mark-recapture in heterogeneous populations. Communications in Statistics-Simulation, 29, 1215–37.
  2. Basharin, G.P. (1959) On a statistical estimate for the entropy of a sequence of independent random variables. Theory of Probability and Its Applications, 4, 333–6.
  3. Batten, L.A. (1976) Bird communities of some Killarney woodlands. Proceedings of the Royal Irish Academy, 76, 285–313.
  4. Bunge, J. and Fitzpatrick, M. (1993) Estimating the number of species: a review. Journal of the American Statistical Association, 88, 364–73.
  5. Bunge, J., Fitzpatrick, M., and Handley, J. (1995) Comparison of three estimators of the number of species. Journal of Applied Statistics, 22, 45–59.
  6. Chao, A. and Lee, S.-M. (1992) Estimating the number of classes via sample coverage. Journal of the American Statistical Association, 87, 210–17.
  7. Chao, A., Hwang, W.-H., Chen, Y.-C., and Kuo, C.-Y. (2000) Estimating the number of shared species in two communities. Statistica Sinica, 10, 227–46.
  8. Chao, A., Ma, M.-C., and Yang, M.C.K. (1993) Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika, 80, 193–201.
  9. Colwell, R.K. and Coddington, J.A. (1994) Estimating terrestrial biodiversity through extrapolation. Philosophical Transactions of the Royal Society, London B, 345, 101–18.
  10. Efron, B. and Tibshirani, R.J. (1993) An Introduction to the Bootstrap, Chapman and Hall, New York.
  11. Engen, S. (1978) Stochastic Abundance Models, Halsted Press, New York.
  12. Esty, W. (1986) The efficiency of Good's nonparametric coverage estimator. The Annals of Statistics, 14, 1257–60.
  13. Good, I.J. (1953) The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237–64.
  14. Haas, P. and Stokes, L. (1998) Estimating the number of classes in a finite population. Journal of the American Statistical Association, 93, 1475–87.
  15. Holst, L. (1981) Some asymptotic results for incomplete multinomial or Poisson samples. Scandinavian Journal of Statistics, 8, 243–6.
  16. Horvitz, D.G. and Thompson, D.J. (1952) A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–85.
  17. Hutcheson, K. and Shenton, L.R. (1974) Some moments of an estimate of Shannon's measure of information. Communications in Statistics, 3, 89–94.
  18. Janzen, D.H. (1973a) Sweep samples of tropical foliage insects: description of study sites, with data on species abundances and size distributions. Ecology, 54, 659–86.
  19. Janzen, D.H. (1973b) Sweep samples of tropical foliage insects: effects of seasons, vegetation types, elevation, time of day, and insularity. Ecology, 54, 687–708.
  20. MacArthur, R.H. (1957) On the relative abundances of bird species. Proceedings of National Academy of Science, U.S.A., 43, 193–295.
  21. Magurran, A.E. (1988) Ecological Diversity and Its Measurement, Princeton University Press, Princeton, New Jersey.
  22. Mandelbrot, B. (1977) Fractals, Form, Chance and Dimension, Freeman, San Francisco.
  23. Norris III, J.L. and Pollock, K.H. (1998) Non-parametric MLE for Poisson species abundance models allowing for heterogeneity between species. Environmental and Ecological Statistics, 5, 391–402.
  24. Peet, R.K. (1974) The measurement of species diversity. Annual Review of Ecology and Systematics, 5, 285–307.
  25. Pielou, E.C. (1975) Ecological Diversity, Wiley, New York.
  26. Smith, W. and Grassle, J.F. (1977) Sampling properties of a family of diversity measures. Biometrics, 33, 283–92.
  27. Solow, A.R. (1993) A simple test for change in community structure. Journal of Animal Ecology, 62, 191–3.
  28. Thompson, S.K. (1992) Sampling, Wiley, New York.
  29. Zahl, S. (1977) Jackknifing an index of diversity. Ecology, 58, 907–13.
  30. Zipf, G.K. (1965) Human Behavior and Principle of Least Effort, Addison-Wesley, New York.

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Anne Chao (1)
  • Tsung-Jen Shen (1)
  1. Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan
