Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample
- Cite this article as:
- Chao, A. & Shen, TJ. Environmental and Ecological Statistics (2003) 10: 429. doi:10.1023/A:1026096204727
A biological community usually has a large number of species with relatively small abundances. When a random sample of individuals is selected and each individual is classified according to species identity, some rare species may not be discovered. This paper is concerned with the estimation of Shannon’s index of diversity when the number of species and the species abundances are unknown. The traditional estimator, which ignores the missing species, underestimates the index when there is a non-negligible number of unseen species. We provide a different approach based on unequal probability sampling theory, because species have different probabilities of being discovered in the sample. No parametric forms are assumed for the species abundances. The proposed estimation procedure combines the Horvitz–Thompson (1952) adjustment for missing species with the concept of sample coverage, which is used to properly estimate the relative abundances of species discovered in the sample. Simulation results show that the proposed estimator works well under various abundance models even when a relatively large fraction of the species is missing. Three real data sets, two from biology and one from numismatics, are given for illustration.
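The procedure described above can be sketched in a few lines of Python. The abstract gives no formulas, so the details here are assumptions: sample coverage is estimated as 1 - f1/n (f1 is the number of species seen exactly once, n the sample size), each observed relative abundance is shrunk by that coverage, and each term of the entropy sum is divided by the probability that the species appears in a sample of size n, which is the Horvitz–Thompson step. The function names are hypothetical, and the fallback for a sample made up entirely of singletons is a pragmatic choice rather than something taken from the paper.

```python
import math


def plug_in_shannon(counts):
    """Traditional estimator: treats observed frequencies as true abundances."""
    counts = [x for x in counts if x > 0]
    n = sum(counts)
    return -sum((x / n) * math.log(x / n) for x in counts)


def coverage_adjusted_shannon(counts):
    """Sketch of a coverage-adjusted, Horvitz-Thompson-weighted Shannon index.

    counts: number of individuals observed for each detected species.
    """
    counts = [x for x in counts if x > 0]
    n = sum(counts)
    f1 = sum(1 for x in counts if x == 1)  # species seen exactly once
    if f1 == n:
        # Assumption: avoid a coverage estimate of zero when every
        # species is a singleton.
        f1 = n - 1
    coverage = 1.0 - f1 / n  # assumed coverage estimator

    total = 0.0
    for x in counts:
        p = coverage * x / n  # coverage-adjusted relative abundance
        inclusion = 1.0 - (1.0 - p) ** n  # chance the species is seen at all
        total -= p * math.log(p) / inclusion
    return total


sample = [10, 5, 3, 1, 1]  # made-up counts for illustration only
print(plug_in_shannon(sample), coverage_adjusted_shannon(sample))
```

On this made-up sample the adjusted value is larger than the plug-in value, which is the direction of correction the abstract describes. When there are no singletons and all counts are large, the two estimators coincide to numerical precision.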