Statistical Estimation of Uncultivated Microbial Diversity

J. Bunge
Part of the Microbiology Monographs book series (MICROMONO, volume 10)


The full microbial richness of a community, or even of an environmental sample, usually cannot be observed completely; it can only be estimated statistically. This estimation is typically based on observed count data, that is, the number of representatives of each species (or other taxonomic unit) appearing in the sample or samples. “Abundance” data consist of counts of the individuals of each species in a single sample, while “incidence” (or multiple-recapture) data consist of lists of the species appearing in each of several or many samples. In this chapter we consider statistical estimation of the total richness, i.e., the total number of species, observed plus unobserved, based on abundance or on incidence data. We discuss parametric and nonparametric methods, their underlying assumptions, and their advantages and disadvantages; computational implementations and software; and broader scientific issues such as the scope of applicability of the results of a given analysis. Some real-world examples from microbial studies are presented. Our discussion is intended to serve as an overview and an introduction to the literature and available software.
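To make the abundance/incidence distinction concrete, here is a minimal sketch of one widely used nonparametric richness estimator, the classic Chao1 form S_obs + f1^2 / (2 f2). Here f1 and f2 are the numbers of species seen exactly once and exactly twice. The same formula applied to incidence frequencies gives the classic Chao2 form. This is an illustration only and is not presented as the chapter's own method. The function names are hypothetical. The fallback used when there are no doubletons (f2 = 0) is one common convention and is an assumption here.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def abundance_counts(sample: Iterable[Hashable]) -> List[int]:
    """Abundance data: number of individuals of each species in ONE sample."""
    return list(Counter(sample).values())


def incidence_counts(samples: Iterable[Iterable[Hashable]]) -> List[int]:
    """Incidence data: number of samples in which each species appears."""
    tally: Counter = Counter()
    for sample in samples:
        # Only presence/absence matters within a sample, so deduplicate first.
        tally.update(set(sample))
    return list(tally.values())


def frequency_counts(counts: Iterable[int]) -> Dict[int, int]:
    """Map k -> number of species observed exactly k times (zeros ignored)."""
    return dict(Counter(c for c in counts if c > 0))


def chao_estimate(counts: Iterable[int]) -> float:
    """Classic Chao-type estimate of total richness: S_obs + f1^2 / (2 f2).

    On abundance counts this is Chao1; on incidence counts it is the classic
    Chao2 form. Hypothetical helper: when f2 == 0 it falls back to
    S_obs + f1 (f1 - 1) / 2, one common convention (an assumption here).
    """
    freqs = frequency_counts(counts)
    s_obs = sum(freqs.values())  # number of distinct species observed
    f1 = freqs.get(1, 0)         # singletons
    f2 = freqs.get(2, 0)         # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


if __name__ == "__main__":
    # Toy abundance sample: A x3, B x2, C/D/E once each -> S_obs=5, f1=3, f2=1
    print(chao_estimate(abundance_counts(["A", "A", "A", "B", "B", "C", "D", "E"])))  # 9.5
    # Toy incidence data from three samples -> S_obs=4, Q1=2, Q2=1
    print(chao_estimate(incidence_counts([["A", "B", "C"], ["A", "B"], ["A", "D"]])))  # 6.0
```

Only the preprocessing differs between the two data types. Both reduce to "how many species were seen once, twice, ..." before the estimator is applied. The toy data above are invented for illustration.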


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  1. Department of Statistical Science, Cornell University, Ithaca, USA
