Abstract
We show that inferring the taxa-abundance distribution of a microbial community from small environmental samples alone is difficult. The difficulty stems from the disparity in scale between the number of genetic sequences that can be characterized and the number of individuals in communities that microbial ecologists aspire to describe. One solution is to calibrate and validate a mathematical model of microbial community assembly using the small samples and use the model to extrapolate to the taxa-abundance distribution for the population that is deemed to constitute a community. We demonstrate this approach by using a simple neutral community assembly model in which random immigrations, births, and deaths determine the relative abundance of taxa in a community. In doing so, we further develop a neutral theory to produce a taxa-abundance distribution for large communities that are typical of microbial communities. In addition, we highlight that the sampling uncertainties conspire to make the immigration rate calibrated on the basis of small samples very much higher than the true immigration rate. This scale dependence of model parameters is not unique to neutral theories; it is a generic problem in ecology that is particularly acute in microbial ecology. We argue that to overcome this, so that microbial ecologists can characterize large microbial communities from small samples, mathematical models that encapsulate sampling effects are required.
References
Bell, G (2000) The distribution of abundance in neutral communities. Am Nat 155: 606–617
Bell, T, Agar, D, Song, J, Newman, JA, Thompson, IP, Lilley, AK, van der Gast, CJ (2005) Larger islands house more bacterial taxa. Science 308: 1884
Coskuner, G, Ballinger, SJ, Davenport, RJ, Pickering, RL, Solera, R, Head, IM, Curtis, TP (2005) Agreement between theory and measurement in quantification of ammonia-oxidizing bacteria. Appl Environ Microbiol 71: 6325–6334
Cox, DR, Miller, HD (1965) The Theory of Stochastic Processes. Methuen, London
Curtis, T, Sloan, WT, Scannell, J (2002) Modelling prokaryotic diversity and its limits. Proc Natl Acad Sci 99: 10494–10499
Curtis, TP, Sloan, WT (2005) Exploring microbial diversity—a vast below. Science 309: 1331–1333
Enquist, BJ, Sanderson, J, Weiser, MD (2002) Modeling macroscopic patterns in ecology. Science 295: 1835–1836
Fenchel, T, Finlay, BJ (2005) Bacteria and Island Biogeography. Science 309: 1997–1999
Finlay, BJ, Clarke, KJ (1999) Ubiquitous dispersal of microbial species. Nature 400: 828
Green, JL, Holmes, AJ, Westoby, M, Oliver, I, Briscoe, D, Dangerfield, M, et al. (2004) Spatial scaling of microbial eukaryote diversity. Nature 432: 747–750
Harris, LD (1984) The Fragmented Forest. University of Chicago Press
Horner-Devine, MC, Lage, M, Hughes, JB, Bohannan, BJM (2004) A taxa-area relationship for bacteria. Nature 432: 750–753
Houchmandzadeh, B, Vallade, M (2003) Clustering in neutral ecology. Phys Rev E 68: art. no. 061912
Hubbell, SP (2001) The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton
Kimura, M, Ohta, T (1971) Theoretical Aspects of Population Genetics. Princeton University Press, Princeton
Linacre, CH (2004) Diversity and the quantification of ammonia oxidising bacteria and denitrification from turbidity maximum of estuaries. PhD thesis, Civil Engineering and Geosciences, University of Newcastle upon Tyne.
MacArthur, RH, Wilson, EO (Eds.) (1967) The Theory of Island Biogeography. Princeton University Press, Princeton
May, RM (1975) Patterns of species abundance and diversity. In: Cody, ML, Diamond, JM (Eds.), Ecology and Evolution of Communities. Harvard University Press, Harvard, MA, pp 81–120
McGill, BJ (2003) A test of the unified neutral theory of biodiversity. Nature 422: 881–885
McKane, AJ, Alonso, D, Sole, RV (2004) Analytic solution of Hubbell’s model of local community dynamics. Theor Popul Biol 65: 67–73
Purkhold, U, Pommerening-Roser, A, Juretschko, S, Schmid, MC, Koops, HP, Wagner, M (2000) Phylogeny of all recognized species of ammonia oxidizers based on comparative 16S rRNA and amoA sequence analysis: implications for molecular diversity surveys. Appl Environ Microbiol 66: 5368–5382
Sloan, WT, Woodcock, S, Lunn, M, Head, IM, Nee, S, Curtis, TP (2005) The roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol, Early Online 28 Nov
Sloan, WT, Lunn, M, Woodcock, S, Head, IM, Nee, S, Curtis, TP (2006) Quantifying the roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol 8: 732–740
Vallade, M, Houchmandzadeh, B (2003) Analytical solution of a neutral model of biodiversity. Phys Rev E 68: art. no. 061902
Volkov, I, Banavar, JR, Hubbell, SP, Maritan, A (2003) Neutral theory and relative species abundance in ecology. Nature 424: 1035–1037
Wagner, M, Loy, A (2002) Bacterial community composition and function in sewage treatment systems. Curr Opin Biotechnol 13: 218–227
Whitman, WB, Coleman, DC, Wiebe, WJ (1998) Prokaryotes: the unseen majority. Proc Natl Acad Sci USA 95: 6578–6583
Woodcock, S, Lunn, M, Curtis, TP, Head, IM, Sloan, WT (2006) Taxa area relationships for microbes: the unsampled and the unseen. Ecol Lett 9: 805–812
Zwart, G, van Hannen, EJ, Kamst-van Agterveld, MP, van der Gucht, K, Lindstrom, ES, van Wichelen, J, et al. (2003) Rapid screening for freshwater bacterial groups by using reverse line blot hybridization. Appl Environ Microbiol 69: 5875–5883
Appendix: Mathematical Appendix
Kolmogorov Backward Equation for the neutral community model
The basis of the model is Hubbell’s NCM, in which the community is saturated with a total of \( N_{\text{T}} \) individuals, so for an assemblage to change, an individual must die or leave the system. This occurs at a taxon-independent rate δ. The dead individual is immediately replaced by an immigrant from a source community, with probability m, or by reproduction of a member of the local community, with probability 1−m. Thus, the community forms and develops through a continuous cycle of immigration, reproduction, and death. Assuming that deaths are uniformly distributed in time, one death is expected during a period of time 1/δ, and the ith species, with initial absolute abundance \( N_{i} \), will either increase by 1, stay the same, or decrease by 1, with probability given by the following three expressions, respectively:
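A reconstruction of these three expressions from the verbal description, in the standard form for Hubbell's zero-sum model, is sketched here. The numbering (8)–(10) is inferred from later cross-references, and the \( N_{\text{T}} - 1 \) denominators assume that a local parent is drawn from the survivors of the death event; the published typesetting may differ.

```latex
% Assumed reconstruction of Eqs. (8)-(10); not copied from the published article.
\begin{align}
\Pr(N_i + 1 \mid N_i) &= \frac{N_{\text{T}} - N_i}{N_{\text{T}}}
  \left[ m p_i + (1 - m)\,\frac{N_i}{N_{\text{T}} - 1} \right], \tag{8} \\
\Pr(N_i \mid N_i) &= \frac{N_i}{N_{\text{T}}}
  \left[ m p_i + (1 - m)\,\frac{N_i - 1}{N_{\text{T}} - 1} \right]
  + \frac{N_{\text{T}} - N_i}{N_{\text{T}}}
  \left[ m (1 - p_i) + (1 - m)\,\frac{N_{\text{T}} - N_i - 1}{N_{\text{T}} - 1} \right], \tag{9} \\
\Pr(N_i - 1 \mid N_i) &= \frac{N_i}{N_{\text{T}}}
  \left[ m (1 - p_i) + (1 - m)\,\frac{N_{\text{T}} - N_i}{N_{\text{T}} - 1} \right]. \tag{10}
\end{align}
```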
where \( p_{i} \) is the relative abundance of the ith species in the source community. Hubbell used these transition probabilities for relatively small populations to form a finite Markov chain model with which the community dynamics can be investigated and the stationary probability distribution for \( N_{i} \) can be calculated. The computational expense [19] of this discrete Markov chain formulation makes it impossible to apply to the very large diverse populations that typify the microbial world [27]. Here, we employ Kimura and Ohta’s [15] methods to recast the model for large populations.
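The finite Markov chain approach can be illustrated for a single species in a small community. The sketch below is not Hubbell's or the authors' code: it assumes the standard zero-sum transition probabilities (a local parent drawn from the \( N_{\text{T}} - 1 \) survivors), treats the chain as a birth-death chain, and obtains the stationary distribution from detailed balance. All function names and parameter values are hypothetical.

```python
def transition_probs(k, n_total, m, p):
    """Return (up, stay, down) probabilities for a species holding k of n_total places.

    Assumed zero-sum form: one individual dies, then is replaced by an
    immigrant (probability m) or by local reproduction (probability 1 - m).
    """
    frac_focal = k / n_total
    frac_other = (n_total - k) / n_total
    up = frac_other * (m * p + (1 - m) * k / (n_total - 1))
    down = frac_focal * (m * (1 - p) + (1 - m) * (n_total - k) / (n_total - 1))
    return up, 1.0 - up - down, down


def stationary_distribution(n_total, m, p):
    """Stationary probabilities of k = 0..n_total via detailed balance.

    For a birth-death chain, pi[k + 1] * down(k + 1) = pi[k] * up(k).
    Requires 0 < m <= 1 and 0 < p < 1 so that no state is absorbing.
    """
    weights = [1.0]
    for k in range(n_total):
        up_k, _, _ = transition_probs(k, n_total, m, p)
        _, _, down_next = transition_probs(k + 1, n_total, m, p)
        weights.append(weights[-1] * up_k / down_next)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical small community: 200 individuals, m = 0.05, source abundance 0.1.
pi = stationary_distribution(n_total=200, m=0.05, p=0.1)
mean_abundance = sum(k * prob for k, prob in enumerate(pi))
```

Under these assumed transition probabilities the expected change in abundance per death is \( m(N_{\text{T}} p_i - N_i)/N_{\text{T}} \), so the stationary mean should sit at \( N_{\text{T}} p_i \). The loop runs over every possible abundance, and for several species the state space grows combinatorially, which is the practical obstacle for communities of microbial size.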
Let \( x_{i} = \frac{N_{i}}{N_{\text{T}}} \) be the relative abundance of the ith species, and assume that \( N_{\text{T}} \), the local community size, is large enough that \( x_{i} \) can be considered continuous. Also, let \( \phi\left( x_{1}, x_{2}, \ldots, x_{n}; t \right) \) be the joint pdf that the relative abundances of species 1,..., n at time t are \( x_{1}, \ldots, x_{n} \), respectively. The continuous model comes from considering the expected change in ϕ that will occur in a small time interval δt. To do this, we define \( g\left( x_{1}, \delta x_{1}, \ldots, x_{n}, \delta x_{n}; t, \delta t \right) \) to be the pdf for the relative abundance of species 1 changing from \( x_{1} \) to \( x_{1} + \delta x_{1} \), that of species 2 changing from \( x_{2} \) to \( x_{2} + \delta x_{2} \),..., and that of species n changing from \( x_{n} \) to \( x_{n} + \delta x_{n} \) during the time period between t and t + δt.
Then,
Expanding this as an n-dimensional Taylor series about the point \( x_{1}, \ldots, x_{n} \) and neglecting terms of order 3 and above gives
where ϕg denotes \( \phi\left( x_{1}, x_{2}, \ldots, x_{n}; t \right) g\left( x_{1}, \delta x_{1}, \ldots, x_{n}, \delta x_{n}; t, \delta t \right) \). Because \( \int g \, {\text{d}}\left( \delta x_{i} \right) = 1, \)
therefore,
where \( M_{\delta x_{i}} \) and \( V_{\delta x_{i}} \) are the first and second moments of the change in \( x_{i} \) per unit of time and \( C_{\delta x_{i} \delta x_{j}} \) is the expected product of changes in \( x_{i} \) and \( x_{j} \). This is the n-dimensional version of the Kolmogorov equation. By considering the expected changes in relative abundance in the discrete time interval 1/δ given by Eqs. (8)–(10), \( M_{\delta x_{i}} \), \( V_{\delta x_{i}} \), and \( C_{\delta x_{i} \delta x_{j}} \) can be approximated by
Reasoning that typically either m is small or p i rapidly converges on x i , we can neglect all but the first term of both \( C_{{\delta x_{i} \delta x_{j} }} \) and \( V_{{\delta x_{i} }} \). Equations (13)–(16) then define the NCM for large populations by describing the change in the joint probability of the relative abundances of the n different taxa in the local community.
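The steps of this derivation can be collected in one place. The block below is a reconstruction from the verbal description, not the published equations: it assumes time is measured in units of 1/δ (one expected death per unit), that \( N_{\text{T}} \) is large enough to replace \( N_{\text{T}} - 1 \) by \( N_{\text{T}} \), and that the sums run over the independent coordinates. The equation numbers are inferred from the cross-references in the text.

```latex
% Assumed reconstruction; the published forms may differ in scaling or layout.
% Balance over all possible changes during \delta t:
\begin{equation}
\phi(x_1,\ldots,x_n; t+\delta t) = \int\!\cdots\!\int
  \phi(x_1-\delta x_1,\ldots,x_n-\delta x_n; t)\,
  g(x_1-\delta x_1,\delta x_1,\ldots,x_n-\delta x_n,\delta x_n; t,\delta t)\,
  \mathrm{d}(\delta x_1)\cdots\mathrm{d}(\delta x_n). \tag{11}
\end{equation}
% Second-order Taylor expansion, integration over g, and \delta t \to 0:
\begin{equation}
\frac{\partial \phi}{\partial t} =
  -\sum_{i} \frac{\partial}{\partial x_i}\left( M_{\delta x_i}\,\phi \right)
  + \frac{1}{2}\sum_{i} \frac{\partial^2}{\partial x_i^2}\left( V_{\delta x_i}\,\phi \right)
  + \sum_{i<j} \frac{\partial^2}{\partial x_i\,\partial x_j}\left( C_{\delta x_i \delta x_j}\,\phi \right). \tag{13}
\end{equation}
% Moments per death event, from the transition probabilities:
\begin{align}
M_{\delta x_i} &= \frac{m\,(p_i - x_i)}{N_{\text{T}}}, \tag{14} \\
V_{\delta x_i} &= \frac{2 x_i (1 - x_i) + m\,(p_i - x_i)(1 - 2 x_i)}{N_{\text{T}}^{2}}, \tag{15} \\
C_{\delta x_i \delta x_j} &= -\,\frac{2 x_i x_j + m\left[ x_i (p_j - x_j) + x_j (p_i - x_i) \right]}{N_{\text{T}}^{2}}. \tag{16}
\end{align}
```

In this form the second terms of \( V_{\delta x_{i}} \) and \( C_{\delta x_{i} \delta x_{j}} \) carry a factor m and a difference \( p_{i} - x_{i} \), which is why they become negligible when m is small or when \( x_{i} \) is close to \( p_{i} \).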
Stationary probability density function
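The solution to the diffusion equation [Eq. (13)] with \( \frac{\partial \phi}{\partial t} = 0 \) and reflecting boundaries at \( x_{i} = 0 \) and \( x_{i} = 1 \) gives the stationary (long-term equilibrium) joint probability density function (pdf) for the relative abundance of the n taxa in the local community, \( \left\{ x_{i} \right\}^{n}_{i = 1} \). Here, we show that the joint pdf for a Dirichlet distribution,
\[ \phi\left( x_{1}, \ldots, x_{n - 1} \right) = \frac{\Gamma\left( N_{\text{T}} m \right)}{\prod\nolimits_{i = 1}^{n} \Gamma\left( N_{\text{T}} m p_{i} \right)} \prod\limits_{i = 1}^{n} x_{i}^{N_{\text{T}} m p_{i} - 1}, \qquad (17) \]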
where \( x_{n} = 1 - x_{1} - \cdots - x_{{n - 1}} \) and \( p_{n} = 1 - p_{1} - \cdots - p_{{n - 1}} \) is a solution.
Note that if
then \( \frac{{\partial \phi }} {{\partial t}} = 0 \). Therefore, substituting in Eqs. (14)–(16), we require
Substituting ϕ into the left-hand side of Eq. (19) gives
Similarly, substituting ϕ into the right-hand side of (19) gives
Now, because (20) and (21) are equal, ϕ is a solution to the diffusion equation [Eq. (13)] with \( \frac{{\partial \phi }} {{\partial t}} = 0 \) and the reflecting boundary conditions are met.
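The same check is easy to follow for a single species. The worked equation below is an illustration under stated assumptions, not a restatement of Eqs. (18)–(21): it uses only the leading terms of the drift and variance, \( M = m(p - x)/N_{\text{T}} \) and \( V = 2x(1 - x)/N_{\text{T}}^{2} \), and the zero-flux form of the reflecting boundary condition.

```latex
% One-species illustration; a = N_T m p and b = N_T m (1 - p) are shorthand introduced here.
\begin{align}
\text{Zero flux:}\quad & M\phi - \frac{1}{2}\frac{\mathrm{d}}{\mathrm{d}x}\left( V\phi \right) = 0
  \;\Longleftrightarrow\;
  \frac{\mathrm{d}}{\mathrm{d}x}\left[ x(1-x)\,\phi \right] = N_{\text{T}} m\,(p - x)\,\phi. \\
\text{Trial solution:}\quad & \phi \propto x^{a-1}(1-x)^{b-1},
  \qquad a = N_{\text{T}} m p,\quad b = N_{\text{T}} m (1-p). \\
\text{Left-hand side:}\quad & \frac{\mathrm{d}}{\mathrm{d}x}\left[ x^{a}(1-x)^{b} \right]
  = \left[ a(1-x) - b x \right] x^{a-1}(1-x)^{b-1}
  = \left[ a - (a+b)\,x \right]\phi
  = N_{\text{T}} m\,(p - x)\,\phi.
\end{align}
```

Both sides agree, so a Beta\( (N_{\text{T}} m p, N_{\text{T}} m (1 - p)) \) density has zero probability flux everywhere, which is the one-species marginal of the Dirichlet solution.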
Algorithm for generating the stationary probability density function
Given the relative abundances of n taxa in the source community \( \left\{ p_{i} \right\}^{n}_{i = 1} \), a realization of the Dirichlet distributed local abundances can be generated by sampling from a set of gamma distributions. Let \( \left\{ Y_{i} \right\}^{n}_{i = 1} \) be random variables such that \( Y_{i} \sim \text{gamma}\left( N_{\text{T}} m p_{i} \right) \) and let \( \left\{ y_{i} \right\}^{n}_{i = 1} \) be realizations of these variables sampled at random, then
\[ x_{i} = \frac{y_{i}}{\sum\nolimits_{j = 1}^{n} y_{j}}, \quad i = 1, \ldots, n, \]
will represent a random sample from the Dirichlet joint probability distribution for a local neutral community [Eq. (17)].
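The algorithm above can be sketched in a few lines. This is a minimal illustration using Python's standard library, not the authors' implementation: it assumes unit-scale gamma variates with shape \( N_{\text{T}} m p_{i} \) and normalizes them by their sum. The function name and the example values are hypothetical.

```python
import random


def sample_local_community(source_abundances, n_total, m, rng=random):
    """Draw one realization of local relative abundances.

    Each taxon gets an independent gamma variate with shape n_total * m * p_i
    and unit scale; dividing by the sum yields a Dirichlet-distributed vector.
    Caveat: for very small shapes a variate can underflow to 0.0.
    """
    shapes = [n_total * m * p for p in source_abundances]
    draws = [rng.gammavariate(shape, 1.0) for shape in shapes]
    total = sum(draws)
    return [y / total for y in draws]


# Hypothetical source community of four taxa; N_T * m = 100.
rng = random.Random(1)
source = [0.5, 0.3, 0.15, 0.05]
local = sample_local_community(source, n_total=1e6, m=1e-4, rng=rng)
```

Because only the product \( N_{\text{T}} m \) enters the shape parameters, the cost of one realization scales with the number of taxa and not with the community size, which is what makes the approach usable for very large communities.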
Sampling a neutral community
We have already shown that for the continuous variant of the NCM, the steady-state joint pdf for all species is Dirichlet \( \text{Dir}\left( N_{\text{T}} m p_{1}, \ldots, N_{\text{T}} m p_{n} \right) \), where \( p_{1}, \ldots, p_{n} \) are the relative abundances of the species in the metacommunity.
We can repeat the exact same argument to derive the joint distribution of the relative abundances within a sample of size N S from such a community. Strictly speaking, selecting a subsample of size N S from a local community is achieved by simply sampling N S individuals without replacement from the community of size N T. However, since for almost all microbial samples \( N_{{\text{S}}} \ll N_{{\text{T}}} \), the problem can be approximated to one of sampling with replacement.
Regard the sampling exercise as a continuous process through time. Individuals are selected from the source community one by one until a sample of size \( N_{\text{S}} \) has been collected. Once this sample size has been reached, the process of selecting individuals continues at regular intervals in time (generations), but now the selected individual replaces one randomly chosen individual currently in the sample population. This is analogous to the argument used for deriving the joint distribution for the local abundances, except that we have a pure immigration–death process, with immigrants into the sample from the local community. Setting m = 1 and regarding our local abundances as the metacommunity from which immigrants are drawn, it is clear that, conditional on knowledge of local abundances \( x_{1}, \ldots, x_{n} \), the joint distribution of relative abundances \( y_{1}, \ldots, y_{n} \) within a sample is Dirichlet \( \text{Dir}\left( N_{\text{S}} x_{1}, \ldots, N_{\text{S}} x_{n} \right) \). That is,
where \( X = \left( x_{1}, \ldots, x_{n} \right) \) and \( Y = \left( y_{1}, \ldots, y_{n} \right) \) for notational convenience. This allows us to calculate the first and second moments of the sample distribution because we know that the marginal densities of a Dirichlet distribution are beta distributed. Therefore,
and
Now, since \( x_{i} \sim \text{Beta}\left( N_{\text{T}} m p_{i}, N_{\text{T}} m \left( 1 - p_{i} \right) \right) \), we have that
and
letting
then
We were unable to derive a neat analytical solution for the marginal pdfs of abundance in the sample. However, repeated sampling from neutrally assembled synthetic communities confirmed that the marginals were very closely approximated by beta distributions. If we assume that the sample marginal distributions are exactly beta, then—as their first and second moments are given by Eqs. (26) and (29), respectively—the sample distribution is given by,
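The moment matching described above can be checked numerically for one taxon. The sketch below is an illustration, not the authors' code: it assumes the two-stage structure of this section (local abundance \( x_{i} \sim \text{Beta}(N_{\text{T}} m p_{i}, N_{\text{T}} m (1 - p_{i})) \), then sample abundance \( y_{i} \mid x_{i} \sim \text{Beta}(N_{\text{S}} x_{i}, N_{\text{S}} (1 - x_{i})) \)). The closed form it tests was obtained here from the law of total variance, \( \text{Var}(y_{i}) = p_{i}(1 - p_{i})/(\alpha + 1) \) with \( \alpha = N_{\text{T}} m N_{\text{S}} / (N_{\text{T}} m + N_{\text{S}} + 1) \), and may be written differently from Eqs. (26)–(30). Function names and parameter values are hypothetical.

```python
import random
import statistics


def effective_concentration(n_total, m, n_sample):
    """Concentration alpha of the moment-matched beta for sample abundances.

    Derived under the assumptions in the lead-in: Beta(alpha * p, alpha * (1 - p))
    has the same mean and variance as the two-stage sample abundance.
    """
    theta = n_total * m
    return theta * n_sample / (theta + n_sample + 1)


def simulate_sample_abundance(p, n_total, m, n_sample, draws, rng):
    """Two-stage draw: local abundance x, then sample abundance y given x."""
    theta = n_total * m
    values = []
    for _ in range(draws):
        x = rng.betavariate(theta * p, theta * (1 - p))
        y = rng.betavariate(n_sample * x, n_sample * (1 - x))
        values.append(y)
    return values


# Hypothetical values: N_T * m = 200, sample of 500 sequences, p_i = 0.3.
rng = random.Random(42)
p, n_total, m, n_sample = 0.3, 1e6, 2e-4, 500
ys = simulate_sample_abundance(p, n_total, m, n_sample, draws=20000, rng=rng)
alpha = effective_concentration(n_total, m, n_sample)
predicted_var = p * (1 - p) / (alpha + 1)
simulated_var = statistics.pvariance(ys)
```

Under these assumptions \( \alpha \) is always smaller than \( N_{\text{T}} m \). Reading \( \alpha \) as \( N_{\text{S}} m' \), as one would when fitting the neutral model to the sample as if it were the whole community, gives an apparent immigration probability \( m' = N_{\text{T}} m / (N_{\text{T}} m + N_{\text{S}} + 1) \), which is far larger than m when \( N_{\text{S}} \ll N_{\text{T}} \). This is consistent with the scale dependence of the calibrated immigration rate described in the abstract.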
Sloan, W.T., Woodcock, S., Lunn, M. et al. Modeling Taxa-Abundance Distributions in Microbial Communities using Environmental Sequence Data. Microb Ecol 53, 443–455 (2007). https://doi.org/10.1007/s00248-006-9141-x
DOI: https://doi.org/10.1007/s00248-006-9141-x