Modeling Taxa-Abundance Distributions in Microbial Communities using Environmental Sequence Data

Microbial Ecology

Abstract

We show that inferring the taxa-abundance distribution of a microbial community from small environmental samples alone is difficult. The difficulty stems from the disparity in scale between the number of genetic sequences that can be characterized and the number of individuals in communities that microbial ecologists aspire to describe. One solution is to calibrate and validate a mathematical model of microbial community assembly using the small samples and use the model to extrapolate to the taxa-abundance distribution for the population that is deemed to constitute a community. We demonstrate this approach by using a simple neutral community assembly model in which random immigrations, births, and deaths determine the relative abundance of taxa in a community. In doing so, we further develop a neutral theory to produce a taxa-abundance distribution for large communities that are typical of microbial communities. In addition, we highlight that the sampling uncertainties conspire to make the immigration rate calibrated on the basis of small samples very much higher than the true immigration rate. This scale dependence of model parameters is not unique to neutral theories; it is a generic problem in ecology that is particularly acute in microbial ecology. We argue that to overcome this, so that microbial ecologists can characterize large microbial communities from small samples, mathematical models that encapsulate sampling effects are required.

References

  1. Bell, G (2000) The distribution of abundance in neutral communities. Am Nat 155: 606–617

  2. Bell, T, Agar, D, Song, J, Newman, JA, Thompson, IP, Lilley, AK, van der Gast, CJ (2005) Larger islands house more bacterial taxa. Science 308: 1884

  3. Coskuner, G, Ballinger, SJ, Davenport, RJ, Pickering, RL, Solera, R, Head, IM, Curtis, TP (2005) Agreement between theory and measurement in quantification of ammonia-oxidizing bacteria. Appl Environ Microbiol 71: 6325–6334

  4. Cox, DR, Miller, HD (1965) The Theory of Stochastic Processes. Methuen, London

  5. Curtis, T, Sloan, WT, Scannell, J (2002) Modelling prokaryotic diversity and its limits. Proc Natl Acad Sci 99: 10494–10499

  6. Curtis, TP, Sloan, WT (2005) Exploring microbial diversity—a vast below. Science 309: 1331–1333

  7. Enquist, BJ, Sanderson, J, Weiser, MD (2002) Modeling macroscopic patterns in ecology. Science 295: 1835–1836

  8. Fenchel, T, Finlay, BJ (2005) Bacteria and Island Biogeography. Science 309: 1997–1999

  9. Finlay, BJ, Clarke, KJ (1999) Ubiquitous dispersal of microbial species. Nature 400: 828

  10. Green, JL, Holmes, AJ, Westoby, M, Oliver, I, Briscoe, D, Dangerfield, M, et al. (2004) Spatial scaling of microbial eukaryote diversity. Nature 432: 747–750

  11. Harris, LD (1984) The Fragmented Forest. University of Chicago Press

  12. Horner-Devine, MC, Lage, M, Hughes, JB, Bohannan, BJM (2004) A taxa-area relationship for bacteria. Nature 432: 750–753

  13. Houchmandzadeh, B, Vallade, M (2003) Clustering in neutral ecology. Phys Rev E 68: art. no. 061912

  14. Hubbell, SP (2001) The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton

  15. Kimura, M, Ohta, T (1971) Theoretical Aspects of Population Genetics. Princeton University Press, Princeton

  16. Linacre, CH (2004) Diversity and the quantification of ammonia oxidising bacteria and denitrification from turbidity maximum of estuaries. PhD thesis, Civil Engineering and Geosciences, University of Newcastle upon Tyne

  17. MacArthur, RH, Wilson, EO (Eds.) (1967) The Theory of Island Biogeography. Princeton University Press, Princeton

  18. May, RM (1975) Patterns of species abundance and diversity. In: Cody, ML, Diamond, JM (Eds.), Ecology and Evolution of Communities. Harvard University Press, Harvard, MA, pp 81–120

  19. McGill, BJ (2003) A test of the unified neutral theory of biodiversity. Nature 422: 881–885

  20. McKane, AJ, Alonso, D, Sole, RV (2004) Analytic solution of Hubbell’s model of local community dynamics. Theor Popul Biol 65: 67–73

  21. Purkhold, U, Pommerening-Roser, A, Juretschko, S, Schmid, MC, Koops, HP, Wagner, M (2000) Phylogeny of all recognized species of ammonia oxidizers based on comparative 16S rRNA and amoA sequence analysis: implications for molecular diversity surveys. Appl Environ Microbiol 66: 5368–5382

  22. Sloan, WT, Woodcock, S, Lunn, M, Head, IM, Nee, S, Curtis, TP (2005) The roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol, Early Online 28 Nov

  23. Sloan, WT, Lunn, M, Woodcock, S, Head, IM, Nee, S, Curtis, TP (2006) Quantifying the roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol 8: 732–740

  24. Vallade, M, Houchmandzadeh, B (2003) Analytical solution of a neutral model of biodiversity. Phys Rev E 68: art. no. 061902

  25. Volkov, I, Banavar, JR, Hubbell, SP, Maritan, A (2003) Neutral theory and relative species abundance in ecology. Nature 424: 1035–1037

  26. Wagner, M, Loy, A (2002) Bacterial community composition and function in sewage treatment systems. Curr Opin Biotechnol 13: 218–227

  27. Whitman, WB, Coleman, DC, Wiebe, WJ (1998) Prokaryotes: the unseen majority. Proc Natl Acad Sci USA 95: 6578–6583

  28. Woodcock, S, Lunn, M, Curtis, TP, Head, IM, Sloan, WT (2006) Taxa area relationships for microbes: the unsampled and the unseen. Ecol Lett 9: 805–812

  29. Zwart, G, van Hannen, EJ, Kamst-van Agterveld, MP, van der Gucht, K, Lindstrom, ES, van Wichelen, J, et al. (2003) Rapid screening for freshwater bacterial groups by using reverse line blot hybridization. Appl Environ Microbiol 69: 5875–5883

Author information

Corresponding author

Correspondence to William T. Sloan.

Appendix: Mathematical Appendix

Kolmogorov Backward Equation for the neutral community model

The basis of the model is Hubbell’s neutral community model (NCM), in which the community is saturated with a total of \( N_{\text{T}} \) individuals, so for an assemblage to change an individual must die or leave the system. This occurs at a taxon-independent rate δ. The dead individual is immediately replaced by an immigrant from a source community, with probability m, or by reproduction of a member of the local community, with probability 1 − m. Thus, the community forms and develops through a continuous cycle of immigration, reproduction, and death. Assuming that deaths are uniformly distributed in time, one death is expected during a period of time 1/δ, and the ith species, with initial absolute abundance \( N_{i} \), will either increase by 1, stay the same, or decrease by 1, with probabilities given by the following three expressions, respectively:

$$ \Pr\left( N_{i} + 1 \mid N_{i} \right) = \left( \frac{N_{\text{T}} - N_{i}}{N_{\text{T}}} \right) \left[ m p_{i} + \left( 1 - m \right) \left( \frac{N_{i}}{N_{\text{T}} - 1} \right) \right] $$
(8)
$$ \Pr\left( N_{i} \mid N_{i} \right) = \frac{N_{i}}{N_{\text{T}}} \left[ m p_{i} + \left( 1 - m \right) \left( \frac{N_{i} - 1}{N_{\text{T}} - 1} \right) \right] + \left( \frac{N_{\text{T}} - N_{i}}{N_{\text{T}}} \right) \left[ m \left( 1 - p_{i} \right) + \left( 1 - m \right) \left( \frac{N_{\text{T}} - N_{i} - 1}{N_{\text{T}} - 1} \right) \right] $$
(9)
$$ \Pr\left( N_{i} - 1 \mid N_{i} \right) = \frac{N_{i}}{N_{\text{T}}} \left[ m \left( 1 - p_{i} \right) + \left( 1 - m \right) \left( \frac{N_{\text{T}} - N_{i}}{N_{\text{T}} - 1} \right) \right] $$
(10)

where \( p_{i} \) is the relative abundance of the ith species in the source community. Hubbell used these transition probabilities for relatively small populations to form a finite Markov chain model with which the community dynamics can be investigated and the stationary probability distribution for \( N_{i} \) can be calculated. The computational expense [19] of this discrete Markov chain formulation makes it impossible to apply to the very large, diverse populations that typify the microbial world [27]. Here, we employ Kimura and Ohta’s [15] methods to recast the model for large populations.
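As an illustration, the three transition probabilities in Eqs. (8)–(10) can be evaluated directly. The Python sketch below is not part of the original analysis; the function name `transition_probabilities`, its argument names, and the example values are assumptions chosen for readability.

```python
def transition_probabilities(n_i, n_total, m, p_i):
    """Return (P_up, P_stay, P_down) for taxon i after one death-and-replacement event.

    n_i     : current absolute abundance of taxon i (0 <= n_i <= n_total)
    n_total : local community size N_T (must exceed 1)
    m       : probability that the replacement is an immigrant
    p_i     : relative abundance of taxon i in the source community
    """
    dies_i = n_i / n_total                  # the dying individual belongs to taxon i
    dies_other = (n_total - n_i) / n_total  # the dying individual belongs to another taxon

    # Eq. (8): another taxon loses an individual and taxon i supplies the replacement.
    p_up = dies_other * (m * p_i + (1 - m) * n_i / (n_total - 1))
    # Eq. (10): taxon i loses an individual and another taxon supplies the replacement.
    p_down = dies_i * (m * (1 - p_i) + (1 - m) * (n_total - n_i) / (n_total - 1))
    # Eq. (9): taxon i is unchanged (i replaces i, or non-i replaces non-i).
    p_stay = (dies_i * (m * p_i + (1 - m) * (n_i - 1) / (n_total - 1))
              + dies_other * (m * (1 - p_i)
                              + (1 - m) * (n_total - n_i - 1) / (n_total - 1)))
    return p_up, p_stay, p_down


# Illustrative values only (not taken from the paper).
up, stay, down = transition_probabilities(n_i=20, n_total=100, m=0.1, p_i=0.3)
print(up, stay, down, up + stay + down)
```

For any valid input the three values sum to 1, which gives a quick consistency check on Eqs. (8)–(10).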

Let \( x_{i} = \frac{N_{i}}{N_{\text{T}}} \) be the relative abundance of the ith species, and assume that \( N_{\text{T}} \), the local community size, is large enough that \( x_{i} \) can be considered continuous. Also, let \( \phi\left( x_{1}, x_{2}, \ldots, x_{n}; t \right) \) be the joint pdf that the relative abundances of species 1,..., n at time t are \( x_{1}, \ldots, x_{n} \), respectively. The continuous model comes from considering the expected change in ϕ that will occur in a small time interval δt. To do this, we define \( g\left( x_{1}, \delta x_{1}, \ldots, x_{n}, \delta x_{n}; t, \delta t \right) \) to be the pdf for the relative abundance of species 1 changing from \( x_{1} \) to \( x_{1} + \delta x_{1} \), that of species 2 changing from \( x_{2} \) to \( x_{2} + \delta x_{2} \),..., and that of species n changing from \( x_{n} \) to \( x_{n} + \delta x_{n} \) during the time period between t and t + δt.

Then,

$$ \phi\left( x_{1}, \ldots, x_{n}; t + \delta t \right) = \int \phi\left( x_{1} - \delta x_{1}, \ldots, x_{n} - \delta x_{n}; t \right) g\left( x_{1} - \delta x_{1}, \delta x_{1}, \ldots, x_{n} - \delta x_{n}, \delta x_{n}; t, \delta t \right) \text{d}\left( \delta x_{1} \right) \cdots \text{d}\left( \delta x_{n} \right) $$

Expanding this as an n-dimensional Taylor series about the point x 1,..., x n and neglecting terms of order 3 and above gives

$$ \phi\left( x_{1}, \ldots, x_{n}; t + \delta t \right) = \int \left[ \phi g - \sum\limits_{i = 1}^{n} \delta x_{i} \frac{\partial}{\partial x_{i}} \left( \phi g \right) + \sum\limits_{i = 1}^{n} \frac{\left( \delta x_{i} \right)^{2}}{2} \frac{\partial^{2}}{\partial x_{i}^{2}} \left( \phi g \right) + \frac{1}{2} \sum\limits_{i = 1}^{n} \sum\limits_{j \ne i} \delta x_{i} \delta x_{j} \frac{\partial^{2}}{\partial x_{i} \partial x_{j}} \left( \phi g \right) \right] \text{d}\left( \delta x_{1} \right) \cdots \text{d}\left( \delta x_{n} \right) $$
(11)

where ϕg denotes \( \phi\left( x_{1}, x_{2}, \ldots, x_{n}; t \right) g\left( x_{1}, \delta x_{1}, \ldots, x_{n}, \delta x_{n}; t, \delta t \right) \). Because \( \int g\, \text{d}\left( \delta x_{1} \right) \cdots \text{d}\left( \delta x_{n} \right) = 1 \) and \( \phi = \phi\left( x_{1}, \ldots, x_{n}; t \right) \) does not depend on the \( \delta x_{i} \),

$$ \begin{aligned} & \phi\left( x_{1}, x_{2}, \ldots, x_{n}; t + \delta t \right) - \phi\left( x_{1}, x_{2}, \ldots, x_{n}; t \right) \\ & \quad = - \sum\limits_{i = 1}^{n} \frac{\partial}{\partial x_{i}} \left( \phi \int \left( \delta x_{i} \right) g\, \text{d}\left( \delta x_{i} \right) \right) \\ & \qquad + \frac{1}{2} \sum\limits_{i = 1}^{n} \frac{\partial^{2}}{\partial x_{i}^{2}} \left( \phi \int \left( \delta x_{i} \right)^{2} g\, \text{d}\left( \delta x_{i} \right) \right) \\ & \qquad + \frac{1}{2} \sum\limits_{i = 1}^{n} \sum\limits_{j \ne i} \frac{\partial^{2}}{\partial x_{i} \partial x_{j}} \left( \phi \int \left( \delta x_{i} \right) \left( \delta x_{j} \right) g\, \text{d}\left( \delta x_{i} \right) \text{d}\left( \delta x_{j} \right) \right) \end{aligned} $$
(12)

therefore,

$$ \frac{\partial \phi}{\partial t} = \sum\limits_{i = 1}^{n} \left[ - \frac{\partial \left( M_{\delta x_{i}} \phi \right)}{\partial x_{i}} + \frac{1}{2} \frac{\partial^{2} \left( V_{\delta x_{i}} \phi \right)}{\partial x_{i}^{2}} \right] + \frac{1}{2} \sum\limits_{i = 1}^{n} \sum\limits_{j \ne i} \frac{\partial^{2} \left( C_{\delta x_{i} \delta x_{j}} \phi \right)}{\partial x_{i} \partial x_{j}} $$
(13)

where \( M_{\delta x_{i}} \) and \( V_{\delta x_{i}} \) are the first and second moments of the change in \( x_{i} \) per unit of time and \( C_{\delta x_{i} \delta x_{j}} \) is the expected product of the changes in \( x_{i} \) and \( x_{j} \). This is the n-dimensional version of the Kolmogorov equation. Considering the expected changes in relative abundance in the discrete time interval 1/δ given by Eqs. (8)–(10), \( M_{\delta x_{i}} \), \( V_{\delta x_{i}} \) and \( C_{\delta x_{i} \delta x_{j}} \) can be approximated by

$$ M_{\delta x_{i}} = \frac{m \left( p_{i} - x_{i} \right)}{N_{\text{T}}} $$
(14)
$$ V_{\delta x_{i}} = \frac{2 x_{i} \left( 1 - x_{i} \right) + m \left( p_{i} - x_{i} \right) \left( 1 - 2 x_{i} \right)}{N_{\text{T}}^{2}} $$
(15)
$$ C_{\delta x_{i} \delta x_{j}} = - \left[ \frac{2 x_{i} x_{j} + m \left[ x_{i} \left( p_{j} - x_{j} \right) + x_{j} \left( p_{i} - x_{i} \right) \right]}{N_{\text{T}}^{2}} \right]. $$
(16)

Reasoning that typically either m is small or \( p_{i} \) rapidly converges on \( x_{i} \), we can neglect all but the first term of both \( C_{\delta x_{i} \delta x_{j}} \) and \( V_{\delta x_{i}} \), that is, \( V_{\delta x_{i}} \approx 2 x_{i} \left( 1 - x_{i} \right) / N_{\text{T}}^{2} \) and \( C_{\delta x_{i} \delta x_{j}} \approx - 2 x_{i} x_{j} / N_{\text{T}}^{2} \). Equations (13)–(16) then define the NCM for large populations by describing the change in the joint probability of the relative abundances of the n different taxa in the local community.
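The approximations in Eqs. (14) and (15) can be compared numerically with the exact one-step moments implied by Eqs. (8) and (10). The following Python sketch is illustrative only; the function names and the parameter values are assumptions, and moments are expressed per death event rather than per unit of time.

```python
def one_step_moments(n_i, n_total, m, p_i):
    """Exact first and second moments of the change in x_i over one death event."""
    dies_i = n_i / n_total
    dies_other = (n_total - n_i) / n_total
    p_up = dies_other * (m * p_i + (1 - m) * n_i / (n_total - 1))                    # Eq. (8)
    p_down = dies_i * (m * (1 - p_i) + (1 - m) * (n_total - n_i) / (n_total - 1))    # Eq. (10)
    step = 1.0 / n_total  # one individual corresponds to a change of 1/N_T in x_i
    return step * (p_up - p_down), step ** 2 * (p_up + p_down)


def approx_moments(x_i, n_total, m, p_i):
    """Large-N_T approximations of Eqs. (14) and (15)."""
    mean = m * (p_i - x_i) / n_total
    second = (2 * x_i * (1 - x_i) + m * (p_i - x_i) * (1 - 2 * x_i)) / n_total ** 2
    return mean, second


# Hypothetical community: N_T = 10^6, x_i = 0.2, m = 0.01, p_i = 0.3.
exact = one_step_moments(200_000, 1_000_000, 0.01, 0.3)
approx = approx_moments(0.2, 1_000_000, 0.01, 0.3)
print(exact, approx)
```

In this sketch the first moment agrees with Eq. (14) up to rounding, because the reproduction terms cancel, while the second moment differs from Eq. (15) only through a factor \( N_{\text{T}} / (N_{\text{T}} - 1) \) on the reproduction term.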

Stationary probability density function

The solution to the diffusion equation [Eq. (13)] with \( \frac{\partial \phi}{\partial t} = 0 \) and reflecting boundaries, where \( x_{i} = 0 \) or \( x_{i} = 1 \), gives the stationary (long-term equilibrium) joint probability density function (pdf) for the relative abundance of the n taxa in the local community, \( \left\{ x_{i} \right\}_{i = 1}^{n} \). Here, we show that the joint pdf for a Dirichlet distribution,

$$ \phi = \left[ \frac{\Gamma\left( N_{\text{T}} m \right)}{\Gamma\left( N_{\text{T}} m p_{1} \right) \cdots \Gamma\left( N_{\text{T}} m p_{n} \right)} \right] x_{1}^{N_{\text{T}} m p_{1} - 1} x_{2}^{N_{\text{T}} m p_{2} - 1} \cdots x_{n}^{N_{\text{T}} m p_{n} - 1} $$
(17)

where \( x_{n} = 1 - x_{1} - \cdots - x_{n - 1} \) and \( p_{n} = 1 - p_{1} - \cdots - p_{n - 1} \), is a solution.

Note that if

$$ \left[ - \left( M_{\delta x_{i}} \phi \right) + \frac{1}{2} \frac{\partial \left( V_{\delta x_{i}} \phi \right)}{\partial x_{i}} \right] + \frac{1}{2} \sum\limits_{j \ne i} \frac{\partial \left( C_{\delta x_{i} \delta x_{j}} \phi \right)}{\partial x_{j}} = 0 \quad \text{for } i = 1, \ldots, n $$
(18)

then \( \frac{{\partial \phi }} {{\partial t}} = 0 \). Therefore, substituting in Eqs. (14)–(16), we require

$$ \frac{m \left( p_{i} - x_{i} \right)}{N_{\text{T}}} \phi - \frac{1}{2} \frac{\partial}{\partial x_{i}} \left( \frac{2 x_{i} \left( 1 - x_{i} \right)}{N_{\text{T}}^{2}} \phi \right) = \frac{1}{2} \sum\limits_{j \ne i} \frac{\partial}{\partial x_{j}} \left( \frac{- 2 x_{i} x_{j}}{N_{\text{T}}^{2}} \phi \right) $$
(19)

Substituting ϕ into the left-hand side of Eq. (19) gives

$$ \begin{aligned} & \frac{m \left( p_{i} - x_{i} \right)}{N_{\text{T}}} \phi - \frac{\partial}{\partial x_{i}} \left( \frac{x_{i} \left( 1 - x_{i} \right)}{N_{\text{T}}^{2}} \phi \right) \\ & \quad = \frac{m \left( p_{i} - x_{i} \right)}{N_{\text{T}}} \phi - \left[ \frac{\phi}{N_{\text{T}}^{2}} \right] \left[ N_{\text{T}} m p_{i} - x_{i} \left( N_{\text{T}} m p_{i} + 1 \right) - \frac{x_{i} \left( 1 - x_{i} \right) \left( N_{\text{T}} m p_{n} - 1 \right)}{x_{n}} \right] \\ & \quad = \left[ \frac{\phi}{N_{\text{T}}^{2}} \right] \left[ N_{\text{T}} m \left( p_{i} - x_{i} \right) - \left( N_{\text{T}} m p_{i} - x_{i} \left( N_{\text{T}} m p_{i} + 1 \right) - \frac{x_{i} \left( 1 - x_{i} \right) \left( N_{\text{T}} m p_{n} - 1 \right)}{x_{n}} \right) \right] \\ & \quad = \left[ - \frac{\phi x_{i}}{N_{\text{T}}^{2}} \right] \left[ N_{\text{T}} m \left( 1 - p_{i} \right) - 1 - \frac{\left( 1 - x_{i} \right)}{x_{n}} \left( N_{\text{T}} m p_{n} - 1 \right) \right] \end{aligned} $$
(20)

Similarly, substituting ϕ into the right-hand side of (19) gives

$$ \begin{aligned} \sum\limits_{j \ne i} \frac{\partial}{\partial x_{j}} \left( \frac{- x_{i} x_{j}}{N_{\text{T}}^{2}} \phi \right) & = - \frac{x_{i} \phi}{N_{\text{T}}^{2}} \sum\limits_{j \ne i} \left[ N_{\text{T}} m p_{j} - \frac{x_{j}}{x_{n}} \left( N_{\text{T}} m p_{n} - 1 \right) \right] \\ & = - \frac{x_{i} \phi}{N_{\text{T}}^{2}} \left[ N_{\text{T}} m \left( 1 - p_{i} - p_{n} \right) - \frac{\left( 1 - x_{i} - x_{n} \right)}{x_{n}} \left( N_{\text{T}} m p_{n} - 1 \right) \right] \\ & = - \frac{x_{i} \phi}{N_{\text{T}}^{2}} \left[ N_{\text{T}} m \left( 1 - p_{i} - p_{n} \right) + \left( N_{\text{T}} m p_{n} - 1 \right) - \frac{\left( 1 - x_{i} \right)}{x_{n}} \left( N_{\text{T}} m p_{n} - 1 \right) \right] \\ & = - \frac{x_{i} \phi}{N_{\text{T}}^{2}} \left[ N_{\text{T}} m \left( 1 - p_{i} \right) - 1 - \frac{\left( 1 - x_{i} \right)}{x_{n}} \left( N_{\text{T}} m p_{n} - 1 \right) \right] \end{aligned} $$
(21)

Now, because (20) and (21) are equal, ϕ is a solution to the diffusion equation [Eq. (13)] with \( \frac{{\partial \phi }} {{\partial t}} = 0 \) and the reflecting boundary conditions are met.
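The equality of the two sides of Eq. (19) can also be checked numerically for a small case. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes n = 3 taxa, uses the unnormalised Dirichlet density (the constant cancels because Eq. (19) is linear in ϕ), writes θ for \( N_{\text{T}} m \), multiplies both sides by \( N_{\text{T}}^{2} \), and replaces the derivatives by central finite differences. The function names and test point are hypothetical.

```python
def phi(x1, x2, a):
    """Unnormalised Dirichlet density for n = 3; x3 is fixed by x1 + x2 + x3 = 1."""
    x3 = 1.0 - x1 - x2
    return x1 ** (a[0] - 1) * x2 ** (a[1] - 1) * x3 ** (a[2] - 1)


def both_sides(x1, x2, theta, p, h=1e-6):
    """Return (lhs, rhs) of Eq. (19) for i = 1, each multiplied by N_T**2.

    theta stands for N_T * m; p holds the source abundances (p1, p2, p3).
    """
    a = [theta * pk for pk in p]  # Dirichlet parameters N_T m p_k

    def f(u, v):   # x1 (1 - x1) phi, differentiated with respect to x1
        return u * (1.0 - u) * phi(u, v, a)

    def g(u, v):   # -x1 x2 phi, differentiated with respect to x2
        return -u * v * phi(u, v, a)

    d_f = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d_g = (g(x1, x2 + h) - g(x1, x2 - h)) / (2 * h)
    lhs = theta * (p[0] - x1) * phi(x1, x2, a) - d_f
    rhs = d_g  # for n = 3 and i = 1 the sum over j has the single term j = 2
    return lhs, rhs


lhs, rhs = both_sides(0.4, 0.35, theta=20.0, p=[0.5, 0.3, 0.2])
print(lhs, rhs)
```

The sum on the right-hand side runs over the independent coordinates other than \( x_{i} \), since \( x_{n} \) is determined by the constraint; with n = 3 and i = 1 this leaves only j = 2.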

Algorithm for generating the stationary probability density function

Given the relative abundances of n taxa in the source community \( \left\{ p_{i} \right\}_{i = 1}^{n} \), a realization of the Dirichlet-distributed local abundances can be generated by sampling from a set of gamma distributions. Let \( \left\{ Y_{i} \right\}_{i = 1}^{n} \) be independent random variables such that \( Y_{i} \sim \text{gamma}\left( N_{\text{T}} m p_{i} \right) \), with shape parameter \( N_{\text{T}} m p_{i} \) and a common scale parameter, and let \( \left\{ y_{i} \right\}_{i = 1}^{n} \) be realizations of these variables sampled at random; then

$$ x_{i} = \frac{{y_{i} }} {{{\sum\limits_{j = 1}^n {y_{j} } }}}\quad i = 1, \ldots , n $$
(22)

will represent a random sample from the Dirichlet joint probability distribution for a local neutral community [Eq. (17)].
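The algorithm above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration rather than the authors' code; the function name `sample_local_community`, the source abundances, and the parameter values are assumptions.

```python
import random


def sample_local_community(p, n_total, m, rng=random):
    """Draw one realization of local relative abundances via Eq. (22).

    p       : source-community relative abundances (should sum to 1)
    n_total : local community size N_T
    m       : immigration probability
    rng     : object providing gammavariate (the random module or a random.Random)
    """
    # Independent gamma variates with shape N_T * m * p_i and unit scale.
    y = [rng.gammavariate(n_total * m * p_i, 1.0) for p_i in p]
    total = sum(y)
    # Normalising the gamma variates gives a Dirichlet-distributed vector.
    # Caveat: for extremely small shapes the variates can underflow to 0.0.
    return [y_i / total for y_i in y]


rng = random.Random(42)                      # seeded for repeatability
source = [0.5, 0.3, 0.2]                     # hypothetical source abundances
x = sample_local_community(source, n_total=10**6, m=1e-4, rng=rng)
print(x, sum(x))
```

Averaged over many realizations, each \( x_{i} \) should approach \( p_{i} \), while the spread around \( p_{i} \) shrinks as \( N_{\text{T}} m \) grows.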

Sampling a neutral community

We have already shown that for the continuous variant of the NCM, the steady-state joint pdf for all species is Dirichlet \( \text{Dir}\left( N_{\text{T}} m p_{1}, \ldots, N_{\text{T}} m p_{n} \right) \), where \( p_{1}, \ldots, p_{n} \) are the relative abundances of the species in the metacommunity.

We can repeat the same argument to derive the joint distribution of the relative abundances within a sample of size \( N_{\text{S}} \) from such a community. Strictly speaking, selecting a subsample of size \( N_{\text{S}} \) from a local community is achieved by sampling \( N_{\text{S}} \) individuals without replacement from the community of size \( N_{\text{T}} \). However, since for almost all microbial samples \( N_{\text{S}} \ll N_{\text{T}} \), the problem can be approximated by one of sampling with replacement.

Regard the sampling exercise as a continuous process through time. Individuals are selected from the source community one by one until a sample of size \( N_{\text{S}} \) has been collected. Once this sample size has been reached, the process of selecting individuals continues at regular intervals in time (generations), but now the selected individual replaces one randomly chosen individual currently in the sample population. This is analogous to the argument used for deriving the joint distribution for the local abundances, except that we have a pure immigration–death process, with immigrants into the sample from the local community. Setting m = 1 and regarding our local abundances as the metacommunity from which immigrants are drawn, it is clear that, conditional on knowledge of local abundances \( x_{1}, \ldots, x_{n} \), the joint distribution of relative abundances \( y_{1}, \ldots, y_{n} \) within a sample is Dirichlet \( \text{Dir}\left( N_{\text{S}} x_{1}, \ldots, N_{\text{S}} x_{n} \right) \). That is,

$$ f\left( Y \mid X \right) = \Gamma\left( N_{\text{S}} \right) \prod\limits_{i = 1}^{n} \frac{y_{i}^{N_{\text{S}} x_{i} - 1}}{\Gamma\left( N_{\text{S}} x_{i} \right)} $$
(23)

where \( X = \left( x_{1}, \ldots, x_{n} \right) \) and \( Y = \left( y_{1}, \ldots, y_{n} \right) \) for notational convenience. This allows us to calculate the first and second moments of the sample distribution because we know that the marginal densities of a Dirichlet distribution are beta distributed. Therefore,

$$ E{\left( {y_{i} \left| {x_{i} } \right.} \right)} = x_{i} $$
(24)

and

$$ E{\left( {y_{i} ^{2} \left| {x_{i} } \right.} \right)} = \frac{{x_{i} {\left( {N_{{\text{S}}} x_{i} + 1} \right)}}} {{N_{{\text{S}}} + 1}} $$
(25)

Now, since \( x_{i} \sim \text{beta}\left( N_{\text{T}} m p_{i}, N_{\text{T}} m \left( 1 - p_{i} \right) \right) \), we have that

$$ E{\left( {y_{i} } \right)} = p_{i} $$
(26)

and

$$ E\left( y_{i}^{2} \right) = \left[ \frac{1}{N_{\text{S}} + 1} \right] \left[ N_{\text{S}} \frac{p_{i} \left( N_{\text{T}} m p_{i} + 1 \right)}{N_{\text{T}} m + 1} + p_{i} \right] = \frac{N_{\text{S}} N_{\text{T}} m p_{i}^{2} + \left( N_{\text{S}} + N_{\text{T}} m + 1 \right) p_{i}}{N_{\text{S}} N_{\text{T}} m + N_{\text{T}} m + N_{\text{S}} + 1} = \frac{\left( \frac{N_{\text{S}} N_{\text{T}} m}{N_{\text{T}} m + N_{\text{S}} + 1} \right) p_{i}^{2} + p_{i}}{\left( \frac{N_{\text{S}} N_{\text{T}} m}{N_{\text{T}} m + N_{\text{S}} + 1} \right) + 1} $$
(27)

letting

$$ \hat{m} = \frac{N_{\text{T}} m}{N_{\text{T}} m + N_{\text{S}} + 1} $$
(28)

then

$$ E\left( y_{i}^{2} \right) = \frac{N_{\text{S}} \hat{m} p_{i}^{2} + p_{i}}{N_{\text{S}} \hat{m} + 1} $$
(29)

We were unable to derive a neat analytical solution for the marginal pdfs of abundance in the sample. However, repeated sampling from neutrally assembled synthetic communities confirmed that the marginals were very closely approximated by beta distributions. If we assume that the sample marginal distributions are exactly beta, then, because their first and second moments are given by Eqs. (26) and (29), respectively, the sample distribution is given by

$$ y_{i} \sim \text{beta}\left( N_{\text{S}} \hat{m} p_{i}, N_{\text{S}} \hat{m} \left( 1 - p_{i} \right) \right) $$
(30)
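The second-moment result in Eq. (29) can be checked by simulating the two-stage process directly: a local community is drawn from the Dirichlet distribution of Eq. (17), and a sample is then drawn from the conditional Dirichlet of Eq. (23). The Python sketch below is an illustration only; all function names and parameter values are assumptions, and the Dirichlet draws use the gamma construction of Eq. (22).

```python
import random


def effective_m(n_total, m, n_sample):
    """Rescaled immigration parameter of Eq. (28)."""
    return n_total * m / (n_total * m + n_sample + 1)


def dirichlet(alphas, rng):
    """Dirichlet draw by normalising independent unit-scale gamma variates."""
    y = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(y)
    return [v / total for v in y]


def simulate_second_moment(p, n_total, m, n_sample, draws, rng):
    """Monte Carlo estimate of E(y_1^2) for the first taxon in a sample."""
    acc = 0.0
    for _ in range(draws):
        x = dirichlet([n_total * m * pk for pk in p], rng)   # local community
        y = dirichlet([n_sample * xk for xk in x], rng)      # sample, given x
        acc += y[0] ** 2
    return acc / draws


def predicted_second_moment(p_i, n_total, m, n_sample):
    """Second moment according to Eq. (29)."""
    a = n_sample * effective_m(n_total, m, n_sample)
    return (a * p_i ** 2 + p_i) / (a + 1)


rng = random.Random(0)
p = [0.5, 0.3, 0.2]                                   # hypothetical source abundances
sim = simulate_second_moment(p, 10**6, 5e-5, 100, 20000, rng)
print(sim, predicted_second_moment(p[0], 10**6, 5e-5, 100))
```

With these illustrative values \( N_{\text{T}} m = 50 \) and \( N_{\text{S}} = 100 \), so \( \hat{m} = 50/151 \approx 0.33 \); the rescaled parameter depends on the sample size as well as on the community, which is the scale dependence discussed in the abstract.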

About this article

Cite this article

Sloan, W.T., Woodcock, S., Lunn, M. et al. Modeling Taxa-Abundance Distributions in Microbial Communities using Environmental Sequence Data. Microb Ecol 53, 443–455 (2007). https://doi.org/10.1007/s00248-006-9141-x
