Review

In the past few years, the availability of improved sequencing methods, including pyrosequencing [1], has revolutionized what we know about the microbes that inhabit our bodies. Although it has been known for decades that our microbial symbionts outnumber our own cells by about a factor of 10 [2], the differences in the repertoires of symbionts harbored by different healthy individuals, by different sites within the individual, and by individuals over time are only now coming to light. Initially, it was assumed that a 'core microbiome' existed; that is, that a substantial number of microbial species were shared in each body habitat by all or most humans, and that the genomes of these core species could be used as scaffolds to assemble fragmentary data from short-read shotgun sequencing of microbial community DNA [3].

The first three individuals whose gut microbiomes were surveyed using substantial numbers of 16S rRNA gene sequences shared few of their species, however [4]. Similarly, observations that a person's left and right hands have only 17% of bacterial species in common, and that two different people's hands share only 13% [5], cast doubt on the concept of a substantial core set of microbial species shared by all or most people. This doubt has been reinforced by recent work that redefines lineages or genes as 'core' even if they are shared by relatively few people [6, 7]. In fact, on the basis of 16S rRNA gene analyses, we can rule out the possibility that, even within relatively homogeneous small populations of fewer than 100 individuals, everyone's skin-surface communities or gut communities share more than a tiny fraction of species [6-8]. This unanticipated variability in shared community membership, and in other important aspects of the human microbiome, poses substantial conceptual and computational challenges.
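Figures such as "17% of bacterial species in common" express the overlap between the species lists of two samples. One simple way to compute such a figure, assumed here to be the Jaccard index (species shared by both samples divided by all species observed in either; the cited study may have normalized differently), is sketched below with hypothetical phylotype IDs.

```python
def shared_fraction(species_a, species_b):
    """Fraction of observed species shared by two samples (Jaccard index).

    Assumption: overlap is measured as |A & B| / |A | B| over species seen in
    either sample; other normalizations are possible.
    """
    a, b = set(species_a), set(species_b)
    union = a | b
    if not union:
        return 0.0  # two empty samples: define overlap as zero
    return len(a & b) / len(union)


# Hypothetical phylotype IDs detected on two hand swabs (illustrative only).
left_hand = ["otu1", "otu2", "otu3", "otu4", "otu5"]
right_hand = ["otu4", "otu5", "otu6", "otu7", "otu8", "otu9"]
print(round(shared_fraction(left_hand, right_hand), 3))  # 2 shared of 9 -> 0.222
```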

Of particular importance for microbiome studies is the following question: what is the effect size? That is, to use standard statistical terminology, how distinguishable are two communities or groups of communities? The answer is essential for many practical decisions in experimental design. For example, the effect size determines how many individuals need to be recruited for a given study, and how many sequences need to be collected per sample to observe differences if they exist. These considerations are particularly important for the study of systemic disorders such as diabetes or some autoimmune disorders, which are expected to influence the microbiome in multiple body habitats. We need a sense of how much variation exists among different body habitats, how much variation is observed among healthy individuals for the same body habitat, and how much of a shift occurs due to a pathophysiologic state. It is also important to define the most appropriate method for determining the magnitude of similarity or difference between communities, as the choice of method has a large influence on the results of community comparisons [9-12]. A general discussion of the pros and cons of different metrics of community overlap is beyond the scope of this paper (see [9-12] for reviews). Here, we summarize the types and sizes of effects found in studies that used various methods of comparing groups of samples, and look for large-scale patterns that indicate the number of individuals and sequences needed to observe different types of effects (Figure 1).
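To make the link between effect size and recruitment concrete, the sketch below uses the textbook normal-approximation formula for a two-sided, two-group comparison of a single summary value per sample (for example, the number of phylotypes observed). This is a simplifying assumption for illustration: multivariate comparisons such as UniFrac-based clustering do not reduce to one standardized effect size, which is why simulation is used for those cases. The function names and input values are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_d(mean_a, mean_b, pooled_sd):
    """Standardized effect size: difference in group means in pooled-SD units."""
    return (mean_a - mean_b) / pooled_sd


def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate individuals needed per group to detect effect size d.

    Normal approximation for a two-sided, two-sample test:
    n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2, rounded up.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical richness values: group means 120 vs 100 phylotypes, pooled SD 40.
d = cohens_d(120, 100, 40)  # a "medium" standardized effect of 0.5
print(n_per_group(d))
for big_d in (1.0, 2.0):
    print(big_d, n_per_group(big_d))
```

Because the required sample size scales with 1/d**2, halving the effect size roughly quadruples the number of individuals needed, while large effects such as differences between body habitats need only a handful of individuals per group.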

Figure 1

The problem of distinguishing between sequences. (a) An investigator contemplating the problem of distinguishing between sequences from the gut of Equus asinus and the volar forearm of humans. (b) Our solution: guess the effect size from the effect sizes reported in published studies, perform simulations based on these effect sizes as shown in Figure 2, and then acquire enough sequences to resolve microbial community differences of the expected magnitude. (c) When comparing the Equus asinus gut (white point) with human forearms (red and green points represent left and right arms, respectively), 100 or even 10 sequences per sample provide sufficient resolution, but one sequence per sample does not.
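Panel (c) of Figure 1 asks how a comparison behaves when each sample is represented by only 100, 10 or 1 sequences. A common way to simulate shallower sequencing from an existing dataset is rarefaction: drawing a fixed number of sequences from each sample at random, without replacement. The sketch below assumes a sample is stored as a mapping from taxon name to sequence count; the taxa and counts are hypothetical, and this is not necessarily the exact procedure used to produce the figure.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Randomly subsample a sample's sequences to `depth`, without replacement.

    counts: mapping of taxon -> number of sequences observed in the sample.
    Returns a Counter of taxon -> number of sequences retained.
    """
    # Expand to one entry per sequence so every read is equally likely to be drawn.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of sequences in the sample")
    return Counter(rng.sample(pool, depth))


rng = random.Random(42)  # fixed seed so the subsamples are reproducible
# Hypothetical sample of 1,500 sequences spread over four taxa.
sample = {"Bacteroides": 900, "Prevotella": 300, "Clostridium": 240, "Lactobacillus": 60}
for depth in (100, 10, 1):
    print(depth, dict(rarefy(sample, depth, rng)))
```

At 100 sequences the most abundant taxon dominates the subsample, much as in the full data; at a depth of 1 each subsample contains a single taxon, so it carries almost no information about community composition.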

Figure 2

Variation in human body habitats within and between people. (a) The full dataset (approximately 1,500 sequences per sample); (b) the dataset subsampled to only 10 sequences per sample, showing the same pattern; (c) the relationship between sequencing depth and the PERMANOVA components of variation. The amount of variation explained by the factors plateaus at relatively shallow sequencing depths. Note that the residual variation (that is, variation due to differences among individual samples) remains the largest component, despite the variation explained by the three factors examined. (d) Effect size determines the number of sequences required for sample identification. Each point represents a specific sample selected from a pair of body sites, and the number of sequences required to correctly identify which of the two sites it came from. Points are colored according to the two body sites under consideration: the center color represents the broad category from which the selected sample originated, and the border color represents the other broad category under consideration. Because many body sites share the same broad category, some points have the same border and center color. Red, external ear canal; yellow, hair; green, oral cavity; blue, gut; magenta, skin; gray, nostril. ns, not significant.
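Panel (c) of Figure 2 relies on PERMANOVA, which partitions the total sum of squared distances among samples into a between-group part and a within-group (residual) part; the between-group share is the proportion of variation explained by a factor. The sketch below is a minimal one-factor version written from that definition (a pseudo-F statistic with a label-permutation p-value), not the nested multi-factor analysis behind the figure. The distance matrix and site labels are hypothetical.

```python
import random


def permanova(dist, groups, permutations=999, seed=0):
    """One-factor PERMANOVA on a symmetric distance matrix.

    Returns (pseudo_F, R2, p_value), where R2 = SS_between / SS_total is the
    proportion of variation explained by the grouping.
    """
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n

    def ss_within(g):
        # Each group's squared pairwise distances, divided by that group's size.
        total = 0.0
        for lab in labels:
            idx = [i for i in range(n) if g[i] == lab]
            pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
            total += pairs / len(idx)
        return total

    def pseudo_f(g):
        ssw = ss_within(g)
        return ((ss_total - ssw) / (a - 1)) / (ssw / (n - a))

    f_obs = pseudo_f(groups)
    r2 = (ss_total - ss_within(groups)) / ss_total
    # Permutation test: shuffle labels, count statistics at least as large as observed.
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(shuffled) >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (permutations + 1)


# Hypothetical distances: 4 gut and 4 skin samples, 0.2 within a site, 0.8 between.
groups = ["gut"] * 4 + ["skin"] * 4
dist = [[0.0 if i == j else (0.2 if groups[i] == groups[j] else 0.8)
         for j in range(8)] for i in range(8)]
f, r2, p = permanova(dist, groups)
print(round(f, 2), round(r2, 3), round(p, 3))
```

The R2 value is the analogue of the "variation explained" plotted against depth; recomputing it on rarefied data at several depths is one way to look for the plateau described in the caption.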

A variety of interrelated features differentiate microbial communities. These features include the relative abundance of specific taxa (for example, the proportion of the bacteria in the sample that are Firmicutes), the level of species richness or diversity observed within a community (alpha diversity), and the degree to which different communities share membership or structure (beta diversity). A major challenge in comparing studies is that there is no consistent way of reporting the size of community differences, because the relevant type of difference depends on the study. For example, lean and obese mice and humans differ in the ratios of prominent bacterial phyla (Bacteroidetes, which include the common gut commensal Bacteroides; Firmicutes, Gram-positive bacteria including Lactobacillus and Clostridium; and Actinobacteria, which include Corynebacteria and Mycobacteria) [13-15]; men's and women's hands differ in the average number of species-level phylotypes observed (a phylotype being defined as a group of organisms whose 16S rRNA sequences share >97% identity) [5]; and samples from the same or similar sites on the bodies of different individuals cluster together in UniFrac-based principal coordinates analysis [4, 16, 17]. UniFrac, a metric that compares microbial communities using phylogenetic information, has been implemented in several tools.
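In its unweighted form, UniFrac is the fraction of the total branch length leading to either community that leads to only one of them: 0 when both communities cover exactly the same branches, 1 when they share none. The sketch below implements that definition on a toy rooted tree stored as a hypothetical child -> (parent, branch length) mapping. Real analyses build the tree from the 16S rRNA sequences themselves and use established implementations.

```python
def covered_branches(tips, parent):
    """Return the branches (each named by its child node) between `tips` and the root."""
    covered = set()
    for node in tips:
        # Walk toward the root; stop early at a branch that is already covered,
        # since everything above it has been added already.
        while node in parent and node not in covered:
            covered.add(node)
            node = parent[node][0]
    return covered


def unweighted_unifrac(tips_a, tips_b, parent):
    """Branch length unique to one community / branch length observed in either."""
    a = covered_branches(tips_a, parent)
    b = covered_branches(tips_b, parent)
    observed = sum(parent[node][1] for node in a | b)
    unique = sum(parent[node][1] for node in a ^ b)
    return unique / observed if observed else 0.0


# Toy tree: root -> X -> (t1, t2) and root -> Y -> (t3, t4); tips are sequences.
parent = {
    "X": ("root", 0.1), "Y": ("root", 0.4),
    "t1": ("X", 0.2), "t2": ("X", 0.3),
    "t3": ("Y", 0.5), "t4": ("Y", 0.6),
}
# Unique branches t2, t3, Y (1.2) out of 1.5 observed, i.e. about 0.8.
print(unweighted_unifrac(["t1", "t2"], ["t1", "t3"], parent))
```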

Because microbial communities respond in diverse ways to environmental factors, it is difficult to compare effect sizes across different studies or systems, as an analysis that highlights differences in one system may obscure them in another. Thus, in what follows, we review effect types and sizes as reported by the authors of individual studies. We focus on variation in human-associated microbial community diversity as assessed by 16S rRNA gene sequence surveys of abundant lineages, using various measures of both within- and between-sample diversity (alpha and beta diversity, respectively). We review comparisons of microbial communities in relation to both sampling depth (that is, number of sequences per sample) and breadth (that is, number of samples or individuals). We then perform simulations using an atlas of microbes associated with different sites in the human body to ask how many sequences per sample are needed to detect differences across individuals, time, and locations within the body.
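The within-sample (alpha) diversity measures used in these surveys range from simple counts of observed phylotypes to indices that also reflect evenness. A minimal sketch of two common ones, observed richness and the Shannon index, assuming a sample is stored as a mapping from phylotype to sequence count (both samples are hypothetical):

```python
import math


def observed_richness(counts):
    """Number of phylotypes represented by at least one sequence."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)) over observed phylotypes."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total, base)
                for n in counts.values() if n > 0)


# Hypothetical samples with identical richness but different evenness.
even = {"otu1": 25, "otu2": 25, "otu3": 25, "otu4": 25}
skewed = {"otu1": 97, "otu2": 1, "otu3": 1, "otu4": 1}
for name, sample in (("even", even), ("skewed", skewed)):
    print(name, observed_richness(sample), round(shannon(sample), 3))
```

Both samples contain four phylotypes, but the skewed one has much lower Shannon diversity. Because the number of phylotypes observed grows with the number of sequences collected, such values are best compared at equal sequencing depth.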

Reported effect sizes between and within different body habitats

Table 1a provides an illustrative (though not exhaustive) overview of the literature on differences observed among body habitats and locations in healthy individuals, and the numbers of subjects and sequences used to identify these differences. Metagenomic studies that examine all the genes in a community are also of immense interest, but shotgun metagenomic data are so far available only from the gut and for relatively few samples, so the range of questions they can address at present is substantially more limited than for 16S rRNA-based surveys, the type of survey we consider here. One robust finding that exemplifies relative effect sizes is that microbial community composition appears to vary more between individuals than within the same individual over time (Table 1a). This has been found in multiple studies and over a wide range of body habitats. For example, gut community composition is relatively stable in the same individual across a period of months when diet is consistent [6, 16], and even, to a certain degree, when diet is altered. (Changes in the Firmicutes:Bacteroidetes ratio have been reported in individuals who lost weight on either fat-restricted or carbohydrate-restricted low-calorie diets, but despite these shifts in relative abundance, interpersonal variation was the largest effect observed in phylogenetic comparisons of the communities [14].) Likewise, skin community composition is more similar within a subject than between subjects over a period of months [16, 18], as are oral, nasal and external auditory canal communities [16]. These results indicate that, in terms of the bacteria you harbor, you are likely to be more similar to yourself in 3 months' time than to your friend today.
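The "more similar to yourself than to your friend" comparison amounts to checking whether distances between samples from the same person are smaller than distances between people. The sketch below uses Bray-Curtis dissimilarity on relative abundances as a simple, non-phylogenetic stand-in for the UniFrac distances used in several of the cited studies; the subjects, time points and counts are hypothetical.

```python
from statistics import mean


def relative(counts):
    """Convert taxon counts to relative abundances."""
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two samples, on relative abundances."""
    px, py = relative(x), relative(y)
    taxa = set(px) | set(py)
    # With proportions the denominator sum(p + q) is 2, so this is half the
    # total absolute difference in abundance.
    return sum(abs(px.get(t, 0.0) - py.get(t, 0.0)) for t in taxa) / 2


# Hypothetical gut samples keyed by (subject, month).
samples = {
    ("subject_A", 0): {"Bacteroides": 60, "Prevotella": 5, "Faecalibacterium": 35},
    ("subject_A", 3): {"Bacteroides": 55, "Prevotella": 5, "Faecalibacterium": 40},
    ("subject_B", 0): {"Bacteroides": 10, "Prevotella": 70, "Faecalibacterium": 20},
    ("subject_B", 3): {"Bacteroides": 15, "Prevotella": 65, "Faecalibacterium": 20},
}

within, between = [], []
keys = list(samples)
for i, key_i in enumerate(keys):
    for key_j in keys[i + 1:]:
        d = bray_curtis(samples[key_i], samples[key_j])
        # Same subject at another time point -> within; different subjects -> between.
        (within if key_i[0] == key_j[0] else between).append(d)

print("mean within-subject:", round(mean(within), 3))
print("mean between-subject:", round(mean(between), 3))
```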

Table 1 Variations observed among different types of microbial communities, and the extent of sequencing and sampling used

Microbial community changes in human disease and environmental samples

Although a wide range of studies in healthy subjects have identified substantial interpersonal variation in overall microbial community composition, how do these effect sizes compare with differences correlated with disease, or with the responses of various environmental samples to treatment? To address this question, we reviewed culture-independent, 16S rRNA gene-based surveys associated with different physiological conditions (Table 1b) and with experimental manipulations in non-human environments (which were surprisingly scarce; Table 1c).

One of the best-characterized effects of health status on the gut microbiome is the association between obesity and the proportional representation of Bacteroidetes, Firmicutes and Actinobacteria [6, 13-15]. Studies in mice indicate that the microbiota contributes to the obese state by extracting more energy from the diet than the microbiota of a lean host does [15], and by influencing host genes that regulate the deposition of energy in adipocytes [19]. The obesity-associated microbiomes of humans (and mice) are enriched in functional genes for certain types of carbohydrate metabolism, and this is directly attributable to the reduced number of Bacteroidetes genomes [6, 15].

However, even the size of the differences in gut bacterial community composition between obese and lean hosts is debated, as studies using different methodologies have returned varied results [20]. The impact of methodology is particularly evident in a study of twins concordant for obesity or leanness, in which the observed relative abundances of Bacteroidetes, Actinobacteria and Firmicutes depended on the sequencing approach: pyrosequencing of PCR products from different regions of the 16S rRNA gene, Sanger sequencing of 16S rRNA clones, or shotgun sequencing and phylogenetic classification of reads [6]. Nevertheless, the direction of the effect was consistent across methods and detectable with as few as a couple of hundred sequences per sample.
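One reason a couple of hundred sequences can suffice is that the sampling error of a phylum's relative abundance shrinks with the square root of the number of sequences. The sketch below treats each sequence as an independent draw from the community (a binomial assumption) and uses hypothetical phylum counts. It captures sampling noise only, not the methodological biases discussed above, which shift the estimates themselves.

```python
import math


def relative_abundance(phylum_counts, phylum):
    """Proportion of classified sequences assigned to `phylum`."""
    return phylum_counts.get(phylum, 0) / sum(phylum_counts.values())


def standard_error(p, n):
    """Binomial standard error of a proportion p estimated from n sequences."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical phylum-level counts for one gut sample (200 sequences).
counts = {"Firmicutes": 130, "Bacteroidetes": 50, "Actinobacteria": 20}
p_bact = relative_abundance(counts, "Bacteroidetes")
fb_ratio = counts["Firmicutes"] / counts["Bacteroidetes"]
print("Bacteroidetes:", p_bact, "Firmicutes:Bacteroidetes =", fb_ratio)
for n in (10, 100, 200, 1000):
    print(n, "sequences -> SE", round(standard_error(p_bact, n), 3))
```

Quadrupling the number of sequences only halves the standard error, so beyond a few hundred sequences extra depth buys little additional precision unless the expected difference between groups is small.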

Observable phenotypes such as obesity may be caused by a variety of underlying factors, and which of those factors is responsible for shifts in the host's microbiota is difficult to address in such correlative studies. Experimental manipulations of microbial communities, however, allow determination of the relative effects of specific variables on overall community composition or the abundance of particular taxa, and as such, allow researchers to draw conclusions regarding cause and effect. Examples of experimental manipulations of non-human environments that used 16S rRNA gene sequencing approaches (either clone libraries or pyrosequencing) and that were well enough replicated to allow statistical analysis are shown in Table 1c. For soil samples, three to four replicates with 70 to 100 sequences were sufficient to observe differences in microbial communities due to land use and moisture regimes [21, 22]. For piglet gut microbiota, the effects of antibiotics on overall community composition were evident with as few as 96 sequences per sample [23]. It would be fascinating to test whether similar antibiotic-induced effects in outbred populations of humans with diverse diets [24] can be found with relatively few sequences. Similarly, it would be important to consider sampling depth under human physiological conditions in cases where the effect size is known to be large, for example, in the development of the infant gut microbiota [25].
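With only three or four replicates per treatment, a permutation test can enumerate every possible assignment of samples to groups. The sketch below does this for a single summary value per sample (here, hypothetical phylotype counts for replicate soil plots); it is an illustration, not the statistics used in the cited studies. It also shows that, for this exact test, the replicate number bounds the smallest attainable p-value: with two groups of k samples, the two-sided minimum is 2 divided by the number of ways to choose k of the 2k samples.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Every split of the pooled values into groups of the original sizes is
    enumerated, so this is practical only for small numbers of replicates.
    """
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), k):
        in_a = set(chosen)
        a = [pooled[i] for i in in_a]
        b = [pooled[i] for i in range(len(pooled)) if i not in in_a]
        total += 1
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical phylotype counts for replicate plots under two land uses.
forest = [52, 55, 50, 53]
pasture = [40, 42, 38, 41]
print(exact_permutation_test(forest, pasture))          # 4 vs 4: 2 of 70 splits
print(exact_permutation_test(forest[:3], pasture[:3]))  # 3 vs 3: 2 of 20 splits
```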

Has the depth of sequencing used up to now really been necessary?

The literature reviewed in Table 1 reports how many sequences were used to reveal a variety of different effects. Could the same results have been achieved with less sequencing? To begin to address this question, we carried out a limited reanalysis of a study of multiple body habitats by Costello et al. [16], which encompasses variability explained by nested factors with different effect sizes (Box 1).

Box 1

How many sequences does it take...?

In conclusion, the results described here, and previously reported [8, 37], show that arbitrarily choosing to generate large numbers of sequences may not be the most cost-effective way to identify changes in microbial communities associated with different physiological or pathophysiological states. Instead, we call for a few standardized methods to assess differences among microbial communities, which will allow for effect size and power calculations, and therefore a considered assessment of the number of individuals and sequences required to differentiate among given communities. The following four methods have been successful in a range of studies: differences in alpha diversity (number of phylotypes observed or extrapolated); differences in abundance of specific lineages; differences in location on a principal coordinates plot obtained from UniFrac distances or other metrics; and the FST measure described in the previous section.
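The section that defines the FST measure is not reproduced here. As a hedged sketch, the code below assumes a population-genetics formulation in which FST is the share of total sequence diversity attributable to differences between communities, FST = (pi_total - pi_within) / pi_total, with pi taken as the mean pairwise proportion of differing sites among aligned 16S rRNA sequences. The sequences are hypothetical, and the authors' exact estimator may differ.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of aligned positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2)) / len(s1)


def mean_pairwise(seqs):
    """Mean p-distance over all pairs of sequences in `seqs`."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def fst(communities):
    """FST = (pi_total - pi_within) / pi_total.

    pi_within: average of each community's mean pairwise difference.
    pi_total: mean pairwise difference after pooling all sequences.
    """
    pooled = [s for seqs in communities for s in seqs]
    pi_total = mean_pairwise(pooled)
    pi_within = sum(mean_pairwise(seqs) for seqs in communities) / len(communities)
    return (pi_total - pi_within) / pi_total


# Hypothetical aligned fragments: each community holds one distinct lineage.
community_a = ["AAAA", "AAAT"]
community_b = ["GGGG", "GGGC"]
print(round(fst([community_a, community_b]), 3))
```

Values near 1 indicate communities made of distinct lineages, and values near 0 indicate communities that differ no more than random draws from the pooled sequences; this simple estimator can dip below 0 when communities are very similar.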

The rapid increase in sequencing capacity provides a spectacular opportunity to advance the field in ways that were unimaginable even 3 years ago. How can individual investigators, or groups of investigators, use these resources most wisely at this unique moment of democratization of the ability to perform sequence-based studies? The data summarized here suggest that study designs consisting of tens of thousands of samples sequenced at shallow coverage will be highly informative (depending on the effect size), and such studies are possible with the instruments available today. Given recent observations that inter-habitat and inter-personal variations are large effects, we believe that individual researchers can and should seize the opportunity provided by these findings to analyze vast numbers of samples at low coverage (for example, 100 to 1,000 sequences per sample). With this many samples, detailed exploration of the spatial and temporal dynamics of microbial communities will be possible, as will comparisons of large patient populations. In addition, replicate samples can be acquired and analyzed without substantially reducing the breadth of an investigation, allowing more robust experimental designs to be implemented. One can envisage that perhaps within the next few years, a group of motivated high-school students might, for a science-fair project, be able to track movements of microbes between humans and their pets and livestock across the planet. These studies, especially when combined with hypothesis-driven approaches to understanding the effects of factors such as diet and antibiotic exposure, could go far beyond even the largest purely observational studies being contemplated today.

Such studies will yield an overall map of variation within the human microbial ecosystem, and relate differences to specific physiological states within and between individuals in a manner that is replicated across individuals. These studies will serve as a framework to identify and compare the shifts that take place in the microbial community that are related to specific disorders.