The majority of microorganisms defy axenic culture in the laboratory and so have eluded study by the classic microbiological approaches [1]. With the advent of cultivation-independent molecular tools, the true extent of microbial diversity has been, and continues to be, revealed [2-4]. Much of that work, however, is based on a single phylogenetic marker gene, small subunit ribosomal RNA (ssu rRNA) [5]. By contrast, metagenomics in principle makes accessible the entire genetic complement of a microbial community - we define metagenomics here as the large-scale application of random shotgun sequencing to DNA extracted directly from environmental samples and resulting in at least 50 megabase pairs (Mbp) of sequence data. It has been barely three years since the publication of the first large-scale metagenomic studies: of an acid mine drainage biofilm [6] and of ocean surface water [7]. Since then, numerous other habitats have been investigated using this 'basic' metagenomic approach (Figure 1, arrow 1), including farmland soil and whale falls (whale carcasses that have fallen to the sea floor) [8], symbionts in a gutless marine worm [9], phosphorus-removing activated sludge [10], the human [11] and termite [12] gut, and marine microbial [13, 14] and viral [15] samples. In all these cases, metagenomics provided insights into the microbial community under study that probably would have taken much longer to come to light using more directed (nonrandom) approaches. Shotgun sequencing of environmental samples has, however, a number of limitations [16], which can best be addressed by the use of complementary techniques.

Figure 1
Enhancing the basic metagenomic approach through complementary technologies. The metagenomic analysis of microbial communities by random shotgun sequencing (arrow 1) is being enriched in one dimension by parallel detection and analysis of transcripts ('metatranscriptomics', arrow 2) and of expressed proteins ('metaproteomics', arrow 3). In addition, because of the complexity of most natural microbial communities, separating the community into populations enriched in a particular group of microorganisms, and even into individual cells, would be advantageous. Whole-genome amplification (WGA) is beginning to be validated as an approach to metagenomic and metatranscriptomic analysis in such samples, but there are still some methodological constraints to be overcome (see text). The horizontal arrows indicate examples of techniques that can be used to move to the next level of analysis, for example, (a) flow sorting and filtration and (b) microfluidics and flow sorting. SIP, stable isotope probing.

Limitations of environmental shotgun sequencing

Three notable limitations of the basic metagenomic approach are low resolution, the inability to classify short metagenomic fragments, and the lack of functional verification. Perhaps surprisingly, the resolution of microbial communities by shotgun sequencing is rather low, with only dominant populations producing sufficient sequence coverage to result in a sequence assembly. For example, assuming no other biases, a population representing 0.1% of a community would account for only 100 kilobase pairs (kbp) of a 100 Mbp metagenome, resulting in very little coverage (0.025 × coverage for a 4 Mbp genome). If a recent study on the microbial diversity in the deep sea is an accurate indication of species-abundance distribution [4], rare community members comprising the bulk of the diversity in many environmental samples will be completely missed by current levels of shotgun sequencing.
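The coverage arithmetic in this example can be reproduced with a few lines of Python. This is only a minimal sketch under the same assumption stated above (no bias other than relative abundance); the function names and parameters are illustrative and are not taken from any of the cited studies.

```python
def expected_coverage(metagenome_bp: float, abundance: float, genome_bp: float) -> float:
    """Expected fold coverage of one population's genome in a shotgun metagenome.

    Assumes sequence is sampled in proportion to relative abundance only.
    """
    if not 0.0 <= abundance <= 1.0:
        raise ValueError("abundance must be a fraction between 0 and 1")
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    population_bp = metagenome_bp * abundance  # sequence attributable to the population
    return population_bp / genome_bp


def required_metagenome_bp(target_coverage: float, abundance: float, genome_bp: float) -> float:
    """Total sequence needed to reach a target fold coverage for one population."""
    if not 0.0 < abundance <= 1.0:
        raise ValueError("abundance must be a fraction in (0, 1]")
    return target_coverage * genome_bp / abundance


# Values from the text: 0.1% abundance, 100 Mbp metagenome, 4 Mbp genome.
coverage = expected_coverage(metagenome_bp=100e6, abundance=0.001, genome_bp=4e6)
print(f"{coverage:.3f}x coverage")

# Sequence needed for 1x coverage of the same population, under the same assumption.
needed = required_metagenome_bp(target_coverage=1.0, abundance=0.001, genome_bp=4e6)
print(f"{needed / 1e9:.1f} Gbp for 1x coverage")
```

Under this simple model, even a single-fold coverage of such a rare population would require on the order of gigabase pairs of sequence, which is why only dominant populations assemble.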

The second limitation is in identifying the source species of metagenomic fragments. Current methods to classify such fragments do not perform well on sequences of less than 8 kbp [17], that is, the bulk of the sequence data obtained in most metagenomic studies. And third, as with all DNA sequence data, metagenomics can only provide information on metabolic potential, and only for genes with recognizable homology with biochemically characterized proteins.

Divide and conquer

The first two limitations can be addressed by dividing microbial communities into simpler subsets, which facilitates contig identification and greater genomic coverage of populations. Ironically, cultivation of pure strains is an excellent example of this divide-and-conquer approach, as single cells or microcolonies are separated from an environmental inoculum and grown clonally on artificial media. However, directed cultivation of organisms of environmental relevance is typically difficult to achieve [1, 18, 19], although metagenomic studies can provide valuable guidance for such efforts [20].

Cultivation-independent methods to subdivide microbial communities into enriched populations (see Figure 1, arrow a) often rely on the physical properties of the target cells. For example, populations comprising cells of atypical size can be effectively enriched via filtration. This approach was successfully applied to enrich phylogenetically novel populations of ultra-small archaea using filters with a 0.45 μm pore size [21, 22]. Both enriched populations have been the subject of subsequent genome sequencing projects ([23] and B.J. Baker, E.E. Allen and J.F. Banfield, unpublished work; see [24]). In a metagenomic project studying bacterial endosymbionts of a gutless marine oligochaete worm, Nycodenz density-gradient centrifugation was used to separate the bacterial and eukaryotic host-cell populations, improving the recovery of the bacterial genome sequences in subsequent shotgun sequencing [9].

More sophisticated techniques for separating cells from communities are also being applied, including fluorescence-activated cell sorting (FACS [25]) and microfluidics [26] (see Figure 1). FACS can be used to rapidly sort large numbers of cells belonging to specific populations on the basis of cell properties such as size, DNA content, photosynthetic pigments or fluorescently labeled probes targeting the cells [27-29]. Such sorting can provide enough biomass to allow direct extraction of DNA or RNA for the polymerase chain reaction (PCR) and shotgun sequencing. FACS and microfluidics can also be used to separate individual cells, with the caveat that single cells require whole-genome amplification, for example by multiple displacement amplification (MDA [30]), to provide enough genomic DNA for shotgun sequencing.

Co-localization of PCR-amplified marker genes (such as ssu rRNA) and functional genes in single cells has recently been demonstrated in two independent studies. Ottesen and colleagues [31] used highly parallelized microfluidic chambers to separate individual cells and, via PCR, were able to link a key metabolic gene in homoacetogenesis to the ssu rRNA of treponeme spirochetes present in the termite hindgut. Bacterial homoacetogenesis delivers the major carbon and energy source (acetate) for the host termite, and hence represents an important link in this mutualistic symbiosis. Stepanauskas and Sieracki [32] flow-sorted single marine planktonic cells into microtiter plates and identified a range of bacteria containing proteorhodopsin and other genes after MDA and PCR. In fact, their results hint at flavobacteria as major carriers of the proteorhodopsin gene. Compared with large-scale shotgun sequencing, this approach represents a rather low-cost alternative for studying the metabolic potential of uncultivated microbes. In summary, both studies mark an important milestone in microbial ecology - the systematic linkage of identity with function in uncultivated microorganisms. PCR-based co-localization of genes is, however, limited by existing sequence data and cannot access novel gene families discovered by random shotgun sequencing.
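The logic of linking identity to function by co-localization can be illustrated with a small tally over per-cell PCR results. The sketch below is hypothetical throughout: the data layout, the function name and the toy records (`phylotype_A`, `gene_X`) are assumptions made for illustration and do not reproduce the data or analysis of either study.

```python
from collections import Counter


def link_identity_to_function(cells, functional_gene):
    """Tally, per ssu rRNA phylotype, how many separated cells also carry a gene.

    cells: iterable of (phylotype, amplified_genes) pairs, one per cell or chamber.
           phylotype is None when no ssu rRNA product was recovered.
    Returns {phylotype: (cells_with_gene, cells_screened)}.
    """
    carriers = Counter()
    screened = Counter()
    for phylotype, amplified_genes in cells:
        if phylotype is None:
            continue  # without a marker gene, the cell cannot be assigned an identity
        screened[phylotype] += 1
        if functional_gene in amplified_genes:
            carriers[phylotype] += 1
    return {p: (carriers[p], screened[p]) for p in screened}


# Made-up records, for illustration only.
toy_cells = [
    ("phylotype_A", {"gene_X"}),
    ("phylotype_A", set()),
    ("phylotype_B", {"gene_X", "gene_Y"}),
    (None, {"gene_X"}),
]
print(link_identity_to_function(toy_cells, "gene_X"))
```

Because the tally depends on PCR products, it can only report genes for which primers already exist, which is the limitation noted above.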

The holy grail of de novo sequencing of sorted cells, and individually sorted cells in particular, is to obtain a finished genome and thus a complete inventory of an organism's genetic potential. The feasibility of genome sequencing from just one or a few cells has been validated by using MDA and partial sequencing of species with known genome sequence (Escherichia coli [33] and Prochlorococcus [34]). This approach has been applied to members of the candidate bacterial phylum TM7 from the human mouth [35] and from soil [36], yielding some insights into the metabolic potential of novel uncultivated organisms. For example, the presence of genes for type IV pilus biosynthesis in the isolates from both studies [35, 36] may hint at a gliding motility known from some Gram-positive bacteria. However, the majority of genes of the TM7 genomes studied bear little similarity to genes of characterized proteins.

Full genome sequencing from a single microbial cell (Figure 1, arrow 1) remains problematic, however, due to contamination, uneven genome coverage and chimeric sequence formation during MDA [34, 37]. A number of solutions have been proposed to somewhat mitigate these limitations. Reducing the reaction volume increases the specific template concentration, leading to fewer chimeric sequences [37]. Microfluidic devices allow MDA reactions at the nanoliter scale, which increases the specific template concentration by three orders of magnitude [35]. Uneven genome coverage, on the other hand, seems random [33] and hence pooling of separate MDA reactions from individual but genomically identical cells [36] should improve coverage.
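The expected benefit of pooling can be sketched with a simple probabilistic model. The assumptions here are ours, not those of the cited studies: each MDA reaction is taken to recover any given locus independently with the same probability, and the per-reaction recovery of 0.4 used below is an arbitrary illustrative value.

```python
import random


def expected_pooled_fraction(per_reaction_fraction: float, n_reactions: int) -> float:
    """Expected genome fraction recovered after pooling independent reactions.

    Assumes each reaction misses a given locus independently with
    probability 1 - per_reaction_fraction.
    """
    return 1.0 - (1.0 - per_reaction_fraction) ** n_reactions


def simulate_pooled_fraction(per_reaction_fraction, n_reactions, n_loci=10_000, seed=0):
    """Monte Carlo check of the formula on a genome divided into n_loci bins."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_loci):
        # A locus counts as recovered if at least one reaction amplified it.
        if any(rng.random() < per_reaction_fraction for _ in range(n_reactions)):
            covered += 1
    return covered / n_loci


for n in (1, 2, 5, 10):
    print(n, round(expected_pooled_fraction(0.4, n), 3), simulate_pooled_fraction(0.4, n))
```

If dropout were instead systematic (the same regions failing in every reaction), pooling would help far less than this model suggests, so the gain depends on the randomness reported in [33].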

Going beyond metabolic potential

A major criticism of metagenomics is that it is, to some extent, crystal-ball gazing, as one attempts to infer the metabolism of organisms from their DNA sequence alone (the third limitation raised earlier: lack of functional verification). Indeed, purely metagenomic studies often raise more questions than they can answer. Transcriptomic and proteomic analyses have been applied for several years to microbial isolates in order to observe their expressed metabolic potential [38, 39]. These approaches have recently been applied in a high-throughput fashion to microbial communities, giving rise to the terms 'metatranscriptomics' and 'metaproteomics'.

A technical difficulty associated with transcriptomics in bacteria and archaea is separating mRNAs from the dominant rRNAs. The poly(A) tail of eukaryotic mRNAs (which facilitates their separation from rRNAs before cDNA synthesis) is not present on bacterial and archaeal transcripts [40]. Leininger and colleagues [41] circumvented this problem to some extent by simply using the brute force of the new massively parallel short-read sequencing technologies to absorb the loss of transcript sequence output due to the predominance of rRNA. Through this approach they provided unexpected evidence for members of the Crenarchaeota being the most active ammonia-oxidizing microorganisms in soil ecosystems [41].

Modern proteomic methods based on mass spectrometry allow a fine-scale analysis of the expressed proteins of microbial communities [42]. By combining such techniques with genomic data, Lo et al. [43] were able to distinguish strain-specific protein variants differing in only a single amino-acid residue from a different site in the same mine. Interestingly, 48% of the proteins predicted in the genome sequence of the most abundant member in this system, Leptospirillum group II, were detected by proteomics. This value is higher than those reported for many proteomic analyses of isolates and may point to a heterogeneity of metabolic states in naturally occurring populations [42].

Unexplored territory

By describing techniques that extend the basic metagenomic approach in two dimensions - gene expression and translation (Figure 1, arrows 2, 3) and community fractionation (Figure 1, arrows a, b) - additional combinations become apparent that remain to be explored (see Figure 1 'Unexplored territory'). Applying transcriptomics and proteomics to separated populations will allow functional characterization of species that have been inaccessible via cultivation so far. The many phyla in the tree of life without genome-sequenced representatives will provide attractive targets for this type of analysis [2].

The application of transcriptomics and proteomics to enriched populations or even individual microbial cells taken directly from the environment remains technically challenging (see Figure 1, arrows 2, 3). However, the technical hurdles may not be insurmountable. For instance, electrospray ionization/mass spectrometry can provide greater sensitivity than the currently standard liquid chromatography-mass spectrometry used in proteomics, leading to smaller sample size requirements [44]. Commercial kits are already available for amplifying RNAs from as few as 50 cells (for example, QuantiTect™ from Qiagen), paving the way for single-cell transcriptomics. Such methods would allow functional characterization of single cells, providing insights into the heterogeneity of expression postulated to exist in microbial cell populations [45]. Moreover, if these approaches prove viable, such population expression heterogeneity could be assessed in the context of the community from which the population was derived.

Although there is still great scope for application of the basic metagenomic approach to microbial communities - in making spatial series [14] and in population genomics [46, 47] for example - researchers are making concerted efforts to extend and enhance metagenomics using techniques such as flow sorting, microfluidics, transcriptomics and proteomics. There are many other recently developed methods that can similarly be applied to build on or complement the basic metagenomic approach, including stable isotope probing [48], stable isotope mass spectroscopy [49] and subcellular high-resolution imaging [50], guaranteeing a rich and interesting future for those who study microbial ecology and evolution.