Native Lignocellulolytic Microbial Community Metagenomics
In this study, we used metagenomics to identify enzymes from the native microbial community present in a tropical rain forest soil sample from Puerto Rico (Allgaier et al. analyzed the metagenome of a switchgrass compost in a previously published study). We performed shotgun sequencing using the Roche 454 GS FLX Titanium technology to obtain metagenomic data for the native soil community, resulting in 863,759 reads with an average read length of 417 bp, for a total of 350 Mbp of sequence data. After trimming and quality control, the final data set comprised 780,588 reads totaling 321 Mbp. Assembly of the tropical forest soil metagenome was attempted using the Newbler assembler (454 Life Sciences), but the species composition of the sample was too complex to yield any significant assembly of the sequence reads. MG-RAST assigned some degree of functional annotation to 29.7% of the sequences (232,025) at E < 1e−10. PRIAM enzyme-specific sequence profiles assigned four-digit EC numbers to 110,411 sequence reads at E < 1e−10. BLASTX of the rain forest metagenome against the CAZy and FOLy databases resulted in 29,051 protein family assignments and 9,041 EC number assignments (Fig. 2), also at E < 1e−10.
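The best-hit assignment at E < 1e−10 described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes BLASTX output in the standard 12-column tabular layout (E-value in the 11th column), and the read identifiers, protein identifiers, and protein-to-family mapping are hypothetical placeholders.

```python
import csv
from collections import Counter

E_CUTOFF = 1e-10  # threshold quoted in the text


def best_hits(tabular_lines, e_cutoff=E_CUTOFF):
    """Keep, for each read, the hit with the lowest E-value below the cutoff."""
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        read_id, subject_id, evalue = row[0], row[1], float(row[10])
        if evalue >= e_cutoff:
            continue  # strict E < cutoff, as in the text
        if read_id not in best or evalue < best[read_id][1]:
            best[read_id] = (subject_id, evalue)
    return best


def family_counts(best, subject_to_family):
    """Tally one family assignment per read from its best hit."""
    return Counter(
        subject_to_family[subject]
        for subject, _ in best.values()
        if subject in subject_to_family
    )


# Hypothetical toy input: four hits for three reads.
DEMO_ROWS = [
    "read1\tprotA\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t120",
    "read1\tprotB\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-12\t70",
    "read2\tprotC\t50.0\t90\t45\t0\t1\t270\t1\t90\t1e-5\t40",
    "read3\tprotB\t70.0\t95\t28\t0\t1\t285\t1\t95\t1e-20\t90",
]
# Hypothetical mapping from reference protein to enzyme family.
DEMO_FAMILIES = {"protA": "GH5", "protB": "GT2", "protC": "CE1"}

demo_best = best_hits(DEMO_ROWS)
demo_counts = family_counts(demo_best, DEMO_FAMILIES)
print(dict(demo_counts))
```

Counting one best hit per read keeps a read from contributing to more than one family, which matters when reads are short and unassembled.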
The most abundant carbohydrate- and lignin-active enzyme families, as inferred by best BLASTX hits against CAZy and FOLy, are glycoside hydrolases (GH, 12,193 BLASTX hits) and glycosyl transferases (GT, 11,562 hits), followed by carbohydrate esterases (CE, 3,133 hits), lignin oxidases (LO, 413 hits), polysaccharide lyases (PL, 282 hits), and lignin-degrading auxiliary oxidases (LDA, 240 hits; Fig. 3). GT2 and GT4 are large, predominantly bacterial, multifunctional enzyme families, and most of the sequences present here seem to be involved in aspects of bacterial cell-wall biogenesis. GH13 contains mostly starch- and glycogen-degrading enzymes, as well as trehalose synthases. Ligninolytic enzymes are relatively low in abundance (likely due to the smaller number of known reference enzymes and the lack of bacterial sequences in FOLy), consisting mainly of putative cellobiose dehydrogenases (LO3, 361 hits) and a small number of aryl-alcohol oxidases (LDA1, 124 hits), glucose oxidases (LDA6, 82 hits), and laccases (LO1, 47 hits). Not shown are 3,645 hits against enzymes with carbohydrate-binding modules (CBM), the two most abundant of which are CBM13 (previously known as cellulose-binding domain family XIII, 1,366 hits) and the glycogen-binding CBM48 (826 hits).
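To put the hit counts above on a common scale, the relative share of each enzyme class can be computed directly from the numbers quoted in the text. The sketch below uses only those six counts; the shares are relative to the six listed classes (CBM hits are excluded), not to the full set of protein family assignments.

```python
# BLASTX best-hit counts per enzyme class, as quoted in the text.
CLASS_HITS = {
    "GH": 12193,   # glycoside hydrolases
    "GT": 11562,   # glycosyl transferases
    "CE": 3133,    # carbohydrate esterases
    "LO": 413,     # lignin oxidases
    "PL": 282,     # polysaccharide lyases
    "LDA": 240,    # lignin-degrading auxiliary oxidases
}


def class_shares(hits):
    """Return each class's fraction of the summed hits across the listed classes."""
    total = sum(hits.values())
    return {name: count / total for name, count in hits.items()}


total_hits = sum(CLASS_HITS.values())
shares = class_shares(CLASS_HITS)

# Print classes from most to least abundant.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%}")
```

On these counts, GH and GT together account for well over four fifths of the hits in the six classes, while the two lignin-active classes (LO and LDA) together stay below 3%.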
Supplementary Table S1 shows similar abundance patterns for selected lignocellulose-degrading GH families across rain forest soil (this study), compost, cow rumen, and termite metagenome datasets. GH family counts in the latter metagenomes were based on Pfam hits to open reading frames on assembled metagenome contigs; however, the good correlation between BLASTX hits on unassembled reads and Pfam hits on assembled ORFs suggests that BLASTX hits provide a reasonable proxy for enzyme family assignment. Beta-glucosidases and other oligosaccharide-degrading families are the most abundant in the rain forest reads. The main cellulase families represented are GH5 (243 hits) and GH9 (86 hits), whereas other traditional cellulase families (GH7, GH45, GH48) are only present in very low abundance (if at all) in all four microbiomes examined.
Overall, the native soil community contains a variety of potentially interesting genes for use in industrial biofuels production. However, the lack of assembly of full-length genes in this data set, together with the rather short list of full-length genes identified from compost in an earlier study by Allgaier et al., prompted us to investigate whether selective cultivation on bioenergy feedstocks could reduce the complexity of these native communities, thereby facilitating identification of a greater number of full-length genes in future metagenomic sequencing efforts.
Switchgrass-adapted Puerto Rican Rainforest Soil Cultures
We chose to adapt the tropical soil sample to switchgrass under anaerobic conditions to select for anaerobic biomass-degrading microbes, since many of these organisms produce cellulosomes, multi-enzyme complexes capable of depolymerizing both cellulose and hemicellulose. Anaerobic switchgrass-adapted consortia were enriched from tropical forest soils by passaging the communities twice for 6 weeks, with switchgrass as the sole carbon source, under anaerobic conditions with and without supplemental iron (Table 1). The richness of the original soil sample was 1,339 taxa as determined by PhyloChip (Fig. 4a). Growth on switchgrass as the sole carbon source reduced the richness to 84 taxa, while inclusion of iron in the growth medium resulted in a richness of 336 taxa. Archaea were present in the soil but in neither feedstock-adapted community, as were taxa from 20 phyla (Table 2).
Taxa in the switchgrass-adapted communities lacking iron were heavily dominated by Proteobacteria, Firmicutes, and Bacteroidetes (Fig. 4a), with members of the phylum Proteobacteria making up 83% of the richness of the switchgrass-adapted communities. All of the taxa enriched in the switchgrass-amended cultures lacking iron were also present in the iron-amended cultures (Fig. 4a). Taxa that were specifically enriched in the presence of iron and not found in the non-iron consortia were dominated by the Bacteroidetes, Desulfovibrionaceae (class Deltaproteobacteria), Caulobacterales (class Alphaproteobacteria), and Enterococcales (class Bacilli). There were also representatives from many phyla that are rare, uncultivated, and otherwise of cryptic function (Table 2). Of the 41 phyla originally represented in the soil, only nine remained when communities were adapted to switchgrass alone, while iron addition resulted in the growth of taxa from 28 different phyla on switchgrass as the sole carbon source. These results show that selective growth does reduce the complexity of a native soil community (Table 3, Fig. 4).
Lignin-adapted Municipal Green Waste Compost Cultures
To test a different set of selective conditions, we adapted the municipal green waste compost community to various types of purified lignin as the sole carbon source under aerobic conditions, to select for microbes specialized in the deconstruction and/or modification of lignin (Table 1). The community was grown under aerobic conditions since many lignin-degrading enzymes use oxidative chemistry to depolymerize lignin. Three types of lignin were chosen as the carbon source: alkali lignin (AL), organosolv lignin (OL), and Indulin AT (IL). The AL is sulfonated and completely water-soluble, with a MW range of 10,000 to 60,000 as reported by Sigma-Aldrich (St. Louis, MO). The OL is a mixture of soluble and insoluble lignin, suggesting that it may contain higher-MW lignin fragments than the AL. The IL is unsulfonated and completely insoluble (the soluble fraction was removed by water extraction), and is therefore likely to contain only high-MW lignin. The range of water solubility and MW across these lignin types is likely to facilitate the enrichment of microbes with a broad range of lignin-degrading attributes.
After five 2-week enrichments, the composition of each lignin-adapted microbial community was determined by SSU rRNA amplicon pyrosequencing. In the compost inoculum, a total of 391 different taxa were identified representing 24 bacterial and eukaryotic phyla. The diversity of the compost inoculum microbial community was reduced to 30, 98, and 136 taxa in the lignin-enriched cultures belonging to 15 bacterial phyla, with each of the three lignin-adapted cultures dominated by Alphaproteobacteria (Fig. 4b, Table 2). In general, eukaryotes were of minor abundance in both the compost inoculum and the lignin-adapted communities, with the exception of one taxon belonging to the Alveolata, which accounted for 2% of the microbial community in the OL enrichment.
Identifying the organisms shared among the three cultures may indicate which organisms play an active role in lignin modification and depolymerization. Eight phylotypes are common to all three lignin-amended cultures. Five of these phylotypes have cultured representatives: Paracoccus sp. str. WB1, Mesorhizobium sp. str. CCBAU 41182, brackish water isolate str. HINUF007, Rhizobium sp. str. RM1-2001, and Hyphomicrobium aestuarii str. DSM 1564. The other three phylotypes are most closely related to SSU rDNA clones recovered from environmental samples (water manure clone, aspen rhizosphere clone, and solid waste clone).
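Finding the phylotypes common to all three enrichments amounts to a set intersection over the per-culture taxon lists. The sketch below shows the operation on hypothetical placeholder identifiers (`otu_1`, `otu_2`, and so on); it does not reproduce the actual phylotype tables from the study.

```python
def shared_phylotypes(communities):
    """Return the phylotypes present in every community.

    `communities` maps a culture label to an iterable of phylotype identifiers.
    """
    taxon_sets = [set(taxa) for taxa in communities.values()]
    if not taxon_sets:
        return set()
    return set.intersection(*taxon_sets)


# Hypothetical phylotype identifiers for the three lignin-adapted cultures.
DEMO_COMMUNITIES = {
    "AL": ["otu_1", "otu_2", "otu_3"],
    "OL": ["otu_2", "otu_3", "otu_4"],
    "IL": ["otu_2", "otu_3", "otu_5"],
}

demo_shared = shared_phylotypes(DEMO_COMMUNITIES)
print(sorted(demo_shared))
```

The result depends on how phylotypes are defined upstream (for example, the sequence-identity threshold used for clustering), so the same reads can yield a different shared set under a different clustering.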
Again, the lignin-adapted communities showed a substantial decrease in microbial diversity compared to the native compost inoculum. The selective conditions were completely different from those used for the switchgrass-adapted soil communities, yet both types of selection reduced the number of taxa relative to the respective native community, by roughly 3- to 15-fold.
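The fold reductions can be checked from the richness values reported in the two preceding subsections. The sketch below uses only numbers quoted in the text; because the text does not state which lignin culture corresponds to which richness value, the three lignin-adapted cultures are labeled by their taxon counts rather than by lignin type.

```python
# Richness (number of taxa) of each native community, from the text.
NATIVE = {"soil": 1339, "compost": 391}

# Richness of each adapted community, paired with its native source.
ADAPTED = {
    "switchgrass": ("soil", 84),
    "switchgrass + iron": ("soil", 336),
    "lignin culture (30 taxa)": ("compost", 30),
    "lignin culture (98 taxa)": ("compost", 98),
    "lignin culture (136 taxa)": ("compost", 136),
}


def fold_reductions(native, adapted):
    """Return native richness divided by adapted richness for each culture."""
    return {
        label: native[source] / richness
        for label, (source, richness) in adapted.items()
    }


folds = fold_reductions(NATIVE, ADAPTED)
for label, fold in folds.items():
    print(f"{label}: {fold:.1f}-fold")
```

On these values the reductions run from about 2.9-fold (the lignin culture with 136 taxa) to about 15.9-fold (switchgrass without iron), consistent with the approximate range given above.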