Next-Generation Sequencing of Crown and Rhizome Transcriptome from an Upland, Tetraploid Switchgrass
- Cite this article as:
- Palmer, N.A., Saathoff, A.J., Kim, J. et al. Bioenerg. Res. (2012) 5: 649. doi:10.1007/s12155-011-9171-1
The crown and rhizome transcriptome of an upland tetraploid switchgrass, cv Summer, which is well adapted to the upper Midwest, was investigated using the Roche 454-FLX pyrosequencing platform. Overall, approximately one million reads consisting of 216 million bases were assembled into 27,687 contigs and 43,094 singletons. Analyses of these sequences revealed minor contamination with non-plant sequences (< 0.5%), indicating that a majority were for transcripts coded by the switchgrass genome. Blast2GO comparisons resulted in the annotation of ~65% of the contig sequences and ~40% of the singleton sequences. Contig sequences were mostly homologous to other plant sequences, dominated by matches to the Sorghum bicolor genome. Singleton sequences, while displaying significant matches to S. bicolor, also contained sequences matching non-plant species. Comparisons of the 454 dataset to existing EST collections resulted in the identification of 30,177 new sequences. These new sequences coded for a number of different proteins, and a selective analysis of two categories, namely peroxidases and transcription factors, resulted in the identification of specific peroxidases and a number of low-abundance transcription factors expected to be involved in chromatin remodeling. KEGG maps for glycolysis and sugar metabolism showed high levels of transcripts coding for enzymes involved in primary metabolism. The assembly provided significant insights into the status of these tissues and broadly indicated that there was active metabolism taking place in the crown and rhizomes at post-anthesis, the seed maturation stage of plant development.
Keywords: 454 FLX pyrosequencing; Bioenergy; Crowns and rhizomes; Switchgrass; Transcriptome; Transcription factors
Increasingly, there is interest in the use of switchgrass as a feedstock for biofuels because it can be effectively grown on marginal croplands [1, 2]. To fulfill anticipated biomass demand, agronomic properties, particularly biomass yield, yield stability, and the quality of lignocellulosic materials, need to be improved by the year 2030 to meet a national goal of replacing 30% of petroleum gasoline with liquid fuels derived from renewables. In the upper Midwest, accomplishing these goals will also require sustaining high productivity despite potential cold weather-related losses in stand (plants per square meter) over time, which could both reduce yields and increase production costs through the need to replant fields.
Sustainable production of switchgrass for biofuels in the upper Midwest will require cultivars that withstand large fluctuations in temperature and rainfall. At least two different factors are believed to contribute to switchgrass production under these conditions. The first is the overall health of the below-ground components of the plant. Depending on the genotype, switchgrass produces either short or long rhizomes. Each spring, new tillers arise from rhizomes, crowns, and axillary buds present on stem bases. There is significant genetic variation for new tiller production and for the proportion of tiller initials derived from different sources. Thus, breeding efforts will have to capitalize on this diversity to produce cultivars with optimal biological efficiency for tiller meristem initiation and growth. In addition to tillering, a second factor that is related to winter hardiness is lignin. Selection of forages for increased dry matter digestibility (e.g., for animal feed) is accompanied by lowering lignin in plant tissues, but plants bred for lowered lignin have also displayed a loss in agricultural fitness in some genetic backgrounds. Unfortunately, little is known about the underlying reason for this observation. However, several studies have found that selection for increased digestibility also negatively impacts winter hardiness in some switchgrass populations [5, 6].
To aid breeding and selection, molecular markers that are associated with below-ground tissue health are necessary. While genomic biology provides a systematic means for identifying such markers, the transformational step of establishing a whole genome sequence is difficult to realize in plants such as switchgrass that have polyploid genomes and are likely to contain large families of dispersed repetitive DNA elements. To circumvent this problem, transcriptomes of these plants are generally evaluated by de novo sequencing of cDNA to provide a fundamental overview of the coding capacity of their genomes (for example, [8–14]). ESTs from sequencing of switchgrass tissues, including young crowns and roots, have been produced and made publicly available [11, 15, 16]. However, these ESTs suffer from the limitations of being produced from traditional clone-based libraries, and they are not from crowns and rhizomes of field-grown plants, in particular not from a cultivar well adapted to the Upper Midwest of the USA. To characterize the transcriptome of plants relevant to the Upper Midwest more systematically, we have capitalized on the capacity of next-generation sequencing technologies, which can provide a more comprehensive overview of the transcriptome. In addition to capacity, the availability of longer reads (250–500 bases) from the Roche-454 FLX titanium platform allows a relatively accurate assembly of data into contigs, permitting better overall annotation and data mining.
Here we have analyzed the transcriptome of crowns and rhizomes obtained from field-grown switchgrass cv Summer plants. This cultivar is an upland tetraploid with good winter hardiness and has been used to create hybrids which show heterosis for yields [18, 19].
Materials and Methods
Stands of switchgrass cv Summer had been established in the field near Mead, NE, USA, for several years. Above-ground portions of the plants were cut, and below-ground portions were then harvested in late August 2009 at the post-anthesis, seed maturation stage of development, using a lever-action hole cutter for golf greens. Four soil plugs containing crown, roots, and rhizomes were placed in plastic bags and kept on ice until cleaned. Soil plugs were cleaned by hand within 1 h of harvest. Adherent soil was removed using toothbrushes. Crowns and rhizomes were trimmed to remove much of the roots and tiller buds and immediately flash-frozen in liquid nitrogen. Flash-frozen tissues were placed on dry ice for transport to the laboratory and stored at −80°C until use. Crowns and rhizomes were fine-milled either by hand or using a cryogenic grinder (6870 Freezer Mill; Spex Sample Prep, Metuchen, NJ, USA). Pulverized plant material was used to extract RNA.
RNA Extraction and cDNA Library Generation
Total RNA was extracted from all switchgrass tissues using the modified Trizol (Invitrogen, Carlsbad, CA, USA) protocol of Tobias et al. In short, total RNA was extracted from 16–20 aliquots of switchgrass tissue, each of 100 mg. During extraction, the RNA from two 100-mg aliquots was combined for resuspension in 50 μl of RNase-free water with RNaseOUT ribonuclease inhibitor added. The pellets from both aliquots were resuspended sequentially in the same 50 μl of water with heating at 60°C for 5 min each. Any undissolved pellet material was discarded. From these samples, mRNA was isolated using the FastTrack MAG Maxi isolation kit and 100 μl of magnetic beads as directed (Invitrogen, Carlsbad, CA, USA). The mRNA was quantitated using a Nanodrop spectrophotometer (Thermo Fisher, Waltham, MA). Synthesis of cDNA was performed using the high-yield protocol of the QuantiTect Whole Transcriptome kit (Qiagen, Valencia, CA, USA) with 100 ng of mRNA as the starting material. The cDNA was purified from the reaction mixture using the QIAamp DNA Blood Mini Kit (Qiagen, Valencia, CA, USA). Clean-up was achieved using the supplementary protocol for the purification of REPLI-g amplified DNA. The cDNA was again quantitated using the Nanodrop and adjusted to a concentration of 400–500 ng/μl (total cDNA of 40–50 μg provided for 454 sequencing).
Switchgrass cv Summer crown and rhizome cDNA was fractionated and sequenced using a 454 GS-FLX sequencer with titanium chemistry according to the manufacturer’s instructions (Roche, IN, USA) at the Core for Applied Genomics and Ecology, The University of Nebraska—Lincoln, Lincoln, NE, USA.
Table: Assembly of 454 data. Recovered row labels: Inferred read error; Isogroups with one contig.
Bioinformatics and De Novo Transcriptome Assembly
454 GS FLX titanium sequence data were assembled using Roche’s GS De Novo Assembler (gsAssembler) software, version 2.3. The cDNA option was used since the sequence data source was mRNA. The default assembly parameters were used to assemble all three half-plates in a single assembly, and the software automatically excluded reads <50 bp. The assembly output consisted of a series of 27,687 contigs, all of which were greater than 100 bp in length. These contigs were used for downstream analysis. In addition to the contig sequences, individual sequencing reads that had no significant overlap with any other read were classified as “singletons” by Roche’s software and not included in the assembly output. These singleton reads were separated from the initial data set and all of these reads greater than 250 bp in length were also used in downstream analysis. Although we have used contig coverage as an approximation of transcript abundance, the actual relationship between these two parameters has not been quantified.
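The singleton selection described above (keeping only singleton reads longer than 250 bp for downstream analysis) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the singletons are available as FASTA text, and the names `parse_fasta`, `select_singletons`, and `MIN_SINGLETON_LEN` are hypothetical.

```python
from typing import Dict

# Cutoff quoted in the text: singletons had to be greater than 250 bp.
# The constant and function names are illustrative, not from the paper.
MIN_SINGLETON_LEN = 250


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {sequence_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[current_id] += line
    return records


def select_singletons(records: Dict[str, str],
                      min_len: int = MIN_SINGLETON_LEN) -> Dict[str, str]:
    """Keep reads strictly longer than min_len bases."""
    return {sid: seq for sid, seq in records.items() if len(seq) > min_len}
```

With a FASTA file read into a string, `select_singletons(parse_fasta(text))` returns only the reads that pass the length cutoff. The strict inequality follows the wording "greater than 250 bp".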
The trimmed component yielded a total of 929,820 reads containing over 216 million bases. These reads were assembled into 27,687 contigs of 100 base pairs or larger with a total assembly length of 12.9 million bases. A total of 641,443 (69%) reads of the original 929,982 were included in this assembly with an inferred read error of 2.23%. This error term was generated by the Newbler assembler and was defined as: number of read alignment differences/number of mapped bases. About 18% (170,312) of the reads failed to assemble into contigs using the Newbler 2.3 program and were categorized as singletons. The other aligned reads could be placed into 12,548 isogroups (gene models) and 27,687 contigs (Table 1). The average contig length was 722 bp and the median was 568 bp.
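The percentages and the error term quoted above follow from simple ratios. The sketch below recomputes them from the read counts given in the text and expresses the stated definition of inferred read error (read alignment differences divided by mapped bases) as a function. The function names and the toy contig lengths are assumptions made for illustration; the per-contig lengths from the actual assembly are not reproduced here.

```python
from statistics import mean, median

# Read counts quoted in the text.
TRIMMED_READS = 929_820
ASSEMBLED_READS = 641_443
SINGLETON_READS = 170_312


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def inferred_read_error(alignment_differences: int, mapped_bases: int) -> float:
    """Error term as defined in the text: differences / mapped bases."""
    if mapped_bases <= 0:
        raise ValueError("mapped_bases must be positive")
    return alignment_differences / mapped_bases


def contig_length_summary(lengths):
    """Return (mean, median) of a list of contig lengths in bp."""
    return mean(lengths), median(lengths)


assembled_pct = percent(ASSEMBLED_READS, TRIMMED_READS)   # about 69%
singleton_pct = percent(SINGLETON_READS, TRIMMED_READS)   # about 18%

# Toy lengths only, to show the call; not data from the assembly.
toy_mean, toy_median = contig_length_summary([100, 568, 1498])
```

A mean contig length above the median, as reported for this assembly, indicates a length distribution skewed toward a smaller number of long contigs.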
To assess potential contamination in this assembly, switchgrass contig and singleton sequences were first compared to proteins present in diverse taxonomic groups of organisms contained in the Refseq databases (NCBI) using the blastx algorithm at an e-value threshold of 1 × 10⁻⁷. These analyses showed that the contig and singleton sequences displayed a match of 0.02% and 0.05%, respectively, to microbial sequences, 0.08% and 0.70% to fungal proteins, and 0.01% and 2.91% to invertebrate sequences within the Refseq collections. These data indicated that most of the assembled sequences were from switchgrass tissues.
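A contamination screen of this kind reduces to counting how many query sequences have at least one hit at or below the e-value threshold in each taxonomic database. The sketch below assumes the search results were saved in the standard 12-column BLAST+ tabular layout (`-outfmt 6`, with the e-value in the eleventh column); the source does not state which output format was used, and the function names are hypothetical.

```python
import csv
import io
from typing import Set

# Threshold quoted in the text for the blastx contamination screen.
EVALUE_CUTOFF = 1e-7


def queries_with_hits(tabular_text: str,
                      cutoff: float = EVALUE_CUTOFF) -> Set[str]:
    """Return query IDs with at least one hit at or below the cutoff.

    Assumes tab-separated BLAST tabular output where column 1 is the
    query ID and column 11 is the e-value.
    """
    hits: Set[str] = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) <= cutoff:
            hits.add(row[0])
    return hits


def percent_matching(hit_ids: Set[str], total_queries: int) -> float:
    """Percentage of all query sequences that had a qualifying hit."""
    return 100.0 * len(hit_ids) / total_queries
```

Running `queries_with_hits` once per taxonomic database (for example, fungal or invertebrate proteins) and passing the result to `percent_matching` with the total number of contigs or singletons gives percentages comparable to those listed above. Whether the threshold was applied as "less than" or "less than or equal to" is not stated, so the inclusive comparison here is an assumption.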
The contig sequences that did not display a blastx similarity were next analyzed by the blastn algorithm against the NCBI ALL_EST database (Fig. 2b). Analysis of these 8,174 contigs indicated that approximately 87% of the remaining contig sequences matched other sequences at an e-value threshold of 1 × 10⁻³, and almost 73% of the remaining contig sequences had an EST match with an e-value of less than 1 × 10⁻³⁰.
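Reporting the share of sequences matched at two different stringencies amounts to taking the best (lowest) e-value per query and counting how many queries pass each cutoff. The following is a minimal sketch under that assumption; the helper names are hypothetical, and the hits are supplied as plain `(query_id, evalue)` pairs rather than read from the authors' files.

```python
from typing import Dict, Iterable, Tuple


def best_evalues(hits: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Map each query ID to its lowest (most significant) e-value."""
    best: Dict[str, float] = {}
    for query_id, evalue in hits:
        if query_id not in best or evalue < best[query_id]:
            best[query_id] = evalue
    return best


def fraction_at_cutoffs(best: Dict[str, float],
                        total_queries: int,
                        cutoffs: Iterable[float]) -> Dict[float, float]:
    """Fraction of all queries whose best e-value is at or below each cutoff."""
    return {
        cutoff: sum(1 for e in best.values() if e <= cutoff) / total_queries
        for cutoff in cutoffs
    }
```

For the analysis above, `total_queries` would be the 8,174 contigs without a blastx match and the cutoffs would be `1e-3` and `1e-30`. Because a stricter cutoff can only retain a subset of the queries that pass a looser one, the fraction at `1e-30` can never exceed the fraction at `1e-3`, which is consistent with the 73% and 87% reported.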
Table: Blastn of contigs and singletons with selected plant transcriptomes. Recovered row labels: No blast hits; Panicum virgatum (EST).
Table: GO annotation of contig sequences at different stringencies. Recovered column heading: Number of sequences with EC. Recovered e-value stringencies: 1 × 10⁻³; 1 × 10⁻¹⁰; 1 × 10⁻¹⁵; 1 × 10⁻²⁵; 1 × 10⁻⁵⁰.
Table: Peroxidases and related proteins identified in cv Summer crowns and rhizome transcripts (identified within the “new” group with significant matches to heme-containing oxidoreductases). Recovered column headings: C or S; Predicted ortholog and group. Recovered entries, in source order:
Os02g0237000, group IV.4; Class III peroxidase
PviPrx-19, monocot-specific group V; Class III peroxidase
Os09g0471100, group IV; Class III peroxidase
Rice peroxidase 124, group III; Class III peroxidase
Os03g0762400, group VI; Fatty acid dioxygenase
Table: Switchgrass transcription factor orthologs identified within cv Summer crowns and rhizome transcripts (only “new” switchgrass transcripts with orthologs of known function are shown; orthologs defined as best-hit by BLASTP to translated switchgrass sequences). Recovered column headings: C or S; Function of orthologs. Recovered entries, in source order: AP-2 domain containing; RAV 1 and 2; Interactions with CONSTANS and FT; Antagonist of e2f; Represses E2F activation of genes; Auxin and gibberellin signaling; Involved in rooting/root hairs; Female flower development; basic leucine zipper; Demarcates boundary of leaf and sheath; Abscisic acid signaling; Embryo/fatty acid biosynthesis; Interactions with SWI/SNF complex; Control of cell cycle; Control of leaf spots; Leaf margin organization; Histone H3K4 demethylation; Histone H3K27 demethylation; Male germline formation; Myb r2r3 family; Suppressor of gamma response 1; Protection under stress; ABA, UV responses biotic and abiotic stresses.
Perenniality in switchgrass is likely to be controlled by several mechanisms that ultimately impact the physiological status of the below-ground components of the plant. These below-ground structures include the roots, rhizomes, and crowns. It can be anticipated that over a growing cycle there will be significant, but cyclical, changes in the physiology of these organs underpinned by significant changes in the transcriptome. These changes will include nutrient remobilization and regeneration of new shoots (tillers) at the onset of green-up in early spring, increased tissue accretion over the growing season, a transition to slower developmental processes in the fall, and a potentially quiescent stage in the winter. At present, we lack meaningful molecular insights into these processes.
As a first step in understanding how gene expression patterns change in the below-ground tissues of switchgrass plants over a growing season, we have assembled a preliminary transcriptome using over 900,000 sequences obtained by next-generation sequencing of tissues obtained from cv Summer. Tissues were obtained from plants at the S4 stage (seeds at physiological maturity). Contamination within the assembled sequences from other organisms was quite low, indicating that a majority of these sequences were derived from switchgrass tissues. However, some level of contamination from insect, fungal, and bacterial sources can be expected in field-harvested tissues. As might be expected, the level of non-switchgrass transcripts was greater in the singleton pool than in the assembled contigs. A majority of the contigs (~67%) had a GO term assigned, consistent with studies in several other non-model species lacking annotated genomes [12, 14, 27, 28] (Table S2 of “Electronic supplementary material”). The numbers of singletons reported in other studies have been variable, and success in finding GO terms has depended on whether these sequences were analyzed separately or combined with contigs. The switchgrass singletons reported here were analyzed as a separate pool to maximize the discovery of new (and potentially rare) transcripts and to understand the relative distribution of similarities to other organisms. A combination of Blast2GO and blastx searches identified ~87% of the singleton sequences as sharing significant similarity to other plants. Nonetheless, actual GO annotation in this singleton pool (~40%) was lower than in the contig pool, indicating that deeper coverage might have improved discovery.
A sizeable fraction (> 40%) of the 454-derived sequences were not present in earlier EST collections, suggesting that these “new” transcripts not currently in the databases could contain some proportion of rhizome- and crown-specific sequences. However, the unequal distribution of sequences across all libraries (ESTs and 454; see Fig. 5) might have skewed these comparative analyses, and “new” sequences could contain low-abundance and rare transcripts that might have escaped detection during Sanger sequencing of the different switchgrass tissues. The relatively large number of overlapping sequences present in all the libraries can be expected to contain transcripts coding for many metabolic processes common to all switchgrass tissues.
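The distinction between "new" and previously known sequences can be thought of as a set operation: a 454-derived sequence counts as new when it has no qualifying match in the existing EST collections. The sketch below illustrates that idea only. The matching criteria behind the comparison are not described in this section, so the input `ids_with_est_hit` and the function name are assumptions.

```python
from typing import Iterable, List, Tuple


def split_new_and_known(all_ids: Iterable[str],
                        ids_with_est_hit: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split sequence IDs into (new, known) by presence of an EST match.

    ids_with_est_hit is assumed to come from a prior similarity search
    against existing EST collections; IDs not in all_ids are ignored.
    """
    all_set = set(all_ids)
    known = all_set & set(ids_with_est_hit)
    new = all_set - known
    return sorted(new), sorted(known)


def percent_new(new_count: int, total: int) -> float:
    """Share of sequences classified as new, as a percentage."""
    return 100.0 * new_count / total
```

Because the outcome depends entirely on which sequences are present in the reference EST collections, a classification of this kind is sensitive to uneven library sizes, which is the caveat raised in the paragraph above.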
New sequences generally occupied a similar distribution within the biological processes and molecular function GO terms compared with the overall 454 assembly, although certain categories broadly classified as “binding” and “hydrolytic/transferase activities” within GO were somewhat overrepresented. These data indicated that singleton sequences could code for specific but rare proteins, such as transcription factors, that are needed for normal functioning of crowns and rhizomes. Indeed, a manual annotation of two sets of sequences classified as coding for peroxidases and transcription factors showed the utility of such a detailed analysis. Five new peroxidases, a cytosolic ascorbate peroxidase, and a thioredoxin peroxidase were identified. A bacterial-induced peroxidase homolog present in the tissues analyzed indicated that rhizomes and crowns could have been under biotic stress. Although care was exercised to process field-harvested tissues as quickly as possible (within 1 h of harvest), some elevation of transcripts associated with stress and/or wounding might be expected.
Singleton sequences also coded for a number of transcription factors, which could be expected to be of lower abundance in the transcriptome. Although they belonged to diverse families, several factors that control chromatin remodeling were identified. Among these, the switchgrass orthologs of the Jumonji-type ELF6, REF6, and SWI3D could provide a future means to understand chromatin remodeling that might occur in response to the seasonal growth habits of the switchgrass plants. Additionally, many of these transcription factors control the expression of genes involved in basal cell metabolism and responses to biotic and abiotic stress. Transcript abundances for genes coding for metabolic pathways were variable, although abundances for those involved in primary metabolism were quite high. The assembly provided significant insights into the status of these tissues and broadly indicated that there was active metabolism taking place in the crown and rhizomes at this stage of plant development. Future next-generation analyses of crown, rhizome, and root transcriptomes across the growing season, among switchgrass populations with divergent winter hardiness responses, should yield even greater insights into tissues that impact perenniality and are essential to the sustainable production of this important bioenergy feedstock.
We thank Steve Masterson and Patrick Callahan for excellent technical support. This work was supported by the Office of Science (BER), US Department of Energy Grant Number DE-AI02-09ER64829, and USDA-ARS CRIS project 5440-21000-028-00D. The US Department of Agriculture, Agricultural Research Service, is an equal opportunity/affirmative action employer and all agency services are available without discrimination. Mention of commercial products and organizations in this manuscript is solely to provide specific information. It does not constitute endorsement by USDA-ARS over other products and organizations not mentioned.