Transcriptome characterization and detection of gene expression differences in aspen (Populus tremuloides)
- Cite this article as:
- Rai, H.S., Mock, K.E., Richardson, B.A. et al. Tree Genetics & Genomes (2013) 9: 1031. doi:10.1007/s11295-013-0615-y
Aspen (Populus tremuloides) is a temperate North American tree species with a geographical distribution more extensive than any other tree species on the continent. Because it is economically important for pulp and paper industries and ecologically important for its role as a foundation species in forest ecosystems, the decline of aspen in large portions of its range is of serious concern. The availability and annotation of the black cottonwood (Populus trichocarpa) genome enables a range of high throughput sequencing approaches that can be used to understand rangewide patterns of genetic variation, adaptation, and responses to environmental challenges in other Populus species, including aspen. Gene expression studies are particularly useful for understanding the molecular basis of ecological responses, but are limited by the availability of transcriptome data. We explored the aspen transcriptome through the use of high-throughput sequencing with two main goals: (1) characterization of the expressed portion of the P. tremuloides genome in leaves and (2) assessment of variation in gene expression among genets collected from distinct latitudes but reared in a common garden. We also report a large single nucleotide polymorphism dataset that provides the groundwork for future studies of aspen evolution and ecology, and we identify a set of differentially expressed genes across individuals and population boundaries for the leaf transcriptome of P. tremuloides.
Keywords: Trembling aspen; Quaking aspen; RNA-Seq; Differential expression; SNP (single nucleotide polymorphism); Populus trichocarpa
The poplars (Populus L., Salicaceae) are widely distributed northern hemisphere trees that comprise 22–85 species, usually in six sections, depending on taxonomic scheme (Eckenwalder 1977; Hamzeh and Dayanandan 2004). Here, we have sequenced the leaf tissue transcriptome of Populus tremuloides Michx. (trembling aspen), a close relative of the fully sequenced model organism black cottonwood, Populus trichocarpa Torr. & Gray. P. tremuloides has the broadest distribution, in terms of both latitude and longitude, of any North American tree species (Jones 1985) and has been proposed as an ideal study system for both adaptation and responses to climate change (Hogg and Hurdle 1995; Isebrands et al. 2001). Its distribution across many climatic zones suggests that P. tremuloides is highly plastic, highly adaptable, or both (Mitton and Grant 1996). P. tremuloides has been shown to have extremely high levels of genetic variation at the population level, both in terms of trait heritability and molecular marker diversity (Cole 2005; Liu and Furnier 1993). Aspen is economically important for pulp and paper industries and ecologically important for its role as a foundation species in forest ecosystems (Mitton and Grant 1996). Dramatic mortality in certain portions of the species’ range has recently been documented, raising concerns about the ability of P. tremuloides to respond to a changing climate (Frey et al. 2004; Iverson and Prasad 2002; Rehfeldt et al. 2009; Worrall et al. 2008; Hogg et al. 2008). A pervasive loss of this tree species at a continental scale would have significant ecological and economic impact and could result in dramatic carbon fluxes (Kurz and Apps 1999; Hogg and Hurdle 1995).
The availability of a high-quality genomic reference in the genus, P. trichocarpa (Tuskan et al. 2006), along with dramatic advances in sequencing technologies (reviewed by Fang and Cui 2011; Metzker 2009), provides new opportunities to understand the evolutionary history of Populus species and to characterize molecular responses to environmental challenges (e.g., Jiang et al. 2012; Qiu et al. 2011). In P. tremuloides, these technologies enable studies at the molecular level to examine how a single species can achieve such an enormous ecological amplitude (e.g., through plasticity, adaptation, migration) and what factors limit its persistence in changing climates. Answers to these questions are critical for effective management of natural populations as well as for silvicultural improvement aimed at specific applications (e.g., biofuels, carbon sequestration).
Here, we present an initial characterization of the P. tremuloides transcriptome. The high-throughput sequencing of cDNA libraries (RNA-Seq) enables the direct study of the expressed portion of the genome in sampled tissues (Marioni et al. 2008; Nagalakshmi et al. 2008; Perkins et al. 2009; Wang et al. 2009). Compared to array-based technologies, RNA-Seq has the advantage of higher sensitivity and the ability to capture a significantly larger component of gene expression (Severin et al. 2010), allowing for the assay of tens of thousands of transcripts, or even the entire transcriptome (e.g., Bajgain et al. 2011; Bräutigam et al. 2011; Coppe et al. 2010; Geraldes et al. 2011; Nagalakshmi et al. 2008; Severin et al. 2010; Wang et al. 2009). A distinct advantage of RNA-Seq is that the relative abundance of sequencing reads, aligned with or without the existence of a reference genome, can be used as an estimate of transcript abundance, which can be compared among individuals or under different environmental conditions (Bashir et al. 2010; Wang et al. 2009).
Variation within the transcriptome of other Populus species has been investigated for responses to various sources of stress using microarrays (Grisel et al. 2010; Hamanishi et al. 2010), for comparative studies (using real-time PCR and microarray analysis: Quesada et al. 2008), and even as a tool for understanding ecosystem interactions (using RNA-Seq data: Larsen et al. 2011). However, little attention has been paid to the descriptive, functional, or comparative genomics of P. tremuloides. Here, we present an overview of the aspen transcriptome, focusing on expression in leaf tissues using an RNA-Seq approach. Our major objectives are (1) to provide a snapshot of the expressed portion of the aspen genome and (2) to characterize variation present within and among aspen from two latitudinally distinct regions.
Within the transcribed regions reported here, we characterize sequence variation [single nucleotide polymorphisms (SNPs)] present in the expressed regions of the genome of aspen within and among individual genotypes. Gene expression patterns vary during development, between tissue types, and with external stimuli, and play a key role in the functioning of all organisms. We also highlight novel transcribed regions of the genome relative to the transcript assemblies of black cottonwood (P. trichocarpa v2; Phytozome v8.0). Data of this type will be useful in future studies of aspen, including the discovery and identification of novel genes, identification of genetic variants useful for mapping and association studies, assessment of plasticity and adaptive mechanisms, and identification of genetic markers for quantitative trait loci (QTL).
Materials and methods
Plant material and RNA isolation
Geographic sources of root cuttings used for greenhouse common garden propagation
Location                Latitude           Longitude
Trout Creek, MT, USA    47°52′2.454″ N     115°36′42.078″ W
Trout Creek, MT, USA    47°53′24.630″ N    115°37′33.050″ W
Trout Creek, MT, USA    47°53′28.169″ N    115°37′14.896″ W
Mt. Lemon, AZ, USA      32°25′7.368″ N     110°43′55.673″ W
Mt. Lemon, AZ, USA      32°27′7.721″ N     110°46′59.474″ W
cDNA library preparation
Sera-Mag Oligo(dT) beads (Thermo Scientific, Fremont, CA, USA) were used to enrich poly-adenylated RNA by hybridization, following the manufacturer’s protocol, and the enriched RNA was resuspended in 50 μl of 10 mM Tris–HCl. Library preparation followed the Illumina GAII RNA-Seq protocol (San Diego, CA; part no. 1004898 Rev. A), except that custom barcoded adapters were used during ligation steps (Cronn et al. 2008). We used PCR to enrich libraries for fragments containing adapters ligated in the correct orientation. Each enzymatic step was followed by a cleaning step using AMPure XP beads (Agencourt, Danvers, MA, USA). Libraries were quantified using a Qubit fluorometer and their quality validated using an Agilent 2100 Bioanalyzer (Santa Clara, CA, USA). The five libraries were pooled in equimolar quantities for a 5 pM final concentration and run on three separate lanes of the Illumina sequencer (two of these lanes were part of the same flow cell).
Summary of reads generated from Illumina sequencing for five Populus tremuloides individuals from two populations in Western North America
We trimmed adapter sequences and binned single-end reads by barcode (deconvolution) and aligned them to the P. trichocarpa v2 transcript assemblies [45,033 protein-coding transcripts (file name: Ptrichocarpa_156_transcript.fa.gz); Phytozome v8.0] using CLC Genomics Workbench (v4.9; CLC Bio, Cambridge, MA, USA). We mapped to transcript assemblies rather than to the genome to avoid problems with gapped alignments at splice junctions. The maximum gap and mismatch count were set to 2, and insertion and deletion costs were set to 3, with a minimum contig length of 200 bp. Length fraction and similarity parameters were set to 0.5 and 0.8, respectively.
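The length fraction and similarity settings can be read as a two-part acceptance rule for each read alignment. The following is a minimal sketch of that rule, assuming that length fraction means the share of the read that must be aligned and that similarity means the share of aligned bases identical to the reference. The function name and its arguments are hypothetical and are not part of CLC Genomics Workbench.

```python
def passes_mapping_filter(read_length, aligned_length, matching_bases,
                          length_fraction=0.5, similarity=0.8):
    """Return True if an alignment meets both thresholds (illustrative only).

    aligned_length: number of read bases included in the alignment.
    matching_bases: number of aligned bases identical to the reference.
    """
    if aligned_length == 0:
        return False
    # At least `length_fraction` of the read must be aligned.
    long_enough = aligned_length >= length_fraction * read_length
    # At least `similarity` of the aligned bases must match the reference.
    similar_enough = matching_bases / aligned_length >= similarity
    return long_enough and similar_enough


# A 73-bp read with 40 aligned bases, 36 of them identical (40/73 ≈ 0.55, 36/40 = 0.90).
print(passes_mapping_filter(73, 40, 36))
# The same read with only 30 aligned bases falls below the 0.5 length fraction.
print(passes_mapping_filter(73, 30, 30))
```

Under these assumptions, a 73-bp read needs at least 37 aligned bases before the similarity threshold is even considered.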
Gene ontology (GO) annotations were assigned using Blast2GO (Conesa et al. 2005). GO terms were searched and retrieved to match the results from a blastx search against the NCBI nr database (maximum of 10 hits for each contig with an e-value cutoff of 1 × 10⁻¹⁰), and the annotation function of Blast2GO was used to select final GO terms for each contig. Finally, the terms were analyzed through the GO Slim for plants function of Blast2GO to provide a broad overview of the gene product functions contained within the ontology content.
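The hit-selection step (an e-value cutoff and at most 10 hits per contig) can be sketched as a simple filter. This is an illustration only, not the Blast2GO implementation; the tuple layout, function name, and example identifiers are hypothetical.

```python
from collections import defaultdict


def select_hits(hits, evalue_cutoff=1e-10, max_hits=10):
    """Keep the best-scoring hits per contig.

    hits: iterable of (contig_id, subject_id, evalue) tuples (assumed layout).
    Returns a dict mapping each contig to its retained subject IDs,
    ordered from the smallest e-value to the largest.
    """
    by_contig = defaultdict(list)
    for contig_id, subject_id, evalue in hits:
        if evalue <= evalue_cutoff:
            by_contig[contig_id].append((evalue, subject_id))
    return {contig: [subject for _, subject in sorted(pairs)[:max_hits]]
            for contig, pairs in by_contig.items()}


# Hypothetical blastx results for two contigs.
example_hits = [
    ("contig_1", "protein_a", 1e-50),
    ("contig_1", "protein_b", 1e-5),    # fails the cutoff
    ("contig_1", "protein_c", 1e-30),
    ("contig_2", "protein_d", 0.5),     # fails the cutoff
]
print(select_hits(example_hits))
```

Contigs whose hits all fail the cutoff, such as the second contig here, simply do not appear in the result.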
Summary of blast results of the unmapped reads in Populus tremuloides from the reference-guided assembly. A de novo assembly of these reads resulted in 13,544 contigs
Columns: Number of contigs; Percent of total. Row categories include: No significant match
We used CLC Genomics Workbench with our reference-guided assembly (above) to detect SNPs occurring between P. tremuloides and P. trichocarpa as well as SNPs between the two sampled populations of P. tremuloides. Only variants with a minimum coverage of 8× and a minimum variant count (same as minimum allele depth) of 4 were called. Maximum coverage for the SNP search was set at 500× in order to avoid highly repetitive regions of the genome. The resulting search for variable sites was filtered for SNPs specific to aspen and also filtered for SNPs within open reading frames (ORFs) of more than 200 bp.
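The coverage and allele-depth thresholds amount to a per-site filter, sketched below. This is not the CLC Genomics Workbench SNP caller; the function name, the site tuples, and the treatment of both coverage bounds as inclusive are assumptions made for illustration.

```python
def passes_snp_filter(coverage, variant_count,
                      min_coverage=8, min_variant_count=4, max_coverage=500):
    """Return True if a candidate site meets the thresholds described above.

    coverage: total number of reads covering the site.
    variant_count: number of reads carrying the variant allele.
    Both coverage bounds are treated as inclusive (an assumption).
    """
    within_coverage = min_coverage <= coverage <= max_coverage
    return within_coverage and variant_count >= min_variant_count


# Hypothetical candidate sites: (site label, coverage, variant count).
candidates = [
    ("site_1", 66, 30),    # retained
    ("site_2", 6, 5),      # coverage below 8x
    ("site_3", 40, 3),     # too few variant reads
    ("site_4", 1200, 600), # coverage above 500x, possibly repetitive
]
retained = [label for label, cov, var in candidates if passes_snp_filter(cov, var)]
print(retained)
```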
We used our raw reads to explore differential expression among P. tremuloides genets within and among sampling sites representing different latitudes. In order to quantify transcript abundance, the 73-bp reads were mapped to the P. trichocarpa reference using the reference-based aligner Bowtie version 0.12.7 (http://bowtie-bio.sourceforge.net/index.shtml), resulting in SAM format files. Bowtie deals with multimapping reads by assigning them randomly among transcript sites (Langmead et al. 2009). The maximum number of mismatches was set to 2 in the -n alignment mode. The number of reads mapped to each transcript was tallied using a custom Perl script. Unbalanced library sizes were down-sampled using the method implemented in NBPSeq (Di et al. 2011); the smallest library size was identified, and the other libraries were down-sampled to this number of reads, based on a probabilistic step, resulting in approximately equal library sizes. Once libraries were normalized to balanced sizes, comparisons of differential expression between genets were made using Fisher’s exact test (Fisher 1922), which is appropriate for small sample sizes (1 or 2). We corrected for multiple comparisons by setting the false discovery rate to 0.05 (Benjamini and Hochberg 1995). For comparison across the two sampling sites, the dispersion parameter for the negative binomial distribution was estimated for each transcript with the “estimateTagwiseDisp” function of the Bioconductor package edgeR (Robinson et al. 2010), and negative-binomial exact tests were performed for each gene using these dispersion estimates. Results from the statistical analysis of differential expression are presented with MA plots (log ratio versus log abundance) using the plotSmear function in edgeR (Robinson et al. 2010).
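The pairwise comparison between two genets can be illustrated with a small sketch of Fisher’s exact test on a 2 × 2 table (reads from one transcript versus all other reads, in each of two libraries), followed by the Benjamini-Hochberg adjustment. This is not the script used in the study; the function names, transcript labels, counts, and library size are hypothetical, and the hypergeometric probabilities are computed with log-gamma functions so that large library sizes remain tractable.

```python
import math


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    lowest, highest = max(0, row1 + col1 - total), min(row1, col1)

    def log_pmf(k):
        return (log_choose(col1, k) + log_choose(total - col1, row1 - k)
                - log_choose(total, row1))

    observed = log_pmf(a)
    p = sum(math.exp(lp) for lp in map(log_pmf, range(lowest, highest + 1))
            if lp <= observed + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


library_size = 1_000_000  # hypothetical size shared by both down-sampled libraries
counts = {"transcript_A": (120, 60), "transcript_B": (50, 55), "transcript_C": (300, 10)}
pvals = [fisher_exact_two_sided(x, library_size - x, y, library_size - y)
         for x, y in counts.values()]
adjusted = benjamini_hochberg(pvals)
for name, q in zip(counts, adjusted):
    print(name, "differentially expressed" if q <= 0.05 else "not significant")
```

Because each test conditions on the two library totals, balancing library sizes beforehand keeps the comparison from being driven by sequencing depth alone.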
Results and discussion
In this study, we characterized the leaf transcriptome of P. tremuloides, an ecologically important tree species in western North America. Although we lacked biological and experimental replication, our study design included five individual aspen genotypes from two distinct and latitudinally separated sampling sites. This enabled us to make the first examination of variation in gene expression in aspen based on RNA-Seq data. We mapped the trimmed 73-bp sequence reads to the P. trichocarpa reference genome (Tuskan et al. 2006), from which we generated a functional characterization of the transcriptome and a set of putative SNPs. We also characterized the differences in gene expression patterns between sampled regions and among individual genets.
Transcriptome assembly and analysis
We generated 43,742,783 sequence reads from three independent runs of pooled and barcoded RNA-Seq libraries (Table 2). All reads are deposited at the NCBI Sequence Read Archive (accession number SRA057223). Assuming a high degree of sequence and transcript similarity (Cervera et al. 2005; Chen et al. 2010; Chen et al. 2007; Cole 2005; Hamzeh and Dayanandan 2004; Hamzeh et al. 2006; Unneberg et al. 2005), sequences from all P. tremuloides individuals were combined into a single reference-mapped assembly against the 45,033 annotated, protein-coding transcripts of P. trichocarpa. A large proportion of the total sequencing reads, 84 % (∼37M), were successfully mapped to the reference library with an average depth of ∼66×. The remaining 6,813,822 of the ∼44M total reads were not mapped to the reference transcripts in this assembly. We detected 7.8M nonspecific reads, that is, reads mapping to more than one P. trichocarpa transcript. In these cases, we assigned the reads randomly. Thus, we may have slightly overestimated the total coverage and underestimated the sequencing depth.
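The mapped proportion follows directly from the read totals reported above. The short check below uses only those figures; the 7.8M nonspecific read count is the rounded value given in the text, so the share derived from it is approximate.

```python
total_reads = 43_742_783      # all sequence reads generated
unmapped_reads = 6_813_822    # reads not mapped to the reference transcripts
nonspecific_reads = 7_800_000 # rounded figure from the text (approximate)

mapped_reads = total_reads - unmapped_reads
mapped_fraction = mapped_reads / total_reads
nonspecific_share = nonspecific_reads / mapped_reads

print(f"mapped reads: {mapped_reads:,} ({mapped_fraction:.1%} of total)")
print(f"nonspecific share of mapped reads: {nonspecific_share:.1%}")
```

The first line reproduces the ∼37M mapped reads and the 84 % mapping rate stated in the text.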
The above assembly resulted in the pooled P. tremuloides sequencing reads mapping to 38,177 of the 45,033 P. trichocarpa reference transcripts with at least a single read in each contig (of these, 20,906 had an average read depth of 4×). Even when this was examined for each of the five genets individually, a large proportion of the P. trichocarpa transcripts were recovered: MT5, 32,963; MT6, 32,049; MT8, 31,272; AZ8, 33,109; AZ9, 34,007 total transcripts for each genet, respectively. This demonstrates our ability to detect low-abundance transcripts using RNA-Seq, with which we have captured a large proportion of expressed genes in the P. tremuloides leaf transcriptome, and shows that our aspen leaf transcriptome contains a substantial proportion of the total annotated P. trichocarpa transcripts.
The de novo assembly of the unmapped reads resulted in 13,544 assembled contigs >200 bp with an N50 of 493 bp (4,042,046 unmapped reads assembled). An initial search using blastx against the NCBI “nr” database was performed (with an e-value cutoff of 1 × 10⁻¹⁰; Table 3). Approximately 27 % of the de novo assembled contigs had no significant match in the NCBI “nr” database, whereas 28 % had significant homology to at least one angiosperm sequence (25 % of these were reported from the genus Populus). Not surprisingly, almost half (44 %) of the contigs assembled from the unmapped reads showed sequence similarity (based on the above blast criteria) with bacteria, suggesting low-level contamination (fewer than 1 % of all high-throughput sequencing reads).
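For readers unfamiliar with the N50 statistic reported above, it is the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch follows; the function name and the contig lengths are made up for illustration and are not drawn from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the cumulative sum to half the total.
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp (total 2,000 bp, so half is 1,000 bp).
toy_lengths = [200, 300, 400, 500, 600]
print(n50(toy_lengths))  # 600 + 500 = 1,100 >= 1,000, so the N50 is 500
```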
SNP discovery and genotyping
Frequency of SNPs in Populus tremuloides by mutation type
To characterize the variation present among P. tremuloides samples, we analyzed gene expression differences between the RNA-Seq libraries and determined whether there were statistically significant differences in expression levels of transcripts across samples. We made two separate comparisons: one among the five genets and a second between the two geographic regions. All of our RNA-Seq libraries were carefully selected from leaves at the same developmental stage, collected at the same time on the same date from trees grown in a greenhouse common garden.
Pairwise comparison results of the numbers of significantly differentially expressed genes among Populus tremuloides individuals: significance was based on Fisher’s exact test with a false discovery rate of 0.05
Aspen has the potential to be an important study system for responses to climate change and its effects across a landscape that spans a large majority of the North American continent. The characterization of the leaf transcriptome signals an important step in the direction of high-throughput comparative genomics in this ecologically significant tree species. We have identified a large number of SNPs in transcribed genes that should prove useful for species-wide population genetic studies of P. tremuloides, enabling a range of high-throughput sequencing and genotyping methodologies (e.g., microfluidic SNP genotyping across large numbers of individuals). Our study of differential gene expression in aspen is part of a small but growing body of literature that documents differential gene expression between latitudinally separate populations (Geraldes et al. 2011; Kawahara-Miki et al. 2011; Beritognolo et al. 2011; Eckert et al. 2009; Ellison et al. 2011) and provides a first look into the differentiation and gene expression changes within the transcriptome of aspen.
Thanks to Tim Benedict, Mary Lou Fairweather, and E. Pfalzer for sample collections. Thanks to Tara Jennings for laboratory assistance and Chris Sullivan for biocomputing assistance. This research was funded by the USDA Forest Service Western Forest Transcriptome Survey and National Fire Plan (2012-NFP-GSD-1).