Suitability of Illumina deep mRNA sequencing for reliable gene expression profiling in a non-model conifer species (Pseudotsuga menziesii)

Hess, Moritz; Wildhagen, Henning; Ensminger, Ingo

doi:10.1007/s11295-013-0656-2

Original Paper
Published: 21 September 2013

Volume 9, pages 1513–1527, (2013)

Tree Genetics & Genomes

Moritz Hess^1,2,
Henning Wildhagen^1,3 &
Ingo Ensminger^1,4

Abstract

Pseudotsuga menziesii (Douglas-fir) is an ideal model system to study the effect of local adaptation and intraspecific variation in transcriptome responses to the environment. Nonetheless, the lack of genomic resources and standardized microarray platforms for gene expression profiling has limited the testing of hypotheses on transcriptome organization and variation. Only recently has deep mRNA sequencing become a promising alternative to overcome these limitations. However, information on the transcript abundance distribution is needed for unbiased gene expression profiling from mRNA sequencing data. Since this information is not available for adult conifer needle tissue, we inferred the transcript abundance distribution and tested the effect of sequencing depth on the reliable detection and quantification of transcripts from the needle tissue of 50-year-old Douglas-fir trees. We obtained a similar distribution of GO-slim categories in our mRNA-sequencing libraries and in previously published putative unique transcripts (PUTs) for Douglas-fir, which were used as the alignment reference. However, the GO-slim distribution in the Douglas-fir libraries and the Douglas-fir PUTs differed from the GO-slim distributions reported from mRNA deep sequencing libraries obtained from Arabidopsis thaliana leaf tissue. Apparently, several highly abundant PUTs associated with proteins involved in photosynthesis limited the benefits of increased sequencing depth. Simulations and empirical data indicated that a 3-fold increase from 5 to 15 million aligned reads results in about twice the number of PUTs that surpass the threshold of 100 aligned reads that was used for robust transcript quantification.
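The depth comparison described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: it assumes that aligned reads are drawn multinomially from a fixed set of relative PUT abundances, and it uses a hypothetical log-normal abundance distribution with 30,000 PUTs in place of the distribution inferred in the paper. The function name `puts_above_threshold` and all parameter values are illustrative.

```python
import numpy as np


def puts_above_threshold(abundance, depth, threshold=100, seed=0):
    """Sample `depth` aligned reads from relative PUT abundances and
    return how many PUTs collect more than `threshold` reads."""
    rng = np.random.default_rng(seed)
    p = np.asarray(abundance, dtype=float)
    p = p / p.sum()                      # normalize to sampling probabilities
    counts = rng.multinomial(depth, p)   # one simulated sequencing library
    return int((counts > threshold).sum())


# Hypothetical, strongly skewed abundance distribution (an assumption,
# not estimated from the Douglas-fir libraries).
rng = np.random.default_rng(42)
abundance = rng.lognormal(mean=0.0, sigma=2.0, size=30_000)

for depth in (5_000_000, 15_000_000):
    n = puts_above_threshold(abundance, depth)
    print(f"{depth:>10,d} aligned reads: {n} PUTs above 100 reads")
```

How much the count grows between the two depths depends on how skewed the assumed distribution is: the more reads a few very abundant transcripts absorb, the smaller the gain from deeper sequencing.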

Abbreviations

CA: Canada
CV: coefficient of variation
EST: expressed sequence tags
GEO: gene expression omnibus
GO: gene ontology
KDE: kernel density estimate
MAQC: Micro-Array Quality Control
Mreads: million reads
PUT: putative unique transcript
qPCR: quantitative polymerase chain reaction

References

Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, Davis A, Dolinski K, Dwight S, Eppig J et al (2000) Gene ontology: tool for the unification of biology. Nat Genet 25(1):25
Ausin I, Greenberg M, Simanshu D, Hale C, Vashisht A, Simon S, Lee T, Feng S, Española S, Meyers B et al (2012) INVOLVED IN DE NOVO 2-containing complex involved in RNA-directed DNA methylation in Arabidopsis. Proc Natl Acad Sci 109(22):8374–8381
Bullard J, Purdom E, Hansen K, Dudoit S (2010) Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinforma 11(1):94
Cai G, Li H, Lu Y, Huang X, Lee J, Müller P, Ji Y, Liang S (2012) Accuracy of RNA-Seq and its dependence on sequencing depth. BMC Bioinforma 13(Suppl 13):S5
Chang S, Puryear J, Cairney J (1993) A simple and efficient method for isolating RNA from pine trees. Plant Mol Biol Report 11(2):113–116
Clark J, Brooksbank C, Lomax J (2005) It's all go for plant scientists. Plant Physiol 138(3):1268–1279
Daines B, Wang H, Wang L, Li Y, Han Y, Emmert D, Gelbart W, Wang X, Li W, Gibbs R et al (2011) The Drosophila melanogaster transcriptome by paired-end RNA sequencing. Genome Res 21(2):315–324
Draghici S, Khatri P, Eklund A, Szallasi Z (2006) Reliability and reproducibility issues in DNA microarray measurements. Trends Genet 22(2):101
Gan X, Stegle O, Behr J, Steffen J, Drewe P, Hildebrand K, Lyngsoe R, Schultheiss S, Osborne E, Sreedharan V et al (2011) Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature 477(7365):419–423
Götz S, García-Gómez JM, Terol J et al (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res 36:3420–3435
Gugger P, Sugita S, Cavender-Bares J (2010) Phylogeography of Douglas-fir based on mitochondrial and chloroplast DNA sequences: testing hypotheses from the fossil record. Mol Ecol 19(9):1877–1897
Hebenstreit D, Fang M, Gu M, Charoensawan V, van Oudenaarden A, Teichmann S (2011) RNA sequencing reveals two major classes of gene expression levels in metazoan cells. Mol Syst Biol 7(1):497
Hermann R, Lavender D (1999) Douglas-fir planted forests. New For 17(1):53–70
Holliday J, Ralph S, White R, Bohlmann J, Aitken S (2008) Global monitoring of autumn gene expression within and among phenotypically divergent populations of Sitka spruce (Picea sitchensis). New Phytol 178(1):103–122
Howe GT, Yu J, Knaus B et al (2013) A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation. BMC Genomics 14:137
Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9(3):90–95
Łabaj P, Leparc G, Linggi B, Markillie L, Wiley H, Kreil D (2011) Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Bioinformatics 27(13):i383–i391
Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25(14):1754–1760
Li P, Ponnala L, Gandotra N, Wang L, Si Y, Tausta S, Kebrom T, Provart N, Patel R, Myers C et al (2010) The developmental dynamics of the maize leaf transcriptome. Nat Genet 42(12):1060–1067
Liu S, Lin L, Jiang P, Wang D, Xing Y (2011) A comparison of RNA-Seq and high-density exon array for detecting differential gene expression between closely related species. Nucleic Acids Res 39(2):578–588
Lorenz W, Alba R, Yu Y, Bordeaux J, Simões M, Dean J (2011) Microarray analysis and scale-free gene networks identify candidate regulators in drought-stressed roots of Loblolly pine (P. taeda L.). BMC Genomics 12(1):264
Mane S, Evans C, Cooper K, Crasta O, Folkerts O, Hutchison S, Harkins T, Thierry-Mieg D, Thierry-Mieg J, Jensen R (2009) Transcriptome sequencing of the microarray quality control (MAQC) RNA reference samples using next generation sequencing. BMC Genomics 10(1):264
Marioni J, Mason C, Mane S, Stephens M, Gilad Y (2008) RNA-Seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 18(9):1509–1517
McIntyre L, Lopiano K, Morse A, Amin V, Oberg A, Young L, Nuzhdin S (2011) RNA-Seq: technical variability and sampling. BMC Genomics 12(1):293
Mortazavi A, Williams B, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5(7):621–628
Müller T, Ensminger I, Schmid K (2012) A catalogue of putative unique transcripts from Douglas-fir (Pseudotsuga menziesii) based on 454 transcriptome sequencing of genetically diverse, drought stressed seedlings. BMC Genomics 13(1):673
Ning Z, Cox A, Mullikin J (2001) SSAHA: a fast search method for large DNA databases. Genome Res 11(10):1725–1729
Oberg A, Bot B, Grill D, Poland G, Therneau T (2012) Technical and biological variance structure in mRNA-Seq data: life in the real world. BMC Genomics 13(1):304
Raherison E, Rigault P, Caron S, Poulin P, Boyle B, Verta J, Giguère I, Bomal C, Bohlmann J, MacKay J (2012) Transcriptome profiling in conifers and the PiceaGenExpress database show patterns of diversification within gene families and interspecific conservation in vascular gene expression. BMC Genomics 13(1):434
Ramsköld D, Wang E, Burge C, Sandberg R (2009) An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLoS Comput Biol 5(12):e1000598
Raz T, Kapranov P, Lipson D, Letovsky S, Milos P, Thompson J (2011) Protocol dependence of sequencing-based gene expression measurements. PLoS One 6(5):e19287
Rigault P, Boyle B, Lepage P, Cooke J, Bousquet J, MacKay J (2011) A white spruce gene catalog for conifer genome analyses. Plant Physiol 157(1):14–28
Shi L, Reid L, Jones W, Shippy R, Warrington J, Baker S, Collins P, de Longueville F, Kawasaki E, Lee K et al (2006) The microarray quality control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24(9):1151–1161
Smith A, Heisler L, Onge R, Farias-Hesson E, Wallace I, Bodeau J, Harris A, Perry K, Giaever G, Pourmand N et al (2010) Highly-multiplexed barcode sequencing: an efficient method for parallel analysis of pooled samples. Nucleic Acids Res 38(13):e142
Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A (2011) Differential expression in RNA-Seq: a matter of depth. Genome Res 21(12):2213–2223
R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, pp 1–1731
Toung J, Morley M, Li M, Cheung V (2011) RNA-sequence analysis of human B-cells. Genome Res 21(6):991–998

Acknowledgments

This project is part of the collaborative project "DougAdapt" with funding from the Deutsche Forschungsgemeinschaft to IE (DFG-project EN 829/4-1). The authors are grateful to Anita Kleiber and Anna-Maria Weisser for technical assistance with RNA extraction. The authors also thank Wolfgang Hess for valuable comments and discussion.

Conflict of interests

The authors declare that they have no competing interests.

Ethical standards

All experiments comply with the current laws of the Federal Republic of Germany.

Data archiving statement

All sequence data have been submitted to the NCBI Sequence Read Archive (SRA, www.ncbi.nlm.nih.gov/sra). Accession numbers are SRR908308 (COA1), SRR908309 (COA2), SRR868709 (INT1) and SRR908307 (INT2). The accession number of the study is SRP026170.

Author information

Henning Wildhagen
Present address: Department of Forest Botany and Tree Physiology, Büsgen-Institute, Georg-August-University Göttingen, Büsgenweg 2, 37077, Göttingen, Germany

Authors and Affiliations

Forest Research Institute of Baden-Württemberg (FVA), Wonnhaldestrasse 4, 79100, Freiburg, Germany
Moritz Hess, Henning Wildhagen & Ingo Ensminger
Institute of Biology III, Faculty of Biology, Albert Ludwigs University Freiburg, Schänzlestrasse 1, 79104, Freiburg, Germany
Moritz Hess
Department of Biology, University of Toronto Mississauga, 3359 Mississauga Road North, Mississauga, Ontario, Canada, L5L 1C6
Ingo Ensminger

Corresponding author

Correspondence to Ingo Ensminger.

Additional information

Communicated by J. Wegrzyn

Electronic supplementary material

Below is the link to the electronic supplementary material.

Online Resource 1

Distribution of GO-slim categories in the namespace "Biological Process" within the 100 most abundant PUTs. Distribution of GO-slim categories within the 100 most highly expressed PUTs in the deep sequencing libraries COA1, COA2, INT1 and INT2 that were detected when using the Müller or the Howe PUT set as alignment reference. Annotation and GO-slimming are described in the Methods section (XLS 14 kb)

Online Resource 2

Annotation statistics for the Howe PUT set. Summary of the numbers of detected PUTs and of detected PUTs with functional annotation. Functionally annotated PUTs have a hit in the Arabidopsis thaliana peptide database (ARA) or in the NCBI Plant RefSeq peptide database. "Unique annotations" are the set of all unique hits in the A. thaliana or the NCBI Plant RefSeq peptide database. GO annotations refer to the PUTs with GO-slim annotation (details of functional and GO-slim annotation are given in the Methods section). Blast2GO annotations refer to annotations inferred by Blast2GO. Relative numbers with respect to the PUT set are shown in parentheses, relative numbers with respect to the deep sequencing libraries are shown in square brackets (XLS 11 kb)

Online Resource 3

Distribution of GO-slim terms (Howe PUT set). The relative abundance of functional categories, represented by plant GO-slims in the four libraries and the Douglas-fir PUT set, compared with the relative abundance detected in deep mRNA-sequencing data generated from Arabidopsis thaliana whole seedlings (NCBI GEO [GSM762070]) and leaves (NCBI GEO [GSM881683]). The functional annotation of Douglas-fir PUTs was obtained by aligning the PUTs to the NCBI Plant RefSeq peptide database and feeding the alignment to the Blast2GO pipeline (for details, see Methods section). The distribution of the deviation of GO-slim abundances relative to the Howe PUT set or the A. thaliana samples in the namespace "molecular function" is shown as smoothed kernel density estimates (KDE) (a). The relative abundance of a GO-slim category in one of the four Douglas-fir libraries or the Douglas-fir PUT set is normalized by the relative abundance of this GO-slim category in an A. thaliana full seedling (b) or leaf (c) deep mRNA-sequencing library. A value of 0 plotted on the y-axis implies an equal distribution of GO-slim terms in the Douglas-fir libraries compared to A. thaliana deep mRNA sequencing libraries or the Douglas-fir PUT set (PDF 1702 kb)
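The normalization described for Online Resource 3 can be sketched as follows. The caption does not state the exact transformation, so this example assumes a log2 ratio of relative abundances, which gives 0 whenever a GO-slim category has the same share in the library and in the reference. The function name `log2_relative_abundance` and the counts are hypothetical and are not taken from the supplementary data.

```python
import math


def log2_relative_abundance(library_counts, reference_counts):
    """Return log2(share in library / share in reference) for every
    GO-slim category that has a non-zero count in both inputs."""
    lib_total = sum(library_counts.values())
    ref_total = sum(reference_counts.values())
    ratios = {}
    for category, lib_count in library_counts.items():
        ref_count = reference_counts.get(category, 0)
        if lib_count == 0 or ref_count == 0:
            continue  # log ratio undefined, category is skipped
        lib_share = lib_count / lib_total
        ref_share = ref_count / ref_total
        ratios[category] = math.log2(lib_share / ref_share)
    return ratios


# Hypothetical annotation counts per GO-slim category.
library = {"binding": 2, "catalytic activity": 2}
reference = {"binding": 1, "catalytic activity": 3}
print(log2_relative_abundance(library, reference))
```

Under this assumption, positive values mark categories that are over-represented in the Douglas-fir library relative to the reference, and negative values mark under-represented ones.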

Online Resource 4

Impact of sequencing depth on the number of reliably quantified PUTs when using the Howe PUT set. The number of PUTs with a hit in the NCBI Plant RefSeq peptide database detected with more than x number of aligned reads (value shown on the x-axis). To demonstrate the effect of sequencing depth, sub-samples of library COA2 are included (gradient: yellow to red). The number of aligned reads is printed in the legend. Estimates of expected binomial sampling error (as coefficient of variation [CV]), dependent on the number of aligned reads per PUT are shown for 10, 100 and 1,000 aligned reads per PUT (PDF 858 kb)
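The expected binomial sampling error mentioned for Online Resource 4 follows from the variance of a binomial count. If a PUT attracts a fraction p of N aligned reads, its expected count is k = N p and its coefficient of variation is sqrt((1 - p) / k), which approaches 1 / sqrt(k) when p is small. The sketch below evaluates this standard formula; whether the authors used the exact or the small-p form is not stated in the caption, and the function name `binomial_cv` is illustrative.

```python
import math


def binomial_cv(expected_reads, total_reads=None):
    """Coefficient of variation of a binomially sampled read count.

    expected_reads: expected number of reads aligned to one PUT (k = N * p)
    total_reads:    library size N; if omitted, the small-p limit is used
    """
    if total_reads is None:
        return 1.0 / math.sqrt(expected_reads)   # limit for p -> 0
    p = expected_reads / total_reads
    return math.sqrt((1.0 - p) / expected_reads)


for k in (10, 100, 1000):
    print(f"{k:>5d} aligned reads: CV = {binomial_cv(k):.3f}")
```

In the small-p limit this gives a sampling error of roughly 32 % at 10 aligned reads, 10 % at 100 and 3 % at 1,000, which is consistent with treating 100 aligned reads as a threshold for robust quantification.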

Online Resource 5

Shared annotations among Müller and Howe PUT sets, P. glauca transcript clusters and Arabidopsis peptides. Venn diagram which shows the overlap of the functional annotations inferred by Blast2GO of the Müller PUT set (Muller), the Howe PUT set (Howe), the P. glauca transcript cluster database (Picea) and the Arabidopsis thaliana peptide database (Ara). All sequence sets have been aligned to the NCBI Plant RefSeq peptides. This alignment was used for detecting annotations using Blast2GO (for details, see Methods section) (PDF 307 kb)

Online Resource 6

Top 1,000 most abundant PUTs in the deep sequencing libraries (alignment to Müller and Howe PUT set). Top 1,000 most abundant PUTs in the libraries COA1, COA2, INT1 and INT2 sorted by the number of aligned reads. For each PUT, the PUT name with the associated annotation inferred by Blast2GO and the number of aligned reads are printed in the form: PUT name-Blast2GO Annotation-counts (XLS 5787 kb)

About this article

Cite this article

Hess, M., Wildhagen, H. & Ensminger, I. Suitability of Illumina deep mRNA sequencing for reliable gene expression profiling in a non-model conifer species (Pseudotsuga menziesii). Tree Genetics & Genomes 9, 1513–1527 (2013). https://doi.org/10.1007/s11295-013-0656-2

Received: 20 February 2013
Revised: 11 July 2013
Accepted: 05 August 2013
Published: 21 September 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s11295-013-0656-2
