Abstract
In this chapter, we describe methods for analyzing RNA-Seq data, presented as a pipeline that begins with raw data from a sequencer and ends with a list of differentially expressed genes and their functional characterization. The first section covers de novo transcriptome assembly, for organisms that lack a reference genome or for researchers who want to work against an organism-specific transcriptome assembled from RNA-Seq data. The second section covers gene- and transcript-level quantification, leading to the third and final section on differential expression analysis between two or more conditions. The pipeline starts with raw sequence reads, which undergo quality assessment and preprocessing to ensure robust estimation of the transcripts and their differential regulation. The preprocessed data can be passed to the de novo transcriptome flow, where transcripts are assembled and functionally annotated using tools such as InterProScan or Blast2GO before being forwarded to the differential expression analysis flow; if a reference genome is available, they can instead be passed directly to the differential expression analysis flow. An online repository containing sample data is also available, along with custom Python scripts that modify the output of the programs in the pipeline for various downstream analyses.
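The branching described in the abstract can be sketched in a few lines of Python. This is only an illustration of the stage ordering, not code from the chapter or its repository: the function name `plan_pipeline`, its parameters, and the stage labels are all hypothetical, and no actual tool is invoked.

```python
from typing import List


def plan_pipeline(reference_available: bool, force_de_novo: bool = False) -> List[str]:
    """Return the ordered stage labels for one run of the pipeline.

    The labels are illustrative names chosen for this sketch; they are not
    commands or script names from the chapter.
    """
    # Every run starts with quality assessment and preprocessing of raw reads.
    stages = ["quality_assessment", "preprocessing"]

    # De novo assembly (followed by functional annotation) is required when
    # no reference genome exists, and optional when an organism-specific
    # transcriptome is wanted anyway.
    if force_de_novo or not reference_available:
        stages += ["de_novo_assembly", "functional_annotation"]

    # Both routes converge on quantification and differential expression.
    stages += ["quantification", "differential_expression"]
    return stages


if __name__ == "__main__":
    print(" -> ".join(plan_pipeline(reference_available=False)))
    print(" -> ".join(plan_pipeline(reference_available=True)))
```

Calling `plan_pipeline(reference_available=True)` skips the assembly and annotation stages, while passing `force_de_novo=True` restores them for the case where an organism-specific transcriptome is preferred despite an available reference.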
Copyright information
© 2022 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Burks, D.J., Azad, R.K. (2022). RNA-Seq Data Analysis Pipeline for Plants: Transcriptome Assembly, Alignment, and Differential Expression Analysis. In: Shulaev, V. (eds) Plant Metabolic Engineering. Methods in Molecular Biology, vol 2396. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1822-6_5
DOI: https://doi.org/10.1007/978-1-0716-1822-6_5
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1821-9
Online ISBN: 978-1-0716-1822-6
eBook Packages: Springer Protocols