RNA-Seq Data Analysis Pipeline for Plants: Transcriptome Assembly, Alignment, and Differential Expression Analysis

Burks, David J.; Azad, Rajeev K.

doi:10.1007/978-1-0716-1822-6_5

Part of the book series: Methods in Molecular Biology (MIMB, volume 2396)

Abstract

In this chapter, we describe methods for analyzing RNA-Seq data, presented as a pipeline that begins with raw data from a sequencer and ends with a set of differentially expressed genes and their functional characterization. The first section covers de novo transcriptome assembly, intended for organisms that lack a reference genome and for researchers who want to work against an organism-specific transcriptome assembled from RNA-Seq data. The second section covers both gene- and transcript-level quantification, leading to the third and final section on differential expression analysis between two or more conditions. The pipeline starts with raw sequence reads, which undergo quality assessment and preprocessing to ensure robust estimates of the transcripts and their differential regulation. The preprocessed data can then take one of two routes. In the first, they enter the de novo transcriptome flow, where transcripts are assembled, functionally annotated with tools such as InterProScan or Blast2GO, and then forwarded to the differential expression analysis flow. In the second, used when a reference genome is available, they enter the differential expression analysis flow directly. An online repository containing sample data is also available, along with custom Python scripts that modify the output of the programs within the pipeline for various downstream analyses.
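
The branching described in the abstract can be sketched as a small planning function. This is only an illustration of the order of stages, not the chapter's own scripts: the function name `plan_pipeline`, the flag `assemble_de_novo`, and the stage labels are hypothetical names chosen here, and no real tool is invoked.

```python
from typing import List


def plan_pipeline(assemble_de_novo: bool) -> List[str]:
    """Return the ordered stage labels for one run of the workflow.

    assemble_de_novo is a hypothetical flag: set it to True when no
    reference genome is available (or when an organism-specific
    transcriptome is wanted), and to False when the reads can go
    directly into the differential expression flow.
    """
    # Every run starts with quality assessment and preprocessing of raw reads.
    stages = ["quality_assessment", "preprocessing"]

    if assemble_de_novo:
        # De novo flow: assemble transcripts, then annotate them functionally.
        stages += ["de_novo_assembly", "functional_annotation"]

    # Both routes end in quantification and differential expression analysis.
    stages += ["quantification", "differential_expression"]
    return stages


if __name__ == "__main__":
    for flag in (False, True):
        print(f"assemble_de_novo={flag}: " + " -> ".join(plan_pipeline(flag)))
```

Keeping the plan as a plain list of labels makes the two routes easy to compare before any stage is bound to a particular program.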

Author information

Authors and Affiliations

Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, USA
David J. Burks & Rajeev K. Azad
Department of Mathematics, University of North Texas, Denton, TX, USA
Rajeev K. Azad

Corresponding author

Correspondence to Rajeev K. Azad.

Editor information

Editors and Affiliations

Department of Biological Sciences, University of North Texas, Denton, TX, USA
Vladimir Shulaev

About this protocol

Cite this protocol

Burks, D.J., Azad, R.K. (2022). RNA-Seq Data Analysis Pipeline for Plants: Transcriptome Assembly, Alignment, and Differential Expression Analysis. In: Shulaev, V. (eds) Plant Metabolic Engineering. Methods in Molecular Biology, vol 2396. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1822-6_5

DOI: https://doi.org/10.1007/978-1-0716-1822-6_5
Published: 17 November 2021
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1821-9
Online ISBN: 978-1-0716-1822-6
eBook Packages: Springer Protocols
