RNA-Seq Data Analysis Pipeline for Plants: Transcriptome Assembly, Alignment, and Differential Expression Analysis

Plant Metabolic Engineering

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2396))

Abstract

In this chapter, we describe methods for analyzing RNA-Seq data, presented as a pipeline that begins with raw data from a sequencer and ends with a set of differentially expressed genes and their functional characterization. The first section covers de novo transcriptome assembly, for organisms lacking reference genomes or for researchers interested in probing against the background of organism-specific transcriptomes assembled from RNA-Seq data. The second section covers both gene- and transcript-level quantification, leading to the third and final section on differential expression analysis between two or more conditions. The pipeline starts with raw sequence reads, followed by quality assessment and preprocessing of the input data to ensure robust estimation of the transcripts and their differential regulation. The preprocessed data can then take one of two routes. In the de novo transcriptome flow, transcripts are assembled, functionally annotated using tools such as InterProScan or Blast2GO, and then forwarded to the differential expression analysis flow. Alternatively, if a reference genome is available, the preprocessed data can be passed directly to the differential expression analysis flow. An online repository containing sample data is also available, along with custom Python scripts that modify the output of the programs within the pipeline for various downstream analyses.
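
The routing described in the abstract can be sketched in a few lines of Python. This is only an illustrative outline and not one of the chapter's own scripts: the function name `plan_pipeline`, the `de_novo` flag, and the stage labels are hypothetical names introduced here, and the sketch assumes that quantification precedes differential expression analysis on both routes.

```python
from typing import List


def plan_pipeline(de_novo: bool) -> List[str]:
    """Return the ordered stage labels for one run (labels are hypothetical)."""
    # Both routes begin with quality assessment and preprocessing of raw reads.
    stages = ["quality_assessment", "preprocessing"]
    if de_novo:
        # No reference genome (or an organism-specific transcriptome is wanted):
        # assemble transcripts, then annotate them functionally.
        stages += ["de_novo_assembly", "functional_annotation"]
    # Both routes end with quantification and differential expression analysis.
    stages += ["quantification", "differential_expression"]
    return stages


if __name__ == "__main__":
    for flag in (True, False):
        print(flag, " -> ".join(plan_pipeline(de_novo=flag)))
```

Setting `de_novo=False` corresponds to the case where a reference genome is available and the assembly and annotation stages are skipped.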

Author information

Corresponding author

Correspondence to Rajeev K. Azad.

Copyright information

© 2022 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Burks, D.J., Azad, R.K. (2022). RNA-Seq Data Analysis Pipeline for Plants: Transcriptome Assembly, Alignment, and Differential Expression Analysis. In: Shulaev, V. (eds) Plant Metabolic Engineering. Methods in Molecular Biology, vol 2396. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1822-6_5

  • DOI: https://doi.org/10.1007/978-1-0716-1822-6_5

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1821-9

  • Online ISBN: 978-1-0716-1822-6

  • eBook Packages: Springer Protocols
