De Novo Plant Transcriptome Assembly and Annotation Using Illumina RNA-Seq Reads

  • Protocol
Book: Plant Long Non-Coding RNAs

Part of the book series: Methods in Molecular Biology (MIMB, volume 1933)

Abstract

The ability to identify and quantify transcribed sequences from a multitude of organisms using high-throughput RNA sequencing has revolutionized our understanding of genetics and plant biology. However, many of the computational tools used in these analyses still require a reference genome sequence, which is seldom available for non-model organisms. Computational tools that use de Bruijn graphs to reconstruct full-length transcripts from short sequence reads make de novo transcriptome assembly possible. Here we provide detailed methods for generating and annotating a de novo transcriptome assembly from plant RNA-seq data.
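To make the de Bruijn graph idea concrete, the following is a minimal toy sketch and not the algorithm of any particular assembler. It splits reads into k-mers, links each (k-1)-mer prefix to its (k-1)-mer suffix, and then extends a contig while the path is unambiguous. The function names, the example reads, and the choice of k = 4 are all assumptions made for illustration; real assemblers additionally handle sequencing errors, coverage, reverse complements, and alternative isoforms.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a graph mapping each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in visited:  # stop on cycles
            break
        contig += next_node[-1]
        visited.add(next_node)
        node = next_node
    return contig


# Hypothetical overlapping reads drawn from one short transcript.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous_path(graph, "ATG")
print(contig)  # ATGGCGTGCAAT
```

In this toy case the three reads overlap without branching, so the walk recovers a single 12-nt contig. In real transcriptome data, branch points in the graph arise from isoforms, paralogs, and errors, and resolving them is where assemblers differ.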

Author information

Corresponding author

Correspondence to Milos Tanurdzic.

1 Electronic Supplementary Material

Supplementary File 1: The get_longest_ORF_per_transcript.pl Perl script used in Subheading 3.5, step 5 for identifying the longest open reading frame from each transcript present in your transcriptome assembly (PL 1 kb)

Supplementary File 2: The lncRNA_pipeline.sh bash script used in Subheading 3.5, step 9 (SH 4 kb)
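The Perl script in Supplementary File 1 is not reproduced on this page. As a rough illustration of what selecting the longest open reading frame per transcript can involve, here is a minimal Python sketch. It is not the authors' script: it assumes an ORF runs from an ATG to the first in-frame stop codon (TAA, TAG, or TGA), searches all three frames on both strands, and uses hypothetical function names and transcript IDs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return the longest ATG-to-stop ORF over six frames, or '' if none is found."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # look for the next ORF in this frame
    return best


# Hypothetical transcripts keyed by ID.
transcripts = {
    "transcript_1": "CCATGAAATTTGGGTAACC",
    "transcript_2": "CCCCCCCCC",
}
longest = {name: longest_orf(seq) for name, seq in transcripts.items()}
print(longest["transcript_1"])  # ATGAAATTTGGGTAA
```

Transcripts with no complete ORF under these assumptions return an empty string, which is one simple signal a downstream non-coding RNA filter might use.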

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Kerr, S.C., Gaiti, F., Tanurdzic, M. (2019). De Novo Plant Transcriptome Assembly and Annotation Using Illumina RNA-Seq Reads. In: Chekanova, J.A., Wang, HL.V. (eds) Plant Long Non-Coding RNAs. Methods in Molecular Biology, vol 1933. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-9045-0_16

  • DOI: https://doi.org/10.1007/978-1-4939-9045-0_16

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-9044-3

  • Online ISBN: 978-1-4939-9045-0

  • eBook Packages: Springer Protocols
