De Novo Plant Transcriptome Assembly and Annotation Using Illumina RNA-Seq Reads

Kerr, Stephanie C.; Gaiti, Federico; Tanurdzic, Milos

doi:10.1007/978-1-4939-9045-0_16

Part of the book series: Methods in Molecular Biology (MIMB, volume 1933)

Abstract

The ability to identify and quantify transcribed sequences from a multitude of organisms using high-throughput RNA sequencing has revolutionized our understanding of genetics and plant biology. However, a number of the computational tools used in these analyses still require a reference genome sequence, which is seldom available for non-model organisms. Computational tools that use de Bruijn graphs to reconstruct full-length transcripts from short sequence reads make de novo transcriptome assembly possible without a reference. Here we provide detailed methods for generating and annotating a de novo transcriptome assembly from plant RNA-seq data.
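
As a rough illustration of the de Bruijn graph idea mentioned above, the following Python sketch splits reads into k-mers, links each (k-1)-mer prefix to its (k-1)-mer suffix, and then walks the graph while the next node is unambiguous. It is a toy example under strong assumptions (error-free reads, a single strand, no repeats, one isoform) and is not the method of any particular assembler; the function names and the example reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a graph mapping each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_from(graph, start):
    """Spell a contig by following edges while exactly one successor exists."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


def assemble(reads, k):
    """Return one contig per node that has no incoming edge (toy assembly)."""
    graph = build_de_bruijn(reads, k)
    successors = set()
    for targets in graph.values():
        successors |= targets
    starts = sorted(node for node in graph if node not in successors)
    return [walk_from(graph, node) for node in starts]


# Hypothetical overlapping reads drawn from one short made-up transcript.
example_reads = ["ATGGCGTA", "GCGTACCT", "TACCTGA"]
example_contigs = assemble(example_reads, k=4)
print(example_contigs)
```

Real RNA-seq assemblers must additionally cope with sequencing errors, reverse-complement reads, uneven coverage, and branching caused by alternative isoforms, none of which this sketch attempts.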

Author information

Federico Gaiti
Present address: New York Genome Center and Department of Medicine, Weill Cornell Medicine, New York, NY, USA

Authors and Affiliations

School of Biological Sciences, The University of Queensland, St Lucia, QLD, Australia
Stephanie C. Kerr & Milos Tanurdzic

Corresponding author

Correspondence to Milos Tanurdzic.

Editor information

Editors and Affiliations

Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning, Guangxi, China
Julia A. Chekanova & Hsiao-Lin V. Wang
Department of Biology, Emory University, Atlanta, Georgia, USA
Hsiao-Lin V. Wang

1 Electronic Supplementary Material

Supplementary File 1: The get_longest_ORF_per_transcript.pl Perl script used in Subheading 3.5, step 5 for identifying the longest open reading frame from each transcript present in your transcriptome assembly (PL 1 kb)

Supplementary File 2: The lncRNA_pipeline.sh bash script used in Subheading 3.5, step 9 (SH 4 kb)

Cite this protocol

Kerr, S.C., Gaiti, F., Tanurdzic, M. (2019). De Novo Plant Transcriptome Assembly and Annotation Using Illumina RNA-Seq Reads. In: Chekanova, J.A., Wang, HL.V. (eds) Plant Long Non-Coding RNAs. Methods in Molecular Biology, vol 1933. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-9045-0_16

DOI: https://doi.org/10.1007/978-1-4939-9045-0_16
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-9044-3
Online ISBN: 978-1-4939-9045-0
eBook Packages: Springer Protocols
