Abstract
The ability to identify and quantify transcribed sequences from a multitude of organisms using high-throughput RNA sequencing has revolutionized our understanding of genetics and plant biology. However, a number of the computational tools used in these analyses still require a reference genome sequence, which is seldom available for non-model organisms. Computational tools that use de Bruijn graphs to reconstruct full-length transcripts from short sequence reads make de novo transcriptome assembly possible. Here we provide detailed methods for generating and annotating a de novo transcriptome assembly from plant RNA-seq data.
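As a rough illustration of the de Bruijn graph idea mentioned above, the following minimal sketch builds a graph from overlapping k-mers and walks an unambiguous path to rebuild a sequence from short reads. The toy reads, the value of k, and the function names `build_de_bruijn` and `walk_unambiguous_path` are assumptions made for this example. Production assemblers also handle sequencing errors, coverage, reverse complements, and alternative isoforms, none of which is attempted here.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k along the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Extend a contig from `start` while exactly one distinct successor exists."""
    contig = start
    node = start
    visited = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break  # dead end or branch point: stop extending
        nxt = successors.pop()
        if nxt in visited:
            break  # avoid looping forever on a cycle
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Hypothetical toy reads that overlap along one short transcript.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous_path(graph, "ATG")
print(contig)
```

In this toy case the three reads overlap cleanly, so the walk recovers a single contig. With real data, branch points in the graph are where an assembler has to decide between isoforms, paralogs, and sequencing errors.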
1 Electronic Supplementary Material
Supplementary File 1: The get_longest_ORF_per_transcript.pl Perl script used in Subheading 3.5, step 5 for identifying the longest open reading frame from each transcript present in your transcriptome assembly (PL 1 kb)
Supplementary File 2: The lncRNA_pipeline.sh bash script used in Subheading 3.5, step 9 (SH 4 kb)
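For readers unfamiliar with what "longest open reading frame per transcript" involves, the sketch below shows one common way to do it: scan all six reading frames for the longest stretch running from an ATG to an in-frame stop codon. It is an illustration only and not a port of the supplementary Perl script, whose logic is not shown here. The function names, the toy transcript, and the choice to require an ATG start and include the stop codon are all assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq):
    """Return the longest ATG-to-stop ORF (stop included) over six frames."""
    best = ""
    seq = seq.upper()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            # Step through the strand codon by codon in this frame.
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon of a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # look for the next ORF in this frame
    return best


# Hypothetical transcript set keyed by transcript ID.
transcripts = {"transcript_1": "CCATGAAATTTTAGGG"}
longest = {name: longest_orf(seq) for name, seq in transcripts.items()}
print(longest)
```

ORFs that run off the end of a transcript without a stop codon are ignored in this sketch. Partial ORFs are common in fragmented assemblies, so a real pipeline may need to keep them.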
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Kerr, S.C., Gaiti, F., Tanurdzic, M. (2019). De Novo Plant Transcriptome Assembly and Annotation Using Illumina RNA-Seq Reads. In: Chekanova, J.A., Wang, HL.V. (eds) Plant Long Non-Coding RNAs. Methods in Molecular Biology, vol 1933. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-9045-0_16
DOI: https://doi.org/10.1007/978-1-4939-9045-0_16
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-9044-3
Online ISBN: 978-1-4939-9045-0
eBook Packages: Springer Protocols