Abstract
The use of high-throughput DNA sequencing (HTS) to examine gene expression, typically referred to as RNA-seq, is rapidly becoming a viable choice. The depth and breadth of coverage of RNA-seq data can often exceed what is achievable using microarrays. However, the strengths of RNA-seq are often also its greatest weaknesses. Accurately and comprehensively mapping millions of relatively short reads to a reference genome sequence can require not only specialized software, but also more structured and automated procedures to manage, analyze, and visualize the data. Additionally, the computational hardware required to efficiently process and store the data can be a necessary and often-overlooked component of a research plan. We discuss several aspects of the computational analysis of RNA-seq, including file management, data quality control, analysis, and visualization. We provide a framework for a standard nomenclature system that can facilitate automation and the tracking of data provenance. Finally, we provide a general workflow for the computational analysis of RNA-seq and a downloadable package of scripts to automate the processing.
Acknowledgments
This work was supported by NSF grant 0701731 and a Missouri Life Sciences Trust Fund Research Grant.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Givan, S.A., Bottoms, C.A., Spollen, W.G. (2012). Computational Analysis of RNA-seq. In: Jin, H., Gassmann, W. (eds) RNA Abundance Analysis. Methods in Molecular Biology, vol 883. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-839-9_16
DOI: https://doi.org/10.1007/978-1-61779-839-9_16
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-838-2
Online ISBN: 978-1-61779-839-9
eBook Packages: Springer Protocols