Computational Analysis of RNA-seq

Givan, Scott A.; Bottoms, Christopher A.; Spollen, William G.

doi:10.1007/978-1-61779-839-9_16

Part of the book series: Methods in Molecular Biology (MIMB, volume 883)

Abstract

Using high-throughput DNA sequencing (HTS) to examine gene expression, an approach typically referred to as RNA-seq, is rapidly becoming a viable choice. The depth and breadth of coverage of RNA-seq data often exceed what is achievable using microarrays. However, the strengths of RNA-seq are often also its greatest weaknesses: the volume of data that provides this coverage must itself be handled computationally. Accurately and comprehensively mapping millions of relatively short reads to a reference genome sequence can require not only specialized software but also more structured and automated procedures to manage, analyze, and visualize the data. Additionally, the computational hardware needed to process and store the data efficiently can be a necessary and often-overlooked component of a research plan. We discuss several aspects of the computational analysis of RNA-seq, including file management, data quality control, analysis, and visualization. We provide a framework for a standard nomenclature system that can facilitate automation and make it possible to track data provenance. Finally, we provide a general workflow for the computational analysis of RNA-seq and a downloadable package of scripts to automate the processing.
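Data quality control for RNA-seq usually starts from the per-base Phred quality scores stored alongside each read in FASTQ files, where a score Q corresponds to an estimated error probability of 10^(-Q/10). The following is a minimal sketch of such a filter, not the chapter's downloadable script package. It assumes four-line FASTQ records with Phred+33 (Sanger/Illumina 1.8+) quality encoding; older Illumina data may use an offset of 64, which the `offset` parameter allows for. The function names and the thresholds (mean quality of at least 20, length of at least 20 bases) are hypothetical choices for illustration only.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record (assumption)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode ASCII quality characters to Phred scores (Q = -10 * log10(P_error))."""
    return [ord(c) - offset for c in qual]


def mean_quality(record: FastqRecord, offset: int = 33) -> float:
    """Arithmetic mean of the per-base Phred scores of one read."""
    scores = phred_scores(record.qual, offset)
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(handle: TextIO, min_mean_q: float = 20.0,
                 min_length: int = 20, offset: int = 33) -> Iterator[FastqRecord]:
    """Keep reads that are long enough and whose mean quality meets the (hypothetical) cutoff."""
    for rec in read_fastq(handle):
        if len(rec.seq) >= min_length and mean_quality(rec, offset) >= min_mean_q:
            yield rec


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    demo = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTACGT\n+\n########\n")
    print([r.name for r in filter_reads(demo, min_mean_q=20, min_length=8)])  # ['r1']
```

Averaging Phred scores is a simple heuristic; because the scale is logarithmic, a read with a high mean can still contain a few very unreliable bases, so per-base trimming or a minimum-per-base rule may be preferable depending on the downstream aligner.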

Acknowledgments

This work was supported by NSF grant 0701731 and a Missouri Life Sciences Trust Fund Research Grant.

Author information

Authors and Affiliations

Department of Molecular Microbiology and Immunology, Informatics Research Core Facility, University of Missouri, Columbia, MO, USA
Scott A. Givan
Informatics Research Core Facility, University of Missouri, Columbia, MO, USA
Christopher A. Bottoms & William G. Spollen

Corresponding author

Correspondence to Scott A. Givan.

Editor information

Editors and Affiliations

Department of Plant Pathology & Microbiology, University of California, 900 University Avenue, Riverside, CA 92521, USA
Hailing Jin
Division of Plant Sciences, University of Missouri, 1-31 Agriculture Building, Columbia, MO 65211, USA
Walter Gassmann

About this protocol

Cite this protocol

Givan, S.A., Bottoms, C.A., Spollen, W.G. (2012). Computational Analysis of RNA-seq. In: Jin, H., Gassmann, W. (eds) RNA Abundance Analysis. Methods in Molecular Biology, vol 883. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-839-9_16

DOI: https://doi.org/10.1007/978-1-61779-839-9_16
Published: 06 April 2012
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-838-2
Online ISBN: 978-1-61779-839-9
eBook Packages: Springer Protocols