
Computational Analysis of RNA-seq

  • Protocol
RNA Abundance Analysis

Part of the book series: Methods in Molecular Biology (MIMB, volume 883)

Abstract

Using high-throughput DNA sequencing (HTS) to examine gene expression, an approach typically referred to as RNA-seq, is rapidly becoming a viable choice. The depth and breadth of coverage of RNA-seq data can often exceed what is achievable with microarrays. However, the strengths of RNA-seq are often also its greatest weaknesses. Accurately and comprehensively mapping millions of relatively short reads to a reference genome sequence can require not only specialized software but also more structured and automated procedures to manage, analyze, and visualize the data. Additionally, the computational hardware required to process and store the data efficiently can be a necessary and often-overlooked component of a research plan. We discuss several aspects of the computational analysis of RNA-seq, including file management, data quality control, analysis, and visualization. We provide a framework for a standard nomenclature system that can facilitate automation and the tracking of data provenance. Finally, we provide a general workflow for the computational analysis of RNA-seq and a downloadable package of scripts to automate the processing.
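The abstract does not spell out the chapter's actual nomenclature, so the following is only a hypothetical sketch of how a structured file-naming convention can support automation and provenance tracking. It assumes an invented pattern, `<project>_<sample>_<replicate>[.<step>...].<extension>`, in which every processing step appends a tag to the name of its input; the names `DataName`, `parse`, and `derive`, and the example tags `trim` and `map`, are all illustrative and not taken from the authors' scripts.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical convention (an assumption, not the chapter's scheme):
#   <project>_<sample>_<replicate>[.<step>...].<extension>
# Each processing step appends a tag, so a file's name records its lineage.
_TOKEN = r"[A-Za-z0-9]+"
NAME_RE = re.compile(
    rf"(?P<project>{_TOKEN})_(?P<sample>{_TOKEN})_(?P<rep>r\d+)"
    rf"(?P<steps>(?:\.{_TOKEN})*)\.(?P<ext>{_TOKEN})"
)


@dataclass(frozen=True)
class DataName:
    project: str
    sample: str
    rep: str
    steps: tuple[str, ...]  # processing steps applied so far, oldest first
    ext: str

    def __str__(self) -> str:
        base = f"{self.project}_{self.sample}_{self.rep}"
        return ".".join([base, *self.steps, self.ext])

    def derive(self, step: str, ext: str) -> DataName:
        """Return the name for the output of applying `step` to this file."""
        for token in (step, ext):
            if not re.fullmatch(_TOKEN, token):
                raise ValueError(f"invalid token: {token!r}")
        return DataName(self.project, self.sample, self.rep, self.steps + (step,), ext)


def parse(name: str) -> DataName:
    """Split a conforming file name into its fields; reject anything else."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    steps = tuple(s for s in m["steps"].split(".") if s)
    return DataName(m["project"], m["sample"], m["rep"], steps, m["ext"])


if __name__ == "__main__":
    raw = parse("exp1_leaf_r1.fastq")  # hypothetical raw-read file
    mapped = raw.derive("trim", "fastq").derive("map", "sam")
    print(mapped)                     # exp1_leaf_r1.trim.map.sam
    print(parse(str(mapped)).steps)   # ('trim', 'map')
```

Encoding lineage in the name lets downstream scripts locate their inputs by pattern and reject nonconforming files early. A name cannot capture every parameter, though, so a real pipeline would likely pair a convention like this with a per-step log.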



Acknowledgments

This work was supported by NSF grant 0701731 and a Missouri Life Sciences Trust Fund Research Grant.

Author information


Corresponding author

Correspondence to Scott A. Givan.


Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Givan, S.A., Bottoms, C.A., Spollen, W.G. (2012). Computational Analysis of RNA-seq. In: Jin, H., Gassmann, W. (eds) RNA Abundance Analysis. Methods in Molecular Biology, vol 883. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-839-9_16


  • DOI: https://doi.org/10.1007/978-1-61779-839-9_16


  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61779-838-2

  • Online ISBN: 978-1-61779-839-9

  • eBook Packages: Springer Protocols
