Computational Analysis of RNA-seq
Using high-throughput DNA sequencing (HTS) to examine gene expression, an approach typically referred to as RNA-seq, is rapidly becoming a viable choice. The depth and breadth of coverage of RNA-seq data can often exceed what is achievable using microarrays. However, the strengths of RNA-seq are often also its greatest weaknesses. Accurately and comprehensively mapping millions of relatively short reads to a reference genome sequence can require not only specialized software, but also more structured and automated procedures to manage, analyze, and visualize the data. Additionally, the computational hardware required to process and store the data efficiently is a necessary and often-overlooked component of a research plan. We discuss several aspects of the computational analysis of RNA-seq, including file management, data quality control, analysis, and visualization. We provide a framework for a standard nomenclature system that can facilitate automation and the tracking of data provenance. Finally, we provide a general workflow for the computational analysis of RNA-seq and a downloadable package of scripts to automate the processing.
Key words: High-throughput DNA sequencing, RNA-seq, Gene expression, Data processing
This work was supported by NSF grant 0701731 and a Missouri Life Sciences Trust Fund Research Grant.