Abstract
RNA-Seq is now widely used in transcriptomics studies, but analyzing and interpreting RNA-Seq data remains seriously challenging. With the development of high-throughput sequencing technologies, sequencing costs have dropped dramatically while sequencing output has increased sharply. However, sequencing reads are still short and contain various sequencing errors, and the transcriptome is often more complicated than expected. These challenges create an urgent need for efficient bioinformatics algorithms that can handle large amounts of transcriptome sequencing data and support diverse related studies. This review summarizes a number of frequently used applications of transcriptome sequencing and the associated analysis strategies, including short read mapping, exon-exon splice junction detection, gene or isoform expression quantification, differential expression analysis, and transcriptome reconstruction.
Additional information
This article is published with open access at Springerlink.com
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Chen, G., Wang, C. & Shi, T. Overview of available methods for diverse RNA-Seq data analyses. Sci. China Life Sci. 54, 1121–1128 (2011). https://doi.org/10.1007/s11427-011-4255-x
DOI: https://doi.org/10.1007/s11427-011-4255-x