Science China Life Sciences, Volume 54, Issue 12, pp 1121–1128

Overview of available methods for diverse RNA-Seq data analyses

  • Geng Chen
  • Charles Wang
  • TieLiu Shi
Open Access


RNA-Seq technology is becoming widely used in various transcriptomics studies; however, analyzing and interpreting RNA-Seq data poses serious challenges. With the development of high-throughput sequencing technologies, sequencing cost is dropping dramatically while sequencing output is increasing sharply. However, sequencing reads are still short and contain various sequencing errors. Moreover, the transcriptome is often more complicated than expected. These challenges create an urgent need for efficient bioinformatics algorithms that can handle the large amount of transcriptome sequencing data and support diverse related studies. This review summarizes a number of frequently used applications of transcriptome sequencing and their related analysis strategies, including short read mapping, exon-exon splice junction detection, gene or isoform expression quantification, differential expression analysis and transcriptome reconstruction.


next generation sequencing, transcriptome, RNA-Seq, data analysis, transcriptomics



Copyright information

© The Author(s) 2011
This article is published under license to BioMed Central Ltd.

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  1. Center for Bioinformatics and Computational Biology, Institute of Biomedical Sciences, School of Life Science, East China Normal University, Shanghai, China
  2. Functional Genomics Core, Beckman Research Institute, City of Hope Comprehensive Cancer Center, Duarte, USA
  3. Shanghai Information Center for Life Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
