Observations on potential novel transcripts from RNA-Seq data

  • Chao Ye
  • Linxi Liu
  • Xi Wang
  • Xuegong Zhang
Research Article


With the rapid development of next-generation deep sequencing technologies, sequencing cDNA reverse-transcribed from RNA molecules (RNA-Seq) has become a key approach for studying gene expression and transcriptomes. Because RNA-Seq does not rely on the annotation of known genes, it offers the opportunity to discover transcripts that have not been annotated in current databases. Studying the distribution of RNA-Seq signals, and taking a systematic view of the potential new transcripts that those signals reveal, is an important step toward understanding transcriptomes.


RNA-Seq, novel transcripts, next generation sequencing, bioinformatics





Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Key Laboratory of Bioinformatics and Bioinformatics Division, Ministry of Education, Tsinghua National Laboratory for Information Science and Technology/Department of Automation, Tsinghua University, Beijing, China
