Estimation of Alternative Splicing Isoform Frequencies from RNA-Seq Data

  • Marius Nicolae
  • Serghei Mangul
  • Ion Măndoiu
  • Alex Zelikovsky
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6293)

Abstract

In this paper we present IsoEM, a novel expectation-maximization algorithm for inferring alternative splicing isoform frequencies from high-throughput transcriptome sequencing (RNA-Seq) data. The algorithm exploits the disambiguation information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand, and read pairing information when available. Experiments on synthetic datasets show that the algorithm significantly outperforms existing methods for estimating isoform and gene expression levels from RNA-Seq data. The Java implementation of IsoEM is available at http://dna.engr.uconn.edu/software/IsoEM/.
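The abstract describes an expectation-maximization (EM) approach to distributing ambiguously mapped reads among isoforms. The following is a generic EM loop of that kind, offered as an illustration rather than the IsoEM code (which is written in Java). It assumes each read comes with a hypothetical per-isoform weight map (1.0 for plain compatibility; insert-size or base-quality likelihoods could in principle be folded into these weights) and that isoform lengths are known. The E-step splits each read among its compatible isoforms in proportion to the current estimates, the M-step renormalizes the expected counts, and a final division by length converts read shares into transcript frequencies. The function name `em_isoform_frequencies` and its parameters are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def em_isoform_frequencies(
    compat: Sequence[Dict[int, float]],
    lengths: Sequence[float],
    max_iter: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """Generic EM sketch (hypothetical helper, not the IsoEM API).

    compat[r] maps isoform index -> weight of read r under that isoform.
    lengths[i] is the (assumed known) length of isoform i.
    Returns relative transcript frequencies that sum to 1.
    """
    k = len(lengths)
    theta = [1.0 / k] * k  # fraction of reads drawn from each isoform
    for _ in range(max_iter):
        counts = [0.0] * k
        for weights in compat:
            # E-step: P(isoform i | read) is proportional to
            # theta_i * weight / length_i (uniform start position assumed).
            probs = {i: theta[i] * w / lengths[i] for i, w in weights.items()}
            total = sum(probs.values())
            if total == 0.0:
                continue  # read not explained by any isoform; ignore it
            for i, p in probs.items():
                counts[i] += p / total
        n = sum(counts)
        if n == 0.0:
            break  # no usable reads; keep the current estimate
        # M-step: new read fractions are the normalized expected counts.
        new_theta = [c / n for c in counts]
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break
    # Reads are sampled in proportion to frequency * length, so divide
    # the read fractions by length to recover transcript frequencies.
    scaled = [t / l for t, l in zip(theta, lengths)]
    s = sum(scaled)
    return [x / s for x in scaled]


if __name__ == "__main__":
    # Two equal-length isoforms: 30 reads unique to isoform 0,
    # 10 unique to isoform 1, and 60 compatible with both.
    reads = [{0: 1.0}] * 30 + [{1: 1.0}] * 10 + [{0: 1.0, 1: 1.0}] * 60
    print(em_isoform_frequencies(reads, [1000.0, 1000.0]))
```

In this example the 60 shared reads do not shift the estimate away from the 3:1 ratio implied by the uniquely assigned reads, so the result approaches [0.75, 0.25]. With unequal lengths, the final length division matters: an isoform twice as long yields twice as many reads at the same transcript frequency.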

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Marius Nicolae (1)
  • Serghei Mangul (2)
  • Ion Măndoiu (1)
  • Alex Zelikovsky (2)
  1. Computer Science & Engineering Department, University of Connecticut, Storrs
  2. Computer Science Department, Georgia State University, University Plaza, Georgia