Maximum Likelihood Estimation of Incomplete Genomic Spectrum from HTS Data

  • Serghei Mangul
  • Irina Astrovskaya
  • Marius Nicolae
  • Bassam Tork
  • Ion Mandoiu
  • Alex Zelikovsky
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6833)

Abstract

High-throughput sequencing makes it possible to process samples containing multiple genomic sequences and then estimate their frequencies or even assemble them. Maximum likelihood estimation of sequence frequencies from the observed reads can be performed efficiently with the expectation-maximization (EM) method, assuming that the sequences present in the sample are known. Frequently, such knowledge is incomplete: in RNA-Seq, not all isoforms are known, and when sequencing viral quasispecies, their sequences are unknown. We propose to enhance EM with a virtual string and incorporate it into frequency estimation tools for RNA-Seq and quasispecies sequencing. Our simulations show that EM enhanced with the virtual string estimates string frequencies more accurately than the original methods and that it can find the reads from missing quasispecies, thus enabling their reconstruction.
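The EM idea summarized above can be illustrated with a minimal Python sketch. It assumes that read-to-string likelihoods are already available (for example, from read mapping), and it models the virtual string in a deliberately simplified way: as an extra string that gives every read the same constant likelihood `virtual_likelihood`. The function name and parameters are hypothetical and illustrative; this is not the authors' implementation, and their construction of the virtual string may differ.

```python
from typing import List, Tuple


def em_with_virtual_string(
    likelihoods: List[List[float]],
    virtual_likelihood: float,
    n_iter: int = 500,
    tol: float = 1e-12,
) -> Tuple[List[float], List[float]]:
    """Estimate string frequencies by EM with one extra 'virtual' string.

    likelihoods[j][i] is P(read j | known string i), or 0.0 if read j is
    incompatible with string i. virtual_likelihood is a hypothetical constant
    P(read j | virtual string), identical for all reads (a simplifying
    assumption); it must be > 0 so that every read is explained by something.

    Returns (freqs, virtual_post): freqs holds one frequency per known string
    followed by the virtual string's frequency; virtual_post[j] is the
    posterior probability, from the last E-step, that read j came from the
    virtual string.
    """
    n_reads = len(likelihoods)
    # Append the virtual string as an extra column compatible with every read.
    lik = [list(row) + [virtual_likelihood] for row in likelihoods]
    k = len(lik[0])
    freqs = [1.0 / k] * k  # uniform starting point
    post = [[0.0] * k for _ in range(n_reads)]
    for _ in range(n_iter):
        # E-step: responsibility of each string for each read.
        for j in range(n_reads):
            weights = [freqs[i] * lik[j][i] for i in range(k)]
            total = sum(weights)
            post[j] = [w / total for w in weights]
        # M-step: new frequency = expected fraction of reads from each string.
        new_freqs = [sum(post[j][i] for j in range(n_reads)) / n_reads
                     for i in range(k)]
        delta = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs, [row[-1] for row in post]


if __name__ == "__main__":
    # Two known strings; read 3 is incompatible with both of them.
    reads = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
    freqs, virtual_post = em_with_virtual_string(reads, virtual_likelihood=0.01)
    print("frequencies (known..., virtual):", freqs)
    print("P(read from virtual string):", virtual_post)
```

In this toy run, read 3 can only be explained by the virtual string, so its posterior there is 1, while reads that match a known string put little weight on it. Reads whose virtual-string posterior stays high are the ones the known strings explain poorly; in the quasispecies setting they are natural candidates for reconstructing a missing sequence. A constant virtual likelihood is just one simple choice made for this sketch.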

Keywords

high-throughput sequencing, expectation maximization, viral quasispecies, RNA-Sequencing

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Serghei Mangul (1)
  • Irina Astrovskaya (1)
  • Marius Nicolae (2)
  • Bassam Tork (1)
  • Ion Mandoiu (2)
  • Alex Zelikovsky (1)

  1. Department of Computer Science, Georgia State University, Atlanta, USA
  2. Department of Computer Science & Engineering, University of Connecticut, Storrs, USA