On-Line Viterbi Algorithm for Analysis of Long Biological Sequences

  • Rastislav Šrámek
  • Broňa Brejová
  • Tomáš Vinař
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4645)

Abstract

Hidden Markov models (HMMs) are routinely used to analyze long genomic sequences and identify features such as genes, CpG islands, and conserved elements. The commonly used Viterbi algorithm requires O(mn) memory to annotate a sequence of length n with an m-state HMM, which is impractical for analyzing whole chromosomes. In this paper, we introduce the on-line Viterbi algorithm for decoding HMMs in much smaller space. Our analysis shows that the algorithm has expected maximum memory Θ(m log n) on two-state HMMs. We also demonstrate experimentally that the algorithm significantly reduces the memory needed to decode a simple gene-finding HMM on both simulated and real DNA sequences, without a significant slowdown compared to the classical Viterbi algorithm.
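To make the O(mn) memory figure concrete, the following is a minimal sketch of the classical Viterbi algorithm in Python. It is not the on-line algorithm introduced in the paper, and it is not taken from the authors' code. The two-state model (states "H" and "L", with all probabilities shown) is a hypothetical toy example chosen only for illustration. The point to notice is the `back` list, which keeps one back pointer per state for every sequence position and is therefore the source of the O(mn) memory use.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Classical Viterbi decoding in log space.

    Keeps one back pointer per (position, state) pair, so memory
    grows as O(mn) for n observations and m states.
    """
    if not obs:
        return []
    # score[s]: log-probability of the best path ending in state s
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[i - 1][s]: best predecessor of state s at position i
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            pointers[s] = prev
            new_score[s] = score[prev] + log_trans[prev][s] + log_emit[s][symbol]
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy model: "H" prefers G/C, "L" prefers A/T.
STATES = ("H", "L")
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {
    "H": {"H": math.log(0.9), "L": math.log(0.1)},
    "L": {"H": math.log(0.1), "L": math.log(0.9)},
}
LOG_EMIT = {
    "H": {c: math.log(0.4 if c in "GC" else 0.1) for c in "ACGT"},
    "L": {c: math.log(0.4 if c in "AT" else 0.1) for c in "ACGT"},
}

if __name__ == "__main__":
    print("".join(viterbi("GGCCAATT", STATES, LOG_START, LOG_TRANS, LOG_EMIT)))
```

For a whole chromosome, n is in the hundreds of millions, so the back-pointer table in this sketch is what becomes impractical; the abstract's claim is that the on-line variant avoids holding all of it at once.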

Keywords

biological sequence analysis; hidden Markov models; on-line algorithms; Viterbi algorithm; gene finding

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Rastislav Šrámek (1)
  • Broňa Brejová (2)
  • Tomáš Vinař (2)
  1. Department of Computer Science, Comenius University, 842 48 Bratislava, Slovakia
  2. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA
