On-Line Viterbi Algorithm for Analysis of Long Biological Sequences

  • Conference paper
Algorithms in Bioinformatics (WABI 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4645)

Abstract

Hidden Markov models (HMMs) are routinely used to analyze long genomic sequences and identify features such as genes, CpG islands, and conserved elements. The commonly used Viterbi algorithm requires O(mn) memory to annotate a sequence of length n with an m-state HMM, which is impractical for analyzing whole chromosomes. In this paper, we introduce the on-line Viterbi algorithm for decoding HMMs in much smaller space. Our analysis shows that the expected maximum memory used by our algorithm is Θ(m log n) on two-state HMMs. We also demonstrate experimentally that our algorithm significantly reduces the memory needed to decode a simple gene-finding HMM on both simulated and real DNA sequences, without a significant slowdown compared to the classical Viterbi algorithm.
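
To make the O(mn) memory figure concrete, the following is a minimal sketch of the classical Viterbi algorithm, not the on-line variant introduced in the paper. It stores one back pointer per (position, state) pair, which is where the O(mn) term comes from. The function name `viterbi`, the toy two-state model ("bg" and "island"), and all probabilities are illustrative assumptions and do not come from the paper.

```python
import math


def viterbi(obs, states, start, trans, emit):
    """Classical Viterbi decoding in log space (assumes obs is non-empty).

    Returns the most probable state path and the number of back pointers
    stored, which is (n - 1) * m for n observations and m states.
    """
    n, m = len(obs), len(states)
    lg = math.log
    # score[j]: best log-probability of any path ending in state j so far
    score = [lg(start[s]) + lg(emit[s][obs[0]]) for s in states]
    back = []  # back[t - 1][j]: best predecessor of state j at position t
    for t in range(1, n):
        new_score, pointers = [], []
        for s in states:
            cand = [score[i] + lg(trans[states[i]][s]) for i in range(m)]
            best = max(range(m), key=cand.__getitem__)
            new_score.append(cand[best] + lg(emit[s][obs[t]]))
            pointers.append(best)
        score = new_score
        back.append(pointers)  # this table is what grows with n
    # Trace back from the best final state through the stored pointers
    j = max(range(m), key=score.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [states[j] for j in path], len(back) * m


# Hypothetical toy model: AT-rich background versus CG-rich island
states = ["bg", "island"]
start = {"bg": 0.5, "island": 0.5}
trans = {"bg": {"bg": 0.9, "island": 0.1},
         "island": {"bg": 0.1, "island": 0.9}}
emit = {"bg": {"a": 0.4, "t": 0.4, "c": 0.1, "g": 0.1},
        "island": {"a": 0.1, "t": 0.1, "c": 0.4, "g": 0.4}}

path, table_entries = viterbi("aattcgcgcgaatt", states, start, trans, emit)
print(path)
print(table_entries)
```

In this sketch the score vector needs only O(m) space, while the back-pointer table cannot be discarded before the final traceback. That table is the memory cost the abstract describes as impractical for whole chromosomes.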

Editor information

Raffaele Giancarlo, Sridhar Hannenhalli

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Šrámek, R., Brejová, B., Vinař, T. (2007). On-Line Viterbi Algorithm for Analysis of Long Biological Sequences. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_23

  • DOI: https://doi.org/10.1007/978-3-540-74126-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74125-1

  • Online ISBN: 978-3-540-74126-8

  • eBook Packages: Computer Science, Computer Science (R0)
