Abstract
Hidden Markov models (HMMs) are routinely used to analyze long genomic sequences and identify features such as genes, CpG islands, and conserved elements. The commonly used Viterbi algorithm requires O(mn) memory to annotate a sequence of length n with an m-state HMM, which is impractical for whole chromosomes. In this paper, we introduce the on-line Viterbi algorithm, which decodes HMMs in much smaller space. Our analysis shows that the expected maximum memory of our algorithm is Θ(m log n) on two-state HMMs. We also demonstrate experimentally that our algorithm significantly reduces the memory needed to decode a simple gene-finding HMM on both simulated and real DNA sequences, without a significant slowdown compared to the classical Viterbi algorithm.
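To make the O(mn) memory figure concrete, the following is a minimal Python sketch of the classical Viterbi algorithm, not the on-line algorithm introduced in the paper. The function name `viterbi`, its parameter layout, and the toy two-state model in the usage note are illustrative assumptions and do not come from the paper. The table `back` stores one predecessor per state per sequence position, which is where the O(mn) cost arises.

```python
import math


def viterbi(obs, init, trans, emit):
    """Classical Viterbi decoding in log space (illustrative sketch).

    obs   : list of observation indices, length n
    init  : init[s]     = P(first state is s)
    trans : trans[r][s] = P(next state is s | current state is r)
    emit  : emit[s][o]  = P(observing o | state s)
    Returns the most probable state path as a list of length n.
    """
    def log(p):
        # Treat zero probabilities as log(0) = -infinity.
        return math.log(p) if p > 0 else float("-inf")

    m, n = len(init), len(obs)
    if n == 0:
        return []

    # score[s] = log-probability of the best path ending in state s.
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(m)]

    # back[i][s] = best predecessor of state s at position i.
    # This n-by-m table is the O(mn) memory the abstract refers to.
    back = [[0] * m for _ in range(n)]

    for i in range(1, n):
        new_score = [0.0] * m
        for s in range(m):
            best_r = max(range(m), key=lambda r: score[r] + log(trans[r][s]))
            back[i][s] = best_r
            new_score[s] = (score[best_r] + log(trans[best_r][s])
                            + log(emit[s][obs[i]]))
        score = new_score

    # Trace back from the best final state; this needs the whole table.
    state = max(range(m), key=lambda s: score[s])
    path = [state]
    for i in range(n - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    path.reverse()
    return path
```

As a usage example, with a hypothetical two-state model where each state tends to persist and to emit its own symbol, `viterbi([0, 0, 0, 1, 1, 1], [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]])` returns a path of six states. Because the traceback starts only after the last position has been processed, none of `back` can be discarded early in this formulation, which is the limitation the abstract describes for whole chromosomes.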
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Šrámek, R., Brejová, B., Vinař, T. (2007). On-Line Viterbi Algorithm for Analysis of Long Biological Sequences. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_23
DOI: https://doi.org/10.1007/978-3-540-74126-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74125-1
Online ISBN: 978-3-540-74126-8
eBook Packages: Computer Science, Computer Science (R0)