On-Line Viterbi Algorithm for Analysis of Long Biological Sequences
Hidden Markov models (HMMs) are routinely used to analyze long genomic sequences and to identify features such as genes, CpG islands, and conserved elements. The commonly used Viterbi algorithm requires O(mn) memory to annotate a sequence of length n with an m-state HMM, which is impractical for analyzing whole chromosomes. In this paper, we introduce the on-line Viterbi algorithm, which decodes HMMs in much smaller space. Our analysis shows that, on two-state HMMs, the expected maximum memory of our algorithm is Θ(m log n). We also demonstrate experimentally that our algorithm significantly reduces the memory needed to decode a simple gene-finding HMM on both simulated and real DNA sequences, without a significant slowdown compared to the classical Viterbi algorithm.
Keywords: biological sequence analysis, hidden Markov models, on-line algorithms, Viterbi algorithm, gene finding
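To make the contrast between O(mn)-memory decoding and on-line decoding concrete, here is a minimal Python sketch. It assumes log-space parameters passed as nested lists and integer-coded observations. The names `viterbi`, `online_viterbi`, `_step`, `log_init`, `log_trans`, and `log_emit` are hypothetical and chosen for illustration. The on-line variant follows the coalescence idea: once all m surviving Viterbi paths share a common prefix, that prefix can be output and its back-pointers freed. This is a simplified illustration, not the authors' implementation. It detects coalescence by naively tracing all surviving paths backwards at every step, which keeps memory proportional to the undecided window but spends extra time on the trace.

```python
from typing import Iterator, List, Sequence, Tuple

Matrix = List[List[float]]


def _step(score: List[float], o: int, log_trans: Matrix, log_emit: Matrix):
    """Compute one Viterbi column: new log scores and back-pointers."""
    m = len(score)
    new_score, col = [], []
    for k in range(m):
        # Best predecessor of state k; max() breaks ties by lowest index.
        j = max(range(m), key=lambda i: score[i] + log_trans[i][k])
        new_score.append(score[j] + log_trans[j][k] + log_emit[k][o])
        col.append(j)
    return new_score, col


def viterbi(obs: Sequence[int], log_init: List[float],
            log_trans: Matrix, log_emit: Matrix) -> List[int]:
    """Classical Viterbi: keeps all n back-pointer columns (O(mn) memory)."""
    m = len(log_init)
    score = [log_init[k] + log_emit[k][obs[0]] for k in range(m)]
    back = []
    for o in obs[1:]:
        score, col = _step(score, o, log_trans, log_emit)
        back.append(col)
    path = [max(range(m), key=lambda k: score[k])]
    for col in reversed(back):
        path.append(col[path[-1]])
    return path[::-1]


def online_viterbi(obs: Sequence[int], log_init: List[float],
                   log_trans: Matrix, log_emit: Matrix) -> Iterator[Tuple[int, int]]:
    """Yield (position, state) pairs of the Viterbi path as soon as they are fixed."""
    m = len(log_init)
    score = [log_init[k] + log_emit[k][obs[0]] for k in range(m)]
    first = 0                    # first position whose state is not yet output
    back: List[List[int]] = []   # back[i] maps states at first+i+1 to first+i

    def trace(state: int, offset: int) -> List[Tuple[int, int]]:
        # Follow back-pointers from `state` at position first+offset down to `first`.
        path = [state]
        for i in range(offset - 1, -1, -1):
            path.append(back[i][path[-1]])
        return [(first + i, s) for i, s in enumerate(reversed(path))]

    for o in obs[1:]:
        score, col = _step(score, o, log_trans, log_emit)
        back.append(col)
        # Trace the m surviving paths backwards together until they coalesce.
        alive, offset = set(col), len(back) - 1
        while len(alive) > 1 and offset > 0:
            alive = {back[offset - 1][k] for k in alive}
            offset -= 1
        if len(alive) == 1:
            # Every surviving path shares this prefix, so it can be output now.
            yield from trace(alive.pop(), offset)
            first += offset + 1
            back = back[offset + 1:]   # free the columns that are now decided
    # End of the sequence: decode the remaining window from the best final state.
    yield from trace(max(range(m), key=lambda k: score[k]), len(back))
```

Calling `list(online_viterbi(obs, log_init, log_trans, log_emit))` should give the same state sequence as `viterbi(obs, log_init, log_trans, log_emit)` with identical tie-breaking. The difference is that `online_viterbi` holds back-pointer columns only for the window since the last coalescence point. A practical implementation would avoid the repeated backward trace so that the time per step does not grow with the window length.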