A Mixture Model Based Markov Random Field for Discovering Patterns in Sequences
In this paper a new maximum a posteriori (MAP) approach based on mixtures of multinomials is proposed for discovering probabilistic patterns in sequences. The main advantage of the method is that, by using a Markov random field (MRF) prior, it bypasses the problem of overlapping patterns at neighboring positions of the sequences. The model consists of two components: the first models the pattern and the second the background. The Expectation-Maximization (EM) algorithm is used to estimate the model parameters and provides closed-form updates. Special care is also taken to overcome the known dependence of the EM algorithm on initialization: an adaptive clustering scheme based on the k-means algorithm is applied to produce good initial values for the pattern multinomial model. Experiments with artificial sets of sequences show that the proposed approach discovers qualitatively better patterns than maximum likelihood (ML) and Gibbs sampling (GS) approaches.
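To make the two-component structure concrete, the following is a minimal sketch of plain maximum-likelihood EM for a pattern/background mixture of multinomials over all length-W windows of the input sequences. It is an illustrative assumption, not the authors' MAP method: it has no MRF prior (so overlapping windows are scored independently), and the hypothetical `seed` word stands in for the paper's adaptive k-means initialization. The names `windows`, `em_two_component`, `consensus` and `planted_sequences`, the DNA alphabet, and the pseudocount value are all illustrative choices.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: DNA sequences


def windows(seqs, w):
    """All overlapping length-w substrings of every sequence."""
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]


def em_two_component(seqs, w, seed, n_iter=50, pseudo=0.1):
    """Plain ML EM for a two-component mixture of multinomials.

    Pattern component: position-specific profile theta[j][c].
    Background component: one multinomial phi[c] shared by all positions.
    Each window is treated as an independent draw; there is NO prior that
    discourages overlapping hits (the role the paper assigns to the MRF).
    """
    xs = windows(seqs, w)
    # Hypothetical initialisation from a seed word (stand-in for k-means).
    theta = [{c: (0.7 if c == seed[j] else 0.1) for c in ALPHABET} for j in range(w)]
    phi = {c: 1.0 / len(ALPHABET) for c in ALPHABET}
    lam = 0.1  # mixing weight: prior probability that a window is a pattern hit
    for _ in range(n_iter):
        # E-step: responsibility of the pattern component, computed in log space.
        zs = []
        for x in xs:
            lp = math.log(lam) + sum(math.log(theta[j][c]) for j, c in enumerate(x))
            lb = math.log(1.0 - lam) + sum(math.log(phi[c]) for c in x)
            m = max(lp, lb)
            ep, eb = math.exp(lp - m), math.exp(lb - m)
            zs.append(ep / (ep + eb))
        # M-step: closed-form weighted-count updates with pseudocounts.
        tot = sum(zs)
        lam = min(max(tot / len(zs), 1e-6), 1.0 - 1e-6)  # keep logs finite
        theta = []
        for j in range(w):
            counts = {c: pseudo for c in ALPHABET}
            for x, z in zip(xs, zs):
                counts[x[j]] += z
            norm = tot + pseudo * len(ALPHABET)
            theta.append({c: counts[c] / norm for c in ALPHABET})
        bg = {c: pseudo for c in ALPHABET}
        for x, z in zip(xs, zs):
            for c in x:
                bg[c] += 1.0 - z
        norm = sum(bg.values())
        phi = {c: bg[c] / norm for c in ALPHABET}
    return theta, phi, lam


def consensus(theta):
    """Most probable letter at each pattern position."""
    return "".join(max(col, key=col.get) for col in theta)


def planted_sequences(n=30, length=40, motif="TATAAT", seed=0):
    """Hypothetical artificial data: random sequences, one planted motif each."""
    rng = random.Random(seed)
    seqs = []
    for _ in range(n):
        s = [rng.choice(ALPHABET) for _ in range(length)]
        pos = rng.randrange(length - len(motif) + 1)
        s[pos:pos + len(motif)] = motif
        seqs.append("".join(s))
    return seqs


if __name__ == "__main__":
    seqs = planted_sequences()
    theta, phi, lam = em_two_component(seqs, w=6, seed="TATAAT")
    print(consensus(theta), round(lam, 3))
```

Because every window is scored on its own, windows shifted by one or two positions from a true occurrence can still receive non-trivial responsibility. This is the overlap problem the abstract describes, and a neighborhood prior such as the paper's MRF would act on the E-step responsibilities to suppress it; that prior is deliberately not modeled here.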
Keywords: Pattern discovering; Markov random field; mixture of multinomials model; Expectation-Maximization (EM) algorithm