A Mixture Model Based Markov Random Field for Discovering Patterns in Sequences

  • Konstantinos Blekas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3955)


In this paper a new maximum a posteriori (MAP) approach based on mixtures of multinomials is proposed for discovering probabilistic patterns in sequences. The main advantage of the method is its ability to bypass the problem of overlapping patterns at neighboring positions of sequences through a Markov random field (MRF) prior. The model consists of two components: the first models the pattern and the second the background. The Expectation-Maximization (EM) algorithm is used to estimate the model parameters and provides closed-form updates. Special care is also taken to overcome the known dependence of the EM algorithm on its initialization, by applying an adaptive clustering scheme based on the k-means algorithm to produce good initial values for the pattern multinomial model. Experiments with artificial sets of sequences show that the proposed approach discovers qualitatively better patterns than maximum likelihood (ML) and Gibbs sampling (GS) approaches.
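The base model underlying this family of methods is a two-component mixture over fixed-width sequence windows: a position-specific multinomial for the pattern and a single multinomial for the background, with parameters estimated by EM in closed form. The following is a minimal maximum-likelihood sketch of that base mixture (without the MRF prior or the k-means initialization described in the abstract); the function name `em_motif`, the DNA alphabet of four symbols, and all parameter choices are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def em_motif(windows, W, A=4, n_iter=50, seed=0):
    """Fit a two-component mixture to length-W windows by EM.

    Component 1: position-specific multinomials theta (the pattern).
    Component 2: a single background multinomial bg.
    windows: int array of shape (N, W) with symbol indices in [0, A).
    """
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(np.ones(A), size=W)   # pattern model, shape (W, A)
    bg = np.full(A, 1.0 / A)                    # background multinomial
    pi = 0.5                                    # mixing weight of the pattern
    for _ in range(n_iter):
        # E-step: responsibility of the pattern component for each window,
        # computed in log space for numerical stability.
        lp = np.log(theta[np.arange(W), windows]).sum(axis=1)   # (N,)
        lb = np.log(bg[windows]).sum(axis=1)                    # (N,)
        r = 1.0 / (1.0 + np.exp(lb + np.log(1 - pi) - lp - np.log(pi)))
        # M-step: closed-form updates (lightly smoothed to avoid zeros).
        for w in range(W):
            counts = np.zeros(A)
            np.add.at(counts, windows[:, w], r)
            theta[w] = (counts + 1e-6) / (counts.sum() + A * 1e-6)
        bgc = np.zeros(A)
        np.add.at(bgc, windows.ravel(), np.repeat(1 - r, W))
        bg = (bgc + 1e-6) / (bgc.sum() + A * 1e-6)
        pi = r.mean()
    return theta, bg, pi
```

Because the background component is position-independent, any position-dependent signal planted in the windows is absorbed by `theta`, which is what makes the closed-form M-step updates recover a motif. The paper's contribution replaces this independent-window assumption with an MRF prior on the window labels, penalizing overlapping pattern assignments at neighboring positions.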


Keywords: Pattern discovery · Markov random field · Mixture of multinomials model · Expectation-Maximization (EM) algorithm





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Konstantinos Blekas
  1. Department of Computer Science, University of Ioannina, Ioannina, Greece
