Data Mining and Knowledge Discovery, Volume 17, Issue 3, pp 431–456

Parallell interacting MCMC for learning of topologies of graphical models

  • Jukka Corander
  • Magnus Ekdahl
  • Timo Koski


Automated statistical learning of graphical models from data has attracted considerable interest in the machine learning and related literature. Many authors have discussed and/or demonstrated the need for consistent stochastic search methods that are less prone to yielding locally optimal model structures than simple greedy methods. At the same time, however, most stochastic search methods are based on standard Metropolis–Hastings theory, which necessitates the use of relatively simple random proposals and prevents the utilization of intelligent and efficient search operators. Here we derive an algorithm for learning topologies of graphical models from samples of a finite set of discrete variables by utilizing and further enhancing a recently introduced theory for non-reversible parallel interacting Markov chain Monte Carlo-style computation. In particular, we illustrate how the non-reversible approach allows for a novel type of creativity in the design of search operators. The parallel aspect of our method also illustrates how the adaptive nature of the search operators helps to avoid trapping states in the vicinity of locally optimal network topologies.


Keywords: MCMC, Equivalence search, Learning graphical models




Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Mathematics, Åbo Akademi University, Åbo, Finland
  2. Department of Mathematics, Linköping University, Linköping, Sweden
  3. Department of Mathematics, Royal Institute of Technology, Stockholm, Sweden
