Bayesian Learning with Mixtures of Trees

  • Jussi Kollin
  • Mikko Koivisto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


We present a Bayesian method for learning mixtures of graphical models. In particular, we focus on data clustering with a tree-structured model for each cluster. We use a Markov chain Monte Carlo method to draw a sample of clusterings, while the likelihood of each clustering is computed by exact averaging over the model class, including the dependency structure on the variables. Experiments on synthetic data show that this method usually outperforms the expectation–maximization algorithm by Meilă and Jordan [1] when the number of observations is small (hundreds) and the number of variables is large (dozens). We apply the method to study how much information single nucleotide polymorphisms carry about the structure of human populations.
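
The abstract does not spell out how the averaging over dependency structures is carried out. One standard device for sums of this kind, used here purely as an illustrative assumption, is the weighted matrix tree theorem: the sum over all spanning trees of the product of edge weights equals the determinant of the weighted graph Laplacian with one row and column removed. The sketch below checks that identity against brute-force enumeration on a small graph. The function names and the example weights are hypothetical; in a Bayesian tree model the weights would presumably be derived from pairwise marginal likelihoods and an edge-factorized prior, which is not modeled here.

```python
from fractions import Fraction
from itertools import combinations


def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def tree_weight_sum(w):
    """Sum over all spanning trees of the product of edge weights w[u][v].

    w is a symmetric n-by-n matrix; diagonal entries are ignored.
    Uses the matrix tree theorem on the Laplacian with the last node removed.
    """
    n = len(w)
    if n == 1:
        return Fraction(1)
    reduced = [[Fraction(0)] * (n - 1) for _ in range(n - 1)]
    for u in range(n - 1):
        for v in range(n):
            if u == v:
                continue
            reduced[u][u] += Fraction(w[u][v])  # weighted degree of u
            if v < n - 1:
                reduced[u][v] -= Fraction(w[u][v])
    return determinant(reduced)


def brute_force_sum(w):
    """Same quantity by enumerating every set of n-1 edges (small n only)."""
    n = len(w)
    edges = list(combinations(range(n), 2))
    total = Fraction(0)
    for subset in combinations(edges, n - 1):
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                x = parent[x]
            return x

        product = Fraction(1)
        is_tree = True
        for u, v in subset:
            root_u, root_v = find(u), find(v)
            if root_u == root_v:  # edge would close a cycle
                is_tree = False
                break
            parent[root_u] = root_v
            product *= Fraction(w[u][v])
        if is_tree:
            total += product
    return total


if __name__ == "__main__":
    # Hypothetical symmetric edge weights on four variables.
    weights = [[0, 1, 2, 3],
               [1, 0, 4, 5],
               [2, 4, 0, 6],
               [3, 5, 6, 0]]
    print(tree_weight_sum(weights) == brute_force_sum(weights))
```

The determinant route costs a polynomial number of arithmetic operations in the number of variables, whereas the enumeration grows with the number of spanning trees, so the enumeration is only useful as a check on small cases.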


Keywords: Mixture Model; Markov Chain Monte Carlo; Bayesian Network; Markov Chain Monte Carlo Method; Dirichlet Process


References

  1. Meilă, M., Jordan, M.I.: Learning with mixtures of trees. Journal of Machine Learning Research 1, 1–48 (2000)
  2. Efron, B.: Bootstrap methods: another look at the jackknife. Annals of Statistics 7, 1–26 (1979)
  3. Alfaro, M.E., Zoller, S., Lutzoni, F.: Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Molecular Biology and Evolution 20, 255–266 (2003)
  4. Diebolt, J., Robert, C.P.: Estimation of finite mixture distributions through Bayesian sampling. J. Royal Statistical Society B 56, 363–375 (1994)
  5. Jasra, A., Holmes, C.C., Stephens, D.A.: Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modelling. Statistical Science 20, 50–67 (2005)
  6. Celeux, G., Hurn, M., Robert, C.P.: Computational and inferential difficulties with mixture posterior distributions. J. Amer. Statist. Assoc. 95, 957–970 (2000)
  7. Neal, R.M.: Markov chain sampling method for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9, 249–265 (2000)
  8. Rasmussen, C.E.: The Infinite Gaussian Mixture Model. In: Solla, S.A., Leen, T.K., Müller, K.R. (eds.) NIPS 12, pp. 554–560. The MIT Press, Cambridge (2000)
  9. Dawson, K.J., Belkhir, K.: A Bayesian approach to the identification of panmictic population and the assignment of individuals. Genetical Research 78, 59–77 (2001)
  10. Corander, J., Waldman, P., Sillanpää, M.J.: Bayesian analysis of genetic differentiation between populations. Genetics 163, 367–374 (2003)
  11. Meilă, M., Jaakkola, T.: Tractable Bayesian learning of tree belief networks. In: Boutilier, C., Goldszmidt, M. (eds.) UAI, pp. 380–388. Morgan Kaufmann, San Francisco (2000)
  12. Hinds, D.A., et al.: Whole-genome patterns of common DNA variation in three human populations. Science 307, 1072–1079 (2005)
  13. Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning 20, 197–243 (1995)
  14. Friedman, N., Koller, D.: Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Machine Learning 50(1–2), 95–125 (2003)
  15. Koivisto, M., Sood, K.: Exact Bayesian structure discovery in Bayesian networks. Journal of Machine Learning Research 5, 549–573 (2004)
  16. Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 14, 462–467 (1968)
  17. Meilă, M., Jaakkola, T.: Tractable Bayesian learning of tree belief networks. Technical Report CMU-RI-TR-00-15, Carnegie Mellon University Robotics Institute (2000)
  18. Cerquides, J., de Mántaras, R.L.: TAN classifiers based on decomposable distributions. Machine Learning 59, 1–32 (2005)
  19. Kaltofen, E., Villard, G.: On the complexity of computing determinants. Computational Complexity 13, 91–130 (2004)
  20. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970)
  21. Jaccard, P.: The distribution of the flora in the Alpine zone. The New Phytologist XI, 37–50 (1912)
  22. Meilă, M.: Comparing Clusterings by the Variation of Information. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT/Kernel 2003. LNCS, vol. 2777, pp. 173–187. Springer, Heidelberg (2003)
  23. Thomas, A., Camp, N.J.: Graphical modeling of the joint distribution of alleles at associated loci. American Journal of Human Genetics 74, 1088–1101 (2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jussi Kollin (1)
  • Mikko Koivisto (1)
  1. HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, Finland
