Efficiently Approximating Markov Tree Bagging for High-Dimensional Density Estimation

  • François Schnitzler
  • Sourour Ammar
  • Philippe Leray
  • Pierre Geurts
  • Louis Wehenkel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)

Abstract

We consider algorithms for generating Mixtures of Bagged Markov Trees for density estimation. In problems defined over many variables and with few available observations, such mixtures generally outperform a single Markov tree maximizing the data likelihood, but they are far more expensive to compute. In this paper, we describe new algorithms for approximating such models, with the aim of speeding up learning without sacrificing accuracy. More specifically, we propose a filtering step, obtained as a by-product of computing a first Markov tree, that avoids considering poor candidate edges in the subsequently generated trees. We compare these algorithms (on synthetic data sets) to Mixtures of Bagged Markov Trees, to a single Markov tree derived by the classical Chow-Liu algorithm, and to a recently proposed randomized scheme for building tree mixtures.
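
To make the filtering idea concrete, the sketch below builds a bagged mixture of Chow-Liu trees and reuses the mutual-information scores computed for the first tree to prune weak candidate edges before learning the remaining trees. This is a minimal illustration under our own assumptions (integer-coded discrete data as NumPy arrays), not the authors' implementation: all names (`mutual_information`, `chow_liu_tree`, `keep_fraction`) are hypothetical, and the exact filtering criterion used in the paper may differ.

```python
import numpy as np
from itertools import combinations
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree


def mutual_information(x, y):
    """Empirical mutual information between two integer-coded variables."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1)
    joint /= len(x)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # avoid log(0); zero cells contribute nothing
    return float((joint[nz] * np.log(joint[nz] / (px * py)[nz])).sum())


def chow_liu_tree(data, candidate_edges):
    """Chow-Liu: maximum-weight spanning tree (or forest, if the
    candidate graph is disconnected) under mutual-information weights."""
    n = data.shape[1]
    weights = {(i, j): mutual_information(data[:, i], data[:, j])
               for i, j in candidate_edges}
    # SciPy computes a *minimum* spanning tree, so negate the weights;
    # the epsilon keeps zero-MI edges from being dropped as absent.
    rows, cols, vals = zip(*[(i, j, -(w + 1e-12))
                             for (i, j), w in weights.items()])
    graph = csr_matrix((vals, (rows, cols)), shape=(n, n))
    mst = minimum_spanning_tree(graph).tocoo()
    return list(zip(mst.row.tolist(), mst.col.tolist())), weights


def bagged_tree_mixture(data, n_trees=10, keep_fraction=0.2, seed=0):
    """Structures of a uniform mixture of Markov trees, each learned on a
    bootstrap replicate. Mutual-information scores from the first tree
    are reused to filter out weak candidate edges for the others."""
    rng = np.random.default_rng(seed)
    n_samples, n_vars = data.shape
    all_edges = list(combinations(range(n_vars), 2))

    # First tree: full quadratic edge set on a bootstrap replicate.
    boot = data[rng.integers(n_samples, size=n_samples)]
    first_tree, weights = chow_liu_tree(boot, all_edges)
    trees = [first_tree]

    # Filtering step: keep only the strongest edges as candidates.
    ranked = sorted(all_edges, key=weights.get, reverse=True)
    candidates = ranked[:max(n_vars - 1, int(keep_fraction * len(ranked)))]

    for _ in range(n_trees - 1):
        boot = data[rng.integers(n_samples, size=n_samples)]
        tree, _ = chow_liu_tree(boot, candidates)
        trees.append(tree)
    return trees  # each tree gets weight 1 / n_trees in the mixture
```

In this sketch the first tree still pays the full quadratic number of mutual-information computations; the saving would come from the remaining trees, each of which only scores the retained candidate edges. Estimation of the per-edge conditional distributions is omitted.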

Keywords

mixture models · Markov trees · bagging · randomization


References

  1. Aliferis, C., Statnikov, A., Tsamardinos, I., Mani, S., Koutsoukos, X.: Local causal and Markov blanket induction for causal discovery and feature selection for classification, part I: Algorithms and empirical evaluation. JMLR 11, 171–234 (2010)
  2. Ammar, S., Leray, P., Defourny, B., Wehenkel, L.: Probability density estimation by perturbing and combining tree structured Markov networks. In: Sossai, C., Chemello, G. (eds.) ECSQARU 2009. LNCS, vol. 5590, pp. 156–167. Springer, Heidelberg (2009)
  3. Ammar, S., Leray, P., Schnitzler, F., Wehenkel, L.: Sub-quadratic Markov tree mixture learning based on randomizations of the Chow-Liu algorithm. In: The Fifth European Workshop on Probabilistic Graphical Models, pp. 17–24 (2010)
  4. Ammar, S., Leray, P., Wehenkel, L.: Sub-quadratic Markov tree mixture models for probability density estimation. In: 19th International Conference on Computational Statistics (COMPSTAT 2010), pp. 673–680 (2010)
  5. Auvray, V., Wehenkel, L.: On the construction of the inclusion boundary neighbourhood for Markov equivalence classes of Bayesian network structures. In: Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, pp. 26–35 (2002)
  6. Auvray, V., Wehenkel, L.: Learning inclusion-optimal chordal graphs. In: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, pp. 18–25 (2008)
  7. Bach, F.R., Jordan, M.I.: Thin junction trees. In: Advances in Neural Information Processing Systems, vol. 14, pp. 569–576. MIT Press, Cambridge (2001)
  8. Breiman, L.: Arcing classifiers. Tech. rep., Dept. of Statistics, University of California (1996)
  9. Chazelle, B.: A minimum spanning tree algorithm with inverse-Ackermann type complexity. J. ACM 47(6), 1028–1047 (2000)
  10. Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 14, 462–467 (1968)
  11. Cooper, G.: The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence 42(2–3), 393–405 (1990)
  12. Efron, B., Tibshirani, R.: An Introduction to the Bootstrap. Chapman & Hall, Boca Raton (1993)
  13. Elidan, G.: Bagged structure learning of Bayesian networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (2011)
  14. Elidan, G., Gould, S.: Learning bounded treewidth Bayesian networks. JMLR 9, 2699–2731 (2008)
  15. Friedman, N., Goldszmidt, M., Wyner, A.: Data analysis with Bayesian networks: A bootstrap approach. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 196–205 (1999)
  16. Friedman, N., Nachman, I., Pe'er, D.: Learning Bayesian network structure from massive datasets: The "sparse candidate" algorithm. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 206–215 (1999)
  17. Kirshner, S., Smyth, P.: Infinite mixtures of trees. In: ICML 2007: Proceedings of the 24th International Conference on Machine Learning, pp. 417–423. ACM, New York (2007)
  18. Koller, D., Friedman, N.: Probabilistic Graphical Models. MIT Press, Cambridge (2009)
  19. Kruskal, J.B.: On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society 7(1), 48–50 (1956)
  20. Kullback, S., Leibler, R.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
  21. Kumar, M.P., Koller, D.: Learning a small mixture of trees. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1051–1059 (2009)
  22. Kwisthout, J.H., Bodlaender, H.L., van der Gaag, L.: The necessity of bounded treewidth for efficient inference in Bayesian networks. In: The 19th European Conference on Artificial Intelligence, pp. 623–626 (2010)
  23. Liu, H., Xu, M., Gu, H., Gupta, A., Lafferty, J., Wasserman, L.: Forest density estimation. JMLR 12, 907–951 (2011)
  24. Madigan, D., Raftery, A., Wermuth, N., York, J., Zucchini, W.: Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association 89, 1535–1546 (1994)
  25. Meila, M., Jordan, M.: Learning with mixtures of trees. JMLR 1, 1–48 (2001)
  26. Meila, M., Jaakkola, T.: Tractable Bayesian learning of tree belief networks. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 380–388. Morgan Kaufmann, San Francisco (2000)
  27. Ormoneit, D., Tresp, V.: Improved Gaussian mixture density estimates using Bayesian penalty terms and network averaging. In: Advances in Neural Information Processing Systems, pp. 542–548. MIT Press, Cambridge (1995)
  28. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco (1988)
  29. Robinson, R.W.: Counting unlabeled acyclic digraphs. In: Combinatorial Mathematics V. Lecture Notes in Mathematics, vol. 622, pp. 28–43. Springer, Heidelberg (1977)
  30. Schnitzler, F., Leray, P., Wehenkel, L.: Towards sub-quadratic learning of probability density models in the form of mixtures of trees. In: 18th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2010), Bruges, Belgium, pp. 219–224 (2010)
  31. Shahaf, D., Chechetka, A., Guestrin, C.: Learning thin junction trees via graph cuts. In: Artificial Intelligence and Statistics (AISTATS), pp. 113–120 (2009)
  32. Tarjan, R.E.: Data Structures and Network Algorithms. Society for Industrial and Applied Mathematics, Philadelphia (1983)
  33. Wehenkel, L.: Decision tree pruning using an additive information quality measure. In: Uncertainty in Intelligent Systems, pp. 397–411 (1993)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • François Schnitzler (1)
  • Sourour Ammar (2)
  • Philippe Leray (2)
  • Pierre Geurts (1)
  • Louis Wehenkel (1)

  1. Department of EECS and GIGA-Research, Université de Liège, Liège, Belgium
  2. Knowledge and Decision Team, Laboratoire d'Informatique de Nantes Atlantique (LINA) UMR 6241, Ecole Polytechnique de l'Université de Nantes, France
