Sum–product graphical models

  • Mattia Desana
  • Christoph Schnörr


This paper introduces a probabilistic architecture called the sum–product graphical model (SPGM). SPGMs represent a class of probability distributions that combines, for the first time, the semantics of probabilistic graphical models (GMs) with the evaluation efficiency of sum–product networks (SPNs). Like SPNs, SPGMs always enable tractable inference using a class of models that incorporates context-specific independence. Like GMs, SPGMs provide a high-level model interpretation in terms of conditional independence assumptions and the corresponding factorizations. This approach thus provides new connections between the fields of SPNs and GMs, and enables a high-level interpretation of the family of distributions encoded by SPNs. We provide two applications of SPGMs in density estimation, with empirical results close to or surpassing state-of-the-art models. The theoretical and practical results demonstrate that jointly exploiting properties of SPNs and GMs is an interesting direction for future research.


Keywords: Sum product networks; Probabilistic graphical models; Density estimation; Deep learning; Exact inference



Support of the German Science Foundation, Grant GRK 1653, is gratefully acknowledged.



Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Heidelberg Collaboratory for Image Processing (HCI), Heidelberg University, Heidelberg, Germany
  2. Image and Pattern Analysis Group (IPA), Heidelberg University, Heidelberg, Germany
