## Abstract

This paper introduces a probabilistic architecture called the sum–product graphical model (SPGM). SPGMs represent a class of probability distributions that combines, for the first time, the semantics of probabilistic graphical models (GMs) with the evaluation efficiency of sum–product networks (SPNs). Like SPNs, SPGMs always enable tractable inference, using a class of models that incorporate context-specific independence. Like GMs, SPGMs provide a high-level model interpretation in terms of conditional independence assumptions and the corresponding factorizations. This approach therefore establishes new connections between the fields of SPNs and GMs, and enables a high-level interpretation of the family of distributions encoded by SPNs. We present two applications of SPGMs to density estimation, with empirical results close to or surpassing those of state-of-the-art models. The theoretical and practical results demonstrate that jointly exploiting the properties of SPNs and GMs is an interesting direction for future research.
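
To illustrate what the "evaluation efficiency" of SPNs refers to, the following is a minimal sketch of a generic, tree-shaped SPN over two binary variables. It is not the SPGM construction introduced in this paper, and the names `Leaf`, `Product`, `Sum`, `evaluate` and `spn` are hypothetical, chosen only for illustration. Because product-node children cover disjoint variables and sum-node children cover the same variables, any joint or marginal probability can be computed in one bottom-up pass, with summed-out variables handled by replacing their leaves with their total mass.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Leaf:
    """Hypothetical univariate leaf over a binary variable: (P(var=0), P(var=1))."""
    var: str
    probs: Tuple[float, float]


@dataclass
class Product:
    """Product node; children are assumed to have disjoint scopes (decomposability)."""
    children: List["Node"]


@dataclass
class Sum:
    """Sum node; nonnegative weights, children share one scope (completeness)."""
    weights: List[float]
    children: List["Node"]


Node = Union[Leaf, Product, Sum]


def evaluate(node: Node, evidence: Dict[str, Optional[int]]) -> float:
    """Single bottom-up pass; variables absent from `evidence` are summed out."""
    if isinstance(node, Leaf):
        value = evidence.get(node.var)
        if value is None:
            # Marginalize this leaf by summing over both of its states.
            return sum(node.probs)
        return node.probs[value]
    if isinstance(node, Product):
        result = 1.0
        for child in node.children:
            result *= evaluate(child, evidence)
        return result
    # Sum node: weighted mixture of the children's values.
    return sum(w * evaluate(c, evidence) for w, c in zip(node.weights, node.children))


# Illustrative SPN: a mixture of two fully factorized distributions over X1, X2.
spn = Sum(
    weights=[0.3, 0.7],
    children=[
        Product([Leaf("X1", (0.9, 0.1)), Leaf("X2", (0.2, 0.8))]),
        Product([Leaf("X1", (0.4, 0.6)), Leaf("X2", (0.5, 0.5))]),
    ],
)

if __name__ == "__main__":
    print(evaluate(spn, {"X1": 1, "X2": 1}))  # joint P(X1=1, X2=1)
    print(evaluate(spn, {"X1": 1}))           # marginal P(X1=1), X2 summed out
```

Up to floating-point rounding, the joint query gives 0.3·0.1·0.8 + 0.7·0.6·0.5 = 0.234 and the marginal gives 0.3·0.1 + 0.7·0.6 = 0.45, each in a single pass. In this tree-shaped sketch every node is visited once; an SPN with shared sub-networks (a DAG) would cache node values to keep the pass linear in the network size.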

## Keywords

Sum–product networks, Probabilistic graphical models, Density estimation, Deep learning, Exact inference

## Notes

### Acknowledgements

Support of the German Science Foundation, Grant GRK 1653, is gratefully acknowledged.
