Learning Bayesian networks with low inference complexity

Progress in Artificial Intelligence

Abstract

One of the main current research topics in machine learning is the improvement of the inference and learning processes in probabilistic graphical models. Traditionally, inference and learning have been treated separately, but since the structure of the model determines the complexity of inference, learning methods that ignore this complexity may produce models in which inference is inefficient. In this paper we propose a framework for learning Bayesian networks with low inference complexity. For that, we use a representation of the network factorization that allows us to efficiently evaluate an upper bound on the inference complexity of each model during the learning process. Experimental results show that the proposed methods obtain tractable models that improve the accuracy of the predictions provided by approximate inference in models obtained with a well-known Bayesian network learner.
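As a rough illustration of the overall approach (a sketch only, not the algorithms proposed in the paper), the snippet below shows a score-based hill-climbing search that accepts a candidate structural change only if an upper bound on the inference complexity of the resulting model stays below a user-given limit. The callables score, complexity_bound, candidate_ops and apply_op are hypothetical placeholders for a decomposable score, a tractability bound, the neighbourhood generator and the operator application.

```python
# Minimal sketch (not the paper's algorithm): hill climbing over Bayesian
# network structures that rejects any move whose estimated inference-complexity
# bound exceeds `max_bound`.  All callables passed in are hypothetical.

def learn_tractable_bn(data, variables, max_bound,
                       score, complexity_bound, candidate_ops, apply_op):
    bn = {v: set() for v in variables}          # parent sets, initially empty
    best = score(bn, data)
    improved = True
    while improved:
        improved = False
        for op in candidate_ops(bn):            # add / remove / reverse arc
            cand = apply_op(bn, op)
            if cand is None:                    # e.g. the move would create a cycle
                continue
            if complexity_bound(cand) > max_bound:
                continue                        # keep only tractable candidates
            s = score(cand, data)
            if s > best:
                bn, best, improved = cand, s, True
                break                           # greedy: take the first improvement
    return bn
```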



Author information

Corresponding author

Correspondence to Marco Benjumeda.

Additional information

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness through the Cajal Blue Brain (C080020-09; the Spanish partner of the Blue Brain initiative from EPFL) and TIN2013-41592-P projects, by the Regional Government of Madrid through the S2013/ICE-2845-CASI-CAM-CM project, and by the European Union’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 604102 (Human Brain Project). M. Benjumeda is supported by a predoctoral contract for the formation of doctors from the Spanish Ministry of Economy and Competitiveness (BES-2014-068637).

Appendix: Proof of Theorem 1

This work relies heavily on Theorem 1, which ensures that the proposed incremental compilation and optimization methods always produce sound PTs. To demonstrate the soundness of a PT \({\mathcal {P}}\) with respect to a BN \({\mathcal {B}}\), we show that for each node \(X_i\) of \({\mathcal {P}}\), every parent of \(X_i\) in \({\mathcal {B}}\) is a predecessor of \(X_i\) in \({\mathcal {P}}\). In this appendix we provide a proof of Theorem 1.
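The soundness condition above can be checked directly. The sketch below is an illustration only (not code from the paper): a PT is represented by a parent-pointer map over {*} ∪ X, so the predecessors of a node are its ancestors in the tree, and a BN is represented by its parent sets.

```python
# Illustrative soundness check (not from the paper): a PT is a map from each
# node to its parent in the tree, with the root "*" mapped to None; a BN is a
# map from each variable to its set of parents.

def predecessors(pt_parent, node):
    """Ancestors of `node` in the PT, excluding the root '*'."""
    preds, cur = set(), pt_parent[node]
    while cur is not None and cur != "*":
        preds.add(cur)
        cur = pt_parent[cur]
    return preds

def is_sound(pt_parent, bn_parents):
    """True iff every parent in the BN is a predecessor in the PT."""
    return all(bn_parents[x] <= predecessors(pt_parent, x) for x in bn_parents)

# Example: B has arcs A -> C and B -> C; the chain * - A - B - C is sound.
pt = {"*": None, "A": "*", "B": "A", "C": "B"}
bn = {"A": set(), "B": set(), "C": {"A", "B"}}
assert is_sound(pt, bn)
```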

Lemma 1

Let \({\mathcal {P}}\) be a PT over \({\mathcal {X}}_P = \{*\} \cup {\mathcal {X}}\) and \({\mathcal {B}}\) be a Bayesian network over \({\mathcal {X}}\), and suppose that adding arc \(X_{\mathrm{out}}\rightarrow X_{\mathrm{in}}\) to \({\mathcal {B}}\) does not produce a cycle. If \({\mathcal {P}}\) is sound with respect to \({\mathcal {B}}\), then the PT \({\mathcal {P}}'\) obtained after applying addArc\(({\mathcal {B}},{\mathcal {P}},X_{\mathrm{out}},X_{\mathrm{in}})\) is also sound with respect to \({\mathcal {B}}'\), where \({\mathcal {B}}'\) is the result of adding arc \(X_{\mathrm{out}}\rightarrow X_{\mathrm{in}}\) to \({\mathcal {B}}\).

Proof

The structure of \({\mathcal {P}}'\) depends on the precedence relationship between \(X_{\mathrm{out}}\) and \(X_{\mathrm{in}}\) in \({\mathcal {P}}\); we distinguish three cases, summarized in the sketch after this case analysis.

  • \(X_{\mathrm{out}}\in \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{in}})\): there are no changes in the structure of \({\mathcal {P}}\). \(\forall X_i \in {\mathcal {X}} {\setminus } \{X_{\mathrm{in}}\}\), \(\mathbf{pa }_{\mathcal {B}'}(X_i) = \mathbf{pa }_{\mathcal {B}}(X_i)\) and \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i)\), so \(X_i\) is sound. \(X_{\mathrm{in}}\) is also sound because \(\mathbf{pa }_{\mathcal {B}'}(X_{\mathrm{in}}) = \mathbf{pa }_{\mathcal {B}}(X_{\mathrm{in}}) \cup \{X_{\mathrm{out}}\}\), \(\mathbf{pred }_{\mathcal {P}'}(X_{\mathrm{in}}) = \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{in}})\) and \(X_{\mathrm{out}}\in \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{in}})\).

  • \(X_{\mathrm{in}}\in \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{out}})\): the nodes that are not descendants of \(X_{\mathrm{in}}\) in \({\mathcal {P}}\) do not change. \(\forall X_i \in {\mathcal {X}} {\setminus } (\mathbf{desc }_{\mathcal {P}}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\})\), \(\mathbf{pa }_{\mathcal {B}'}(X_i) = \mathbf{pa }_{\mathcal {B}}(X_i)\) and \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i)\); thus, \(X_i\) is sound. \(X_{\mathrm{out}}\) and its descendants in \({\mathcal {P}}'\) that are not descendants of \(X_{\mathrm{in}}\) have fewer predecessors in \({\mathcal {P}}'\) than in \({\mathcal {P}}\): \(\forall X_i \in (\mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{out}}) \cup \{X_{\mathrm{out}}\}) {\setminus } (\mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\})\), since \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i) {\setminus } (\mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\})\), \(\mathbf{pa }_{\mathcal {B}'}(X_i) = \mathbf{pa }_{\mathcal {B}}(X_i)\) and \(\mathbf{pa }_{\mathcal {B}'}(X_i) \cap (\mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\}) = \varnothing \), \(X_i\) is sound. Finally, \(X_{\mathrm{in}}\) and its descendants gain \(X_{\mathrm{out}}\) as a predecessor in \({\mathcal {P}}'\): \(\forall X_i \in \mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\}\), \(\mathbf{pred }_{\mathcal {P}'}(X_i) \supseteq \mathbf{pred }_{\mathcal {P}}(X_i) \cup \{X_{\mathrm{out}}\}\) and \(\mathbf{pa }_{\mathcal {B}'}(X_i) \subseteq \mathbf{pa }_{\mathcal {B}}(X_i) \cup \{X_{\mathrm{out}}\}\), so \(X_i\) is sound.

  • \(X_{\mathrm{out}}\notin \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{in}})\) and \(X_{\mathrm{in}}\notin \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{out}})\): \(X_{\mathrm{out}}\) and its predecessors in \({\mathcal {P}}\) are set as predecessors of \(X_{\mathrm{in}}\) in \({\mathcal {P}}'\). \(\forall X_i \notin \mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\}, \mathbf{pa }_{\mathcal {B}'}(X_i) = \mathbf{pa }_{\mathcal {B}}(X_i)\) and \(\mathbf{pred }_{\mathcal {P}'}(X_i) \supseteq \mathbf{pred }_{\mathcal {P}}(X_i)\). Hence \(X_i\) is sound. \(\forall X_i \in \mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\}\), \(\mathbf{pa }_{\mathcal {B}'}(X_i) \subseteq \mathbf{pa }_{\mathcal {B}}(X_i) \cup \{X_{\mathrm{out}}\}\) and \(\mathbf{pred }_{\mathcal {P}'}(X_i) \supseteq \mathbf{pred }_{\mathcal {P}}(X_i) \cup \{X_{\mathrm{out}}\}\). Therefore \(X_i\) is sound.
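The sketch below (again an illustration, reusing the parent-pointer PT representation and the predecessors helper from the previous sketch) makes the case split concrete by reporting which of the three situations applies to a candidate arc \(X_{\mathrm{out}}\rightarrow X_{\mathrm{in}}\); the restructuring performed by the paper's addArc is not reproduced here.

```python
# Case split from the proof of Lemma 1 (the actual addArc restructuring is not
# reproduced here).  Relies on `predecessors` from the sketch above.

def add_arc_case(pt_parent, x_out, x_in):
    if x_out in predecessors(pt_parent, x_in):
        # Case 1: the PT already encodes the new dependence; nothing to do.
        return "X_out already precedes X_in: PT unchanged"
    if x_in in predecessors(pt_parent, x_out):
        # Case 2: X_out has to be moved above X_in in the PT.
        return "X_in precedes X_out: X_out must be moved above X_in"
    # Case 3: independent branches; X_out and its predecessors become
    # predecessors of X_in.
    return "independent branches: X_out and its predecessors will precede X_in"
```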

Lemma 2

Let \({\mathcal {P}}\) be a PT over \({\mathcal {X}}_P = \{*\} \cup {\mathcal {X}}\) and \({\mathcal {B}}\) be a Bayesian network over \({\mathcal {X}}\). If \({\mathcal {P}}\) is sound with respect to \({\mathcal {B}}\), then the PT \({\mathcal {P}}'\) obtained after applying removeArc\(({\mathcal {B}},{\mathcal {P}},X_{\mathrm{out}},X_{\mathrm{in}})\) is also sound with respect to \({\mathcal {B}}'\), where \({\mathcal {B}}'\) is the result of removing arc \(X_{\mathrm{out}}\rightarrow X_{\mathrm{in}}\) from \({\mathcal {B}}\).

Proof

\(\forall X_i \in {\mathcal {X}}\), \(\mathbf{pa }_{\mathcal {B}'}(X_i) \subseteq \mathbf{pa }_{\mathcal {B}}(X_i)\) and \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i)\), so \(X_i\) is sound.

Lemma 3

Let \({\mathcal {P}}\) be a PT over \({\mathcal {X}}_P = \{*\} \cup {\mathcal {X}}\) and \({\mathcal {B}}\) be a Bayesian network over \({\mathcal {X}}\), and suppose that reversing arc \(X_{\mathrm{out}}\rightarrow X_{\mathrm{in}}\) (i.e., adding arc \(X_{\mathrm{in}}\rightarrow X_{\mathrm{out}}\)) does not produce a cycle. If \({\mathcal {P}}\) is sound with respect to \({\mathcal {B}}\), then the PT \({\mathcal {P}}'\) obtained after applying reverseArc\(({\mathcal {B}},{\mathcal {P}},X_{\mathrm{out}},X_{\mathrm{in}})\) is also sound with respect to \({\mathcal {B}}'\), where \({\mathcal {B}}'\) is the result of reversing arc \(X_{\mathrm{out}}\rightarrow X_{\mathrm{in}}\) in \({\mathcal {B}}\).

Proof

We can describe the reversal of arc \(X_{\mathrm{out}}\rightarrow X_{\mathrm{in}}\) as two steps:

  1. \({\mathcal {P}}_1,{\mathcal {B}}_1 \leftarrow \) removeArc\(({\mathcal {B}},{\mathcal {P}},X_{\mathrm{out}},X_{\mathrm{in}})\).

  2. \({\mathcal {P}}',{\mathcal {B}}' \leftarrow \) addArc\(({\mathcal {B}}_1,{\mathcal {P}}_1,X_{\mathrm{in}},X_{\mathrm{out}})\).

From Lemma 2 we know that \({\mathcal {P}}_1\) is sound with respect to \({\mathcal {B}}_1\), and from Lemma 1 we know that \({\mathcal {P}}'\) is sound with respect to \({\mathcal {B}}'\).
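The decomposition used in this proof translates directly into code. In the sketch below, remove_arc and add_arc stand for assumed implementations of the paper's removeArc and addArc operations (each returning an updated PT and BN and, by Lemmas 1 and 2, preserving soundness); only the composition is shown.

```python
# Arc reversal as the two-step composition described in the proof of Lemma 3.
# `remove_arc` and `add_arc` are assumed, soundness-preserving implementations
# of the paper's removeArc and addArc; they are not defined in this sketch.

def reverse_arc(pt, bn, x_out, x_in, remove_arc, add_arc):
    pt1, bn1 = remove_arc(pt, bn, x_out, x_in)   # drop X_out -> X_in (Lemma 2)
    return add_arc(pt1, bn1, x_in, x_out)        # add X_in -> X_out (Lemma 1)
```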

Lemma 4

Let \({\mathcal {P}}\) be a PT over \({\mathcal {X}}_P = \{*\} \cup {\mathcal {X}}\) and \({\mathcal {B}}\) be a Bayesian network over \({\mathcal {X}}\). If \({\mathcal {P}}\) is sound with respect to \({\mathcal {B}}\), then the PT \({\mathcal {P}}'\) obtained after applying pushUpNode\(({\mathcal {B}},{\mathcal {P}},X_{\mathrm{opt}})\) is also sound with respect to \({\mathcal {B}}\).

Proof

Let \({\mathcal {D}}_{\mathrm{opt}}= (\mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{opt}}) \cup \{X_{\mathrm{opt}}\}) {\setminus } (\mathbf{desc }_{\mathcal {P}'}(X_p) \cup \{X_p\})\) and \({\mathcal {D}}_p= \mathbf{desc }_{\mathcal {P}'}(X_p) \cup \{X_p\}\), where \(X_p\) denotes the parent of \(X_{\mathrm{opt}}\) in \({\mathcal {P}}\).

\(\forall X_i \in {\mathcal {X}} {\setminus } ({\mathcal {D}}_{\mathrm{opt}}\cup {\mathcal {D}}_p)\), \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i)\). Therefore, \(X_i\) is sound.

\(\forall X_i \in {\mathcal {D}}_{\mathrm{opt}}\), \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i) {\setminus } \{X_p\}\). Since \(X_i \in {\mathcal {D}}_{\mathrm{opt}}\) only if \(X_i \notin \mathbf{desc }_{\mathcal {B}}(X_p)\), and hence \(X_p \notin \mathbf{pa }_{\mathcal {B}}(X_i)\), \(X_i\) is sound.

The nodes in \({\mathcal {D}}_p\) may have \(X_{\mathrm{opt}}\) as a predecessor in \({\mathcal {P}}'\), depending on their relationship with \(X_{\mathrm{opt}}\) in \({\mathcal {B}}\).

If \({\mathcal {D}}_p\cap \mathbf{desc }_{\mathcal {B}}(X_{\mathrm{opt}}) \ne \varnothing \), then \(\forall X_i \in {\mathcal {D}}_p\), \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i) \cup \{X_{\mathrm{opt}}\}\), so \(X_i\) is sound. Otherwise, \(\forall X_i \in {\mathcal {D}}_p\), \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i) {\setminus } \{X_{\mathrm{opt}}\}\), and given that \(X_{\mathrm{opt}}\notin \mathbf{pa }_{\mathcal {B}}(X_i)\), \(X_i\) is sound.

Theorem 1

Let \({\mathcal {P}}\) be a PT that is sound with respect to a BN \({\mathcal {B}}\), and let \({\mathcal {A}}\) be an algorithm that receives \({\mathcal {P}}\) and \({\mathcal {B}}\) and obtains a new PT \({\mathcal {P}}'\) and a new BN \({\mathcal {B}}'\). If every change made to \({\mathcal {P}}\) and \({\mathcal {B}}\) by \({\mathcal {A}}\) corresponds to applying one of Algorithms 1–4, then \({\mathcal {P}}'\) is sound with respect to \({\mathcal {B}}'\).

Proof

Algorithm \({\mathcal {A}}\) obtains \({\mathcal {P}}'\) and \({\mathcal {B}}'\) from \({\mathcal {P}}\) and \({\mathcal {B}}\) by applying a sequence of changes, each produced by one of Algorithms 1–4. Since \({\mathcal {P}}\) is sound with respect to \({\mathcal {B}}\), and Lemmas 1–4 ensure that each of Algorithms 1–4 returns a PT \({\mathcal {P}}_1\) and a BN \({\mathcal {B}}_1\) such that \({\mathcal {P}}_1\) is sound with respect to \({\mathcal {B}}_1\), it follows by induction on the sequence of changes that the resulting PT \({\mathcal {P}}'\) is sound with respect to \({\mathcal {B}}'\).
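In practice, Theorem 1 lets a structure learner apply any sequence of the four operations while keeping the PT sound. The sketch below reuses the illustrative is_sound check from above and treats the operations as given callables; it simply re-verifies the invariant after every step.

```python
# Illustration of Theorem 1: soundness is preserved after every structural
# change.  `operations` is a list of callables mapping (pt, bn) to a new
# (pt, bn); they stand for assumed implementations of Algorithms 1-4.

def apply_with_invariant(pt, bn, operations):
    assert is_sound(pt, bn), "initial PT must be sound"
    for op in operations:
        pt, bn = op(pt, bn)
        assert is_sound(pt, bn), "soundness must be preserved (Theorem 1)"
    return pt, bn
```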

About this article

Cite this article

Benjumeda, M., Larrañaga, P. & Bielza, C. Learning Bayesian networks with low inference complexity. Prog Artif Intell 5, 15–26 (2016). https://doi.org/10.1007/s13748-015-0070-0

