Learning Bayesian networks with low inference complexity

Benjumeda, Marco; Larrañaga, Pedro; Bielza, Concha

doi:10.1007/s13748-015-0070-0

Regular Paper
Published: 14 December 2015

Volume 5, pages 15–26, (2016)

Progress in Artificial Intelligence

Marco Benjumeda¹,
Pedro Larrañaga¹ &
Concha Bielza¹

Abstract

One of the main research topics in machine learning nowadays is the improvement of the inference and learning processes in probabilistic graphical models. Traditionally, inference and learning have been treated separately, but because the structure of the model determines the inference complexity, most learning methods will sometimes produce models in which inference is inefficient. In this paper we propose a framework for learning Bayesian networks with low inference complexity. To this end, we use a representation of the network factorization that allows an upper bound on the inference complexity of each model to be evaluated efficiently during the learning process. Experimental results show that the proposed methods obtain tractable models that improve the accuracy of the predictions provided by approximate inference in models obtained with a well-known Bayesian network learner.

Author information

Authors and Affiliations

Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
Marco Benjumeda, Pedro Larrañaga & Concha Bielza

Corresponding author

Correspondence to Marco Benjumeda.

Additional information

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness through the Cajal Blue Brain (C080020-09; the Spanish partner of the Blue Brain initiative from EPFL) and TIN2013-41592-P projects, by the Regional Government of Madrid through the S2013/ICE-2845-CASI-CAM-CM project, and by the European Union’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 604102 (Human Brain Project). M. Benjumeda is supported by a predoctoral contract for the formation of doctors from the Spanish Ministry of Economy and Competitiveness (BES-2014-068637).

Appendix: Proof of Theorem 1

This work relies heavily on Theorem 1, which guarantees that the proposed incremental compilation and optimization methods always produce sound PTs. To demonstrate the soundness of a PT \({\mathcal {P}}\) with respect to a BN \({\mathcal {B}}\), we show that for each node \(X_i\) of \({\mathcal {P}}\), every parent of \(X_i\) in \({\mathcal {B}}\) is a predecessor of \(X_i\) in \({\mathcal {P}}\). In this appendix we provide a proof of Theorem 1.
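The soundness condition stated above can be checked mechanically. The following is a minimal sketch, not the authors' implementation. It assumes that the BN is stored as a map from each node to its set of parents, that the PT is stored as a map from each node to its single parent in the tree, and that the predecessors of a node are its ancestors on the path to the root `*`. The names `predecessors`, `is_sound`, `bn`, `chain`, and `star` are hypothetical.

```python
from typing import Dict, Set

ROOT = "*"  # assumed label for the artificial root node of the PT


def predecessors(pt_parent: Dict[str, str], node: str) -> Set[str]:
    """Collect the ancestors of `node` in the tree, excluding the root."""
    preds: Set[str] = set()
    current = pt_parent[node]
    while current != ROOT:
        preds.add(current)
        current = pt_parent[current]
    return preds


def is_sound(bn_parents: Dict[str, Set[str]], pt_parent: Dict[str, str]) -> bool:
    """True if every BN parent of every node is a predecessor of it in the PT."""
    return all(
        bn_parents[node] <= predecessors(pt_parent, node)
        for node in bn_parents
    )


# Illustrative BN with arcs A -> C and B -> C.
bn = {"A": set(), "B": set(), "C": {"A", "B"}}

# Tree * -> A -> B -> C: both parents of C lie above C.
chain = {"A": ROOT, "B": "A", "C": "B"}

# Tree with A and B as siblings under *, and C below A: B is not above C.
star = {"A": ROOT, "B": ROOT, "C": "A"}

print(is_sound(bn, chain))
print(is_sound(bn, star))
```

In this toy example the chain-shaped tree satisfies the condition, while the second tree does not, because `B` is a parent of `C` in the BN but not one of its predecessors in the tree.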

Lemma 1

Let \({\mathcal {P}}\) be a PT over \({\mathcal {X}}_P = \{*\} \cup {\mathcal {X}}\) and \({\mathcal {B}}\) be a Bayesian network over \({\mathcal {X}}\). If \({\mathcal {P}}\) is sound with respect to \({\mathcal {B}}\), then the PT \({\mathcal {P}}'\) obtained after applying addArc\(({\mathcal {B}},{\mathcal {P}},X_{\mathrm{out}},X_{\mathrm{in}})\) is also sound with respect to \({\mathcal {B}}'\), where \({\mathcal {B}}'\) is the result of adding arc \(X_{\mathrm{out}}\rightarrow X_{\mathrm{in}}\) to \({\mathcal {B}}\), provided that this addition does not produce a cycle in \({\mathcal {B}}'\).

Proof

The structure of \({\mathcal {P}}'\) depends on the precedence relationship between \(X_{\mathrm{out}}\) and \(X_{\mathrm{in}}\) in \({\mathcal {P}}\).

\(X_{\mathrm{out}}\in \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{in}})\): there are no changes in the structure of \({\mathcal {P}}\). \(\forall X_i \in {\mathcal {X}} {\setminus } \{X_{\mathrm{in}}\}\), \(\mathbf{pa }_{\mathcal {B}'}(X_i) = \mathbf{pa }_{\mathcal {B}}(X_i)\) and \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i)\), so \(X_i\) is sound. \(X_{\mathrm{in}}\) is also sound because \(\mathbf{pa }_{\mathcal {B}'}(X_{\mathrm{in}}) = \mathbf{pa }_{\mathcal {B}}(X_{\mathrm{in}}) \cup \{X_{\mathrm{out}}\}\), \(\mathbf{pred }_{\mathcal {P}'}(X_{\mathrm{in}}) = \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{in}})\) and \(X_{\mathrm{out}}\in \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{in}})\).

\(X_{\mathrm{in}}\in \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{out}})\): the nodes that are not descendants of \(X_{\mathrm{in}}\) in \({\mathcal {P}}\) do not change. \(\forall X_i \in {\mathcal {X}} {\setminus } (\mathbf{desc }_{\mathcal {P}}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\})\), \(\mathbf{pa }_{\mathcal {B}'}(X_i) = \mathbf{pa }_{\mathcal {B}}(X_i)\) and \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i)\). Thus, \(X_i\) is sound. \(X_{\mathrm{out}}\) and its descendants in \({\mathcal {P}}'\) that are not descendants of \(X_{\mathrm{in}}\) have fewer predecessors in \({\mathcal {P}}'\) than in \({\mathcal {P}}\). \(\forall X_i \in (\mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{out}}) \cup \{X_{\mathrm{out}}\}) {\setminus } (\mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\})\), as \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i) {\setminus } (\mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\})\), \(\mathbf{pa }_{\mathcal {B}'}(X_i) = \mathbf{pa }_{\mathcal {B}}(X_i)\) and \(\mathbf{pa }_{\mathcal {B}'}(X_i) \cap (\mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\}) = \varnothing \), \(X_i\) is sound. Finally, \(X_{\mathrm{in}}\) has \(X_{\mathrm{out}}\) as a predecessor in \({\mathcal {P}}'\). \(\forall X_i \in \mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\}, \mathbf{pred }_{\mathcal {P}'}(X_i) \supseteq \mathbf{pred }_{\mathcal {P}}(X_i) \cup \{X_{\mathrm{out}}\}\) and \(\mathbf{pa }_{\mathcal {B}'}(X_i) \subseteq \mathbf{pa }_{\mathcal {B}}(X_i) \cup \{X_{\mathrm{out}}\}\), so \(X_i\) is sound.

\(X_{\mathrm{out}}\notin \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{in}})\) and \(X_{\mathrm{in}}\notin \mathbf{pred }_{\mathcal {P}}(X_{\mathrm{out}})\): \(X_{\mathrm{out}}\) and its predecessors in \({\mathcal {P}}\) are set as predecessors of \(X_{\mathrm{in}}\) in \({\mathcal {P}}'\). \(\forall X_i \notin \mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\}, \mathbf{pa }_{\mathcal {B}'}(X_i) = \mathbf{pa }_{\mathcal {B}}(X_i)\) and \(\mathbf{pred }_{\mathcal {P}'}(X_i) \supseteq \mathbf{pred }_{\mathcal {P}}(X_i)\). Hence \(X_i\) is sound. \(\forall X_i \in \mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{in}}) \cup \{X_{\mathrm{in}}\}\), \(\mathbf{pa }_{\mathcal {B}'}(X_i) \subseteq \mathbf{pa }_{\mathcal {B}}(X_i) \cup \{X_{\mathrm{out}}\}\) and \(\mathbf{pred }_{\mathcal {P}'}(X_i) \supseteq \mathbf{pred }_{\mathcal {P}}(X_i) \cup \{X_{\mathrm{out}}\}\). Therefore \(X_i\) is sound.
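The proof above distinguishes three cases according to how \(X_{\mathrm{out}}\) and \(X_{\mathrm{in}}\) are ordered in \({\mathcal {P}}\). The sketch below only classifies a pair of nodes into one of those cases, in the order in which the proof lists them. It does not perform the restructuring carried out by addArc, which is not reproduced in this appendix. It assumes the PT is stored as a map from each node to its parent, with `*` as the root. The names `is_predecessor`, `precedence_case`, and `pt` are hypothetical.

```python
from typing import Dict

ROOT = "*"  # assumed label for the artificial root node of the PT


def is_predecessor(pt_parent: Dict[str, str], a: str, b: str) -> bool:
    """True if `a` lies on the path from `b` up to the root."""
    current = pt_parent[b]
    while current != ROOT:
        if current == a:
            return True
        current = pt_parent[current]
    return False


def precedence_case(pt_parent: Dict[str, str], x_out: str, x_in: str) -> int:
    """Return 1, 2 or 3 for the first, second or third case of the proof."""
    if is_predecessor(pt_parent, x_out, x_in):
        return 1  # x_out already precedes x_in
    if is_predecessor(pt_parent, x_in, x_out):
        return 2  # x_in precedes x_out
    return 3      # neither node precedes the other


# Illustrative tree: A is the only child of the root; B and C are children of A.
pt = {"A": ROOT, "B": "A", "C": "A"}

print(precedence_case(pt, "A", "B"))
print(precedence_case(pt, "B", "A"))
print(precedence_case(pt, "B", "C"))
```

For this tree, the pair (A, B) falls in the first case, (B, A) in the second, and the siblings (B, C) in the third.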

Lemma 2

Let \({\mathcal {P}}\) be a PT over \({\mathcal {X}}_P = \{*\} \cup {\mathcal {X}}\) and \({\mathcal {B}}\) be a Bayesian network over \({\mathcal {X}}\). If \({\mathcal {P}}\) is sound with respect to \({\mathcal {B}}\), then the PT \({\mathcal {P}}'\) obtained after applying removeArc\(({\mathcal {B}},{\mathcal {P}},X_{\mathrm{out}},X_{\mathrm{in}})\) is also sound with respect to \({\mathcal {B}}'\), where \({\mathcal {B}}'\) is the result of removing arc \(X_{\mathrm{out}}\rightarrow X_{\mathrm{in}}\) from \({\mathcal {B}}\).

Proof

\(\forall X_i \in {\mathcal {X}}\), \(\mathbf{pa }_{\mathcal {B}'}(X_i) \subseteq \mathbf{pa }_{\mathcal {B}}(X_i)\) and \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i)\), so \(X_i\) is sound.
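Written as a single chain, the argument reads as follows. The middle inclusion is the assumed soundness of \({\mathcal {P}}\) with respect to \({\mathcal {B}}\), and the two outer relations are the facts stated in the proof. This is only a restatement in the notation used above, not an additional result.

```latex
\[
\mathbf{pa}_{\mathcal{B}'}(X_i)
\;\subseteq\; \mathbf{pa}_{\mathcal{B}}(X_i)
\;\subseteq\; \mathbf{pred}_{\mathcal{P}}(X_i)
\;=\; \mathbf{pred}_{\mathcal{P}'}(X_i)
\qquad \forall X_i \in \mathcal{X}.
\]
```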

Lemma 3

Let \({\mathcal {P}}\) be a PT over \({\mathcal {X}}_P = \{*\} \cup {\mathcal {X}}\) and \({\mathcal {B}}\) be a Bayesian network over \({\mathcal {X}}\). If \({\mathcal {P}}\) is sound with respect to \({\mathcal {B}}\), then the PT \({\mathcal {P}}'\) obtained after applying reverseArc\(({\mathcal {B}},{\mathcal {P}},X_{\mathrm{out}},X_{\mathrm{in}})\) is also sound with respect to \({\mathcal {B}}'\), where \({\mathcal {B}}'\) is the result of reversing arc \(X_{\mathrm{out}}\rightarrow X_{\mathrm{in}}\) in \({\mathcal {B}}\), provided that the new arc \(X_{\mathrm{in}}\rightarrow X_{\mathrm{out}}\) does not produce a cycle in \({\mathcal {B}}'\).

Proof

We can describe the reversal of arc \(X_{\mathrm{out}}\rightarrow X_{\mathrm{in}}\) in two steps:

1. \({\mathcal {P}}_1,{\mathcal {B}}_1 \leftarrow \) removeArc\(({\mathcal {B}},{\mathcal {P}},X_{\mathrm{out}},X_{\mathrm{in}})\).
2. \({\mathcal {P}}',{\mathcal {B}}' \leftarrow \) addArc\(({\mathcal {B}}_1,{\mathcal {P}}_1,X_{\mathrm{in}},X_{\mathrm{out}})\).

From Lemma 2 we know that \({\mathcal {P}}_1\) is sound with respect to \({\mathcal {B}}_1\), and from Lemma 1 we know that \({\mathcal {P}}'\) is sound with respect to \({\mathcal {B}}'\).
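The two-step decomposition can be illustrated on the BN side alone. The sketch below is an assumption-laden toy, not the paper's removeArc and addArc: it updates only the parent sets of the network and omits the accompanying PT update entirely. The BN is assumed to be a map from each node to its set of parents, and `is_ancestor`, `remove_arc`, `add_arc`, `reverse_arc`, and `bn` are hypothetical names.

```python
from typing import Dict, Set

Parents = Dict[str, Set[str]]  # node -> set of its parents in the BN


def is_ancestor(parents: Parents, src: str, dst: str) -> bool:
    """True if a directed path src -> ... -> dst exists."""
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent == src:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def remove_arc(parents: Parents, x_out: str, x_in: str) -> Parents:
    """Return a copy of the network without the arc x_out -> x_in."""
    new = {node: set(ps) for node, ps in parents.items()}
    new[x_in].discard(x_out)
    return new


def add_arc(parents: Parents, x_out: str, x_in: str) -> Parents:
    """Return a copy with the arc x_out -> x_in, refusing to create a cycle."""
    if x_out == x_in or is_ancestor(parents, x_in, x_out):
        raise ValueError("arc would create a cycle")
    new = {node: set(ps) for node, ps in parents.items()}
    new[x_in].add(x_out)
    return new


def reverse_arc(parents: Parents, x_out: str, x_in: str) -> Parents:
    """Reverse x_out -> x_in as a removal followed by an addition."""
    without = remove_arc(parents, x_out, x_in)  # step 1
    return add_arc(without, x_in, x_out)        # step 2


# Illustrative BN with arcs A -> B, A -> C and B -> C.
bn = {"A": set(), "B": {"A"}, "C": {"A", "B"}}

print(reverse_arc(bn, "B", "C"))
```

In this example, reversing B -> C succeeds, whereas reversing A -> C would raise `ValueError`, because the path A -> B -> C remains after the removal and the new arc C -> A would close a cycle. This corresponds to the acyclicity condition in the statement of Lemma 3.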

Lemma 4

Let \({\mathcal {P}}\) be a PT over \({\mathcal {X}}_P = \{*\} \cup {\mathcal {X}}\) and \({\mathcal {B}}\) be a Bayesian network over \({\mathcal {X}}\). If \({\mathcal {P}}\) is sound with respect to \({\mathcal {B}}\), then the PT \({\mathcal {P}}'\) obtained after applying pushUpNode\(({\mathcal {B}},{\mathcal {P}},X_{\mathrm{opt}})\) is also sound with respect to \({\mathcal {B}}\).

Proof

Let \(X_p\) denote the parent of \(X_{\mathrm{opt}}\) in \({\mathcal {P}}\). Let \({\mathcal {D}}_{\mathrm{opt}}= (\mathbf{desc }_{\mathcal {P}'}(X_{\mathrm{opt}}) \cup \{X_{\mathrm{opt}}\}) {\setminus } (\mathbf{desc }_{\mathcal {P}'}(X_p) \cup \{X_p\})\), and \({\mathcal {D}}_p= \mathbf{desc }_{\mathcal {P}'}(X_p) \cup \{X_p\}\).

\(\forall X_i \in {\mathcal {X}} {\setminus } ({\mathcal {D}}_{\mathrm{opt}}\cup {\mathcal {D}}_p)\), \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i)\). Therefore, \(X_i\) is sound.

\(\forall X_i \in {\mathcal {D}}_{\mathrm{opt}}, \mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i) {\setminus } \{X_p\}\). Given that \(X_i \in {\mathcal {D}}_{\mathrm{opt}}\) only if \(X_i \notin \mathbf{desc }_{\mathcal {B}}(X_p)\), \(X_i\) is sound.

The nodes in \({\mathcal {D}}_p\) may contain \(X_{\mathrm{opt}}\) as a predecessor in \({\mathcal {P}}'\) depending on their predecessors in \({\mathcal {B}}\).

If \({\mathcal {D}}_p\cap \mathbf{desc }_{\mathcal {B}}(X_{\mathrm{opt}}) \ne \varnothing \), \(\forall X_i \in {\mathcal {D}}_p\), \(\mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i) \cup \{X_{\mathrm{opt}}\}\), so \(X_i\) is sound. Otherwise, \(\forall X_i \in {\mathcal {D}}_p, \mathbf{pred }_{\mathcal {P}'}(X_i) = \mathbf{pred }_{\mathcal {P}}(X_i) {\setminus } \{X_{\mathrm{opt}}\}\), and given that \(X_{\mathrm{opt}}\notin \mathbf{pred }_{\mathcal {B}}(X_i)\), \(X_i\) is sound.

Theorem 1

Let \({\mathcal {P}}\) be a sound PT with respect to a BN \({\mathcal {B}}\), and \({\mathcal {A}}\) an algorithm that receives \({\mathcal {P}}\) and \({\mathcal {B}}\) and obtains a new PT \({\mathcal {P}}'\) and BN \({\mathcal {B}}'\). If every change in \({\mathcal {P}}\) and \({\mathcal {B}}\) made by \({\mathcal {A}}\) corresponds to applying Algorithms 1–4, then \({\mathcal {P}}'\) is sound with respect to \({\mathcal {B}}'\).

Proof

Algorithm \({\mathcal {A}}\) obtains \({\mathcal {P}}'\) and \({\mathcal {B}}'\) from \({\mathcal {P}}\) and \({\mathcal {B}}\) using any sequence of changes, where each change is produced by Algorithms 1–4. Since \({\mathcal {P}}\) is sound for \({\mathcal {B}}\) and Lemmas 1–4 assure that Algorithms 1–4 return a PT \({\mathcal {P}}_1\) and a BN \({\mathcal {B}}_1\) such that \({\mathcal {P}}_1\) is sound for \({\mathcal {B}}_1\), the result of applying the sequence of changes in \({\mathcal {A}}\) is a PT \({\mathcal {P}}'\) and a BN \({\mathcal {B}}'\) where \({\mathcal {P}}'\) is sound for \({\mathcal {B}}'\).

About this article

Cite this article

Benjumeda, M., Larrañaga, P. & Bielza, C. Learning Bayesian networks with low inference complexity. Prog Artif Intell 5, 15–26 (2016). https://doi.org/10.1007/s13748-015-0070-0

Received: 20 November 2015
Accepted: 24 November 2015
Published: 14 December 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s13748-015-0070-0
