Hashing-Based Hybrid Duplicate Detection for Bayesian Network Structure Learning

Jahnsson, Niklas; Malone, Brandon; Myllymäki, Petri

doi:10.1007/978-3-319-28379-1_4

Conference paper
First Online: 08 January 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9505)

Abstract

In this work, we address the well-known score-based Bayesian network structure learning problem. Breadth-first branch and bound (BFBnB) has been shown to be an effective approach for solving this problem. Delayed duplicate detection (DDD) is an important component of the BFBnB algorithm. Previously, an external sorting-based technique, with complexity \(O(m \log m)\), where \(m\) is the number of nodes stored in memory, was used for DDD. In this work, we propose a hashing-based technique, with complexity \(O(m)\), for DDD. In practice, by removing the \(O(\log m)\) overhead of sorting, over an order of magnitude more memory is available for the search. Empirically, we show that the extra memory improves locality and decreases the number of expensive external memory operations. We also give a bin packing algorithm for minimizing the number of external memory files.
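The contrast between the two duplicate detection strategies can be illustrated with a minimal in-memory sketch. It assumes that each search node is a pair of a variable subset (encoded here as an integer bitmask) and a path cost, and that duplicates are resolved by keeping the cheapest copy. The names `dedupe_by_hashing` and `dedupe_by_sorting` and the node encoding are illustrative assumptions, not the authors' implementation, and the external memory part of the hybrid scheme is not shown.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical node encoding: (variable subset as a bitmask, path cost).
Node = Tuple[int, float]


def dedupe_by_hashing(nodes: Iterable[Node]) -> Dict[int, float]:
    """Keep the cheapest copy of each subset using a hash table.

    Each node costs one expected O(1) lookup, so m nodes take
    expected O(m) time overall.
    """
    best: Dict[int, float] = {}
    for subset, cost in nodes:
        if subset not in best or cost < best[subset]:
            best[subset] = cost
    return best


def dedupe_by_sorting(nodes: Iterable[Node]) -> List[Node]:
    """Keep the cheapest copy of each subset by sorting first.

    Sorting m nodes takes O(m log m) time; duplicates then sit next
    to each other, with the lowest cost first for each subset.
    """
    result: List[Node] = []
    for subset, cost in sorted(nodes):
        if result and result[-1][0] == subset:
            continue  # a cheaper copy of this subset was already kept
        result.append((subset, cost))
    return result


if __name__ == "__main__":
    layer = [(0b011, 5.0), (0b101, 2.0), (0b011, 3.5), (0b101, 2.5)]
    print(dedupe_by_hashing(layer))
    print(dedupe_by_sorting(layer))
```

Both functions return the same surviving nodes; only the way duplicates are brought together differs.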

Notes

1. In this work, we use “memory” to refer to fast-access storage, such as RAM; by “external memory,” we mean storage with slower access, such as hard disks and network storage. All of the theoretical complexity analysis, such as \(O(\cdot)\), refers to fast-access storage.
2. The problem can also be defined as a maximization using non-positive local scores.
3. Efficient sorting implementations, such as the g++ version of std::sort, often do not exhaust the additional \(O(\log m)\) space; however, it is difficult to estimate the required overhead a priori, so \(O(m \log m)\) must be used to ensure stable algorithm behavior.
4. For this analysis, we do not consider the load factor of the hash table.
5. The strategy is optimal in that it minimizes the number of files. We solve the optimization problem using an integer linear programming formulation.
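To give a feel for the file-minimization problem in Note 5, the sketch below packs items into as few fixed-capacity bins as it can using the classic first-fit decreasing heuristic. This is a swapped-in technique for illustration only: it is not the integer linear programming formulation mentioned in the note and does not guarantee the minimum number of bins. The reading of items as groups of nodes with known sizes, of bins as external memory files, and the function name `first_fit_decreasing` are all assumptions.

```python
from typing import List


def first_fit_decreasing(sizes: List[int], capacity: int) -> List[List[int]]:
    """Pack item sizes into bins of the given capacity (heuristic).

    Items are considered from largest to smallest, and each one goes
    into the first bin that still has room for it. The result is not
    guaranteed to use the minimum number of bins.
    """
    bins: List[List[int]] = []   # item sizes placed in each bin
    loads: List[int] = []        # current total size of each bin
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds capacity {capacity}")
        for i, load in enumerate(loads):
            if load + size <= capacity:
                bins[i].append(size)
                loads[i] += size
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([size])
            loads.append(size)
    return bins


if __name__ == "__main__":
    # Hypothetical group sizes and file capacity.
    print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))
```

An exact method, such as the integer linear programming formulation the note refers to, would be needed to certify that no packing with fewer files exists.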

Author information

Authors and Affiliations

University of Helsinki, Helsinki, Finland
Niklas Jahnsson & Petri Myllymäki
Max Planck Institute for the Biology of Ageing, Cologne, Germany
Brandon Malone
Helsinki Institute for Information Technology, Esbo, Finland
Petri Myllymäki

Corresponding author

Correspondence to Brandon Malone.

Editor information

Editors and Affiliations

Osaka University, Osaka, Japan
Joe Suzuki
The University of Electro-Communications, Tokyo, Japan
Maomi Ueno

About this paper

Cite this paper

Jahnsson, N., Malone, B., Myllymäki, P. (2015). Hashing-Based Hybrid Duplicate Detection for Bayesian Network Structure Learning. In: Suzuki, J., Ueno, M. (eds) Advanced Methodologies for Bayesian Networks. AMBN 2015. Lecture Notes in Computer Science, vol 9505. Springer, Cham. https://doi.org/10.1007/978-3-319-28379-1_4

DOI: https://doi.org/10.1007/978-3-319-28379-1_4
Published: 08 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28378-4
Online ISBN: 978-3-319-28379-1
eBook Packages: Computer Science, Computer Science (R0)
