
Machine Learning, Volume 107, Issue 1, pp 247–283

Empirical hardness of finding optimal Bayesian network structures: algorithm selection and runtime prediction

  • Brandon Malone
  • Kustaa Kangas
  • Matti Järvisalo
  • Mikko Koivisto
  • Petri Myllymäki
Part of the following topical collections:
  1. Special Issue on Metalearning and Algorithm Selection

Abstract

Various algorithms have been proposed for finding a Bayesian network structure that is guaranteed to maximize a given scoring function. Implementations of state-of-the-art algorithms (solvers) for this Bayesian network structure learning problem rely on adaptive search strategies, such as branch-and-bound and integer linear programming techniques. Thus, the time requirements of the solvers are not well characterized by simple functions of the instance size. Furthermore, no single solver dominates the others in speed. Given a problem instance, it is thus a priori unclear which solver will perform best and how fast it will solve the instance. We show that for a given solver the hardness of a problem instance can be efficiently predicted based on a collection of non-trivial features which go beyond the basic parameters of instance size. Specifically, we train and test statistical models on empirical data, based on the largest evaluation of state-of-the-art exact solvers to date. We demonstrate that we can predict the runtimes to a reasonable degree of accuracy. These predictions enable effective selection of solvers that perform well in terms of runtimes on a particular instance. Thus, this work contributes a highly efficient portfolio solver that makes use of several individual solvers.
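
The selection scheme summarized in the abstract (predict each solver's runtime from instance features, then run the solver with the smallest prediction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the solver names, the two features, and the runtimes are invented, and the nearest-neighbour regressor is a simple stand-in, not the statistical models trained in the article.

```python
import math

# Hypothetical training data: (instance features, observed runtime per solver in seconds).
# Features are (number of variables, mean candidate parent sets per variable).
# All values are invented for illustration; they are not results from the article.
TRAIN = [
    ((20, 50.0), {"solver_a": 2.0, "solver_b": 30.0}),
    ((25, 80.0), {"solver_a": 10.0, "solver_b": 45.0}),
    ((30, 2000.0), {"solver_a": 900.0, "solver_b": 60.0}),
    ((35, 5000.0), {"solver_a": 3000.0, "solver_b": 120.0}),
]


def feature_distance(x, y):
    """Euclidean distance between two feature vectors after log-scaling."""
    return math.sqrt(sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(x, y)))


def predict_log_runtime(features, solver, k=2):
    """Predict log10 runtime as the mean over the k nearest training instances.

    Nearest-neighbour regression is a stand-in model chosen for brevity.
    """
    nearest = sorted(TRAIN, key=lambda row: feature_distance(features, row[0]))[:k]
    return sum(math.log10(runtimes[solver]) for _, runtimes in nearest) / k


def select_solver(features, solvers=("solver_a", "solver_b")):
    """Return the solver with the smallest predicted runtime, plus all predictions."""
    predictions = {s: predict_log_runtime(features, s) for s in solvers}
    best = min(predictions, key=predictions.get)
    return best, predictions


if __name__ == "__main__":
    for instance in [(22, 60.0), (32, 3000.0)]:
        best, predictions = select_solver(instance)
        print(instance, "->", best, predictions)
```

Predicting the logarithm of the runtime rather than the raw value is a design choice of this sketch: runtimes of exact solvers tend to span several orders of magnitude, so a log scale keeps a few very slow instances from dominating the average.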

Keywords

Bayesian networks, Structure learning, Algorithm selection, Hyperparameter optimization, Empirical hardness, Algorithm portfolio, Runtime prediction

Notes

Acknowledgements

The authors thank James Cussens for discussions on GOBNILP and the anonymous reviewers for valuable suggestions that helped improve the manuscript. This work is supported by Academy of Finland, Grants #125637, #251170 (COIN Centre of Excellence in Computational Inference Research), #255675, #276412, and #284591; Finnish Funding Agency for Technology and Innovation (Project D2I); and Research Funds of the University of Helsinki.

Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. NEC Laboratories Europe, Heidelberg, Germany
  2. Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
