Accelerating exact and approximate inference for (distributed) discrete optimization with GPUs

Abstract

Discrete optimization is a central problem in artificial intelligence. Optimizing the aggregated cost of a network of cost functions arises in a variety of problems, including Weighted Constraint Satisfaction Problems (WCSPs) and Distributed Constraint Optimization Problems (DCOPs), as well as in stochastic variants such as the task of finding the most probable explanation (MPE) in belief networks. Inference-based algorithms are powerful techniques for solving discrete optimization problems and can be used independently or in combination with other techniques. However, their applicability is often limited by their compute-intensive nature and their space requirements. This paper proposes the design and implementation of a novel inference-based technique that exploits modern massively parallel architectures, such as those found in Graphics Processing Units (GPUs), to speed up exact and approximate inference-based algorithms for discrete optimization. The paper studies the proposed algorithm in both centralized and distributed optimization contexts. It demonstrates that the use of GPUs provides significant advantages in terms of runtime and scalability, achieving speedups of up to two orders of magnitude (up to 345 times faster) with respect to a sequential version.

Notes

  1. For simplicity, we assume that tuples of variables are built according to a predefined ordering.

  2. For simplicity, we also use \(\theta\) to represent the tuple \(\langle \theta(x_{i_{1}}), \dots, \theta(x_{i_{h}}) \rangle\), where \(\{x_{i_{1}}, \dots, x_{i_{h}}\}\) is the domain of \(\theta\).

  3. The primal graph of a DCOP is equivalent to that of the corresponding WCSP.

  4. A warp is typically composed of 32 threads.

  5. In modern devices, each SM allots 64 KB for register space.

  6. Accesses to the GPU global memory are cached in cache lines of 128 bytes, which can be fetched by all requiring threads in a warp.

  7. Our source code is available at https://github.com/nandofioretto/GpuBE and https://github.com/nandofioretto/GpuDBE

  8. Downloadable from http://costfunction.org/en/benchmark/ and http://graphmod.ics.uci.edu/group/Repository

  9. Recall that BE needs to process bucket-tables whose number of rows is in \(O(d^{w^{*}})\).

  10. We use the Pearson product-moment correlation coefficient.

  11. In all other experiments we used the GeForce GTX Titan, as this is the best and most affordable card at our disposal.
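
Notes 4 and 9 together give a rough sense of the workload that a bucket table places on a GPU. The Python sketch below is illustrative only and is not taken from the GpuBE or GpuDBE sources: the function names `table_rows`, `warps_needed`, and `project_out_min` are hypothetical, the one-thread-per-row mapping is an assumption, and cost minimization is assumed as the aggregation operator. It computes the \(d^{w}\) row count of a table over \(w\) variables, the number of 32-thread warps needed to cover those rows, and a sequential version of the projection step that eliminates one variable from a cost table.

```python
from itertools import product
from math import ceil


def table_rows(d, width):
    """Rows in a table over `width` variables, each with domain size `d`."""
    return d ** width


def warps_needed(rows, warp_size=32):
    """Warps required if each thread handles one row (assumed mapping)."""
    return ceil(rows / warp_size)


def project_out_min(table, scope, var, domain):
    """Eliminate `var` from a cost table by minimizing over its values.

    `table` maps a tuple of values (ordered as in `scope`) to a cost.
    Returns the reduced scope and the projected table.
    """
    keep = [v for v in scope if v != var]
    projected = {}
    for values in product(domain, repeat=len(keep)):
        partial = dict(zip(keep, values))
        # Each output row is independent of the others, which is the
        # property a parallel implementation could exploit.
        projected[values] = min(
            table[tuple(val if v == var else partial[v] for v in scope)]
            for val in domain
        )
    return keep, projected


if __name__ == "__main__":
    costs = {(0, 0): 3, (0, 1): 1, (1, 0): 4, (1, 1): 2}
    print(project_out_min(costs, ["x", "y"], "y", [0, 1]))
    print(table_rows(10, 5), warps_needed(table_rows(10, 5)))
```

Because every row of the projected table is computed from a disjoint slice of the input table, the loop over `values` has no cross-iteration dependencies; this is what makes the operation a natural candidate for data-parallel execution.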

References

  1. Abdennadher, S., & Schlenker, H. (1999). Nurse scheduling using constraint logic programming. In Proceedings of the conference on innovative applications of artificial intelligence (IAAI) (pp. 838–843).

  2. Allouche, D., André, I., Barbe, S., Davies, J., de Givry, S., Katsirelos, G., O’Sullivan, B., Prestwich, S.D., Schiex, T., & Traoré, S. (2014). Computational protein design as an optimization problem. Artificial Intelligence, 212, 59–79.

  3. Allouche, D., de Givry, S., Nguyen, H., & Schiex, T. (2013). Toulbar2 to solve Weighted Partial max-SAT. Tech. rep. INRA.

  4. Apt, K. (2003). Principles of constraint programming. Cambridge University Press.

  5. Arbelaez, A., & Codognet, P. (2014). A GPU implementation of parallel constraint-based local search. In Proceedings of the euromicro international conference on parallel, distributed and network-based processing (PDP) (pp. 648–655).

  6. Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.

  7. Bistaffa, F., Bomberi, N., & Farinelli, A. (2016). CUBE: a CUDA approach for bucket elimination on GPUs. In Proceedings of the European conference on artificial intelligence (ECAI) (to appear).

  8. Bistarelli, S., Montanari, U., & Rossi, F. (1997). Semiring-based constraint satisfaction and optimization. Journal of the ACM, 44(2), 201–236.

  9. Boyer, V., El Baz, D., & Elkihel, M. (2012). Solving knapsack problems on GPU. Computers & Operations Research, 39(1), 42–47.

  10. Brito, I., & Meseguer, P. (2010). Improving DPOP with function filtering. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 141–158).

  11. Burke, E.K., De Causmaecker, P., Berghe, G.V., & Van Landeghem, H. (2004). The state of the art of nurse rostering. Journal of Scheduling, 7(6), 441–499.

  12. Campeotto, F., Dovier, A., Fioretto, F., & Pontelli, E. (2014). A GPU implementation of large neighborhood search for solving constraint optimization problems. In Proceedings of the European conference on artificial intelligence (ECAI) (pp. 189–194).

  13. Campeotto, F., Dal Palù, A., Dovier, A., Fioretto, F., & Pontelli, E. (2013). A constraint solver for flexible protein model. Journal of Artificial Intelligence Research, 48, 953–1000.

  14. Chakroun, I., Mezmaz, M.S., Melab, N., & Bendjoudi, A. (2013). Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm. Concurrency and Computation: Practice and Experience, 25(8), 1121–1136.

  15. Dechter, R. (1999). Bucket elimination: a unifying framework for reasoning. Artificial Intelligence, 113(1), 41–85.

  16. Dechter, R. (2003). Constraint processing. San Francisco: Morgan Kaufmann Publishers Inc.

  17. Dechter, R. (2013). Reasoning with probabilistic and deterministic graphical models: exact algorithms. Synthesis Lectures on Artificial Intelligence and Machine Learning, 7(3), 1–191.

  18. Dechter, R., & Pearl, J. (1988). Network-based heuristics for constraint-satisfaction problems. Springer.

  19. Dechter, R., & Rish, I. (2003). Mini-buckets: a general scheme for bounded inference. Journal of the ACM, 50(2), 107–153.

  20. Diamos, G.F., Ashbaugh, B., Maiyuran, S., Kerr, A., Wu, H., & Yalamanchili, S. (2011). SIMD re-convergence at thread frontiers. In Proceedings of the annual IEEE/ACM international symposium on microarchitecture (pp. 477–488).

  21. Dovier, A., Formisano, A., & Pontelli, E. (2013). Autonomous agents coordination: action languages meet CLP(FD) and Linda. Theory and Practice of Logic Programming, 13(2), 149–173.

  22. Edelkamp, S., Jabbar, S., & Schrödl, S. (2004). External A*. In Advances in artificial intelligence: 27th annual German conference on AI (KI 2004) (pp. 226–240).

  23. Farinelli, A., Rogers, A., Petcu, A., & Jennings, N. (2008). Decentralised coordination of low-power embedded devices using the Max-Sum algorithm. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 639–646).

  24. Fioretto, F., Dovier, A., & Pontelli, E. (2015). Constrained community-based gene regulatory network inference. ACM Trans. Model. Comput. Simul., 25(2), 11.

  25. Fioretto, F., Le, T., Yeoh, W., Pontelli, E., & Son, T.C. (2014). Improving DPOP with branch consistency for solving distributed constraint optimization problems. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 307–323).

  26. Fioretto, F., Le, T., Yeoh, W., Pontelli, E., & Son, T.C. (2015). Exploiting GPUs in solving (distributed) constraint optimization problems with dynamic programming. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 121–139).

  27. Fioretto, F., Yeoh, W., & Pontelli, E. (2016). A dynamic programming-based MCMC framework for solving DCOPs with GPUs. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 813–831).

  28. Fioretto, F., Yeoh, W., & Pontelli, E. (2016). Multi-variable agent decomposition for DCOPs. In Proceedings of the AAAI conference on artificial intelligence (AAAI) (pp. 2480–2486).

  29. Fioretto, F., Yeoh, W., & Pontelli, E. (2017). A multiagent system approach to scheduling devices in smart homes. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 981–989).

  30. Fioretto, F., Yeoh, W., Pontelli, E., Ma, Y., & Ranade, S. (2017). A DCOP approach to the economic dispatch with demand response. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 981–989).

  31. Fishelson, M., & Geiger, D. (2002). Exact genetic linkage computations for general pedigrees. Bioinformatics, 18(suppl 1), S189–S198.

  32. Friedman, N., Linial, M., Nachman, I., & Pe’er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3-4), 601–620.

  33. Gaudreault, J., Frayret, J.M., & Pesant, G. (2009). Distributed search for supply chain coordination. Computers in Industry, 60(6), 441–451.

  34. Gupta, S., Yeoh, W., Pontelli, E., Jain, P., & Ranade, S.J. (2013). Modeling microgrid islanding problems as DCOPs. In North American power symposium (NAPS) (pp. 1–6). IEEE.

  35. Hamadi, Y., Bessière, C., & Quinqueton, J. (1998). Distributed intelligent backtracking. In Proceedings of the European conference on artificial intelligence (ECAI) (pp. 219–223).

  36. Han, T.D., & Abdelrahman, T.S. (2011). Reducing branch divergence in GPU programs. In Proceedings of the fourth workshop on general purpose processing on graphics processing units (pp. 3:1–3:8). New York: ACM Press.

  37. Kask, K., Dechter, R., & Gelfand, A.E. (2012). BEEM: bucket elimination with external memory. arXiv:1203.3487.

  38. Kumar, A., Faltings, B., & Petcu, A. (2009). Distributed constraint optimization with structured resource constraints. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 923–930).

  39. Lalami, M.E., El Baz, D., & Boyer, V. (2011). Multi GPU implementation of the simplex algorithm. In Proceedings of the international conference on high performance computing and communication (HPCC) (Vol. 11, pp. 179–186).

  40. Larrosa, J. (2002). Node and arc consistency in weighted CSP. In Proceedings of the AAAI conference on artificial intelligence (AAAI) (pp. 48–53).

  41. Otten, L., & Dechter, R. (2017). And/or branch-and-bound on a computational grid. Journal of Artificial Intelligence Research (to appear).

  42. Le, T., Fioretto, F., Yeoh, W., Son, T.C., & Pontelli, E. (2016). ER-DCOPS: a framework for distributed constraint optimization with uncertainty in constraint utilities. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 605–614).

  43. Lerner, U., Parr, R., Koller, D., Biswas, G., et al. (2000). Bayesian fault detection and diagnosis in dynamic systems. In AAAI/IAAI (pp. 531–537).

  44. Lim, H., Yuan, C., & Hansen, E.A. (2010). Scaling up MAP search in Bayesian networks using external memory. On Probabilistic Graphical Models, 177.

  45. Maheswaran, R., Tambe, M., Bowring, E., Pearce, J., & Varakantham, P. (2004). Taking DCOP to the real world: efficient complete solutions for distributed event scheduling. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 310–317).

  46. Marinescu, R., & Dechter, R. (2009). Memory intensive and/or search for combinatorial optimization in graphical models. Artificial Intelligence, 173(16-17), 1492–1524.

  47. Modi, P., Shen, W.M., Tambe, M., & Yokoo, M. (2005). ADOPT: asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence, 161(1–2), 149–180.

  48. Montanari, U. (1974). Networks of constraints: fundamental properties and applications to picture processing. Information Sciences, 7, 95–132.

  49. Pawłowski, K., Kurach, K., Michalak, T., & Rahwan, T. (2014). Coalition structure generation with the graphic processor unit. Tech. Rep. CS-RR-13-07, Department of Computer Science, University of Oxford.

  50. Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco: Morgan Kaufmann Publishers Inc.

  51. Pesant, G. (2004). A regular language membership constraint for finite sequences of variables. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 482–495).

  52. Petcu, A., & Faltings, B. (2005). Approximations in distributed optimization. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 802–806).

  53. Petcu, A., & Faltings, B. (2005). A scalable method for multiagent constraint optimization. In Proceedings of the international joint conference on artificial intelligence (IJCAI) (pp. 1413–1420).

  54. Quimper, C.G., & Walsh, T. (2006). Global grammar constraints. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 751–755). Springer.

  55. Rodrigues, L., & Magatao, L. (2007). Enhancing supply chain decisions using constraint programming: a case study. In MICAI 2007: advances in artificial intelligence (LNCS Vol. 4827, pp. 1110–1121). Springer.

  56. Rossi, F., van Beek, P., & Walsh, T. (eds.) (2006). Handbook of constraint programming. Elsevier.

  57. Rust, P., Picard, G., & Ramparany, F. (2016). Using message-passing DCOP algorithms to solve energy-efficient smart environment configuration problems. In Proceedings of the international joint conference on artificial intelligence (IJCAI) (pp. 468–474).

  58. Sanders, J., & Kandrot, E. (2010). CUDA by example: an introduction to general-purpose GPU programming. Addison Wesley.

  59. Sandholm, T. (2002). Algorithm for optimal winner determination in combinatorial auctions. Artificial Intelligence, 135(1), 1–54.

  60. Schiex, T., Fargier, H., Verfaillie, G., et al. (1995). Valued constraint satisfaction problems: Hard and easy problems. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 95, 631–639.

  61. Shapiro, L.G., & Haralick, R.M. (1981). Structural descriptions and inexact matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(5), 504–519.

  62. Silberstein, M., Schuster, A., Geiger, D., Patney, A., & Owens, J.D. (2008). Efficient computation of sum-products on GPUs through software-managed cache. In Proceedings of the 22nd annual international conference on supercomputing (pp. 309–318). ACM.

  63. Sturtevant, N.R., & Rutherford, M.J. (2013). Minimizing writes in parallel external memory search. In Proceedings of the international joint conference on artificial intelligence (IJCAI).

  64. Sultanik, E., Modi, P.J., & Regli, W.C. (2007). On modeling multiagent task scheduling as a distributed constraint optimization problem. In Proceedings of the international joint conference on artificial intelligence (IJCAI) (pp. 1531–1536).

  65. Trick, M.A. (2003). A dynamic programming approach for consistency and propagation for knapsack constraints. Annals of Operations Research, 118(1-4), 73–84.

  66. Yeoh, W., Felner, A., & Koenig, S. (2010). BnB-ADOPT: an asynchronous branch-and-bound DCOP algorithm. Journal of Artificial Intelligence Research, 38, 85–133.

  67. Yeoh, W., & Yokoo, M. (2012). Distributed problem solving. AI Magazine, 33(3), 53–65.

  68. Zivan, R., Yedidsion, H., Okamoto, S., Glinton, R., & Sycara, K. (2015). Distributed constraint optimization for teams of mobile sensing agents. Journal of Autonomous Agents and Multi-Agent Systems, 29(3), 495–536.

Acknowledgements

We thank the anonymous reviewers for their comments. This research is partially supported by the National Science Foundation under grants 1345232, 1401639, 1458595, 1526842, and 1550662. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsoring organizations, agencies, or the U.S. government.

Author information

Corresponding author

Correspondence to Ferdinando Fioretto.

Additional information

This journal article is an extended version of an earlier conference paper [26]. It includes (i) a parallelized design and implementation of Mini-Bucket Elimination with GPUs on WCSPs; (ii) a more detailed description of the GPU operations to ease reproducibility; and (iii) a significantly more comprehensive empirical evaluation with additional WCSP benchmarks and different GPU devices.

About this article

Cite this article

Fioretto, F., Pontelli, E., Yeoh, W. et al. Accelerating exact and approximate inference for (distributed) discrete optimization with GPUs. Constraints 23, 1–43 (2018). https://doi.org/10.1007/s10601-017-9274-1

Keywords

  • GPU
  • WCSP
  • MPE
  • DCOP
  • (Mini-)bucket elimination
  • (A)DPOP