## Abstract

*Discrete optimization* is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems, including *Weighted Constraint Satisfaction Problems* (WCSPs), *Distributed Constraint Optimization Problems* (DCOPs), and stochastic variants such as the task of finding the *most probable explanation* (MPE) in *belief networks*. Inference-based algorithms are powerful techniques for solving discrete optimization problems, and they can be used independently or in combination with other techniques. However, their applicability is often limited by their compute-intensive nature and their space requirements. This paper proposes the design and implementation of a novel inference-based technique that exploits modern massively parallel architectures, such as those found in Graphics Processing Units (GPUs), to speed up exact and approximate inference-based algorithms for discrete optimization. The paper studies the proposed algorithm in both centralized and distributed optimization contexts. It demonstrates that the use of GPUs provides significant advantages in terms of runtime and scalability, achieving speedups of up to two orders of magnitude (up to 345 times faster) with respect to a sequential version.

## Notes

1. For simplicity, we assume that tuples of variables are built according to a predefined ordering.
2. For simplicity, we also use \(\theta\) to represent the tuple \(\langle \theta(x_{i_{1}}), \dots, \theta(x_{i_{h}}) \rangle\), where \(\{x_{i_{1}}, \dots, x_{i_{h}}\}\) is the domain of \(\theta\).
3. The *primal graph* of a DCOP is equivalent to that of the corresponding WCSP.
4. A warp is typically composed of 32 threads.
5. In modern devices, each SM allots 64 KB for register space.
6. Accesses to the GPU global memory are cached into cache lines of 128 bytes, and can be fetched by all requiring threads in a warp.
7. Our source code is available at https://github.com/nandofioretto/GpuBE and https://github.com/nandofioretto/GpuDBE
8. Downloadable from http://costfunction.org/en/benchmark/ and http://graphmod.ics.uci.edu/group/Repository
9. Recall that BE needs to process bucket-tables whose number of rows is in \(O(d^{w^{*}})\).
10. We use the *Pearson product-moment correlation* coefficient.
11. In all other experiments we used the GeForce GTX Titan, as this is the best, most affordable card at our disposal.
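
Note 9 refers to the bucket tables that bucket elimination (BE) builds. As a rough illustration only, the following Python sketch shows the textbook min-sum elimination step: the cost functions that mention a variable are summed, and the variable is then minimized out. It is not the authors' GPU code; the names `eliminate`, `solve`, `f1`, `f2`, and `domains` are hypothetical and chosen for this example.

```python
from itertools import product


def eliminate(functions, var, domains):
    """Sum all cost functions that mention `var`, then minimize `var` out.

    `functions` is a list of (scope, table) pairs: `scope` is a tuple of
    variable names and `table` maps value tuples (in scope order) to costs.
    """
    bucket = [f for f in functions if var in f[0]]
    rest = [f for f in functions if var not in f[0]]
    # Scope of the new table: every variable in the bucket except `var`.
    new_scope = tuple(sorted({v for scope, _ in bucket for v in scope} - {var}))
    new_table = {}
    # One row per assignment of the remaining variables. Each row is computed
    # independently of the others, so rows could be processed in parallel.
    for values in product(*(domains[v] for v in new_scope)):
        assignment = dict(zip(new_scope, values))
        best = float("inf")
        for x in domains[var]:
            assignment[var] = x
            cost = sum(table[tuple(assignment[v] for v in scope)]
                       for scope, table in bucket)
            best = min(best, cost)
        new_table[values] = best
    return rest + [(new_scope, new_table)]


def solve(functions, order, domains):
    """Eliminate every variable in `order` and return the optimal total cost."""
    for var in order:
        functions = eliminate(functions, var, domains)
    # All remaining tables have an empty scope and hold a single constant.
    return sum(table[()] for _, table in functions)


# Hypothetical toy problem: three binary variables and two cost functions.
domains = {"x": [0, 1], "y": [0, 1], "z": [0, 1]}
f1 = (("x", "y"), {(0, 0): 5, (0, 1): 1, (1, 0): 2, (1, 1): 4})
f2 = (("y", "z"), {(0, 0): 3, (0, 1): 0, (1, 0): 1, (1, 1): 6})

print(solve([f1, f2], ["y", "x", "z"], domains))  # prints 2
```

The table produced by one elimination step has one row per assignment of the remaining variables, which is where the \(O(d^{w^{*}})\) bound in note 9 comes from. For instance, with \(d = 10\) and \(w^{*} = 8\) a table can reach \(10^{8}\) rows, or about 400 MB if each cost is assumed to take 4 bytes.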

## References

1. Abdennadher, S., & Schlenker, H. (1999). Nurse scheduling using constraint logic programming. In *Proceedings of the conference on innovative applications of artificial intelligence (IAAI)* (pp. 838–843).
2. Allouche, D., André, I., Barbe, S., Davies, J., de Givry, S., Katsirelos, G., O’Sullivan, B., Prestwich, S.D., Schiex, T., & Traoré, S. (2014). Computational protein design as an optimization problem. *Artificial Intelligence*, *212*, 59–79.
3. Allouche, D., de Givry, S., Nguyen, H., & Schiex, T. (2013). Toulbar2 to solve Weighted Partial max-SAT. Tech. rep. INRA.
4. Apt, K. (2003). *Principles of constraint programming*. Cambridge University Press.
5. Arbelaez, A., & Codognet, P. (2014). A GPU implementation of parallel constraint-based local search. In *Proceedings of the Euromicro international conference on parallel, distributed and network-based processing (PDP)* (pp. 648–655).
6. Barabási, A.L., & Albert, R. (1999). Emergence of scaling in random networks. *Science*, *286*(5439), 509–512.
7. Bistaffa, F., Bomberi, N., & Farinelli, A. (2016). CUBE: a CUDA approach for bucket elimination on GPUs. In *Proceedings of the European conference on artificial intelligence (ECAI)* (to appear).
8. Bistarelli, S., Montanari, U., & Rossi, F. (1997). Semiring-based constraint satisfaction and optimization. *Journal of the ACM*, *44*(2), 201–236.
9. Boyer, V., El Baz, D., & Elkihel, M. (2012). Solving knapsack problems on GPU. *Computers & Operations Research*, *39*(1), 42–47.
10. Brito, I., & Meseguer, P. (2010). Improving DPOP with function filtering. In *Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS)* (pp. 141–158).
11. Burke, E.K., De Causmaecker, P., Berghe, G.V., & Van Landeghem, H. (2004). The state of the art of nurse rostering. *Journal of Scheduling*, *7*(6), 441–499.
12. Campeotto, F., Dovier, A., Fioretto, F., & Pontelli, E. (2014). A GPU implementation of large neighborhood search for solving constraint optimization problems. In *Proceedings of the European conference on artificial intelligence (ECAI)* (pp. 189–194).
13. Campeotto, F., Palù, A.D., Dovier, A., Fioretto, F., & Pontelli, E. (2013). A constraint solver for flexible protein model. *Journal of Artificial Intelligence Research*, *48*, 953–1000.
14. Chakroun, I., Mezmaz, M.S., Melab, N., & Bendjoudi, A. (2013). Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm. *Concurrency and Computation: Practice and Experience*, *25*(8), 1121–1136.
15. Dechter, R. (1999). Bucket elimination: a unifying framework for reasoning. *Artificial Intelligence*, *113*(1), 41–85.
16. Dechter, R. (2003). *Constraint processing*. San Francisco: Morgan Kaufmann Publishers Inc.
17. Dechter, R. (2013). Reasoning with probabilistic and deterministic graphical models: exact algorithms. *Synthesis Lectures on Artificial Intelligence and Machine Learning*, *7*(3), 1–191.
18. Dechter, R., & Pearl, J. (1988). *Network-based heuristics for constraint-satisfaction problems*. Springer.
19. Dechter, R., & Rish, I. (2003). Mini-buckets: a general scheme for bounded inference. *Journal of the ACM*, *50*(2), 107–153.
20. Diamos, G.F., Ashbaugh, B., Maiyuran, S., Kerr, A., Wu, H., & Yalamanchili, S. (2011). SIMD re-convergence at thread frontiers. In *Proceedings of the annual IEEE/ACM international symposium on microarchitecture* (pp. 477–488).
21. Dovier, A., Formisano, A., & Pontelli, E. (2013). Autonomous agents coordination: action languages meet CLP() and Linda. *Theory and Practice of Logic Programming*, *13*(2), 149–173.
22. Edelkamp, S., Jabbar, S., & Schrödl, S. (2004). External A*. In *Advances in artificial intelligence: 27th annual German conference on AI, (KI) 2004* (pp. 226–240).
23. Farinelli, A., Rogers, A., Petcu, A., & Jennings, N. (2008). Decentralised coordination of low-power embedded devices using the Max-Sum algorithm. In *Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS)* (pp. 639–646).
24. Fioretto, F., Dovier, A., & Pontelli, E. (2015). Constrained community-based gene regulatory network inference. *ACM Transactions on Modeling and Computer Simulation*, *25*(2), 11.
25. Fioretto, F., Le, T., Yeoh, W., Pontelli, E., & Son, T.C. (2014). Improving DPOP with branch consistency for solving distributed constraint optimization problems. In *Proceedings of the international conference on principles and practice of constraint programming (CP)* (pp. 307–323).
26. Fioretto, F., Le, T., Yeoh, W., Pontelli, E., & Son, T.C. (2015). Exploiting GPUs in solving (distributed) constraint optimization problems with dynamic programming. In *Proceedings of the international conference on principles and practice of constraint programming (CP)* (pp. 121–139).
27. Fioretto, F., Yeoh, W., & Pontelli, E. (2016). A dynamic programming-based MCMC framework for solving DCOPs with GPUs. In *Proceedings of the international conference on principles and practice of constraint programming (CP)* (pp. 813–831).
28. Fioretto, F., Yeoh, W., & Pontelli, E. (2016). Multi-variable agent decomposition for DCOPs. In *Proceedings of the AAAI conference on artificial intelligence (AAAI)* (pp. 2480–2486).
29. Fioretto, F., Yeoh, W., & Pontelli, E. (2017). A multiagent system approach to scheduling devices in smart homes. In *Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS)* (pp. 981–989).
30. Fioretto, F., Yeoh, W., Pontelli, E., Ma, Y., & Ranade, S. (2017). A DCOP approach to the economic dispatch with demand response. In *Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS)* (pp. 981–989).
31. Fishelson, M., & Geiger, D. (2002). Exact genetic linkage computations for general pedigrees. *Bioinformatics*, *18*(suppl 1), S189–S198.
32. Friedman, N., Linial, M., Nachman, I., & Pe’er, D. (2000). Using Bayesian networks to analyze expression data. *Journal of Computational Biology*, *7*(3-4), 601–620.
33. Gaudreault, J., Frayret, J.M., & Pesant, G. (2009). Distributed search for supply chain coordination. *Computers in Industry*, *60*(6), 441–451.
34. Gupta, S., Yeoh, W., Pontelli, E., Jain, P., & Ranade, S.J. (2013). Modeling microgrid islanding problems as DCOPs. In *North American power symposium (NAPS)* (pp. 1–6). IEEE.
35. Hamadi, Y., Bessière, C., & Quinqueton, J. (1998). Distributed intelligent backtracking. In *Proceedings of the European conference on artificial intelligence (ECAI)* (pp. 219–223).
36. Han, T.D., & Abdelrahman, T.S. (2011). Reducing branch divergence in GPU programs. In *Proceedings of the fourth workshop on general purpose processing on graphics processing units* (pp. 3:1–3:8). New York: ACM Press.
37. Kask, K., Dechter, R., & Gelfand, A.E. (2012). BEEM: bucket elimination with external memory. arXiv:1203.3487.
38. Kumar, A., Faltings, B., & Petcu, A. (2009). Distributed constraint optimization with structured resource constraints. In *Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS)* (pp. 923–930).
39. Lalami, M.E., El Baz, D., & Boyer, V. (2011). Multi GPU implementation of the simplex algorithm. In *Proceedings of the international conference on high performance computing and communication (HPCC)* (Vol. 11, pp. 179–186).
40. Larrosa, J. (2002). Node and arc consistency in weighted CSP. In *Proceedings of the AAAI conference on artificial intelligence (AAAI)* (pp. 48–53).
41. Otten, L., & Dechter, R. (2017). And/or branch-and-bound on a computational grid. *Journal of Artificial Intelligence Research* (to appear).
42. Le, T., Fioretto, F., Yeoh, W., Son, T.C., & Pontelli, E. (2016). ER-DCOPS: a framework for distributed constraint optimization with uncertainty in constraint utilities. In *Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS)* (pp. 605–614).
43. Lerner, U., Parr, R., Koller, D., Biswas, G., et al. (2000). Bayesian fault detection and diagnosis in dynamic systems. In *AAAI/IAAI* (pp. 531–537).
44. Lim, H., Yuan, C., & Hansen, E.A. (2010). Scaling up MAP search in Bayesian networks using external memory. *On Probabilistic Graphical Models*, 177.
45. Maheswaran, R., Tambe, M., Bowring, E., Pearce, J., & Varakantham, P. (2004). Taking DCOP to the real world: efficient complete solutions for distributed event scheduling. In *Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS)* (pp. 310–317).
46. Marinescu, R., & Dechter, R. (2009). Memory intensive and/or search for combinatorial optimization in graphical models. *Artificial Intelligence*, *173*(16-17), 1492–1524.
47. Modi, P., Shen, W.M., Tambe, M., & Yokoo, M. (2005). ADOPT: asynchronous distributed constraint optimization with quality guarantees. *Artificial Intelligence*, *161*(1–2), 149–180.
48. Montanari, U. (1974). Networks of constraints: fundamental properties and applications to picture processing. *Information Sciences*, *7*, 95–132.
49. Pawłowski, K., Kurach, K., Michalak, T., & Rahwan, T. (2014). Coalition structure generation with the graphic processor unit. Tech. Rep. CS-RR-13-07, Department of Computer Science, University of Oxford.
50. Pearl, J. (1988). *Probabilistic reasoning in intelligent systems: Networks of plausible inference*. San Francisco: Morgan Kaufmann Publishers Inc.
51. Pesant, G. (2004). A regular language membership constraint for finite sequences of variables. In
52. Petcu, A., & Faltings, B. (2005). Approximations in distributed optimization. In
53. Petcu, A., & Faltings, B. (2005). A scalable method for multiagent constraint optimization. In *Proceedings of the international joint conference on artificial intelligence (IJCAI)* (pp. 1413–1420).
54. Quimper, C.G., & Walsh, T. (2006). Global grammar constraints. In
55. Rodrigues, L., & Magatao, L. (2007). Enhancing supply chain decisions using constraint programming: a case study. In *MICAI 2007: advances in artificial intelligence* (Vol. LNCS 4827, pp. 1110–1121). Springer.
56. Rossi, F., van Beek, P., & Walsh, T. (eds.) (2006). *Handbook of constraint programming*. Elsevier.
57. Rust, P., Picard, G., & Ramparany, F. (2016). Using message-passing DCOP algorithms to solve energy-efficient smart environment configuration problems. In *Proceedings of the international joint conference on artificial intelligence (IJCAI)* (pp. 468–474).
58. Sanders, J., & Kandrot, E. (2010). *CUDA by example: an introduction to general-purpose GPU programming*. Addison Wesley.
59. Sandholm, T. (2002). Algorithm for optimal winner determination in combinatorial auctions. *Artificial Intelligence*, *135*(1), 1–54.
60. Schiex, T., Fargier, H., Verfaillie, G., et al. (1995). Valued constraint satisfaction problems: Hard and easy problems. *Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)*, *95*, 631–639.
61. Shapiro, L.G., & Haralick, R.M. (1981). Structural descriptions and inexact matching. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *3*(5), 504–519.
62. Silberstein, M., Schuster, A., Geiger, D., Patney, A., & Owens, J.D. (2008). Efficient computation of sum-products on GPUs through software-managed cache. In *Proceedings of the 22nd annual international conference on supercomputing* (pp. 309–318). ACM.
63. Sturtevant, N.R., & Rutherford, M.J. (2013). Minimizing writes in parallel external memory search. In *Proceedings of the international joint conference on artificial intelligence (IJCAI)*.
64. Sultanik, E., Modi, P.J., & Regli, W.C. (2007). On modeling multiagent task scheduling as a distributed constraint optimization problem. In *Proceedings of the international joint conference on artificial intelligence (IJCAI)* (pp. 1531–1536).
65. Trick, M.A. (2003). A dynamic programming approach for consistency and propagation for knapsack constraints. *Annals of Operations Research*, *118*(1-4), 73–84.
66. Yeoh, W., Felner, A., & Koenig, S. (2010). BnB-ADOPT: an asynchronous branch-and-bound DCOP algorithm. *Journal of Artificial Intelligence Research*, *38*, 85–133.
67. Yeoh, W., & Yokoo, M. (2012). Distributed problem solving. *AI Magazine*, *33*(3), 53–65.
68. Zivan, R., Yedidsion, H., Okamoto, S., Glinton, R., & Sycara, K. (2015). Distributed constraint optimization for teams of mobile sensing agents. *Journal of Autonomous Agents and Multi-Agent Systems*, *29*(3), 495–536.

## Acknowledgements

We thank the anonymous reviewers for their comments. This research is partially supported by the National Science Foundation under grants 1345232, 1401639, 1458595, 1526842, and 1550662. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsoring organizations, agencies, or the U.S. government.

## Additional information

This journal article is an extended version of an earlier conference paper [26]. It adds (*i*) a parallelized design and implementation of Mini-Bucket Elimination with GPUs on WCSPs; (*ii*) a more detailed description of the GPU operations to ease reproducibility; and (*iii*) a significantly more comprehensive empirical evaluation with additional WCSP benchmarks and different GPU devices.

## About this article

### Cite this article

Fioretto, F., Pontelli, E., Yeoh, W. *et al.* Accelerating exact and approximate inference for (distributed) discrete optimization with GPUs. *Constraints* **23**, 1–43 (2018). https://doi.org/10.1007/s10601-017-9274-1

### Keywords

- GPU
- WCSP
- MPE
- DCOP
- (Mini-)bucket elimination
- (A)DPOP