A GPU-Based Backtracking Algorithm for Permutation Combinatorial Problems

  • Tiago Carneiro Pessoa
  • Jan Gmys
  • Nouredine Melab
  • Francisco Heron de Carvalho Junior
  • Daniel Tuyttens
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10048)


This work presents a GPU-based backtracking algorithm for permutation combinatorial problems based on the Integer-Vector-Matrix (IVM) data structure, a data structure dedicated to permutation combinatorial optimization problems. In this algorithm, load balancing is performed without CPU intervention, in a work stealing phase invoked after each node expansion phase. The proposed work stealing approach uses a virtual n-dimensional hypercube topology and a triggering mechanism to reduce the overhead incurred by dynamic load balancing. We have implemented this new algorithm to solve instances of the Asymmetric Travelling Salesman Problem by implicit enumeration, a scenario in which the cost of node evaluation is low compared to the cost of the overall search procedure. Experimental results show that the dynamically load-balanced IVM algorithm reaches speed-ups of up to 17× over a serial implementation using a bitset data structure and up to 2× over the GPU counterpart of that bitset-based implementation.
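To make the IVM idea concrete, here is a minimal serial Python sketch; it is not the authors' GPU implementation. It assumes a simplified variant in which an integer `I` holds the current depth, a vector `V` holds the index of the element chosen at each depth, and row `k` of a matrix `M` holds the elements still available at depth `k`. Rows are rebuilt by filtering rather than updated in place, and bounding is reduced to a caller-supplied feasibility predicate. The function name `ivm_backtrack` and the parameter `is_feasible` are hypothetical.

```python
from typing import Callable, List


def ivm_backtrack(n: int, is_feasible: Callable[[List[int]], bool]) -> int:
    """Count complete permutations of 0..n-1 accepted by is_feasible,
    exploring the search tree depth-first with a simplified IVM state.

    Hypothetical sketch: is_feasible receives the partial permutation and
    only needs to check its last element, since earlier ones were checked.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # M[k]: elements still available at depth k (row 0 holds all of them).
    M: List[List[int]] = [[] for _ in range(n)]
    M[0] = list(range(n))
    # V[k]: index into M[k] of the element currently chosen at depth k.
    V = [0] * n
    # I: current depth in the search tree.
    I = 0
    count = 0
    while True:
        if V[I] >= len(M[I]):
            # Row I is exhausted: backtrack one level, or stop at the root.
            if I == 0:
                break
            I -= 1
            V[I] += 1
            continue
        # Partial permutation encoded by positions 0..I.
        prefix = [M[k][V[k]] for k in range(I + 1)]
        if not is_feasible(prefix):
            V[I] += 1          # prune this child and move to its next sibling
        elif I == n - 1:
            count += 1         # leaf: a complete feasible permutation
            V[I] += 1
        else:
            # Branch: row I + 1 is row I without the chosen element.
            chosen = M[I][V[I]]
            M[I + 1] = [x for x in M[I] if x != chosen]
            I += 1
            V[I] = 0
    return count


if __name__ == "__main__":
    print(ivm_backtrack(4, lambda p: True))                 # all 24 permutations
    print(ivm_backtrack(4, lambda p: p[-1] != len(p) - 1))  # 9 derangements
```

With a predicate that rejects any element placed at its own index, the same loop counts derangements while skipping every pruned subtree. In this sketch the whole search state consists of three integer arrays whose size depends only on `n`, which is one plausible reason such a representation suits per-thread storage on a GPU better than a growing stack of nodes.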


Keywords: GPU computing · Backtracking · Depth-first search · Load balancing · Work stealing


  1. 1.
    Burtscher, M., Nasre, R., Pingali, K.: A quantitative study of irregular programs on GPUs. In: 2012 IEEE International Symposium on Workload Characterization (IISWC), pp. 141–151. IEEE (2012)Google Scholar
  2. 2.
    Carneiro, T., Muritiba, A., Negreiros, M., de Campos, G.: A new parallel schema for branch-and-bound algorithms using GPGPU. In: 23rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 41–47 (2011)Google Scholar
  3. 3.
    Carneiro, T., Nobre, R.H., Negreiros, M., de Campos, G.A.L.: Depth-first search versus jurema search on GPU branch-and-bound algorithms: a case study. In: NVIDIA’s GCDF - GPU Computing Developer Forum on XXXII Congresso da Sociedade Brasileira de Computação (CSBC) (2012)Google Scholar
  4. 4.
    Cirasella, J., Johnson, D.S., McGeoch, L.A., Zhang, W.: The asymmetric traveling salesman problem: algorithms, instance generators, and tests. In: Buchsbaum, A.L., Snoeyink, J. (eds.) ALENEX 2001. LNCS, vol. 2153, pp. 32–59. Springer, Heidelberg (2001). doi: 10.1007/3-540-44808-X_3 CrossRefGoogle Scholar
  5. 5.
    Cook, W.: In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation. Princeton University Press, Princeton (2012)zbMATHGoogle Scholar
  6. 6.
    Defour, D., Marin, M.: Regularity versus load-balancing on GPU for treefix computations. Procedia Comput. Sci. 18, 309–318 (2013)CrossRefGoogle Scholar
  7. 7.
    Feinbube, F., Rabe, B., von Lowis, M., Polze, A.: NQueens on CUDA: optimization issues. In: 2010 Ninth International Symposium on Parallel and Distributed Computing (ISPDC), pp. 63–70. IEEE (2010)Google Scholar
  8. 8.
    Gmys, J., Mezmaz, M., Melab, N., Tuyttens, D.: A GPU-based Branch-and-Bound algorithm using Integer–Vector–Matrix data structure. Parallel Comput. (2016).
  9. 9.
    Jenkins, J., Arkatkar, I., Owens, J.D., Choudhary, A., Samatova, N.F.: Lessons learned from exploring the backtracking paradigm on the GPU. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011. LNCS, vol. 6853, pp. 425–437. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-23397-5_42 CrossRefGoogle Scholar
  10. 10.
    Karp, R.M., Zhang, Y.: Randomized parallel algorithms for backtrack search and branch-and-bound computation. J. ACM (JACM) 40(3), 765–789 (1993)MathSciNetCrossRefzbMATHGoogle Scholar
  11. 11.
    Karypis, G., Kumar, V.: Unstructured tree search on SIMD parallel computers. IEEE Trans. Parallel Distrib. Syst. 5(10), 1057–1072 (1994)CrossRefGoogle Scholar
  12. 12.
    Knuth, D.: The Art of Computer Programming. Seminumerical Algorithms, vol. 2, p. 192. Addison-Wesley, Reading (1997). iSBN=9780201896848Google Scholar
  13. 13.
    Li, L., Liu, H., Wang, H., Liu, T., Li, W.: A parallel algorithm for game tree search using GPGPU. IEEE Trans. Parallel Distrib. Syst. 26(8), 2114–2127 (2015)CrossRefGoogle Scholar
  14. 14.
    Mezmaz, M., Leroy, R., Melab, N., Tuyttens, D.: A multi-core parallel branch-and-bound algorithm using factorial number system. In: 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Phoenix, AZ, pp. 1203–1212, May 2014Google Scholar
  15. 15.
    Plauth, M., Feinbube, F., Schlegel, F., Polze, A.: Using dynamic parallelism for fine-grained, irregular workloads: a case study of the n-queens problem. In: 2015 Third International Symposium on Computing and Networking (CANDAR), pp. 404–407. IEEE (2015)Google Scholar
  16. 16.
    Rocki, K., Suda, R.: Parallel minimax tree searching on GPU. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds.) PPAM 2009. LNCS, vol. 6067, pp. 449–456. Springer, Heidelberg (2010). doi: 10.1007/978-3-642-14390-8_47 CrossRefGoogle Scholar
  17. 17.
    San Segundo, P., Rossi, C., Rodriguez-Losada, D.: Recent Developments in Bit-Parallel Algorithms. INTECH Open Access Publisher (2008)Google Scholar
  18. 18.
    Yelick, K.A.: Programming models for irregular applications. ACM SIGPLAN Not. 28(1), 28–31 (1993)CrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Tiago Carneiro Pessoa (1)
  • Jan Gmys (2, 3)
  • Nouredine Melab (3)
  • Francisco Heron de Carvalho Junior (1)
  • Daniel Tuyttens (2)
  1. ParGO Research Group (Parallelism, Optimization and Graphs), Mestrado e Doutorado em Ciência da Computação, Universidade Federal do Ceará, Fortaleza, Brazil
  2. Mathematics and Operational Research Department (MARO), University of Mons, Mons, Belgium
  3. INRIA Lille Nord Europe, Université Lille 1, CNRS/CRIStAL, Cité scientifique, Villeneuve d’Ascq, France
