Dynamic Configuration of CUDA Runtime Variables for CDP-Based Divide-and-Conquer Algorithms

Carneiro, Tiago; Gmys, Jan; Melab, Nouredine; de Carvalho Junior, Francisco Heron; Rebouças Filho, Pedro Pedrosa; Tuyttens, Daniel

doi:10.1007/978-3-030-15996-2_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11333)

Included in the following conference series: International Conference on Vector and Parallel Processing

Abstract

CUDA Dynamic Parallelism (CDP) is an extension of the GPGPU programming model proposed to better address irregular applications and recursive patterns of computation. However, processing memory-demanding problems with CDP is not straightforward because of its particular memory organization. This work presents an algorithm that addresses this issue. It dynamically calculates and configures the CDP runtime variables and the GPU heap on the basis of an analysis of the partial backtracking tree. The proposed algorithm was implemented for solving permutation combinatorial problems and evaluated on two test cases: N-Queens and the Asymmetric Travelling Salesman Problem. It allows different CDP-based backtracking algorithms from the literature to solve memory-demanding problems, adapting to the number of recursive kernel generations and the presence of dynamic allocations on the GPU.
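
To make the idea of sizing the runtime variables from a tree analysis more concrete, the following is a minimal sketch only. It is not the algorithm from the paper, whose details are not given in this abstract. It assumes that the analysis of the partial backtracking tree yields an estimated number of subproblems per recursive kernel generation, and that each subproblem may launch one child grid and may need a fixed number of heap bytes. The names `CdpLimits`, `plan_cdp_limits`, `frontier_sizes` and `bytes_per_node` are hypothetical. On the host, values like these would typically be handed to the CUDA runtime call `cudaDeviceSetLimit` using the limits `cudaLimitDevRuntimeSyncDepth`, `cudaLimitDevRuntimePendingLaunchCount` and `cudaLimitMallocHeapSize`, which exist in the CUDA 9.x toolkit that the chapter references.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CdpLimits:
    """Candidate values for the three CUDA device limits (hypothetical container)."""
    sync_depth: int            # candidate for cudaLimitDevRuntimeSyncDepth
    pending_launch_count: int  # candidate for cudaLimitDevRuntimePendingLaunchCount
    malloc_heap_bytes: int     # candidate for cudaLimitMallocHeapSize (0 = keep default)


def plan_cdp_limits(frontier_sizes: Sequence[int],
                    bytes_per_node: int,
                    uses_device_malloc: bool) -> CdpLimits:
    """Derive conservative limits from per-generation subproblem estimates.

    frontier_sizes[i] is the estimated number of subproblems handled by
    kernel generation i (an assumed output of the partial-tree analysis).
    """
    if not frontier_sizes:
        raise ValueError("at least one kernel generation is required")
    if bytes_per_node < 0 or any(n < 0 for n in frontier_sizes):
        raise ValueError("sizes must be non-negative")

    total_nodes = sum(frontier_sizes)

    # One nesting level per recursive kernel generation.
    sync_depth = len(frontier_sizes)

    # Upper bound: assume every subproblem launches one child grid and that
    # all of those launches could be pending at the same time.
    pending_launch_count = total_nodes

    # The heap is only sized when kernels allocate memory dynamically on the GPU.
    malloc_heap_bytes = total_nodes * bytes_per_node if uses_device_malloc else 0

    return CdpLimits(sync_depth, pending_launch_count, malloc_heap_bytes)


if __name__ == "__main__":
    # Two kernel generations with 100 and 5000 estimated subproblems, 64 bytes each.
    print(plan_cdp_limits([100, 5000], bytes_per_node=64, uses_device_malloc=True))
```

The bounds here are deliberately pessimistic: summing over all generations overestimates what is live at any one time, which trades device memory for a lower risk of failed child launches or failed device-side allocations.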

Author information

Authors and Affiliations

Instituto Federal de Educação, Ciência e Tecnologia do Ceará, Fortaleza, Brazil
Tiago Carneiro & Pedro Pedrosa Rebouças Filho
Mathematics and Operational Research Department (MARO), University of Mons, Mons, Belgium
Jan Gmys & Daniel Tuyttens
Programa de Mestrado e Doutorado em Ciência da Computação, Universidade Federal do Ceará, Fortaleza, Brazil
Francisco Heron de Carvalho Junior
Inria Lille Nord Europe, Université Lille 1, CNRS/CRIStAL, Villeneuve-d’Ascq, France
Tiago Carneiro, Jan Gmys & Nouredine Melab

Corresponding author

Correspondence to Tiago Carneiro.

Editor information

Editors and Affiliations

Federal University of São Carlos, São Carlos, São Paulo, Brazil
Hermes Senger
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Osni Marques
Universidade Estadual Paulista Júlio de Mesquita Filho, Presidente Prudente, São Paulo, Brazil
Rogerio Garcia
Universidade Estadual Paulista Júlio de Mesquita Filho, São Paulo, São Paulo, Brazil
Tatiana Pinheiro de Brito
Universidade Estadual Paulista Júlio de Mesquita Filho, São Paulo, São Paulo, Brazil
Rogério Iope
Universidade Estadual Paulista Júlio de Mesquita Filho, São Paulo, São Paulo, Brazil
Silvio Stanzani
Universidad Nacional de San Luis, San Luis, Argentina
Veronica Gil-Costa

About this paper

Cite this paper

Carneiro, T., Gmys, J., Melab, N., de Carvalho Junior, F.H., Rebouças Filho, P.P., Tuyttens, D. (2019). Dynamic Configuration of CUDA Runtime Variables for CDP-Based Divide-and-Conquer Algorithms. In: Senger, H., et al. High Performance Computing for Computational Science – VECPAR 2018. VECPAR 2018. Lecture Notes in Computer Science, vol 11333. Springer, Cham. https://doi.org/10.1007/978-3-030-15996-2_2

DOI: https://doi.org/10.1007/978-3-030-15996-2_2
Published: 26 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15995-5
Online ISBN: 978-3-030-15996-2
eBook Packages: Computer Science, Computer Science (R0)
