Abstract
CUDA Dynamic Parallelism (CDP) is an extension of the GPGPU programming model proposed to better address irregular applications and recursive patterns of computation. However, processing memory-demanding problems with CDP is not straightforward because of its particular memory organization. This work presents an algorithm that addresses this issue. It dynamically calculates and configures the CDP runtime variables and the GPU heap based on an analysis of the partial backtracking tree. The proposed algorithm was implemented for solving permutation combinatorial problems and evaluated on two test cases: N-Queens and the Asymmetric Travelling Salesman Problem. It allows different CDP-based backtracking algorithms from the literature to solve memory-demanding problems, adapting to the number of recursive kernel generations and to the presence of dynamic allocations on the GPU.
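The idea of deriving CDP settings from a partial tree can be illustrated with a small host-side sketch. It is not the algorithm of the paper, whose details are not given in the abstract. The struct fields, the function name `plan_cdp_config`, and the sizing rules are all assumptions made for illustration. The three computed values correspond to the CUDA runtime limits `cudaLimitDevRuntimeSyncDepth`, `cudaLimitDevRuntimePendingLaunchCount`, and `cudaLimitMallocHeapSize`, and the constants are the defaults documented in the CUDA C programming guide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical summary of the partial backtracking tree explored on the host.
struct PartialTreeStats {
    std::size_t frontier_nodes;     // subproblems left at the host cutoff depth
    std::size_t children_per_node;  // assumed upper bound on children per node
    std::size_t bytes_per_node;     // storage needed for one subproblem
};

// Values that a host program would pass to cudaDeviceSetLimit.
struct CdpConfig {
    std::size_t sync_depth;        // for cudaLimitDevRuntimeSyncDepth
    std::size_t pending_launches;  // for cudaLimitDevRuntimePendingLaunchCount
    std::size_t heap_bytes;        // for cudaLimitMallocHeapSize
};

// Documented CUDA defaults; the sketch never goes below them.
constexpr std::size_t kDefaultSyncDepth = 2;
constexpr std::size_t kDefaultPendingLaunches = 2048;
constexpr std::size_t kDefaultHeapBytes = 8u * 1024u * 1024u;  // 8 MB

// Assumed sizing rules (illustrative only):
//  - one synchronization level per recursive kernel generation,
//  - in the worst case every frontier node launches one child grid,
//  - the heap must hold all children of the frontier when device-side
//    malloc is used, and stays at its default otherwise.
CdpConfig plan_cdp_config(const PartialTreeStats& tree,
                          std::size_t generations,
                          bool uses_device_malloc) {
    assert(generations >= 1);
    CdpConfig cfg;
    cfg.sync_depth = std::max(kDefaultSyncDepth, generations);
    cfg.pending_launches =
        std::max(kDefaultPendingLaunches, tree.frontier_nodes);
    cfg.heap_bytes = kDefaultHeapBytes;
    if (uses_device_malloc) {
        const std::size_t needed = tree.frontier_nodes *
                                   tree.children_per_node *
                                   tree.bytes_per_node;
        cfg.heap_bytes = std::max(kDefaultHeapBytes, needed);
    }
    return cfg;
}
```

In a real CUDA program, the three fields would be applied on the host with `cudaDeviceSetLimit` before the first kernel launch, since these limits cannot be raised from device code.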
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Carneiro, T., Gmys, J., Melab, N., de Carvalho Junior, F.H., Rebouças Filho, P.P., Tuyttens, D. (2019). Dynamic Configuration of CUDA Runtime Variables for CDP-Based Divide-and-Conquer Algorithms. In: Senger, H., et al. High Performance Computing for Computational Science – VECPAR 2018. VECPAR 2018. Lecture Notes in Computer Science(), vol 11333. Springer, Cham. https://doi.org/10.1007/978-3-030-15996-2_2
DOI: https://doi.org/10.1007/978-3-030-15996-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15995-5
Online ISBN: 978-3-030-15996-2
eBook Packages: Computer Science (R0)