Dynamic Configuration of CUDA Runtime Variables for CDP-Based Divide-and-Conquer Algorithms

  • Conference paper
High Performance Computing for Computational Science – VECPAR 2018 (VECPAR 2018)

Abstract

CUDA Dynamic Parallelism (CDP) is an extension of the GPGPU programming model proposed to better address irregular applications and recursive patterns of computation. However, using CDP to process memory-demanding problems is not straightforward because of its particular memory organization. This work presents an algorithm that addresses this issue. It dynamically calculates and configures the CDP runtime variables and the GPU heap based on an analysis of the partial backtracking tree. The proposed algorithm was implemented for solving permutation combinatorial problems and evaluated on two test cases: N-Queens and the Asymmetric Travelling Salesman Problem. It allows different CDP-based backtracking algorithms from the literature to solve memory-demanding problems, adapting to the number of recursive kernel generations and to the presence of dynamic allocations on the GPU.
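To make the idea of sizing CDP runtime variables from a partial tree more concrete, the following host-side C++ sketch may help. It is not the algorithm from the paper, whose details are not given in this abstract. The names `PartialTreeInfo`, `CdpLimits` and `estimate_cdp_limits` are hypothetical, as is the sizing rule itself (widest generation for pending launches, total nodes times bytes per node for the heap). The constants are the default values and nesting limit as we understand them from the CUDA 9.x documentation and should be checked against the toolkit in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of a partial backtracking tree explored on the host.
struct PartialTreeInfo {
    int kernel_generations;                          // depth of recursive kernel launches
    std::vector<std::uint64_t> nodes_per_generation; // estimated nodes launching kernels, per generation
    std::uint64_t bytes_per_node;                    // device-side allocation per node (0 if none)
};

// Values one would later pass to the CUDA runtime (illustrative container).
struct CdpLimits {
    std::uint64_t sync_depth;
    std::uint64_t pending_launches;
    std::uint64_t heap_bytes;
};

// Assumed defaults and nesting limit; verify against your CUDA version.
constexpr std::uint64_t kMaxSyncDepth = 24;
constexpr std::uint64_t kDefaultPendingLaunches = 2048;
constexpr std::uint64_t kDefaultHeapBytes = 8ull * 1024 * 1024;

// Illustrative sizing rule, not the paper's: never go below the defaults,
// and grow each limit according to the partial tree.
CdpLimits estimate_cdp_limits(const PartialTreeInfo& info) {
    assert(info.kernel_generations >= 0);

    std::uint64_t widest = 0;  // largest number of launches in one generation
    std::uint64_t total = 0;   // total nodes that may allocate on the device
    for (std::uint64_t n : info.nodes_per_generation) {
        widest = std::max(widest, n);
        total += n;
    }

    CdpLimits limits;
    limits.sync_depth = std::min(
        static_cast<std::uint64_t>(info.kernel_generations), kMaxSyncDepth);
    limits.pending_launches = std::max(kDefaultPendingLaunches, widest);
    limits.heap_bytes = std::max(kDefaultHeapBytes, total * info.bytes_per_node);
    return limits;
}
```

In a CUDA program, values like these would typically be applied with `cudaDeviceSetLimit` using `cudaLimitDevRuntimeSyncDepth`, `cudaLimitDevRuntimePendingLaunchCount` and `cudaLimitMallocHeapSize`, before the first kernel launch.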

Author information

Corresponding author

Correspondence to Tiago Carneiro.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Carneiro, T., Gmys, J., Melab, N., de Carvalho Junior, F.H., Rebouças Filho, P.P., Tuyttens, D. (2019). Dynamic Configuration of CUDA Runtime Variables for CDP-Based Divide-and-Conquer Algorithms. In: Senger, H., et al. High Performance Computing for Computational Science – VECPAR 2018. VECPAR 2018. Lecture Notes in Computer Science, vol 11333. Springer, Cham. https://doi.org/10.1007/978-3-030-15996-2_2

  • DOI: https://doi.org/10.1007/978-3-030-15996-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15995-5

  • Online ISBN: 978-3-030-15996-2

  • eBook Packages: Computer Science, Computer Science (R0)