Thrust and CUDA in Data Intensive Algorithms

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 185)

Abstract

High memory bandwidth and instruction throughput make GPUs very attractive for the many algorithms that fit a Single Instruction, Multiple Data (SIMD) architecture. Databases and their data-intensive operations may also benefit from parallel GPU threads and thread streams. Many libraries offer simple GPU interfaces that make memory and thread management as easy as possible. The trade-off between programmer time, code structure and algorithm efficiency is critical for business applications. In this paper we evaluate the Thrust library and its ability to manage millions of threads, compared with a pure CUDA C program.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
