Thrust and CUDA in Data Intensive Algorithms

  • Krzysztof Kaczmarski
  • Paweł Rzążewski
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 185)


High memory bandwidth and instruction throughput make GPUs very attractive for the many algorithms that can make use of a Single Instruction Multiple Data (SIMD) architecture. Databases and their data-intensive operations may also benefit from parallel GPU threads and thread streams. Many libraries offer simple GPU interfaces that make memory and thread management as easy as possible. The trade-off between programmer time, code structure, and algorithm efficiency is critical for business applications. In this paper we evaluate the Thrust library and its ability to manage millions of threads, compared with a pure CUDA C program.


Single Instruction Multiple Data, Device Function, Data Intensive Application, Transformation Iterator, Coalesce Memory Access
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
