Thrust and CUDA in Data Intensive Algorithms
High memory bandwidth and instruction throughput make GPUs very attractive for the many algorithms that can exploit a Single Instruction Multiple Data (SIMD) architecture. Databases and their data-intensive operations may also benefit from parallel GPU threads and thread streams. Many libraries offer simple GPU interfaces that make memory and thread management as easy as possible. The trade-off among programmer time, code structure, and algorithm efficiency is critical for business applications. In this paper we evaluate the Thrust library and its ability to manage millions of threads, compared with a pure CUDA C program.
Keywords: Single Instruction Multiple Data, Device Function, Data Intensive Application, Transformation Iterator, Coalesce Memory Access