Fast In-Place Sorting with CUDA Based on Bitonic Sort

  • Hagen Peters
  • Ole Schulz-Hildebrandt
  • Norbert Luttenberger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6067)


State of the art graphics processors provide high processing power and furthermore, the high programmability of GPUs offered by frameworks like CUDA increases their usability as high-performance co-processors for general-purpose computing. Sorting is well-investigated in Computer Science in general, but (because of this new field of application for GPUs) there is a demand for high-performance parallel sorting algorithms that fit to the characteristics of modern GPU-architecture.

We present a high-performance in-place implementation of Batcher’s bitonic sorting networks for CUDA-enabled GPUs. We adapted bitonic sort for arbitrary input length and assigned compare/exchange-operations to threads in a way that decreases low-performance global-memory access and thereby greatly increases the performance of the implementation.


GPU GPGPU CUDA Parallel Sorting Multicore 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Kapasi, U.J., Dally, W.J., Rixner, S., Mattson, P.R., Owens, J.D., Khailany, B.: Efficient conditional operations for data-parallel architectures. In: MICRO 33: Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture, pp. 159–170. ACM, New York (2000)Google Scholar
  2. 2.
    Purcell, T.J., Donner, C., Cammarano, M., Jensen, H.W., Hanrahan, P.: Photon mapping on programmable graphics hardware. In: HWWS 2003: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, Aire-la-Ville, Switzerland, pp. 41–50. Eurographics Association (2003)Google Scholar
  3. 3.
    Kipfer, P., Segal, M., Westermann, R.: Uberflow: a gpu-based particle engine. In: HWWS 2004: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pp. 115–122. ACM, New York (2004)CrossRefGoogle Scholar
  4. 4.
    Govindaraju, N., Raghuvanshi, N., Henson, M., Manocha, D.: A cache-efficient sorting algorithm for database and data mining computations using graphics processors. Technical report, University of North Carolina-Chapel Hill (2005)Google Scholar
  5. 5.
    Greb, A., Zachmann, G.: Gpu-abisort: optimal parallel sorting on stream architectures. In: 20th International on Parallel and Distributed Processing Symposium, IPDPS 2006 (2006)Google Scholar
  6. 6.
    Batcher, K.: Sorting networks and their applications. In: AFIPS Spring Joint Comput. Conf. (1967)Google Scholar
  7. 7.
    Harris, M., Sengupta, S., Owens, J.D.: Parallel prefix sum (scan) with cuda. In: GPU Gems 3. Addison-Wesley, Reading (2007)Google Scholar
  8. 8.
    Grand, S.L.: Broad-phase collision detection with cuda. In: GPU Gems, vol. 3, Addison-Wesley, Reading (2007)Google Scholar
  9. 9.
    He, B., Govindaraju, N.K., Luo, Q., Smith, B.: Efficient gather and scatter operations on graphics processors. In: SC 2007: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, pp. 1–12. ACM, New York (2007)CrossRefGoogle Scholar
  10. 10.
    Sengupta, S., Harris, M., Zhang, Y., Owens, J.D.: Scan primitives for gpu computing. In: GH 2007: Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, Aire-la-Ville, Switzerland, pp. 97–106. Eurographics Association (2007)Google Scholar
  11. 11.
    Cederman, D., Tsigas, P.: A practical quicksort algorithm for graphics processors. In: Halperin, D., Mehlhorn, K. (eds.) Esa 2008. LNCS, vol. 5193, pp. 246–258. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  12. 12.
    Sintorn, E., Assarsson, U.: Fast parallel gpu-sorting using a hybrid algorithm, Orlando, FL, USA, vol. 68, pp. 1381–1388. Academic Press, Inc., London (2008)Google Scholar
  13. 13.
    Satish, N., Harris, M., Garland, M.: Designing efficient sorting algorithms for manycore gpus. In: Proceedings 23rd IEEE International Parallel and Distributed Processing Symposium (2009)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Hagen Peters
    • 1
  • Ole Schulz-Hildebrandt
    • 1
  • Norbert Luttenberger
    • 1
  1. 1.Research Group for Communication Systems, Department of Computer ScienceChristian-Albrechts-University KielGermany

Personalised recommendations