Fast In-Place Sorting with CUDA Based on Bitonic Sort

  • Hagen Peters
  • Ole Schulz-Hildebrandt
  • Norbert Luttenberger
Conference paper

DOI: 10.1007/978-3-642-14390-8_42

Part of the Lecture Notes in Computer Science book series (LNCS, volume 6067)
Cite this paper as:
Peters H., Schulz-Hildebrandt O., Luttenberger N. (2010) Fast In-Place Sorting with CUDA Based on Bitonic Sort. In: Wyrzykowski R., Dongarra J., Karczewski K., Wasniewski J. (eds) Parallel Processing and Applied Mathematics. PPAM 2009. Lecture Notes in Computer Science, vol 6067. Springer, Berlin, Heidelberg

Abstract

State-of-the-art graphics processors provide high processing power, and the programmability offered by frameworks such as CUDA increases their usability as high-performance co-processors for general-purpose computing. Sorting is well investigated in computer science in general, but this new field of application for GPUs creates a demand for high-performance parallel sorting algorithms that fit the characteristics of modern GPU architectures.

We present a high-performance in-place implementation of Batcher’s bitonic sorting networks for CUDA-enabled GPUs. We adapted bitonic sort to inputs of arbitrary length and assigned compare/exchange operations to threads in a way that reduces slow global-memory accesses and thereby greatly increases the performance of the implementation.
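
To make the idea of an in-place bitonic sorting network for arbitrary input lengths concrete, the following is a minimal sequential sketch in C++. It is not the authors' CUDA implementation and says nothing about their thread assignment or memory-access scheme. It assumes one common way to handle lengths that are not a power of two: positions beyond the end of the input are treated as virtual +infinity padding, and all comparators point in the same direction, so any comparator that touches the padding can be skipped. The names `compare_exchange` and `bitonic_sort_in_place` are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One compare/exchange operation with lo < hi: afterwards the smaller value
// sits at the lower index. Indices >= a.size() stand for virtual +infinity
// padding (an assumption of this sketch), so such comparators are skipped.
template <typename T>
void compare_exchange(std::vector<T>& a, std::size_t lo, std::size_t hi) {
    if (hi < a.size() && a[hi] < a[lo]) {
        std::swap(a[lo], a[hi]);
    }
}

// Sequential sketch of a bitonic sorting network that sorts `a` in place,
// ascending, for any length (not only powers of two).
template <typename T>
void bitonic_sort_in_place(std::vector<T>& a) {
    const std::size_t n = a.size();
    // k is the size of the blocks produced by the current merge phase.
    for (std::size_t k = 2; k / 2 < n; k *= 2) {
        // First step of each merge: compare element i with its mirror
        // image inside the block of size k.
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t partner = i ^ (k - 1);
            if (partner > i) compare_exchange(a, i, partner);
        }
        // Remaining steps: compare elements at distance j = k/4, k/8, ..., 1.
        for (std::size_t j = k / 4; j > 0; j /= 2) {
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) compare_exchange(a, i, partner);
            }
        }
    }
}
```

Within one step the comparators act on disjoint pairs of elements, so the iterations of each inner loop over `i` are independent of one another. This independence is what allows the compare/exchange operations of a step to be distributed over GPU threads; how they are grouped to limit global-memory traffic is the subject of the paper and is not modelled here.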

Keywords

GPU, GPGPU, CUDA, Parallel Sorting, Multicore

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Hagen Peters (1)
  • Ole Schulz-Hildebrandt (1)
  • Norbert Luttenberger (1)
  1. Research Group for Communication Systems, Department of Computer Science, Christian-Albrechts-University Kiel, Germany
