A Fast and Flexible Sorting Algorithm with CUDA

  • Shifu Chen
  • Jing Qin
  • Yongming Xie
  • Junping Zhao
  • Pheng-Ann Heng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5574)


In this paper, we propose a fast and flexible sorting algorithm with CUDA. The proposed algorithm is much more practical than previous GPU-based sorting algorithms, as it can sort elements represented by integers, floats and structures. It is also optimized for the modern GPU architecture to obtain high performance. To make it adaptive, we use different strategies for sorting disorderly lists and nearly-sorted lists. Extensive experiments demonstrate that our algorithm achieves higher performance than previous GPU-based sorting algorithms and can support real-time applications.
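
The abstract does not spell out how the strategy is chosen, so the following is only a minimal CPU-side sketch in C++ of what "adaptive" could mean here: measure how disordered the input is, then pick a sort accordingly. It is not the authors' CUDA implementation. The `Record` type, the `disorder_ratio` measure, the `adaptive_sort` function and the 0.05 threshold are all assumptions introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record type: a float key plus a payload, standing in for
// the "structures" mentioned in the abstract.
struct Record {
    float key;
    int   payload;
};

// Fraction of adjacent pairs that are out of order (0.0 = already sorted).
inline double disorder_ratio(const std::vector<Record>& v) {
    if (v.size() < 2) return 0.0;
    std::size_t descents = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i].key < v[i - 1].key) ++descents;
    return static_cast<double>(descents) / static_cast<double>(v.size() - 1);
}

// Insertion sort by key: cheap when elements are close to their final position.
inline void insertion_sort(std::vector<Record>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        Record r = v[i];
        std::size_t j = i;
        while (j > 0 && r.key < v[j - 1].key) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = r;
    }
}

// Picks a strategy from the measured disorder. The default threshold is an
// arbitrary illustrative value, not a figure from the paper.
inline void adaptive_sort(std::vector<Record>& v, double threshold = 0.05) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    if (disorder_ratio(v) <= threshold) {
        insertion_sort(v);  // nearly-sorted input
    } else {
        std::sort(v.begin(), v.end(),
                  [](const Record& a, const Record& b) { return a.key < b.key; });
    }
}
```

Counting adjacent descents is a crude proxy for sortedness: an input made of two swapped sorted halves has a single descent yet is far from sorted, so a practical implementation would likely need a more robust measure.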


Keywords: Parallel sorting algorithm; CUDA; GPU-based sorting algorithm




Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Shifu Chen (1)
  • Jing Qin (2)
  • Yongming Xie (2)
  • Junping Zhao (3)
  • Pheng-Ann Heng (1, 2)

  1. Shenzhen Institute of Advanced Integration Technology, Chinese Academy of Sciences/The Chinese University of Hong Kong, Hong Kong
  2. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
  3. Institute of Medical Informatics, Chinese PLA General Hospital & Postgraduate Medical School, China