A Practical Quicksort Algorithm for Graphics Processors

  • Daniel Cederman
  • Philippas Tsigas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5193)


In this paper we present GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multi-core graphics processors. Quicksort has previously been considered an inefficient sorting solution for graphics processors, but we show that GPU-Quicksort often performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Daniel Cederman (1)
  • Philippas Tsigas (1)

  1. Department of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden
