Efficient massively parallel quicksort

  • Peter Sanders
  • Thomas Hansch
Discrete Algorithms
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1253)


Parallel quicksort is known to be a very scalable parallel sorting algorithm. However, most actual implementations have been limited to basic versions that suffer from irregular communication patterns and load imbalance. We have implemented a high-performance variant of parallel quicksort that incorporates the following optimizations: stop the recursion at the right time, sort locally first, use accurate yet efficient pivot selection strategies, streamline communication patterns, use locality-preserving processor indexing schemes, and work with multiple pivots at once. The resulting algorithm turns out to be among the best practical sorting methods. It is about three times faster than the basic algorithm and achieves a speedup of 810 on a 1024-processor Parsytec GCel for the NAS parallel sorting benchmark of size 2^24. The optimized algorithm can also be shown to be asymptotically optimal on meshes.
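To make the recursion structure concrete, here is a minimal single-machine sketch of parallel quicksort in which each "processor" is just a Python list. It is not the authors' implementation: the function names are hypothetical, the median-of-local-medians pivot rule is an assumed stand-in for the paper's pivot selection strategies, and the multiple-pivot and locality-preserving indexing optimizations are not modeled. It illustrates only two of the listed ideas: sorting locally first, and stopping the recursion once a group has shrunk to one processor.

```python
import bisect
import random
from statistics import median_low


def parallel_quicksort_sim(blocks):
    """Simulate parallel quicksort; blocks[i] is the data held by processor i.

    Returns a list of blocks whose concatenation is globally sorted.
    """
    # "Sort locally first": every later split is then a binary search.
    return _sort_group([sorted(b) for b in blocks])


def _sort_group(blocks):
    p = len(blocks)
    total = sum(len(b) for b in blocks)
    # Stop the recursion: one processor (already sorted) or nothing to sort.
    if p == 1 or total == 0:
        return blocks
    # Assumed pivot rule: lower median of the local medians.
    pivot = median_low([b[len(b) // 2] for b in blocks if b])
    small, large = [], []
    for b in blocks:
        cut = bisect.bisect_right(b, pivot)  # split a sorted block in O(log n)
        small.append(b[:cut])
        large.append(b[cut:])
    n_small = sum(len(s) for s in small)
    # Give each side a processor count proportional to its data volume,
    # keeping at least one processor per side so both groups shrink.
    p_small = min(p - 1, max(1, round(p * n_small / total)))
    left = _redistribute(small, p_small)
    right = _redistribute(large, p - p_small)
    return _sort_group(left) + _sort_group(right)


def _redistribute(pieces, q):
    """Spread the pieces evenly over q processors (stands in for data exchange)."""
    flat = [x for piece in pieces for x in piece]
    size, extra = divmod(len(flat), q)
    out, start = [], 0
    for i in range(q):
        end = start + size + (1 if i < extra else 0)
        # Re-sorting stands in for merging the sorted runs a processor receives.
        out.append(sorted(flat[start:end]))
        start = end
    return out


if __name__ == "__main__":
    random.seed(0)
    data = [random.randrange(10_000) for _ in range(1_000)]
    result = parallel_quicksort_sim([data[i::8] for i in range(8)])
    print([x for b in result for x in b] == sorted(data))
```

Because each side always receives at least one and at most p - 1 processors, every recursive call works on a strictly smaller group, so the sketch terminates even when all keys are equal. For scale, the reported speedup of 810 on 1024 processors corresponds to a parallel efficiency of about 79%.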


Keywords: implementation of parallel quicksort, sorting with multiple pivots, locality-preserving indexing schemes, load balancing, irregular communication patterns





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Peter Sanders (1)
  • Thomas Hansch (1)
  1. Department of Computer Science, University of Karlsruhe, Karlsruhe, Germany
