, Volume 77, Issue 1, pp 235–286 | Cite as

Engineering Parallel String Sorting

  • Timo BingmannEmail author
  • Andreas Eberle
  • Peter Sanders


We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. As base-case sorter for LCP-aware string sorting we describe sequential LCP-insertion sort which calculates the LCP array and accelerates its insertions using it. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our parallel string sorting implementations scale very well on real-world inputs and modern machines.


Parallel string sorting String sorting Sample sort Merge sort LCP-merge sort LCP-insertion sort Super scalar string sample sort 



We would like the thank the anonymous reviewer for extraordinarily thorough checking of our algorithms and proofs, and for kind suggestions on how to improve the paper.


  1. 1.
    Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms (JDA) 2(1), 53–86 (2004)MathSciNetCrossRefzbMATHGoogle Scholar
  2. 2.
    Akiba, T.: Parallel string radix sort in C++. (2011). Git repository accessed November 2012
  3. 3.
    Akl, S.G., Santoro, N.: Optimal parallel merging and sorting without memory conflicts. IEEE Trans. Comput. 100(11), 1367–1369 (1987)MathSciNetCrossRefGoogle Scholar
  4. 4.
    Amir, A., Franceschini, G., Grossi, R., Kopelowitz, T., Lewenstein, M., Lewenstein, N.: Managing unbounded-length keys in comparison-driven data structures with applications to online indexing. SIAM J. Comput. 43(4), 1396–1416 (2014)MathSciNetCrossRefzbMATHGoogle Scholar
  5. 5.
    Andersson, A., Nilsson, S.: Implementing radixsort. J. Exp. Algorithmics (JEA) 3, 7 (1998)MathSciNetCrossRefzbMATHGoogle Scholar
  6. 6.
    Bentley, J.L., Sedgewick, R.: Fast algorithms for sorting and searching strings. In: ACM (ed.) 8th Symposium on Discrete Algorithms (SODA), pp. 360–369 (1997)Google Scholar
  7. 7.
    Bingmann, T., Sanders, P.: Parallel string sample sort. In: 21th European Symposium on Algorithms (ESA), no. 8125 in LNCS. Springer-Verlag (2013)Google Scholar
  8. 8.
    Blelloch, G.E., Leiserson, C.E., Maggs, B.M., Plaxton, C.G., Smith, S.J., Zagha, M.: A comparison of sorting algorithms for the connection machine CM-2. In: 3rd Symposium on Parallel Algorithms and Architectures (SPAA), pp. 3–16. ACM (1991)Google Scholar
  9. 9.
    Brent, R.P.: The parallel evaluation of general arithmetic expressions. J. ACM (JACM) 21(2), 201–206 (1974)MathSciNetCrossRefzbMATHGoogle Scholar
  10. 10.
    Cole, R.: Parallel merge sort. SIAM J. Comput. 17(4), 770–785 (1988)MathSciNetCrossRefzbMATHGoogle Scholar
  11. 11.
    Dementiev, R., Kettner, L., Mehnert, J., Sanders, P.: Engineering a sorted list data structure for 32 bit keys. In: 6th Workshop on Algorithm Engineering & Experiments (ALENEX), pp. 142–151. SIAM (2004)Google Scholar
  12. 12.
    Eberle, A.: Parallel multiway LCP-mergesort. Bachelor Thesis, Karlsruhe Institute of Technology, to appear (2014)Google Scholar
  13. 13.
    Frazer, W.D., McKellar, A.C.: Samplesort: a sampling approach to minimal storage tree sorting. J. ACM (JACM) 17(3), 496–507 (1970)MathSciNetCrossRefzbMATHGoogle Scholar
  14. 14.
    Hagerup, T.: Optimal parallel string algorithms: sorting, merging and computing the minimum. In: 16th ACM Symposium on Theory of Computing (STOC), pp. 382–391 (1994)Google Scholar
  15. 15.
    Hoare, C.A.R.: Quicksort. Comput. J. 5(1), 10–16 (1962)MathSciNetCrossRefzbMATHGoogle Scholar
  16. 16.
    Kent, C., Lewenstein, M., Sheinwald, D.: On demand string sorting over unbounded alphabets. Theor. Comput. Sci.nce 426, 66–74 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
  17. 17.
    Knöpfle, S.D.: String samplesort. Bachelor Thesis, Karlsruhe Institute of Technology, in German (2012)Google Scholar
  18. 18.
    Knuth, D.E.: The Art of Computer Programming, Volume 3: Sorting And Searching, 2nd edn. Addison Wesley Longman Publishing Co., Inc, Redwood (1998)Google Scholar
  19. 19.
    Kogge, P.M., Stone, H.S.: A parallel algorithm for the efficient solution of a general class of recurrence equations. IEEE Trans. Comput. 100(8), 786–793 (1973)MathSciNetCrossRefzbMATHGoogle Scholar
  20. 20.
    Kärkkäinen, J., Rantala, T.: Engineering radix sort for strings. In: 16th International Conference on String Processing and Information Retrieval (SPIRE), no. 5280 in LNCS, pp. 3–14. Springer-Verlag (2009)Google Scholar
  21. 21.
    McIlroy, P.M., Bostic, K., McIlroy, M.D.: Engineering radix sort. Comput. Syst. 6(1), 5–27 (1993)Google Scholar
  22. 22.
    Mehlhorn, K., Sanders, P.: Scanning multiple sequences via cache memory. Algorithmica 35(1), 75–93 (2003)MathSciNetCrossRefzbMATHGoogle Scholar
  23. 23.
    Ng, W., Kakehi, K.: Cache efficient radix sort for string sorting. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E90–A(2), 457–466 (2007)CrossRefGoogle Scholar
  24. 24.
    Ng, W., Kakehi, K.: Merging string sequences by longest common prefixes. IPSJ Digit. Cour. 4, 69–78 (2008)CrossRefGoogle Scholar
  25. 25.
    Rantala, T.: Library of string sorting algorithms in C++. (2007). Git repository accessed November 2012
  26. 26.
    Sanders, P.: Fast priority queues for cached memory. J. Exp. Algorithmics (JEA) 5, 7 (2000)CrossRefzbMATHGoogle Scholar
  27. 27.
    Sanders, P., Winkel, S.: Super scalar sample sort. In: 12th European Symposium on Algorithms (ESA), LNCS, vol. 3221, pp. 784–796. Springer-Verlag (2004)Google Scholar
  28. 28.
    Shamsundar, N.: A fast, stable implementation of mergesort for sorting text files. (2009). Source downloaded November 2012
  29. 29.
    Singler, J., Sanders, P., Putze, F.: MCSTL: The multi-core standard template library. In: Euro-Par 2007 Parallel Processing, no. 4641 in LNCS, pp. 682–694. Springer-Verlag (2007)Google Scholar
  30. 30.
    Sinha, R., Wirth, A.: Engineering burstsort: toward fast in-place string sorting. J. Exp. Algorithmics (JEA) 15, 1–24 (2010)MathSciNetzbMATHGoogle Scholar
  31. 31.
    Sinha, R., Zobel, J.: Cache-conscious sorting of large sets of strings with dynamic tries. J. Exp. Algorithmics (JEA) 9, 1–31 (2004)MathSciNetzbMATHGoogle Scholar
  32. 32.
    Sinha, R., Zobel, J., Ring, D.: Cache-efficient string sorting using copying. J. Exp. Algorithmics (JEA) 11, 1–32 (2007)MathSciNetzbMATHGoogle Scholar
  33. 33.
    Tsigas, P., Zhang, Y.: A simple, fast parallel implementation of quicksort and its performance evaluation on SUN enterprise 10000. In: 11th Euromicro Workshop on Parallel, Distributed and Network-Based Processing (PDP), pp. 372–381. IEEE Computer Society (2003)Google Scholar
  34. 34.
    Wassenberg, J., Sanders, P.: Engineering a multi-core radix sort. In: Euro-Par 2011 Parallel Processing, no. 6853 in LNCS, pp. 160–169. Springer-Verlag (2011)Google Scholar
  35. 35.
    Yang, M.C.K., Huang, J.S., Chow, Y.C.: Optimal parallel sorting scheme by order statistics. SIAM J. Comput. 16(6), 990–1003 (1987)MathSciNetCrossRefzbMATHGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. 1.Karlsruhe Institute of TechnologyKarlsruheGermany

Personalised recommendations