External String Sorting: Faster and Cache-Oblivious

  • Rolf Fagerberg
  • Anna Pagh
  • Rasmus Pagh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3884)

Abstract

We give a randomized algorithm for sorting strings in external memory. For K binary strings comprising N words in total, our algorithm finds the sorted order and the longest common prefix sequence of the strings using \(O(\frac{K}{B}\log_{M/B}(\frac{K}{M})\log(\frac{N}{K}) + \frac{N}{B})\) I/Os. This bound is never worse than \(O(\frac{K}{B}\log_{M/B}(\frac{K}{M})\log\log_{M/B}(\frac{K}{M}) + \frac{N}{B})\) I/Os, and improves on the (deterministic) algorithm of Arge et al. (On sorting strings in external memory, STOC ’97). The error probability of the algorithm can be chosen as \(O(N^{-c})\) for any positive constant c. The algorithm even works in the cache-oblivious model under the tall cache assumption, i.e., assuming \(M > B^{1+\varepsilon}\) for some \(\varepsilon > 0\). An implication of our result is improved construction algorithms for external memory string dictionaries.
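
To make the output described in the abstract concrete, here is a minimal in-memory sketch of what "the sorted order and the longest common prefix sequence" means for a small input. It is not the external-memory algorithm of the paper and says nothing about I/O cost. The function names `lcp_length` and `sort_with_lcp` are hypothetical, and prefix lengths are counted in characters here as a simplifying assumption, whereas the abstract measures string length in words.

```python
from typing import List, Tuple


def lcp_length(a: str, b: str) -> int:
    """Return the length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def sort_with_lcp(strings: List[str]) -> Tuple[List[int], List[int]]:
    """Naive in-memory illustration (assumption: input fits in RAM).

    Returns the sorted order as indices into `strings`, plus the LCP
    sequence: entry j is the common-prefix length of the j-th and
    (j+1)-th strings in sorted order.
    """
    order = sorted(range(len(strings)), key=lambda i: strings[i])
    lcps = [
        lcp_length(strings[order[j - 1]], strings[order[j]])
        for j in range(1, len(order))
    ]
    return order, lcps


if __name__ == "__main__":
    binary_strings = ["0110", "0101", "1", "011"]
    order, lcps = sort_with_lcp(binary_strings)
    print([binary_strings[i] for i in order])  # ['0101', '011', '0110', '1']
    print(lcps)                                # [2, 3, 0]
```

With K strings the LCP sequence has K − 1 entries, one per adjacent pair in sorted order.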

Keywords

Hash Function, External Memory, Short String, Recursive Step, Oblivious Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Aggarwal, A., Vitter, J.S.: The Input/Output complexity of sorting and related problems. Communications of the ACM 31(9), 1116–1127 (1988)
  2. Andersson, A., Hagerup, T., Nilsson, S., Raman, R.: Sorting in linear time? J. Comput. System Sci. 57(1), 74–93 (1998)
  3. Andersson, A., Nilsson, S.: A new efficient radix sort. In: Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS 1994), pp. 714–721. IEEE Comput. Soc. Press, Los Alamitos (1994)
  4. Arge, L.: External memory data structures. In: Abello, J., Pardalos, P.M., Resende, M.G.C. (eds.) Handbook of Massive Data Sets, pp. 313–358. Kluwer Academic Publishers, Dordrecht (2002)
  5. Arge, L., Bender, M.A., Demaine, E.D., Holland-Minkley, B., Munro, J.I.: Cache-oblivious priority queue and graph algorithm applications. In: ACM (ed.) Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC 2002), pp. 268–276. ACM Press, New York (2002)
  6. Arge, L., Brodal, G.S., Fagerberg, R.: Cache-oblivious data structures. In: Mehta, D., Sahni, S. (eds.) Handbook on Data Structures and Applications, CRC Press, Boca Raton (2005)
  7. Arge, L., Ferragina, P., Grossi, R., Vitter, J.S.: On sorting strings in external memory (extended abstract). In: ACM (ed.) Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC 1997), pp. 540–548. ACM Press, New York (1997)
  8. Bentley, J., Sedgewick, R.: Fast algorithms for sorting and searching strings. In: Proc. 8th ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 360–369 (1997)
  9. Brodal, G.S.: Cache-oblivious algorithms and data structures. In: Hagerup, T., Katajainen, J. (eds.) SWAT 2004. LNCS, vol. 3111, pp. 3–13. Springer, Heidelberg (2004)
  10. Brodal, G.S., Fagerberg, R.: Cache oblivious distribution sweeping. In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 426–438. Springer, Heidelberg (2002)
  11. Brodal, G.S., Fagerberg, R.: On the limits of cache-obliviousness. In: Proc. 35th Annual ACM Symposium on Theory of Computing, pp. 307–315 (2003)
  12. Brodal, G.S., Fagerberg, R.: Cache-oblivious string dictionaries. In: Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2006) (to appear, 2006)
  13. Demaine, E.D.: Cache-oblivious data structures and algorithms. In: Proc. EFF summer school on massive data sets. LNCS, Springer, Heidelberg (to appear)
  14. Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. ACM 47(6), 987–1011 (2000)
  15. Ferragina, P., Grossi, R.: The string B-tree: a new data structure for string search in external memory and its applications. J. ACM 46(2), 236–280 (1999)
  16. Fredman, M.L., Willard, D.E.: Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. System Sci. 48(3), 533–551 (1994)
  17. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache oblivious algorithms. In: 40th Annual IEEE Symposium on Foundations of Computer Science, pp. 285–298. IEEE Computer Society Press, Los Alamitos (1999)
  18. Han, Y., Thorup, M.: Integer sorting in \(O(n\sqrt{\log\log n})\) expected time and linear space. In: Proceedings of the 43rd Annual Symposium on Foundations of Computer Science (FOCS 2002), pp. 135–144 (2002)
  19. Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 943–955. Springer, Heidelberg (2003)
  20. Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of repeated patterns in strings, trees and arrays. In: Proceedings of the 4th Annual ACM Symposium on Theory of Computing (STOC 1972), pp. 125–136 (1972)
  21. Meyer, U., Sanders, P., Sibeyn, J.F. (eds.): Algorithms for Memory Hierarchies. LNCS, vol. 2625. Springer, Berlin (2003)
  22. Morrison, D.R.: PATRICIA - practical algorithm to retrieve information coded in alphanumeric. J. ACM 15(4), 514–534 (1968)
  23. Vitter, J.S.: External memory algorithms and data structures: Dealing with MASSIVE data. ACM Computing Surveys 33(2), 209–271 (2001)
  24. Vitter, J.S.: Geometric and spatial data structures in external memory. In: Mehta, D., Sahni, S. (eds.) Handbook on Data Structures and Applications, CRC Press, Boca Raton (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Rolf Fagerberg (1)
  • Anna Pagh (2)
  • Rasmus Pagh (2)
  1. University of Southern Denmark, Odense M, Denmark
  2. IT University of Copenhagen, København S, Denmark