On Demand String Sorting over Unbounded Alphabets

  • Conference paper
Combinatorial Pattern Matching (CPM 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4580)


Abstract

On-demand string sorting is the problem of preprocessing a set of n strings to allow subsequent queries that find the k < n lexicographically smallest strings (and afterwards the next k, etc.). This on-demand variant strongly resembles search engine queries, which repeatedly return the next k best-ranked pages.

We present a data structure that supports this in O(n) preprocessing time and answers queries in O(log n) time. There is also a cost of O(N) time amortized over all operations, where N is the total length of the strings.

Our data structure is a heap of strings, which supports heapify and delete-min. As it turns out, implementing a full heap with all operations is not that simple. For the sake of completeness, we propose a heap with the full set of operations, based on balanced indexing trees, that supports the heap operations in optimal time.
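
The on-demand interface described in the abstract can be illustrated with an ordinary binary heap. The sketch below is only an assumption-laden illustration, not the authors' data structure: it uses Python's standard heapq module, and the class name OnDemandSorter and the method next_k are hypothetical names chosen for this example. Each comparison here is a plain string comparison, which can cost time proportional to the string length, so this sketch does not achieve the O(N) amortized character-comparison bound stated above; it only shows the heapify-then-delete-min usage pattern.

```python
import heapq
from typing import Iterable, List


class OnDemandSorter:
    """Hypothetical helper: hand out strings in sorted order, k at a time."""

    def __init__(self, strings: Iterable[str]) -> None:
        # Preprocessing: heapify uses O(n) string comparisons.
        self._heap: List[str] = list(strings)
        heapq.heapify(self._heap)

    def next_k(self, k: int) -> List[str]:
        """Pop up to k of the lexicographically smallest remaining strings."""
        out: List[str] = []
        while self._heap and len(out) < k:
            # Each delete-min uses O(log n) string comparisons.
            out.append(heapq.heappop(self._heap))
        return out


sorter = OnDemandSorter(["pear", "apple", "fig", "banana", "cherry"])
first = sorter.next_k(2)   # the two smallest strings
second = sorter.next_k(2)  # the next two
```

Successive calls to next_k continue where the previous call stopped, so only as much sorting work is done as the queries actually demand.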



Editor information

Bin Ma, Kaizhong Zhang

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kent, C., Lewenstein, M., Sheinwald, D. (2007). On Demand String Sorting over Unbounded Alphabets. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_5

  • DOI: https://doi.org/10.1007/978-3-540-73437-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73436-9

  • Online ISBN: 978-3-540-73437-6

  • eBook Packages: Computer Science (R0)
