Abstract
On-demand string sorting is the problem of preprocessing a set of n strings so that subsequent queries can retrieve the k < n lexicographically smallest strings (and afterwards the next k, and so on). This on-demand variant closely resembles search engine queries, which repeatedly return the next k best-ranked pages.
We present a data structure that supports this with O(n) preprocessing time and answers queries in O(log n) time. There is an additional cost of O(N) time, amortized over all operations, where N is the total length of the strings.
Our data structure is a heap of strings that supports heapify and delete-min. As it turns out, implementing a full heap with all operations is not that simple. For the sake of completeness, we propose a heap with the full set of operations, based on balanced indexing trees, that supports the heap operations in optimal time.
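The query pattern described above (heapify once, then repeatedly extract the next k smallest strings) can be illustrated with a minimal Python sketch. It is not the authors' data structure: it uses the standard `heapq` module with ordinary string comparisons, each of which may rescan characters, so it does not attain the O(N) amortized character cost stated in the abstract. The class name `OnDemandSorter` and the method `next_batch` are hypothetical names chosen for illustration.

```python
import heapq
from typing import List


class OnDemandSorter:
    """Hands out strings in lexicographic order, k at a time.

    Simplified stand-in: a plain binary heap with full string
    comparisons, not the heap of strings from the paper.
    """

    def __init__(self, strings: List[str]) -> None:
        # Copy the input, then build the heap with O(n) comparisons.
        self._heap = list(strings)
        heapq.heapify(self._heap)

    def next_batch(self, k: int) -> List[str]:
        # Each delete-min costs O(log n) comparisons; stop early
        # if fewer than k strings remain.
        batch: List[str] = []
        while self._heap and len(batch) < k:
            batch.append(heapq.heappop(self._heap))
        return batch


sorter = OnDemandSorter(["pear", "apple", "fig", "banana", "cherry"])
first = sorter.next_batch(2)   # the two smallest strings
second = sorter.next_batch(2)  # the next two
third = sorter.next_batch(2)   # whatever is left
```

Successive calls to `next_batch` continue where the previous call stopped, which mirrors the "next k" behaviour of the on-demand queries; only the strings actually requested are ever fully ordered.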
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kent, C., Lewenstein, M., Sheinwald, D. (2007). On Demand String Sorting over Unbounded Alphabets. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_5
DOI: https://doi.org/10.1007/978-3-540-73437-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73436-9
Online ISBN: 978-3-540-73437-6
eBook Packages: Computer Science, Computer Science (R0)