Abstract
In substring compression one is given a text to preprocess so that, upon request, a compressed substring is returned. Generalized substring compression is the same problem with one twist: each query also contains a context substring (or a collection of context substrings), and the answer is the requested substring in compressed form, where the context is used to make the compression more efficient.
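The sketch below fixes the two query types by brute force. It is not the algorithm of the paper, and it makes several assumptions. Compression is taken to be a greedy LZ77-style parse into phrases, where each phrase is either a single literal character or a copy of earlier text. In the generalized query, the context S[α..β] is assumed to be the only source that copy phrases may reference. All function names are hypothetical. Under these assumptions, the number of phrases C is simply the length of the returned list.

```python
from typing import List, Tuple, Union

# A phrase is either ("lit", ch) or ("copy", src, length).
Phrase = Union[Tuple[str, str], Tuple[str, int, int]]

def lz77_phrases(t: str) -> List[Phrase]:
    """Greedy LZ77-style parse of t by brute force (cubic time in the worst case).
    A copy phrase reproduces t[src:src+length] with src < current position;
    the source may overlap the phrase being produced."""
    phrases: List[Phrase] = []
    n, k = len(t), 0
    while k < n:
        best_len, best_src = 0, -1
        for src in range(k):
            length = 0
            while k + length < n and t[src + length] == t[k + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        if best_len == 0:
            phrases.append(("lit", t[k]))  # t[k] does not occur earlier
            k += 1
        else:
            phrases.append(("copy", best_src, best_len))
            k += best_len
    return phrases

def relative_phrases(t: str, ctx: str) -> List[Phrase]:
    """Greedy parse of t in which every copy phrase is the longest prefix of
    the remaining text that occurs somewhere in ctx (src is an offset into ctx)."""
    phrases: List[Phrase] = []
    k = 0
    while k < len(t):
        length = 0
        # Substrings of ctx are closed under taking prefixes, so extend greedily.
        while k + length < len(t) and t[k:k + length + 1] in ctx:
            length += 1
        if length == 0:
            phrases.append(("lit", t[k]))  # t[k] does not occur in ctx
            k += 1
        else:
            phrases.append(("copy", ctx.find(t[k:k + length]), length))
            k += length
    return phrases

def substring_compress(s: str, i: int, j: int) -> List[Phrase]:
    # Query on S[i..j] (0-based, inclusive); copy offsets are relative to S[i].
    return lz77_phrases(s[i:j + 1])

def generalized_substring_compress(s: str, i: int, j: int,
                                   alpha: int, beta: int) -> List[Phrase]:
    # Query on S[i..j] with context S[alpha..beta]; offsets are relative to S[alpha].
    return relative_phrases(s[i:j + 1], s[alpha:beta + 1])
```

For example, `substring_compress("xabababab", 1, 8)` parses `"abababab"` into three phrases: two literals followed by one overlapping copy of length 6. Answering each query by rescanning the substring from scratch takes time polynomial in j − i. The query times stated in the abstract are instead bounded in terms of C, and achieving that requires preprocessing the text.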
We focus on generalized substring compression and present the first non-trivial correct algorithm for this problem. As part of this algorithm we propose a method for finding the bounded longest common prefix of substrings, which may be of independent interest. In addition, we propose an efficient algorithm for substring compression that makes use of range searching for minimum.
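The abstract does not define "bounded longest common prefix". The sketch below adopts one natural reading as an assumption. Given a position k ≤ j and a context S[α..β], it finds the longest prefix of S[k..j] that occurs entirely inside S[α..β]. This quantity is exactly the length of the next phrase in the relative parse.

The LCP of two suffixes is computed in O(1) from:

- a suffix array,
- the Kasai LCP array,
- a standard sparse-table range-minimum structure.

The bounded query then simply scans every start position in [α, β]. This is a baseline, not the paper's method. The sparse table is also not claimed to be the paper's range-searching structure. All class and function names are hypothetical.

```python
from typing import List

def suffix_array(s: str) -> List[int]:
    # Naive O(n^2 log n) construction, adequate for a sketch.
    return sorted(range(len(s)), key=lambda a: s[a:])

def lcp_array(s: str, sa: List[int], rank: List[int]) -> List[int]:
    # Kasai et al.: lcp[r] = LCP of suffixes sa[r-1] and sa[r]; lcp[0] = 0.
    n, h = len(s), 0
    lcp = [0] * n
    for a in range(n):
        if rank[a] > 0:
            b = sa[rank[a] - 1]
            while a + h < n and b + h < n and s[a + h] == s[b + h]:
                h += 1
            lcp[rank[a]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp

class SparseTableMin:
    """Range-minimum queries in O(1) after O(n log n) preprocessing."""
    def __init__(self, arr: List[int]):
        self.table = [list(arr)]
        k = 1
        while (1 << k) <= len(arr):
            prev, half = self.table[-1], 1 << (k - 1)
            self.table.append([min(prev[i], prev[i + half])
                               for i in range(len(arr) - (1 << k) + 1)])
            k += 1

    def query(self, lo: int, hi: int) -> int:
        # Minimum of arr[lo..hi], inclusive, assuming lo <= hi.
        k = (hi - lo + 1).bit_length() - 1
        row = self.table[k]
        return min(row[lo], row[hi - (1 << k) + 1])

class SuffixLCP:
    def __init__(self, s: str):
        self.s = s
        self.sa = suffix_array(s)
        self.rank = [0] * len(s)
        for r, a in enumerate(self.sa):
            self.rank[a] = r
        self.rmq = SparseTableMin(lcp_array(s, self.sa, self.rank))

    def lcp(self, a: int, b: int) -> int:
        # LCP of suffixes S[a..] and S[b..]: a range minimum over the LCP array.
        if a == b:
            return len(self.s) - a
        ra, rb = sorted((self.rank[a], self.rank[b]))
        return self.rmq.query(ra + 1, rb)

    def bounded_lcp(self, k: int, j: int, alpha: int, beta: int) -> int:
        # Longest prefix of S[k..j] occurring inside S[alpha..beta] (assumed definition).
        best = 0
        for p in range(alpha, beta + 1):
            best = max(best, min(self.lcp(k, p), j - k + 1, beta - p + 1))
        return best
```

For instance, with `s = "abcxabcd"`, the call `SuffixLCP(s).bounded_lcp(4, 7, 0, 2)` gives 3, since `"abc"` fits in the context. Shrinking the context to S[0..1] lowers the answer to 2.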
We present several tradeoffs for both problems. For compressing the substring \(S[i..j]\) (possibly with the substring \(S[\alpha..\beta]\) as context), the best query times we achieve are \(O(C)\) for a substring compression query and \(O\big(C\log\big(\frac{j-i}{C}\big)\big)\) for a generalized substring compression query, where \(C\) is the number of phrases encoded.
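A bound of the form \(O\big(C\log\big(\frac{j-i}{C}\big)\big)\) can arise as follows, offered here as an illustration rather than as the paper's analysis. Suppose each phrase length \(\ell_t\) is found with \(O(\log \ell_t)\) probes of a monotone test such as "does this prefix occur in the context?". Because \(\sum_t \ell_t = j-i+1\), concavity of the logarithm gives \(\sum_t \log \ell_t \le C\log\frac{j-i+1}{C}\).

The sketch below shows one way to meet the per-phrase probe count: exponential (galloping) search followed by binary search. The name `gallop_max` is hypothetical, and the predicate is assumed to be monotone with `pred(0)` true.

```python
from typing import Callable, Tuple

def gallop_max(pred: Callable[[int], bool], upper: int) -> Tuple[int, int]:
    """Largest l in [0, upper] with pred(l) True, assuming pred is monotone
    (True up to some threshold, False afterwards) and pred(0) is True.
    Returns (l, number_of_pred_calls); the call count is O(log l)."""
    calls = 0

    def probe(x: int) -> bool:
        nonlocal calls
        calls += 1
        return pred(x)

    lo, hi = 0, 1  # invariant: pred(lo) is True
    # Exponential phase: double hi until pred fails or hi exceeds upper.
    while hi <= upper and probe(hi):
        lo, hi = hi, hi * 2
    hi = min(hi, upper + 1)  # now the answer lies in [lo, hi - 1]
    # Binary phase on the bracketed interval.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return lo, calls
```

Used with the predicate `lambda l: t[k:k + l] in ctx`, this search finds the next phrase length of a relative parse. For example, `gallop_max(lambda l: "abcab"[:l] in "cabc", 5)` returns a length of 3. In this sketch each probe is a naive substring test, so only the number of probes, not the total time, is logarithmic.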
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Keller, O., Kopelowitz, T., Landau, S., Lewenstein, M. (2009). Generalized Substring Compression. In: Kucherov, G., Ukkonen, E. (eds) Combinatorial Pattern Matching. CPM 2009. Lecture Notes in Computer Science, vol 5577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02441-2_3
DOI: https://doi.org/10.1007/978-3-642-02441-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02440-5
Online ISBN: 978-3-642-02441-2
eBook Packages: Computer Science, Computer Science (R0)