Skip to main content

Generalized Substring Compression

  • Conference paper
Book cover Combinatorial Pattern Matching (CPM 2009)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5577))

Included in the following conference series:

Abstract

In substring compression one is given a text to preprocess so that, upon request, a compressed substring is returned. Generalized substring compression is the same with the following twist. The queries contain an additional context substring (or a collection of context substrings) and the answers are the substring in compressed format, where the context substring is used to make the compression more efficient.

We focus our attention on generalized substring compression and present the first non-trivial correct algorithm for this problem. In our algorithm we inherently propose a method for finding the bounded longest common prefix of substrings, which may be of independent interest. In addition, we propose an efficient algorithm for substring compression which makes use of range searching for minimum queries.

We present several tradeoffs for both problems. For compressing the substring S[i . . j] (possibly with the substring S[α . . β] as a context), best query times we achieve are O(C) and \(O\big(C\log\big(\frac{j-i}{C}\big)\big)\) for substring compression query and generalized substring compression query, respectively, where C is the number of phrases encoded.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Alstrup, S., Brodal, G.S., Rauhe, T.: New data structures for orthogonal range searching. In: FOCS 2000: IEEE Symposium on Foundations of Computer Science, pp. 198–207 (2000)

    Google Scholar 

  2. Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)

    Article  MathSciNet  MATH  Google Scholar 

  3. Bender, M.A., Farach-Colton, M.: The level ancestor problem simplified. Theor. Comput. Sci. 321(1), 5–12 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  4. Cormode, G., Muthukrishnan, S.: Substring compression problems. In: SODA 2005: Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, Philadelphia, PA, USA, pp. 321–330. Society for Industrial and Applied Mathematics (2005)

    Google Scholar 

  5. Crochemore, M., Iliopoulos, C.S., Kubica, M., Rahman, M.S., Walen, T.: Improved algorithms for the range next value problem and applications. In: STACS, pp. 205–216 (2008)

    Google Scholar 

  6. Farach, M.: Optimal suffix tree construction with large alphabets. In: FOCS 1997: Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS 1997), Washington, DC, USA, p. 137. IEEE Computer Society Press, Los Alamitos (1997)

    Google Scholar 

  7. Ferragina, P.: Dynamic text indexing under string updates. J. Algorithms 22(2), 296–328 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  8. Ferragina, P., Muthukrishnan, S., de Berg, M.: Multi-method dispatching: A geometric approach with applications to string matching problems. In: STOC 1999: Proceedings of the thirty-first annual ACM Symposium on Theory of Computing, pp. 483–491 (1999)

    Google Scholar 

  9. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)

    Article  MathSciNet  MATH  Google Scholar 

  10. Keller, O., Kopelowitz, T., Lewenstein, M.: Range non-overlapping indexing and successive list indexing. In: Dehne, F., Sack, J.-R., Zeh, N. (eds.) WADS 2007. LNCS, vol. 4619, pp. 626–631. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  11. Lenhof, H.-P., Smid, M.: Using persistent data structures for adding range restrictions to searching problems. RAIRO Theoretical Informatics and Applications 28, 25–49 (1994)

    MathSciNet  MATH  Google Scholar 

  12. Mäkinen, V., Navarro, G.: Position-restricted substring searching. In: Correa, J.R., Hevia, A., Kiwi, M. (eds.) LATIN 2006. LNCS, vol. 3887, pp. 703–714. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  13. McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM 23(2), 262–272 (1976)

    Article  MathSciNet  MATH  Google Scholar 

  14. Muthukrishnan, S.: Personal communication with the second author

    Google Scholar 

  15. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  16. Weiner, P.: Linear pattern matching algorithms. In: 14th Annual Symposium on Switching and Automata Theory, pp. 1–11. IEEE, Los Alamitos (1973)

    Chapter  Google Scholar 

  17. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23(3), 337–343 (1977)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Keller, O., Kopelowitz, T., Landau, S., Lewenstein, M. (2009). Generalized Substring Compression. In: Kucherov, G., Ukkonen, E. (eds) Combinatorial Pattern Matching. CPM 2009. Lecture Notes in Computer Science, vol 5577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02441-2_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-02441-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02440-5

  • Online ISBN: 978-3-642-02441-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics