Skip to main content

Fingerprints in Compressed Strings

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS,volume 8037)

Abstract

The Karp-Rabin fingerprint of a string is a type of hash value that due to its strong properties has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i,j]. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get O(logN) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(loglogN) query time. Hence, our data structures has the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time O(logNlogℓ) and O(logℓloglogℓ + loglogN) for SLPs and Linear SLPs, respectively. Here, ℓ denotes the length of the LCE.

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-642-40104-6_13
  • Chapter length: 12 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   79.99
Price excludes VAT (USA)
  • ISBN: 978-3-642-40104-6
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   99.99
Price excludes VAT (USA)

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Alstrup, S., Holm, J.: Improved algorithms for finding level ancestors in dynamic trees. In: Welzl, E., Montanari, U., Rolim, J.D.P. (eds.) ICALP 2000. LNCS, vol. 1853, pp. 73–84. Springer, Heidelberg (2000)

    CrossRef  Google Scholar 

  2. Amir, A., Farach, M., Matias, Y.: Efficient randomized dictionary matching algorithms. In: Apostolico, A., Galil, Z., Manber, U., Crochemore, M. (eds.) CPM 1992. LNCS, vol. 644, pp. 262–275. Springer, Heidelberg (1992)

    CrossRef  Google Scholar 

  3. Andoni, A., Indyk, P.: Efficient algorithms for substring near neighbor problem. In: Proc. 17th SODA, pp. 1203–1212 (2006)

    Google Scholar 

  4. Belazzougui, D., Boldi, P., Vigna, S.: Predecessor search with distance-sensitive query time. arXiv:1209.5441 (2012)

    Google Scholar 

  5. Bender, M., Farach-Colton, M.: The level ancestor problem simplified. Theoret. Comput. Sci. 321, 5–12 (2004)

    MathSciNet  MATH  CrossRef  Google Scholar 

  6. Berkman, O., Vishkin, U.: Finding level-ancestors in trees. J. Comput. System Sci. 48(2), 214–230 (1994)

    MathSciNet  MATH  CrossRef  Google Scholar 

  7. Bille, P., Gørtz, I.L., Sach, B., Vildhøj, H.W.: Time-space trade-offs for longest common extensions. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 293–305. Springer, Heidelberg (2012)

    CrossRef  Google Scholar 

  8. Bille, P., Landau, G., Raman, R., Sadakane, K., Satti, S., Weimann, O.: Random access to grammar-compressed strings. In: Proc. 22nd SODA, pp. 373–389 (2011)

    Google Scholar 

  9. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)

    MathSciNet  CrossRef  Google Scholar 

  10. Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fundamenta Informaticae 111(3), 313–337 (2011)

    MathSciNet  MATH  Google Scholar 

  11. Cole, R., Hariharan, R.: Faster suffix tree construction with missing suffix links. SIAM J. Comput. 33(1), 26–42 (2003)

    MathSciNet  MATH  CrossRef  Google Scholar 

  12. Cormode, G., Muthukrishnan, S.: Substring compression problems. In: Proc. 16th SODA, pp. 321–330 (2005)

    Google Scholar 

  13. Cormode, G., Muthukrishnan, S.: The string edit distance matching problem with moves. ACM Trans. Algorithms 3(1), 2 (2007)

    MathSciNet  CrossRef  Google Scholar 

  14. Dietz, P.F.: Finding level-ancestors in dynamic trees. In: Dehne, F., Sack, J.-R., Santoro, N. (eds.) WADS 1991. LNCS, vol. 519, pp. 32–40. Springer, Heidelberg (1991)

    CrossRef  Google Scholar 

  15. Farach, M., Thorup, M.: String matching in Lempel–Ziv compressed strings. Algorithmica 20(4), 388–404 (1998)

    MathSciNet  MATH  CrossRef  Google Scholar 

  16. Gąsieniec, L., Karpinski, M., Plandowski, W., Rytter, W.: Randomized efficient algorithms for compressed strings: The finger-print approach. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 39–49. Springer, Heidelberg (1996)

    CrossRef  Google Scholar 

  17. Gąsieniec, L., Kolpakov, R., Potapov, I., Sant, P.: Real-time traversal in grammar-based compressed files. In: Proc. 15th DCC, p. 458 (2005)

    Google Scholar 

  18. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)

    MathSciNet  MATH  CrossRef  Google Scholar 

  19. Kalai, A.: Efficient pattern-matching with don’t cares. In: Proc. 13th SODA, pp. 655–656 (2002)

    Google Scholar 

  20. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)

    MathSciNet  MATH  CrossRef  Google Scholar 

  21. Mehlhorn, K., Näher, S.: Bounded ordered dictionaries in O(loglogN) time and O(n) space. Inform. Process. Lett. 35(4), 183–189 (1990)

    MathSciNet  MATH  CrossRef  Google Scholar 

  22. Porat, B., Porat, E.: Exact and approximate pattern matching in the streaming model. In: Proc. 50th FOCS, pp. 315–323 (2009)

    Google Scholar 

  23. Rytter, W.: Application of Lempel–Ziv factorization to the approximation of grammar-based compression. Theoret. Comput. Sci. 302(1), 211–222 (2003)

    MathSciNet  MATH  CrossRef  Google Scholar 

  24. van Emde Boas, P., Kaas, R., Zijlstra, E.: Design and implementation of an efficient priority queue. Theory Comput. Syst. 10(1), 99–127 (1976)

    Google Scholar 

  25. Willard, D.: Log-logarithmic worst-case range queries are possible in space Θ(N). Inform. Process. Lett. 17(2), 81–84 (1983)

    MathSciNet  MATH  CrossRef  Google Scholar 

  26. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23(3), 337–343 (1977)

    MathSciNet  MATH  CrossRef  Google Scholar 

  27. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)

    MathSciNet  MATH  CrossRef  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bille, P., Cording, P.H., Gørtz, I.L., Sach, B., Vildhøj, H.W., Vind, S. (2013). Fingerprints in Compressed Strings. In: Dehne, F., Solis-Oba, R., Sack, JR. (eds) Algorithms and Data Structures. WADS 2013. Lecture Notes in Computer Science, vol 8037. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40104-6_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-40104-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40103-9

  • Online ISBN: 978-3-642-40104-6

  • eBook Packages: Computer ScienceComputer Science (R0)