Fingerprints in Compressed Strings

  • Philip Bille
  • Patrick Hagge Cording
  • Inge Li Gørtz
  • Benjamin Sach
  • Hjalte Wedel Vildhøj
  • Søren Vind
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8037)

Abstract

The Karp-Rabin fingerprint of a string is a type of hash value that due to its strong properties has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i,j]. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get O(logN) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(loglogN) query time. Hence, our data structures has the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time O(logNlogℓ) and O(logℓloglogℓ + loglogN) for SLPs and Linear SLPs, respectively. Here, ℓ denotes the length of the LCE.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Alstrup, S., Holm, J.: Improved algorithms for finding level ancestors in dynamic trees. In: Welzl, E., Montanari, U., Rolim, J.D.P. (eds.) ICALP 2000. LNCS, vol. 1853, pp. 73–84. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  2. 2.
    Amir, A., Farach, M., Matias, Y.: Efficient randomized dictionary matching algorithms. In: Apostolico, A., Galil, Z., Manber, U., Crochemore, M. (eds.) CPM 1992. LNCS, vol. 644, pp. 262–275. Springer, Heidelberg (1992)CrossRefGoogle Scholar
  3. 3.
    Andoni, A., Indyk, P.: Efficient algorithms for substring near neighbor problem. In: Proc. 17th SODA, pp. 1203–1212 (2006)Google Scholar
  4. 4.
    Belazzougui, D., Boldi, P., Vigna, S.: Predecessor search with distance-sensitive query time. arXiv:1209.5441 (2012)Google Scholar
  5. 5.
    Bender, M., Farach-Colton, M.: The level ancestor problem simplified. Theoret. Comput. Sci. 321, 5–12 (2004)MathSciNetMATHCrossRefGoogle Scholar
  6. 6.
    Berkman, O., Vishkin, U.: Finding level-ancestors in trees. J. Comput. System Sci. 48(2), 214–230 (1994)MathSciNetMATHCrossRefGoogle Scholar
  7. 7.
    Bille, P., Gørtz, I.L., Sach, B., Vildhøj, H.W.: Time-space trade-offs for longest common extensions. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 293–305. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  8. 8.
    Bille, P., Landau, G., Raman, R., Sadakane, K., Satti, S., Weimann, O.: Random access to grammar-compressed strings. In: Proc. 22nd SODA, pp. 373–389 (2011)Google Scholar
  9. 9.
    Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)MathSciNetCrossRefGoogle Scholar
  10. 10.
    Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fundamenta Informaticae 111(3), 313–337 (2011)MathSciNetMATHGoogle Scholar
  11. 11.
    Cole, R., Hariharan, R.: Faster suffix tree construction with missing suffix links. SIAM J. Comput. 33(1), 26–42 (2003)MathSciNetMATHCrossRefGoogle Scholar
  12. 12.
    Cormode, G., Muthukrishnan, S.: Substring compression problems. In: Proc. 16th SODA, pp. 321–330 (2005)Google Scholar
  13. 13.
    Cormode, G., Muthukrishnan, S.: The string edit distance matching problem with moves. ACM Trans. Algorithms 3(1), 2 (2007)MathSciNetCrossRefGoogle Scholar
  14. 14.
    Dietz, P.F.: Finding level-ancestors in dynamic trees. In: Dehne, F., Sack, J.-R., Santoro, N. (eds.) WADS 1991. LNCS, vol. 519, pp. 32–40. Springer, Heidelberg (1991)CrossRefGoogle Scholar
  15. 15.
    Farach, M., Thorup, M.: String matching in Lempel–Ziv compressed strings. Algorithmica 20(4), 388–404 (1998)MathSciNetMATHCrossRefGoogle Scholar
  16. 16.
    Gąsieniec, L., Karpinski, M., Plandowski, W., Rytter, W.: Randomized efficient algorithms for compressed strings: The finger-print approach. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 39–49. Springer, Heidelberg (1996)CrossRefGoogle Scholar
  17. 17.
    Gąsieniec, L., Kolpakov, R., Potapov, I., Sant, P.: Real-time traversal in grammar-based compressed files. In: Proc. 15th DCC, p. 458 (2005)Google Scholar
  18. 18.
    Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)MathSciNetMATHCrossRefGoogle Scholar
  19. 19.
    Kalai, A.: Efficient pattern-matching with don’t cares. In: Proc. 13th SODA, pp. 655–656 (2002)Google Scholar
  20. 20.
    Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)MathSciNetMATHCrossRefGoogle Scholar
  21. 21.
    Mehlhorn, K., Näher, S.: Bounded ordered dictionaries in O(loglogN) time and O(n) space. Inform. Process. Lett. 35(4), 183–189 (1990)MathSciNetMATHCrossRefGoogle Scholar
  22. 22.
    Porat, B., Porat, E.: Exact and approximate pattern matching in the streaming model. In: Proc. 50th FOCS, pp. 315–323 (2009)Google Scholar
  23. 23.
    Rytter, W.: Application of Lempel–Ziv factorization to the approximation of grammar-based compression. Theoret. Comput. Sci. 302(1), 211–222 (2003)MathSciNetMATHCrossRefGoogle Scholar
  24. 24.
    van Emde Boas, P., Kaas, R., Zijlstra, E.: Design and implementation of an efficient priority queue. Theory Comput. Syst. 10(1), 99–127 (1976)Google Scholar
  25. 25.
    Willard, D.: Log-logarithmic worst-case range queries are possible in space Θ(N). Inform. Process. Lett. 17(2), 81–84 (1983)MathSciNetMATHCrossRefGoogle Scholar
  26. 26.
    Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23(3), 337–343 (1977)MathSciNetMATHCrossRefGoogle Scholar
  27. 27.
    Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)MathSciNetMATHCrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Philip Bille
    • 1
  • Patrick Hagge Cording
    • 1
  • Inge Li Gørtz
    • 1
  • Benjamin Sach
    • 2
  • Hjalte Wedel Vildhøj
    • 1
  • Søren Vind
    • 1
  1. 1.DTU ComputeTechnical University of DenmarkDenmark
  2. 2.Department of Computer ScienceUniversity of WarwickUK

Personalised recommendations