Abstract
The Karp-Rabin fingerprint of a string is a type of hash value that, due to its strong properties, has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i,j]. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLPs) we get O(log N) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(log log N) query time. Hence, our data structures have the same time and space complexity as random access in SLPs. We use the fingerprint data structures to solve the longest common extension (LCE) problem in query time O(log N log ℓ) and O(log ℓ log log ℓ + log log N) for SLPs and Linear SLPs, respectively. Here, ℓ denotes the length of the LCE.
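The two ingredients named above, Karp-Rabin fingerprints that compose under concatenation and LCE queries answered by comparing fingerprints, can be illustrated with a small Python sketch. It is not the paper's data structure. Assumptions: the fingerprint is taken as φ(S) = Σ ord(S[k]) · r^k mod p over 0-indexed positions, ranges are half-open, p = 2^61 − 1, and the names `SLP`, `prefix_fp`, `substring_fp` and `lce` are hypothetical. The sketch stores O(1) words per grammar rule (length, fingerprint, r^length), so it uses O(n) space and never expands S. A query walks a single root-to-leaf path, so it costs O(h) time for an SLP of height h rather than the O(log N) bound stated above. The LCE uses a generic doubling-then-binary search over fingerprint comparisons, and its answers are correct with high probability over the random choice of r.

```python
import random

# Assumption: Mersenne prime modulus 2^61 - 1; any large prime works.
P = (1 << 61) - 1


class SLP:
    """Hypothetical SLP with fingerprint and LCE queries (a sketch).

    rules[k] is either a single character (terminal rule) or a pair
    (left, right) of indices of earlier rules; the last rule derives S.
    Fingerprint (assumed form): phi(S) = sum(ord(S[k]) * r**k) mod P.
    """

    def __init__(self, rules, base=None):
        self.rules = rules
        self.r = base if base is not None else random.randrange(2, P - 1)
        self.length, self.fp, self.rpow = [], [], []  # O(1) words per rule
        for rule in rules:  # children precede parents, so one pass suffices
            if isinstance(rule, str):
                self.length.append(1)
                self.fp.append(ord(rule) % P)
                self.rpow.append(self.r)
            else:
                left, right = rule
                self.length.append(self.length[left] + self.length[right])
                # Concatenation: phi(UV) = phi(U) + r^|U| * phi(V) (mod P)
                self.fp.append((self.fp[left] + self.rpow[left] * self.fp[right]) % P)
                self.rpow.append(self.rpow[left] * self.rpow[right] % P)
        self.n_total = self.length[-1]

    def prefix_fp(self, i):
        """phi(S[0:i]) via one root-to-leaf descent: O(height) time, no decompression."""
        acc, mult, node = 0, 1, len(self.rules) - 1
        while i > 0:
            if i == self.length[node]:  # the whole subtree lies in the prefix
                return (acc + mult * self.fp[node]) % P
            left, right = self.rules[node]  # 0 < i < length, so node is a pair
            if i <= self.length[left]:
                node = left
            else:  # absorb the left child entirely and continue on the right
                acc = (acc + mult * self.fp[left]) % P
                mult = mult * self.rpow[left] % P
                i -= self.length[left]
                node = right
        return acc

    def substring_fp(self, i, j):
        """phi(S[i:j]) for 0 <= i <= j <= N, using 0-indexed half-open ranges."""
        diff = (self.prefix_fp(j) - self.prefix_fp(i)) % P
        return diff * pow(self.r, -i, P) % P  # divide out r^i (Python 3.8+)

    def lce(self, i, j):
        """Length of the longest common prefix of S[i:] and S[j:], w.h.p."""
        limit = self.n_total - max(i, j)

        def match(k):
            return self.substring_fp(i, i + k) == self.substring_fp(j, j + k)

        lo, step = 0, 1  # invariant: match(lo) holds
        while step <= limit and match(step):  # exponential (doubling) search
            lo, step = step, 2 * step
        hi = min(step, limit + 1)  # hi fails to match or lies past the end
        while hi - lo > 1:  # binary search in (lo, hi)
            mid = (lo + hi) // 2
            if match(mid):
                lo = mid
            else:
                hi = mid
        return lo


if __name__ == "__main__":
    # Rules deriving "abaababa": X2=ab, X3=aba, X4=abaab, X5=abaababa.
    slp = SLP(['a', 'b', (0, 1), (2, 0), (3, 2), (4, 3)])
    print(slp.n_total, slp.lce(0, 3), slp.lce(2, 5))
```

The doubling phase stops after O(log ℓ) comparisons, and the binary search then narrows a range of width at most ℓ + 1, so the LCE issues O(log ℓ) fingerprint comparisons in total. Each comparison costs two substring fingerprint queries, which is why the query time of the fingerprint structure multiplies into the LCE bound.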
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bille, P., Cording, P.H., Gørtz, I.L., Sach, B., Vildhøj, H.W., Vind, S. (2013). Fingerprints in Compressed Strings. In: Dehne, F., Solis-Oba, R., Sack, JR. (eds) Algorithms and Data Structures. WADS 2013. Lecture Notes in Computer Science, vol 8037. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40104-6_13
DOI: https://doi.org/10.1007/978-3-642-40104-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40103-9
Online ISBN: 978-3-642-40104-6
eBook Packages: Computer Science, Computer Science (R0)