Algorithmica

, Volume 77, Issue 2, pp 336–348

Compressed Subsequence Matching and Packed Tree Coloring

  • Philip Bille
  • Patrick Hagge Cording
  • Inge Li Gørtz
Article

Abstract

We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size n compressing a string of size N and a pattern string of size m over an alphabet of size \(\sigma \), our algorithm uses \(O(n+\frac{n\sigma }{w})\) space and \(O(n+\frac{n\sigma }{w}+m\log N\log w\cdot occ)\) or \(O(n+\frac{n\sigma }{w}\log w+m\log N\cdot occ)\) time. Here w is the word size and occ is the number of minimal occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for \(occ=o(\frac{n}{\log N})\) occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings.

Keywords

Straight line program SLP Compressed Subsequence matching Tree coloring First colored ancestor 

References

  1. 1.
    Abiteboul, S., Alstrup, S., Kaplan, H., Milo, T., Rauhe, T.: Compact labeling scheme for ancestor queries. SIAM J. Comput. 35(6), 1295–1309 (2006)MathSciNetCrossRefMATHGoogle Scholar
  2. 2.
    Alstrup, S., Holm, J.: Improved algorithms for finding level ancestors in dynamic trees. In: Montanari, U., Rolim, J.P., Welzl, E. (eds.) Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 1853, pp. 73–84. Springer, Berlin, Heidelberg (2000)CrossRefGoogle Scholar
  3. 3.
    Alstrup, S., Holm, J., de Lichtenberg, K., Thorup, M.: Minimizing diameters of dynamic trees. In: Degano, P., Gorrieri, R., Marchetti-Spaccamela, A. (eds.) Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 1256, pp. 270–280. Springer, Berlin, Heidelberg (1997)CrossRefGoogle Scholar
  4. 4.
    Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proceedings of the 39th Annual Symposium on Foundations of Computer Science, pp. 534–543. IEEE Computer Society, Washington, DC (1998)Google Scholar
  5. 5.
    Alstrup, S., Secher, J.P., Spork, M.: Optimal on-line decremental connectivity in trees. Inf. Process. Lett. 64(4), 161–164 (1997)MathSciNetCrossRefMATHGoogle Scholar
  6. 6.
    Baeza-Yates, R.A.: Searching subsequences. Theor. Comput. Sci. 78(2), 363–376 (1991)MathSciNetCrossRefMATHGoogle Scholar
  7. 7.
    Bender, M.A., Farach-Colton, M.: The level ancestor problem simplified. Theor. Comput. Sci. 321(1), 5–12 (2004)MathSciNetCrossRefMATHGoogle Scholar
  8. 8.
    Berkman, O., Vishkin, U.: Finding level-ancestors in trees. J. Comput. Syst. Sci. 48(2), 214–230 (1994)MathSciNetCrossRefMATHGoogle Scholar
  9. 9.
    Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings. In: Proceedings of the Twenty-second Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 373–389. SIAM, San Francisco (2011)Google Scholar
  10. 10.
    Boasson, L., Cegielski, P., Guessarian, I., Matiyasevich, Y.: Window-accumulated subsequence matching problem is linear. Ann. Pure Appl. Log. 113(1), 59–80 (2001)MathSciNetCrossRefMATHGoogle Scholar
  11. 11.
    Cégielski, P., Guessarian, I., Lifshits, Y., Matiyasevich, Y.: Window subsequence problems for compressed texts. In: Grigoriev, D., Harrison, J., Hirsch, E.A. (eds.) Computer Science-Theory and Applications. Lecture Notes in Computer Science, vol. 3967, pp. 127–136. Springer, Berlin, Heidelberg (2006)CrossRefGoogle Scholar
  12. 12.
    Cégielski, P., Guessarian, I., Matiyasevich, Y.: Multiple serial episodes matching. Inf. Proc. Lett. 98(6), 211–218 (2006)MathSciNetCrossRefMATHGoogle Scholar
  13. 13.
    Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)MathSciNetCrossRefMATHGoogle Scholar
  14. 14.
    Crochemore, M., Melichar, B., Troníček, Z.: Directed acyclic subsequence graph—overview. J. Discret. Algorithms 1(3), 255–280 (2003)MathSciNetCrossRefMATHGoogle Scholar
  15. 15.
    Das, G., Fleischer, R., Gasieniec, L., Gunopulos, D., Kärkkäinen, J.: Episode matching. In: Apostolico, A., Hein, J. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 1264, pp. 12–27. Springer, Berlin, Heidelberg (1997)CrossRefGoogle Scholar
  16. 16.
    Dietz, P.F.: Finding level-ancestors in dynamic trees. In: Dehne, F., Sack, J.-R., Santoro, N. (eds.) Algorithms and Data Structures. Lecture Notes in Computer Science, vol. 519, pp. 32–40. Springer, Berlin, Heidelberg (1991)CrossRefGoogle Scholar
  17. 17.
    Ferragina, P., Muthukrishnan, S.: Efficient dynamic method-lookup for object oriented languages. In: Diaz, J., Serna, M. (eds.) European Symposium on Algorithms. Lecture Notes in Computer Science, vol. 1136, pp. 107–120. Springer, Berlin, Heidelberg (1996)Google Scholar
  18. 18.
    Fredman, M.L., Willard, D.E.: Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci. 47(3), 424–436 (1993)MathSciNetCrossRefMATHGoogle Scholar
  19. 19.
    Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. P. IEEE. 88(11), 1722–1732 (2000)Google Scholar
  20. 20.
    Mannila, H., Toivonen, H., Verkamo, A.I.: Discovery of frequent episodes in event sequences. Data Min. Knowl. Disc. 1(3), 259–289 (1997)CrossRefGoogle Scholar
  21. 21.
    Muthukrishnan, S., Müller, M.: Time and space efficient method-lookup for object-oriented programs. In: Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 42–51. Society for Industrial and Applied Mathematics, Atlanta (1996)Google Scholar
  22. 22.
    Rytter, W.: Application of Lempel–Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1), 211–222 (2003)MathSciNetCrossRefMATHGoogle Scholar
  23. 23.
    Sleator, D.D., Endre Tarjan, R.: A data structure for dynamic trees. J. Comput. Syst. Sci. 26(3), 362–391 (1983)MathSciNetCrossRefMATHGoogle Scholar
  24. 24.
    Thorup, M.: Randomized sorting in \(O(n\log \log n)\) time and linear space using addition, shift, and bit-wise boolean operations. J. Algorithms 42(2), 205–230 (2002)CrossRefMATHGoogle Scholar
  25. 25.
    Tiskin, A.: Faster subsequence recognition in compressed strings. J. Math. Sci. 158(5), 759–769 (2009)MathSciNetCrossRefMATHGoogle Scholar
  26. 26.
    Tiskin, A.: Towards approximate matching in compressed strings: local subsequence recognition. In: Kulikov, A., Vereshchagin, N. (eds.) Computer Science-Theory and Applications, vol. 6651, pp. 401–414. Springer, Berlin, Heidelberg (2011)CrossRefGoogle Scholar
  27. 27.
    Troníček, Z.: Episode matching. In: Amir, A. (ed.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2089, pp. 143–146. Springer, Berlin, Heidelberg (2001)CrossRefGoogle Scholar
  28. 28.
    Yamamoto, T., Bannai, H., Inenaga, S., Takeda, M.: Faster subsequence and dont-care pattern matching on compressed texts. In: Giancarlo, R., Manzini, G. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 6661, pp. 309–322. Springer, Berlin, Heidelberg (2011)CrossRefGoogle Scholar
  29. 29.
    Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23(3), 337–343 (1977)MathSciNetCrossRefMATHGoogle Scholar
  30. 30.
    Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)MathSciNetCrossRefMATHGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Philip Bille
    • 1
  • Patrick Hagge Cording
    • 1
  • Inge Li Gørtz
    • 1
  1. 1.Technical University of DenmarkKongens LyngbyDenmark

Personalised recommendations