Longest Common Extensions in Trees

  • Philip Bille
  • Paweł Gawrychowski
  • Inge Li Gørtz
  • Gad M. Landau
  • Oren Weimann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9133)

Abstract

The longest common extension (LCE) of two indices in a string is the length of the longest identical substrings starting at these two indices. The LCE problem asks to preprocess a string into a compact data structure that supports fast LCE queries.
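To make the definition concrete, here is a minimal sketch of the naive, preprocessing-free LCE query on a string. The function name `lce` and the 0-based indexing are assumptions made for illustration; this is the baseline that LCE data structures aim to beat, not the method of the paper.

```python
def lce(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of s[i:] and s[j:].

    Naive scan: O(answer) time per query, no preprocessing.
    """
    n = len(s)
    k = 0
    # Extend the match one character at a time until a mismatch
    # or the end of the string is reached.
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


if __name__ == "__main__":
    s = "abracadabra"
    print(lce(s, 0, 7))  # suffixes "abracadabra" and "abra"
    print(lce(s, 3, 5))  # suffixes "acadabra" and "adabra"
```

A query takes time proportional to the answer, which can be linear in the length of the string; the point of preprocessing is to avoid that scan.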

In this paper we generalize the LCE problem to trees and suggest a few applications of LCE in trees to tries and XML databases. Given a labeled and rooted tree \(T\) of size \(n\), the goal is to preprocess \(T\) into a compact data structure that supports the following LCE queries between subpaths and subtrees in \(T\). Let \(v_1\), \(v_2\), \(w_1\), and \(w_2\) be nodes of \(T\) such that \(w_1\) and \(w_2\) are descendants of \(v_1\) and \(v_2\) respectively.

  • \({\mathrm {LCE}_{ PP }}(v_1, w_1, v_2, w_2)\): (path-path \({\mathrm {LCE}}\)) return the longest common prefix of the paths \(v_1 \leadsto w_1\) and \(v_2 \leadsto w_2\).

  • \({\mathrm {LCE}_{ PT }}(v_1, w_1, v_2)\): (path-tree \({\mathrm {LCE}}\)) return the maximal path-path LCE of the path \(v_1 \leadsto w_1\) and any path from \(v_2\) to a descendant leaf.

  • \({\mathrm {LCE}_{ TT }}(v_1, v_2)\): (tree-tree \({\mathrm {LCE}}\)) return a maximal path-path LCE of any pair of paths from \(v_1\) and \(v_2\) to descendant leaves.
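The three queries above can be sketched naively as follows. This is only a brute-force reading of the definitions, not the data structures of the paper. Several things here are assumptions for illustration: the `Tree` class and its parent-array layout, labels stored on nodes, and answers reported as the number of matching labels rather than as the matching path itself.

```python
from functools import lru_cache


class Tree:
    """Rooted tree with one label per node (hypothetical layout)."""

    def __init__(self, parent, labels):
        self.parent = parent    # parent[v] is v's parent; None for the root
        self.labels = labels    # labels[v] is the label of node v
        self.children = [[] for _ in parent]
        for v, p in enumerate(parent):
            if p is not None:
                self.children[p].append(v)

    def path(self, v, w):
        """Nodes on the downward path v ~> w; w must be a descendant of v."""
        nodes = [w]
        while nodes[-1] != v:
            nodes.append(self.parent[nodes[-1]])
        nodes.reverse()
        return nodes


def lce_pp(t, v1, w1, v2, w2):
    """Path-path: common prefix length of v1 ~> w1 and v2 ~> w2."""
    p1, p2 = t.path(v1, w1), t.path(v2, w2)
    k = 0
    while k < min(len(p1), len(p2)) and t.labels[p1[k]] == t.labels[p2[k]]:
        k += 1
    return k


def lce_pt(t, v1, w1, v2):
    """Path-tree: best match of v1 ~> w1 against any downward path from v2."""
    p = t.path(v1, w1)
    # frontier holds every node below v2 that matches the path so far.
    frontier = [v2] if t.labels[v2] == t.labels[p[0]] else []
    k = 0
    while frontier:
        k += 1
        if k == len(p):
            break
        frontier = [c for u in frontier for c in t.children[u]
                    if t.labels[c] == t.labels[p[k]]]
    return k


def lce_tt(t, v1, v2):
    """Tree-tree: best match over all pairs of downward paths from v1, v2."""
    @lru_cache(maxsize=None)
    def go(a, b):
        if t.labels[a] != t.labels[b]:
            return 0
        pairs = (go(c, d) for c in t.children[a] for d in t.children[b])
        return 1 + max(pairs, default=0)
    return go(v1, v2)


if __name__ == "__main__":
    # Node:   0  1  2  3  4  5  6  7
    # Label:  a  b  b  c  c  d  e  d
    t = Tree([None, 0, 0, 1, 2, 3, 4, 4], "abbccded")
    print(lce_pp(t, 1, 5, 2, 6), lce_pt(t, 1, 5, 2), lce_tt(t, 1, 2))
```

In the example tree, the paths 1 ⇝ 5 and 2 ⇝ 6 spell `bcd` and `bce`, so the path-path answer stops after two labels, while the path-tree and tree-tree queries may switch to the sibling branch 2 ⇝ 7 and match all three. The memoized tree-tree recursion can touch every pair of nodes, which hints at why this query is the expensive one.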

We present the first non-trivial bounds for supporting these queries. For \({\mathrm {LCE}_{ PP }}\) queries, we present a linear-space solution with \(O(\log ^{*} n)\) query time. For \({\mathrm {LCE}_{ PT }}\) queries, we present a linear-space solution with \(O((\log \log n)^{2})\) query time, and complement this with a lower bound showing that any path-tree LCE structure of size \(O(n \text {polylog}(n))\) must necessarily use \({\varOmega }(\log \log n)\) time to answer queries. For \({\mathrm {LCE}_{ TT }}\) queries, we present a time-space trade-off: for any parameter \(\tau \) with \(1 \le \tau \le n\), it yields a solution using \(O(n\tau )\) space and \(O(n/\tau )\) query time. This is complemented with a reduction to the set intersection problem, implying that a fast linear-space solution is not likely to exist.
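As a quick worked reading of the tree-tree trade-off, the stated bounds can be instantiated at a few values of \(\tau \). The particular choices of \(\tau \) below are illustrative only and are not singled out by the abstract.

\[
\begin{array}{lll}
\tau = 1: & \text{space } O(n), & \text{query time } O(n) \\
\tau = \sqrt{n}: & \text{space } O(n^{3/2}), & \text{query time } O(\sqrt{n}) \\
\tau = n: & \text{space } O(n^{2}), & \text{query time } O(1)
\end{array}
\]

In every case the product of space and query time is \(O(n\tau \cdot n/\tau ) = O(n^{2})\), so the parameter only shifts cost between storage and queries.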

Keywords

Query time, Suffix tree, XPath query, Common prefix, Difference cover
These keywords were added by machine and not by the authors.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Philip Bille (1)
  • Paweł Gawrychowski (2)
  • Inge Li Gørtz (1)
  • Gad M. Landau (3, 4)
  • Oren Weimann (3)

  1. DTU Informatics, Copenhagen, Denmark
  2. University of Warsaw, Warsaw, Poland
  3. University of Haifa, Haifa, Israel
  4. NYU, New York, USA