CPM 2015: Combinatorial Pattern Matching pp 52-64

# Longest Common Extensions in Trees

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9133)

## Abstract

The longest common extension (LCE) of two indices in a string is the length of the longest identical substrings starting at these two indices. The LCE problem asks to preprocess a string into a compact data structure that supports fast LCE queries.
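As a point of reference for the string version of the problem, the definition above can be sketched as a direct scan with no preprocessing. This is only an illustrative baseline, not the data structure the paper discusses; the function name `lce_naive` is an assumption made for this example.

```python
def lce_naive(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of s[i:] and s[j:].

    Illustrative baseline only: no preprocessing, and the running time
    is proportional to the answer.
    """
    k = 0
    # Extend the match while both positions stay inside the string.
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


# Suffixes "abracadabra" and "abra" share the prefix "abra".
print(lce_naive("abracadabra", 0, 7))
```

An LCE data structure aims to answer the same query much faster than this scan, after preprocessing the string once.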

In this paper we generalize the LCE problem to trees and suggest a few applications of LCE in trees to tries and XML databases. Given a labeled and rooted tree $$T$$ of size $$n$$, the goal is to preprocess $$T$$ into a compact data structure that supports the following LCE queries between subpaths and subtrees in $$T$$. Let $$v_1$$, $$v_2$$, $$w_1$$, and $$w_2$$ be nodes of $$T$$ such that $$w_1$$ and $$w_2$$ are descendants of $$v_1$$ and $$v_2$$ respectively.

• $${\mathrm {LCE}_{ PP }}(v_1, w_1, v_2, w_2)$$: (path-path $${\mathrm {LCE}}$$) return the longest common prefix of the paths $$v_1 \leadsto w_1$$ and $$v_2 \leadsto w_2$$.

• $${\mathrm {LCE}_{ PT }}(v_1, w_1, v_2)$$: (path-tree $${\mathrm {LCE}}$$) return a maximal path-path LCE of the path $$v_1 \leadsto w_1$$ and any path from $$v_2$$ to a descendant leaf.

• $${\mathrm {LCE}_{ TT }}(v_1, v_2)$$: (tree-tree $${\mathrm {LCE}}$$) return a maximal path-path LCE of any pair of paths from $$v_1$$ and $$v_2$$ to descendant leaves.
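To make the three query types concrete, here is a minimal brute-force sketch. It is not the paper's data structure, and several things are assumptions made for illustration: the class name `LabeledTree`, the parent-array representation, labels placed on nodes (so the string of a path $$v \leadsto w$$ is the sequence of node labels from $$v$$ down to $$w$$, inclusive), and returning the length of the common prefix rather than the prefix itself.

```python
from functools import lru_cache


class LabeledTree:
    """Brute-force LCE queries on a rooted tree (illustrative sketch).

    Assumptions: parent[root] == -1, and label[v] is the label of node v.
    """

    def __init__(self, parent, label):
        self.parent = parent
        self.label = label
        self.children = [[] for _ in parent]
        for v, p in enumerate(parent):
            if p >= 0:
                self.children[p].append(v)

    def path(self, v, w):
        """Nodes on the downward path from v to its descendant w."""
        nodes = [w]
        while nodes[-1] != v:
            p = self.parent[nodes[-1]]
            if p < 0:
                raise ValueError("w is not a descendant of v")
            nodes.append(p)
        nodes.reverse()
        return nodes

    def lce_pp(self, v1, w1, v2, w2):
        """Path-path LCE: compare the two label sequences directly."""
        a = [self.label[u] for u in self.path(v1, w1)]
        b = [self.label[u] for u in self.path(v2, w2)]
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        return k

    def lce_pt(self, v1, w1, v2):
        """Path-tree LCE: follow the path's labels downward from v2."""
        pattern = [self.label[u] for u in self.path(v1, w1)]
        frontier = [v2]  # candidate nodes at depth k below v2
        k = 0
        while k < len(pattern):
            frontier = [u for u in frontier if self.label[u] == pattern[k]]
            if not frontier:
                break
            k += 1
            frontier = [c for u in frontier for c in self.children[u]]
        return k

    def lce_tt(self, v1, v2):
        """Tree-tree LCE: best match over all pairs of downward paths."""

        @lru_cache(maxsize=None)
        def best(u1, u2):
            if self.label[u1] != self.label[u2]:
                return 0
            return 1 + max(
                (best(c1, c2)
                 for c1 in self.children[u1]
                 for c2 in self.children[u2]),
                default=0,
            )

        return best(v1, v2)


# Example tree: root 0 ('a') with children 1 ('b') and 2 ('b');
# node 1 has children 3 ('c'), 4 ('a'); node 2 has children 5 ('c'), 6 ('d').
t = LabeledTree([-1, 0, 0, 1, 1, 2, 2], "abbcacd")
print(t.lce_pp(1, 3, 2, 6), t.lce_pt(1, 3, 2), t.lce_tt(1, 2))
```

These routines take time proportional to the path lengths (or, for the tree-tree query, up to the number of node pairs), which is the kind of cost a preprocessed structure is meant to avoid.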

We present the first non-trivial bounds for supporting these queries. For $${\mathrm {LCE}_{ PP }}$$ queries, we present a linear-space solution with $$O(\log ^{*} n)$$ query time. For $${\mathrm {LCE}_{ PT }}$$ queries, we present a linear-space solution with $$O((\log \log n)^{2})$$ query time, and complement this with a lower bound showing that any path-tree LCE structure of size $$O(n \text {polylog}(n))$$ must necessarily use $${\varOmega }(\log \log n)$$ time to answer queries. For $${\mathrm {LCE}_{ TT }}$$ queries, we present a time-space trade-off that, given any parameter $$\tau$$ with $$1 \le \tau \le n$$, leads to a solution with $$O(n\tau )$$ space and $$O(n/\tau )$$ query time. This is complemented with a reduction to the set intersection problem implying that a fast linear-space solution is not likely to exist.
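As a quick worked reading of the tree-tree trade-off, write $$S(\tau )$$ for the space and $$Q(\tau )$$ for the query time; these two symbols are introduced here only for illustration. The stated bounds give a space-time product that does not depend on $$\tau$$:

$$S(\tau ) \cdot Q(\tau ) = O(n\tau ) \cdot O(n/\tau ) = O(n^{2})$$

For instance, $$\tau = 1$$ gives $$O(n)$$ space with $$O(n)$$ query time, $$\tau = \sqrt{n}$$ gives $$O(n^{3/2})$$ space with $$O(\sqrt{n})$$ query time, and $$\tau = n$$ gives $$O(n^{2})$$ space with $$O(1)$$ query time.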

## Keywords

Query Time · Suffix Tree · XPath Query · Common Prefix · Difference Cover
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

## Copyright information

© Springer International Publishing Switzerland 2015

## Authors and Affiliations

• Philip Bille (1)
• Paweł Gawrychowski (2)
• Inge Li Gørtz (1)
• Gad M. Landau (3, 4)
• Oren Weimann (3), corresponding author

1. DTU Informatics, Copenhagen, Denmark
2. University of Warsaw, Warsaw, Poland
3. University of Haifa, Haifa, Israel
4. NYU, New York, USA