Efficient Counting of Square Substrings in a Tree
We give an algorithm that counts all distinct squares in a labeled tree in O(n log² n) time. There are two main obstacles to overcome. The first is the sheer number of squares: Crochemore et al. showed in 2012 that the maximum number of distinct squares in a tree is Θ(n^{4/3}). This is substantially different from the case of classical strings, which admit only a linear number of distinct squares. We deal with this difficulty by introducing a compact representation of all squares (based on maximal cyclic shifts) that requires only O(n log n) space. The second obstacle is the lack of adequate algorithmic tools for labeled trees. Consequently, we develop several novel techniques, which form the most complex part of the paper. In particular, we extend Imre Simon's implementation of the failure function in pattern matching machines.
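To make the counted objects concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of the paper. It assumes that each edge carries a single-character label and that a substring of the tree is the word read along a simple path in one direction. The function name `distinct_squares_in_tree` and the `(u, v, label)` edge format are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def distinct_squares_in_tree(edges):
    """Return the set of distinct squares (words of the form xx, x non-empty)
    read along simple paths of an edge-labeled tree.

    edges: list of (u, v, label) triples; labels are assumed to be single
    characters. This is a naive enumeration of all directed simple paths,
    meant only to illustrate the definition.
    """
    adj = defaultdict(list)
    for u, v, label in edges:
        adj[u].append((v, label))
        adj[v].append((u, label))

    squares = set()

    def is_square(word):
        half = len(word) // 2
        return half > 0 and len(word) % 2 == 0 and word[:half] == word[half:]

    def walk(node, parent, word):
        # `word` is the label sequence from the start node to `node`.
        if is_square(word):
            squares.add(word)
        for nxt, label in adj[node]:
            if nxt != parent:  # in a tree, not stepping back keeps the path simple
                walk(nxt, node, word + label)

    for start in list(adj):
        walk(start, None, "")
    return squares
```

On a path whose four edges are all labeled `a`, this sketch finds the two squares `aa` and `aaaa`. It enumerates a quadratic number of paths and recurses to the depth of the tree, so it is only suitable for checking small examples.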