Fast Algorithm for Partial Covers in Words

A factor u of a word w is a cover of w if every position in w lies within some occurrence of u in w. A word w covered by u thus generalizes the idea of a repetition, that is, a word composed of exact concatenations of u. In this article we introduce a new notion of α-partial cover, which can be viewed as a relaxed variant of cover, that is, a factor covering at least α positions in w. We develop a data structure of O(n) size (where n = |w|) that can be constructed in O(n log n) time, which we apply to compute all shortest α-partial covers for a given α. We also employ it for an O(n log n)-time algorithm computing a shortest α-partial cover for each α = 1, 2, …, n.


Introduction
The notion of periodicity in words and its many variants have been well studied in numerous fields like combinatorics on words, pattern matching, data compression, automata theory, formal language theory, and molecular biology (see [9]). However, the classic notion of periodicity is too restrictive to provide a description of a word such as abaababaaba, which is covered by copies of aba, yet not exactly periodic. To fill this gap, the idea of quasiperiodicity was introduced [1]. In a periodic word, the occurrences of the period do not overlap. In contrast, the occurrences of a quasiperiod in a quasiperiodic word may overlap. Quasiperiodicity thus enables the detection of repetitive structures that would be ignored by the classic characterization of periods.
The most well-known formalization of quasiperiodicity is the cover of a word. A factor u of a word w is said to be a cover of w if u ≠ w, and every position in w lies within some occurrence of u in w. Equivalently, we say that u covers w. Note that a cover of w must also be a border (both a prefix and a suffix) of w. Thus, in the above example, aba is the shortest cover of abaababaaba.
A linear-time algorithm for computing the shortest cover of a word was proposed by Apostolico et al. [2], and a linear-time algorithm for computing all the covers of a word was proposed by Moore & Smyth [23]. Breslauer [4] gave an online linear-time algorithm computing the minimal cover array of a word, a data structure specifying the shortest cover of every prefix of the word. Li & Smyth [22] provided a linear-time algorithm for computing the maximal cover array of a word, and showed that, analogous to the border array [8], it actually determines the structure of all the covers of every prefix of the word.
A known extension of the notion of cover is the notion of seed. A seed is not necessarily aligned with the ends of the word being covered, but is allowed to overflow on either side. More formally, a word u is a seed of w if u is a factor of w and w is a factor of some word y covered by u. Seeds were first introduced by Iliopoulos, Moore, and Park [18]. A linear algorithm for computing the shortest seed of a word was given by Kociumaka et al. [19].
Still, it remains unlikely that an arbitrary word, even over the binary alphabet, has a cover (or even a seed). For example, abaaababaabaaaababaa is a word that not only has no cover, but whose every prefix also has no cover. In this article we provide a natural form of quasiperiodicity. We introduce the notion of partial covers, that is, factors covering at least a given number of positions in w. Recently, Flouri et al. [14] suggested a related notion of enhanced covers, which are additionally required to be borders of the word.
Partial covers can be viewed as a relaxed variant of covers, alternative to approximate covers [24]. The approximate covers require each position to lie within an approximate occurrence of the cover. This allows for small irregularities within each fragment of a word. On the other hand, partial covers require exact occurrences but drop the condition that all positions need to be covered. This allows some fragments to be completely irregular as long as the total length of such fragments is small. The significant advantage of partial covers is that they enjoy more combinatorial properties, and consequently the algorithms solving the most natural problems are much more efficient than those concerning approximate covers, where the time complexity rarely drops below quadratic and some problems are even NP-hard.
Let Covered(u, w) denote the number of positions in w covered by occurrences of the word u in w; we call this value the cover index of u within w. For example, Covered(aba, aababab) = 5. We primarily focus on the following two problems, but the tools we develop can be used to answer a number of questions concerning partial covers, some of which are discussed in the Conclusions.
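The cover index can be illustrated with a direct, quadratic-time computation. The following Python sketch is only a reference implementation of the definition (the function names are hypothetical and positions are 1-based as in the text); it is not the data structure developed in this paper.

```python
def occurrences(u: str, w: str) -> list[int]:
    """1-based starting positions of all (possibly overlapping) occurrences of u in w."""
    out, start = [], w.find(u)
    while start != -1:
        out.append(start + 1)
        start = w.find(u, start + 1)
    return out


def covered(u: str, w: str) -> int:
    """Cover index of a non-empty word u within w, computed by a left-to-right sweep."""
    total, right = 0, 0          # right = last position already counted
    for i in occurrences(u, w):
        end = i + len(u) - 1     # last position of this occurrence
        total += end - max(right, i - 1)
        right = end
    return total


print(covered("aba", "aababab"))  # 5, the example from the text
```

Here aba occurs at positions 2 and 4 of aababab, so positions 2 through 6 are covered.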

PartialCovers problem
Input: a word w of length n and a positive integer α ≤ n.
Output: all shortest factors u such that Covered (u, w) ≥ α.
Each factor given in the output is represented by the first and the last starting position of its occurrence in w.

AllPartialCovers problem
Input: a word w of length n.
Output: for each α = 1, 2, …, n, a shortest factor u such that Covered(u, w) ≥ α.

Our contribution. The following summarizes our main result.
Theorem 1. The PartialCovers and AllPartialCovers problems can be solved in O(n log n) time and O(n) space.
We extensively use suffix trees; for an exposition see [8,12]. A suffix tree of a word is a compact trie of its suffixes. The nodes of the trie which become nodes of the suffix tree are called explicit nodes, while the other nodes are called implicit. Each edge of the suffix tree can be viewed as an upward maximal path of implicit nodes starting with an explicit node. Moreover, each node belongs to a unique path of that kind. Then, each node of the trie can be represented in the suffix tree by the edge it belongs to and an index within the corresponding path. Each factor of the word corresponds to an explicit or implicit node of the suffix tree. A representation of this node is called the locus of the factor. Our algorithm finds the loci of the shortest partial covers; it is then straightforward to locate an occurrence for each of them.
A Sketch of the Algorithm. The algorithm first augments the suffix tree of w, that is, a linear number of implicit extra nodes become explicit. Then, each node of the augmented tree is annotated with two integer values. They allow for determining the size of the covered area for each implicit node by a simple formula, since, limited to a single edge of the augmented suffix tree, these values form an arithmetic progression. This yields a solution to the PartialCovers problem. For an efficient solution to the AllPartialCovers problem, we additionally find the upper envelope of a number of line segments constructed from the arithmetic progressions.
Structure of the Paper. In Section 2 we formally introduce the augmented and annotated suffix tree that we call the Cover Suffix Tree. We show its basic properties and present its application to the PartialCovers and AllPartialCovers problems. Section 4 is dedicated to the construction of the Cover Suffix Tree. Before that, Section 3 presents an auxiliary data structure extending the classical Union/Find data structure; its implementation is given later, in Section 5. Additional applications of the Cover Suffix Tree are given in Sections 6 and 7. The former presents how the data structure can be used to compute all primitively rooted squares in a word and a linear-sized representation of all the seeds in a word. The latter contains a short discussion of variants of the PartialCovers problem that can be solved in a similar way.

Augmented and Annotated Suffix Trees
Let w be a word of length n over a totally ordered alphabet Σ. The suffix tree T of w can be constructed in O(n log |Σ|) time [13,25]. For an explicit or implicit node v of T, we denote by v̂ the word obtained by spelling the characters on a path from the root to v. We also denote |v| = |v̂|. As in most applications of the suffix tree, the leaves of T play an auxiliary role and do not correspond to factors (actually they are suffixes of w#, where # ∉ Σ). They are labeled with the starting positions of the suffixes of w. We introduce the Cover Suffix Tree of w, denoted by CST(w), as an augmented suffix tree (new nodes are added) in which the nodes are annotated with information relevant to covers. CST(w) is similar to the data structure named Minimal Augmented Suffix Tree (see [3,5]).
For a set X of integers and x ∈ X, we define next_X(x) = min{y ∈ X : y > x}, and we assume next_X(x) = ∞ if x = max X. By Occ(v, w) we denote the set of starting positions of occurrences of v in w. For any i ∈ Occ(v, w), we define:
δ(i, v) = next_{Occ(v,w)}(i) − i.
Note that δ(i, v) = ∞ if i is the last occurrence of v. Additionally, we define:
cv(v) = Covered(v, w),   ∆(v) = |{i ∈ Occ(v, w) : δ(i, v) ≥ |v|}|.
A word u is called primitive if u = y^k for a word y and an integer k implies that y = u, and non-primitive otherwise. A square u^2 is called primitively rooted if u is primitive.
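The quantities δ, cv and ∆ can be computed naively from the occurrence list, which may help when checking small examples by hand. The sketch below assumes that cv(v) is the cover index of v and that ∆(v) counts the occurrences i with δ(i, v) ≥ |v|; the helper names are hypothetical, and cv is obtained as the sum of min(|v|, δ(i, v)) over all occurrences.

```python
import math


def occ(v: str, w: str) -> list[int]:
    """Occ(v, w): sorted 1-based starting positions of v in w."""
    out, start = [], w.find(v)
    while start != -1:
        out.append(start + 1)
        start = w.find(v, start + 1)
    return out


def deltas(v: str, w: str) -> list[float]:
    """delta(i, v) for consecutive occurrences i; infinity for the last occurrence."""
    pos = occ(v, w)
    return [b - a for a, b in zip(pos, pos[1:])] + [math.inf] if pos else []


def cv_and_delta(v: str, w: str) -> tuple[int, int]:
    """Return (cv(v), Delta(v)) for a non-empty word v."""
    gaps = deltas(v, w)
    cv = sum(min(len(v), g) for g in gaps)          # each occurrence contributes min(|v|, delta)
    big_delta = sum(1 for g in gaps if g >= len(v))  # occurrences not overlapping the next one
    return cv, big_delta


print(cv_and_delta("aba", "aababab"))  # (5, 1)
```

Every occurrence contributes either its full length or the distance to the next occurrence, whichever is smaller, so the sum equals the size of the covered area.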
Observation 1. Let v be a node in the suffix trie of w. Then vv is a primitively rooted square in w if and only if there exists i ∈ Occ(v, w) such that δ(i, v) = |v|.
Proof. Recall that, by the synchronization property of primitive words (see [8]), v is primitive if and only if it occurs exactly twice in vv.
(⇒) If vv occurs in w at position i then δ(i, v) = |v|.
(⇐) If δ(i, v) = |v| then obviously vv occurs in w at position i. Additionally, if v was not primitive then δ(i, v) < |v| would hold.
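Observation 1 can be checked exhaustively on a small word. The Python sketch below compares the δ-based criterion with the direct definition (v primitive and vv a factor of w); all function names are hypothetical, and primitivity is tested with the standard fact that u is primitive exactly when u occurs in uu only as a prefix and as a suffix.

```python
def occ(v: str, w: str) -> list[int]:
    """Sorted 1-based starting positions of v in w."""
    out, start = [], w.find(v)
    while start != -1:
        out.append(start + 1)
        start = w.find(v, start + 1)
    return out


def is_primitive(u: str) -> bool:
    """u is primitive iff its first occurrence in uu after position 0 is at |u|."""
    return (u + u).find(u, 1) == len(u)


def square_half_by_delta(v: str, w: str) -> bool:
    """Criterion of Observation 1: some consecutive occurrences are exactly |v| apart."""
    pos = occ(v, w)
    return any(b - a == len(v) for a, b in zip(pos, pos[1:]))


def square_half_direct(v: str, w: str) -> bool:
    """Direct definition: vv is a factor of w and v is primitive."""
    return is_primitive(v) and (v + v) in w


w = "bcccacccaccaccb"
factors = {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}
print(sorted(v for v in factors if square_half_by_delta(v, w)))
```

The printed factors are the halves of primitively rooted squares of w, that is, the candidates for extra nodes.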
In CST(w), we introduce additional explicit nodes called extra nodes, which correspond to halves of primitively rooted square factors of w. Moreover, we annotate all explicit nodes (including extra nodes) with the values cv, ∆; see, for example, Fig. 2. The number of extra nodes is bounded by the number of distinct squares, which is linear [15], so CST(w) takes O(n) space.
Lemma 1. Let v_1, v_2, …, v_k be the consecutive implicit nodes on the edge from an explicit node v of CST(w) to its explicit parent. Then for i = 1, …, k we have
(cv(v_i), ∆(v_i)) = (cv(v) − i·∆(v), ∆(v)).
In particular, cv(v_i) for i = 1, …, k forms an arithmetic progression.
Proof. Note that Occ(v_i, w) = Occ(v, w), since otherwise v_i would be an explicit node of CST(w). Also note that if any two occurrences of v in w overlap, then the corresponding occurrences of v_i overlap. Otherwise, by Observation 1, the path from v to v_i (excluding v) would contain an extra node. Hence, when we go up from v (before reaching its parent) the size of the covered area decreases at each step by ∆(v).
As a consequence of Lemma 1 we obtain the following result. Recall that the locus of a factor v of w, given by its start and end position in w, can be found in O(log log |v|) time [21].
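The mechanism behind Lemma 1 can be observed on the word from Fig. 2. For a fixed set of occurrences, shortening the factor by one character removes one covered position for every occurrence that does not overlap its successor. The sketch below (hypothetical helper names, with ∆ taken to count the occurrences whose δ is at least the factor length) prints the length, cv and ∆ for three prefixes of ccacc that share the occurrence set {3, 7, 10}.

```python
import math


def occ(v: str, w: str) -> list[int]:
    """Sorted 1-based starting positions of v in w."""
    out, start = [], w.find(v)
    while start != -1:
        out.append(start + 1)
        start = w.find(v, start + 1)
    return out


def annotate(v: str, w: str) -> tuple[int, int, int]:
    """Return (|v|, cv(v), Delta(v)) computed directly from the occurrences."""
    pos = occ(v, w)
    gaps = [b - a for a, b in zip(pos, pos[1:])] + [math.inf]
    cv = sum(min(len(v), g) for g in gaps)
    big_delta = sum(1 for g in gaps if g >= len(v))
    return len(v), cv, big_delta


w = "bcccacccaccaccb"
rows = [annotate(v, w) for v in ("ccacc", "ccac", "cca")]
for row in rows:
    print(row)   # (5, 12, 1), then (4, 11, 2), then (3, 9, 3)

# One step up always subtracts the Delta of the longer factor.
for (_, cv_long, d_long), (_, cv_short, _) in zip(rows, rows[1:]):
    assert cv_short == cv_long - d_long
```

In this example ∆ changes at ccac and at cca, because both are halves of primitively rooted squares (ccacccac at position 3 and ccacca at position 7). Such nodes are exactly the ones made explicit as extra nodes, which is why ∆ stays constant along each edge of CST(w).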
Lemma 2. Assume we are given CST(w). Then we can compute: (1) for any α, the loci of the shortest α-partial covers in linear time; (2) given the locus of a factor u in the suffix tree CST(w), the cover index Covered(u, w) in O(1) time.
Proof. Part (2) is a direct consequence of Lemma 1. As for part (1), for each edge of CST(w), leading from v to its parent v′, we need to find the minimum j with |v| ≥ j > |v′| for which cv(v) − ∆(v)·(|v| − j) ≥ α. Such a linear inequality can be solved in constant time.
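The constant-time step in the proof of Lemma 2 amounts to integer division. The following Python sketch solves the inequality for a single edge; the function name and argument names are hypothetical, and it assumes ∆(v) ≥ 1, which holds because the last occurrence of v always has δ = ∞.

```python
from typing import Optional


def shortest_length_on_edge(len_v: int, len_parent: int,
                            cv: int, delta: int, alpha: int) -> Optional[int]:
    """Smallest j with len_parent < j <= len_v and cv - delta * (len_v - j) >= alpha.

    Returns None if even the lower endpoint v of the edge covers fewer than
    alpha positions. Assumes delta >= 1.
    """
    if cv < alpha:
        return None
    # We may go up by at most floor((cv - alpha) / delta) characters.
    steps = (cv - alpha) // delta
    return max(len_parent + 1, len_v - steps)


print(shortest_length_on_edge(10, 4, 20, 2, 15))  # 8
```

Since the covered area grows with j along the edge, the smallest feasible j is obtained by going up as far as the inequality allows, but never past the node just below the parent.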
Due to this fact, the efficiency of solving the PartialCovers problem relies on the complexity of the construction of CST(w). In turn, the following lemma, also a consequence of Lemma 1, can be used to solve the AllPartialCovers problem provided that CST(w) is given. As a tool, a solution to the geometric problem of computing the upper envelope [17] is applied.
Lemma 3. Assume we are given CST(w). Then we can compute the locus of a shortest α-partial cover for each α = 1, 2, …, n in O(n log n) time and O(n) space.
Proof. Consider an edge of CST(w) from v to its parent v′ containing k implicit nodes. For each such edge, we form a line segment on the plane connecting points (|v|, cv(v)) and (|v| − k, cv(v) − k·∆(v)) (if there are no implicit nodes on the edge, the line segment is a single point). Denote all such line segments obtained from CST(w) as s_1, …, s_m; we have m = O(n). We consider the upper envelope E of the set of these segments. Formally, if each s_i, connecting points (x_i, y_i) and (x′_i, y′_i), is interpreted as a linear function on the domain [x_i, x′_i], then E(x) = max{s_i(x) : x ∈ [x_i, x′_i]}. Here we are actually interested in an integer envelope E′, that is, E limited to integer arguments, see Fig. 3. By Lemma 1, for any j ∈ {1, …, n}, E′(j) equals the maximum of Covered(u, w) over all factors u of w such that |u| = j. A piecewise linear representation of E can be computed in O(m log m) time and O(m) space [17], therefore the function E′ for all its arguments can be computed in the same time complexity.
Let us introduce a prefix maxima sequence for E′: µ_i = max{E′(j) : j ∈ {1, …, i}}, with µ_0 = 0. Note that µ_i is non-decreasing. If µ_i > µ_{i−1} then the shortest α-partial cover for all α ∈ (µ_{i−1}, µ_i] has length i. An example of such a partial cover can be recovered if we explicitly store the initial line segments used in the pieces of the representation of E. Thus the solution of the AllPartialCovers problem can be obtained from the sequence µ_i in O(m) = O(n) time.
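The last step of the proof, turning E′ into answers for all α via prefix maxima, can be sketched as follows. In this Python illustration E′ is computed by brute force over all factors rather than from the upper envelope of the segments, so it only demonstrates the prefix-maxima argument and not the O(n log n) bound; all names are hypothetical.

```python
def covered(u: str, w: str) -> int:
    """Number of positions of w covered by occurrences of the non-empty word u."""
    total, right, start = 0, 0, w.find(u)
    while start != -1:
        i, end = start + 1, start + len(u)        # 1-based first and last position
        total += end - max(right, i - 1)
        right = end
        start = w.find(u, start + 1)
    return total


def shortest_partial_cover_lengths(w: str) -> list[int]:
    """Entry alpha - 1 holds the length of a shortest alpha-partial cover of w."""
    n = len(w)
    # E'(j): best cover index among factors of length j (brute force).
    env = [0] + [max(covered(f, w) for f in {w[i:i + j] for i in range(n - j + 1)})
                 for j in range(1, n + 1)]
    answer, mu = [0] * (n + 1), 0                 # mu is the running prefix maximum
    for j in range(1, n + 1):
        if env[j] > mu:
            for alpha in range(mu + 1, env[j] + 1):
                answer[alpha] = j                 # every alpha in (mu, env[j]] needs length j
            mu = env[j]
    return answer[1:]


print(shortest_partial_cover_lengths("bcccacccaccaccb"))
```

For the word of Fig. 2 this prints ten 1s followed by 4, 5, 13, 14, 15, in agreement with the values read off Fig. 3.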
In the following two sections we provide an O(n log n) time construction of CST (w).Together with Lemmas 2 and 3, it yields Theorem 1.

Extension of Disjoint-Set Data Structure
In this section we extend the classic disjoint-set data structure to compute the change lists of the sets being merged, as defined below. First, let us extend the next notation. For a partition P = {P_1, …, P_k} of U = {1, …, n}, we define next_P(x) = next_{P_i}(x) where x ∈ P_i.
Now for two partitions P, P′ of U let us define the change list (see also Fig. 4) by
ChangeList(P, P′) = {(x, next_{P′}(x)) : next_P(x) ≠ next_{P′}(x)}.
We say that (P, id) is a partition of U labeled by L if P is a partition of U and id : P → L is a one-to-one (injective) mapping. A label ℓ ∈ L is called active if id(P) = ℓ for some P ∈ P and free otherwise.
Lemma 4. Let n ≤ k be positive integers such that k = Θ(n). There exists a data structure of size O(n), which maintains a partition (P, id) of {1, …, n} labeled by L = {1, …, k} and supports the following operations:
• Find(x) for x ∈ {1, …, n} gives the label of P ∈ P containing x.
• Union(I, ℓ) for a set I of active labels and a free label ℓ replaces all P ∈ P with labels in I by their set-theoretic union with the label ℓ. The change list of the corresponding modification of P is returned.
Initially P is a partition into singletons with id({x}) = x. Any valid sequence of Union operations is performed in O(n log n) time. A single Find operation takes O(1) time.
Note that these are actually standard disjoint-set data structure operations except for the fact that we require Union to return the change list. The technical proof of Lemma 4 is postponed until Section 5.
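The interface of Lemma 4 can be illustrated by a simplified Python sketch. It assumes that the change list consists of the pairs (x, new successor of x) for all elements x whose successor within their class changes, and it merges smaller classes into the largest one using sorted lists. The class and method names are hypothetical, and insertion into a Python list is not logarithmic, so this sketch does not achieve the bounds of the lemma.

```python
import bisect
import math


class _Class:
    """One partition class: a sorted list of members plus its current label."""
    def __init__(self, members, label):
        self.members, self.label = members, label


class ChangeListUnionFind:
    def __init__(self, n: int):
        self.cls = {x: _Class([x], x) for x in range(1, n + 1)}   # element -> class
        self.by_label = {c.label: c for c in self.cls.values()}   # active label -> class
        self.next = {x: math.inf for x in range(1, n + 1)}        # successor inside the class

    def find(self, x: int) -> int:
        return self.cls[x].label

    def union(self, labels, new_label):
        """Merge the classes with the given active labels; return the change list."""
        if new_label in self.by_label:
            raise ValueError("new_label must be free")
        parts = [self.by_label.pop(label) for label in labels]
        big = max(parts, key=lambda c: len(c.members))            # keep the largest class
        moved = [y for c in parts if c is not big for y in c.members]
        for y in moved:
            bisect.insort(big.members, y)
            self.cls[y] = big
        # Only a moved element or its new predecessor can get a new successor.
        candidates = set(moved)
        for y in moved:
            k = bisect.bisect_left(big.members, y)
            if k > 0:
                candidates.add(big.members[k - 1])
        changes = []
        for x in sorted(candidates):
            k = bisect.bisect_left(big.members, x)
            new = big.members[k + 1] if k + 1 < len(big.members) else math.inf
            if new != self.next[x]:
                changes.append((x, new))
                self.next[x] = new
        big.label = new_label
        self.by_label[new_label] = big
        return changes


uf = ChangeListUnionFind(5)
print(uf.union([1, 3], 6))   # [(1, 3)]
print(uf.union([6, 2], 7))   # [(1, 2), (2, 3)]
print(uf.find(3))            # 7
```

Storing the label on the class object rather than on each element is what keeps Find at a single pointer dereference, mirroring the tree pointer and id arrays described in Section 5.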

O(n log n)-time Construction of CST (w)
The suffix tree of w augmented with extra nodes is called the skeleton of CST(w), which we denote by sCST(w). It could be constructed using the fact that all square factors of a word can be computed in linear time [16,10,11]. However, we do not need such complicated machinery here. We will compute sCST(w) on the fly, simultaneously annotating the nodes with cv, ∆.
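We introduce auxiliary notions related to the covered area of nodes. For a node v and a positive integer h, let
cv_h(v) = |⋃_{i ∈ Occ(v,w)} [i, i + h − 1]|,   ∆_h(v) = |{i ∈ Occ(v, w) : δ(i, v) ≥ h}|,
so that cv_{|v|}(v) = cv(v) and ∆_{|v|}(v) = ∆(v). In the course of the algorithm some nodes will have their values cv, ∆ already computed; we call them processed nodes. Whenever v is processed, so are its descendants.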
The algorithm processes inner nodes v of sCST(w) in the order of non-increasing height h = |v|. The height is not defined for leaves, so we start with h = n + 1. Extra nodes are created on the fly using Observation 1 (this takes place in the auxiliary Lift routine).
We maintain the partition P of {1, …, n} given by sets of leaves of subtrees rooted at peak nodes. Initially the peak nodes are the leaves of sCST(w). Each time we process v, all its children are peak nodes. After processing v they are no longer peak nodes and v becomes a new peak node. The sets in the partition are labeled with identifiers of the corresponding peak nodes. Recall that leaves are labeled with the starting positions of the corresponding suffixes. We allow any labeling of the remaining nodes as long as each node of sCST(w) has a distinct label of magnitude O(n). For this set of labels we store the data structure of Lemma 4 to compute the change list of the changing partition.
We maintain the following technical invariant (see Fig. 5).

Invariant(h):
(A) For each peak node z we store: cv′[z] = cv_h(z), ∆′[z] = ∆_h(z).
(B) For each i ∈ {1, …, n} we store Dist[i] = δ(i, Find(i)).
(C) For each d < h we store List[d] = {i : Dist[i] = d}.

We use two auxiliary routines. The Lift operation updates cv′ and ∆′ values when h decrements. It also creates all extra nodes of depth h. The LocalCorrect operation is used for updating cv′ and ∆′ values for children of the node v. The Dist and List arrays are stored to enable efficient implementation of these two routines.

Description of the Lift (h) Operation
The procedure Lift plays an important preparatory role in processing the current node. According to part (A) of our invariant, for all peak nodes z we know the values cv′[z] = cv_{h+1}(z), ∆′[z] = ∆_{h+1}(z). Now we have to change h + 1 to h and guarantee validity of the invariant: cv′[z] = cv_h(z), ∆′[z] = ∆_h(z). This is exactly how the following operation updates cv′ and ∆′.
It also creates all extra nodes of depth h that were not explicit nodes of the suffix tree. By Observation 1, if i ∈ List[h] then at position i in w there is an occurrence of a primitively rooted square of half length h. Consequently, an extra node corresponding to this occurrence is created in the Lift operation.

Description of the LocalCorrect (p, q, v) Operation
Here we assume that v occurs at positions p < q and that these are consecutive occurrences. Moreover, we assume that these occurrences are followed by distinct characters, i.e. (p, q) ∈ ChangeList(v). The LocalCorrect procedure updates Dist[p] to make part (B) of the invariant hold for p again. The data structure List is updated accordingly so that (C) remains satisfied.
Function LocalCorrect(p, q, v)

Complexity of the Algorithm
In the course of the algorithm we compute ChangeList(v) for each v ∈ T. Due to Lemma 4 we have: Σ_{v ∈ T} |ChangeList(v)| = O(n log n). Consequently we perform O(n log n) operations LocalCorrect. In each of them at most one element is added to a list List[d] for some d. Hence the total number of insertions to these lists is also O(n log n).
The cost of each operation Lift is proportional to the total size of the list List[h] processed in this operation. For each h the list List[h] is processed once and the total number of insertions into lists is O(n log n), therefore the total cost of all operations Lift is also O(n log n). This proves Lemma 5, which, together with Lemmas 2 and 3, implies our main result (Theorem 1).

Implementation Details
In this section we give a proof of Lemma 4. We use an approach similar to Brodal and Pedersen [6] (who use the results of [7]) originally devised for computation of maximal quasiperiodicities.
Theorem 3 of [6] states that a subset X of a linearly ordered universe can be stored in a height-balanced tree of linear size supporting the following operations:
• X.MultiInsert(Y): insert all elements of Y to X,
• X.MultiPred(Y): return all (y, x) for y ∈ Y and x = max{z ∈ X : z < y},
• X.MultiSucc(Y): return all (y, x) for y ∈ Y and x = min{z ∈ X : z > y}.
In the data structure we store each P ∈ P as a height-balanced tree. Additionally, we store several auxiliary arrays, whose semantics follows. For each x ∈ {1, …, n} we maintain a value next[x] = next_P(x) and a pointer tree[x] to the tree representing P such that x ∈ P. For each P ∈ P (technically for each tree representing P ∈ P) we store id[P] and for each ℓ ∈ L we store id^{−1}[ℓ], a pointer to the corresponding tree (null for free labels).
Answering Find is trivial as it suffices to follow the tree pointer and return the id value. The Union operation is performed according to the pseudocode given below (for brevity we write P_i instead of id^{−1}[i]).
Claim 2. The Union operation correctly computes the change list and updates the data structure.
Proof. In the Union operation for sets P_i, i ∈ I, we find the largest set P_{i0} and MultiInsert all the elements of the remaining sets to P_{i0}. If (a, b) is in the change list, then a and b come from different sets P_i, in particular at least one of them does not come from P_{i0}. Depending on which one it is, the pair (a, b) is found by the MultiPred or MultiSucc operation. While computing C, the table next is not updated yet (i.e. corresponds to the state before the Union operation) while S is already updated. Consequently the pairs inserted to C indeed belong to the change list. Once C is proved to be the change list, it is clear that next is updated correctly. For the other components of the data structure, correctness of updates is evident.
(Continuation of the proof of Claim 3.) The operations on height-balanced trees are not performed for the largest set, and for the remaining ones we have p_i < p/2 (i.e. log(p/p_i) ≥ 1). This lets us bound the time complexity of the Union operation as follows:
O(Σ_{i=2}^{k} max(p_i, p_i log(p/p_i))) = O(Σ_{i=2}^{k} p_i log(p/p_i)) ≤ O(Σ_{i=1}^{k} p_i (log p − log p_i)) = O(p log p − Σ_{i=1}^{k} p_i log p_i),
which is equal to the increase in potential.

By-Products of Cover Suffix Tree
In this section we present two additional applications of the Cover Suffix Tree. We show that, given CST(w) (or CST of a word that can be obtained from w in a simple manner), one can compute in linear time all distinct primitively rooted squares in w and a linear representation of all the seeds of w, in particular, the shortest seeds of w. This shows that constructing this data structure is at least as hard as computing all primitively rooted squares and seeds. While there are linear-time algorithms for these problems [16,20,10] and [19], they are all complex and rely on the combinatorial properties specific to the repetitive structures they seek.
Theorem 4. Assume that the Cover Suffix Tree of a word of length n can be computed in T(n) time. Then all distinct primitively rooted squares in a word w of length n can be computed in T(2n) time.
Let us consider the set of explicit non-branching nodes of CST(w′) and select among them the nodes corresponding to even-length factors of w′ starting with the symbol 0. It suffices to note that there is a one-to-one correspondence between these nodes and the halves of primitively rooted squares in w.
Recall that a word u is a seed of w if u is a factor of w and w is a factor of some word y covered by u, see Fig. 6. The following lemma states that the set of all seeds of w has a representation of O(n) size, where n = |w|. This representation enables, e.g., simple computation of all shortest seeds of the word. By a range on an edge of a suffix tree we mean a number of consecutive nodes on this edge (obviously at most one of these nodes is explicit). Let w^R denote the reverse of the word w.
Lemma 6 ([18,19]). The set of all seeds of w can be split into two disjoint classes. The seeds from one class form a single (possibly empty) range on each edge of the suffix tree of w, while the seeds from the other class form a range on each edge of the suffix tree of w^R.
We will show that given CST(w) and CST(w^R) we can compute the representation of all seeds from Lemma 6 in O(n) time. Let us recall auxiliary notions of quasiseed and quasigap, see [19].
By first(u) and last(u) let us denote min Occ(u) and max Occ(u), respectively. We say that u is a complete cover in w if u is a cover of the word w[first(u), last(u) + |u| − 1]. The word u is called a quasiseed of w if u is a complete cover in w, first(u) ≤ |u| and n + 1 − last(u) < 2|u|. Alternatively, w can be decomposed into w = xyz, where |x|, |z| < |u| and u is a cover of y.
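The decomposition w = xyz gives a direct way to test the quasiseed property on small inputs. The Python sketch below follows that decomposition literally, with x the part of w before the first occurrence of u and z the part after the last one; the function names are hypothetical and the test is quadratic, unlike the constant-time checks derived from CST(w) later in this section.

```python
def occ(u: str, w: str) -> list[int]:
    """Sorted 1-based starting positions of u in w."""
    out, start = [], w.find(u)
    while start != -1:
        out.append(start + 1)
        start = w.find(u, start + 1)
    return out


def is_quasiseed(u: str, w: str) -> bool:
    """Check w = xyz with |x|, |z| < |u| and u covering y = w[first(u) .. last(u) + |u| - 1]."""
    pos = occ(u, w)
    if not pos:
        return False
    m, first, last = len(u), pos[0], pos[-1]
    # u is a complete cover iff consecutive occurrences leave no uncovered gap.
    complete = all(b - a <= m for a, b in zip(pos, pos[1:]))
    x_len = first - 1                      # |x|
    z_len = len(w) - (last + m - 1)        # |z|
    return complete and x_len < m and z_len < m


w = "bcccacccaccaccb"
print([u for u in ("cacc", "ccacc", "cccacc", "cc") if is_quasiseed(u, w)])
# ['cacc', 'ccacc', 'cccacc']
```

The factor cc fails the test because its occurrences at positions 3 and 6 leave position 5 uncovered.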
All quasiseeds of w lying on the same edge of the suffix tree with lower explicit endpoint v form a range with the lower explicit end of the range located at v. The length of the upper end of the range is denoted as quasigap(v). If the range is empty, we set quasigap(v) = ∞. Thus a representation of all quasiseeds of a given word can be provided using only the quasigaps of explicit nodes in the suffix tree. It is known that computation of quasiseeds is the hardest part of an algorithm computing seeds:
Lemma 7 ([18,19]). Assume quasigaps of all explicit nodes of suffix trees of w and w^R are known. Then a representation of all seeds of w from Lemma 6 can be found in O(n) time.
It turns out that the auxiliary data in CST(w) and CST(w^R) enable constant-time computation of quasigaps of explicit nodes. By Lemma 7 this yields an O(n) time algorithm for computing a representation of all the seeds of w. This is stated formally in the following theorem.
Theorem 5. Assume that the Cover Suffix Tree of a word of length n can be computed in T(n) time. Given a word w of length n, one can compute a representation of all seeds of w from Lemma 6 in T(n) time. In particular, all the shortest seeds of w can be computed within the same time complexity.
Proof. We show how to compute quasigaps for all explicit nodes of CST(w). The computation for CST(w^R) is symmetric. Note that CST(w) may contain more explicit nodes than the suffix tree of the word. In this case, the results from any maximal sequence of edges connected by non-branching explicit nodes in CST(w) need to be merged into a single range on the corresponding edge of the suffix tree.
By the definition of cv (v), an explicit node v of CST (w) is a complete cover in w if the following condition holds: cv (v) = last (v) − first(v) + |v|.
Thus for checking whether an explicit node v of CST(w) is a quasiseed of w it suffices to check whether this condition and the two inequalities from the definition of a quasiseed hold for v. If v is not a quasiseed of w, we have quasigap(v) = ∞, otherwise we can assume that quasigap(v) ≤ |v|.
Therefore cacc is a quasiseed of w, see also Fig. 1.
By Lemma 1, the condition for any node on the edge ending at v to be a complete cover in w is very simple: ∆(v) = 1. Assume this condition is satisfied and consider any implicit node v′ on this edge. Then v′ is a quasiseed if both inequalities from the definition of a quasiseed hold for v′ (note that first(v′) = first(v) and last(v′) = last(v)).
Therefore cccacc is a quasiseed of w. Since ∆(v) = 1, quasigap(v) could be smaller than 6. However, ⌈(n − last(v) + 2)/2⌉ = 6 and the above formula yields quasigap(v) = 6.
This concludes a complete set of rules for computing quasigap(v) for explicit nodes of CST (w).

Conclusions
We have presented an algorithm which constructs a data structure, called the Cover Suffix Tree, in O(n log n) time and O(n) space. The Cover Suffix Tree has been developed in order to solve the PartialCovers and AllPartialCovers problems in O(n) and O(n log n) time, respectively, but it also gives a well-structured description of the cover indices of all factors. Consequently, various questions related to partial covers can be answered efficiently. For example, with the Cover Suffix Tree one can solve in linear time a problem inverse to PartialCovers: find a factor of length between l and r that maximizes the number of positions covered. Also, a problem similar to AllPartialCovers, namely to compute for all lengths l = 1, …, n the maximum number of positions covered by a factor of length l, can be solved in O(n log n) time. This solution was actually given implicitly in the proof of Lemma 3.
An interesting open problem is to reduce the construction time to O(n). This could be difficult, though, since by the results of Section 6 this would yield alternative linear-time algorithms finding primitively rooted squares and computing seeds. The only known linear-time algorithms for these problems (see [16,10,11] and [19]) are rather complex.

Figure 2: CST(w) for w = bcccacccaccaccb. It contains four extra nodes that are denoted by squares in the figure. Each node is annotated with cv(v), ∆(v). Leaves are omitted for clarity.

Figure 3: Line segments constructed as in Lemma 3 for the CST (w) from Fig. 2. The marked points joined with a dashed polyline show the values of the integer upper envelope function E ′ .We infer from the graph that the lengths of the shortest α-partial covers of w are as follows: 1 for α ≤ 10, 4 for α = 11, 5 for α = 12, and α for α ≥ 13.

Lemma 5. Algorithm ComputeCST constructs CST(w) in O(n log n) time and O(n) space, where n = |w|.

Claim 3. Any sequence of Union operations takes O(n log n) time in total.
Proof. Let us introduce a potential function Φ(P) = Σ_{P ∈ P} |P| log |P|. We shall prove that the running time of a single Union operation is proportional to the increase in potential. Clearly 0 ≤ Φ(P) = Σ_{P ∈ P} |P| log |P| ≤ Σ_{P ∈ P} |P| log n = n log n, so this suffices to obtain the desired O(n log n) bound. Let us consider a Union operation that merges partition classes of sizes p_1 ≥ p_2 ≥ … ≥ p_k into a single class of size p = Σ_{i=1}^{k} p_i. The most time-consuming steps of the algorithm are the operations on height-balanced trees, which, for a single i, run in O(max(p_i, p_i log(p/p_i))) time. These operations are not performed for the largest set; the remainder of the argument, given in Section 5, bounds their total cost by the increase in potential.