Efficient enumeration of non-equivalent squares in partial words with few holes

A word of the form $WW$ for some word $W \in \Sigma^*$ is called a square. A partial word is a word possibly containing holes (also called don't cares). The hole is a special symbol $\diamond \notin \Sigma$ which matches any symbol from $\Sigma \cup \{\diamond\}$. A p-square is a partial word matching at least one square $WW$ without holes. Two p-squares are called equivalent if they match the same set of squares. A p-square is called here unambiguous if it matches exactly one square $WW$ without holes. Such p-squares are natural counterparts of classical squares. Let $\mathrm{PSQUARES}_k(n)$ and $\mathrm{USQUARES}_k(n)$ be the maximum number of non-equivalent p-squares and non-equivalent unambiguous p-squares in $T$ over all partial words $T$ of length $n$ with at most $k$ holes. We show asymptotically tight bounds: $\mathrm{PSQUARES}_k(n) = \Theta(\min(nk^2, n^2))$ and $\mathrm{USQUARES}_k(n) = \Theta(nk)$. We present an algorithm that reports all non-equivalent p-squares in $O(nk^3)$ time for a partial word of length $n$ with $k$ holes over an integer alphabet. In particular, it runs in linear time for $k = O(1)$, and its time complexity nearly matches the asymptotic bound for $\mathrm{PSQUARES}_k(n)$. We also show an $O(n)$-time algorithm that reports all non-equivalent p-squares of a given length. The paper is a full and improved version of Charalampopoulos et al. (in Cao Y, Chen Y (eds) Proceedings of the 23rd international conference on computing and combinatorics, COCOON 2017; Springer, 2017).


Introduction
A word is a sequence of letters from a given alphabet Σ. By $\Sigma^*$ we denote the set of all words over Σ. A word of the form $U^2 = UU$, for some word $U$, is called a square. For a word $W$, a factor is a subword composed of some number of consecutive letters, and a square factor is a factor of $W$ which is a square. Enumeration of square factors in words is a well-studied topic, both from a combinatorial and from an algorithmic perspective. Obviously, a word $W$ of length $n$ may contain $\Theta(n^2)$ square factors (e.g. $W = a^n$); however, it is known that such a word contains only $O(n)$ distinct square factors (Fraenkel and Simpson 1998; Ilie 2005); currently the best known upper bound is $\frac{11}{6}n$ (Deza et al. 2015). Moreover, all distinct square factors of a word over an integer alphabet can be listed in $O(n)$ time using the suffix tree (Gusfield and Stoye 2004; Bannai et al. 2017) or the suffix array and the structure of runs (maximal repetitions) in the word (Crochemore et al. 2014).
A partial word is a sequence of letters from Σ ∪ {♦}, where ♦ denotes a hole, that is, a don't care symbol. Two symbols a, b ∈ Σ ∪ {♦} are said to match (denoted as a ≈ b) if they are equal or one of them is a hole; note that this relation is not transitive. The relation of matching is extended in a natural way to partial words of the same length.
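A minimal Python sketch of the matching relation, assuming that a hole is written as the character ♦ and that every other character is a letter of Σ. The final line illustrates that ≈ is not transitive.

```python
HOLE = "♦"  # the hole; every other character is treated as a letter of Σ


def symbols_match(a: str, b: str) -> bool:
    """a ≈ b: the two symbols are equal or at least one of them is a hole."""
    return a == b or a == HOLE or b == HOLE


def words_match(s: str, t: str) -> bool:
    """The matching relation extended position by position to equal-length partial words."""
    return len(s) == len(t) and all(symbols_match(a, b) for a, b in zip(s, t))


# Non-transitivity: a ≈ ♦ and ♦ ≈ b, but a and b do not match.
print(symbols_match("a", HOLE), symbols_match(HOLE, "b"), symbols_match("a", "b"))  # True True False
```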
A partial word $UV$ is called a p-square if $U \approx V$. As in the context of words, a p-square factor of a partial word $T$ is a factor being a p-square; see Blanchet-Sadri et al. (2014b). Our work is devoted to enumeration of non-equivalent p-square factors in a partial word with a given number $k > 0$ of holes.
Previous results. Following Blanchet-Sadri et al. (…, 2014b), we define a solid square as a square of a word, and a square subword of a partial word $T$ as a solid square that matches a factor of $T$.
Previous studies on squares in partial words were mostly focused on combinatorics. They started with the case of $k = 1$ (…), in which case distinct square subwords correspond to non-equivalent p-square factors. It was shown that a partial word with one hole contains at most $\frac{7}{2}n$ distinct square subwords (Blanchet-Sadri and Mercaş 2009) ($3n$ for binary partial words; Halava et al. 2010). Also a generalization of the three squares lemma (see Crochemore and Rytter 1995) was proposed for partial words (Blanchet-Sadri and Mercaş 2012). As for a larger number of holes, the existing literature is devoted mainly to counting the number of distinct square subwords of a partial word (…) or all occurrences of p-square factors (Blanchet-Sadri et al. 2014a, …). On the algorithmic side, Manea and Tiseanu (2010) proved that the problem of counting distinct square subwords of a partial word is #P-complete, and Diaconu et al. (2009), Manea et al. (2014), and Blanchet-Sadri et al. (2014b) showed quadratic- and nearly-quadratic-time algorithms for finding all occurrences of p-square factors and primitively-rooted p-square factors of a partial word, respectively.
Our combinatorial results. Let $\mathrm{PSQUARES}_k(n)$ and $\mathrm{USQUARES}_k(n)$ be the maximum number of non-equivalent p-squares and non-equivalent unambiguous p-squares in $T$ over all partial words $T$ of length $n$ with at most $k$ holes. We show the following bounds:
$$\mathrm{PSQUARES}_k(n) = \Theta(\min(nk^2, n^2)), \qquad \mathrm{USQUARES}_k(n) = \Theta(nk).$$
This work can be viewed as a generalization of the results on partial words with one hole (…; Halava et al. 2010) to $k$ holes.
Our algorithmic results. We present an algorithm that reports all elements of the set psquares(T) in a partial word of length $n$ with $k$ holes in $O(nk^3)$ time. In particular, our algorithm runs in linear time for $k = O(1)$, and its time complexity nearly matches the maximum number of non-equivalent p-square factors. We also show an $O(n)$-time algorithm that reports all non-equivalent p-squares of a given length. The algorithms assume an integer alphabet $\Sigma \subseteq \{1, \dots, n^{O(1)}\}$.
We use recently introduced advanced data structures of Kociumaka (2016).
Comparison with the conference version. The paper is an extended version of Charalampopoulos et al. (2017). As far as combinatorics of p-squares is concerned, the conference version of the paper derived the bound $\mathrm{PSQUARES}_k(n) = \Theta(\min(n^2, nk^2))$. Let $\mathrm{ASQUARES}_k(n)$ be the maximum number of non-equivalent ambiguous p-squares in $T$ over all partial words $T$ of length $n$ with at most $k$ holes. The bound was proved by showing that $\mathrm{ASQUARES}_k(n) = \Theta(\min(n^2, nk^2))$ and that $\mathrm{USQUARES}_k(n) = O(nk^2)$. As a new contribution, we present a tight estimation $\mathrm{USQUARES}_k(n) = \Theta(nk)$. This lets us identify ambiguous p-squares as the ones that attain the bound on $\mathrm{PSQUARES}_k(n)$. On the algorithmic side, Charalampopoulos et al. (2017) presented an algorithm computing the set psquares(T) in $O(nk^3)$ time.
Here the readability of the algorithm has been considerably improved; we also show a linear-time algorithm that reports all non-equivalent p-squares of a specified length.
Structure of the paper. After the Preliminaries comes the algorithmic part of the paper, which is followed by the combinatorial part. In Sect. 3 we show an $O(n)$-time algorithm that reports all non-equivalent p-squares of a specified length and, as an immediate corollary, an $O(nk^2)$-time computation of all non-equivalent ambiguous p-squares. Then in Sect. 4 we give an $O(nk^3)$-time algorithm for computing all non-equivalent unambiguous p-squares. Asymptotic bounds for ambiguous p-squares and unambiguous p-squares are presented in Sects. 5 and 6, respectively.

Preliminaries
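For a word $W \in \Sigma^*$, by $|W| = n$ we denote the length of $W$, and by $W_i$, for $1 \le i \le n$, its $i$th letter; $W[i..j]$ denotes the factor $W_i W_{i+1} \cdots W_j$. For a partial word $T$ we use the same notation as for words: $|T| = n$ for its length, $T_i$ for the $i$th letter, $T[i..j]$ for a factor. If $T$ does not contain holes, then it is called solid. The relation $\approx$ of matching on $\Sigma \cup \{\diamond\}$ is defined as: $a \approx a$, $\diamond \approx a$, and $a \approx \diamond$ for all $a \in \Sigma \cup \{\diamond\}$.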
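We define a binary operation on $\Sigma \cup \{\diamond\}$ that maps each of the pairs $(a, a)$, $(a, \diamond)$ and $(\diamond, a)$ to $a$ for all $a \in \Sigma \cup \{\diamond\}$, and is undefined otherwise; that is, it is defined exactly for matching symbols. Two equal-length partial words $S$ and $T$ are said to match (denoted as $S \approx T$) if $S_i \approx T_i$ for all $i = 1, \dots, n$. In this case, we denote … . If … for a partial word $U$, then we say that $U$ occurs in $T$ at position $i$.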
Two equal-length partial words $U$ and $V$ are called cyclic shifts if there are partial words $X, Y$ such that $U = XY$ and $V = YX$. We denote this as $\mathrm{rot}(U, |X|) = V$, where $|X|$ is the shift value.
For a partial word $X$, by $\#_\diamond(X)$ we denote the number of holes in $X$. For $1 \le i \le n$ and $0 \le q \le \log n$, we denote $T_{i,q} = T[i..\min(n, i + 2^q - 1)]$. We say that $T_{i,q}$ is a q-basic factor of the partial word $T$. In other words, q-basic factors are factors of $T$ of length $2^q$ and suffixes of $T$ of length at most $2^q$. By $B(T)$ we denote the set of all basic factors of $T$.

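Lemma 2.1 If $T$ is a partial word of length $n$ with $k$ holes, then $\sum_{i,q} \#_\diamond(T_{i,q}) \le 2nk$, where the sum ranges over all basic factors $T_{i,q}$ of $T$.
Proof The number of q-basic factors that contain a given position $i \in \{1, \dots, n\}$ is at most $2^q$. Thus the total number of basic factors that contain a given hole position $i$ is at most $\sum_{q=0}^{\log n} 2^q \le 2n$. Summing over all $k$ holes completes the proof.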
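The counting argument can be checked numerically with the following sketch, which enumerates the q-basic factors $T_{i,q}$ directly (with 1-based $i$, as in the text) and sums their hole counts. The function names are illustrative and not taken from the paper.

```python
HOLE = "♦"


def basic_factors(t: str):
    """Yield (i, q, T_{i,q}) with 1-based i, for 0 <= q <= floor(log2 n)."""
    n = len(t)
    for q in range(n.bit_length()):  # n.bit_length() - 1 == floor(log2 n) for n >= 1
        for i in range(1, n + 1):
            end = min(n, i + 2 ** q - 1)  # T_{i,q} = T[i..min(n, i + 2^q - 1)]
            yield i, q, t[i - 1:end]


def total_holes_in_basic_factors(t: str) -> int:
    """Sum of the hole counts of all basic factors of t."""
    return sum(f.count(HOLE) for _, _, f in basic_factors(t))


t = "ab♦♦ba♦aaba♦b"
print(total_holes_in_basic_factors(t), 2 * len(t) * t.count(HOLE))
```

For the all-hole word of length 8 the sum is 85, comfortably below $2nk = 128$.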
We say that a p-square is an unambiguous p-square (u-square) if its representative is solid and an ambiguous p-square (a-square) otherwise. By asquares(T) and usquares(T) we denote the sets of non-equivalent factors of $T$ being a-squares and u-squares, respectively. Obviously, $\mathit{psquares}(T) = \mathit{asquares}(T) \cup \mathit{usquares}(T)$, and the union is disjoint.
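The definition of the representative is not reproduced in this copy, so the sketch below assumes the natural reading: the representative of a p-square $UV$ is $W^2$, where $W$ is obtained from $U$ and $V$ position by position, keeping the letter whenever one of the two symbols is a letter. Under this assumption, $UV$ is a u-square exactly when $W$ is solid, and (for $|\Sigma| \ge 2$) two p-squares are equivalent exactly when they yield the same $W$.

```python
from typing import Optional

HOLE = "♦"


def representative_half(x: str) -> Optional[str]:
    """Assumed representative of a p-square x = UV, returned as W with W^2 the representative.

    Position by position, a letter wins over a hole (two holes stay a hole).
    Returns None when x is not a p-square."""
    if len(x) % 2:
        return None
    h = len(x) // 2
    w = []
    for a, b in zip(x[:h], x[h:]):
        if a == b or b == HOLE:
            w.append(a)
        elif a == HOLE:
            w.append(b)
        else:  # two different letters: U and V do not match
            return None
    return "".join(w)


def classify(x: str) -> str:
    w = representative_half(x)
    if w is None:
        return "not a p-square"
    return "u-square" if HOLE not in w else "a-square"


# a♦♦b and ♦bab have the same representative (ab)^2, hence are equivalent.
print(representative_half("a♦♦b"), representative_half("♦bab"))  # ab ab
```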

Periods in solid and partial words
A positive integer $q$ is called a period of a word $W$ if $W_i = W_{i+q}$ for all $i = 1, \dots, n - q$. In this case, $W[1..q]$ is called a string period of $W$. A word $W$ is called periodic if it has a period $q$ such that $2q \le |W|$.
A quantum period of a partial word $T$ is a positive integer $q$ such that $T_i \approx T_{i+q}$ for all $i = 1, \dots, n - q$. A deterministic period of $T$ is an integer $q$ such that there exists a word $W$ such that $W \approx T$ and $W$ has a period $q$.
The partial word T is called quantum (deterministically) periodic if it has a quantum (deterministic) period q such that 2q ≤ n.
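For a partial word $U$ and integer $\delta > 0$, we denote $\mathit{Mis}_\delta(U) = \{ i : \delta < i \le |U|,\ U_{i-\delta} \not\approx U_i \}$. We say that $p$ is a $d$-approximate quantum period of a partial word $T$ if $|\mathit{Mis}_p(T)| \le d$. Note that a 0-approximate quantum period is exactly a quantum period.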
Point (b) follows from point (a). Also point (c) follows from point (a). Indeed, if $i \in \mathit{Mis}_\delta(U)$ in this case, then for each of the positions $i$, $i - \delta$ in $V$, if it contains a hole, then it is counted only for the index $i$.
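To contrast the notions of periodicity in this subsection, here is a small Python sketch with hypothetical helper names. A deterministic period $q$ is tested by checking that each residue class modulo $q$ carries at most one distinct letter, which is exactly the condition under which the holes can be filled to obtain a solid word with period $q$. $\mathit{Mis}_\delta$ is computed, under our reading of its definition, as the set of 1-based positions $i > \delta$ with $U_{i-\delta} \not\approx U_i$.

```python
HOLE = "♦"


def match(a: str, b: str) -> bool:
    return a == b or HOLE in (a, b)


def is_quantum_period(t: str, q: int) -> bool:
    """T_i ≈ T_{i+q} for every i (0-based indices here)."""
    return all(match(t[i], t[i + q]) for i in range(len(t) - q))


def is_deterministic_period(t: str, q: int) -> bool:
    """A solid W ≈ T with period q exists iff every residue class mod q
    contains at most one distinct letter; holes are then filled with it."""
    return all(len({c for c in t[r::q] if c != HOLE}) <= 1 for r in range(q))


def mis(u: str, delta: int) -> list:
    """1-based positions i with delta < i <= |u| and u_{i-delta} ≉ u_i."""
    return [i for i in range(delta + 1, len(u) + 1) if not match(u[i - 1 - delta], u[i - 1])]


def is_approx_quantum_period(t: str, p: int, d: int) -> bool:
    return len(mis(t, p)) <= d


# a♦b: 1 is a quantum period (a ≈ ♦ ≈ b) but not a deterministic one.
print(is_quantum_period("a♦b", 1), is_deterministic_period("a♦b", 1))  # True False
```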

Computing all p-squares of specified length and non-equivalent ambiguous p-squares
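In this section we develop an $O(n)$-time algorithm that enumerates all non-equivalent p-squares of a half length $d$ in a partial word $T$ of length $n$. As a corollary, we obtain a simple computation of all non-equivalent ambiguous p-squares in optimal time. For a partial word $T$, we denote by $T'$ a partial word of length $n - d$ such that, for $1 \le i \le n - d$, $T'_i$ is the common symbol of $T_i$ and $T_{i+d}$ if $T_i \approx T_{i+d}$ (the letter if at least one of them is a letter, and $\diamond$ if both are holes), and $T'_i = \#$ otherwise, where # is a new symbol.
Example 3.2 Let us consider the partial word T = ab♦♦ba♦aaba♦b from Example 1.3. For $d = 2$ we construct the following partial word $T'$: abbabaa#ab#, from which we conclude that $T$ contains p-squares of half length 2 with representatives $(ab)^2$, $(bb)^2$, $(ba)^2$, $(aa)^2$. For $d = 3$ we construct the partial word $T'$: aba♦#abaab, which means that $T$ contains p-squares of half length 3 with representatives $(aba)^2$, $(baa)^2$, $(ba\diamond)^2$, $(aab)^2$.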
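The construction of Example 3.2 can be replayed with the sketch below. It assumes the reading of the auxiliary word (called $T'$ here) that reproduces the example: position $i$ combines $T_i$ with $T_{i+d}$ when they match and is # otherwise. Collecting length-$d$ factors in a Python set costs $O(nd)$ time, so this only checks the example and is not the $O(n)$ method of Theorem 3.3.

```python
HOLE, SEP = "♦", "#"


def combine(a: str, b: str) -> str:
    """Common symbol of two matching symbols (a hole only if both are holes), else '#'."""
    if a == b or b == HOLE:
        return a
    if a == HOLE:
        return b
    return SEP


def shifted_word(t: str, d: int) -> str:
    """Auxiliary word of length n - d: position i combines T_i with T_{i+d}."""
    return "".join(combine(t[i], t[i + d]) for i in range(len(t) - d))


def half_representatives(t: str, d: int) -> set:
    """Distinct length-d factors of the '#'-free pieces; each is the half W of a
    representative W^2 of a p-square of half length d in t."""
    reps = set()
    for piece in shifted_word(t, d).split(SEP):
        for j in range(len(piece) - d + 1):
            reps.add(piece[j:j + d])
    return reps


t = "ab♦♦ba♦aaba♦b"  # the partial word of Example 3.2
print(shifted_word(t, 2), shifted_word(t, 3))  # abbabaa#ab# aba♦#abaab
```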

Theorem 3.3 All non-equivalent p-squares of half length d in a partial word of length n can be reported (as factors of the partial word) in O(n) time.
Proof Let $T$ be a partial word of length $n$. In $O(n)$ time we compute $T'$. Let $S_1, \dots, S_q$ be a partition of $T'$ into maximal factors that do not contain the symbol #. By Observation 3.1, our task is equivalent to reporting all distinct factors of length $d$ of the partial words $S_j$. This can be performed by listing all nodes (implicit and explicit) at depth $d$ in the generalized suffix tree of $S_1, \dots, S_q$, that is, in the suffix tree of $S_1 \$_1 S_2 \$_2 \cdots S_q \$_q$, where $\$_1, \dots, \$_q$ are distinct end-markers. For details, see Gusfield (1997). As the suffix tree of a word of length $n$ over an integer alphabet can be constructed in $O(n)$ time (Farach 1997), the whole algorithm works in $O(n)$ time.
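The proof lists the nodes at depth $d$ of a generalized suffix tree. As a simpler stand-in, the sketch below sorts the suffixes of the #-free pieces (a naively built suffix array, far from linear time) and relies on the fact that suffixes sharing a length-$d$ prefix are contiguous in sorted order: a suffix begins a new distinct factor exactly when its longest common prefix with its predecessor is shorter than $d$.

```python
def lcp(x: str, y: str) -> int:
    """Length of the longest common prefix of x and y."""
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return n


def distinct_factors_of_length(pieces: list, d: int) -> list:
    """Sorted list of the distinct length-d factors of the given words.

    Suffixes of length >= d of all pieces are sorted (a naive suffix array);
    suffixes with a common length-d prefix are then contiguous, so a suffix
    contributes a new factor iff its LCP with the preceding suffix is < d."""
    suffixes = sorted(p[j:] for p in pieces for j in range(len(p) - d + 1))
    result = []
    for k, s in enumerate(suffixes):
        if k == 0 or lcp(suffixes[k - 1], s) < d:
            result.append(s[:d])
    return result


# The '#'-free pieces of the auxiliary word for d = 3 in Example 3.2.
print(distinct_factors_of_length(["aba♦", "abaab"], 3))  # ['aab', 'aba', 'baa', 'ba♦']
```

Any total order on the symbols works here; Python simply compares code points, so ♦ sorts after the letters.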
As a corollary we obtain efficient computation of non-equivalent a-squares.

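Theorem 3.4 For a partial word $T$ of length $n$ with $k$ holes, all elements of the set asquares(T) can be reported in $O(nk^2)$ time.
Proof The representative of a p-square of half length $d$ contains a hole only if two positions at distance $d$ both contain holes, so the half length of an ambiguous p-square is a distance between two holes. Hence there are at most $k^2$ possible lengths of ambiguous p-squares. For each length we use the algorithm of Theorem 3.3 to report all non-equivalent p-squares. This takes $O(nk^2)$ time. In the end, for each length we need to filter out unambiguous p-squares. For a specified half length $d$, it suffices to check, for each p-square …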
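A brute-force sketch of Theorem 3.4: it tries only the half lengths $d$ that are distances between two holes, builds the auxiliary word position by position as in Example 3.2, and keeps the length-$d$ factors of its #-free pieces that contain a hole. The hole test stands in for the truncated last step of the proof and is our assumption; a set replaces the suffix tree, so the running time is not the stated bound.

```python
from itertools import combinations

HOLE, SEP = "♦", "#"


def combine(a: str, b: str) -> str:
    """Common symbol of two matching symbols, or '#' if they do not match."""
    if a == b or b == HOLE:
        return a
    if a == HOLE:
        return b
    return SEP


def ambiguous_representatives(t: str) -> set:
    """Pairs (d, W) such that W^2 represents an a-square factor of t of half length d."""
    holes = [i for i, c in enumerate(t) if c == HOLE]
    dists = {j - i for i, j in combinations(holes, 2)}  # candidate half lengths only
    out = set()
    for d in sorted(dists):
        if 2 * d > len(t):
            continue  # no factor of length 2d fits
        shifted = "".join(combine(t[i], t[i + d]) for i in range(len(t) - d))
        for piece in shifted.split(SEP):
            for j in range(len(piece) - d + 1):
                w = piece[j:j + d]
                if HOLE in w:  # representative with a hole: ambiguous p-square
                    out.add((d, w))
    return out


print(sorted(ambiguous_representatives("ab♦♦ba♦aaba♦b")))  # [(1, '♦'), (3, 'ba♦'), (5, 'aaba♦')]
```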

Computing all non-equivalent unambiguous p-squares
We start the description of the algorithm by an abstract lemma that lets us efficiently generate all distinct squares induced by a special family of (solid) words.

Computing squares induced by a family of words
For a word $S$, we define its primitive root $U$ as the shortest word such that $U^k = S$ for some integer $k \ge 1$. The Lyndon root $\lambda$ of a word $U$ is the minimal cyclic shift of the shortest string period of $U$. The notion of a Lyndon root was introduced in the context of runs by Crochemore et al. (2014).
Example 4.1 The Lyndon root of U = abaababaababa is aabab. The word U is periodic and its shortest period is 5.
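Example 4.1 can be reproduced with a short sketch. The shortest period is obtained from the Knuth-Morris-Pratt failure function ($|U|$ minus the length of the longest proper border), and the Lyndon root is then the minimum over all rotations of the string period. The rotation scan is quadratic and meant only for small, non-empty inputs.

```python
def shortest_period(u: str) -> int:
    """Shortest period of a non-empty solid word: |u| minus its longest proper border."""
    fail = [0] * len(u)  # fail[i] = longest proper border of u[:i + 1]
    k = 0
    for i in range(1, len(u)):
        while k and u[i] != u[k]:
            k = fail[k - 1]
        if u[i] == u[k]:
            k += 1
        fail[i] = k
    return len(u) - fail[-1]


def lyndon_root(u: str) -> str:
    """Minimal cyclic shift of the shortest string period u[:p] (naive rotation scan)."""
    p = shortest_period(u)
    base = u[:p]
    return min(base[s:] + base[:s] for s in range(p))


u = "abaababaababa"  # the word of Example 4.1
print(shortest_period(u), lyndon_root(u))  # 5 aabab
```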
For a word $W$ and its period $q$, by squares(W, q) we denote the set of square factors of $W$ of length $2q$. We say that squares(W, q) is the set of squares induced by the word $W$ with the period $q$. Each square factor in squares(W, q) can be represented in $O(1)$ space by specifying its occurrence in $W$.
Lemma 4.2 Let $W_1, \dots, W_N$ be words with periods $q_1, \dots, q_N$, respectively, such that:
(1) $n_i \le n$ is the length of $W_i$ and $2q_i \le n_i$;
(2) all the words $W_i$ for which $2q_i = n_i$ (so-called short words) are distinct;
(3) for a given $q_i$, the number of words $W_i$ for which $2q_i < n_i$ (so-called long words) is at most $k$;
(4) $\mathit{first}_i$ is the starting position of the first occurrence of the Lyndon root $\lambda_i$ of $W_i$ in $W_i$, and $\ell_i$ is its length;
(5) any two Lyndon roots $\lambda_i, \lambda_j$ can be compared in $O(k)$ time.

Then we can compute the cardinality of the set $SQ = \bigcup_i \mathit{squares}(W_i, q_i)$ and its representation (as sets of intervals in …
Proof Let us start with the following observation; see also … The above set of integers is denoted by $I_i$. Note that it forms one cyclic subinterval of $[0..\ell_i - 1]$ (composed of up to two standard intervals) and that it can be computed in $O(1)$ time. Each of the elements $a \in I_i$ represents a unique square that is induced by $W_i$ and $q_i$.
We make two transformations of the set of intervals $I_i$ so that, in the end, each square from the set $SQ$ is induced by exactly one word $W_i$ with period $q_i$. If any of the intervals is made empty, this corresponds to removing the word as unnecessary. The first transformation deals with the long words $W_i$; by definition, at most $k$ of them share the same period $q_i$.
First transformation. For every pair $W_i, q_i$ and $W_j, q_j$ of long words such that $i \ne j$ and … we remove $W_j$. If none of the two cases holds and still $I_i \cap I_j \ne \emptyset$, we trim $I_j$ to make it disjoint with $I_i$.
Complexity. All long words can be sorted by their periods in $O(N + n)$ time by bucket sort. There are $n/2$ buckets and each bucket contains at most $k$ words. For each of the $k(k-1)/2$ pairs of long words in a bucket, we check equality of their Lyndon roots, which takes $O(k)$ time per pair and $O(nk^3)$ time overall. The time complexity of trimming of cyclic intervals is dominated by this step.
Second transformation. For every short word $W_i$ with period $q_i$ and long word $W_j$ with period $q_j = q_i$, we check if $\lambda_i = \lambda_j$. If so and $I_i \subseteq I_j$, we remove $W_i$. Note that $I_i$ is a singleton, since a short word induces exactly one square, namely itself.
Complexity. All words can be sorted by their periods in $O(N + n)$ time by bucket sort. For each short word $W_i$, we need to inspect at most $k$ long words and check if their Lyndon roots are equal. This takes $O(k^2)$ time per short word and $O(Nk^2)$ time overall. Checking inclusion of elements in cyclic intervals is dominated by this step.
The two transformations take $O(Nk^2 + nk^3)$ time in total. Afterwards each square is induced by exactly one interval $I_i$ for a word $W_i$ and period $q_i$, so we can list all the distinct squares in $O(|SQ|)$ time.
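The fact behind the intervals $I_i$ can be illustrated on small inputs. The sketch below (hypothetical helper names) computes, for a word $W$ with period $q$, the offsets $(j - \mathit{first}) \bmod \ell$ of all squares of length $2q$, and forms the union over a family by hashing triples $(q, \lambda, \text{offset})$. It swaps the interval trimming of the proof for a hash set, so it does not attain the stated time bound; it only shows that each triple stands for exactly one square. It relies on the shortest period of $W$ dividing $q$, which follows from the Fine-Wilf periodicity lemma because $2q \le |W|$.

```python
def shortest_period(u: str) -> int:
    fail, k = [0] * len(u), 0  # KMP failure function
    for i in range(1, len(u)):
        while k and u[i] != u[k]:
            k = fail[k - 1]
        if u[i] == u[k]:
            k += 1
        fail[i] = k
    return len(u) - fail[-1]


def lyndon_data(w: str):
    """Lyndon root lam of w, its length ell, and the 0-based start of its first occurrence."""
    p = shortest_period(w)
    lam = min(w[s:p] + w[:s] for s in range(p))
    return lam, p, w.find(lam)


def offsets(w: str, q: int) -> set:
    """For each start j of a square of length 2q in w, the shift (j - first) mod ell."""
    lam, ell, first = lyndon_data(w)
    return {(j - first) % ell for j in range(len(w) - 2 * q + 1)}


def induced_squares(family) -> set:
    """Union of squares(W, q) over (W, q) pairs, via hashed (q, Lyndon root, offset) keys."""
    keys = set()
    for w, q in family:
        lam, _, _ = lyndon_data(w)
        keys |= {(q, lam, a) for a in offsets(w, q)}
    # decode each key into the square rot(lam, a)^(2q / |lam|)
    return {(lam[a:] + lam[:a]) * (2 * q // len(lam)) for q, lam, a in keys}


family = [("abaababaababa", 5), ("aababaabab", 5)]  # a long word and a short word
print(len(induced_squares(family)))  # 4
```

The short word aababaabab is already induced by the long word (offset 0), so it adds nothing to the union; this is the situation handled by the second transformation.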
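For a partial word $T$, by ssquares(T) we denote the set of distinct solid factors of $T$ being squares. The following fact was already mentioned in Sect. 1. By substituting all holes in a partial word with distinct symbols $\#_1, \dots, \#_k$, we obtain the following corollary. Since each symbol $\#_j$ occurs only once, no square factor of the resulting word contains any of them, so its distinct square factors are exactly the elements of ssquares(T).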

Corollary 4.5 For a partial word $T$ of length $n$, the set ssquares(T) can be computed in $O(n)$ time.
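A brute-force check of the substitution trick behind Corollary 4.5, assuming holes are written as ♦ and that the input contains no Unicode private-use characters; those serve here as hypothetical stand-ins for the distinct symbols $\#_1, \dots, \#_k$. The cubic-time scan replaces the linear-time square-enumeration algorithm.

```python
HOLE = "♦"


def solid_squares(t: str) -> set:
    """ssquares(T): distinct solid square factors of the partial word t."""
    # Replace every hole by a fresh private-use character; each occurs once,
    # so no square of the resulting word can contain one of them.
    fresh = iter(chr(0xE000 + j) for j in range(t.count(HOLE)))
    s = "".join(next(fresh) if c == HOLE else c for c in t)
    n = len(s)
    found = set()
    for half in range(1, n // 2 + 1):
        for i in range(n - 2 * half + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                found.add(s[i:i + 2 * half])
    return found


print(solid_squares("ab♦♦ba♦aaba♦b"))  # {'aa'}
```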
The algorithm of Crochemore et al. (2014) actually computes the set ssquares(T) together with all the data required by the assumptions of Lemma 4.2. These are the short words in the construction.
In the following section we construct a family F of words (called sealed fragments) that represent the u-squares that contain a hole and compute for them the data required in Lemma 4.2. These are the long words in the construction. Afterwards we list all distinct representatives of u-squares using Lemma 4.2. Then non-equivalent u-squares are extracted from their representatives.

Computing a special family of sealed fragments
If T is a partial word, then U is a sealed fragment of T if U is a factor of T with holes substituted by solid symbols. By unseal(U ) we denote the original factor of the partial word.
A sealed fragment is always solid. Obviously, a sealed fragment can be represented in space proportional to the number of holes that were substituted. For example, if … If $W$ is a (solid) word, then by a d-fragment we mean a concatenation of $d$ factors of $W$. Kociumaka (2016) showed that several types of operations on d-fragments can be performed in $O(d)$ or $O(d^2)$ time after $O(n)$-time preprocessing. We notice here that a sealed fragment of a partial word $T$ with $k$ holes corresponds to a d-fragment with $d = O(k)$ in a word that corresponds to $T$ where ♦ is treated as an alphabet symbol. Thus the following simple fact is a consequence of Observation 18 from Kociumaka (2016), which was stated in terms of d-fragments. …
(d) If $X$ is a non-solid u-square in $T$, then $X$ is a factor of unseal($W_i$) for some $W_i$ with $q_i = \frac{1}{2}|X|$.
The size of an S-family follows from point (c).

Observation 4.8 An S-family contains $O(nk)$ elements and thus, as each sealed fragment takes $O(k)$ space, can be represented in $O(nk^2)$ space.
In the following lemma we provide an algorithm for constructing an S-family. Our approach resembles computing anchored squares in the Main-Lorentz algorithm (Main and Lorentz 1984).

Lemma 4.9 For a partial word $T$ of length $n$ with $k$ holes, an S-family can be computed in $O(nk^2)$ time.
Proof Each non-solid u-square X contains a hole in the first half or in the second half. Below, we construct an S-family for u-squares containing a hole in the second half. A symmetric procedure deals with the u-squares containing a hole in the first half.
For a hole h and integer q, we define the family S(q, h) of u-squares of length 2q, which contain h as the leftmost hole in the second half. For each non-empty set S(q, h), we shall construct a sealed fragment W with period q so that each u-square X ∈ S(q, h) is a factor of unseal(W ).
First, let us seal the text consistently with the representatives of u-squares in $S(q, h)$. A hole at position $i < h$ may only be contained in the first half, while a hole at position $i \ge h$ may only be contained in the second half of such a u-square. Thus, we seal the hole … Any remaining hole is sealed with a unique marker (distinct for every hole). This produces a sealed fragment that covers the whole partial word $T$, which we call the sealed text; see Fig. 3. Let $z$ be the distance between $h$ and the position of the preceding hole ($z = +\infty$ if there is none). We define $W$ as a maximal fragment of the sealed text which contains positions $h - q, \dots, h$, is contained within positions $h - q - \min(q - 1, z - 1), \dots, h + q - 1$, and has period $q$. If $|W| < 2q$, there is no u-square of the desired type and we can discard $W$.
The fragment $W$ is unique and it can be retrieved in $O(k)$ time using Fact 4.6. Indeed, it suffices to compute the longest common prefix $P$ of the suffixes of the sealed text starting at positions $h - q$ and $h$, the …
Fig. 3: A partial word T with 5 holes and the corresponding sealed text, with holes sealed by 5 (solid) symbols implied by the value of q. The rightmost hole is filled by a special unique marker denoted by the dollar sign.
We may need to trim $S$ so that its length exceeds neither $q - 1$ (so that the hole at position $h$ is contained in the right half of the square) nor $z - 1$ (so that $h$ is the leftmost hole in the right half). Similarly, we may need to trim $P$ to the length $q - 1$. Since there are $O(nk)$ pairs $(q, h)$, each handled in $O(k)$ time, the construction takes $O(nk^2)$ time in total.
Let us verify that this construction indeed satisfies the conditions of Definition 4.7. For each hole $h$ and each value of $q$ we construct at most one sealed fragment, so condition (c) is satisfied. Clearly, $W$ has period $q$ and $|W| \ge 2q$, which yields point (a). Moreover, if $X = T[i..j] \in S(q, h)$, then repr(X) is the factor of the sealed text at positions $i, \dots, j$, so (by maximality) repr(X) is contained in $W$, and $X$ is contained in unseal($W$). This gives point (d). Finally, we shall prove that unseal($W$) does not contain two holes at distance $q$ (condition (b)). Suppose that the holes are at positions $i$ and $i + q$. Since $W$ has period $q$, the sealed symbols at these two positions would have to be equal. However, one of the holes is sealed with a unique marker, a contradiction. This completes the proof.
Henceforth we denote by $F$ the S-family constructed in Lemma 4.9. In order to transform it into an instance of Lemma 4.2, we need to compute the Lyndon roots of the sealed fragments $W_i$ (that is, the values $\mathit{first}_i$ and $\ell_i$).

Lyndon roots of sealed fragments
We will show how to compute Lyndon roots λ i of sealed fragments (W i , q i ) ∈ F. Obviously, a Lyndon root of a sealed fragment can be represented in the same space complexity as the sealed fragment itself.
Let us start with the following fact that encapsulates Theorems 20 and 23 from Kociumaka (2016). … Proof To compute the maximal suffix instead of the minimal suffix, we reverse the lexicographic order on the alphabet and append to the d-fragment in question a letter that is greater than all the letters from Σ. … With this missing piece of the puzzle, we are ready to conclude the algorithm for reporting all unambiguous p-square factors of a partial word.

Theorem 4.16 For a partial word $T$ of length $n$ with $k$ holes, all elements of the set usquares(T) can be reported in $O(nk^3)$ time.
Proof We construct a family of sealed fragments that consists of the solid p-squares ssquares(T) and an S-family $F$. By Corollary 4.5 and Lemma 4.9, this family can be constructed in $O(nk^2)$ time. We compute Lyndon roots of all the sealed fragments in $O(nk^3)$ time using Corollary 4.15. For each solid p-square we may compute its Lyndon root in $O(k^2)$ time using Lemma 4.14; we can also use the Lyndon roots as computed in Crochemore et al. (2014).
The constructed family satisfies the assumptions of Lemma 4.2 with $N = O(nk)$. (Actually, if for any sealed fragment $(W_i, q_i)$ of the S-family $F$ we have $|W_i| = 2q_i$, we need to check if it equals any of the solid squares of the same length and, if so, remove it, so that no two short words repeat.) This lemma lets us report all the distinct representatives of u-squares in $O(nk^3 + |SQ|)$ time. The total number of u-squares that will be generated is $O(nk)$ due to Theorem 6.6. This gives the final complexity $O(nk^3 + nk) = O(nk^3)$ of the algorithm.

Combinatorial bounds for ambiguous p-squares
Let T be a partial word of length n with k holes. The upper bound in the case of a-squares is straightforward.

Theorem 5.1 If $T$ is a partial word of length $n$ with $k$ holes, then $|\mathit{asquares}(T)| = O(nk^2)$.
Proof The number of possible lengths of a-squares is at most $k^2$, since there are fewer than $k^2$ possible distances between the $k$ holes. Consequently, as there are at most $n$ factors of each length, the number of p-squares with such lengths is at most $nk^2$.
Let us proceed to the lower bound proof. We say that a set $A$ of positive integers is an $(m, t)$-cover if the following conditions hold: (1) for each $d \ge m$, $A$ contains at most one pair of elements with difference $d$; …
Example 5.2 $\{1, 2, 3, 6, 9, 12\}$ is a $(3, 9)$-cover.
For a set $A \subseteq [1..n]$ we denote by $W_{A,n}$ the partial word of length $n$ over the alphabet Σ such that $W_{A,n}[i] = \diamond \iff i \in A$, and $W_{A,n}[i] = a$ otherwise.
Proof Each even-length factor of $a^{n-2} \cdot W_{A,n} \cdot a^{n-2}$ is a p-square, as this partial word consists only of letters $a$ and holes. Let $Z$ be the set of these factors $X$ which contain two positions $i < j$ containing holes with $j - i \ge m$ and $|X| = 2(j - i)$. As $A$ is an $(m, t)$-cover, $i$ and $j$ are determined uniquely by $d = j - i$. Then all elements of $Z$ are pairwise non-equivalent a-squares. The size of $Z$ is $\Omega(mt)$, which is $\Omega(n \cdot k^2)$.
Theorem 5.4 For every positive integer $n$ and $k \le \sqrt{2n}$, there is a partial word of length $n$ with $k$ holes that contains $\Omega(nk^2)$ non-equivalent a-square factors.
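Proof Due to Lemma 5.3, it is enough to construct a suitable set $A$. By monotonicity, we may assume that $k$ and $n$ are even. We take $A = \{1, \dots, \frac{k}{2}\} \cup \{ j \cdot \frac{k}{2} + \frac{n}{2} : j \in [1..\frac{k}{2}] \}$; its largest element $\frac{k^2}{4} + \frac{n}{2}$ does not exceed $n$ because $k \le \sqrt{2n}$. We claim that $A$ is an $(\frac{n}{2}, t)$-cover for $t = \Omega(k^2)$. Indeed, take any $i, j \in [1..\frac{k}{2}]$. Then $j \cdot \frac{k}{2} + \frac{n}{2} - i \ge \frac{n}{2}$, and all such values are distinct; hence, $t = \frac{k^2}{4}$. The thesis follows from the claim.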
This partial word contains all a-squares with representatives being cyclic shifts of $(\diamond a^{i})$ for $i = 7, \dots, 15$.
[Figure: the partial word $W_m$: aaabaa♦♦aaa]
Combinatorial bounds for unambiguous p-squares
The following theorem shows a lower bound construction. Afterwards we design an upper bound that asymptotically matches this lower bound.
Theorem 6.1 For all positive integers $n$ and $k$ with $k \le \frac{1}{3}n$, there is a partial word of length $n$ with $k$ holes that contains $\Omega(nk)$ non-equivalent u-square factors.
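Proof Let us consider the following partial word over the alphabet $\{a, b\}$: … Then for every $i \in [1..k]$, $W_m$ has $m - k + i$ u-square factors of half length $m - k + i$ containing the letter $b$; see also Fig. 6. Altogether the number of such u-squares is
$$\sum_{i=1}^{k} (m - k + i) = k(m - k) + \frac{k(k+1)}{2} \ge \frac{km}{2} = \Omega(nk),$$
where $n = 3m - 1 = |W_m|$; the inequality holds because $k < m$, which follows from $k \le \frac{1}{3}n$. If $n$ gives a different remainder modulo 3, we can pad $W_m$ with the letter $a$.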
If $X$ is a partial word, then by LONG(X) we denote the set of all p-squares of length at least $\frac{1}{2}|X|$ which occur in $X$ as a prefix. If $A$ is a set of numbers, $|A| \ge 2$, then we denote $\mathit{mingap}(A) = \min\{ |x - y| : x, y \in A,\ x \ne y \}$. If $Z$ is a set of partial words, then mingap(Z) denotes mingap({|S| : S ∈ Z}).
Lemma 6.2 (Three p-Squares Lemma) Let $X$ be a partial word with $k$ holes. Assume that the set LONG(X) contains at least three elements. Then $\delta = \mathit{mingap}(\mathrm{LONG}(X))/2$ is a $12k$-approximate quantum period of the longest p-square in LONG(X).
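LONG(X), mingap and the quantity bounded in Lemma 6.2 can be computed directly for small inputs. The sketch assumes the definitions as read here: LONG(X) consists of the p-square prefixes of length at least $|X|/2$, and $\mathit{Mis}_\delta$ collects the 1-based positions $i > \delta$ with $X_{i-\delta} \not\approx X_i$. Helper names are hypothetical.

```python
HOLE = "♦"


def match(a: str, b: str) -> bool:
    return a == b or HOLE in (a, b)


def is_p_square(x: str) -> bool:
    h = len(x) // 2
    return len(x) % 2 == 0 and all(match(a, b) for a, b in zip(x[:h], x[h:]))


def long_prefix_squares(x: str) -> list:
    """Lengths of the elements of LONG(X): p-square prefixes of length >= |X|/2."""
    return [L for L in range(2, len(x) + 1, 2) if 2 * L >= len(x) and is_p_square(x[:L])]


def mingap(values) -> int:
    """Minimum difference between two distinct values (needs at least two values)."""
    s = sorted(values)
    return min(b - a for a, b in zip(s, s[1:]))


def mis(u: str, delta: int) -> list:
    """1-based positions i with delta < i <= |u| and u_{i-delta} ≉ u_i."""
    return [i for i in range(delta + 1, len(u) + 1) if not match(u[i - 1 - delta], u[i - 1])]


x = "♦♦♦♦♦♦ab"
lengths = long_prefix_squares(x)
delta = mingap(lengths) // 2
print(lengths, delta, mis(x[:max(lengths)], delta))  # [4, 6, 8] 1 [8]
```

Here δ = 1 fails to be a quantum period of the longest p-square only at position 8, well within the 12k allowance of Lemma 6.2.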
Proof Let $B, C \in$ LONG(X) be p-squares such that $|B| - |C| = 2\delta$. Also let $A$ and $D$ be the longest and the shortest element of LONG(X), respectively. Let $|A| = 2a$, $|B| = 2b$, $|C| = 2c$, $|D| = 2d$. We aim to show that $|\mathit{Mis}_\delta(A)| \le 12k$. We consider two cases, depending on whether $B = A$ or $B \ne A$.
Let $m_q = |\{ i \in I_q : i - \delta \in I_q,\ X_{i-\delta} \not\approx X_i \}|$. We show the following inequalities. (I) … Note that $X_i \approx X_{i+c} \approx X_{i+c-b} = X_{i-\delta}$ due to p-squares $C$ and $B$, respectively. As the matching relation is not transitive, $X_i \not\approx X_{i-\delta}$ may hold only if $X_{i+c} = \diamond$. (II) $m_3 \le k$: Assume that $i \in \mathit{Mis}_\delta(A) \cap I_3$. Note that $b < i \le b + c$. Hence, $X_i \approx X_{i-b} \approx X_{i-b+c} = X_{i-\delta}$ due to p-squares $B$ and $C$, respectively. Consequently, … We conclude that $|\mathit{Mis}_\delta(A)| = m_1 + m_2 + m_3 + (m_4 + m_5) \le k + 3k + k + 7k = 12k$.
Recall that a deterministic period of a partial word $X$ is an integer $q$ such that there exists a (solid) word $W$ such that $W \approx X$ and $W$ has a period $q$. In the following lemma we show that if the set LONG(X) is large enough, then the majority of its elements have strong periodic properties.
Lemma 6.3 Let $X$ be a partial word with $k$ holes. Assume that the set LONG(X) contains at least $16k + 3$ elements. Then $\delta = \mathit{mingap}(\mathrm{LONG}(X))/2$ is a deterministic period of all p-squares from LONG(X) excluding possibly the $2k + 1$ longest ones.
Proof Let LONG'(X) be the set LONG(X) without the $2k + 1$ longest elements, $A$ be the longest p-square in LONG(X), and $B$ be the longest p-square in LONG'(X). We start with a proof of a weaker property. In the proof we will use the fact that $|\mathit{Mis}_\delta(A)| \le 12k$ (Lemma 6.2).
p-squares $C$ for which any of the positions $j - d\delta$, $j$ contains a hole. Assume otherwise. Then $B_{j-d\delta} \approx B_j$, which contradicts the definition of $i$.
4. Let us count the p-squares that contain the position $i - d\delta$ in the second half and do not contain the position $i$. Using the same argument as in 2, we see that there are at most $k + 1$ of them.
5. Finally, we will show that there are no p-squares in LONG(X) that do not contain the position $i - d\delta$. If such a p-square existed, then both positions $i - d\delta$ and $i$ would be contained in right halves of all p-squares from LONG(X) \ LONG'(X).
There are 2k + 1 of them, which contradicts point 3.
Each p-square in LONG(X) falls into one of the categories 1-5. We have shown that there can be at most $9k + 3$ p-squares in LONG(X), which contradicts the assumptions of the lemma, as $k > 0$. This completes the proof of the lemma.
By U-Pref(X) we denote the set of unambiguous p-squares in LONG(X) that occur in $X$ only as a prefix.
Proof Assume to the contrary that |U-Pref (X )| ≥ 16k + 3. Let us recall that U-Pref (X ) ⊆ LONG(X ) so the assumptions of Lemma 6.3 are satisfied.
Let us assume that $B = X[1..2a] \in$ U-Pref(X) and let $W^2$ be its (solid) representative. Then $C = X[1 + \delta..2a + \delta]$ is a p-square, as it matches $W^2$ due to the deterministic period $\delta$. If $X[2a + 1..2a + \delta]$ did not contain a hole, then $C$ would be another occurrence of a u-square with representative $W^2$. This would contradict the assumption that $B \in$ U-Pref(X).
We say that a solid square $W^2$ has a solid occurrence in $T$ if $T$ contains a factor equal to $W^2$. By the following fact, there are at most $2n$ non-equivalent p-square factors of $T$ with solid occurrences.
Fact 6.5 (Fraenkel and Simpson 1998; Ilie 2005; Deza et al. 2015) Every position of a (solid) word is the starting position of at most two rightmost occurrences of squares.
In the proof of the upper bound on the number of u-squares we separately count u-squares that have a solid occurrence and those that do not. In the latter case, we use Lemma 6.4, which lets us bound |U-Pref (X )| by 19k in case that k > 0. This concludes the proof.