Validating the Knuth-Morris-Pratt Failure Function, Fast and Online

Let $\pi'_{w}$ denote the failure function of the Knuth-Morris-Pratt algorithm for a word w. In this paper we study the following problem: given an integer array $A'[1 .. n]$, is there a word w over an arbitrary alphabet Σ such that $A'[i]=\pi'_{w}[i]$ for all i? Moreover, what is the minimum cardinality of Σ required? We give an elementary and self-contained $\mathcal{O}(n\log n)$ time algorithm for this problem, thus improving the previously known solution (Duval et al. in Conference in honor of Donald E. Knuth, 2007), which had no polynomial time bound. Using both deeper combinatorial insight into the structure of π′ and advanced algorithmic tools, we further improve the running time to $\mathcal{O}(n)$.


Pattern Recognition and Failure Functions
Pattern matching algorithms have attracted much attention since the dawn of computer science. Of particular interest was the question whether a linear-time algorithm for this problem exists. First results were obtained by Matiyasevich for a fixed pattern in the Turing Machine model [18]. However, the first fully linear-time pattern matching algorithm is the Morris-Pratt algorithm [21], which is designed for the RAM machine model and is well known for its beautiful concept. It simulates the minimal DFA recognizing Σ*p (where p denotes the pattern) by using a failure function π_p, known as the border array. The automaton's transitions are recovered, in amortized constant time, from the values of π_p for all prefixes of the pattern, to which the DFA's states correspond. The values of π_p are precomputed in a similar fashion, also in linear time.
The MP algorithm has many variants. For instance, the Knuth-Morris-Pratt algorithm [17] improves it by using an optimised failure function, namely the strict border array π′ (or strong failure function). This was improved by Simon [23], and further improvements are known [1,13]. We focus on the KMP failure function for two reasons. Unlike later algorithms, it is well known and used in practice. Furthermore, the strong border array itself is of interest as, for instance, it captures all the information about periodicity of the word. Hence it is often used in word combinatorics and numerous text algorithms, see [4,5]. On the other hand, even Simon's algorithm (i.e., the very first improvement) deals with periods of pattern prefixes augmented by a single text symbol rather than pure periods of pattern prefixes.

Strict Border Array Validation
Problem Statement We investigate the following problem: given an integer array A′[1 . . n], is there a word w over an arbitrary alphabet Σ such that A′[i] = π′_w[i] for all i, where π′_w denotes the failure function of the Knuth-Morris-Pratt algorithm for w? If so, what is the minimum cardinality of the alphabet Σ over which such a word exists?
Pursuing these questions is motivated by the fact that in word combinatorics one is often interested only in the values of π′_w rather than in w itself. For instance, the logarithmic upper bound on the delay of KMP follows from properties of the strict border array [17]. Thus it makes sense to ask whether there is a word w admitting π′_w = A′ for a given array A′.
We are interested in an online algorithm, i.e., one that receives the input array values one by one, and is required to output the answer after reading each single value. For the Knuth-Morris-Pratt array validation problem this means that after reading A′[i] the algorithm should answer whether there exists a word w such that A′[1 . . i] = π′_w[1 . . i] and what the minimum size of the alphabet is over which such a word w exists.
Previous Results To the best of our knowledge, this problem was investigated only for a slightly different variant of π′, namely a function g that can be expressed as g[n] = π′[n − 1] + 1, for which an offline validation algorithm due to Duval et al. [8] is known. Validation of border arrays is used by algorithms generating all valid border arrays [9,11,20].
Unfortunately, Duval et al. [8] provided no upper bound on the running time of their algorithm, but they did observe that on certain input arrays it runs in Ω(n²) time.
Algorithm 1 COMPUTE-π(w)
  π[1] ← 0, k ← 0
  for i ← 2 to n do
    while k > 0 and w[k + 1] ≠ w[i] do
      k ← π[k]
    end while
    if w[k + 1] = w[i] then
      k ← k + 1
    end if
    π[i] ← k
  end for

Our Results We give a simple O(n log n) online algorithm VALIDATE-π′ for the strong border array validation, which uses the linear offline bijective transformation between π and π′. VALIDATE-π′ is also applicable to g validation with no changes, thus giving the first provably polynomial algorithm for the problem considered by Duval et al. [8]. Note that the aforementioned bijection between π and π′ cannot be applied directly to g, as it essentially uses the unavailable value π[n] = π′[n], see Sect. 2.
Then we improve VALIDATE-π′ to an optimal linear online algorithm LINEAR-VALIDATE-π′. The improved algorithm relies on both more sophisticated data structures, such as dynamic suffix trees supporting LCA queries, and deeper insight into the combinatorial properties of the π′ function.

Related Results
The study of validating arrays related to string algorithms and word combinatorics was started by Franěk et al. [11], who gave an offline linear algorithm for border array validation. This result was improved over time, in particular a simple linear online algorithm for π validation is known [9].
The border array validation problem was also studied in the more general setting of parametrised border array validation [14,15], where a parametrised border array is a border array for a text in which a permutation of the letters of the alphabet is allowed. A linear time algorithm for a restricted variant of this problem is known [14], as is an O(n^1.5) algorithm for the general case [15].
Recently a linear online algorithm for a closely related prefix array validation was given [2], as well as for cover array validation [6].

Algorithm 2 π′-FROM-π(π)
  π′[0] ← −1
  for i ← 1 to n − 1 do
    if π[

For w ∈ Σ*, we denote its length by |w|. For v, w ∈ Σ*, by vw we denote the concatenation of v and w. We say that u is a prefix of w if there is v ∈ Σ* such that w = uv. Similarly, we call v a suffix of w if there is u ∈ Σ* such that w = uv. A word v that is both a prefix and a suffix of w is called a border of w. By w[i] we denote the i-th letter of w and by w[i . . j] we denote the subword w[i]w[i + 1] . . . w[j] of w. We call a prefix (respectively: suffix, border) v of the word w proper if v ≠ w, i.e., it is shorter than w itself.
For a word w its failure function π_w is defined as follows: π_w[i] is the length of the longest proper border of w[1 . . i] for i = 1, 2, . . . , n, see Table 1. It is known that the π_w table can be computed in linear time, see Algorithm 1.
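As an illustration of the definition above, the computation of the border array can be sketched in Python. This is a minimal sketch, not the paper's code: the function name `border_array` and the convention of returning a list with an unused entry at index 0 (so that indices match the 1-based notation of the text) are assumptions made here.

```python
def border_array(w):
    """Return pi as a list of length len(w) + 1 with pi[0] unused.

    pi[i] is the length of the longest proper border of the prefix
    w[1 . . i] (1-based, as in the text).
    """
    n = len(w)
    pi = [0] * (n + 1)
    k = 0  # length of the border currently being extended
    for i in range(2, n + 1):
        # Python strings are 0-based: letter w[k + 1] of the text is w[k] here,
        # and letter w[i] of the text is w[i - 1].
        while k > 0 and w[k] != w[i - 1]:
            k = pi[k]          # fall back to the next shorter border
        if w[k] == w[i - 1]:
            k += 1             # the border extends by one letter
        pi[i] = k
    return pi
```

The while loop can only undo increments made earlier by `k += 1`, which is the usual argument for the linear running time.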
By π_w^(k) we denote the k-fold composition of π_w with itself, i.e., π_w^(0)[i] := i and π_w^(k+1)[i] := π_w[π_w^(k)[i]]. This convention applies to other functions as well. We omit the subscript w in π_w whenever it is unambiguous. Note that every border of w[1 . . i] has length π^(k)[i] for some k ≥ 0.

The strong failure function π′ is defined as follows: π′_w[n] := π_w[n], and for i < n, π′[i] is the largest k such that w[1 . . k] is a proper border of w[1 . . i] and w[k + 1] ≠ w[i + 1]; if no such k exists, π′[i] = −1.

It is well known that π_w and π′_w can be obtained from one another in linear time, using additional lookups in w to check whether w[i] = w[j] for some i, j. What is perhaps less known is that these lookups are not necessary, i.e., there is a constructive bijection between π_w and π′_w. For completeness, we supply both procedures, see Algorithm 2 and Algorithm 3. By a standard argument it can be shown that they run in linear time. The correctness as well as the procedures themselves are a consequence of the following observation: for i < n,

  π′[i] < π[i]  ⟺  π[i + 1] = π[i] + 1.   (1)

Note that procedure π′-FROM-π explicitly uses the following recursive formula for π′[j] for j < n, whose correctness follows from (1):

  π′[j] = π[j]        if π[j + 1] < π[j] + 1,
  π′[j] = π′[π[j]]    if π[j + 1] = π[j] + 1.   (2)

For two arrays of numbers A and B, we write

Border Array Validation
Our algorithm uses an algorithm validating the input table as the border array. For completeness, we supply the code of one of the simplest such algorithms, VALIDATE-π, see Algorithm 4, due to Duval et al. [9]. This algorithm is online and also calculates the minimal size of the required alphabet.

Overview of the Algorithm
Since there is a bijection between valid border arrays and valid strict border arrays, it is natural to proceed as follows: assume the input forms a valid strict border array, compute the corresponding border array using π-FROM-π′(A′), and validate the result using VALIDATE-π(A). Unfortunately, π-FROM-π′ starts the calculations from the last entry of A′, so it is not suitable for an online algorithm. Note that it is crucial that A is defined also on n + 1.
Our algorithm VALIDATE-π′ (and its improved variant LINEAR-VALIDATE-π′) maintains such a maximal A.
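The offline route just described can be sketched as follows. This is a hedged illustration rather than the paper's Algorithms 2 and 3: it assumes the recurrences π′[i] = π′[π[i]] when π[i + 1] = π[i] + 1 and π′[i] = π[i] otherwise (with π′[0] = −1 and π′[n] = π[n]), and, in the reverse direction, π[i] = max(π′[i], π[i + 1] − 1). The function names are hypothetical, and arrays carry an extra entry at index 0 so that indices match the 1-based text.

```python
def border_array(w):
    """Border array pi (1-based, pi[0] = 0 unused) of the word w."""
    n = len(w)
    pi = [0] * (n + 1)
    k = 0
    for i in range(2, n + 1):
        while k > 0 and w[k] != w[i - 1]:
            k = pi[k]
        if w[k] == w[i - 1]:
            k += 1
        pi[i] = k
    return pi


def strict_from_border(pi):
    """Assumed forward conversion: strict border array from a border array."""
    n = len(pi) - 1
    sp = [-1] * (n + 1)                # sp[0] = -1 plays the role of pi'[0]
    for i in range(1, n):
        if pi[i + 1] == pi[i] + 1:     # longest border extends: look further down
            sp[i] = sp[pi[i]]
        else:                          # longest border is already non-extensible
            sp[i] = pi[i]
    if n >= 1:
        sp[n] = pi[n]                  # the last entry is shared by definition
    return sp


def border_from_strict(sp):
    """Assumed reverse conversion; note that it scans from the LAST entry."""
    n = len(sp) - 1
    pi = [0] * (n + 1)
    if n >= 1:
        pi[n] = sp[n]
    for i in range(n - 1, 0, -1):
        pi[i] = max(sp[i], pi[i + 1] - 1)
    return pi
```

The backward loop in `border_from_strict` makes the obstacle visible: every entry depends on its right neighbour, so appending one more input value may change values computed earlier, which is why this route alone does not give an online algorithm.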
Slopes and Their Properties Imagine the array A′ as the set of points (i, A′[i]) on the plane; we think of A in a similar way. Such a picture helps in understanding the idea behind the algorithm. In this setting we think of A as a collection of maximal slopes: a slope is a set of indices i, i + 1, . . . , i + j such that A[i + k] = A[i] + k for k = 1, . . . , j. From here on whenever we refer to a slope, we implicitly mean a maximal one, i.e., extending as far as possible in both directions. Note that n + 1 is part of the last slope, which may consist only of n + 1. It is even better to imagine a slope as a collection of points (i, A[i]) which together span one interval on the plane, see Fig. 1. By (1), the last index of a (maximal) slope is the unique one on which A and A′ coincide. Let the pin be the first position on the last slope of A (in some extreme cases it might be that n + 1 is the pin). VALIDATE-π′ calculates and stores the pin. It turns out that all functions consistent with A′ differ from A only on the last slope, as shown later in Lemma 1.
When a new input value A [n] is read, the values of A and A on the last slope [i . . n + 1] should satisfy the following conditions: holds. If not, we reject: every  Fig. 3 and propagate the change along the whole slope. If this happens for A[i] = 0, then there is no further candidate value, and A is rejected. The idea is that some adjustment is needed and since pin value check does not return an index, we cannot break the slope into two and so the only possibility is to decrement A on the whole last slope.
Unfortunately, this simple combinatorial idea alone fails to produce a linear-time algorithm. The problem is caused by the second condition: large segments of A′ should be compared in amortised constant time. While LCA queries on suffix trees seem ideal for this task, available solutions are imperfect: the online suffix tree construction algorithms [19,24] are linear only for alphabets of constant size, while the only linear-time algorithm for larger alphabets [10] is inherently offline. To overcome this obstacle we specialise the data structures used, building the suffix tree for a compressed encoding of A′ and multiple suffix trees for short texts over a polylogarithmic alphabet. The details are presented in Sect. 8.

Details and Correctness
In this section we present the technical details of the algorithm and prove its correctness together with the combinatorial properties it relies on. We do not address the running time or the way the data structures are organised. We start by showing that all the consistent tables coincide on indices smaller than the pin.

Lemma 1 Let
which shows the claim of the lemma.
which shows the last claim and thus completes the proof.
Thus it is left to show that CF1-CF3 are preserved by ADJUST-LAST-SLOPE. We show that during the adjusting inside ADJUST-LAST-SLOPE CF1 and CF3 hold. To be more specific, CF1 alone means that A is always a valid border array, while CF3 means that it is greater than any border table consistent with A′ (this is assumed to hold vacuously if no consistent table exists). Finally, we show that CF2 holds when ADJUST-LAST-SLOPE ends adjusting the last slope, i.e., that then A is in fact consistent with A′.
For the completeness of the proof, we also need to show that if at any point A′ was reported to be invalid, it is in fact invalid.

Lemma 3 After each iteration of the loop in line
which shows that CF3 holds for A[1 . . n + 1]. Additionally, the second claim of the Lemma holds vacuously for A [1 . . n + 1], as so far it was not rejected.
Suppose that PIN-VALUE-CHECK returns no index j . Then by the induction assumption CF1 and CF3 hold, which ends the proof in this case.
Suppose that PIN-VALUE-CHECK returns j such that , i.e., A 1 is not a valid π table. So no A 1 is consistent with A , which means that A is invalid, as reported by VALIDATE-π . This ends the proof in this case.
It is left to consider the case in which PIN-VALUE-CHECK returns j such that , which is always a valid π candidate, by Val2. Furthermore, j is an end of slope for A 1 : By CF3, and therefore, by (2), it is an end of a slope for A 1 . As a consequence, by Lemma , and this condition is verified by ADJUST-LAST-SLOPE in line 8. If this equation is not satisfied by some p then clearly . j] this shows that no such A 1 exists and consequently A is invalid. This shows the second subclaim.
Suppose that A was not rejected. It is left to show that CF3 is satisfied when . . , n and thus: and as A 1 was chosen arbitrarily, CF3 holds.

Lemma 4 Suppose that PIN-VALUE-CHECK returns no j and that
Otherwise after adjusting in line 20 of ADJUST-LAST-SLOPE, CF1 and CF3 hold.
Proof Let as in the previous lemma A 1 denote any valid border array consistent with A . Since A satisfies CF3, we know that is itself a valid π value) and for each p > i we have which shows that CF3 is preserved after the adjusting.
We now prove that in fact Suppose for the sake of contradiction that but from the answer of the PIN-VALUE-CHECK we know that this is not the case.
Consider the smallest position, say p, This is a contradiction, as PIN-VALUE-CHECK should have returned this p.
Therefore, when CONSISTENCY-CHECK returns NO  The last lemma shows that when ADJUST-LAST-SLOPE finishes, CF2 is satisfied as well.

Lemma 5 When ADJUST-LAST-SLOPE finishes, CF2 is satisfied.
Proof Recall the recursive formula (2) for π′. Its first case corresponds to j being the last element on the slope and the second to other j's.
If A[j ] is an explicit value and j is not an end of a slope, this formula is verified, is explicit and j is an end of the slope then the formula trivially holds.
If A[j] is an implicit value, i.e., such that j is on the last slope of A, PIN-VALUE-CHECK guarantees that A[j] > A′[j] and so the second case of this formula should hold. This is verified by CONSISTENCY-CHECK. Hence CF2 holds when all adjustments are finished.
The above four lemmata, Lemma 2-Lemma 5, together show the correctness of VALIDATE-π′.

Theorem 1 VALIDATE-π′ verifies whether A′ is a valid strict border array. If so, it supplies the maximal function A consistent with A′.
Proof We proceed by induction on n. If n = 0, then clearly A[1] = 0, CF1-CF3 hold trivially, and A′ is a valid (empty) π′ array. If n > 0 and no adjustments were done, CF1-CF3 hold by Lemma 2. So we consider the case when ADJUST-LAST-SLOPE was invoked.
By Lemma  In the following section we explain how to perform the pin value checks and consistency checks efficiently and bound the whole running time of the algorithm.

Performing Pin Value Checks
Consider the PIN-VALUE-CHECK and two indices j , j such that We call the relation defined in (4) a domination: we say that j dominates j and write it as j ≺ j . We will show that if j j and j is an answer to PIN-VALUE-CHECK, so is j , consult Fig. 4. This observation allows to keep a collection j 1 < j 2 < · · · < j of indices such that to perform the pin value check, it is enough to see whether In particular, the answer can be given in constant time. Updates of this collection are done by removal of j 1 when i becomes j 1 + 1, or by consecutive removals from the end of the list when a new A [n] is read.

Domination Properties
As ≺ is an intersection of two transitive relations (order on indices and order on T , defined as Therefore if j is an answer to pin value check, so is j . As a consequence, we do not need to keep track of j as a potential answer to the PIN-VALUE-CHECK. Data Stored VALIDATE-π stores a list of positions j 1 < j 2 < · · · < j k such that (for the sake of simplicity, let j 0 = i, where i is the current pin): and return the answer. This way the PIN-VALUE-CHECK is answered in constant time. We show that evaluating this expression for other values for some j > j 1 . Since j 1 < j and j does not dominate j 1 : As j 1 and j are on the last slope, Update We demonstrate that all updates of the list j 1 , . . . , j k can be done in O(n) time. When new position n is read, we update the list by successively removing j 's dominated by n from the end of the queue. By routine calculations, if n j , then n j +1 as well: So we simply have to remove some tail from the list of j 's. Suppose that j , . . . , j k were removed. It is left to show that (6a), (6b) are preserved after the removal. Consider first (6a). Take any j ∈ [j −1 . . n − 1]. Then there is some j such that j ∈ [j −1 . . j − 1]. By (6b), j j . Since by assumption n j , by transitivity of , also n j . As for (6b), it holds since j −1 ⊀ n by the construction. There is another possible update: when PIN-VALUE-CHECK return j 1 then i ← j 1 + 1 and so j 1 + 1 becomes the new pin. In such case we remove j 1 from the list.
As each position enters and leaves the list at most once, the time of update is linear.
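The accounting behind this bound can be illustrated with a small Python sketch. It does not reproduce the paper's domination relation (4) or invariants (6a), (6b): as a stand-in assumption, a newer position is taken to dominate an older one whenever its key is at least as large, which is the familiar sliding-window-maximum queue. The class name `DominationList`, the `key` values and the operation counter are all hypothetical.

```python
from collections import deque


class DominationList:
    """Candidate list under a STAND-IN domination rule (not relation (4))."""

    def __init__(self):
        self.items = deque()   # (position, key); keys strictly decrease front to back
        self.ops = 0           # total number of insertions and removals

    def append(self, pos, key):
        # Remove a tail of positions dominated by the new one.
        while self.items and self.items[-1][1] <= key:
            self.items.pop()
            self.ops += 1
        self.items.append((pos, key))
        self.ops += 1

    def advance_pin(self, pin):
        # Drop candidates that now lie before the pin.
        while self.items and self.items[0][0] < pin:
            self.items.popleft()
            self.ops += 1

    def front(self):
        # Only the first candidate has to be inspected by a check.
        return self.items[0] if self.items else None
```

Each position is appended once and removed at most once, either from the tail or from the front, so `ops` never exceeds twice the number of positions read; a query only looks at `front()`.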

Lemma 6
All PIN-VALUE-CHECK calls can be made in amortised constant time.

Performing Consistency Checks: Slow but Easy
In order to perform a consistency check we need to efficiently perform two operations: appending a letter to the current text A′[1 . . n] and checking if two fragments of the prefix read so far are the same. First we show how to implement both of them using randomisation so that the expected running time is O(log n) per consistency check. In the next section we improve the running time to (deterministic) O(1). We use the standard labeling technique [16], assigning unique small names to all fragments of lengths that are powers of two. Then checking if any two fragments of A′ are the same is easy: we only need to cover both of them with fragments of length 2^j, where 2^j is the largest power of two not exceeding their length. Then we check if the corresponding fragments of length 2^j are the same in constant time using the previously assigned names.
Appending To implement the dictionaries M(j ), we use dynamic hashing with a worst-case constant time lookup and amortized expected constant time for updates (see [7] or a simpler variant with the same performance bounds [22]). Then the expected running time of the whole algorithm becomes O(n log n), as there are log n dictionaries, each running in expected linear time (the expectation is taken over the random choices of the algorithm).
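The naming scheme can be sketched in Python as follows. This is an illustration under assumptions, not the paper's implementation: built-in `dict` objects stand in for the dynamic hashing of [7,22], positions are 0-based, and the class and method names (`FragmentNames`, `append`, `equal`) are hypothetical.

```python
class FragmentNames:
    """Online names for all fragments of length 2**j of a growing sequence."""

    def __init__(self):
        self.text = []
        self.dicts = []   # dicts[j]: key of a length-2**j fragment -> small name
        self.names = []   # names[j][i]: name of text[i : i + 2**j]

    def append(self, letter):
        self.text.append(letter)
        n = len(self.text)
        j = 0
        while (1 << j) <= n:
            if j == len(self.dicts):
                self.dicts.append({})
                self.names.append([])
            start = n - (1 << j)   # the length-2**j fragment ending at the new letter
            if j == 0:
                key = letter
            else:
                half = 1 << (j - 1)
                # A fragment is identified by the names of its two halves.
                key = (self.names[j - 1][start], self.names[j - 1][start + half])
            table = self.dicts[j]
            name = table.setdefault(key, len(table))
            self.names[j].append(name)   # lands at index `start`
            j += 1

    def equal(self, i, k, length):
        """Is text[i : i + length] equal to text[k : k + length]?"""
        if length == 0:
            return True
        j = length.bit_length() - 1      # 2**j is the largest power of two <= length
        shift = length - (1 << j)
        row = self.names[j]
        # Two (possibly overlapping) pieces of length 2**j cover each fragment.
        return row[i] == row[k] and row[i + shift] == row[k + shift]
```

Appending touches one dictionary per level, which gives the logarithmic cost per letter, while `equal` performs four lookups. The caller is expected to pass fragments that lie inside the text read so far.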

Size of the Alphabet
VALIDATE-π not only answers whether the input table is a valid border array, but also returns the minimum size of the needed alphabet. We show that this is also true of VALIDATE-π′. Roughly speaking, VALIDATE-π′ runs VALIDATE-π and simply returns its answers. To this end we show that the minimum alphabet size required by the fixed prefix of A matches the minimum alphabet size required by A′. We further note that Lemma 7 implies that the minimum size of the alphabet required for a valid strict border array is at most as large as the one required for a border array. The latter is known to be O(log n) [20,Th. 3.3a]. This observation implies the following.

Improving the Running Time to Linear
This section describes our linear time online algorithm LINEAR-VALIDATE-π′ by specifying the necessary changes to VALIDATE-π′. It suffices to show how to perform consistency checks more efficiently, as all other operations work in amortised constant time. A natural approach is as follows: construct a suffix tree [10,19,24] for the input table A′[1 . . n], together with a data structure for answering LCA queries [3]. The best known algorithm for constructing the suffix tree runs in linear time, regardless of the size of the alphabet [10]. Unfortunately, this algorithm, and all other linear time solutions we are aware of, are inherently offline, and as such unsuitable for our purposes. The online suffix tree constructions of [19,24] have a slightly bigger running time of O(n log |Σ|), where Σ is the alphabet. As A′ is a text over the alphabet {−1, 0, . . . , n − 1}, i.e., of size n + 1, these constructions would only guarantee O(n log n) time.
To get a linear time algorithm we exploit both the structure of the π′ array and the relationship between subsequent consistency checks. In more detail, we first demonstrate how to improve Ukkonen's algorithm [24] so that it runs in time O(n) for alphabets of polylogarithmic size, which may be of independent interest. This alone is still not enough, since A′ is over an alphabet of linear size. To overcome this obstacle we use the combinatorial properties of A′ to compress it. The compressed table uses an alphabet of polylogarithmic size, which makes the improved version of Ukkonen's algorithm applicable. New problems arise, as the compressed table is a little harder to read and further conditions need to be verified to answer the consistency checks.

Suffix Trees for Polylogarithmic Alphabet
In this section we present a construction of an online dictionary with constant time access and insertion, for t = log n elements. When used in Ukkonen's algorithm [24], it guarantees the following construction of suffix trees. The only reason Ukkonen's algorithm [24] does not work in linear time is that, given a vertex, it needs to efficiently retrieve its child labeled with a specified letter. If we are able to perform such a retrieval in constant time, Ukkonen's algorithm runs in linear time.
For that we can use the atomic heaps of Fredman and Willard [12], which allow constant time search and insert operations on a collection of sets of O(√log n) elements. This results in a fairly complicated structure, which can be greatly simplified since in our case not only are the sets small, but the size of the universe is bounded as well. Simplifying Assumptions We assume that the value of log n is known. Since n is not known in advance, when we read elements of A′ one by one, as soon as the value of n doubles, we repeat the whole computation with a new value of log n. This changes the running time only by a constant factor.
It is enough to give the construction for an alphabet of size log n, as for alphabets of size log^c n we can encode each letter in c characters chosen from an alphabet of logarithmic size.
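The encoding step can be made concrete with a short Python sketch. It assumes a fixed-width, most-significant-digit-first representation, so that letter boundaries fall exactly at multiples of c; the function names and the parameter `base` (standing for the logarithmic alphabet size) are hypothetical.

```python
def encode_letter(x, base, c):
    """Write x (0 <= x < base**c) as exactly c digits in the given base."""
    digits = []
    for _ in range(c):
        digits.append(x % base)   # least significant digit first
        x //= base
    return digits[::-1]           # most significant digit first


def encode_text(text, base, c):
    """Concatenate the fixed-width encodings of all letters of the text."""
    out = []
    for x in text:
        out.extend(encode_letter(x, base, c))
    return out
```

Because every letter occupies exactly c characters, the encoded text is c times longer, and two letters are equal exactly when their encodings are equal.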

First Step: Dictionary for Small Number of Elements
We implement an online dictionary for a universe of size log n. Both access and insert time are constant and the memory usage is at most linear in the number of elements stored. The first step of the construction is a simpler case of t keys, for t ≤ √log n. Then this construction is folded twice to obtain the general case of t = Θ(log n). One step of such a construction is depicted in Fig. 5.
The indices of items currently present in the dictionary are encoded in one machine word, called the characteristic vector V, in which the bit V[i] = 1 if and only if the dictionary contains key i.
We store pointers to the keys in the dictionary in a dynamically resized pointer table, in order of their arrival times: whenever we insert a new item, its pointer is put right after the previously added one. Additionally, we keep a permutation table P that encodes the order in which currently stored elements have been inserted. In other words, P[i] stores the position in the pointer table of the pointer to i. Since t ≤ √log n, all successive values of such a permutation can be stored in one machine word.

Updating the Information for Small Number of Elements When a new key k arrives, it is stored in the memory at the next available position and a pointer to it is put in the dictionary: first we set V[k] = 1 and insert the pointer at the last position of the pointer table. We also need to update the permutation table. To do this, we calculate j = #{k′ < k : V[k′] = 1} and m = #{k′ : V[k′] = 1}; this is done in the same way as when accessing the stored pointer. Then we change the permutation table: we move all the numbers on positions greater than j one position higher and write m + 1 on position j. Since the whole permutation table fits in one machine word, this can be done in constant time: let P′ be the table P with all positions larger than j − 1 masked out and P″ the table P with all positions smaller than j masked out. Then we shift P″ one position higher and set P ← P′ | P″. Finally we set P[j] = m + 1.
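The interplay of the characteristic vector, the packed permutation table and the pointer table can be sketched in Python. This is a sketch under assumptions: Python integers stand in for machine words, `bin(...).count("1")` stands in for a constant-time population count, ranks and positions are 0-based (so m is written where the text writes m + 1), and the names `SmallDict`, `field_bits`, `insert` and `get` are hypothetical.

```python
class SmallDict:
    """Dictionary over a small universe, kept in two 'machine words'."""

    def __init__(self, universe, field_bits):
        self.universe = universe
        self.bits = field_bits   # width of one entry of the packed table P
        self.V = 0               # characteristic vector: bit k set iff key k present
        self.P = 0               # field r = pointer-table position of the key of rank r
        self.table = []          # stored values, in order of arrival

    def _rank(self, k):
        # Number of stored keys smaller than k.
        return bin(self.V & ((1 << k) - 1)).count("1")

    def insert(self, k, value):
        assert 0 <= k < self.universe and not (self.V >> k) & 1
        j = self._rank(k)
        m = len(self.table)
        assert m < (1 << self.bits)
        low_mask = (1 << (j * self.bits)) - 1
        low = self.P & low_mask                    # fields 0 .. j-1 stay in place
        high = (self.P & ~low_mask) << self.bits   # fields j .. move one position up
        self.P = low | high | (m << (j * self.bits))
        self.V |= 1 << k
        self.table.append(value)

    def get(self, k):
        if not (self.V >> k) & 1:
            return None
        j = self._rank(k)
        pos = (self.P >> (j * self.bits)) & ((1 << self.bits) - 1)
        return self.table[pos]
```

For example, after inserting keys 5, 2, 7, 0 in this order, key 2 has rank 1 and its field in `P` holds 1, the position at which its value was appended.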

Larger Number of Elements
When the number of items becomes bigger, we fold the above construction twice (somewhat resembling a B-tree of order t = √log n): choose a subset of keys k_1 < k_2 < · · · < k_ℓ such that between k_j and k_{j+1} there are at least t and at most 2t other keys. Observe that k_1 < k_2 < · · · < k_ℓ can be kept in the above structure, with constant update and access time; we refer to it as the top structure. Moreover, for each i the keys between k_i and k_{i+1} can also be kept in such a structure. We refer to those structures as the bottom structures.

Access for Large Number of Elements
To access information associated with a given key k, we first look up the largest chosen key not exceeding k in the top structure and then look up k in the corresponding bottom structure. The second operation is already known to take constant time. The first operation can be done in O(1) time by first masking out the bits on positions larger than k in the top characteristic vector and then extracting the position of the largest set bit. Again this can be done using standard techniques.
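The mask-and-highest-bit step can be written down directly. In this sketch a Python integer again stands in for the machine word, `int.bit_length` stands in for a constant-time most-significant-bit instruction, and the function name `predecessor` is hypothetical; following the condition k_{i−1} ≤ k < k_i used for insertions, a chosen key equal to k counts as its own predecessor.

```python
def predecessor(V, k):
    """Largest key k' <= k whose bit is set in V, or None if there is none."""
    masked = V & ((1 << (k + 1)) - 1)   # clear all bits on positions larger than k
    if masked == 0:
        return None
    return masked.bit_length() - 1      # position of the highest remaining bit
```

With chosen keys 1, 4 and 9, for instance, a query for 6 is directed to the bottom structure that starts at key 4.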

Update for Large Number of Elements
When we insert a new item k, we first find i such that k_{i−1} ≤ k < k_i, where k_{i−1} and k_i are elements of the top structure. This is done in the same way as when information on k is accessed. Then k is inserted into the proper bottom structure.
If after an insertion the bottom structure has 2t + 1 elements, we choose its middle element, insert it into the top structure, and split the keys into two parts consisting of t elements, creating two new bottom structures out of them. This requires O(t) time but the amortised insertion time is only O(1): the size of the bottom structure is t after the split and 2t before the next split, so we can charge the cost to the t new keys inserted into the structure between the splits.

If k < 2 then the claim is trivial. So let k′ = k − 2 ≥ 0 and assume that there are more than 48 different values from [2^k, 2^(k+1)) = [4 · 2^k′, 8 · 2^k′) occurring in some segment of length 2^k. Then more than 12 different values from [4 · 2^k′, 8 · 2^k′) occur in a segment of length 2^k′. Split the range [4 · 2^k′, 8 · 2^k′) into three subranges [4 · 2^k′, 5 · 2^k′), [5 · 2^k′, 6 · 2^k′) and [6 · 2^k′, 8 · 2^k′). Then at least 5 different values from one of these subranges occur in the segment; let [ℓ, r) be that subrange. Note that (no matter which one it is),

Compressing
Let these 5 different values occur at positions p 1 < · · · < p 5 . Consider the sequence p i − π [p i ] + 1 for i = 1, . . . , 5: these are the beginnings of the corresponding nonextensible borders. In particular p i 's are pairwise different (since they are ends of non-extendable borders). Each sequence of length 5 contains a monotone subsequence of length 3. We consider the cases of decreasing and increasing sequence separately: 1. There exist p i 1 < p i 2 < p i 3 in this segment such that Fig. 6. Then by the definition of π [p i 1 ], x = y. We derive a contradiction by showing that x = y. To this end we use the periodicity of the word w. Define see Fig. 6. Define s = π [p i 1 ]+b, see Fig. 6; then both a, b are periods of w [1 . . s], see Fig. 6. We show that a, b ≤ s 2 and so periodicity lemma can be applied to them and word w[1 . . s].
By periodicity lemma b − a is also a period of w[1 . . s]. As position p i 1 + 1 is covered by the non-extensible border ending at p i 2 (note that b < 2 and π [p i 1 ] ≥ ): see Fig. 7. Note that ] is a letter from word w [1 . . s], which has a period b − a. Hence contradiction. 2. There exist p i 1 < p i 2 < p i 3 in this segment such that as depicted on Fig. 8. We estimate their sum: There are two subcases, depending on whether π Fig. 9. Then by definition of π [p i 1 ], x = y. We obtain a contradiction by showing that x = y.
Since the non-extensible border ending at p i 3 spans over position p i 1 + 1 and a + b < π [p i 1 ] (see (7)) it holds that Comparing the non-extensible borders ending at p i 2 and p i 3 we deduce that b is a period of w [1 . . π [p i 2 ]] and as π [ Similarly by comparing the non-extensible prefixes ending at p i 1 and p i 2 we deduce that a is a period of w [1 . . π [p i 1 ]]. Thus and therefore by (8) and (9) Fig. 10. We show that x = y and hence obtain a contradiction. Since non-extensible border ending at p i 3 spans over position p i 2 + 1, we obtain that see Fig. 10. By comparing non-extensible prefixes ending at p i 1 and p i 2 we deduce that a is a period of w [1 . . π By comparing the non-extensible prefixes ending at p i 2 and p i 3 we deduce (7), it holds that As a is a period of w [1 . . π So by (10) and (11) x = y , contradiction.
Lemma 9 can be used to bound the size of the compressed representation Compress(A′) of A′. As the alphabet of Compress(A′) is of polylogarithmic size, the suffix tree for Compress(A′) can be constructed in linear time by Lemma 8.

Subchecks Consider consistency check: is
We first establish equivalence of this equality with equality of proper fragments of Compress(A ). Note, that A [ ] = A [ ] does not imply the equality of two corresponding fragments of Compress(A ), as they may refer to previous values of A . Still, such references can be only log 2 n elements backwards. This observation is formalised as follows: if and only if and A j . . j + min k, log 2 n − 1 = A i . . i + min k, log 2 n − 1 .
Proof If k ≤ log 2 n, the claim holds trivially, as (12) and (14) are exactly the same and (13) holds vacuously. So suppose that k > log 2 n.
, both fragments of Compress(A ) are created using the same input, and so they are equal. Thus (13) holds, which ends the proof in this direction.
⇐ Assume that (13) and (14)  Similarly as in the Sect. 8.1, we assume that log n is known. In the same way we repeat the whole computation from the scratch as soon as it value changes. This increases the running time by a constant factor.
We call the checks of the form (13) the compressed consistency checks and the checks of the form (14) the short consistency checks; a short consistency check is a near short consistency check when moreover $|i - j| < \log^2 n$. The compressed consistency checks can be answered in amortised constant time using an LCA query [3] on the suffix tree built for Compress($A'$). It remains to show how to perform short consistency checks in amortised constant time.

Performing Short Consistency Checks
Performing Near Short Consistency Checks
To answer near short consistency checks efficiently, we split $A'$ into blocks of $\log^2 n$ consecutive letters: $A' = B_1 B_2 \ldots B_\ell$, see Fig. 11. Then we build suffix trees for each pair of consecutive blocks, i.e., $B_1B_2, B_2B_3, \ldots, B_{\ell-1}B_\ell$. Each block contains at most $\log^2 n$ values smaller than $\log^2 n$, and at most $48\log n$ larger values by Lemma 9, so all suffix trees can be built in linear time by Lemma 8. For each tree we also build a data structure supporting constant-time LCA queries [3]. Then any near short consistency check reduces to an LCA query in one of these suffix trees. Such a query also gives the actual length of the longest common prefix of the two compared strings; this is used in performing short consistency checks.
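The block decomposition can be illustrated with a minimal Python sketch. It is not the data structure of the paper: the suffix trees and LCA queries are replaced by a naive scan, indices are 0-based, and the names `consecutive_block_pairs`, `naive_lcp` and `block_len` (standing in for $\log^2 n$) are assumptions made for illustration. The sketch only shows that the pairs of consecutive blocks have total length at most twice the length of the array, so building one structure per pair stays linear overall.

```python
def consecutive_block_pairs(a, block_len):
    """Split `a` into blocks of `block_len` letters and return the
    concatenations of consecutive blocks: B1B2, B2B3, ..."""
    blocks = [a[s:s + block_len] for s in range(0, len(a), block_len)]
    return [blocks[t] + blocks[t + 1] for t in range(len(blocks) - 1)]


def naive_lcp(a, x, y, k):
    """Length of the common prefix of a[x:x+k] and a[y:y+k] (0-based).

    Stand-in for the suffix-tree/LCA query; this scan takes O(k) time,
    not the constant time of the real structure.
    """
    length = 0
    while (length < k and max(x, y) + length < len(a)
           and a[x + length] == a[y + length]):
        length += 1
    return length


a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
pairs = consecutive_block_pairs(a, block_len=4)
total = sum(len(p) for p in pairs)  # every letter occurs in at most two pairs
```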

Performing Short Consistency Checks
Consider again a short consistency check, which is of the form 'does $A'[j..j+k-1] = A'[i..i+k-1]$ hold?', where $k \le \log^2 n$. To improve the running time, the results of previous short consistency checks are reused: we store $j_{best}$ (which is one of the indices for which we previously ran a short consistency check) such that $j \le j_{best} \le j + \log^2 n$ and the length (say $L$) of the common prefix of $A'[i..i+k-1]$ and $A'[j_{best}..j_{best}+k-1]$ is known.
To answer a short consistency check, we first compute the common prefix of $A'[j..j+k-1]$ and $A'[j_{best}..j_{best}+k-1]$ (which can be done using a near short consistency check) and compare its length with $L$. If it is smaller than $\min(L,k)$, then clearly the common prefix of $A'[j..j+k-1]$ and $A'[i..i+k-1]$ is shorter than $k$; if it equals $L$, then we naively compute the common prefix of $A'[j+L..j+k-1]$ and $A'[i+L..i+k-1]$ by letter-to-letter comparisons. Also, in such a case we switch $j_{best}$ to $j$, as it has a longer common prefix with $A'[i..i+k-1]$.
Simplifying Assumption
To simplify the presentation and analysis, we assume that the adjusting of the last slope is done in a slightly different way than written in the code of ADJUST-LAST-SLOPE (see […]).
Invariants
During a short consistency check we make sure that the following invariants for $j_{best}$ and $L$ are preserved:
(15a) $0 \le L \le k$,
(15b) $A'[j_{best}..j_{best}+L-1] = A'[i..i+L-1]$,
(15c) $j \le j_{best} \le j + \log^2 n$,
(15d) if $j \ne j_{best}$, then the common prefix of $A'[j..j+k-1]$ and $A'[j_{best}..j_{best}+k-1]$ is shorter than $L$,
(15e) $L = k$ or $A'[j_{best}+L] \ne A'[i+L]$.
We refer to them as (15a)-(15e). The intuition behind the invariants is as follows: (15a) simply states that we are interested in a common prefix of length at most $k$. Invariant (15b) justifies the choice of $j_{best}$, i.e. we know the common prefix of $A'$ starting at $j_{best}$ and at $i$. Invariant (15c) ensures that comparing $A'$ starting at $j$ and at $j_{best}$ can be done using a near short consistency check. Invariant (15d) says that if $j \ne j_{best}$ then there is a reason for that. Finally, (15e) shows the maximality of $L$: either it is $k$ (so it cannot be larger) or there is a mismatch at the 'next position'.
Potential
The analysis of the running time is amortised. We define the potential of a configuration of LINEAR-VALIDATE-π′ as
$$p = (k - L) + (j_{best} - j).$$
Let $\Delta x$ denote the change of the value of $x$ in some fragment of the algorithm (which will always be clear from the context), and let $s$ be the cost of letter-to-letter comparisons and near short consistency checks (i.e. their number). Then the amortised cost is $\Delta p + s$. There are some additional costs, like comparing indices, checking conditions etc. All such costs are assigned either to letter-to-letter comparisons or to near short consistency checks. Note that when the change of the potential is negative, it actually helps in paying for near short consistency checks and letter-to-letter comparisons. Since $0 \le L \le k$ and $j \le j_{best} \le j + \log^2 n$, at any point the potential is non-negative and $O(\log^2 n)$, so the total cost at any point differs from the sum of the amortised costs of the steps by at most the potential, which is sublinear.
We pay for the amortised cost using credit that we get for the changes of $n$ and $j$: for every increase of $n$ by $\Delta n$ we get $8\Delta n$ units of credit; for every change of $j$ by $\Delta j$ we get $8|\Delta j|$ units of credit. Clearly the sum of all $\Delta n$ is $n$, so in this way we are issued at most $8n$ units of credit. We show that the sum of all $|\Delta j|$ is also $O(n)$.

Lemma 11
The sum of all $|\Delta j|$ over the whole run of LINEAR-VALIDATE-π′ is at most $2n$.
Proof For the purpose of the proof, whenever we change the value of $i$ or $j$, let $i'$, $j'$ refer to the new values and $i$, $j$ to the old ones.
It is enough to show that the sum of all increments of $j$ is at most $n$; then clearly the sum of all decrements of $j$ is at most $n$ as well. The value of $j$ increases only when the pin $i$ is updated in line 13 of ADJUST-LAST-SLOPE; otherwise it can only decrease. Moreover, when $j$ is incremented, it increases by at most $\Delta i$: […] Note that in the third equality we essentially used the simplifying assumption: as $i$ and $i'$ are on the same (last) slope, we have […]. Since $i \le n$ and $i$ only increases, the sum of its increments is at most $n$. So the total sum of increments of $j$ is at most $n$, as claimed.

Algorithm 7 LETTER-BY-LETTER
1: while $L < k$ and $A'[j_{best}+L] = A'[i+L]$ do
2: $L \leftarrow L + 1$
3: end while ▷ $L$ is the length of the common prefix

Letter-by-Letter Comparisons
The letter-by-letter comparisons, see Algorithm 7, are used to ensure that (15e) holds: when we already know that $L$ letters starting at $A'[i]$ and $A'[j_{best}]$ are the same but we are not sure whether this is the maximal possible value of $L$, we verify this naively. The amortised cost is only 1, as each successful comparison decreases the potential by 1.

Lemma 12
Suppose that (15a)-(15c) hold. Then after LETTER-BY-LETTER the invariants (15a)-(15c) and (15e) hold, and the amortised cost is at most 1.
Proof For the purpose of the proof, let $L_0$ be the initial value of $L$ and $L_1$ the final value of $L$; by '$L$' we denote the value inside LETTER-BY-LETTER.
Note that $i$, $j$ and $k$ are not altered. For (15a), by assumption $L_0 \le k$ before COMMON-SHORT-CONSISTENCY-CHECK; we increment $L$ by 1 and stop as soon as it reaches $k$, so $L_1 \le k$. For (15b), note that $A'[j_{best}..j_{best}+L_0-1] = A'[i..i+L_0-1]$ holds by the assumption and we verified $A'[j_{best}+L_0..j_{best}+L_1-1] = A'[i+L_0..i+L_1-1]$ letter by letter. Invariant (15c) holds as neither $j$ nor $j_{best}$ was changed. As for (15e), it is the termination condition of the while loop, so it holds upon its termination.
Concerning the amortised cost: $i$, $j$, $k$ and $j_{best}$ do not change, so $\Delta p = -\Delta L$, i.e. it is non-positive. On the other hand, we make $\Delta L$ successful letter-to-letter comparisons and perhaps one unsuccessful one (we ignore the cost of checking whether $L = k$, as it is at most as high as the cost of the letter-to-letter comparisons). So the cost of comparisons is at most $\Delta L + 1$. Hence the amortised cost is at most $-\Delta L + \Delta L + 1 = 1$, as claimed.
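The accounting in this proof can be replayed on a small example. The following Python sketch is an illustration only: it uses 0-based indices, the function name `letter_by_letter` and its return convention are assumptions, and the caller is assumed to guarantee that both fragments of length `k` lie inside the array. It counts the comparisons and adds the change of potential $-\Delta L$ used in the proof.

```python
def letter_by_letter(a, i, j_best, L, k):
    """Extend L while a[j_best + L] == a[i + L] and L < k (0-based indices).

    Returns the new L and the number of letter-to-letter comparisons made.
    Assumes i + k <= len(a) and j_best + k <= len(a).
    """
    comparisons = 0
    while L < k:
        comparisons += 1
        if a[j_best + L] != a[i + L]:
            break           # one unsuccessful comparison ends the loop
        L += 1              # a successful comparison increases L by one
    return L, comparisons


a = [1, 2, 3, 9, 1, 2, 3, 4]
L0 = 1
L1, cost = letter_by_letter(a, i=4, j_best=0, L=L0, k=4)
# Change of potential (-(L1 - L0)) plus the actual cost of comparisons.
amortised = -(L1 - L0) + cost
```

In this run two comparisons succeed and one fails, so the actual cost is three while the amortised cost is one.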
Answering Short Consistency Checks Using $j_{best}$
When we get new values of $i$, $j$ and $k$ we need to update $j_{best}$ and $L$. It turns out that as soon as we update $j_{best}$ and $L$ so that they satisfy (15a)-(15c), answering the short consistency check is easy: we first make letter-by-letter comparisons using LETTER-BY-LETTER to ensure that also (15e) holds, i.e. that $L$ is maximal. Then we compute the length $\ell$ of the common prefix of $A'[j..j+k-1]$ and $A'[j_{best}..j_{best}+k-1]$ using a near short consistency check. If $\ell < L$, then the answer to the short consistency check is no. If $\ell \ge L$, then we set $j_{best}$ to $j$ (as it is as good as $j_{best}$), run the letter-by-letter comparison again to check whether $L$ is $k$, and answer accordingly. It is easy to verify that the amortised cost of this procedure is constant and that all of (15a)-(15e) hold afterwards. Details are given in Algorithm 8 and the lemmata below.

Lemma 13
Suppose that (15a)-(15c) hold. Then after COMMON-SHORT-CONSISTENCY-CHECK all of (15a)-(15e) hold, the answer to the short consistency check is correct, and the amortised cost is at most 6.
Proof Regarding the cost, the amortised cost of LETTER-BY-LETTER is 1 by Lemma 12, setting $j_{best}$ to $j$ can only lower the potential, and each NEAR-SHORT-CONSISTENCY-CHECK is answered in constant time using suffix trees.
We now show that after COMMON-SHORT-CONSISTENCY-CHECK all of (15a)-(15e) hold. By assumption, initially (15a)-(15c) hold. By Lemma 12, after the first LETTER-BY-LETTER they still hold and additionally (15e) holds. Suppose that $\ell < L$; in particular $j \ne j_{best}$. Then (15d) simply states that $\ell < L$, which is the case. So suppose that $\ell \ge L$. Resetting $j_{best}$ to $j$ may make (15e) invalid, but (15a)-(15c) are preserved: (15a) holds as we do not change $L$; (15b) holds as we know that $A'[j_{best}..j_{best}+k-1]$ has a common prefix of length $L$ with both $A'[j..j+k-1]$ and $A'[i..i+k-1]$, and so also $A'[j..j+k-1]$ and $A'[i..i+k-1]$ have a common prefix of length $L$; (15c) holds trivially. By Lemma 12, (15a)-(15c) and (15e) hold after LETTER-BY-LETTER. Note that LETTER-BY-LETTER does not modify $j$ and so (15d) trivially holds, as $j = j_{best}$.
Concerning the correctness: if $\ell < L$ then $j \ne j_{best}$ and from (15b) […]
It remains to show how to update $j_{best}$ and $L$.
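The procedure described in prose above can be sketched in Python as follows. This is a minimal sketch written from that description, not a transcription of Algorithm 8: indices are 0-based, the near short consistency check is replaced by a naive scan (`naive_lcp`), and the names `extend`, `naive_lcp` and `common_short_check` are assumptions made for illustration.

```python
def extend(a, i, j_best, L, k):
    """Letter-by-letter step: grow L while a[j_best+L] == a[i+L] and L < k."""
    while L < k and a[j_best + L] == a[i + L]:
        L += 1
    return L


def naive_lcp(a, x, y, k):
    """Stand-in for a near short consistency check (suffix tree + LCA)."""
    length = 0
    while length < k and a[x + length] == a[y + length]:
        length += 1
    return length


def common_short_check(a, i, j, k, j_best, L):
    """Decide whether a[j:j+k] == a[i:i+k]; return (answer, j_best, L).

    Assumes 0-based indices, all fragments inside `a`, and that
    a[j_best:j_best+L] == a[i:i+L] with 0 <= L <= k on entry.
    """
    L = extend(a, i, j_best, L, k)        # make L maximal
    ell = naive_lcp(a, j, j_best, k)
    if ell < L:
        # a[j:] departs from a[j_best:] at offset ell < L, where a[i:]
        # still agrees with a[j_best:], so a[j:j+k] != a[i:i+k].
        return False, j_best, L
    j_best = j                            # j is at least as good as j_best
    L = extend(a, i, j_best, L, k)
    return L == k, j_best, L
```

For instance, on `[5, 6, 7, 5, 6, 7, 5, 6, 8]` with `i = 6`, `j = 0`, `k = 3`, `j_best = 3` and `L = 0`, the first extension finds two matching letters, the stand-in query reports a common prefix of length three, and the final extension fails at the third letter, so the answer is negative.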
Types of Short Consistency Checks
The way we update $j_{best}$ and $L$ depends on why the short consistency check is made; we distinguish three situations in which ADJUST-LAST-SLOPE invokes a short consistency check:
(Type 1) This is the first iteration of ADJUST-LAST-SLOPE and PIN-VALUE-CHECK did not return any index in this iteration.
(Type 2) This is not the first iteration of ADJUST-LAST-SLOPE and PIN-VALUE-CHECK did not return any index in this iteration.
(Type 3) PIN-VALUE-CHECK did return an index in this iteration.
We begin by showing how $i$, $j$, $k$ and $n$ change in each of those types of short consistency check.

Lemma 14
In a Type 1 short consistency check it holds that $\Delta i = \Delta j = 0$, $\Delta k \ge 0$ and $\Delta n \ge \max(1, \Delta k)$; exactly $8\Delta n$ units of credit are issued.
In a Type 2 short consistency check it holds that $\Delta i = 0$, $\Delta j < 0$ and $\Delta k = \Delta n = 0$; exactly $8|\Delta j|$ units of credit are issued.
In a Type 3 short consistency check it holds that $\Delta i > 0$, $\Delta j = \Delta i$, $-\Delta i \le \Delta k \le 0$ and $\Delta n \ge 0$; exactly $8\Delta j + 8\Delta n$ units of credit are issued.
Note in particular that when $\Delta i$ and $\Delta j$ are known, we can figure out which type of query this is: a Type 3 short consistency check is the unique one with $\Delta i > 0$, Type 2 the one with $\Delta j < 0$, and Type 1 the one with $\Delta i = \Delta j = 0$.
Proof Recall that we issue $8\Delta n + 8|\Delta j|$ units of credit, which yields the claim on the amount of credit issued in each of the cases.
Type 1 short consistency check: Since this is the first iteration of ADJUST-LAST-SLOPE, it means that we read $A'[n]$ and it is not equal to $A'[A[n]]$. In particular, since the last invocation of ADJUST-LAST-SLOPE we have read at least one additional value of $A'$. Hence $\Delta n \ge 1$. As PIN-VALUE-CHECK did not return any index, we have not modified $i$ and $j$ since the last invocation of the short consistency check, so $\Delta i = \Delta j = 0$. Concerning $\Delta k$, recall that the short consistency check is asked only on $A'[i..\min(n, i+\log^2 n-1)]$, i.e. $k = \min(n-i+1, \log^2 n)$. Hence, when $k_0$ and $n_0$ are the values of $k$ and $n$ when the previous short consistency check was asked, we have $k_0 = \min(n_0-i+1, \log^2 n)$ (note that we can assume that $\log n$ and $\log n_0$ are the same, as we repeat the calculation as soon as $\log n$ increases). Then $k \ge k_0$ and $\Delta k \le \Delta n$, but there is no guarantee that $\Delta k > 0$, i.e., $k_0 = k$ can happen when $n_0-i+1 > \log^2 n$.
Type 2 short consistency check: In this case the short consistency check is asked in an iteration of ADJUST-LAST-SLOPE that is not the first one, and PIN-VALUE-CHECK did not return any index in this iteration. This means that $A[i]$ is assigned the next candidate in line 14. Thus $i$ and $k$ are unchanged as compared to the previous short consistency check, while $j$ is decreased, hence $\Delta i = 0$, $\Delta j < 0$ and $\Delta k = 0$. Furthermore, we do not read any new value of $A'$, so $\Delta n = 0$.
Type 3 short consistency check: In this case the short consistency check is run for the same slope, but the pin is moved, thus the new value $i'$ is larger than the old $i$. By our simplifying assumption we do not decrease the last slope, just place the new $i$ on it, i.e. we set […], i.e., we take the new $j$ such that $\Delta j = \Delta i$. As $n$ only increases, $\Delta n \ge 0$. Concerning $\Delta k$, recall again that $k = \min(n-i+1, \log^2 n)$, hence $0 \ge \Delta k \ge -\Delta i$.

Algorithm 9 TYPE-1-UPDATE-$j_{best}$
1: continue
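The remark that the type of a check can be read off from $\Delta i$ and $\Delta j$, together with the credit rule $8\Delta n + 8|\Delta j|$, can be written down directly. The Python sketch below is illustrative only; the function names `short_check_type` and `issued_credit` are assumptions, and the error branches merely reject combinations that Lemma 14 rules out.

```python
def short_check_type(delta_i, delta_j):
    """Classify a short consistency check by the changes of i and j."""
    if delta_i > 0:
        if delta_j != delta_i:
            raise ValueError("Type 3 requires delta_j == delta_i")
        return 3
    if delta_i == 0 and delta_j < 0:
        return 2
    if delta_i == 0 and delta_j == 0:
        return 1
    raise ValueError("combination not covered by Lemma 14")


def issued_credit(delta_n, delta_j):
    """Credit issued for one step: 8 * delta_n + 8 * |delta_j|."""
    return 8 * delta_n + 8 * abs(delta_j)
```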
In the following, we describe how to update $j_{best}$ and $L$ in those three different cases so that (15a)-(15c) are preserved.

Type 1 Updates
In this case we do not need any update, as described in Algorithm 9.

Lemma 15
Suppose that we are to make a Type 1 short consistency check and all of (15a)-(15e) hold. Then (15a)-(15c) are preserved and the amortised cost is at most $\Delta n$.
Proof Let us inspect the change of potential. By Lemma 14 we know that $\Delta j = 0$ and $\Delta k \le \Delta n$; we change neither $j_{best}$ nor $L$, so $\Delta L = \Delta j_{best} = 0$. Hence
$$\Delta p = \Delta k - \Delta L + \Delta j_{best} - \Delta j = \Delta k \le \Delta n.$$
Concerning the invariants: as $L$ is unchanged and $\Delta k \ge 0$ by Lemma 14, we get that (15a) is preserved. Similarly, since we do not change $i$, $j$, $j_{best}$ and $L$, (15b)-(15c) are preserved.
This allows calculating the whole cost of answering Type 1 short consistency check.

Corollary 3
In a Type 1 short consistency check the amortised cost of TYPE-1-UPDATE-$j_{best}$ and COMMON-SHORT-CONSISTENCY-CHECK is covered by the issued credit. TYPE-1-UPDATE-$j_{best}$ followed by COMMON-SHORT-CONSISTENCY-CHECK preserves (15a)-(15e) and returns the correct answer to the short consistency check.
Proof By Lemma 15 the update of $j_{best}$ and $L$ has amortised cost at most $\Delta n$. By Lemma 13 the amortised cost of COMMON-SHORT-CONSISTENCY-CHECK is at most 6. On the other hand, by Lemma 14 we know that $8\Delta n \ge 6 + \Delta n$ units of credit are issued, which suffices to pay for the amortised cost.
Concerning the correctness, by Lemma 15 (15a)-(15c) are satisfied after TYPE-1-UPDATE-$j_{best}$, which by Lemma 13 means that after COMMON-SHORT-CONSISTENCY-CHECK all of (15a)-(15e) hold and the answer to the short consistency check is correct.

Algorithm 10 TYPE-2-UPDATE-$j_{best}$
1: if $j + \log^2 n < j_{best}$ then
2: $j_{best} \leftarrow j$, $L \leftarrow 0$
3: end if

Type 2 Updates
Since $j$ is decreased, it might be that $j$ and $j_{best}$ no longer satisfy (15c) (as $j + \log^2 n < j_{best}$). In such a case we set $j_{best} \leftarrow j$ and $L \leftarrow 0$, see Algorithm 10.

Lemma 16
Assume that all of (15a)-(15e) hold and we are to make a Type 2 short consistency check. Then after TYPE-2-UPDATE-$j_{best}$ the invariants (15a)-(15c) are preserved. The amortised cost is at most $|\Delta j| + 1$.
Proof Suppose that $j + \log^2 n \ge j_{best}$. The invariants (15a)-(15b) hold by assumption, as none of $i$, $j_{best}$, $L$ and $k$ was modified. For (15c), note that $j_{best} \le j + \log^2 n$ holds by the case assumption and $j \le j_{best}$ held by assumption even before the decrement of $j$, so it holds now as well.
The change of the potential: by Lemma 14, we know that $\Delta i = \Delta k = \Delta n = 0$ and $\Delta j < 0$. Since $L$ and $j_{best}$ were not changed, we have
$$\Delta p = \Delta k - \Delta L + \Delta j_{best} - \Delta j = -\Delta j = |\Delta j|.$$
The cost is 1 for the comparison and so the amortised cost is $|\Delta j| + 1$.
If $j + \log^2 n < j_{best}$, then after setting $j_{best} \leftarrow j$ and $L \leftarrow 0$ the invariants (15a)-(15c) trivially hold. The change of potential is
$$\Delta p = \Delta k - \Delta L + \Delta j_{best} - \Delta j.$$
By Lemma 14 we know that $\Delta k = 0$. As $j + \log^2 n < j_{best}$, we obtain that $\Delta j_{best} < -\log^2 n$. Since $L$ was reset to 0, we have $-\Delta L = -(-L_0) = L_0$, where $L_0$ was the previous value of $L$. We know that $L_0 \le k \le \log^2 n$ and so $\Delta p < 0 + \log^2 n - \log^2 n - \Delta j = |\Delta j|$.
There is an additional cost of 1 for the comparison of $j$ and $j_{best}$ (we hide the cost of changing $j_{best}$ and $L$ in it). Hence the amortised cost is at most $1 + |\Delta j|$.
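The two cases of this proof can be checked numerically. The Python sketch below mirrors Algorithm 10 under stated assumptions: `block_len` stands in for $\log^2 n$, and the helper `potential` uses $p = k - L + j_{best} - j$, the form consistent with the expression $\Delta p = \Delta k - \Delta L + \Delta j_{best} - \Delta j$ used in the proofs. Both names are introduced here for illustration only.

```python
def potential(k, L, j_best, j):
    """Potential consistent with the change formula used in the proofs."""
    return (k - L) + (j_best - j)


def type2_update(j, j_best, L, block_len):
    """Algorithm 10 as a function: `j` is the already decreased value."""
    if j + block_len < j_best:
        return j, 0          # j_best is too far away: restart from j
    return j_best, L         # j_best is still within reach of j


block_len, k = 16, 8
old_j, old_j_best, old_L = 30, 34, 5
p_old = potential(k, old_L, old_j_best, old_j)

# Large decrease of j: j_best is reset, and the potential grows by less
# than |delta_j|.
j_far = 10
jb_far, L_far = type2_update(j_far, old_j_best, old_L, block_len)
dp_far = potential(k, L_far, jb_far, j_far) - p_old

# Small decrease of j: nothing is reset, and the potential grows by
# exactly |delta_j|.
j_near = 25
jb_near, L_near = type2_update(j_near, old_j_best, old_L, block_len)
dp_near = potential(k, L_near, jb_near, j_near) - p_old
```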

Algorithm 11 TYPE-3-UPDATE-$j_{best}$
1: $j_{best} \leftarrow j_{best} + \Delta j$, $L \leftarrow L - \Delta j$
2: if $L < 0$ then
3: $j_{best} \leftarrow j$, $L \leftarrow 0$
4: end if

Corollary 4
In a Type 2 short consistency check the amortised cost of TYPE-2-UPDATE-$j_{best}$ and COMMON-SHORT-CONSISTENCY-CHECK is covered by the issued credit. TYPE-2-UPDATE-$j_{best}$ followed by COMMON-SHORT-CONSISTENCY-CHECK preserves (15a)-(15e) and correctly answers the short consistency check.
Proof By Lemma 16, the update of $j_{best}$ and $L$ has amortised cost at most $|\Delta j| + 1$. By Lemma 13, the amortised cost of COMMON-SHORT-CONSISTENCY-CHECK is at most 6. On the other hand, by Lemma 14, $8|\Delta j| \ge 7 + |\Delta j|$ units of credit are issued, which suffices to pay for the amortised cost.
Concerning the correctness, by Lemma 16 after TYPE-2-UPDATE-$j_{best}$ the invariants (15a)-(15c) hold, and so by Lemma 13 after COMMON-SHORT-CONSISTENCY-CHECK all of (15a)-(15e) hold and the answer to the short consistency check is correct.

Type 3 Updates
It is left to show how to update $j_{best}$ and $L$ in a Type 3 short consistency check, see Algorithm 11. In this case both $j$ and $i$ were increased by the same value $\Delta j$, see Lemma 14. This means that the new $A'[j..j+k-1]$ and $A'[i..i+k-1]$ are suffixes of the old ones. In particular, $A'[j_{best}..j_{best}+L-1]$ has nothing to do with the new $A'[i..i+L-1]$; still, if we also increase $j_{best}$ by $\Delta j$, then the new $A'[j_{best}..j_{best}+L-1]$ is also a suffix of the old one. Unfortunately, as every table we consider is a suffix of the old one, we have to decrease $L$ by $\Delta j$ as well. If this turns $L$ non-positive, then $A'[j_{best}..j_{best}+L-1]$ is empty and we reset $j_{best}$ to $j$ and $L$ to 0.
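
Lemma 17
Assume that all of (15a)-(15e) hold and we are to make a Type 3 short consistency check. Then after TYPE-3-UPDATE-$j_{best}$ the invariants (15a)-(15c) are preserved. The amortised cost is at most $\Delta j + 1$.
Proof Consider the case in which $j_{best}$ and $L$ are not reset. By Lemma 14 we get that $k$ is decreased by at most $\Delta j$, while we decrease $L$ by $\Delta j$, hence (15a) is preserved. Concerning (15b), let $L'$, $i'$ and $j'_{best}$ be the previous values of $L$, $i$ and $j_{best}$. Then […]. So (15b) is preserved. For (15c), note that we increased $j$ and $j_{best}$ by the same value $\Delta j$, so (15c) is preserved.
Concerning the change of potential in this case,
$$\Delta p = \Delta k - \Delta L + \Delta j_{best} - \Delta j.$$
By Lemma 14, $\Delta k \le 0$. We decrease $L$ by $\Delta j$, so $\Delta L = -\Delta j$, and increase $j_{best}$ by $\Delta j$, so $\Delta j_{best} = \Delta j$. Hence
$$\Delta p \le 0 + \Delta j + \Delta j - \Delta j = \Delta j.$$
The additional cost is 1 for the test, so the amortised cost is at most $\Delta j + 1$.
Now consider the case in which after the decrement by $\Delta j$ the value of $L$ is non-positive, i.e., we reset $j_{best}$ to $j$ and $L$ to 0. Then (15a)-(15b) hold trivially, as $L = 0$, and (15c) holds because $j = j_{best}$. Concerning the cost, we pay 1 for comparisons and the change of potential is
$$\Delta p = \Delta k - \Delta L + \Delta (j_{best} - j).$$
By Lemma 14, $\Delta k \le 0$. Since decreasing $L$ by $\Delta j$ made it non-positive and then we set it to 0, i.e., increased it, we have $\Delta L \ge -\Delta j$. Lastly, $j_{best} - j$ is now equal to 0 and used to be non-negative by (15c), so $\Delta(j_{best} - j) \le 0$. Hence
$$\Delta p \le 0 + \Delta j + 0 = \Delta j.$$
So the amortised cost is at most $1 + \Delta j$.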
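The shift performed by Algorithm 11 can be tried out on a small array. The Python sketch below follows the listing of Algorithm 11 with 0-based indices; the function name `type3_update` and the convention that `j` is passed as its new (already increased) value are assumptions made for illustration. The example checks that the equality of the length-$L$ fragments starting at $j_{best}$ and at $i$ survives the shift.

```python
def type3_update(j, j_best, L, delta_j):
    """Algorithm 11 as a function; `j` is the new value of j."""
    j_best += delta_j        # shift j_best together with i and j
    L -= delta_j             # the known common prefix loses delta_j letters
    if L < 0:
        j_best, L = j, 0     # nothing is known any more: restart from j
    return j_best, L


a = [1, 2, 3, 4, 1, 2, 3, 4]
# Before the shift: i = 4, j = 0, j_best = 0, L = 4, and a[0:4] == a[4:8].
delta = 2
new_i, new_j = 4 + delta, 0 + delta
new_j_best, new_L = type3_update(new_j, j_best=0, L=4, delta_j=delta)
still_equal = (a[new_j_best:new_j_best + new_L] == a[new_i:new_i + new_L])

# When the shift exceeds the known prefix, j_best and L are reset.
reset = type3_update(3, j_best=0, L=1, delta_j=3)
```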

Corollary 5
In a Type 3 short consistency check the amortised cost of TYPE-3-UPDATE-$j_{best}$ and COMMON-SHORT-CONSISTENCY-CHECK is covered by the issued credit. TYPE-3-UPDATE-$j_{best}$ followed by COMMON-SHORT-CONSISTENCY-CHECK preserves (15a)-(15e) and returns the correct answer to the short consistency check.
Proof By Lemma 17 the update of $j_{best}$ and $L$ has amortised cost at most $1 + \Delta j$. By Lemma 13 the amortised cost of COMMON-SHORT-CONSISTENCY-CHECK is at most 6. On the other hand, by Lemma 14 we obtain that at least $8\Delta j \ge 7 + \Delta j$ units of credit are issued, which suffices to pay for the amortised cost.
Concerning the correctness, by Lemma 17 after TYPE-3-UPDATE-$j_{best}$ the invariants (15a)-(15c) hold, and so by Lemma 13 after COMMON-SHORT-CONSISTENCY-CHECK all of (15a)-(15e) hold and furthermore the answer to the short consistency check is correct.
In the end, the short consistency check is performed as follows: depending on which type it is, we run one of TYPE-1-UPDATE-$j_{best}$, TYPE-2-UPDATE-$j_{best}$, TYPE-3-UPDATE-$j_{best}$. Afterwards we apply COMMON-SHORT-CONSISTENCY-CHECK. By Corollaries 3-5 the answer returned to the short consistency check is correct and the issued credit covers the whole cost. Since the issued credit is linear, we are done.
Running Time
LINEAR-VALIDATE-π′ runs in $O(n)$ time: the construction of the suffix trees, the consistency checks and the pin value checks all take $O(n)$ time.

Remarks and Open Problems
While VALIDATE-π produces the word $w$ over the minimum alphabet such that $\pi_w = A$ on-line, this is not the case with VALIDATE-π′ and LINEAR-VALIDATE-π′. At each time-step both these algorithms can output a word over the minimum alphabet such that $\pi'_w = A'$, but the letters assigned to positions on the last slope may yet change as further entries of $A'$ are read.
Two interesting questions remain: is it possible to remove the suffix trees and LCA queries from our algorithm without hindering its time complexity? We believe that deeper combinatorial insight might result in a positive answer.