On Optimal Representatives of Finite Coloured Linear Orders

Two structures A and B are n-equivalent if player II has a winning strategy in the n-move Ehrenfeucht-Fraïssé game on A and B. We extend earlier results about n-equivalence classes for finite coloured linear orders, describing an algorithm for reducing to canonical form under 2-equivalence, and concentrating on the cases of 2 and 3 moves.


Introduction
In [1] we studied the equivalence of finite coloured linear orders up to level n in an Ehrenfeucht-Fraïssé game, written as ≡ n . This means that player II has a winning strategy in an n-move Ehrenfeucht-Fraïssé game. We also made some remarks about the infinite case. We gave some bounds for the optimal representatives in the finite case, meaning ones of minimum length, and the infinite case for up to 2 moves. These results were extended in [2] to all coloured ordinals, in the monochromatic case giving a precise list of optimal representatives, and in the coloured case giving bounds.
In this paper we return to the finite case, and extend the work of the first paper, by improving the bounds in some instances, and throwing further light on uniqueness of representatives. First we briefly recall the required definitions. A coloured linear ordering, also called a coloured string, is a triple (A, <, F ) where (A, <) is a linear order and F is a mapping from A onto a set C of 'colours'. We just write A instead of (A, <, F ) provided that the ordering and colouring are clear, and if X ⊆ A, we write F (X) for {F (x) : x ∈ X}. In the n-move Ehrenfeucht-Fraïssé game on coloured linear orders A and B players I and II play alternately, I moving first. On each move I picks an element of either structure, and II responds by choosing an element of the other structure. After n moves, I and II between them have chosen elements a 1 , a 2 , . . . , a n of A, and b 1 , b 2 , . . . , b n of B, and player II wins if the map taking a i to b i for each i is an order and colour-isomorphism, and player I wins otherwise. We say that A and B are n-equivalent and write A ≡ n B, if II has a winning strategy. Then ≡ n is an equivalence relation having just finitely many n-equivalence classes. An optimal representative is a member of an equivalence class of least possible length (which is chosen lexicographically least if there is more than one such, that is according to dictionary order assuming some fixed ordering of the set of colours), and a string or coloured string is called optimal if it is optimal in its equivalence class.
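The game just described can be checked by brute force on short strings. The following is a minimal Python sketch, assuming a coloured string is given as a Python string with one character per point; the function name `ef_equivalent` is illustrative and not taken from [1]. It searches the whole game tree, so it is only practical for small n and short strings.

```python
def ef_equivalent(a: str, b: str, n: int) -> bool:
    """Return True if player II wins the n-move game on coloured strings a and b."""

    def extends(pairs, i, j):
        # The pair (i, j) may be added only if the colours agree and the
        # order relative to every earlier pair is preserved.
        if a[i] != b[j]:
            return False
        return all((p < i) == (q < j) and (p > i) == (q > j) for p, q in pairs)

    def wins(pairs, moves_left):
        if moves_left == 0:
            return True
        # Player I moves in a; player II needs a good reply in b.
        for i in range(len(a)):
            if not any(extends(pairs, i, j) and wins(pairs + ((i, j),), moves_left - 1)
                       for j in range(len(b))):
                return False
        # Player I moves in b; player II needs a good reply in a.
        for j in range(len(b)):
            if not any(extends(pairs, i, j) and wins(pairs + ((i, j),), moves_left - 1)
                       for i in range(len(a))):
                return False
        return True

    return wins((), n)
```

For instance, `ef_equivalent("rrr", "rrrr", 2)` holds, whereas `ef_equivalent("rr", "rrr", 2)` fails, since player I can pick the middle point of rrr and then move to whichever side is empty in rr.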
In [1] we gave upper bounds for the lengths of optimal representatives of $\equiv_n$-classes of m-coloured finite linear orders. Only for the case $n = 2$ were these bounds exact. We return to this case, describing explicitly the classification of finite m-coloured linear orders up to $\equiv_2$-equivalence (based on the idea of an 'm-configuration' introduced in [1]). From this we are able to read off which equivalence classes are finite or infinite, and provide an algorithm for determining an optimal representative corresponding to any given finite m-coloured linear order. We also show that a finite coloured linear order is $\equiv_2$-optimal if and only if each 1-character (see below for the definition) appears at most once.
The problem for more than 2 moves seems to be quite hard, so we concentrate on the case of 3 moves. The idea is that using a key inductive lemma from [1], we need to understand better how the 2-characters behave, and that is the reason for re-examining the case n = 2 in more detail.
Next we recall the notion of 'character' from [1], and the main result about characters. Let us fix an ordering of the set of colours. We write (m, n) for the family of all finite m-coloured linear orders $A$ such that $A$ has least length in its $\equiv_n$-class, and subject to this, $A$ is lexicographically least (with respect to the ordering of the colours chosen). Since there are only finitely many $\equiv_n$-classes, (m, n) is finite. We may therefore define $g(m, n)$ to be the maximum of the lengths of members of (m, n). We write the representative for $A$ as $[A]_n$. In a coloured linear order $A$, the n-character of $a \in A$ having colour $c$ is the ordered triple $\langle [A^{<a}]_n, [A^{>a}]_n, c \rangle$, where $A^{<a} = \{x \in A : x < a\}$ and $A^{>a} = \{x \in A : x > a\}$. Thus we shall always include the colour as part of the n-character of $a$ (unlike in [1]). This means for instance that the next theorem reads a little differently from there, but it has precisely the same content. We interchangeably say that $a$ realizes this n-character, or determines it, or exhibits it.
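Theorem 1.1 Finite coloured linear orders $A$ and $B$ are $(n+1)$-equivalent if and only if they realize the same n-characters.
We also need the following result, which is Theorem 2.1 of [1] (and which we call the 'Cutting lemma').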

Lemma 1.2 Let $A$ be a finite m-coloured linear order and let $a$ and $b$ be elements of $A$ with $a < b$ satisfying the following conditions: (i) $a$ and $b$ determine the same n-character, (ii) for every $x \in A$ with $a < x \le b$, there is $y \le a$ having the same n-character as $x$. Then $A \equiv_{n+1} A \setminus (a, b]$.
Note that this may be applied in a trivial case, namely, that no two consecutive points of an ≡ n+1 -optimal finite string can have equal n-characters.
It is clear from Theorem 1.1 that if in an m-coloured linear order no two points have the same n-character, then the ordering is $\equiv_{n+1}$-optimal, meaning that it is not $(n+1)$-equivalent to any shorter ordering. Based on this, we present a construction of a finite 2-coloured linear order of length 70 in which all points have distinct 2-characters, and which is therefore $\equiv_3$-optimal, and show that 70 is the greatest possible length for which all 2-characters are distinct. We also construct a finite coloured $\equiv_3$-optimal linear order of length 74, in which 2-characters must therefore repeat. It should be possible to find longer examples, but the details would be quite tedious, so giving one of this length is good enough to illustrate the idea. This casts some light on the hypothesis required for the 'cutting lemma': what it says is not that we can reduce the length just on the basis of repetition of characters; more is required about what happens in between.
The typical case we have in mind is that when searching for optimal representatives, we start with a possibly long coloured order, and successively reduce it by removing pieces, retaining n-equivalence, till it becomes optimal. The proof of [1] is too indirect to guarantee immediately that the final ordering is a subordering of the one we start with. We therefore extend the material of [1] by showing that for 2-equivalence at any rate, we can guarantee that the optimal representative is contained in the original one; we present an algorithm for achieving this. We believe that this is false for n = 3, and in Section 3 explain why.
With regard to the general case, but particularly applied to $n = 3$, we use directed graphs to help analyze n-equivalence. One method would be to take $(n-1)$-characters themselves as vertices of the directed graph, with an arrow going from $\langle X_1, Y_1, c_1 \rangle$ to $\langle X_2, Y_2, c_2 \rangle$ if for some representatives $x_1, y_1, x_2, y_2$ of $X_1, Y_1, X_2, Y_2$, $x_1 c_1 \equiv_{n-1} x_2$ and $c_2 y_2 \equiv_{n-1} y_1$, where these are the strings obtained from $x_1$, $y_2$ by adding a $c_1$-coloured point on the right, a $c_2$-coloured point on the left respectively. The idea is that in scanning a (long) word from left to right, at each point we can view its $(n-1)$-character to left and right, and see how this varies. In practice in what we present here for $m = 2$, $n = 3$, we focus just on the 'middle' section of the given string, in which case a simplified directed graph gives all the information we require.

Classification of 2-Equivalence Classes
In this section we give a lot more detail about the 2-equivalence classes of finite coloured linear orders. In [1] we established the precise value $m^2 + 2m$ of the least upper bound of the lengths of the optimal representatives of $\equiv_2$-classes. Here we are able, using similar ideas, to give an explicit list of all the $\equiv_2$-classes, from which we can read off, for instance, the length of the optimal representative of each class, and also note which classes are finite or infinite. The key idea here is to use the notion of 'm-configuration' which was introduced in [1].
We fix m as the number of colours. An m-configuration is then defined to be a linear order of the form T = {x i : 1 ≤ i ≤ m} ∪ {y i : 1 ≤ i ≤ m} in which x 1 < x 2 < . . . < x m and y 1 > y 2 > . . . > y m , and x 1 and y 1 are the least and greatest members of T respectively. Here all the x i are therefore distinct, and so are the y i , but it is not ruled out that x i = y j for certain i and j . Each m-configuration therefore has size between m and 2m. It is understood that the sequences (x i ) and (y i ) are part of the configuration. That is, to determine the configuration, we have to know not just which linear ordering it is, but also which of its entries are which x i or y j (as is clear from the examples given after Theorem 2.4).
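The definition can be made concrete by enumerating all m-configurations for small m. The following is a minimal Python sketch under the assumption that a configuration is recorded as a list of blocks, each block being a tuple of the labels that coincide at one point; the name `configurations` and the label strings are illustrative only. It merges the chain $x_1 < \ldots < x_m$ with the chain $y_m < \ldots < y_1$, allowing identifications, and keeps the merges in which $x_1$ is least and $y_1$ is greatest.

```python
def configurations(m: int):
    """List all m-configurations as lists of blocks of coinciding labels."""
    xs = [f"x{i}" for i in range(1, m + 1)]   # x1 < x2 < ... < xm
    ys = [f"y{j}" for j in range(m, 0, -1)]   # ym < ... < y2 < y1

    def merge(i, j):
        # Yield every way of interleaving xs[i:] and ys[j:], possibly
        # identifying the next x with the next y.
        if i == m and j == m:
            yield []
            return
        if i < m:
            for rest in merge(i + 1, j):
                yield [(xs[i],)] + rest
        if j < m:
            for rest in merge(i, j + 1):
                yield [(ys[j],)] + rest
        if i < m and j < m:
            for rest in merge(i + 1, j + 1):
                yield [(xs[i], ys[j])] + rest

    # x1 must be the least point and y1 the greatest point.
    return [c for c in merge(0, 0) if "x1" in c[0] and "y1" in c[-1]]
```

With this encoding, `len(configurations(m))` gives 2, 6 and 26 for m = 1, 2 and 3, and every configuration has between m and 2m blocks.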
Let $(A, \le, F)$ be a finite coloured linear order having m colours. From this we can derive an associated m-configuration, which is defined to be the linear order induced on $\{x_i : 1 \le i \le m\} \cup \{y_i : 1 \le i \le m\}$, where $x_i$ is the least point $x$ of $A$ such that $\{F(z) : z \in A, z \le x\}$ has $i$ elements, and $y_i$ is the greatest point $y$ of $A$ such that $\{F(z) : z \in A, z \ge y\}$ has $i$ elements. Under these circumstances, an m-configuration receives the colouring induced from that on $A$. However, the same m-configuration may be thereby coloured in several different ways, if it is viewed as a substructure of possibly different coloured linear orders. Furthermore, not all m-configurations are associated with a coloured linear order at all. The following lemma explains when this happens. An m-configuration together with a colouring that it receives in this way from some $(A, \le, F)$ is called a coloured m-configuration.
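The points $x_i$ and $y_j$ of the associated m-configuration are easy to read off from a string. A minimal Python sketch, assuming the coloured order is a Python string with one character per point and positions counted from 0 (the function name is illustrative):

```python
def configuration_points(s: str):
    """Return (xs, ys): xs[i-1] is the position of x_i, ys[j-1] that of y_j."""
    xs, seen = [], set()
    for pos, colour in enumerate(s):
        # x_i is the first point at which an i-th colour has appeared.
        if colour not in seen:
            seen.add(colour)
            xs.append(pos)
    ys, seen = [], set()
    for pos in range(len(s) - 1, -1, -1):
        # y_j is the last point from which exactly j colours appear to the right.
        if s[pos] not in seen:
            seen.add(s[pos])
            ys.append(pos)
    return xs, ys
```

For the string rrbrbrbb this gives `xs = [0, 2]` and `ys = [7, 5]`, so the configuration has four distinct points, while for rbr the points $x_2$ and $y_2$ coincide.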
Proof First, to check the necessity of the given condition, suppose that [...]
Conversely, assuming the given condition holds, let the m-configuration $T = \{x_i : 1 \le i \le m\} \cup \{y_i : 1 \le i \le m\}$ be given; we have to find a coloured linear order $(A, \le, F)$ such that $T$ is the associated m-configuration. We take $A = T$, and have to show how the points can be coloured so that $x_i$ is the least point $x$ of $A$ such that $\{F(z) : z \in A, z \le x\}$ has $i$ elements, and $y_i$ is the greatest point $y$ of $A$ such that $\{F(z) : z \in A, z \ge y\}$ has $i$ elements. Let us start by colouring the $x_i$ by distinct colours. Clearly this ensures that $x_i$ is the least point such that $\{F(x_k) : x_k \in A, x_k \le x_i\}$ has $i$ elements. We have to colour the $y_j$ so that $\{F(z) : z \in A, z \le x_i\}$ contains no 'new' colour. We assign colours successively to $y_m, y_{m-1}, \ldots, y_1$ according to which of the sets $\{x_1\}, (x_1, x_2), \ldots, \{x_m\}, (x_m, \infty)$ they lie in. Given $i$, let $j$ be the least such that $y_j < x_{i+1}$ (if any). Then by hypothesis, $i + 1 + j > m + 1$, so $j > m - i$. Hence there are at most $i$ values of $j$ such that $y_j < x_{i+1}$.
The colouring is now given as follows. If $y_j = x_i$ then we let $F(y_j) = F(x_i)$. Otherwise consider colouring all the $y_j$ which lie in $(x_i, x_{i+1})$. By the remark just made, there are at most $i$ values of $j$ such that $y_j < x_{i+1}$, and there are $i$ colours available for $\{y_j : y_j < x_{i+1}\}$. We have so far used $|\{j : y_j \le x_i\}|$ of these colours, so the number remaining is $i - |\{j : y_j \le x_i\}| \ge |\{j : y_j < x_{i+1}\}| - |\{j : y_j \le x_i\}| = |\{j : x_i < y_j < x_{i+1}\}|$, and these points are coloured in any way using the available colours.
The construction has explicitly ensured that for each $i$, $x_i$ is the least point such that $(-\infty, x_i]$ is coloured by $i$ colours. To verify the corresponding condition for $y_i$, note that there are certainly exactly $i$ values of $k \le i$ such that $y_k \ge y_i$, and these points are all coloured by distinct colours. Suppose that $x_j \ge y_i$. Then as there are m colours, and all the points $y_k$ are distinctly coloured, there is $k$ such that $F(x_j) = F(y_k)$. If $k > i$ then $y_k < y_i \le x_j$, contrary to $x_j$ being the least point coloured $F(x_j)$. We deduce that $k \le i$, and so $F(x_j) \in \{F(y_k) : k \le i\}$ as required.
To specify a finite coloured order $(A, \le, F)$ up to 2-equivalence, we need to know first of all what its associated m-configuration $T$ is, and in addition what colouring $T$ receives (so we need to know what its associated coloured m-configuration is). We also need to know which colours arise as the colours of points (of $A$) lying between any two consecutive members of $T$, and we write the set of colours between $u$ and $v$ as $\chi(u, v)$. We write $C_{T,\chi}$ for the set of all finite coloured orders such that $T$ is the associated coloured m-configuration and colours between the points are given by $\chi$. Note that not all sets of colours are possible for $\chi(u, v)$ and they will be constrained by the $x_i$ and $y_j$. If for ease we write $x_{m+1} = \infty$ and $y_{m+1} = -\infty$ (not coloured) then a point with colour $c$ can be inserted (without altering [...]
Proof This relies on Theorem 1.1, which tells us that $A \equiv_2 A'$ if and only if they exhibit the same 1-characters. Let [...] Similarly for $y_j$ and $y'_j$. Next we have to see that $x_i \le y_j \Leftrightarrow x'_i \le y'_j$, and similarly for $<$. Suppose that $x_i \le y_j$ ($x_i < y_j$ respectively). Then $x_i$ has at least $j - 1$ colours to the right (at least $j$ respectively), and as it realizes the same character as $x'_i$, this is also true of $x'_i$ in $A'$, and so it follows that $x'_i \le y'_j$ ($x'_i < y'_j$ respectively). We deduce that the same coloured m-configurations arise from $A$ and $A'$. To see that the corresponding functions $\chi$ and $\chi'$ are equal, let $u < v$ be consecutive members of $T$, with corresponding points $u' < v'$ for $A'$. Then the left and right 1-characters of each member of $(u, v)$ are $\{F(z) : z \le u\}$ and $\{F(z) : z \ge v\}$ respectively, and furthermore, these characters are not realized by any other members of $A$. Precisely these same left and right characters are realized in $(u', v')$, and since the only extra ingredient required to specify the character is the colour of the point, it follows that exactly the same set of colours arises in $(u, v)$ and $(u', v')$. In other words, $\chi(u, v) = \chi'(u', v')$.
Conversely, supposing that $A$ and $A'$ both lie in the same $C_{T,\chi}$, we see that they both realize the same 1-characters, so are 2-equivalent.
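By Theorem 1.1 with n = 1, as used in the proof above, 2-equivalence of finite coloured strings comes down to comparing sets of 1-characters. A minimal Python sketch, assuming strings with one character per point and representing each $\equiv_1$-class by the set of colours that occur in it (the function names are illustrative, not from the paper):

```python
def one_characters(s: str) -> frozenset:
    """Set of 1-characters realized in s: (colours to the left, colours to the right, colour)."""
    return frozenset(
        (frozenset(s[:i]), frozenset(s[i + 1:]), s[i]) for i in range(len(s))
    )


def equivalent_2(a: str, b: str) -> bool:
    """2-equivalence test via Theorem 1.1: same 1-characters realized."""
    return one_characters(a) == one_characters(b)
```

For example, `equivalent_2("rrbrbrbb", "rrbbrrbb")` holds, while `equivalent_2("rb", "rbb")` does not, since rbb realizes a 1-character with colour b having b to its right.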

Corollary 2.3 An $\equiv_2$-class of finite coloured linear orders is finite if and only if it is a singleton, which holds if and only if $\chi(u, v) = \emptyset$ for each $u, v$.
From Theorem 2.2 we derive an algorithm for determining an optimal member of the 2-equivalence class of a finite coloured linear order $A$. From $A$ we first evaluate $x_i$ and $y_j$. Then we replace each interval $(u, v)$ such that $u$ and $v$ are consecutive points of the resulting m-configuration by one in which each of its colours only arises once. This leads to the following result.
If $m = 1$ with colour $r$, then there are two possible m-configurations, with $x_1 = y_1$ or $x_1 < y_1$. The former gives us just a singleton $r$ (since there is no interval of consecutive points into which new elements can be inserted), and the latter gives $rr$, which is a singleton $\equiv_2$-class, and $rrr$, which lies in the infinite $\equiv_2$-class $\{r^n : 3 \le n\}$ (where $\chi(x_1, y_1) = \{r\}$).
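The reduction described above can be sketched in Python as follows. This is a minimal sketch, assuming strings with one character per point; it keeps the first occurrence of each colour inside every interval between consecutive configuration points, so the output is a subordering of the input of least possible length, though not necessarily the lexicographically least one. The name `reduce_2` is illustrative.

```python
def reduce_2(s: str) -> str:
    """Return a shortest substring (subsequence) of s that is 2-equivalent to s."""
    # Configuration points: first occurrence of each colour from the left (x_i)
    # and last occurrence of each colour from the right (y_j).
    config = set()
    seen = set()
    for pos, colour in enumerate(s):
        if colour not in seen:
            seen.add(colour)
            config.add(pos)
    seen = set()
    for pos in range(len(s) - 1, -1, -1):
        if s[pos] not in seen:
            seen.add(s[pos])
            config.add(pos)
    points = sorted(config)
    keep = set(points)
    # Between consecutive configuration points keep one point of each colour.
    for u, v in zip(points, points[1:]):
        colours_here = set()
        for pos in range(u + 1, v):
            if s[pos] not in colours_here:
                colours_here.add(s[pos])
                keep.add(pos)
    return "".join(s[pos] for pos in sorted(keep))
```

For instance, `reduce_2("rrrrr")` returns rrr, and `reduce_2("rrrbbrbrrbbbb")` returns rrbbrrbb.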
Including the allowed insertions, where we write $r^k$ for an arbitrary sequence of $k$ r's ($k \ge 0$), similarly $b^l$, and $w(r, b)$ for an arbitrary string of r's and b's, this gives rise to the following list:
for rbrb: $r\, r^k\, b\, w(r, b)\, r\, b^l\, b$, 16 $\equiv_2$-classes (two options for each of $k$ and $l$, and four for $w(r, b)$), similarly for brbr, rbbr and brrb;
for rbr: $r\, r^k\, b\, r^l\, r$, 4 $\equiv_2$-classes, similarly for brb;
for rrbb: $r\, r^k\, r\, b\, b^l\, b$, 4 $\equiv_2$-classes, similarly for bbrr;
for rbb: $r\, b\, b^l\, b$, 2 $\equiv_2$-classes, similarly for brr; rrb and bbr are similar to rbb;
for rb: just one $\equiv_2$-class, and br is similar.
This gives a total of 90 $\equiv_2$-classes in which two colours appear. Note that the optimal representative of each class is unique, except when there is a 'middle' section in which both colours appear. For instance, $rrbrbrbb \equiv_2 rrbbrrbb$, though each is of optimal length.
If $m = 3$, there are 26 possible m-configurations, of which all but four fulfil the stipulations of Lemma 2.1 (the four which do not are given by $x_1 \le y_3 < y_2 < x_2 < x_3 \le y_1$). To list even these is quite laborious, and when their possible colourings are taken into account, as well as the possible insertions, it is seen that the list increases dramatically over the case $m = 2$. For instance, for the m-configuration $x_1 < x_2 < x_3 < y_3 < y_2 < y_1$ there are 36 ways of colouring the points, and for the rbg (red/blue/green) colouring of the $x_i$ and $y_i$ points, the $\equiv_2$-classes are of the form $r\, r^{i_1}\, b\, r^{i_2} b^{j_2}\, g\, r^{i_3} b^{j_3} g^{k_3}\, r\, b^{j_4} g^{k_4}\, b\, g^{k_5}\, g$ where the indices are all 0 or 1, giving $2^9$ possibilities, so even for this case there are $36 \times 2^9 = 18432$ $\equiv_2$-classes.
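The count of 90 classes for two colours can be confirmed by brute force, since every such class has a representative of length at most $m^2 + 2m = 8$. A minimal Python sketch, assuming Theorem 1.1 is used to identify an $\equiv_2$-class with its set of 1-characters, and assuming the colour ordering r < b (function names are illustrative):

```python
from itertools import product


def one_characters(s: str) -> frozenset:
    """Set of 1-characters realized in s."""
    return frozenset(
        (frozenset(s[:i]), frozenset(s[i + 1:]), s[i]) for i in range(len(s))
    )


def two_classes(colours: str = "rb", max_len: int = 8) -> dict:
    """Map each 2-equivalence class (as a set of 1-characters) to its optimal representative."""
    optimal = {}
    for n in range(1, max_len + 1):
        # product() runs in lexicographic order for the given colour order,
        # so the first string met in each class is shortest, then least.
        for letters in product(colours, repeat=n):
            s = "".join(letters)
            if set(s) == set(colours):      # every colour must appear
                optimal.setdefault(one_characters(s), s)
    return optimal
```

Under these assumptions `len(two_classes())` is 90, the longest optimal representative has length 8, and the class containing rrbbrrbb is represented by rrbrbrbb.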
We remark that it would also be possible to find an algorithm to reduce to an optimal form inductively on the number of colours. If we define the subsets L, R, and M of A, for 'left', 'right', and 'middle', by L = (−∞, x m ), R = (y m , ∞), and M = A \ (L ∪ R) in the above notation, then the induction would be based on the fact that each of L and R exhibit only m − 1 colours. There are however some (minor) complications in the case where L and R overlap, so the method presented just before the statement of Theorem 2.4 is preferable. Now the easiest way to demonstrate that a finite string is optimal in its ≡ 2 -class is to show that all its points have distinct 1-characters (then appeal to Theorem 1.1), and in fact this suffices for all 90 strings for m = n = 2, as one sees by inspection. The same holds for any number of colours (though not with greater values of n, as we see in the next section).

Theorem 2.5
For any m, no ≡ 2 -optimal m-coloured string realizes the same 1-character more than once.
Proof Suppose on the contrary that $(A, <, F)$ is $\equiv_2$-optimal but $a < b$ realize the same 1-character. We show that $A \equiv_2 A \setminus \{b\}$, contradicting optimality of $A$. This is similar to the Cutting Lemma, Lemma 1.2. We just need to show that $A$ and $A' = A \setminus \{b\}$ realize the same 1-characters, and the result then follows by appeal to Theorem 1.1. First let $x \ne b$, and we show that $x$ realizes the same 1-character in $A$ and $A'$. The 1-character is determined by the colour of the point, and by which colours occur on its left and right, so we have to show that $A^{<x}$ and $A'^{<x}$ exhibit the same colours, and so do $A^{>x}$ and $A'^{>x}$. If $x < b$ then $A^{<x} = A'^{<x}$, and the colours in $A^{>x}$ and $A'^{>x}$ could only possibly differ on $F(b)$, but as $A^{>a} \equiv_1 A^{>b}$, and $b \in A^{>a}$, there is a point $> a$ coloured $F(b)$, and therefore also a point $> b$ (and hence $> x$) coloured $F(b)$. A similar argument applies if $x > b$ (using $A^{<a} \equiv_1 A^{<b}$).
If $x = b$, we show that it realizes the same 1-character in $A$ as $a$ does in $A'$. Since $a$ and $b$ have the same colour, we just have to show that $A^{<b}$ and $A'^{<a}$ exhibit the same colours, and so do $A^{>b}$ and $A'^{>a}$. Now $A^{<b} \equiv_1 A^{<a} = A'^{<a}$ and $A^{>b} \equiv_1 A^{>a} \equiv_1 A'^{>a}$. In all cases, $\equiv_1$ means that the two sets exhibit the same sets of colours, and this holds for $A^{>a}$ and $A'^{>a}$ because the only point of difference between these sets is $b$, but because $A^{>a} \equiv_1 A^{>b}$, and $b \in A^{>a}$, there must be a point $> b$ coloured by $F(b)$.
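The step used in this proof, deleting the later of two points with the same 1-character, can be iterated until no 1-character repeats. A minimal Python sketch, assuming strings with one character per point and representing each $\equiv_1$-class by its set of colours (the function names are illustrative):

```python
def one_character(s: str, i: int):
    """1-character of the point at position i of s."""
    return (frozenset(s[:i]), frozenset(s[i + 1:]), s[i])


def delete_repeats(s: str) -> str:
    """Repeatedly delete a point b whose 1-character is already realized by some a < b."""
    while True:
        seen = set()
        for i in range(len(s)):
            character = one_character(s, i)
            if character in seen:
                s = s[:i] + s[i + 1:]   # remove b; characters are recomputed next pass
                break
            seen.add(character)
        else:
            return s                     # all 1-characters are now distinct
```

For example, `delete_repeats("rrrbbrbrrbbbb")` returns rrbbrrbb, which realizes the same eight 1-characters as the original string, each exactly once.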

3-Equivalence Classes
To help analyze the behaviour of strings up to 3-equivalence, we introduce various labelled directed graphs to keep track of the transitions between 2-characters as we pass through the string. The basic idea is that if $a_1 a_2 a_3 \ldots a_k$ is a string over an alphabet of m colours, then a node of the digraph will be taken to be a 2-character of the form $\langle x, y, c \rangle$, and we include a directed edge from $\langle x_1, y_1, c_1 \rangle$ to $\langle x_2, y_2, c_2 \rangle$ provided that for some strings $u_1, v_1, u_2, v_2$, [...] This corresponds to the fact that the string $a_1 a_2 a_3 \ldots a_k$ gives rise to a path [...] In practice, retaining all of both co-ordinates is too cumbersome, and we use an abbreviated string which, at any rate for points in the 'middle', suffices to describe the 2-character.
In [1] Theorem 2.3, a very crude upper bound for $g(m, n)$, the maximum of the lengths of optimal representatives of finite m-coloured strings under $\equiv_n$, is given. The object here is to obtain some lower bounds on $g(m, 3)$, by producing optimal strings that are as long as possible. The easiest way in which optimality can be assured is to arrange that all points have distinct 2-characters. That this is not necessary for optimality is remarked later (by contrast with Theorem 2.5 for $\equiv_2$).
To illustrate this, we show how for two colours we can construct an ≡ 3 -optimal string A of length 70. Certain features of this seem to depend heavily on the specifics of this case, and we are unsure of how to generalize. The string is composed of three sections, L, M, and R (of lengths 19, 32, and 19) with L < M < R. The idea is that, by the time we get to the middle section M, the 2-character has sufficiently 'settled down', to enable us to handle substrings rather uniformly. To describe what L, M, and R are in this case, we emphasize that this subdivision applies just to the 3-move case. The subdivision used in Section 2 for the 2-move case is also used, but on the left and right subsets, where inductively, and using Theorem 1.1 we need to look at 2-characters. To avoid confusion, we use L, M, and R to stand only for the subdivision of the whole string, and if we need to refer to the subdivision of a left or right segment, we use the terms 'left', 'middle', and 'right'. The definition here is that M comprises all those points whose left and right 2-characters both themselves have non-empty middle sections. Since it is clear that M so defined is convex, we can then let L and R be the subsets of its complement which are to its left, right respectively.
Note that we adopt the notation from the previous section, usually with respect to the initial segment L, or more generally, initial segments of the form $A^{<a}$ for $a \in M \cup R$ (and this will be applied 'similarly' to R, but without corresponding details). That is, $x_i$ will stand for the least point of this set such that exactly $i$ colours are used up to (and including) it, and $y_j$ is the greatest point of the same set such that exactly $j$ colours are used to its right. This perspective enables us to give a more explicit description of what M is. It comprises exactly those points $a$ such that in $A^{<a}$, $x_m < y_m$ and $(x_m, y_m) \ne \emptyset$, and similarly for $A^{>a}$.
Since we shall take L = rrrrrrbbbbbbrbbbbbr, the discussion given in the previous section shows that for any $a$ in M, $[A^{<a}]_2$ begins rrb, and it ends with rb, rbb, br, or brr (since we must have $x_1 < x_2 < y_2 < y_1$), and as $(x_2, y_2)$ contains points of both colours, the middle may be taken as rb. This means that we can essentially describe the left 2-character of a point $a$ in M by the ending of $A^{<a}$ (and its colour). Although the ending will actually have length 2 or 3, we can tell what it is just from its last two points. Taking $[A^{>a}]_2$ into account in a similar way, a point is entirely characterized by just 5 entries, two on the left, two on the right, and the colour of $a$ in the middle. The following general lemma is invoked here just for $m = 2$ and $k = 5$, but may be more widely applicable.
Lemma 3.1 For any $m$ and $k$ there is a string of length $m^k$ over an alphabet of $m$ letters such that, when it is cyclically ordered, every string of length $k$ arises exactly once.
Proof This method, using an eulerian circuit, was pointed out by P J Cameron.
Given this lemma, we can form a binary string of length 32, such that cyclically ordered, every 5-element string arises exactly once, and this may be taken explicitly as [...] To form our sequence of length 70, we precede M by L = rrrrrrbbbbbbrbbbbbr and succeed it by R = rbbbbbrbbbbbbrrrrrr (which is L in reverse, easing some verifications). Let us write this string as $a_1 a_2 a_3 \ldots a_{70}$. We verify that all 70 points have distinct 2-characters.
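A cyclic string of the kind used for M can be produced by following an eulerian circuit in the digraph whose vertices are the strings of length k − 1 and whose edges are the strings of length k. The following is a minimal Python sketch of that standard construction (Hierholzer's algorithm), assuming $k \ge 2$; the function name is illustrative, and the string it returns need not coincide with the explicit M chosen in the paper.

```python
def cyclic_all_windows(alphabet: str, k: int) -> str:
    """Cyclic string of length len(alphabet)**k in which every k-string occurs once (k >= 2)."""
    start = alphabet[0] * (k - 1)
    unused = {}                  # vertex -> letters of outgoing edges not yet used
    stack, circuit = [start], []
    while stack:
        v = stack[-1]
        out = unused.setdefault(v, list(alphabet))
        if out:
            # Follow an unused edge v -> (v + c)[1:], i.e. the k-string v + c.
            c = out.pop()
            stack.append((v + c)[1:])
        else:
            # No unused edges left at v: v is next in the reversed circuit.
            circuit.append(stack.pop())
    circuit.reverse()
    # Each edge contributes the last letter of the vertex it enters.
    return "".join(v[-1] for v in circuit[1:])
```

For the two colours r and b and k = 5 this returns a string of length 32 whose 32 cyclic windows of length 5 are all distinct.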
First we can see that for every point $a$ of $M \cup R$, in $A^{<a}$, $x_1 = a_1$, $x_2 = a_7$, $y_1 = a_i$ and $y_2 = a_j$ where $i \ge 19$ and $i > j \ge 18$, so its left 2-character begins rrbrb, and ends rb, rbb, br, or brr. However, if $a \in L$, $A^{<a}$ has the form $r^i b^j r b^k$, $r^i b^j r$, $r^i b^j$, or $r^i$ for some $i, j, k$, so its left 2-character is not of this form. By symmetry, we can see that no point of $L \cup M$ shares a right 2-character with a point of R. We now treat each of L and M individually (and R is similar to L).
Finally, we can see that all points of M have distinct 2-characters since they are midpoints of distinct 5-element strings. Notice that we have arranged things so that $a_{18} a_{19} a_{20} a_{21} = a_{50} a_{51} a_{52} a_{53}$, which means that distinctness of the 5-element strings persists even at the 'ends'.
This shows that g (2, 3), which is the maximum of the lengths of optimal representatives of finite 2-coloured strings under ≡ 3 is at least 70. The upper bound given in [1] is clearly absurdly high, but even so, 70 is a big increase on the optimal length for m = n = 2 which is 8.
Let us see that this is the best we can do by these methods, in which optimality is guaranteed by distinctness of the 2-characters. We can always subdivide a given 2-coloured finite linear order into 3 sections, L, M, and R, where in M, both left and right 2-characters have rb as 'middle'. Clearly M is convex, so we may take L and R to be the sets of points to the left, right of M respectively. If $a \in M$, then the left and right 2-characters of $a$ must have at least 6 entries, and as in the discussion above, the 2-characters of the points of M are entirely determined by the 5-element strings of which they are mid-points. Hence $|M| \le 32$. Now consider what L can be. Without loss of generality, suppose it begins with r. If it has an initial segment of the form $r^i b^j r^k b^l r^p b^q$ with positive exponents, then the next point does not lie in L, and similarly, $q = 1$ (since otherwise the final point does not lie in L), and by similar arguments, $j = l = 1$, giving $L = r^i b r^k b r^p b$. If $i \ge 7$ then the fourth and fifth points of A realize the same 2-characters, contrary to assumption. Hence $i \le 6$. Similarly, $k, p \le 6$. If $k = 6$, then 2-characters are repeated for the middle two entries in that block. Hence $k \le 5$, and similarly $p \le 5$. Hence $|L| \le 6 + 5 + 5 + 3 = 19$ (and one can check that rrrrrrbrrrrrbrrrrrb is possible).
If $L = r^i b^j r^k b^l r^p$ it again follows that $j = l = 1$, $i \le 6$, $k, p \le 5$, so $|L| \le 18$. If $L = r^i b^j r^k b^l$ then $j = 1$ or $k = 1$, and again, $i \le 6$, $l \le 5$, and also $j, k \le 5$, so $|L| \le 6 + 1 + 5 + 5 = 17$. If $L = r^i b^j r^k$ then $i \le 6$, $j \le 7$, $k \le 6$ so $|L| \le 19$. If $L = r^i b^j$ then $|L| \le 14$ and if $L = r^i$ then $|L| \le 7$.
It follows similarly that $|R| \le 19$, and hence $|A| \le 19 + 32 + 19 = 70$.
Finally we remark that in $\equiv_3$-optimal strings, 2-characters may be repeated, and using this we are able to construct a longer $\equiv_3$-optimal 2-coloured string. We first give a small example. Consider A = rbrbrbrbrbrbrbr, which has length 15, is a palindrome (reading the same forwards and backwards), and whose 7th and 9th entries realize the same 2-character (though apart from this, all 2-characters are distinct). To see that A is $\equiv_3$-optimal, suppose that $B \equiv_3 A$, and we show that B has length at least 15. Since A realizes 14 2-characters, so does B, and hence it has length at least 14. Now A realizes the 2-character $\langle rbrb, brrbbr, r \rangle$, so B must realize this as well, and as rbrb lies in a singleton $\equiv_2$-class, B begins rbrbr. Similarly, B realizes $\langle rbrbr, rbrbbr, b \rangle$, so as the $\equiv_2$-class of rbrbr is $\{rbr^p br : p \ge 1\}$, B begins with $rbr^p brb$ for some $p \ge 1$. Since B begins with rbrbr it follows that $p = 1$, and that B begins rbrbrb. Similarly B ends with brbrbr. The other two 2-characters realized by B are $\chi_1 = \langle rbrbrb, brrbbr, r \rangle$ and $\chi_2 = \langle rbrbbr, rbrbbr, b \rangle$.
Since $B^{<b_7} = rbrbrb$, $b_7$ must realize $\chi_1$, and so $b_7 = r$. Similarly, the 7th point from the right realizes $\chi_1$ and is r. Since $|B| \ge 14$, these two points are distinct, and as B also realizes $\chi_2$, there must be another point between them, so B has length at least 15.
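The claims about the 2-characters of this example can be checked mechanically. A minimal Python sketch, assuming Theorem 1.1 is used to represent the $\equiv_2$-class of each side by its set of 1-characters rather than by its optimal representative (function names are illustrative, positions are counted from 0):

```python
def one_characters(s: str) -> frozenset:
    """Set of 1-characters realized in s; this determines the 2-equivalence class of s."""
    return frozenset(
        (frozenset(s[:i]), frozenset(s[i + 1:]), s[i]) for i in range(len(s))
    )


def two_characters(s: str) -> list:
    """2-character of each point: (class of the left part, class of the right part, colour)."""
    return [
        (one_characters(s[:i]), one_characters(s[i + 1:]), s[i])
        for i in range(len(s))
    ]
```

Applied to A = rbrbrbrbrbrbrbr, the list has 14 distinct entries, and the only repetition is between the 7th and 9th points (indices 6 and 8).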
We now present a $\equiv_3$-optimal string of length 74 having the same L and R as in the example given of length 70, but with longer M: [...] The subdivision into L, M, R is indicated. By the previous discussion, this must have repeated 2-characters in M. We let $L' = br$ (the end of L) and $R' = rb$ (the beginning of R). To verify that A is $\equiv_3$-optimal we first note that by the previous arguments, the 2-characters of all elements of $L \cup R$ are uniquely determined and different from all those occurring in M, so any 3-equivalent string must begin and end in this way. Now A determines a path through the digraph D having as vertices the strings of length 4 over {r, b} arising as convex subsets of $L' \cup M \cup R'$, and the edges of D, which may be viewed as quintuples $(x_1, x_2, x_3, x_4, x_5)$ (though they are 'officially' pairs of overlapping quadruples $(x_1, x_2, x_3, x_4) \to (x_2, x_3, x_4, x_5)$ as in the proof of Lemma 3.1), tell us precisely which 2-characters are realized in the path. So although some will now be repeated, we have to check that no shorter path through D can realize precisely the same 2-characters. One checks that D has 28 edges. Suppose therefore that P is a path through D traversing precisely the same edges (though not necessarily the same number of times). We shall show that P has length at least 36. We may also view P as a 'multi-digraph' in which the multiplicities with which the edges of D arise in P are also recognized, and in this sense we may talk of the 'in-degree' in(x) and 'out-degree' out(x) of a vertex x. By the usual theory of eulerian paths, if $i$ and $f$ are the initial and final vertices of P, then $i \ne f$ implies that out(i) = in(i) + 1 and in(f) = out(f) + 1; all other vertices x, and also $i$ and $f$ if they are equal, satisfy in(x) = out(x). Furthermore, since $L' = br$, $i$ must equal brrr, brrb, brbr, or brbb, and similarly $f$ must equal rrrb, rbrb, brrb, or bbrb. The digraph D is shown in Fig. 1.
Now we note that the vertex rrbb is therefore an internal vertex of P, so has equal in- and out-degrees in P. Since its out-neighbours (in P) rbbr and rbbb are distinct, it follows that its in-degree is at least 2. Similarly, in(rbrb), out(rbrr), out(brbr), out(bbrr) ≥ 2. In D, each of rrbb and rbrb has only one in-neighbour, so the corresponding edges in P must each appear at least twice. Similarly for the out-neighbours of rbrr, brbr, and bbrr. This already assures us that P has length at least 33. But now we know that in(rbrr) ≥ 3, and as rbrr is internal, also out(rbrr) ≥ 3. Similarly, in(rrbr), in(rrrb) ≥ 3. Since the extra edges thereby assured and contributing to out(rbrr), in(rrbr), and in(rrrb) must be rbrr → brrr, rrrb → rrbr or brrb → rrbr, and rrrr → rrrb or brrr → rrrb, this gives at least 3 extra edges in P, showing that it has length at least 36, as desired.

Future Work
We have really only scratched the surface of this topic, in [1] and here, and a great deal more effort would be required to understand fully the structure of ≡ n -optimal strings for all n, and for all colour set sizes. The method in the final example just given seems very laborious, merely to increase the length by 4, and that is only for two colours and n = 3. Undoubtedly there will be longer examples, requiring more careful checking. We remark that we also believe that one can construct finite strings having no ≡ 3 -equivalent optimal substring. The idea would be to find a string as above obtained by modifying the length 70 example, but such that this time, the path is not optimal, but that any optimal path traversing the same edges as D would have to have them in a different order, so that the optimal string would not actually be a substring of the original one.
We conclude by illustrating the specific problem which applies even to increase the number of colours by 1. We would like to apply the same kind of analysis as for the case of 2 colours, which relied on the subdivision of the string A into L, M, R. We can still do this, and for long enough strings there will be a non-trivial M, comprising those points such that the left and right 2-characters both themselves have a 'middle' in which all 3 colours, red, blue, and green, appear. Since this time 2-characters have length up to 15 (see [1]) the lengths of L and R will usually be a lot longer. The main problem comes about in M however. Last time we were able to pin down the 2-character of a point in M by a sequence of length 5. This time though, the right end of the left 2-character may be rbg, rbgg, rbbgg, rbgbg, rbggb, as well as others obtained by permuting the three colours, and we seem to need the final 4 entries at least to tell which is which, and so we'd have to have sequences of length 7, 8, or 9, in place of 5, that is, not constant. Thus the trick of using an Euler circuit does not seem to apply.