On optimal representatives of finite coloured linear orders

Two structures A and B are n-equivalent if player II has a winning strategy in the n-move Ehrenfeucht-Fraïssé game on A and B. We extend earlier results about n-equivalence for finite coloured linear orders, describing an algorithm for reducing to canonical form under 2-equivalence, and concentrating on the cases of 2 and 3 moves.


Introduction
In [4] we studied the equivalence of finite coloured linear orders up to level n in an Ehrenfeucht-Fraïssé game, written as ≡ n , which means that player II has a winning strategy in this game, as well as making some remarks about the infinite case. We gave some bounds for the minimal representatives in the finite case, and the infinite case for up to 2 moves. These results were extended in [5] to all coloured ordinals, in the monochromatic case giving a precise list of optimal representatives, and in the coloured case giving bounds.
In this paper we return to the finite case, and extend the work of the first paper, by improving the bounds in some instances, and throwing further light on uniqueness of representatives. First we briefly recall the required definitions. A coloured linear ordering is a triple $(A, <, F)$ where $(A, <)$ is a linear order and F is a mapping from A onto a set C of 'colours'. We just write A instead of $(A, <, F)$ provided that the ordering and colouring are clear. In the n-move Ehrenfeucht-Fraïssé game on coloured linear orders A and B players I and II play alternately, I moving first. On each move I picks an element of either structure, and II responds by choosing an element of the other structure. After n moves, I and II between them have chosen elements $a_1, a_2, \dots, a_n$ of A, and $b_1, b_2, \dots, b_n$ of B, and player II wins if the map taking $a_i$ to $b_i$ for each i is an order and colour-isomorphism, and player I wins otherwise. We say that A and B are n-equivalent, and write $A \equiv_n B$, if II has a winning strategy. Then $\equiv_n$ is an equivalence relation having just finitely many n-equivalence classes. An optimal representative is a member of an equivalence class of least possible length.
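The game just defined can be checked by exhaustive search on short strings. The following is a minimal sketch, assuming a finite coloured linear order is written as a Python string of colour letters; the function names `partial_iso` and `player_two_wins` are illustrative and do not come from [4]. The search is exponential in n and is meant only to make the definition concrete.

```python
def partial_iso(a, b, picks):
    """Check that the chosen pairs (i, j) give an order- and colour-isomorphism."""
    for i, j in picks:
        if a[i] != b[j]:                      # colours must agree
            return False
    for i1, j1 in picks:
        for i2, j2 in picks:
            # order and equality must be preserved in both directions
            if (i1 < i2) != (j1 < j2) or (i1 == i2) != (j1 == j2):
                return False
    return True


def player_two_wins(a, b, n, picks=()):
    """True if player II wins the n-move game on strings a and b from `picks`."""
    if not partial_iso(a, b, picks):
        return False
    if n == 0:
        return True
    # Player I moves in a; player II needs a good answer in b.
    for i in range(len(a)):
        if not any(player_two_wins(a, b, n - 1, picks + ((i, j),))
                   for j in range(len(b))):
            return False
    # Player I moves in b; player II needs a good answer in a.
    for j in range(len(b)):
        if not any(player_two_wins(a, b, n - 1, picks + ((i, j),))
                   for i in range(len(a))):
            return False
    return True
```

For instance, `player_two_wins("rrr", "rrrr", 2)` holds while `player_two_wins("rrr", "rrrr", 3)` fails, so three moves separate these two monochromatic orders but two moves do not.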
In [4] we gave upper bounds for the lengths of optimal representatives of $\equiv_n$-classes of m-coloured finite linear orders. Only for the case n = 2 were these bounds exact. We return to this case, describing explicitly the classification of finite m-coloured linear orders up to $\equiv_2$-equivalence (based on the idea of a 'T-configuration' introduced in [4]). From this we are able to read off which equivalence classes are finite or infinite, and provide an algorithm for determining an optimal representative corresponding to any given finite m-coloured linear order. We also show that a finite coloured linear order is $\equiv_2$-optimal if and only if each 1-character (see below for the definition) appears at most once.
The problem for more than 2 moves seems to be quite hard, so we concentrate on the case of 3 moves. The idea is that using a key inductive lemma from [4], we need to understand better how the 2-characters behave, and that is the reason for re-examining the case n = 2 in more detail.
Next we recall the notion of 'character' from [4], and the main result about characters. Assume that we have found representatives for the n-equivalence classes of certain m-coloured linearly ordered sets. We write the representative for A as $[A]_n$. In a coloured linear order A, the n-character of $a \in A$ having colour c is the ordered pair $\langle [A^{<a}]_n, [A^{>a}]_n \rangle$ (where $A^{<a} = \{x \in A : x < a\}$ and $A^{>a} = \{x \in A : x > a\}$). We let $\rho^c_n(A) = \{\langle [A^{<a}]_n, [A^{>a}]_n \rangle : a \in A \text{ is } c\text{-coloured}\}$. Here we shall always include the colour as part of the n-character of a, in which case we write it as $\langle [A^{<a}]_n, [A^{>a}]_n \rangle_c$ (formally this would be an ordered triple).
Theorem 1.1. $A \equiv_{n+1} B$ if and only if $\rho^c_n(A) = \rho^c_n(B)$ for all $c \in C$.
We need the following 'Cutting lemma' from [4].

Lemma 1.2. Let A be a finite m-coloured linear order and let a and b be elements of A with a < b satisfying the following conditions: (i) a and b determine the same n-character, (ii) for every $x \in A$ with $a < x \le b$, there is $y \le a$ having the same n-character as x. Then $A \equiv_{n+1} A \setminus (a, b]$.
Note that this may be applied in a trivial case, namely, that no two consecutive points of an ≡ n+1 -optimal finite string can have equal n-characters.
It is clear from Theorem 1.1 that if in an m-coloured linear order, no two points have the same n-character, then the ordering is $\equiv_{n+1}$-optimal, meaning that it is not (n + 1)-equivalent to any shorter ordering. Based on this, we present a construction of a finite 2-coloured linear order of length 70 in which all points have distinct 2-characters, and which is therefore $\equiv_3$-optimal, and show that 70 is the greatest possible length of an order in which all 2-characters are distinct. We also construct a finite coloured $\equiv_3$-optimal linear order of length 74, in which 2-characters must therefore repeat. It should be possible to find longer examples, but the details would be quite tedious, so giving one of this length is good enough to illustrate the idea. This casts some light on the hypothesis required for the 'cutting lemma': the lemma does not say that we can reduce the length just on the basis of repetition of characters; more is required about what happens in between.
The typical case we have in mind is that when searching for optimal representatives, we start with a possibly long coloured order, and successively reduce it by removing pieces, retaining n-equivalence, till it becomes optimal. The proof of [4] is too indirect to guarantee immediately that the final ordering is a subordering of the one we start with. We therefore extend the material of [4] by showing that for 2-equivalence at any rate, we can guarantee that the optimal representative is contained in the original one; we present an algorithm for achieving this. We believe that this is false for n = 3, and in section 3 explain why.
With regard to the general case, but particularly applied to n = 3, we use directed graphs to help analyze n-equivalence. One method would be to take (n − 1)-characters themselves as vertices of the directed graph, with an arrow going from $\langle X_1, Y_1 \rangle_{c_1}$ to $\langle X_2, Y_2 \rangle_{c_2}$ if for some representatives $x_1, y_1, x_2, y_2$ of $X_1, Y_1, X_2, Y_2$, $x_1 c_1 \equiv_{n-1} x_2$ and $c_2 y_2 \equiv_{n-1} y_1$, where these are the strings obtained from $x_1$, $y_2$ by adding a $c_1$-coloured point on the right, a $c_2$-coloured point on the left respectively. The idea is that in scanning a (long) word from left to right, at each point we can view its (n − 1)-character to left and right, and see how this varies. In practice in what we present here for m = 2, n = 3, we focus just on the 'middle' section of the given string, in which case a simplified directed graph gives all the information we require.

Classification of 2-equivalence classes
In this section we give a lot more detail about the 2-equivalence classes of finite coloured linear orders. In [4] we established the precise value ($m^2 + 2m$) of the least upper bound of the lengths of the optimal representatives of $\equiv_2$-classes. Here we are able, using similar ideas, to give an explicit list of all the $\equiv_2$-classes, from which we can read off, for instance, the length of the optimal representative of each class, and also note which classes are finite or infinite. The key idea here is to use the notion of 'T-configuration' which was introduced in [4].
We fix m as the number of colours. A T-configuration is then defined to be a linear order of the form $T = \{x_i : 1 \le i \le m\} \cup \{y_i : 1 \le i \le m\}$ in which $x_1 < x_2 < \dots < x_m$ and $y_1 > y_2 > \dots > y_m$, and $x_1$ and $y_1$ are the least and greatest members of T respectively. Here all the $x_i$ are therefore distinct, and so are the $y_i$, but it is not ruled out that $x_i = y_j$ for certain i and j. Each T-configuration therefore has size between m and 2m. If $(A, \le, F)$ is a finite coloured linear order having m colours, then there is an associated T-configuration, which is the linear order induced on $\{x_i : 1 \le i \le m\} \cup \{y_i : 1 \le i \le m\}$, where $x_i$ is the least point x of A such that $\{F(z) : z \in A, z \le x\}$ has i elements, and $y_i$ is the greatest point y of A such that $\{F(z) : z \in A, z \ge y\}$ has i elements. Under these circumstances, the T-configuration becomes coloured. However, the same T-configuration may be coloured in several different ways. We remark that not all T-configurations are associated with a coloured linear order. The following lemma explains when this happens.
Lemma 2.1. A T-configuration $\{x_i : 1 \le i \le m\} \cup \{y_i : 1 \le i \le m\}$ is associated with some m-coloured linear order if and only if $x_i \le y_j$ whenever $i + j \le m + 1$.
Proof. First to check the necessity of the given condition, suppose that {x i : 1 ≤ i ≤ m} ∪ {y i : 1 ≤ i ≤ m} is the T -configuration arising from the coloured linear order (A, ≤, F ), and let i + j ≤ m + 1. Let k be greatest such that x i ≤ y k . Then {y l : l > k} are distinctly coloured points lying in (−∞, x i ), which exhibits i − 1 colours. Hence m − k ≤ i − 1, so m + 1 ≤ i + k. We deduce that i + j ≤ i + k and hence j ≤ k, so that x i ≤ y j .
Conversely, assuming the given condition holds, let the T-configuration $T = \{x_i : 1 \le i \le m\} \cup \{y_i : 1 \le i \le m\}$ be given; we have to find a coloured linear order $(A, \le, F)$ such that T is the associated T-configuration. We take A = T, and have to show how the points can be coloured so that $x_i$ is the least point x of A such that $\{F(z) : z \in A, z \le x\}$ has i elements, and $y_i$ is the greatest point y of A such that $\{F(z) : z \in A, z \ge y\}$ has i elements. Let us start by colouring the $x_i$ by distinct colours. Clearly this ensures that $x_i$ is the least point such that $\{F(x_k) : x_k \in A, x_k \le x_i\}$ has i elements. We have to colour the $y_j$ so that no member of $\{F(z) : z \in A, z \le x_i\}$ has a 'new' colour. We assign colours successively to $y_m, y_{m-1}, \dots, y_1$ according to which of the sets $\{x_1\}, (x_1, x_2), \{x_2\}, (x_2, x_3), \dots, \{x_m\}, (x_m, \infty)$ they lie in. Given i, let j be the least such that $y_j < x_{i+1}$ (if any). Then by hypothesis we cannot have $i + 1 + j \le m + 1$, so $i + 1 + j > m + 1$ and hence $j > m - i$. Hence there are at most i values of j such that $y_j < x_{i+1}$.
The colouring is now given as follows. If y j = x i then we let F (y j ) = F (x i ). Otherwise consider colouring all the y j s which lie in (x i , x i+1 ). By the remark just made, there are at most i values of j such that y j < x i+1 , and there are i colours available for {y j : y j < x i+1 }. We have so far used |{j : y j ≤ x i }| of these colours, so the number remaining is i − |{j : y j ≤ x i }| ≥ |{j : y j < x i+1 }| − |{j : y j ≤ x i }| = |{j : x i < y j < x i+1 }|, and these points are coloured in any way using the available colours.
The construction has explicitly ensured that for each i, $x_i$ is the least point such that $(-\infty, x_i]$ is coloured by i colours. To verify the corresponding condition for $y_i$, note that there are certainly exactly i values of $k \le i$ such that $y_k \ge y_i$, and these points are all coloured by distinct colours. Suppose that $x_j \ge y_i$. Then as there are m colours, and all $y_k$ points are distinctly coloured, there is k such that
To specify a finite coloured order up to 2-equivalence, we need to know in addition what colouring T receives (and then call this a coloured T-configuration, though we do not indicate this explicitly in the notation), and which colours arise as the colours of points lying between any two consecutive members of T, and we write the set of colours between u and v as g(u, v). We write $C_{T,g}$ for the set of all finite coloured orders such that T is the associated coloured T-configuration and colours between the points are given by g. Note that not all sets of colours are possible for g(u, v); they are constrained by the $x_i$ and $y_j$. If for ease we write $x_{m+1} = \infty$ and $y_{m+1} = -\infty$ (not coloured) then a point with colour c can be inserted in (
Proof. This relies on Theorem 1.1, which tells us that $A \equiv_2 A'$ if and only if they exhibit the same 1-characters. Let be the coloured T-configurations associated with A and A′, and suppose first that $A \equiv_2 A'$. Thus A and A′ exhibit the same 1-characters. Now by definition of Similarly for $y_j$ and $y'_j$. Next we have to see that and similarly for <. Suppose that $x_i \le y_j$ ($x_i < y_j$ respectively). Then $x_i$ has at least j − 1 colours to the right (at least j respectively), and as it realizes the same character as . We deduce that A and A′ realize the same coloured T-configurations.
To see that they realize the same functions g, let u < v be consecutive members of Then the left and right 1-characters of each member of (u, v) are $\{F(z) : z \le u\}$ and $\{F(z) : z \ge v\}$ respectively, and furthermore, these characters are not realized by any other members of A. Precisely these same left and right characters are realized in (u′, v′), and since the only extra ingredient required to specify the character is the colour of the point, it follows that exactly the same set of colours is realized in
Conversely, supposing that A and A′ both lie in the same $C_{T,g}$, we see that they both realize the same 1-characters, so are 2-equivalent.
Next we give an algorithm for determining an optimal member of the 2-equivalence class of a finite coloured linear order A. It would be possible to do this inductively on the number of colours, and since we shall require them later anyway, we define the subsets L, R, and M of A, for 'left', 'right', and 'middle'. A point lies in L if there are fewer than m colours occurring to its left, and is in R if there are fewer than m colours occurring to its right. The remainder (if any) is M. Thus in the above notation, $L = (-\infty, x_m)$ and $R = (y_m, \infty)$. The induction would be based on the fact that each of L and R exhibits only m − 1 colours. There are some (minor) complications in the case where L and R overlap however, so the following method, based on Theorem 2.2, is preferable.
From A we first evaluate $x_i$ and $y_j$. Then we replace each interval (u, v) by one in which each of its colours only arises once. This leads to the following result.
If m = 1 with colour r, then there are two possible T-configurations, with $x_1 = y_1$ or $x_1 < y_1$. The former gives us just a singleton r (since there is no interval of consecutive points into which new elements can be inserted), and the latter rr which is a singleton $\equiv_2$-class, and rrr, which lies in the infinite
If m = 2, these are the possible T-configurations, with the corresponding singleton $\equiv_2$-classes given:
$x_1 < x_2 < y_2 < y_1$: rbrb, rbbr, brbr, brrb,
$x_1 < x_2 = y_2 < y_1$: rbr, brb,
$x_1 < y_2 < x_2 < y_1$: rrbb, bbrr,
$x_1 = y_2 < x_2 < y_1$: rbb, brr,
$x_1 < y_2 < x_2 = y_1$: rrb, bbr.
Including the allowed insertions, where we write $r^k$ for an arbitrary sequence of k rs (k ≥ 0), similarly $b^l$, and w(r, b) for an arbitrary string of rs and bs, this gives rise to the following list:
for rbrb: $r\, r^k\, b\, w(r, b)\, r\, b^l\, b$, 16 $\equiv_2$-classes (two options for each of k and l, and four for w(r, b)), similarly for brbr, rbbr and brrb,
for rbr: $r\, r^k\, b\, r^l\, r$, 4 $\equiv_2$-classes, similarly for brb,
for rrbb: $r\, r^k\, r\, b\, b^l\, b$, 4 $\equiv_2$-classes, similarly for bbrr,
for rbb: $r\, b\, b^l\, b$, 2 $\equiv_2$-classes, similarly for brr; rrb and bbr are similar to rbb,
for rb: just one $\equiv_2$-class, and br is similar.
This gives a total of 90 $\equiv_2$-classes in which two colours appear. Note that the optimal representative of each class is unique, except when there is a 'middle' section in which both colours appear. For instance, rrbrbrbb $\equiv_2$ rrbbrrbb, though each is of optimal length.
If m = 3, there are 26 possible T-configurations, of which all but four fulfil the stipulations of Lemma 2.1 (the four which do not are given by $x_1 \le y_3 < y_2 < x_2 < x_3 \le y_1$). To list even these is quite laborious, and when their possible colourings are taken into account, as well as the possible insertions, it is seen that the list increases dramatically over the case m = 2. For instance, for the T-configuration $x_1 < x_2 < x_3 < y_3 < y_2 < y_1$ there are 36 ways of colouring the points, and for the rbg colouring of the $x_i$ and $y_i$ points, the $\equiv_2$-classes are of the form $r\, r^{i_1}\, b\, r^{i_2} b^{j_2}\, g\, r^{i_3} b^{j_3} g^{k_3}\, r\, b^{j_4} g^{k_4}\, b\, g^{k_5}\, g$ where the indices are all 0 or 1, giving $2^9$ possibilities, so even for this case there are $36 \times 2^9 = 18432$ $\equiv_2$-classes.
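The counts of T-configurations quoted above can be reproduced by enumeration. The sketch below is one possible encoding, not taken from [4]: a T-configuration is represented as a list of blocks read from least to greatest, where each block holds one label or one x-label and one y-label that coincide, and the names `weak_merges`, `t_configurations` and `satisfies_lemma_condition` are illustrative. The condition tested is the one used in the proof of Lemma 2.1, namely that x_i ≤ y_j whenever i + j ≤ m + 1.

```python
def weak_merges(xs, ys):
    """Yield all interleavings of two chains, allowing one x and one y to coincide."""
    if not xs and not ys:
        yield []
        return
    if xs:
        for rest in weak_merges(xs[1:], ys):
            yield [{xs[0]}] + rest
    if ys:
        for rest in weak_merges(xs, ys[1:]):
            yield [{ys[0]}] + rest
    if xs and ys:
        for rest in weak_merges(xs[1:], ys[1:]):
            yield [{xs[0], ys[0]}] + rest


def t_configurations(m):
    """Yield T-configurations on m colours: x_1 is least and y_1 is greatest."""
    xs = [("x", i) for i in range(1, m + 1)]   # x_1 < x_2 < ... < x_m
    ys = [("y", j) for j in range(m, 0, -1)]   # y_m < ... < y_2 < y_1
    for blocks in weak_merges(xs, ys):
        if ("x", 1) in blocks[0] and ("y", 1) in blocks[-1]:
            yield blocks


def satisfies_lemma_condition(blocks, m):
    """Check x_i <= y_j for all i, j with i + j <= m + 1."""
    pos = {label: k for k, block in enumerate(blocks) for label in block}
    return all(pos[("x", i)] <= pos[("y", j)]
               for i in range(1, m + 1)
               for j in range(1, m + 1)
               if i + j <= m + 1)
```

With this encoding, `t_configurations(3)` produces 26 configurations, of which 22 pass `satisfies_lemma_condition`, in agreement with the figures given for m = 3; for m = 2 all 6 configurations pass.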
We remark that the easiest way to demonstrate that a finite string is optimal in its ≡ 2 -class is to show that all its points have distinct 1-characters (then appeal to Theorem 1.1), and in fact this suffices for all 90 strings for m = n = 2, as one sees by inspection. The same holds for any number of colours (though not with greater values of n, as we see in the next section).
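The inspection mentioned here can also be carried out mechanically. The sketch below assumes, following Theorem 1.1 with n = 1, that two strings are 2-equivalent exactly when they realize the same 1-characters, and that a 1-character can be recorded as the set of colours to the left, the set of colours to the right, and the colour of the point. The function names are illustrative, and the bound `max_len=8` is the value m² + 2m for m = 2.

```python
from itertools import product


def one_characters(s):
    """Set of 1-characters of s: (colours to the left, colours to the right, colour)."""
    return frozenset(
        (frozenset(s[:i]), frozenset(s[i + 1:]), c) for i, c in enumerate(s)
    )


def two_classes(colours="rb", max_len=8):
    """Map each realized set of 1-characters to a shortest string realizing it."""
    shortest = {}
    for n in range(1, max_len + 1):            # shorter strings are seen first
        for word in product(colours, repeat=n):
            s = "".join(word)
            if set(s) == set(colours):          # every colour must appear
                shortest.setdefault(one_characters(s), s)
    return shortest
```

Calling `two_classes()` yields 90 classes, and in each of them the stored shortest string has as many points as it has distinct 1-characters, which is the property described in the remark.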
Theorem 2.5. For any m, no $\equiv_2$-optimal m-coloured string realizes the same 1-character more than once.
Proof. Suppose on the contrary that $(A, <, F)$ is $\equiv_2$-optimal but a < b realize the same 1-character (and have the same colour). We show that $A \equiv_2 A \setminus \{b\}$, contradicting optimality of A. This is similar to the Cutting Lemma, Lemma 1.2. We just need to show that A and $A' = A \setminus \{b\}$ realize the same 1-characters (with colours). For this we note that if $x \ne b$, then x realizes the same 1-character in A and A′, and if x = b then x realizes the same 1-character in A as a does in A′. In each case, the colours of x and its replacement are equal, so as 1-characters are entirely determined by the sets of colours occurring on left and right, we just need to look at the colours occurring in $A^{<x}$, $(A')^{<x}$, $A^{>x}$, $(A')^{>x}$, and in the second case, in $A^{<b}$, $(A')^{<a}$, $A^{>b}$, $(A')^{>a}$. If x < b then $A^{<x} = (A')^{<x}$, and the colours in $A^{>x}$ and $(A')^{>x}$ could only possibly differ on F(b), but as $A^{>a} \equiv_1 A^{>b}$, and $b \in A^{>a}$, there is a point > a coloured F(b), and therefore also a point > b (and hence > x) coloured F(b). A similar argument applies if x > b (using $A^{<a} \equiv_1 A^{<b}$). Finally, if x = b then we can see that $A^{<b} \equiv_1 A^{<a} = (A')^{<a}$ and $A^{>b} \equiv_1 A^{>a} \equiv_1 (A')^{>a}$ (since these last two exhibit the same colours).
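The argument in this proof suggests a simple reduction procedure, sketched below under the assumption that orders are written as strings of colour letters. It repeatedly discards a point whose 1-character has already been realized by an earlier point; since the proof shows that the remaining points keep their 1-characters, this amounts to keeping the first point for each 1-character. This is a sketch based on the proof of Theorem 2.5 rather than on the T-configuration description of the algorithm, and the names `one_character` and `reduce_mod_2` are illustrative.

```python
def one_character(s, i):
    """1-character of position i: (colours to the left, colours to the right, colour)."""
    return (frozenset(s[:i]), frozenset(s[i + 1:]), s[i])


def reduce_mod_2(s):
    """Keep the first point realizing each 1-character of s."""
    seen = set()
    kept = []
    for i in range(len(s)):
        ch = one_character(s, i)
        if ch not in seen:
            seen.add(ch)
            kept.append(s[i])
    return "".join(kept)
```

For example, `reduce_mod_2("rbrbrbrb")` returns `"rbrbrb"`, which is obtained from the input by deleting points, realizes the same six 1-characters, and realizes each of them once.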

3-equivalence classes
To help analyze the behaviour of strings up to 3-equivalence, we introduce various labelled directed graphs to keep track of the transitions between 2-characters as we pass through the string. The basic idea is that if $a_1 a_2 a_3 \dots a_k$ is a string over an alphabet of m colours, then a node of the digraph will be taken to be a 2-character of the form $\langle x, y \rangle_c$ and we include a directed edge from $\langle x_1, y_1 \rangle_{c_1}$ to $\langle x_2, y_2 \rangle_{c_2}$ provided that for some strings This corresponds to the fact that the string $a_1 a_2 a_3 \dots a_k$ gives rise to a path In practice, retaining all of both co-ordinates is too cumbersome, and we use an abbreviated string which, at any rate for points in the 'middle', suffices to describe the 2-character. The object here is to obtain some lower bounds on g(m, 3) in the notation of [4], by producing optimal strings which are as long as possible. The easiest way in which optimality can be assured is to arrange that all points have distinct 2-characters. That this is not necessary for optimality is remarked later (by contrast with Theorem 2.5 for $\equiv_2$).
To illustrate this, we show how for m = 2 we can construct an optimal string A of length 70. Certain features of this seem to depend heavily on the specifics of this case, and we are unsure of how to generalize. The string is composed of three sections, L, M , and R (of lengths 19, 32, and 19) with L < M < R. The idea is that, by the time we get to the middle section M , the 2-character has sufficiently 'settled down', to enable us to handle substrings rather uniformly. To describe what L, M , and R are in this case, we emphasize that this subdivision applies just to the 3-move case. The subdivision used in section 2 for the 2-move case is also used, but on the left and right subsets, where inductively, and using Theorem 1.1 we need to look at 2-characters. To avoid confusion, we use L, M , and R to stand only for the subdivision of the whole string, and if we need to refer to the subdivision of a left or right segment, we use the terms 'left', 'middle', and 'right'. The definition here is that M comprises all those points whose left and right 2-characters both themselves have non-empty middle sections. Since it is clear that M so defined is convex, we can then let L and R be the subsets of its complement which are to its left, right respectively.
Since we shall take L = rrrrrrbbbbbbrbbbbbr, the discussion given in the previous section shows that for any a in M, $[A^{<a}]_2$ begins rrb, and it ends with rb, rbb, br, or brr (since we must have $x_1 < x_2 < y_2 < y_1$), and as $(x_2, y_2)$ contains points of both colours, the middle may be taken as rb. This means that we can essentially describe the left 2-character of a point a in M by the ending of $A^{<a}$ (and its colour). Although the ending will actually have length 2 or 3, we can tell what it is just from its last two points. Taking $[A^{>a}]_2$ into account in a similar way, a point is entirely characterized by just 5 entries, two on the left, two on the right, and the colour of a in the middle. The following general lemma is invoked here just for m = 2 and k = 5, but may be more widely applicable.
Proof. This method, using an eulerian circuit, was pointed out by P J Cameron.
Given this lemma, we can form a binary string of length 32, such that cyclically ordered, every 5-element string arises exactly once, and this may be taken explicitly as M = rbrbrrbbbrbrbbrbbbbbrrrrrbrrrbbr. To form our sequence of length 70, we precede M by L = rrrrrrbbbbbbrbbbbbr and succeed it by R = rbbbbbrbbbbbbrrrrrr (which is L in reverse, easing some verifications). Let us write this string as a 1 a 2 a 3 . . . a 70 . We verify that all 70 points have distinct 2-characters.
First we can see that for every point a of M ∪ R, in A <a , x 1 = a 1 , x 2 = a 7 , y 1 = a i and y 2 = a j where i ≥ 19 and i > j ≥ 18, so its left 2-character begins rrbrb, and ends rb, rbb, br, or brr. However, if a ∈ L, A <a has the form r i b j rb k , r i b j r, r i b j , or r i for some i, j, k, so its left 2-character is not of this form. By symmetry, we can see that no point of L ∪ M shares a right 2-character with a point of R. We now treat each of L and M individually (and R is similar to L).
Finally, we can see that all points of M have distinct 2-characters since they are midpoints of distinct 5-element strings. Notice that we have arranged things so that $a_{18} a_{19} a_{20} a_{21} = a_{50} a_{51} a_{52} a_{53}$, which means that distinctness of the 5-element strings persists even at the 'ends'.
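The verification for the string of length 70 can be repeated by machine. The sketch below assumes, by Theorem 1.1, that the 2-equivalence class of a string may be identified with its set of 1-characters, so that the 2-character of a point is recorded as the pair of such sets for its left and right parts together with its colour. The helper names are illustrative; the strings L, M and R are those given in the text.

```python
def one_characters(s):
    """Set of 1-characters of s, used as a stand-in for the class [s]_2."""
    return frozenset(
        (frozenset(s[:i]), frozenset(s[i + 1:]), c) for i, c in enumerate(s)
    )


def two_characters(s):
    """2-character of each point of s, in order from left to right."""
    return [(one_characters(s[:i]), one_characters(s[i + 1:]), c)
            for i, c in enumerate(s)]


L = "rrrrrrbbbbbbrbbbbbr"
M = "rbrbrrbbbrbrbbrbbbbbrrrrrbrrrbbr"
R = L[::-1]                      # R is L in reverse
A = L + M + R

# All cyclic windows of length 5 in M.
cyclic = {(M + M[:4])[i:i + 5] for i in range(len(M))}

# Windows of length 5 in A centred on the points of M (0-based indices).
centred = {A[i - 2:i + 3] for i in range(len(L), len(L) + len(M))}

distinct_two_characters = len(set(two_characters(A)))
```

Here `cyclic` contains all 32 binary strings of length 5, `centred` coincides with it because the last two points of L and the first two of R match the ends of M, and `distinct_two_characters` equals 70.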
This shows that g(2, 3), which is defined in [4] to be the maximum of the lengths of optimal representatives of finite 2-coloured strings under $\equiv_3$, is at least 70. The upper bound given in [4] is clearly absurdly high, but even so, 70 is a big increase on the optimal length for m = n = 2, which is 8.
Let us see that this is the best we can do by these methods, in which optimality is guaranteed by distinctness of the 2-characters. We can always subdivide a given 2-coloured finite linear order into 3 sections, L, M , and R, where in M , both left and right 2-characters have rb as 'middle'. Clearly M is convex, so we may take L and R to be the sets of points to the left, right of M respectively. If a ∈ M , then the left and right 2-characters of a must have at least 6 entries, and as in the discussion above, the 2-characters of the points of M are entirely determined by the 5-element strings of which they are mid-points. Hence |M | ≤ 32. Now consider what L can be. Without loss of generality, suppose it begins with r. If it has an initial segment of the form r i b j r k b l r p b q with positive exponents, then the next point does not lie in L, and similarly, q = 1 (since otherwise the final point does not lie in L), and by similar arguments, j = l = 1, giving L = r i br k br p b. If i ≥ 7 then the fourth and fifth points of A realize the same 2-characters, contrary to assumption. Hence i ≤ 6. Similarly, k, p ≤ 6. If k = 6, then 2-characters are repeated for the middle two entries in that block. Hence k ≤ 5, and similarly p ≤ 5. Hence |L| ≤ 6 + 5 + 5 + 3 = 19 (and one can check that rrrrrrbrrrrrbrrrrrb is possible).
If $L = r^i b^j r^k$ then i ≤ 6, j ≤ 7, k ≤ 6 so |L| ≤ 19. If $L = r^i b^j$ then |L| ≤ 14 and if $L = r^i$ then |L| ≤ 7. It follows similarly that |R| ≤ 19, and hence |A| ≤ 19 + 32 + 19 = 70.
Finally we remark that in $\equiv_3$-optimal strings, 2-characters may be repeated, and using this we are able to construct a longer $\equiv_3$-optimal 2-coloured string. We first give a small example. Consider A = rbrbrbrbrbrbrbr, which has length 15, is a palindrome (reading the same forwards and backwards), and whose 7th and 9th entries realize the same 2-character (though apart from this, all 2-characters are distinct). To see that A is $\equiv_3$-optimal, suppose that $B \equiv_3 A$, and we show that B has length at least 15. Since A realizes 14 2-characters, so does B, and hence it has length at least 14. Now A realizes the 2-character $\langle rbrb, brrbbr \rangle_r$, so B must realize this as well, and as rbrb lies in a singleton $\equiv_2$-class, B begins rbrbr. Similarly, B realizes $\langle rbrbr, rbrbbr \rangle_b$, so as the $\equiv_2$-class of rbrbr is $\{rbr^p br : p \ge 1\}$, B begins with $rbr^p brb$ for some p ≥ 1. Since B begins with rbrbr it follows that p = 1, and that B begins rbrbrb. Similarly B ends with brbrbr. The other two 2-characters realized by B are $\chi_1 = \langle rbrbrb, brrbbr \rangle_r$ and $\chi_2 = \langle rbrbbr, rbrbbr \rangle_b$. Since $B^{<b_7}$ = rbrbrb, $b_7$ must realize $\chi_1$, and so $b_7 = r$. Similarly, the 7th point from the right realizes $\chi_1$ and is r. Since |B| ≥ 14, these two points are distinct, and as B also realizes $\chi_2$, there must be another point between them, so B has length at least 15.
We now present a $\equiv_3$-optimal string of length 74 having the same L and R as in the example given of length 70, but with longer M: The subdivision into L, M, R is indicated. By the previous discussion, this must have repeated 2-characters in M. We let L′ = br (the end of L) and R′ = rb (the beginning of R). To verify that A is $\equiv_3$-optimal we first note that by the previous arguments, the 2-characters of all elements of L ∪ R are uniquely determined and different from all those occurring in M, so any 3-equivalent string must begin and end in this way. Now A determines a path through the digraph D having as vertices strings of length 4 over {r, b} arising as convex subsets of L′ ∪ M ∪ R′, and the edges of D tell us precisely which 2-characters are realized in the path. So although some will now be repeated, we have to check that no shorter path through D can realize precisely the same 2-characters. One checks that D has 28 edges. Suppose therefore that P is a path through D traversing precisely the same edges (though not necessarily the same number of times). We shall show that P has length at least 36. We may also view P as a 'multi-digraph' in which the multiplicities with which the edges of D arise in P are also recognized, and in this sense we may talk of the 'in-degree' in(x) and 'out-degree' out(x) of a vertex x. By the usual theory of eulerian paths, if i and f are the initial and final vertices of P, then i ≠ f implies that out(i) = in(i) + 1 and in(f) = out(f) + 1; all other vertices x, and also i and f if they are equal, satisfy in(x) = out(x). Furthermore, since L′ = br, i must equal brrr, brrb, brbr, or brbb, and similarly f must equal rrrb, rbrb, brrb, or bbrb. The digraph D is shown in figure 1.

Figure 1: The digraph D.
Now we note that the vertex rrbb is therefore an internal vertex of P, so has equal in- and out-degrees in P. Since its out-neighbours (in P) rbbr and rbbb are distinct, it follows that its in-degree is at least 2. Similarly, in(rbrb), out(rbrr), out(brbr), out(bbrr) ≥ 2. In D, each of rrbb and rbrb has only one in-neighbour, so the corresponding edges in P must each appear at least twice. Similarly for the out-neighbours of rbrr, brbr, and bbrr. This already assures us that P has length at least 33. But now we know that in(rbrr) ≥ 3, and as rbrr is internal, also out(rbrr) ≥ 3. Similarly, in(rrbr), in(rrrb) ≥ 3. Since the extra edges thereby assured and contributing to out(rbrr), in(rrbr), and in(rrrb) must be rbrr → brrr, rrrb → rrbr or brrb → rrbr, and rrrr → rrrb or brrr → rrrb, this gives at least 3 extra edges in P, showing that it has length at least 36, as desired.

Future work
We have really only scratched the surface of this topic, in [4] and here, and a great deal more effort would be required to understand fully the structure of ≡ n -optimal strings for all n, and for all colour set sizes. The method in the final example just given seems very laborious, merely to increase the length by 4, and that is only for two colours and n = 3. Undoubtedly there will be longer examples, requiring more careful checking. We remark that we also believe that one can construct finite strings having no ≡ 3 -equivalent optimal substring. The idea would be to find a string as above obtained by modifying the length 70 example, but such that this time, the path is not optimal, but that any optimal path traversing the same edges as D would have to have them in a different order, so that the optimal string wouldn't actually be a substring of the original one.
We conclude by illustrating the specific problem which arises even when the number of colours is increased by 1. We would like to apply the same kind of analysis as for the case of 2 colours, which relied on the subdivision of the string A into L, M, R. We can still do this, and for long enough strings there will be a non-trivial M, comprising those points such that the left and right 2-characters both themselves have a 'middle' in which all 3 colours, red, blue, and green, appear. Since this time 2-characters have length up to 15 (see [4]), L and R will usually be a lot longer. The main problem comes about in M however. Last time we were able to pin down the 2-character of a point in M by a sequence of length 5. This time though, the right end of the left 2-character may be rbg, rbgg, rbbgg, rbgbg, rbggb, as well as others obtained by permuting the three colours, and we seem to need the final 4 entries at least to tell which is which, and so we would have to have sequences of length 7, 8, or 9, in place of 5, that is, not of constant length. Thus the trick of using an Euler circuit doesn't seem to apply.