1 Introduction

Consider two words \(\mathtt{abba}\) and \(\mathtt{b}\). It is possible to concatenate (several copies of) them as \(\mathtt{b}\cdot \mathtt{abba} \cdot \mathtt{b}\), and obtain a power of a third word, namely the square \(\mathtt{bab}\cdot \mathtt{bab}\) of \(\mathtt{bab}\). In this paper, we completely describe all the ways this can happen for two words, and formalize the description in Isabelle/HOL.

The corresponding theory has a long history. The question can be formulated as solving equations in three variables of the special form \(W(x,y) = z^\ell \), where the left-hand side is a sequence of x’s and y’s, and \(\ell \ge 2\). The seminal result in this direction is the paper by R. C. Lyndon and M.-P. Schützenberger [10] from 1962, which solves the equation \(x^jy^k = z^\ell \) with \(2 \le j,k,\ell \) in the more general setting of free groups. It was followed, in 1967, by a partial answer to our question by A. Lentin and M.-P. Schützenberger [9]. A complete characterization of monoids generated by three words was provided by L. G. Budkina and Al. A. Markov in 1973 [4]. The characterization was later, in 1976, reproved in a different way by Lentin’s student J.-P. Spehner in his Ph.D. thesis [14], which even explicitly mentions the answer to the present question. See also a comparison of the two classifications by T. Harju and D. Nowotka [7]. In 1985, the result was again reproved by E. Barbin-Le Rest and M. Le Rest [1], this time specifically focusing on our question. Their paper contains a characterization of binary interpretations of a square as a crucial tool. The latter combinatorial result is interesting on its own, but is very little known. As far as we know, the proof is not available in English; moreover, it has to be reconstructed from Théorème 2.1 and Lemme 3.1 in [1], and it is long, technical and loosely structured, with many intuitive steps that have to be clarified. It is symptomatic, for example, that Maňuch [11] cites the claim as essentially equivalent to his desired result but nevertheless provides a different, shorter but similarly technical proof.

The fact that several authors opted to provide their own proofs of an already known result, and that even a weaker result was republished as new, shows that the existing proof was not considered sufficiently convincing and approachable. This makes the topic a perfect candidate for formalization. The proof we present here naturally contains some ideas of the proof from [1] but is significantly different. Our main objective was to follow the basic methodological requirement of a good formalization, namely to identify the claims that are needed in the proof and to formulate them as separate lemmas, as generally as possible, so that they can be reused not only within the proof but also later. Moreover, the formalization naturally forced us to consider carefully the overall strategy of the proof (which is rather lost behind technical details in published works on this topic). Under Isabelle’s pressure we eventually arrived at a hopefully clear proof structure, which includes a simple but, we believe, innovative use of the idea of “gluing” words. The analysis of the proof is therefore another, and we believe the most important, contribution of our formalization, in addition to the mere certainty that there are no gaps in the proof.

In addition, we provide a complete parametric solution of the equation \(x^jy^k = z^\ell \) for arbitrary j, k and \(\ell \), a classification which is not very difficult, but perhaps too intricate to be useful in mere unverified paper form.

The formalization presented here is an organic part of a larger project formalizing combinatorics of words (see an introductory description in [8]). We are not aware of a similar formalization project in any proof assistant. The existence of the underlying library, which in turn extends the theories “List” and “HOL-Library.Sublist” from the standard Isabelle distribution, critically contributes to a smooth formalization which comes fairly close to the way a human paper proof would look, outsourcing technicalities to the (reusable) background. We accompany claims in this text with the names of their formalized counterparts.

2 Basic Facts and Notation

Let \(\varSigma \) be an arbitrary set. Lists (i.e. finite sequences) \([x_1,x_2,\dots ,x_n]\) of elements \(x_i \in \varSigma \) are called words over \(\varSigma \). The set of all words over \(\varSigma \) is usually denoted as \({\varSigma }^{*}\), using the Kleene star. A notorious ambivalence of this notation is related to the situation when we consider a set of words \(X \subset {\varSigma }^{*}\), and are interested in lists over X. They should be denoted as elements of \(X^*\). However, \(X^*\) usually means something else (in the theory of rational languages), namely the set of all words in \({\varSigma }^{*}\) generated by the set X. To avoid the confusion, we will therefore follow the notation used in the formalization in Isabelle, and write \(\texttt {lists}\, X\) instead, to make clear that the entries of an element of \(\texttt {lists}\, X\) are themselves words. In order to further help to distinguish words over the basic alphabet from lists over a set of words, we shall use boldface variables for the latter. In particular, it is important to keep in mind the difference between a letter a and the word [a] of length one, a distinction which is usually glossed over lightly in the literature on combinatorics on words. The set of words over \(\varSigma \) generated by X is then denoted as \(\left\langle X \right\rangle \). The (associative) binary operation of concatenation of two words u and v is denoted by \(u \cdot v\). We prefer this algebraic notation to Isabelle’s original @. Moreover, we shall often omit the dot as usual. If \(\mathbf {u}= [x_1,x_2,\ldots , x_n] \in \texttt {lists}\, X\) is a list of words, then we write \(\texttt {concat}\, \mathbf {u}\) for \(x_1\cdot x_2 \cdots x_n\). We write \(\varepsilon \) for the empty list, and \(u^k\) for the concatenation of k copies of u (we use \(u^{\mathtt{@}}k\) in the formalization). 
We write \(u \le _p v\), \(u <_p v\), \(u \le _s v\), \(u <_s v\), and \(u \le _f v\) to denote that u is a prefix, a strict prefix, a suffix, a strict suffix, and a factor (that is, a contiguous sublist) of v, respectively. A word is primitive if it is nonempty and not a power of a shorter word. Otherwise, we call it imprimitive. Each nonempty word w is a power of a unique primitive word \(\rho \, w\), its primitive root. A nonempty word r is a periodic root of a word w if \(w \le _p r \cdot w\). This is equivalent to w being a prefix of the right infinite power of r, denoted \(r^\omega \). Note that we deal with finite words only, and we use the notation \(r^\omega \) only as a convenient shortcut for “a sufficiently long power of r”. Two words u and v are conjugate, written \(u \sim v\), if \(u = rq\) and \(v=qr\) for some words r and q. Note that conjugation is an equivalence whose classes are also called cyclic words. A word u is a cyclic factor of w if it is a factor of some conjugate of w. A set of words X is a code if its elements do not satisfy any nontrivial relation, that is, if they are a basis of a free semigroup. For a two-element set \(\{x,y\}\), this is equivalent to x and y being non-commuting, i.e., \(xy\ne yx\), and also to \(\rho \, x \ne \rho \, y\). An important characterization of freeness of a semigroup S of words is the stability condition, namely the implication \(u,v,uz,zv \in S \Longrightarrow z \in S\). The longest common prefix of u and v is denoted by \(u \wedge _p v\). If \(\{x,y\}\) is a (binary) code, then \((x \cdot w) \wedge _p (y \cdot w') = xy\wedge _p yx\) for any sufficiently long \(w,w'\in \left\langle \{x,y\} \right\rangle \). We explain some elementary facts from combinatorics on words used in this article in more detail in Sect. 8.
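To make these notions concrete, the following Python sketch (ours, independent of the Isabelle formalization) implements the primitive root, conjugacy and the periodic-root test on plain strings, and checks them on the introductory example; all function names are our own.

```python
# Executable counterparts (ours) of the basic notions on plain strings.
def primitive_root(w: str) -> str:
    """Shortest r with w = r^k; equals w itself iff w is primitive."""
    for d in range(1, len(w) + 1):
        if len(w) % d == 0 and w[:d] * (len(w) // d) == w:
            return w[:d]
    raise ValueError("the empty word has no primitive root")

def is_primitive(w: str) -> bool:
    return len(w) > 0 and primitive_root(w) == w

def conjugate(u: str, v: str) -> bool:
    """u ~ v iff u = rq and v = qr for some words r, q."""
    return len(u) == len(v) and v in u + u

def is_periodic_root(r: str, w: str) -> bool:
    """r is a periodic root of w iff w is a prefix of r . w."""
    return len(r) > 0 and (r + w).startswith(w)

# The introductory example: b . abba . b = (bab)^2.
x, y = "abba", "b"
assert y + x + y == "bab" * 2
assert not is_primitive(y + x + y) and primitive_root(y + x + y) == "bab"
assert conjugate("abba", "baab") and is_periodic_root("ab", "ababa")
```

The conjugacy test uses the standard observation that v is a conjugate of u exactly when v is a factor of \(u \cdot u\) of length \(\left| u \right| \).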

3 Main Theorem

Let us introduce the central definition of the paper.

Definition 1

We say that a set X of words is primitivity preserving if there is no word \(\mathbf {w}\in \texttt {\texttt {lists}} \, X\) such that

  • \(\left| \mathbf {w} \right| \ge 2\);

  • \(\mathbf {w}\) is primitive; and

  • \(\texttt {\texttt {concat}} \, \mathbf {w}\) is imprimitive.

Note that our definition does not take into account singletons \(\mathbf {w}= [x]\). In particular, X can be primitivity preserving even if some \(x \in X\) is imprimitive. Nevertheless, in the binary case, we will also provide some information about the cases when one or both elements of the code have to be primitive.

In [12], V. Mitrana formulates the primitivity of a set in terms of morphisms, and shows that X is primitivity preserving if and only if it is the minimal set of generators of a “pure monoid”, cf. [3, p. 276]. This brings about a wider concept of morphisms preserving a given property, most classically square-freeness, see for example a characterization of square-free morphisms over three letters by M. Crochemore [5].

The target claim of our formalization is the following characterization of words witnessing that a binary code is not primitivity preserving:

Theorem 1

(bin_imprim_code). Let \(B = \{x,y\}\) be a code that is not primitivity preserving. Then there are integers \(j \ge 1\) and \(k \ge 1\), with \(k = 1\) or \(j = 1\), such that the following conditions are equivalent for any \(\mathbf {w}\in \texttt {\texttt {lists}} \, B\) with \(\left| \mathbf {w} \right| \ge 2\):

  • \(\mathbf {w}\) is primitive, and \(\texttt {\texttt {concat}} \, \mathbf {w}\) is imprimitive;

  • \(\mathbf {w}\) is conjugate with \([x]^j[y]^k\).

Moreover, assuming \(\left| y \right| \le \left| x \right| \),

  • if \(j \ge 2\), then \(j=2\) and \(k=1\), and both x and y are primitive;

  • if \(k \ge 2\), then \(j=1\) and x is primitive.


Proof. Let \(\mathbf {w}\) be a word witnessing that B is not primitivity preserving. That is, \(\left| \mathbf {w} \right| \ge 2\), \(\mathbf {w}\) is primitive, and \(\texttt {concat}\, \mathbf {w}\) is imprimitive. Since \([x]^j[y]^k\) and \([y]^k[x]^j\) are conjugate, we can suppose, without loss of generality, that \(\left| y \right| \le \left| x \right| \).

First, we want to show that \(\mathbf {w}\) is conjugate with \([x]^j[y]^k\) for some \(j,k \ge 1\) such that \(k = 1\) or \(j = 1\). Since \(\mathbf {w}\) is primitive and of length at least two, it contains both x and y. If it contains one of these letters exactly once, then \(\mathbf {w}\) is clearly conjugate with \([x]^j[y]^k\) for \(j = 1\) or \(k = 1\). Therefore, the difficult part is to show that no primitive \(\mathbf {w}\) with \(\texttt {concat}\, \mathbf {w}\) imprimitive can contain both letters at least twice. This is the main task of the rest of the paper, which is finally accomplished by Theorem 4 claiming that words that contain at least two occurrences of x are conjugate with [xxy]. To complete the proof of the first part of the theorem, it remains to show that j and k do not depend on \(\mathbf {w}\). This follows from Lemma 1.

Note that the imprimitivity of \(\texttt {concat}\, \mathbf {w}\) induces the equality \(x^jy^k = z^\ell \) for some z and \(\ell \ge 2\). The already mentioned seminal result of Lyndon and Schützenberger shows that j and k cannot be simultaneously at least two, since otherwise x and y commute. For the same reason, considering its primitive root, the word y is primitive if \(j \ge 2\). Similarly, x is primitive if \(k \ge 2\). The primitivity of x when \(j = 2\) is a part of Theorem 4.    \(\square \)

We start by giving a complete parametric solution of the equation \(x^jy^k = z^\ell \) in the following theorem. This will eventually yield, after the proof of Theorem 1 is completed, a full description of not primitivity preserving binary codes. Since the equation is mirror symmetric, we omit symmetric cases by assuming \(|y| \le |x|\).

Theorem 2

(LS_parametric_solution). Let \(\ell \ge 2\), \(j,k \ge 1\) and \(|y| \le |x|\).

The equality \(x^jy^k = z^\ell \) holds if and only if one of the following cases takes place:

  A.

    There exist a word r and integers \(m,n,t \ge 0\) such that

    $$\begin{aligned} mj+nk = t \ell ,&\quad \text{ and } \\ x = r^m, \quad y = r^n, \quad z = r^t;&\end{aligned}$$
  B.

    \(j = k = 1\) and there exist non-commuting words r and q, and integers \(m,n \ge 0\) such that

    $$\begin{aligned} m+n+1 = \ell ,&\quad \text{ and } \\ x = (rq)^mr, \quad y = q(rq)^{n}, \quad z = rq;&\end{aligned}$$
  C.

    \(j = \ell = 2\), \(k = 1\) and there exist non-commuting words r and q and an integer \(m \ge 2\) such that

    $$x = (rq)^m r, \quad y = qrrq, \quad z = (rq)^mrrq;$$
  D.

    \(j = 1\) and \(k \ge 2\) and there exist non-commuting words r and q such that

    $$x = (qr^k)^{\ell -1}q, \quad y = r, \quad z = qr^k;$$
  E.

    \(j = 1\) and \(k \ge 2\) and there exist non-commuting words r and q and an integer \(m \ge 1\) such that

    $$x = (qr(r(qr)^m)^{k - 1})^{\ell - 2}qr(r(qr)^m)^{k - 2}rq, \quad y = r(qr)^m, \quad z = qr(r(qr)^m)^{k - 1}.$$
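As a sanity check (ours, not part of the formalization), the five parametric families can be instantiated on concrete words and verified numerically; the following Python snippet does so over the alphabet \(\{0,1\}\), with parameter names matching the theorem.

```python
# Numeric sanity check (ours) of the parametric solutions of
# x^j y^k = z^l from Theorem 2, instantiated over the alphabet {0, 1}.
r, q = "0", "1"

# A: x = r^m, y = r^n, z = r^t with m*j + n*k = t*l (here 2*1 + 2*2 = 3*2)
assert (r*2)*1 + (r*2)*2 == (r*3)*2

# B: j = k = 1, x = (rq)^m r, y = q(rq)^n, z = rq with m + n + 1 = l
m, n = 2, 1
assert ((r+q)*m + r) + (q + (r+q)*n) == (r+q)*(m + n + 1)

# C: j = l = 2, k = 1, x = (rq)^m r, y = qrrq, z = (rq)^m rrq, m >= 2
m = 2
x, y, z = (r+q)*m + r, q+r+r+q, (r+q)*m + r+r+q
assert x + x + y == z + z

# D: j = 1, k >= 2, x = (q r^k)^(l-1) q, y = r, z = q r^k
k, l = 3, 2
assert (q + r*k)*(l-1) + q + (r*k) == (q + r*k)*l

# E: j = 1, k >= 2, m >= 1, y = r(qr)^m, z = qr(r(qr)^m)^(k-1),
#    x = z^(l-2) qr (r(qr)^m)^(k-2) rq
k, l, m = 2, 3, 1
Y = r + (q+r)*m                     # y
z = q + r + Y*(k-1)
x = z*(l-2) + q + r + Y*(k-2) + r + q
assert x + Y*k == z*l
```

Such checks of course verify nothing beyond the chosen instances; the general validity of the five families is part of the formalized proof.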


Proof. If x and y commute, then all three words commute, hence they are powers of a common word. A length argument yields the solution A.

Assume now that \(\{x,y\}\) is a code. Then no pair of words x, y and z commutes. We have shown in the overview of the proof of Theorem 1 that \(j = 1\) or \(k = 1\) by the Lyndon-Schützenberger theorem. The solution is then split into several cases.

Case 1: \(j = k = 1\).

Let m and r be such that \(z^mr = x\) with r a strict prefix of z. By setting \(z = rq\), we obtain the solution B with \(n = \ell - m -1\).

Case 2: \(j \ge 2, k = 1\).

Since \(|y| \le |x|\) and \(\ell \ge 2\), we have

$$2|z| \le |z^\ell | = |x^j| + |y| < 2|x^j|,$$

so z is a strict prefix of \(x^j\).

As \(x^j\) has both z and x as periodic roots, and z does not commute with x, the Periodicity lemma implies \(|x^j| < |z| + |x|\). That is, \(z = x^{j-1}u\), \(x^j = zv\) and \(x = uv\) for some nonempty words u and v. As v is a prefix of z, it is also a prefix of x. Therefore, we have

$$ x = uv = vu' $$

for some word \(u'\). This is a well-known conjugation equality, which implies \(u = rq\), \(u' = qr\) and \(v = (rq)^nr\) for some words r and q and an integer \(n \ge 0\).
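The conjugation equality and its parametric solution can be checked mechanically; the following Python snippet (ours) recovers r, q and n from u and v and verifies the claimed decomposition on all short binary instances.

```python
# Checking the solution of x = u.v = v.u' (u, v nonempty): there are
# r, q, n with u = rq, u' = qr, v = (rq)^n r. The decomposition formulas
# below are ours; the search range is a small finite sample.
from itertools import product

def solve_conjugation(u, v):
    n = len(v) // len(u)
    r = v[n * len(u):]        # the trailing partial copy of u inside v
    q = u[len(r):]
    return r, q, n

words = ["".join(t) for k in range(1, 5) for t in product("01", repeat=k)]
for u in words:
    for v in words:
        x = u + v
        if x.startswith(v):              # then u.v = v.u' with u' = x[len(v):]
            u2 = x[len(v):]
            r, q, n = solve_conjugation(u, v)
            assert u == r + q and u2 == q + r and v == (r + q) * n + r
```

The condition `x.startswith(v)` expresses that v is a prefix of \(u \cdot v\), which is exactly when a word \(u'\) with \(uv = vu'\) exists.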

We have

$$ j|x| + |y| = |x^jy| = |z^\ell | = \ell (j-1)|x| + \ell |u|, $$

and thus \(|y| = (\ell j-\ell -j)|x| + \ell |u|\). Since \(|y| \le |x|\), \(|u| > 0\), \(j \ge 2\), and \(\ell \ge 2\), it follows that \(\ell j - \ell - j = 0\), which implies \(j = \ell = 2\). We therefore have \(x^2y = z^2\) and \(x^2 = zv\), hence \(vy = z\).

Combining \(u = rq\), \(u' = qr\), and \(v = (rq)^nr\) with \(x = vu'\), \(z = x^{j-1}u = xu = vu'u\), and \(vy = z\), we obtain the solution C with \(m = n+1\). The assumption \(\left| y \right| \le \left| x \right| \) implies \(m \ge 2\).

Case 3: \(j = 1, k \ge 2, y^k {\le }_s z\).

We have \(z = qy^k\) for some word q. Noticing that \(x = z^{\ell -1}q\) yields the solution D.

Case 4: \(j = 1, k \ge 2, z <_s y^k\).

This case is analogous to the second part of Case 2. Using the Periodicity lemma, we obtain \(uy^{k-1} = z\), \(y^k = vz\), and \(y = vu\) with nonempty u and v. As v is a suffix of z, it is also a suffix of y, and we have \(y = vu = u'v\) for some \(u'\). Plugging the solution of the last conjugation equality, namely \(u' = rq\), \(u = qr\), \(v = (rq)^nr\), into \(y = u'v\), \(z = uy^{k-1}\) and \(z^{\ell -1} = xv\) gives the solution E with \(m = n + 1\).

Finally, the words r and q do not commute since x and y, which are generated by r and q, do not commute.

The proof is completed by a direct verification of the converse.    \(\square \)

We now show that, for a given not primitivity preserving binary code, there is a unique pair of exponents \((j,k)\) such that \(x^jy^k\) is imprimitive.

Lemma 1

(LS_unique). Let \(B = \{x,y\}\) be a code. Assume \(j,k,j',k' \ge 1\). If both \(x^jy^k\) and \(x^{j'}y^{k'}\) are imprimitive, then \(j = j'\) and \(k = k'\).


Proof. Let \(z_1,z_2\) be primitive words and \(\ell ,\ell ' \ge 2\) be such that

$$\begin{aligned} x^jy^k = z_1^\ell \quad \text { and } \quad x^{j'}y^{k'} = z_2^{\ell '}. \end{aligned} \qquad (1)$$

Since B is a code, the words x and y do not commute. We proceed by contradiction.

Case 1: First, assume that \(j = j'\) and \(k \ne k'\).

Let, without loss of generality, \(k < k'\). From (1) we obtain \(z_1^\ell y^{k' - k} = z_2^{\ell '}\). The case \(k' - k \ge 2\) is impossible due to the Lyndon-Schützenberger theorem. Hence \(k' - k = 1\). This is another place where the formalization triggered a simple and nice general lemma (easily provable by the Periodicity lemma) which will turn out to be useful also in the proof of Theorem 4. Namely, the lemma imprim_ext_suf_comm claims that if both uv and uvv are imprimitive, then u and v commute. We apply this lemma to \(u = x^jy^{k-1}\) and \(v = y\), obtaining a contradiction with the assumption that x and y do not commute.
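The claim of imprim_ext_suf_comm can be illustrated by an exhaustive test on short binary words (our illustration only; a finite search of course proves nothing):

```python
# Exhaustive check (ours) of imprim_ext_suf_comm on short binary words:
# if both u.v and u.v.v are imprimitive, then u and v commute.
from itertools import product

def is_primitive(w):
    return len(w) > 0 and all(
        len(w) % d or w[:d] * (len(w) // d) != w for d in range(1, len(w)))

words = ["".join(t) for n in range(1, 6) for t in product("01", repeat=n)]
for u in words:
    for v in words:
        if not is_primitive(u + v) and not is_primitive(u + v + v):
            assert u + v == v + u
```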

Case 2. The case \(k = k'\) and \(j \ne j'\) is symmetric to Case 1.

Case 3. Let finally \(j \ne j'\) and \(k \ne k'\). The Lyndon-Schützenberger theorem implies that either j or k is one, and similarly either \(j'\) or \(k'\) is one. We can therefore assume that \(k = j' = 1\) and \(k',j \ge 2\). Moreover, we can assume that \(\left| y \right| \le \left| x \right| \). Indeed, in the opposite case, we can consider the words \(y^kx^j\) and \(y^{k'}x^{j'}\) instead, which are also both imprimitive.

Theorem 2 now allows only case C for the equality \(x^jy = z_1^\ell \). We therefore have \(j = \ell = 2\) and \(x = (rq)^mr\), \(y = qrrq\) for an integer \(m \ge 2\) and some non-commuting words r and q. Since \(y = qrrq\) is a suffix of \(z_2^{\ell '}\), this implies that \(z_2\) and rq do not commute. Consider the word \(x \cdot qr = (rq)^mrqr\), which is a prefix of xy, and therefore also of \(z_2^{\ell '}\). This means that \(x \cdot qr\) has two periodic roots, namely rq and \(z_2\), and the Periodicity lemma implies that \(\left| x \cdot qr \right| < \left| rq \right| + \left| z_2 \right| \). Hence x is shorter than \(z_2\). The equality \(xy^{k'} = z_2^{\ell '}\), with \(\ell ' \ge 2\), now implies on the one hand that rqrq is a prefix of \(z_2\), and on the other hand that \(z_2\) is a suffix of \(y^{k'}\). It follows that rqrq is a factor of \((qrrq)^{k'}\). Hence rqrq and qrrq are conjugate, thus they both have a period of length \(\left| rq \right| \), which implies \(qr = rq\). This is a contradiction.    \(\square \)

The rest of the paper, and therefore also of the proof of Theorem 1, is organized as follows. In Sect. 4, we introduce a general theory of interpretations, which is behind the main idea of the proof, and apply it to the (relatively simple) case of a binary code with words of the same length. In Sect. 5 we characterize the unique disjoint extendable \(\{x,y\}\)-interpretation of the square of the longer word x. This is a result of independent interest, and also the cornerstone of the proof of Theorem 1 which is completed in Sect. 6 by showing that a word containing at least two x’s witnessing that \(\{x,y\}\) is not primitivity preserving is conjugate with [xxy].

4 Interpretations and the Main Idea

Let X be a code, let u be a factor of \(\texttt {concat}\, \mathbf {w}\) for some \(\mathbf {w}\in \texttt {lists}\, X\). The natural question is to decide how u can be produced as a factor of words from X, or, in other words, how it can be interpreted in terms of X. This motivates the following definition.

Definition 2

Let X be a set of words over \(\varSigma \). We say that the triple \((p,s,\mathbf {w}) \in {\varSigma }^{*}\times {\varSigma }^{*} \times \texttt {\texttt {lists}} \, X\) is an X-interpretation of a word \(u \in \varSigma ^{*}\) if

  • \(\mathbf {w}\) is nonempty;

  • \(p \cdot u \cdot s = \texttt {\texttt {concat}} \, \mathbf {w}\);

  • \(p <_p \texttt {\texttt {hd}} \, \mathbf {w}\); and

  • \(s <_s \texttt {\texttt {last}} \, \mathbf {w}\).

The definition is illustrated by the following figure, where \(\mathbf {w}= [w_1,w_2,w_3,w_4]\):

figure a

The second condition of the definition motivates the notation \(p\, u\, s \sim _{\mathcal I} \mathbf {w}\) for the situation when \((p, s, \mathbf {w})\) is an X-interpretation of u.
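Definition 2 transcribes directly to executable form; the following Python predicate (ours, not part of the formalization) checks the defining conditions on plain strings and verifies, as an example, an interpretation of \(x^2\) with \(x = \mathtt{01010}\) and \(y = \mathtt{1001}\), the pair that reappears in Sect. 5.

```python
# Definition 2 as an executable predicate on plain strings (ours).
def is_interpretation(p, u, s, ws, X):
    """(p, s, ws) is an X-interpretation of u."""
    return (len(ws) > 0 and all(w in X for w in ws)
            and p + u + s == "".join(ws)
            and ws[0].startswith(p) and p != ws[0]    # p <_p hd ws
            and ws[-1].endswith(s) and s != ws[-1])   # s <_s last ws

x, y = "01010", "1001"
assert is_interpretation("01", x + x, "10", [x, y, x], {x, y})
# A failed candidate: shortening p breaks the concatenation equality.
assert not is_interpretation("0", x + x, "10", [x, y, x], {x, y})
```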

Remark 1

For the sake of historical reference, we remark that our definition of X-interpretation differs from the one used in [1]. Their formulation of the situation depicted by the above figure would be that u is interpreted by the triple \((s', w_2 \cdot w_3, p')\) where \(p\cdot s' = w_1\) and \(p'\cdot s = w_4\). This is less convenient for two reasons. First, the decomposition of \(w_2 \cdot w_3\) into \([w_2,w_3]\) is only implicit here (and even ambiguous if X is not a code). Second, while it is required that the words \(p'\) and \(s'\) are a prefix and a suffix, respectively, of an element from X, the identity of that element is left open, and has to be specified separately.

If u is a nonempty element of \(\left\langle X \right\rangle \) and \(u = \texttt {concat}\, \mathbf {u}\) for \(\mathbf {u}\in \texttt {lists}\, X\), then the X-interpretation \(\varepsilon \, u\, \varepsilon \sim _{\mathcal I} \mathbf {u}\) is called trivial. Note that the trivial X-interpretation is unique if X is a code.

As nontrivial X-interpretations of elements from \(\left\langle X \right\rangle \) are of particular interest, the following two concepts are useful.

Definition 3

An X-interpretation \(p\, u\, s \sim _{\mathcal I} \mathbf {w}\) of \(u = \texttt {\texttt {concat}} \, \mathbf {u}\) is called

  • disjoint if \(\texttt {\texttt {concat}} \, \mathbf {w}' \ne p \cdot \texttt {\texttt {concat}} \, \mathbf {u}'\) whenever \(\mathbf {w}' \le _p \mathbf {w}\) and \(\mathbf {u}' \le _p \mathbf {u}\).

  • extendable if \(p \le _s w_p\) and \(s \le _p w_s\) for some elements \(w_p, w_s \in \left\langle X \right\rangle \).

Note that a disjoint X-interpretation is not trivial, and that being disjoint is relative to a chosen factorization \(\mathbf {u}\) of u (which is nevertheless unique if X is a code).

The definitions above are naturally motivated by the main idea of the characterization of sets X that do not preserve primitivity, which dates back to Lentin and Schützenberger [9]. If \(\mathbf {w}\) is primitive, while \(\texttt {concat}\, \mathbf {w}\) is imprimitive, say \(\texttt {concat}\, \mathbf {w}= z^k\), \(k\ge 2\), then the shift by z provides a nontrivial and extendable X-interpretation of \(\texttt {concat}\, \mathbf {w}\). (In fact, \(k-1\) such nontrivial interpretations). Moreover, the following lemma, formulated in a more general setting of two words \(\mathbf {w}_1\) and \(\mathbf {w}_2\), implies that the X-interpretation is disjoint if X is a code.
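The shift construction can be sketched in Python as follows (our illustration; the function and all its names are ours, not part of the formalization): given \(\mathbf{w}\) with \(\texttt{concat}\,\mathbf{w} = z^k\), cutting the factor of \(\texttt{concat}(\mathbf{w}\cdot \mathbf{w})\) that starts at offset \(\left| z \right| \) along the letter boundaries yields the interpretation.

```python
# Sketch (ours) of the "main idea": sliding concat w by |z| inside
# concat (w.w) produces a nontrivial X-interpretation of concat w.
def shift_interpretation(ws, shift):
    """Cut the factor of concat(ws + ws) occupying positions
    [shift, shift + |concat ws|) along letter boundaries;
    return (p, s, covering letters) as in Definition 2."""
    double = ws + ws
    start = shift
    end = shift + sum(len(w) for w in ws)
    cover, pos, p, s = [], 0, "", ""
    for w in double:
        if pos < end and pos + len(w) > start:   # w overlaps the factor
            cover.append(w)
            if pos < start:
                p = w[:start - pos]              # part of hd before the factor
            if pos + len(w) > end:
                s = w[end - pos:]                # part of last after the factor
        pos += len(w)
    return p, s, cover

x, y = "01010", "1001"
ws = [x, y, x]
z = "0101010"
assert "".join(ws) == z + z                      # concat w is imprimitive
p, s, cover = shift_interpretation(ws, len(z))
assert p + z + z + s == "".join(cover)
assert cover == [y, x, x, y] and p == "10" and s == "01"
```

Since p is nonempty here, the resulting interpretation is nontrivial; extendability is witnessed by the surrounding copies of \(\texttt{concat}\,\mathbf{w}\).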

Lemma 2

(shift_interpret, shift_disjoint). Let X be a code. Let \(\mathbf {w}_1, \mathbf {w}_2 \in \texttt {\texttt {lists}} \, X\) be such that \(z \cdot \texttt {\texttt {concat}} \, \mathbf {w}_1 = \texttt {\texttt {concat}} \, \mathbf {w}_2 \cdot z\) where \(z \notin \left\langle X \right\rangle \). Then \(z \cdot \texttt {\texttt {concat}} \, \mathbf {v}_1 \ne \texttt {\texttt {concat}} \, \mathbf {v}_2\), whenever \(\mathbf {v}_1 \le _p \mathbf {w}_1^n\) and \(\mathbf {v}_2 \le _p \mathbf {w}_2^n\), \(n\in \mathbb N\).

In particular, \(\texttt {\texttt {concat}} \, \mathbf {u}\) has a disjoint extendable X-interpretation for any prefix \(\mathbf {u}\) of \(\mathbf {w}_1\).

The excluded possibility is illustrated by the following figure.

figure b


Proof. First, note that \(z \cdot \texttt {concat}\, \mathbf {w}_1^n = \texttt {concat}\, \mathbf {w}_2^n \cdot z\) for any n. Let \(\mathbf {w}_1^n = \mathbf {v}_1\cdot \mathbf {v}_1'\) and \(\mathbf {w}_2^n= \mathbf {v}_2\cdot \mathbf {v}_2'\). If \(z \cdot \texttt {concat}\, \mathbf {v}_1 = \texttt {concat}\, \mathbf {v}_2\), then also \(\texttt {concat}\, \mathbf {v}_2' \cdot z = \texttt {concat}\, \mathbf {v}_1'\). This contradicts \(z \notin \left\langle X \right\rangle \) by the stability condition.

An extendable X-interpretation of \(\mathbf {u}\) is induced by the fact that \(\texttt {concat}\, \mathbf {u}\) is covered by \(\texttt {concat}(\mathbf {w}_2 \cdot \mathbf {w}_2)\). The interpretation is disjoint by the first part of the proof.    \(\square \)

In order to apply the above lemma to the imprimitive \(\texttt {concat}\, \mathbf {w}= z^k\) of a primitive \(\mathbf {w}\), set \(\mathbf {w}_1= \mathbf {w}_2 = \mathbf {w}\). The assumption \(z \notin \left\langle X \right\rangle \) follows from the primitivity of \(\mathbf {w}\): indeed, if \(z = \texttt {concat}\, \mathbf {z}\), with \(\mathbf {z}\in \texttt {lists}\, X\), then \(\mathbf {w}= \mathbf {z}^k\) since X is a code.

We first apply the main idea to a relatively simple case of nontrivial \(\{x,y\}\)-interpretation of the word \(x \cdot y\) where x and y are of the same length.

Lemma 3

(uniform_square_interp). Let \(B = \{x,y\}\) be a code with \(\left| x \right| = \left| y \right| \). Let \(p\ (x\cdot y)\ s \sim _{\mathcal I} \mathbf {v}\) be a nontrivial B-interpretation. Then \(\mathbf {v}= [x,y,x]\) or \(\mathbf {v}= [y,x,y]\), and \(x\cdot y\) is imprimitive.


Proof. From \(p \cdot x \cdot y \cdot s = \texttt {concat}\, \mathbf {v}\), it follows, by a length argument, that \(\left| \mathbf {v} \right| \) is three. A straightforward way to prove the claim is to consider all eight possible candidates. In each case, a routine few-line proof, which we omit, shows that \(x = y\) unless \(\mathbf {v}= [x,y,x]\) or \(\mathbf {v}= [y,x,y]\). In the latter cases, \(x \cdot y\) is a nontrivial factor of its square \((x \cdot y)\cdot (x \cdot y)\), which yields the imprimitivity of \(x \cdot y\).    \(\square \)
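As a small-scale illustration (ours, not a substitute for the proof), the statement can be checked exhaustively for all binary words x, y of length three:

```python
# Brute-force check (ours) of Lemma 3 for |x| = |y| = 3 over {0, 1}:
# every nontrivial {x,y}-interpretation of x.y uses [x,y,x] or [y,x,y],
# and then x.y is imprimitive.
from itertools import product

def is_primitive(w):
    return len(w) > 0 and all(
        len(w) % d or w[:d] * (len(w) // d) != w for d in range(1, len(w)))

triples = ["".join(t) for t in product("01", repeat=3)]
for x in triples:
    for y in triples:
        if x + y == y + x:
            continue                          # {x, y} must be a code
        for n in (2, 3):                      # a length argument bounds |v|
            for v in product([x, y], repeat=n):
                cat = "".join(v)
                for lp in range(len(v[0])):   # p = cat[:lp], |p| < |hd v|
                    if cat[lp:lp + 6] != x + y:
                        continue
                    s = cat[lp + 6:]
                    if len(s) >= len(v[-1]):  # s must be a strict suffix
                        continue
                    if lp == 0 and s == "" and list(v) == [x, y]:
                        continue              # the trivial interpretation
                    assert list(v) in ([x, y, x], [y, x, y])
                    assert not is_primitive(x + y)
```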

The previous (sketch of the) proof nicely illustrates on a small scale the advantages of formalization. It is not necessary to choose between a tedious elementary proof for sake of completeness on one hand, and the suspicion that something was missed on the other hand (leaving aside that the same suspicion typically remains even after the tedious proof). A bit ironically, the most difficult part of the formalization is to show that \(\mathbf {v}\) is indeed of length three, which needs no further justification in a human proof.

We have the following corollary which is a variant of Theorem 4, and also illustrates the main idea of its proof.

Lemma 4

(bin_imprim_not_conjug). Let \(B = \{x,y\}\) be a binary code with \(\left| x \right| = \left| y \right| \). If \(\mathbf {w}\in \texttt {\texttt {lists}} \, B\) is such that \(\left| \mathbf {w} \right| \ge 2\), \(\mathbf {w}\) is primitive, and \(\texttt {\texttt {concat}} \, \mathbf {w}\) is imprimitive, then x and y are not conjugate.


Proof. Since \(\mathbf {w}\) is primitive and of length at least two, it contains both letters x and y. Therefore, it has either [xy] or [yx] as a factor. The imprimitivity of \(\texttt {concat}\, \mathbf {w}\) yields a nontrivial B-interpretation of \(x \cdot y\), which implies that \(x \cdot y\) is not primitive by Lemma 3.

Now suppose that x and y are conjugate, say \(x = r \cdot q\) and \(y = q \cdot r\). Since \(x \cdot y = r \cdot q \cdot q \cdot r\) is imprimitive, its conjugate \(r \cdot r \cdot q \cdot q\) is imprimitive as well. Then r and q commute by the theorem of Lyndon and Schützenberger, a contradiction with \(x \ne y\).    \(\square \)

5 Binary Interpretation of a Square

Let \(B = \{x,y\}\) be a code such that \(\left| y \right| \le \left| x \right| \). In accordance with the main idea, the core technical component of the proof is the description of the disjoint extendable B-interpretations of the square \(x^2\). This is a very nice result which is relatively simple to state but difficult to prove, and which is valuable on its own. As we mentioned already, it can be obtained from Théorème 2.1 and Lemme 3.1 in [1].

Theorem 3

(square_interp_ext.sq_ext_interp). Let \(B = \{x,y\}\) be a code such that \(\left| y \right| \le \left| x \right| \), both x and y are primitive, and x and y are not conjugate.

Let \(p\, (x\cdot x)\, s \sim _{\mathcal I} \mathbf {w}\) be a disjoint extendable B-interpretation. Then

$$\begin{aligned} \mathbf {w}&= [x,y,x],&s \cdot p&= y,&p \cdot x&= x \cdot s. \end{aligned}$$

In order to appreciate the theorem, note that the definition of interpretation implies

$$\begin{aligned} p \cdot x \cdot x \cdot s = x \cdot y \cdot x, \end{aligned}$$

hence \(x \cdot y \cdot x = (p \cdot x)^2\). This will turn out to be the only way how primitivity may not be preserved if x occurs at least twice in \(\mathbf {w}\). Here is an example with \(x = \mathtt{01010}\) and \(y = \mathtt{1001}\):

figure c
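The example can be verified directly; in the following Python check (ours), the values \(p = \mathtt{01}\) and \(s = \mathtt{10}\) are read off the figure.

```python
# Direct check (ours) of the example x = 01010, y = 1001 against the
# conclusions of Theorem 3 for the interpretation [x, y, x].
x, y = "01010", "1001"
p, s = "01", "10"
assert p + x + x + s == x + y + x      # the interpretation equality
assert s + p == y                      # second conclusion
assert p + x == x + s                  # third conclusion
assert x + y + x == (p + x) * 2        # hence x.y.x is a square
```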


Proof. By the definition of a disjoint interpretation, we have \(p\cdot x\cdot x \cdot s = \texttt {concat}\, \mathbf {w}\), where \(p \ne \varepsilon \) and \(s \ne \varepsilon \). A length argument implies that \(\mathbf {w}\) has length at least three. Since a primitive word is not a nontrivial factor of its square, we have \(\mathbf {w}= [\texttt {hd}\, \mathbf {w}] \cdot [y]^k \cdot [\texttt {last}\, \mathbf {w}]\), with \(k \ge 1\). Since the interpretation is disjoint, we can split the equality into \(p \cdot x = \texttt {hd}\, \mathbf {w}\cdot y^m \cdot u\) and \(x \cdot s = v \cdot y^\ell \cdot \texttt {last}\, \mathbf {w}\), where \(y = u \cdot v\), both u and v are nonempty, and \(k = \ell + m + 1\). We want to show \(\texttt {hd}\, \mathbf {w}= \texttt {last}\, \mathbf {w}= x\) and \(m = \ell = 0\). The situation is mirror symmetric, so we can solve the cases two at a time.

If \(\texttt {hd}\, \mathbf {w}= \texttt {last}\, \mathbf {w}= y\), then powers of x and y share a factor of length at least \(\left| x \right| + \left| y \right| \). Since they are primitive, this implies that they are conjugate, a contradiction. The same argument applies when \(\ell \ge 1\) and \(\texttt {hd}\, \mathbf {w}= y\) (if \(m \ge 1\) and \(\texttt {last}\, \mathbf {w}= y\) respectively). Therefore, in order to prove \(\texttt {hd}\, \mathbf {w}= \texttt {last}\, \mathbf {w}= x\), it remains to exclude the case \(\texttt {hd}\, \mathbf {w}= y\), \(\ell = 0\) and \(\texttt {last}\, \mathbf {w}= x\) (\(\texttt {last}\, \mathbf {w}= y\), \(m = 0\) and \(\texttt {hd}\, \mathbf {w}= x\) respectively). This is covered by one of the technical lemmas that we single out:

Lemma 5

(pref_suf_pers_short). Let \(x \le _p v \cdot x\), \(x \le _s p \cdot u \cdot v \cdot u\) and \(\left| x \right| > \left| v \cdot u \right| \) with \(p \in \left\langle \{u,v\} \right\rangle \). Then \(u \cdot v = v \cdot u\).

This lemma indeed excludes the case we wanted to exclude, since the conclusion implies that y is not primitive. We skip the proof of the lemma here and make instead an informal comment. Note that v is a periodic root of x. In other words, x is a factor of \(v^\omega \). Therefore, with the stronger assumption that \(v \cdot u \cdot v\) is a factor of x, the conclusion follows easily by the familiar principle that v being a factor of \(v^\omega \) “synchronizes” primitive roots of v. Lemma 5 then exemplifies one of the virtues of formalization, which makes it easy to generalize auxiliary lemmas, often just by following the most natural proof and checking its minimal necessary assumptions.
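A bounded search (ours) can again illustrate the lemma; the following Python snippet tests it for all short binary u and v, with p ranging over products of at most two generators from \(\{u,v\}\).

```python
# Finite sanity check (ours) of Lemma 5 on short binary words: whenever
# x <=_p v.x, x <=_s p.u.v.u with p in <{u,v}> and |x| > |v.u|, then
# u and v commute. The bounded ranges prove nothing beyond themselves.
from itertools import product

words = ["".join(t) for n in range(1, 4) for t in product("01", repeat=n)]
for u in words:
    for v in words:
        # p ranges over a bounded part of the hull <{u, v}>
        hull = [""] + [a + b for a in ("", u, v) for b in (u, v)]
        for p in hull:
            w = p + u + v + u
            for i in range(len(w)):
                x = w[i:]                        # x <=_s p.u.v.u
                if len(x) > len(v + u) and (v + x).startswith(x):
                    assert u + v == v + u        # the conclusion of Lemma 5
```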

Now we have \(\texttt {hd}\, \mathbf {w}= \texttt {last}\, \mathbf {w}= x\), hence \(p \cdot x = x \cdot y^m \cdot u\) and \(x \cdot s = v \cdot y^\ell \cdot x\). The natural way to describe this scenario is to observe that x has both the (prefix) periodic root \(v \cdot y^\ell \), and the suffix periodic root \(y^m \cdot u\). Using Lemma 5 again, we exclude the situations when \(\ell = 0\) and \(m \ge 1\) (\(m = 0\) and \(\ell \ge 1\) resp.). It therefore remains to deal with the case when both m and \(\ell \) are positive. We divide this into four lemmas according to the size of the overlap that the prefix \(v\cdot y^\ell \) and the suffix \(y^m\cdot u\) have in x. More exactly, the cases are:

  • \(\left| v\cdot y^\ell \right| + \left| y^m\cdot u \right| \le \left| x \right| \)

  • \(\left| x \right| < \left| v\cdot y^\ell \right| + \left| y^m\cdot u \right| \le \left| x \right| + \left| u \right| \)

  • \(\left| x \right| + \left| u \right|< \left| v\cdot y^\ell \right| + \left| y^m\cdot u \right| < \left| x \right| + \left| u \cdot v \right| \)

  • \(\left| x \right| + \left| u\cdot v \right| \le \left| v\cdot y^\ell \right| + \left| y^m\cdot u \right| \)

and they are solved by an auxiliary lemma each. The first three cases yield that u and v commute, the first one being a straightforward application of the Periodicity lemma. The last one is also a straightforward application of the “synchronization” idea. It implies that \(x \cdot x\) is a factor of \(y^\omega \), a contradiction with the assumption that x and y are primitive and not conjugate. Consequently, the technical, tedious part of the whole proof is concentrated in the lemmas dealing with the second and the third case (see lemmas short_overlap and medium_overlap in the theory Binary_Square_Interpretation.thy). The corresponding proofs are further analyzed and decomposed into more elementary claims in the formalization, where further details can be found.

This completes the proof of \(\mathbf {w}= [x,y,x]\). A byproduct of the proof is the description of words x, y, p and s. Namely, there are non-commuting words r and t, and integers m, k and \(\ell \) such that

$$\begin{aligned} x&= (rt)^{m+1}\cdot r,&y&= (tr)^{k+1}\cdot (rt)^{\ell +1},&p&= (rt)^{k+1},&s&= (tr)^{\ell +1}\,. \end{aligned}$$

The second claim of the present theorem, that is, \(y = s \cdot p\), is then equivalent to \(k = \ell \), and it is an easy consequence of the assumption that the interpretation is extendable.    \(\square \)
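The parametric solution can be verified directly: for any choice of r, t and the exponents, the word \(p \cdot (x\cdot x) \cdot s\) equals \(\texttt {concat}\,[x,y,x]\), and \(y = s\cdot p\) holds exactly when \(k = \ell \). A small Python check (illustrative only, with r and t instantiated as single letters):

```python
def parametric(r, t, m, k, l):
    # the words x, y, p, s of the parametric solution from the text
    x = (r + t) * (m + 1) + r
    y = (t + r) * (k + 1) + (r + t) * (l + 1)
    p = (r + t) * (k + 1)
    s = (t + r) * (l + 1)
    return x, y, p, s

checks = []
for (m, k, l) in [(0, 0, 0), (1, 2, 0), (2, 1, 3)]:
    x, y, p, s = parametric("a", "b", m, k, l)
    # the interpretation equation: p.(x.x).s = concat [x, y, x]
    checks.append(p + x + x + s == x + y + x)
    # y = s.p holds if and only if k = l
    checks.append((y == s + p) == (k == l))
print(checks)  # [True, True, True, True, True, True]
```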

6 The Witness with Two x’s

In this section, we characterize words witnessing that \(\{x,y\}\) is not primitivity preserving and containing at least two x’s.

Theorem 4

(bin_imprim_longer_twice). Let \(B = \{x,y\}\) be a code such that \(\left| y \right| \le \left| x \right| \). Let \(\mathbf {w}\in \texttt {lists} \, \{x,y\}\) be a primitive word which contains x at least twice, such that \(\texttt {concat}\, \mathbf {w}\) is imprimitive.

Then \(\mathbf {w}\sim [x,x,y]\) and both x and y are primitive.

We divide the proof into three steps.

The Core Case. We first prove the claim with two additional assumptions which will be subsequently removed. Namely, the following lemma shows how the knowledge about the B-interpretation of \(x \cdot x\) from the previous section is used. The additional assumptions are displayed as items.

Lemma 6

(bin_imprim_primitive). Let \(B = \{x,y\}\) be a code with \(\left| y \right| \le \left| x \right| \) where

  • both x and y are primitive,

and let \(\mathbf {w}\in \texttt {lists} \, B\) be primitive such that \(\texttt {concat} \, \mathbf {w}\) is imprimitive, and

  • [x,x] is a cyclic factor of \(\mathbf {w}\).

Then \(\mathbf {w}\sim [x,x,y]\).


Choosing a suitable conjugate of \(\mathbf {w}\), we can suppose, without loss of generality, that [x,x] is a prefix of \(\mathbf {w}\). Now, we want to show \(\mathbf {w}= [x,x,y]\). Proceed by contradiction and assume \(\mathbf {w}\ne [x,x,y]\). Since \(\mathbf {w}\) is primitive, this implies \(\mathbf {w}\cdot [x,x,y] \ne [x,x,y] \cdot \mathbf {w}\).

By Lemma 4, we know that x and y are not conjugate. Let \(\texttt {concat}\, \mathbf {w}= z^k\), \(2 \le k\) and z primitive. Lemma 2 yields a disjoint extendable B-interpretation of \((\texttt {concat}\, \mathbf {w})^2\). In particular, the induced disjoint extendable B-interpretation of the prefix \(x \cdot x\) is of the form \(p\, (x \cdot x)\, s \sim _{\mathcal I} [x,y,x]\) by Theorem 3:

(figure d: the disjoint extendable B-interpretation \(p\, (x \cdot x)\, s \sim _{\mathcal I} [x,y,x]\) of the prefix \(x \cdot x\))

Let \(\mathbf {p}\) be the prefix of \(\mathbf {w}\) such that \(\texttt {concat}\, \mathbf {p}\cdot p = z\). Then

$$\begin{aligned} \texttt {concat}(\mathbf {p}\cdot [x,y]) = z \cdot (x \cdot p), \quad \texttt {concat}\, [x,x,y] = (x \cdot p)^2, \quad \texttt {concat}\, \mathbf {w}= z^k, \end{aligned}$$

and we want to show \(z = xp\), which will imply \(\texttt {concat}([x,x,y]\cdot \mathbf {w}) = \texttt {concat}(\mathbf {w}\cdot [x,x,y])\), hence \(\mathbf {w}= [x,x,y]\) since \(\{x,y\}\) is a code, and both \(\mathbf {w}\) and [x,x,y] are primitive.

Again, proceed by contradiction, and assume \(z \ne xp\). Then, since both z and \(x\cdot p\) are primitive, they do not commute. We now have two binary codes, namely \(\{\mathbf {w},[x,x,y]\}\) and \(\{z,xp\}\). The following two equalities, (2) and (3), exploit the fundamental property of longest common prefixes of elements of binary codes mentioned in Sect. 2. In particular, we need the following lemma:

Lemma 7

(bin_code_lcp_concat). Let \(X = \{u_0,u_1\}\) be a binary code, and let \(\mathbf {z}_0,\mathbf {z}_1 \in \texttt {lists} \, X\) be such that \(\texttt {concat} \, \mathbf {z}_0\) and \(\texttt {concat} \, \mathbf {z}_1\) are not prefix-comparable. Then

$$\begin{aligned} (\texttt {concat} \, \mathbf {z}_0) \wedge _p (\texttt {concat} \, \mathbf {z}_1) = \texttt {concat} (\mathbf {z}_0 \wedge _p \mathbf {z}_1) \cdot (u_0 \cdot u_1 \wedge _p u_1 \cdot u_0). \end{aligned}$$
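As an illustration, the equality of Lemma 7 can be checked on a small example in Python (taking the trailing term to be the longest common prefix of \(u_0 \cdot u_1\) and \(u_1 \cdot u_0\), the word \(\alpha \) discussed in Sect. 8; the code \(\{ab, aab\}\) is a hypothetical choice):

```python
def lcp(u, v):
    # longest common prefix of two strings
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]

def lists_lcp(z0, z1):
    # longest common prefix of two lists of words, elementwise
    i = 0
    while i < min(len(z0), len(z1)) and z0[i] == z1[i]:
        i += 1
    return z0[:i]

u0, u1 = "ab", "aab"
z0 = [u0, u1, u0, u0]   # concats are not prefix-comparable
z1 = [u0, u1, u1]
left = lcp("".join(z0), "".join(z1))
right = "".join(lists_lcp(z0, z1)) + lcp(u0 + u1, u1 + u0)
print(left, left == right)  # abaaba True
```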

See Sect. 8 for more comments on this property. Denote \(\alpha _{z,xp} = z \cdot xp \wedge _p xp \cdot z\). Then also \(\alpha _{z,xp} = z^k \cdot (xp)^2 \wedge _p (xp)^2 \cdot z^k\). Similarly, let \(\alpha _{x,y} = x \cdot y \wedge _p y \cdot x\). Then Lemma 7 yields

$$\begin{aligned} \alpha _{z,xp}&= \texttt {concat}(\mathbf {w}\cdot [x,x,y]) \wedge _p \texttt {concat}([x,x,y] \cdot \mathbf {w}) \\&= \texttt {concat}(\mathbf {w}\cdot [x,x,y] \wedge _p [x,x,y] \cdot \mathbf {w}) \cdot \alpha _{x,y} \end{aligned} \eqno (2)$$

and also

$$\begin{aligned} z \cdot \alpha _{z,xp} =&\texttt {concat}(\mathbf {w}\cdot \mathbf {p}\cdot [x,y]) \wedge _p \texttt {concat}(\mathbf {p}\cdot [x,y] \cdot \mathbf {w}) \\ =&\texttt {concat}(\mathbf {w}\cdot \mathbf {p}\cdot [x,y] \wedge _p \mathbf {p}\cdot [x,y] \cdot \mathbf {w}) \cdot \alpha _{x,y}. \end{aligned} \eqno (3)$$


Denote

$$\begin{aligned} \mathbf {v}_1&= \mathbf {w}\cdot [x,x,y] \wedge _p [x,x,y] \cdot \mathbf {w},&\mathbf {v}_2&= \mathbf {w}\cdot \mathbf {p}\cdot [x,y] \wedge _p \mathbf {p}\cdot [x,y] \cdot \mathbf {w}. \end{aligned}$$

From (2) and (3) we now have \( z \cdot \texttt {concat}\, \mathbf {v}_1 = \texttt {concat}\, \mathbf {v}_2\). Since \(\mathbf {v}_1\) and \(\mathbf {v}_2\) are prefixes of some \(\mathbf {w}^n\), we have a contradiction with Lemma 2.    \(\square \)

Dropping the Primitivity Assumption. We first deal with the situation when x and y are not primitive. A natural idea is to consider the primitive roots of x and y instead of x and y. This means that we replace the word \(\mathbf {w}\) with \(\mathcal {R}\mathbf {w}\), where \(\mathcal {R}\) is the morphism mapping [x] to \([\rho \, x]^{e_x}\) and [y] to \([\rho \, y]^{e_y}\) where \(x = (\rho \, x)^{e_x}\) and \(y = (\rho \, y)^{e_y}\). For example, if \(x = abab\) and \(y = aa\), and \(\mathbf {w}= [x,y,x] = [abab,aa,abab]\), then \(\mathcal {R}\mathbf {w}= [ab,ab,a,a,ab,ab]\).
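A minimal Python sketch of the morphism \(\mathcal {R}\) (helper names are ours) reproduces the example above:

```python
def primitive_root(w):
    # shortest word r with w == r^k, for a nonempty word w
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]

def R(word_list):
    # replace each "letter" c by e_c copies of its primitive root,
    # where c == (rho c)^{e_c}
    out = []
    for c in word_list:
        r = primitive_root(c)
        out.extend([r] * (len(c) // len(r)))
    return out

w = ["abab", "aa", "abab"]            # x = abab, y = aa, as in the text
print(R(w))                           # ['ab', 'ab', 'a', 'a', 'ab', 'ab']
print("".join(R(w)) == "".join(w))    # the concatenation is preserved: True
```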

Let us check which hypotheses of Lemma 6 are satisfied in the new setting, that is, for the code \(\{\rho \,x,\rho \,y\}\) and the word \(\mathcal {R}\mathbf {w}\). The following facts are not difficult to see.

  • \(\texttt {concat}\, \mathbf {w}= \texttt {concat}(\mathcal {R}\mathbf {w})\);

  • if [cc], \(c\in \{x,y\}\), is a cyclic factor of \(\mathbf {w}\), then \([\rho \,c,\rho \,c]\) is a cyclic factor of \(\mathcal {R}\mathbf {w}\).

The next required property:

  • if \(\mathbf {w}\) is primitive, then \(\mathcal {R}\mathbf {w}\) is primitive;

deserves more attention. It triggered another small theory in our formalization, which can be found in the locale sings_code. Note that it fits well into our context, since the claim is that \(\mathcal {R}\) is a primitivity preserving morphism, which implies that its image on the singletons [x] and [y] forms a primitivity preserving set of words; see theorem code.roots_prim_morph.

Consequently, the only missing hypothesis preventing the use of Lemma 6 is \(\left| y \right| \le \left| x \right| \), since it may happen that \(\left| \rho \, x \right| < \left| \rho \,y \right| \). To overcome this difficulty, we shall ignore for a while the length relation between x and y, and obtain the following intermediate lemma.

Lemma 8

(bin_imprim_both_squares, bin_imprim_both_squares_prim). Let \(B = \{x,y\}\) be a code, and let \(\mathbf {w}\in \texttt {lists} \, B\) be a primitive word such that \(\texttt {concat} \, \mathbf {w}\) is imprimitive. Then \(\mathbf {w}\) cannot contain both [x,x] and [y,y] as cyclic factors.


Assume that \(\mathbf {w}\) contains both [x,x] and [y,y] as cyclic factors.

Consider the word \(\mathcal {R}\mathbf {w}\) and the code \(\{\rho \, x,\rho \, y\}\). Since \(\mathcal {R}\mathbf {w}\) contains both \([\rho \,x,\rho \,x]\) and \([\rho \,y,\rho \,y]\), Lemma 6 implies that \(\mathcal {R}\mathbf {w}\) is conjugate either with the word \([\rho \,x,\rho \,x,\rho \,y]\) or with \([\rho \,y,\rho \,y,\rho \,x]\), which is a contradiction with the assumed presence of both squares.    \(\square \)

Concluding the Proof by Gluing. It remains to deal with the existence of squares. We use an idea that is our main innovation with respect to the proof from [1], and contributes significantly to reducing the length of the proof, and hopefully also to its increased clarity. Let \(\mathbf {w}\) be a list over a set of words X. The idea is to choose one of the words, say \(u \in X\), and to concatenate (or “glue”) blocks of u’s to the words following them. For example, if \(\mathbf {w}= [u,v,u,u,z,u,z]\), then the resulting list is \([u\cdot v, u\cdot u\cdot z, u\cdot z]\). In the general case, this procedure is well defined on lists whose last “letter” is not the chosen one, and it leads to a new alphabet \(\{u^i\cdot v \mid v \ne u\}\) which is a code if and only if X is. This idea is used in an elegant proof of the Graph lemma (see [8] and [2]). In the binary case, which is of interest here, if \(\mathbf {w}\) in addition does not contain a square of a letter, say [x,x], then the new code \(\{x\cdot y, y\}\) is again binary. Moreover, the resulting glued list \(\mathbf {w}'\) has the same concatenation, and it is primitive if (and only if) \(\mathbf {w}\) is. Note that gluing is in this case closely related to the Nielsen transformation \(y \mapsto x^{-1}y\) known from the theory of automorphisms of free groups.
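The gluing operation is easy to sketch in Python (the function name is ours); on the example from the text it yields:

```python
def glue(w, u):
    # concatenate ("glue") each maximal block of u's to the word that
    # follows it; defined only when the last "letter" of w is not u
    assert w[-1] != u
    out, block = [], ""
    for c in w:
        if c == u:
            block += u
        else:
            out.append(block + c)
            block = ""
    return out

w = ["u", "v", "u", "u", "z", "u", "z"]
print(glue(w, "u"))                          # ['uv', 'uuz', 'uz']
print("".join(glue(w, "u")) == "".join(w))   # concatenation preserved: True
```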

Induction on \(\left| \mathbf {w} \right| \) now easily leads to the proof of Theorem 4.

Proof (of Theorem 4)

If \(\mathbf {w}\) contains y at most once, then we are left with the equation \(x^j\cdot y = z^\ell \), \(\ell \ge 2\). The equality \(j = 2\) follows from the Periodicity lemma, see Case 2 in the proof of Theorem 2.

Assume for contradiction that y occurs at least twice in \(\mathbf {w}\). Lemma 8 implies that at least one of the squares [x,x] and [y,y] is missing as a cyclic factor. Let \(\{x',y'\} = \{x,y\}\) be such that \([x',x']\) is not a cyclic factor of \(\mathbf {w}\). We can therefore perform the gluing operation, and obtain a new, strictly shorter word \(\mathbf {w}' \in \texttt {lists}\, \{x' \cdot y', y'\}\). The longer element \(x'\cdot y'\) occurs at least twice in \(\mathbf {w}'\), since the number of its occurrences in \(\mathbf {w}'\) is the same as the number of occurrences of \(x'\) in \(\mathbf {w}\), the latter word containing both letters at least twice by assumption. Moreover, \(\mathbf {w}'\) is primitive, and \(\texttt {concat}\, \mathbf {w}' = \texttt {concat}\, \mathbf {w}\) is imprimitive. Therefore, by induction on \(\left| \mathbf {w} \right| \), we have \(\mathbf {w}' \sim [x'\cdot y',x' \cdot y', y']\). In order to show that this is not possible, we can successfully reuse the lemma imprim_ext_suf_comm mentioned in the proof of Lemma 1, this time for \(u = x'y'x'\) and \(v = y'\). The words u and v do not commute because \(x'\) and \(y'\) do not commute. Since \(u \cdot v = (x'y')^2\) is imprimitive, the lemma yields that the word \(u\cdot v\cdot v \sim \texttt {concat}\, \mathbf {w}'\) is primitive, contradicting the imprimitivity of \(\texttt {concat}\, \mathbf {w}'\).    \(\square \)
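The final step can be checked concretely: with hypothetical single-letter stand-ins for \(x'\) and \(y'\), the word \(u \cdot v \cdot v\) is indeed the concatenation of \([x'\cdot y', x'\cdot y', y']\), and \(u \cdot v = (x'y')^2\) is imprimitive. A small Python check:

```python
def is_primitive(w):
    # w is primitive iff it occurs in w.w only at the trivial positions
    return len(w) > 0 and (w + w).find(w, 1) == len(w)

xp, yp = "a", "b"               # hypothetical stand-ins for x' and y'
u, v = xp + yp + xp, yp         # u = x'y'x', v = y'
print(u + v + v == "".join([xp + yp, xp + yp, yp]))  # u.v.v = concat w': True
print(is_primitive(u + v))      # (x'y')^2 is imprimitive: False
```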

This also completes the proof of our main target, Theorem 1.

7 Additional Notes on the Formalization

The formalization is a part of an evolving combinatorics on words formalization project. It relies on its backbone session, called CoW, a version of which is also available in the Archive of Formal Proofs [15]. This session covers basic concepts in combinatorics on words, including the Periodicity lemma. An overview is available in [8].

The evolution of the parent session CoW continued along with the presented results and its latest stable version is available at our repository [16]. The main results are part of another Isabelle session CoW_Equations, which, as the name suggests, aims at dealing with word equations. We have greatly expanded its elementary theory Equations_Basic.thy which provides auxiliary lemmas and definitions related to word equations. Notably, it contains the definition factor_interpretation (Definition 2) and related facts.

Two dedicated theories were created: Binary_Square_Interpretation.thy and Binary_Code_Imprimitive.thy. The former contains lemmas and locales dealing with the \(\{x,y\}\)-interpretation of the square xx (for \(\left| y \right| \le \left| x \right| \)), culminating in Theorem 3. The latter contains Theorems 1 and 4.

Another outcome was an expansion of formalized results related to the Lyndon-Schützenberger theorem. This result, along with many useful corollaries, was already part of the backbone session CoW, and it was newly supplemented with the parametric solution of the equation \(x^jy^k = z^\ell \), specifically Theorem 2 and Lemma 1. This formalization is now part of CoW_Equations in the theory Lyndon_Schutzenberger.thy.

Similarly, the formalization of the main results triggered a substantial expansion of existing support for the idea of gluing as mentioned in Sect. 6. Its reworked version is now in a separate theory called Glued_Codes.thy (which is part of the session CoW_Graph_Lemma).

Let us give a few concrete highlights of the formalization. A very useful tool, which is part of the CoW session, is the reversed attribute. The attribute produces a symmetrical fact where the symmetry is induced by the mapping rev, i.e., the mapping which reverses the order of elements in a list. For instance, the fact stating that if p is a prefix of v, then p is a prefix of \(v \cdot w\), is transformed by the reversed attribute into the fact saying that if s is a suffix of v, then s is a suffix of \(w \cdot v\). The attribute relies on ad hoc rules which induce the symmetry. In the example, the main reversal rule is

\((\mathrm {rev}\,u \le _p \mathrm {rev}\,v) = (u \le _s v).\)

The attribute is used frequently in the present formalization. For instance, Fig. 1 shows the formalization of the proof of Cases 1 and 2 of Theorem 1. Namely, the proof of Case 2 is smoothly deduced from the lemma that deals with Case 1, avoiding writing down the same proof again up to symmetry. See [13] for more details on the symmetry and the attribute reversed.
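In everyday terms, the reversal rule is the following easily tested equivalence; the Python sketch below is only an analogue and is unrelated to the Isabelle implementation:

```python
def rev(w):
    # reverse the order of elements
    return w[::-1]

# the rule: (rev u <=p rev v) = (u <=s v)
ok = all(
    rev(v).startswith(rev(u)) == v.endswith(u)
    for u in ["", "a", "b", "ab", "ba", "bab"]
    for v in ["", "a", "ab", "bab", "abab"]
)
print(ok)  # True
```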

Fig. 1. Highlights from the formalization in Isabelle/HOL.

To be able to use this attribute fully in the formalization of the main results, it had to be extended to deal with elements of type ’a list list, as the constant factor_interpretation is of a function type over this exact type. The new theories of the session CoW_Equations contain almost 50 uses of this attribute.

The second highlight of the formalization is the use of simple but useful proof methods. The first method, called primitivity_inspection, is able to show primitivity or imprimitivity of a given word.
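The underlying test is standard: a nonempty word is primitive iff it occurs in its own square only at the two trivial positions. A Python analogue of such an inspection (not the Isabelle method itself) could be:

```python
def is_primitive(w):
    # a nonempty word w is primitive iff it is not a nontrivial factor
    # of its own square, i.e. w.w contains w only at positions 0 and |w|
    return len(w) > 0 and (w + w).find(w, 1) == len(w)

print([is_primitive(w) for w in ["a", "ab", "aa", "abab", "aba"]])
# [True, True, False, False, True]
```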

Another method named list_inspection is used to deal with claims that consist of straightforward verification of some property for a set of words given by their length and alphabet. For instance, this method painlessly concludes the proof of lemma bin_imprim_both_squares_prim. The method divides the goal into eight easy subgoals corresponding to eight possible words. All goals are then discharged by simp_all.

The last method we want to mention is mismatch. It is designed to prove that two words commute using the property of a binary code mentioned in Sect. 2 and explained in Sect. 8. Namely, if a product of words from \(\{x,y\}\) starting with x shares a prefix of length at least \(\left| xy \right| \) with another product of words from \(\{x,y\}\), this time starting with y, then x and y commute. Examples of usage of the attribute reversed and all three methods are given in Fig. 1.

8 Appendix: Background Results in Combinatorics on Words

A periodic root r of w need not be primitive, but it is always possible to consider the corresponding primitive root \(\rho \, r\), which is also a periodic root of w. Note that any word has infinitely many periodic roots since we allow r to be longer than w. Nevertheless, a word can have more than one period even if we consider only periods shorter than |w|. Such a possibility is controlled by the Periodicity lemma, often called the Theorem of Fine and Wilf (see [6]):

Lemma 9

(per_lemma_comm). If w has periods u and v, i.e., \(w \le _p uw\) and \(w \le _p vw\), with \(|u|+|v| - \gcd (|u|,|v|) \le |w|\), then \(uv = vu\).

Usually, the simpler (though stronger) assumption \(|u| + |v| \le |w| \) is sufficient to conclude that u and v commute.
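The lemma can be confirmed exhaustively on short binary words; the following Python sketch (names ours) finds no violation:

```python
from math import gcd
from itertools import product

def has_period_root(w, r):
    # w <=p r.w : the word r is a period root of w
    return (r + w).startswith(w)

# exhaustive check of the Periodicity lemma on binary words up to length 8;
# every period of length <= |w| is realized by a prefix of w
ok = True
for n in range(1, 9):
    for bits in product("ab", repeat=n):
        w = "".join(bits)
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                u, v = w[:i], w[:j]
                if (has_period_root(w, u) and has_period_root(w, v)
                        and i + j - gcd(i, j) <= n):
                    ok = ok and (u + v == v + u)
print(ok)  # True
```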

Conjugation \(u \sim v\) is characterized as follows:

Lemma 10

(conjugation). If \(uz = zv\) for nonempty u, then there exist words r and q and an integer k such that \(u = rq\), \(v = qr\) and \(z = (rq)^kr\).

We have said that w has a periodic root r if it is a prefix of \(r^\omega \). If w is a factor, not necessarily a prefix, of \(r^\omega \), then it has a periodic root which is a conjugate of r. In particular, if \(\left| u \right| = \left| v \right| \), then \(u \sim v\) is equivalent to each of u and v being a factor of a power of the other.

Commutation of two words is characterized as follows:

Lemma 11

(comm). \(xy = yx\) if and only if \(x = t^k\) and \(y = t^m\) for some word t and some integers \(k,m \ge 0\).

Since every nonempty word has a (unique) primitive root, the word t can be chosen primitive (k or m can be chosen 0 if x or y is empty).

We often use the following theorem, called “the theorem of Lyndon and Schützenberger”:

Theorem 5

(Lyndon_Schutzenberger). If \(x^jy^k = z^\ell \) with \(j \ge 2\), \(k \ge 2\) and \(\ell \ge 2\), then the words x, y and z commute.

A crucial property of a primitive word t is that it cannot be a nontrivial factor of its own square. For a general word u, the equality \(u\cdot u = p \cdot u \cdot s\) with nonempty p and s implies that all three words p, s, u commute, that is, have a common primitive root t. This can be seen by writing \(u = t^k\), and noticing that the presence of a nontrivial factor u inside uu can be obtained exclusively by a shift by several t’s. This idea is often described as “synchronization”.
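Synchronization is easy to observe experimentally; in the following hypothetical Python example, every occurrence of \(u = t^3\) inside \(u \cdot u\) is at a shift by a whole number of t’s:

```python
def occurrences(needle, haystack):
    # all starting positions of needle inside haystack
    return [i for i in range(len(haystack) - len(needle) + 1)
            if haystack[i:i + len(needle)] == needle]

t = "aab"      # a primitive word
u = t * 3      # u = t^3
# copies of u inside u.u occur only at shifts by whole t's
print(occurrences(u, u + u))   # [0, 3, 6, 9]
```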

Let x and y be two words that do not commute. The longest common prefix of xy and yx is denoted \(\alpha \). Let \(c_x\) and \(c_y\) be the letter following \(\alpha \) in xy and yx respectively. A crucial property of \(\alpha \) is that it is a prefix of any sufficiently long word in \(\left\langle \{x,y\} \right\rangle \). Moreover, if \(\mathbf {w}= [u_1,u_2,\ldots ,u_n] \in \texttt {lists}\, \{x,y\}\) is such that \(\texttt {concat}\, \mathbf {w}\) is longer than \(\alpha \), then \(\alpha \cdot [c_x]\) is a prefix of \(\texttt {concat}\, \mathbf {w}\) if \(u_1 = x\) and \(\alpha \cdot [c_y]\) is a prefix of \(\texttt {concat}\, \mathbf {w}\) if \(u_1 = y\). That is why the length of \(\alpha \) is sometimes called “the decoding delay” of the binary code \(\{x,y\}\). Note that this property in particular implies that \(\{x,y\}\) is a code, that is, it does not satisfy any nontrivial relation. It is also behind our method mismatch. Finally, using this property, the proof of Lemma 7 is straightforward.
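The words \(\alpha \), \(c_x\) and \(c_y\) are easily computed; a small Python example (with a hypothetical code \(\{ab, aab\}\)) illustrates the prefix property:

```python
def lcp(u, v):
    # longest common prefix of two strings
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]

x, y = "ab", "aab"             # a non-commuting pair
alpha = lcp(x + y, y + x)
c_x = (x + y)[len(alpha)]      # the letter after alpha in x.y
c_y = (y + x)[len(alpha)]      # the letter after alpha in y.x
print(alpha, c_x, c_y)         # a b a
# a long enough product starting with x begins with alpha.c_x,
# one starting with y begins with alpha.c_y
print("".join([x, y, y]).startswith(alpha + c_x))  # True
print("".join([y, x, x]).startswith(alpha + c_y))  # True
```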