1 Introduction

Consider two words \(\texttt{abba}\) and \(\texttt{b}\). It is possible to concatenate (several copies of) them as \(\texttt{b}\cdot \texttt{abba} \cdot \texttt{b}\), and obtain a power of a third word, namely the square \(\texttt{bab}\cdot \texttt{bab}\) of \(\texttt{bab}\). In this paper, we describe a formalization in Isabelle/HOL of a full characterization of all the ways in which this can happen for two words. The question is part of the difficult problem of solving word equations. The simplest word equation is the commutation relation \(xy = yx\), whose solutions are characterized by the existence of a word t and non-negative integers m and n such that \(x = t^m\) and \(y = t^n\). In fact, this characterizes the solutions of any non-trivial equation in two variables. While the question of preserving primitivity is trivial for commuting words, it becomes significantly more complicated in the case of non-commuting pairs, that is, in the case of binary codes, which explains the title of this article.

The corresponding theory has a long history. The question can be formulated as solving equations in three variables of the special form \(W(x,y) = z^\ell \) where the left hand side is a product of x’s and y’s, and \(\ell \ge 2\). The seminal result in this direction is the paper by Lyndon and Schützenberger [20] from 1962, which solves the equation \(x^jy^k = z^\ell \), with \(2 \le j,k,\ell \), in the more general setting of free groups. It was followed, in 1967, by a partial answer to our question by Lentin and Schützenberger [18]. Solving an equation is equivalent to finding a monoid whose generators satisfy the relation given by the equation. A complete characterization of monoids generated by three words was provided by Budkina and Markov [4]. The characterization was later, in 1976, reproved in a different way by Lentin’s student Spehner in his Ph.D. thesis [25], which even explicitly mentions the answer to the central question of this paper. See also a comparison of the two classifications by Harju and Nowotka [9]. In 1985, the result was again reproved by Barbin-Le Rest and Le Rest [1], this time specifically focusing on our question. Their paper contains a characterization of binary interpretations of a square as a crucial tool (see Sect. 4). The latter combinatorial result is interesting on its own, but is very little known. Besides the fact that, as far as we know, the proof is not available in English, it has to be reconstructed from Théorème 2.1 and Lemme 3.1 of [1], and it is long, technical and loosely structured, with many steps that require further clarification. It is symptomatic, for example, that Maňuch [21] cites the claim as essentially equivalent to his desired result but nevertheless provides a different, shorter but similarly technical proof. We hope that a formalization of the result will make it more convincing and approachable for researchers and hence more widely known and used.

The proof we present here naturally contains some ideas of the original proof [1] but is significantly different. Our main objective was to follow the basic methodological requirement of a good formalization, namely to identify claims that are needed in the proof and to formulate them as separate lemmas, as generally as possible, so that they can be reused not only in the proof but also later. The resulting overall proof structure is new. In particular, we used the idea of “gluing” words, which is not new in itself but is not used in the original proof at all. The immediate consequence is that for our proof it is enough to consider the interpretation of the square xx, while the original proof must consider several other configurations. The analysis of the proof is therefore another important contribution of our formalization, besides the mere certainty that there are no gaps in the proof. We also provide a complete parametric solution of the equation \(x^jy^k = z^\ell \) for arbitrary j, k and \(\ell \), a classification which is not very difficult, but perhaps too complicated to be useful in a mere unverified paper form. We are not aware of a previous publication of this parametric solution, although it could be reconstructed from the existing characterizations of monoids generated by three words mentioned above.

The formalization presented here is an organic part of a larger project of formalization of combinatorics on words (see an introductory description [14] or the project repository [11]). The presented version is archived [12]. The existence of the underlying library, which in turn extends the theories List and HOL-Library.Sublist from the standard Isabelle distribution, critically contributes to a smooth formalization which comes fairly close to the way a human paper proof would look, outsourcing technicalities to the (reusable) background. We accompany claims in this text with the names of their formalized counterparts.

Outline In Sect. 2 we introduce basic tools of combinatorics on words used in the paper. The section can therefore be understood as a brief tutorial on combinatorics on words. In Sect. 3 we introduce the main theorem of the paper, and outline its proof. The proof is then completed in Sects. 4–7; see the end of Sect. 3 for more details. Section 5 is dedicated to a special case of uniform binary codes, which is interesting on its own, and is later used for conjugate words. Section 8 provides some further details on the formalization of the result in Isabelle/HOL.

This is an extended version of a conference paper with the same name [16]. The introductory Sect. 2 is an extension of the appendix of the conference paper. We have also added many proofs omitted in the conference version. In particular, a substantial part of the proof of the key Theorem 25 is given, including proofs of several auxiliary lemmas. Numerous examples and figures were added. Section 3 was extended by Lemma 10, which contains a new complementary result. Section 5, about uniform codes, is new. In Sect. 7, we have added a passage about the non-overlapping set, which replaces and generalizes the brief remark about sings-code in the conference paper.

2 Notation and Basics of Combinatorics on Words

In this section we introduce basic notation, and explain most elementary facts, ideas and intuitions that we are going to use in the paper. For proofs and more details see [5, 19, 24].

2.1 Lists, Words and Monoids

Let \(\Sigma \) be a set, usually finite or countable, also called an alphabet.

Lists (i.e. finite sequences) \([x_1,x_2,\ldots ,x_n]\) of elements \(x_i \in \Sigma \) are called words over \(\Sigma \). The length of a word \(u = [x_1,x_2,\ldots ,x_n]\) is denoted \(\left| u \right| \) and equals n. The set of all words over \(\Sigma \) is usually denoted as \(\Sigma ^*\), using the Kleene star. A notorious ambiguity of this notation arises when we consider a set of words \(X \subset \Sigma ^*\) and are interested in lists over X. They should be denoted as elements of \(X^*\). However, \(X^*\) usually means something else (in the theory of rational languages), namely the set of all words in \(\Sigma ^*\) generated by the set X. To avoid this confusion, we will therefore follow the notation used in the formalization in Isabelle, and write \(\texttt {lists}\,\,{} X\) instead, to make clear that the entries of an element of \(\texttt {lists}\,\,{} X\) are themselves words. Note that \(\Sigma ^*\) is the same as \(\texttt {lists}\,\,{} \Sigma \), but we shall keep the former notation for the basic alphabet \(\Sigma \). To further help distinguish words over the basic alphabet from lists over a set of words, we shall use boldface variables for the latter. In particular, it is important to keep in mind the difference between a letter a and the word [a] of length one, a distinction which is usually glossed over lightly in the literature on combinatorics on words. The set of words over \(\Sigma \) generated by X is then denoted as \(\left\langle X \right\rangle \).

The (associative) binary operation of concatenation of two words u and v is denoted by \(u \cdot v\). We prefer this algebraic notation to Isabelle’s original infix symbol @. Moreover, we shall sometimes omit the dot, as usual. With respect to this operation, the set \(\Sigma ^*\) is a monoid (with the empty word as the neutral element), and \(\left\langle X \right\rangle \) is a submonoid of \(\Sigma ^*\) for any \(X\subseteq \Sigma ^*\).

If \({\textbf{u}}= [x_1,x_2,\ldots , x_n] \in \texttt {lists}\,\,{} X\) is a list of words, then we write \(\texttt {concat}\,\,{\textbf{u}}\) for \(x_1\cdot x_2 \cdots x_n\). If \({\textbf{u}}\) is nonempty, its first element \(x_1\) is denoted \(\texttt {hd}\,\,{\textbf{u}}\) (head) and its last element \(x_n\) is denoted \(\texttt {last}\,\,{\textbf{u}}\). We write \(\varepsilon \) for the empty list (the empty word), and \(u^k\) for the concatenation of k copies of u (we use \(u^{\texttt {@}}k\) in the formalization). We write \(u \le _p v\), \(u <_p v\), \(u \le _s v\), \(u <_s v\), and \(u \le _f v\) to say that u is a prefix, a strict prefix, a suffix, a strict suffix and a factor (that is, a contiguous sublist) of v respectively. The longest common prefix of u and v is denoted by \(u \wedge _p v\). If u is a prefix of v or v is a prefix of u, we say that u and v are prefix-comparable.
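
To make the distinction between words and lists of words concrete, the following Python sketch (purely illustrative; the helper names are ours and none of this is the formalization) models words as strings and lists over a set of words as Python lists, mirroring \(\texttt {concat}\,\,\), \(\texttt {hd}\,\,\), \(\texttt {last}\,\,\) and the prefix, suffix and factor relations:

```python
# Illustrative helpers mirroring the notation of this section; the names are ours.

def concat(ws):        # concat : lists X -> <X>
    return "".join(ws)

def hd(ws):            # first element of a nonempty list
    return ws[0]

def last(ws):          # last element of a nonempty list
    return ws[-1]

def is_prefix(u, v):   # u <=_p v
    return v.startswith(u)

def is_suffix(u, v):   # u <=_s v
    return v.endswith(u)

def is_factor(u, v):   # u <=_f v (contiguous sublist)
    return u in v

def lcp(u, v):         # u /\_p v, the longest common prefix
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]

w = ["abba", "b", "abba"]              # a list over X = {abba, b}
assert concat(w) == "abbababba"
assert hd(w) == "abba" and last(w) == "abba"
assert is_prefix("abb", concat(w)) and is_suffix("ba", concat(w))
assert is_factor("bab", concat(w))
assert lcp("ababa", "abba") == "ab"
```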

2.1.1 Primitivity

A word is primitive if it is nonempty and not a power of a shorter word. Otherwise, we call it imprimitive. Each nonempty word w is a power of a unique primitive word \(\rho (w)\), its primitive root. For example, \(u = abab\) is imprimitive with \(\rho (u) = ab\).
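
A naive way to compute the primitive root, shown below as an illustrative Python sketch (not the formalized definition), is to try every length dividing \(\left| w \right| \):

```python
def primitive_root(w):
    """Return rho(w), the primitive root of a nonempty word w (naive search)."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]           # the shortest such prefix is rho(w)

def is_primitive(w):
    return bool(w) and primitive_root(w) == w

assert primitive_root("abab") == "ab"   # abab is imprimitive, rho(abab) = ab
assert is_primitive("aba")              # aba is primitive
assert not is_primitive("aaa")          # aaa = a^3 is imprimitive
```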

2.1.2 Periodic Root and Period

A word r is a periodic root of a word w if \(w <_p r \cdot w\) (note that r must be nonempty). This is equivalent to w being a prefix of a sufficiently large power of r, and we shall sometimes write \(r^\omega \) as an abbreviation of “sufficiently large power” (this remains mere notation; we do not deal with infinite words in this paper, nor in the formalization). The dual concept to the (prefix-)periodic root is the suffix-periodic root, characterized by \(w <_s w \cdot r\). This is just one instance of the natural duality given by the reversal (or mirror) symmetry, which is often exploited in human proofs on an intuitive basis. See Sect. 8 for more details on how reversal symmetry is used in our formalization.

If r is a periodic root of w, then we also say that w has a period \(\left| r \right| \). Note that this is equivalent to w having a suffix-periodic root \(r'\) with \(\left| r \right| = \left| r' \right| \).

A periodic root r of w need not be primitive, but it is always possible to consider the corresponding primitive root \(\rho (r)\), which is also a periodic root of w. Note that any word has infinitely many periodic roots since we allow r to be longer than w. Nevertheless, a word can have more than one period even if we consider only periods shorter than \(\left| w \right| \). For example, the word aaabaa has periods 4 and 5, with corresponding periodic roots aaab and aaaba. Such a possibility is controlled by the Periodicity Lemma, often called the Fine–Wilf Theorem [7]:

Lemma 1

(per-lemma-comm) If \(w \le _p uw\) and \(w \le _p vw\), with \(\left| u \right| + \left| v \right| - \gcd (\left| u \right| ,\left| v \right| ) \le \left| w \right| \), then \(uv = vu\).

This implies, together with Lemma 4 below, that the word aaabaaa of length 7 is a word of the greatest length with periods 4 and 5 that is not a power of a single letter. Usually, the weaker test \(\left| u \right| + \left| v \right| \le \left| w \right| \) is sufficient to conclude that u and v commute.
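
The following illustrative Python check (ours) confirms this discussion: the threshold of the Periodicity Lemma for periods 4 and 5 is \(4 + 5 - \gcd (4,5) = 8\), the word aaabaaa of length 7 carries both periods without being a power of a single letter, while every binary word of length 8 with both periods is a power of a single letter:

```python
from math import gcd

def has_period(w, p):
    """w has period p, i.e. w is a prefix of r^omega for some r with |r| = p."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))

w = "aaabaaa"                        # length 7
assert has_period(w, 4) and has_period(w, 5)
assert len(set(w)) > 1               # not a power of a single letter

# Threshold of the Periodicity Lemma: 4 + 5 - gcd(4, 5) = 8.
assert 4 + 5 - gcd(4, 5) == 8
# Every binary word of length 8 with periods 4 and 5 is a power of a single letter.
for bits in range(2 ** 8):
    v = "".join("ab"[(bits >> i) & 1] for i in range(8))
    if has_period(v, 4) and has_period(v, 5):
        assert len(set(v)) == 1
```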

2.1.3 Maximal Prefix

Given a word r, we define the maximal r-prefix of a word w as \(w \wedge _p r^\omega \). For example, if \(r = ab\), then the maximal r-prefix of ababaabab is ababa. Sometimes, the easiest way to prove that two words are not equal is to show that they have different r-prefixes for a suitable r, see the proof of Lemma 24 below.

2.1.4 Conjugation

We say that two words u and v are conjugate and write \(u \sim v\) if \(u = rq\) and \(v=qr\) for some words r and q. Note that conjugation is an equivalence whose classes are also called cyclic words. For example, aab and aba are conjugate. They are elements of the conjugacy class \(\{aab,aba,baa\}\). A word u is a cyclic factor of w if it is a factor of some conjugate of w. If \(|u|\le |w|\), this is equivalent to u being a factor of ww.

Conjugation \(u \sim v\) is characterized as follows:

Lemma 2

(conjugation) If \(uz = zv\) for nonempty u, then there exist words r and q and an integer k such that \(u = rq\), \(v = qr\) and \(z = (rq)^kr\).

A word w has a periodic root r if it is a prefix of \(r^\omega \). If w is a factor, not necessarily a prefix, of \(r^\omega \), then it has a periodic root which is a conjugate of r. In particular, if \(\left| u \right| = \left| v \right| \), then \(u \sim v\) is equivalent to u and v each being a factor of a power of the other word. Note also that if \(u\sim v\) and \(u = t^k\), then \(v=s^k\) for some \(s\sim t\).
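
For words of equal length, these facts give a simple conjugacy test, sketched below in Python (an illustration, not the formal development): \(u \sim v\) if and only if u is a factor of \(v \cdot v\).

```python
def are_conjugate(u, v):
    """u ~ v for words of equal length: u occurs as a factor of v.v."""
    return len(u) == len(v) and u in v + v

assert are_conjugate("aab", "aba") and are_conjugate("aab", "baa")
assert not are_conjugate("aab", "abb")

# Cyclic factors: for |u| <= |w|, u is a cyclic factor of w iff u is a factor of w.w.
assert "ba" in "aab" + "aab"             # ba is a cyclic factor of aab
```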

2.1.5 Lyndon and Schützenberger

The following result [20], called the Lyndon–Schützenberger Theorem, has been mentioned in the introduction:

Theorem 3

(Lyndon–Schutzenberger) If \(x^jy^k = z^\ell \) with \(j \ge 2\), \(k \ge 2\) and \(\ell \ge 2\), then the words x, y and z commute.

In the context of our paper, the preferred reading is the following: If x and y do not commute (that is, they form a binary code, see below), and \(x^jy^k\) is imprimitive, then \(j=1\) or \(k = 1\).

2.2 Binary Codes

2.2.1 Code

A set of words X is a code if its elements do not satisfy any nontrivial relation, that is, they are a basis of a free monoid. A monoid M of words is free if and only if it satisfies the stability condition which is the implication

$$\begin{aligned} u,v,uz,zv \in M \Longrightarrow z \in M. \end{aligned}$$

This is a useful characterization since it allows one to recognize a free monoid without knowing its basis. Yet another formulation is that \(\texttt {concat}\,\,\) is a bijection between \(\texttt {lists}\,\,{} X\) and \(\left\langle X \right\rangle \) for a code X. The name “code” is motivated by the latter fact: any concatenation of a list of elements of the code can be uniquely factorized (“decoded”) back into the original list.

2.2.2 Commutation

It can be shown that a two-element set \(\{x,y\}\) is a (binary) code if and only if x and y do not commute. Therefore, saying that \(B = \{x,y\}\) is a binary code is equivalent to saying that \(xy \ne yx\), which is often preferred in the formalization. By the definition of a code, this means that two words commute if and only if they satisfy a nontrivial relation. This makes commutation an important property, which can be characterized as follows:

Lemma 4

(comm) \(xy = yx\) if and only if \(x = t^k\) and \(y = t^m\) for some word t and some integers \(k,m \ge 0\).

Since every nonempty word has a (unique) primitive root, the word t can be chosen primitive (k or m can be chosen 0 if x or y is empty). This implies that two nonempty words x and y commute if and only if \(\rho (x) = \rho (y)\). A useful consequence is that x and y commute if and only if \(x^j\) and \(y^k\) commute, where \(0 < k, j\).
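
As an illustration (in Python, with the naive primitive-root helper from above; not part of the formalization), commutation of nonempty words can indeed be tested by comparing primitive roots:

```python
def primitive_root(w):
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]

t = "ab"
x, y = t * 3, t * 2                      # x = t^3, y = t^2
assert x + y == y + x                    # powers of a common word commute
assert primitive_root(x) == primitive_root(y) == t

u, v = "ab", "ba"
assert u + v != v + u                    # non-commuting words ...
assert primitive_root(u) != primitive_root(v)   # ... have distinct primitive roots
```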

2.2.3 Synchronization

A crucial property of a primitive word t is that it cannot be a nontrivial factor of its own square. More specifically, for a general word u, the equality \(u\cdot u = p \cdot u \cdot s\) implies that all three words p, s, u commute. Indeed, it follows that \(u \cdot u \cdot u = p \cdot u \cdot s \cdot u = u \cdot p \cdot u \cdot s\), and thus \(p \cdot u = u \cdot p\) and \(s \cdot u = u \cdot s\). Therefore, p, s and u have a common primitive root t. Hence, an occurrence of the factor u inside uu can arise exclusively from a shift by several t’s. In particular, if u is primitive, i.e., \(u = t\), we either have \(p = u = t\) and s empty, or vice versa. We shall refer to this idea of shifting by the primitive root as synchronization. A slightly more general formulation, which can be seen as an instance of synchronization, says that if \(w\cdot v\) is a prefix of \(v^\omega \), then w and v commute.
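
The synchronization property can be observed directly: a primitive word occurs in its own square only at the two trivial positions, while an imprimitive word also occurs at shifts by its primitive root. A small illustrative check in Python (the helper name is ours):

```python
def occurrences(u, w):
    """All positions at which u occurs as a factor of w."""
    return [i for i in range(len(w) - len(u) + 1) if w[i:i + len(u)] == u]

t = "aab"                                     # primitive
assert occurrences(t, t + t) == [0, len(t)]   # only the two trivial occurrences

u = "abab"                                    # imprimitive, rho(u) = ab
assert occurrences(u, u + u) == [0, 2, 4]     # shifts by multiples of |ab|
```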

2.2.4 Decoding Delay

Let x and y be two words that do not commute. Equivalently, let \(B = \{x,y\}\) be a binary code. Then the longest common prefix \((x \cdot y) \wedge _p (y \cdot x)\) of \(x\cdot y\) and \(y\cdot x\), denote it \(\alpha \), is a strict prefix of both \(x\cdot y\) and \(y\cdot x\). Let \(c_x\) and \(c_y\) be distinct letters following \(\alpha \) in \(x\cdot y\) and \(y\cdot x\) respectively. A crucial property of \(\alpha \), not very difficult to prove, is that it is a prefix of any sufficiently long word in \(\left\langle \{x,y\} \right\rangle \). Moreover, if \({\textbf{w}}= [u_1,u_2,\ldots ,u_n] \in \texttt {lists}\,\,{} \{x,y\}\) is such that \(\texttt {concat}\,\,{\textbf{w}}\) is longer than \(\alpha \), then \(\alpha \cdot [c_x]\) is a prefix of \(\texttt {concat}\,\,{\textbf{w}}\) if \(u_1 = x\) and \(\alpha \cdot [c_y]\) is a prefix of \(\texttt {concat}\,\,{\textbf{w}}\) if \(u_1 = y\). That is why the length of \(\alpha \) is sometimes called the decoding delay of the binary code \(\{x,y\}\). Note that the property indeed in particular implies that \(\{x,y\}\) is a code, that is, it does not satisfy any nontrivial relation. For example, \(\alpha = aba\) if \(x = abaa\) and \(y = ab\), with \(c_x = a\) and \(c_y = b\). Suppose that we want to decode a word starting with \(abaaabaa\cdots \in \left\langle \{x,y\} \right\rangle \), that is, we want to find its unique decomposition into words x and y. We first see the prefix \(\alpha = aba\) which says nothing about the decomposition since it is common to all messages. It is followed by \(c_x = a\) which indicates that the first word in the decomposition is x.
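
The decoding procedure just described can be sketched as follows (illustrative Python, ours; the short-tail fallback is a simplification that suffices for this example): compute \(\alpha \) once, and then repeatedly decide the next code word by looking at the letter at position \(\left| \alpha \right| \) of the remaining message.

```python
from itertools import product

def lcp(u, v):
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]

def decode(msg, x, y):
    """Decode msg from <{x,y}> back into a list over {x, y} using alpha."""
    alpha = lcp(x + y, y + x)
    c_x = (x + y)[len(alpha)]            # the letter announcing that x comes first
    out = []
    while msg:
        if len(msg) > len(alpha):
            w = x if msg[len(alpha)] == c_x else y
        else:                            # short tail: decide by direct comparison
            w = x if msg == x else y
        assert msg.startswith(w)
        out.append(w)
        msg = msg[len(w):]
    return out

x, y = "abaa", "ab"
assert lcp(x + y, y + x) == "aba"        # alpha = aba, c_x = a, c_y = b
for ws in product([x, y], repeat=4):     # every message of four code words decodes back
    assert decode("".join(ws), x, y) == list(ws)
```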

The property is also behind our method mismatch (see Sect. 8). Finally, using this property, the proof of the following lemma is straightforward.

Lemma 5

(bin-code-lcp-concat’) Let \(X = \{x,y\}\) be a binary code, and let \({\textbf{w}}_0,{\textbf{w}}_1 \in \texttt {lists}\,\,{} X\) be such that \(\texttt {concat}\,\,{\textbf{w}}_0\) and \(\texttt {concat}\,\,{\textbf{w}}_1\) are not prefix-comparable. Then

$$\begin{aligned} (\texttt {concat}\,\,{\textbf{w}}_0) \wedge _p (\texttt {concat}\,\,{\textbf{w}}_1) = \texttt {concat}\,\,({\textbf{w}}_0 \wedge _p {\textbf{w}}_1) \cdot (xy \wedge _p yx). \end{aligned}$$

3 Main Theorem

Let us introduce the central definition of the paper.

Definition 6

We say that a set X of words is primitivity-preserving if there is no list \({\textbf{w}}\in \texttt {lists}\,\,{} X\) such that

  • \(\left| {\textbf{w}} \right| \ge 2\);

  • \({\textbf{w}}\) is primitive; and

  • \(\texttt {concat}\,\,{\textbf{w}}\) is imprimitive.

If X is not primitivity-preserving, then each primitive \({\textbf{w}}\in \texttt {lists}\,\,{} X\) of length at least two such that \(\texttt {concat}\,\,{\textbf{w}}\) is imprimitive will be called a witness (of the fact that X is not primitivity-preserving).

Note that our definition does not take into account singletons \({\textbf{w}}= [z]\). In particular, X can be primitivity-preserving even if some \(z \in X\) is imprimitive. For example, if \(X = \{aa,b\}\), then X is primitivity-preserving despite the fact that \(\texttt {concat}\,\,{\textbf{w}}\) of the primitive list \({\textbf{w}}= [aa]\) is the imprimitive word aa. On the other hand, in the binary case, Theorem 7 gives conditions under which x and y must be primitive if \(\{x,y\}\) is not primitivity-preserving.

Mitrana [22] shows that X is primitivity-preserving if and only if it is the minimal set of generators of a “pure monoid”, cf. [3, p. 276]. The latter definition requires that even individual code words are primitive, which is a significant difference from our definition (see, in particular, the concept of a non-overlapping set in Sect. 7). Mitrana formulates the primitivity of a set in terms of morphisms, that is, mappings \(h: A^* \rightarrow \Sigma ^*\) satisfying \(h(u\cdot v) = h(u) \cdot h(v)\) for all \(u,v \in A^*\). This is equivalent to our formulation in the following way. Consider a binary alphabet \(A = \{a,b\}\), and a morphism \(h: A^* \rightarrow \Sigma ^*\) defined by \(h: a \mapsto x, b\mapsto y\). Then h represents our set \(X = \{x,y\}\), and we can write, for example, h(abaa) instead of \(\texttt {concat}\,\,[x,y,x,x]\). The concept of preserving primitivity puts our paper into a wider context of morphisms preserving a given property, most classically square-freeness; see for example a characterization of square-free morphisms over three letters by Crochemore [6].
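
The correspondence between the set \(\{x,y\}\) and the morphism h can be made concrete with a short Python sketch (ours, purely illustrative):

```python
def morphism(x, y):
    """The morphism h : {a,b}* -> Sigma* with h(a) = x and h(b) = y."""
    return lambda word: "".join(x if c == "a" else y for c in word)

x, y = "abba", "b"
h = morphism(x, y)

assert h("abaa") == x + y + x + x        # h(abaa) stands for concat [x, y, x, x]
u, v = "ab", "ba"
assert h(u + v) == h(u) + h(v)           # the defining property of a morphism
```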

The target claim of our formalization is the following characterization of lists witnessing that a binary code is not primitivity-preserving:

Theorem 7

(bin-imprim-code)

Let \(B = \{x,y\}\) be a binary code that is not primitivity-preserving. Then there are integers \(j \ge 1\) and \(k \ge 1\), with \(k = 1\) or \(j = 1\), such that the following conditions are equivalent for any \({\textbf{w}}\in \texttt {lists}\,\,{} B\) with \(\left| {\textbf{w}} \right| \ge 2\):

  • \({\textbf{w}}\) is primitive, and \(\texttt {concat}\,\,{\textbf{w}}\) is imprimitive;

  • \({\textbf{w}}\) is conjugate with \([x]^j[y]^k\).

Moreover, assuming \(\left| y \right| \le \left| x \right| \),

  • if \(j \ge 2\), then \(j=2\) and \(k=1\), and both x and y are primitive;

  • if \(k \ge 2\), then \(j=1\) and x is primitive.

First, note that the integers j and k depend on the set B only. Consequently, the theorem says that there is a unique witness of the form \([x]^j[y]^k\) for a given B, and all witnesses are conjugate with it.

Proof

(overview) Let \({\textbf{w}}\) be a witness. That is, \(\left| {\textbf{w}} \right| \ge 2\), \({\textbf{w}}\) is primitive, and \(\texttt {concat}\,\,{\textbf{w}}\) is imprimitive. Since \([x]^j[y]^k\) and \([y]^k[x]^j\) are conjugate, we can suppose, without loss of generality, that \(\left| y \right| \le \left| x \right| \).

First, we want to show that \({\textbf{w}}\) is conjugate with \([x]^j[y]^k\) for some \(j,k \ge 1\) such that \(k = 1\) or \(j = 1\). Since \({\textbf{w}}\) is primitive and of length at least two, it contains both x and y. If it contains one of these letters exactly once, then \({\textbf{w}}\) is clearly conjugate with \([x]^j[y]^k\) for \(j = 1\) or \(k = 1\). Therefore, the difficult part is to show that no primitive \({\textbf{w}}\) with \(\texttt {concat}\,\,{\textbf{w}}\) imprimitive can contain both letters at least twice. This is the main task of the rest of the paper, which is finally accomplished by Theorem 25, claiming that witnesses containing at least two occurrences of x are conjugate with [xxy]. To complete the proof of the first part of the theorem, it remains to show that j and k are unique for a given \(\{x,y\}\). This follows from Lemma 9.

Note that the imprimitivity of \(\texttt {concat}\,\,{\textbf{w}}\), with \({\textbf{w}}= [x]^j[y]^k\), induces the equality \(x^jy^k = z^\ell \) for some z and \(\ell \ge 2\). The already mentioned seminal result of Lyndon and Schützenberger (Theorem 3) shows that j and k cannot be simultaneously at least two, since otherwise x and y commute. For the same reason, considering its primitive root, the word y is primitive if \(j \ge 2\). Similarly, x is primitive if \(k \ge 2\). The primitivity of x when \(j = 2\) is a part of Theorem 25. \(\square \)

We start by giving a complete parametric solution of the equation \(x^jy^k = z^\ell \) in the following theorem. This will eventually yield, after the proof of Theorem 7 is completed, a full description of binary codes that are not primitivity-preserving. Since the equation is mirror symmetric, we omit symmetric cases by assuming \(\left| y \right| \le \left| x \right| \).

Theorem 8

(LS-parametric-solution) Let \(j,k \ge 1\), \(\ell \ge 2\) and \(\left| y \right| \le \left| x \right| \).

The equality \(x^jy^k = z^\ell \) holds in exactly the following cases:

  A.

    There exist a word r and integers \(m,n,t \ge 0\) such that

    $$\begin{aligned} mj+nk&= t \ell , \quad \text{ and } \\ x = r^m, \quad y&= r^n, \quad z = r^t; \end{aligned}$$
  B.

    \(j = k = 1\) and there exist non-commuting words r and q, and integers \(m,n \ge 0\) such that

    $$\begin{aligned} m+n+1&= \ell , \quad \text{ and } \\ x = (rq)^mr, \quad y&= q(rq)^{n}, \quad z = rq; \end{aligned}$$
  C.

    \(j = 1\) and \(k \ge 2\) and there exist non-commuting words r and q such that

    $$\begin{aligned} x = (qr^k)^{\ell -1}q, \quad y = r, \quad z = qr^k; \end{aligned}$$
  D.

    \(j = 1\) and \(k \ge 2\) and there exist non-commuting words r and q, an integer \(m \ge 1\) such that

    $$\begin{aligned} x = (qr(r(qr)^m)^{k - 1})^{\ell - 2}qr(r(qr)^m)^{k - 2}rq, \ y = r(qr)^m, \ z = qr(r(qr)^m)^{k - 1}; \end{aligned}$$
  E.

    \(j = \ell = 2\), \(k = 1\) and there exist non-commuting words r and q and an integer \(m \ge 2\) such that

    $$\begin{aligned} x = (rq)^m r, \quad y = qrrq, \quad z = (rq)^mrrq. \end{aligned}$$
Fig. 1 Illustration of the distinct cases of Theorem 8

All the cases of the last theorem are illustrated in Fig. 1. See also Example 11.

Proof

If x and y commute, then all three words commute, hence they are powers of a common word. A length argument yields the solution A.

Assume now that \(\{x,y\}\) is a code. It follows that z does not commute with x. We have shown in the overview of the proof of Theorem 7 that \(j = 1\) or \(k = 1\) by the Lyndon–Schützenberger Theorem 3. The solution is then split into several cases.

Case 1: \(j = k = 1\).

Let m and r be such that \(z^mr = x\) with r a strict prefix of z. By setting \(z = rq\), we obtain the solution B with \(n = \ell - m -1\).

Case 2: \(j \ge 2, k = 1\).

Since \(\left| y \right| \le \left| x \right| \) and \(\ell \ge 2\), we have

$$\begin{aligned} 2\left| z \right| \le \left| z^\ell \right| = \left| x^j \right| + \left| y \right| < 2\left| x^j \right| , \end{aligned}$$

hence z is a strict prefix of \(x^j\).

As \(x^j\) has periodic roots both z and x, and z does not commute with x, the Periodicity Lemma 1 implies \(\left| x^j \right| < \left| z \right| + \left| x \right| \). That is, \(z = x^{j-1}u\), \(x^j = zv\) and \(x = uv\) for some nonempty words u and v. Since \(x^j\) is a prefix of \(z^\ell \), we deduce that v is a prefix of \(z^{\ell -1}\), and therefore also of x since x is a prefix of z. This implies that

$$\begin{aligned} x = uv = vu' \end{aligned}$$

for some word \(u'\). That is, the words u and \(u'\) are conjugate by v. The characterization of conjugate words yields \(u = rq\), \(u' = qr\) and \(v = (rq)^nr\) for some words r, q and an integer n. Moreover, since x and y do not commute, the words r and q are nonempty.

We have

$$\begin{aligned} j\left| x \right| + \left| y \right| = \left| x^jy \right| = \left| z^\ell \right| = \left| (x^{j-1}u)^\ell \right| = \ell (j-1)\left| x \right| + \ell \left| u \right| , \end{aligned}$$

and thus \(\left| y \right| = (\ell j-\ell -j)\left| x \right| + \ell \left| u \right| \). From \(\left| y \right| \le \left| x \right| \), \(\left| u \right| > 0\), and \(\ell \ge 2\), we deduce that \(\ell j-\ell -j\) is not positive. Since \(j,\ell \ge 2\), this implies \(j = \ell = 2\). Then \(z = x^{j-1}u = xu = uvu\). From \(x^2y= z^2\), we have \(uvuvy = uvuuvu = uvuvu'u,\) hence \(y = u'u\). Substituting \(u = rq\), \(u' = qr\), and \(v = (rq)^nr\), we obtain the solution E with \(m = n+1\), where \(m \ge 2\) follows from \(\left| y \right| \le \left| x \right| \).

Case 3: \(j = 1, k \ge 2, y^k \le _s z\).

We have \(z = qy^k\) for some word q. Noticing that \(x = z^{\ell -1}q\) and setting \(y = r\) yields the solution C. The words r and q do not commute since x and y do not commute.

Case 4: \(j = 1, k \ge 2, z <_s y^k\).

This case is analogous to Case 2. Using the Periodicity Lemma 1, we obtain \(uy^{k-1} = z\), \(y^k = vz\), and \(y = vu\) with nonempty u and v. As v is a suffix of \(z^{\ell -1}\) and is shorter than y, it is also a suffix of y, and we have \(y = vu = u'v\) for some \(u'\) conjugate with u by v. We therefore have nonempty words r and q such that \(u' = rq\), \(u = qr\), and \(v = (rq)^nr\). Using \(y = u'v\), \(z = uy^{k-1}\) and \(z^{\ell -1} = xv\), we obtain the solution D with \(m = n + 1\). Again, the words r and q do not commute since x and y, which are generated by r and q, do not commute.

The proof is completed by a direct verification of the converse. \(\square \)

The case analysis in the previous proof also shows that at most one of the cases holds for given x, y and z.
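The converse direction of Theorem 8 amounts to substituting the parametric forms and checking the equality \(x^jy^k = z^\ell \), which is easy to replay mechanically. The following Python sketch (ours, purely illustrative; it checks a few small parameter choices rather than the general claim) instantiates each of the cases A–E and verifies the equation.

```python
def check(x, y, z, j, k, l):
    assert x * j + y * k == z * l        # the equation x^j y^k = z^l

r, q = "a", "ab"                         # sample non-commuting words
assert r + q != q + r

# Case A: powers of a common word, with m*j + n*k = t*l.
check(r * 3, r * 2, r * 4, 2, 3, 3)      # 3*2 + 2*3 = 4*3

# Case B: j = k = 1, x = (rq)^m r, y = q(rq)^n, z = rq, m + n + 1 = l.
m, n = 2, 1
check((r + q) * m + r, q + (r + q) * n, r + q, 1, 1, m + n + 1)

# Case C: j = 1, k >= 2, x = (q r^k)^(l-1) q, y = r, z = q r^k.
k, l = 3, 2
check((q + r * k) * (l - 1) + q, r, q + r * k, 1, k, l)

# Case D: j = 1, k >= 2, m >= 1.
k, l, m = 2, 3, 1
yD = r + (q + r) * m
zD = q + r + yD * (k - 1)
xD = zD * (l - 2) + q + r + yD * (k - 2) + r + q
check(xD, yD, zD, 1, k, l)

# Case E: j = l = 2, k = 1, x = (rq)^m r, y = qrrq, z = (rq)^m rrq, m >= 2.
m = 2
check((r + q) * m + r, q + r + r + q, (r + q) * m + r + r + q, 2, 1, 2)
```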

Recall that Theorem 7 claims two things. First, if there is a list \({\textbf{w}}\) witnessing that \(\{x,y\}\) is not primitivity-preserving, then \({\textbf{w}}\) is conjugate with \([x]^j[y]^k\) for some j and k. Second, there is at most one such pair (j, k) for a given \(\{x,y\}\) (and exactly one if \(\{x,y\}\) is not primitivity-preserving). The next lemma proves the second claim.

Lemma 9

(LS-unique) Let \(B = \{x,y\}\) be a binary code. Assume \(j,k,j',k' \ge 1\). If both \(x^jy^k\) and \(x^{j'}y^{k'}\) are imprimitive, then \(j = j'\) and \(k = k'\).

Proof

Let \(z_1,z_2\) be primitive words and \(\ell ,\ell ' \ge 2\) be such that

$$\begin{aligned} x^jy^k = z_1^\ell \quad \text { and } \quad x^{j'}y^{k'} = z_2^{\ell '}. \end{aligned}$$
(1)

Since B is a code, the words x and y do not commute. We proceed by contradiction.

Case 1: First, assume that \(j = j'\) and \(k \ne k'\).

Let, without loss of generality, \(k < k'\). From (1) we obtain \(z_1^\ell y^{k' - k} = z_2^{\ell '}\). The case \(k' - k \ge 2\) is impossible due to the Lyndon–Schützenberger Theorem 3. Hence \(k' - k = 1\). This is another place where the formalization led to a simple and nice general lemma (easily provable by the Periodicity Lemma 1) which will turn out to be useful also in the proof of Theorem 25. Namely, the lemma imprim-ext-suf-comm claims that if both uv and uvv are imprimitive, then u and v commute (see also a comment in Sect. 8). We apply this lemma to \(u = x^jy^{k-1}\) and \(v = y\), obtaining a contradiction to the assumption that x and y do not commute.

Case 2. The case \(k = k'\) and \(j \ne j'\) is symmetric to Case 1.

Case 3. Let finally \(j \ne j'\) and \(k \ne k'\). The Lyndon–Schützenberger Theorem 3 implies that either j or k is one, and similarly either \(j'\) or \(k'\) is one. We can therefore assume that \(k = j' = 1\) and \(k',j \ge 2\). Moreover, we can assume that \(\left| y \right| \le \left| x \right| \). Indeed, in the opposite case, we can consider the words \(y^kx^j\) and \(y^{k'}x^{j'}\) instead, which are also both imprimitive.

Theorem 8 now allows only the case E for the equality \(x^jy = z_1^\ell \). We therefore have \(j = \ell = 2\) and \(x = (rq)^mr\), \(y = qrrq\) for an integer \(m \ge 2\) and some non-commuting words r and q. Assume that \(z_2\) and rq have the same primitive root. Then \(qr = rq\), since \(|qr| = |rq|\) and \(y = qrrq\) is a suffix of \(z_2^{\ell '}\), a contradiction. Therefore \(z_2\) and rq do not commute. Consider the word \(x \cdot qr = (rq)^mrqr\), which is a prefix of xy, and therefore also of \(z_2^{\ell '}\). This means that \(x \cdot qr\) has two periodic roots, namely rq and \(z_2\), and the Periodicity Lemma 1 implies that \(\left| x \cdot qr \right| < \left| rq \right| + \left| z_2 \right| \). Hence x is shorter than \(z_2\). The equality \(xy^{k'} = z_2^{\ell '}\), with \(\ell ' \ge 2\), now implies on one hand that rqrq is a prefix of \(z_2\), and on the other hand that \(z_2\) is a suffix of \(y^{k'}\). It follows that rqrq is a factor of \((qrrq)^{k'}\). Hence rqrq and qrrq are conjugate, and qrrq is a square since rqrq is a square, see Sect. 2.1.4. Thus they both have a period of length \(\left| rq \right| \), which implies \(qr = rq\), a contradiction. \(\square \)

A natural question is whether the property of being primitivity-preserving is algorithmically decidable for a given \(\{x,y\}\). It follows from Theorem 7 that it is enough to check the primitivity of elements of the set

$$\begin{aligned} \{xxy\} \cup \{xy^k \mid k \ge 1\}. \end{aligned}$$

From the computational point of view, we therefore need an upper bound on k in terms of \(\left| x \right| \) and \(\left| y \right| \). Such a bound is given by the following lemma.

Lemma 10

(LS-exp-le) Let \(B = \{x,y\}\) be a binary code and let \(x \cdot y^k = z^\ell \) with \(k,\ell \ge 2\). Then

$$\begin{aligned} k \le \frac{\left| x \right| - 4}{\left| y \right| } + 2. \end{aligned}$$

Proof

By Theorem 8, it is enough to consider cases C and D. In case C, using successively \(\ell \ge 2\), \(\left| r \right| \ge 1\) and \(\left| q \right| \ge 1\), we have

$$\begin{aligned} \frac{\left| x \right| - 4}{\left| y \right| } + 2&= \frac{(\ell - 1)(\left| q \right| + k\left| r \right| ) + \left| q \right| - 4}{\left| r \right| } + 2 \ge \frac{2\left| q \right| + k\left| r \right| - 4}{\left| r \right| } + 2 \ge \\&\ge k + 2 + \frac{2\left| q \right| - 4}{\left| r \right| } \ge k + 2 - \frac{2}{\left| r \right| } \ge k. \end{aligned}$$

Similarly, in case D, using \(\ell \ge 2\) and \(\left| qr \right| \ge 2\), we have

$$\begin{aligned} \frac{\left| x \right| - 4}{\left| y \right| } + 2&= \frac{(\ell - 2)\left| qr(r(qr)^m)^{k-1} \right| + 2\left| qr \right| + (k-2)\left| y \right| - 4}{\left| y \right| } + 2 \ge \\&\ge \frac{2\left| qr \right| + (k-2)\left| y \right| - 4}{\left| y \right| } + 2 \ge k + \frac{2\left| qr \right| -4}{\left| y \right| } \ge k. \end{aligned}$$

\(\square \)

Note that the bound is sharp since we have equality for \(\ell = 2\) and \(\left| q \right| = \left| r \right| = 1\) in both cases as the following example points out.

Example 11

(examples-bound-optimality) For any \(k \ge 2\), the triples (x, y, z) obtained from cases C and D (with \(m = 1\)) of Theorem 8 by choosing \(\ell = 2\) and one-letter words r and q satisfy \(\left| y \right| \le \left| x \right| \), \(x \cdot y^k = z \cdot z\), \(x \cdot y \ne y \cdot x\) and \(k = \nicefrac {(\left| x \right| - 4)}{\left| y \right| } + 2.\)
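
For concreteness, the following illustrative Python check (our own instantiation, choosing r = b and q = a) verifies that the resulting triples from cases C and D attain the bound of Lemma 10:

```python
def check_optimal(x, y, z, k):
    assert len(y) <= len(x)
    assert x + y * k == z + z                  # x . y^k = z . z
    assert x + y != y + x                      # {x, y} is a binary code
    assert k == (len(x) - 4) // len(y) + 2     # equality in the bound of Lemma 10

for k in range(2, 8):
    # Case C with l = 2, q = a, r = b:  x = a b^k a, y = b, z = a b^k.
    check_optimal("a" + "b" * k + "a", "b", "a" + "b" * k, k)
    # Case D with l = 2, m = 1, q = a, r = b:
    #   x = ab (bab)^(k-2) ba, y = bab, z = ab (bab)^(k-1).
    check_optimal("ab" + "bab" * (k - 2) + "ba", "bab", "ab" + "bab" * (k - 1), k)
```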

We remark that the primitivity-preserving property is decidable for all finite sets due to the characterization of star-free regular languages as those with aperiodic syntactic monoid. See Mitrana [22, Corollary 6] for more details and further references.
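For binary codes specifically, combining Theorem 7 with the bound of Lemma 10 yields a simple decision procedure, sketched below in Python (our own illustration, which takes the characterization described above for granted): with \(\left| y \right| \le \left| x \right| \), it suffices to test the primitivity of \(x\cdot x\cdot y\) and of \(x\cdot y^k\) for \(1 \le k \le (\left| x \right| -4)/\left| y \right| + 2\).

```python
def primitive_root(w):
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]

def is_primitive(w):
    return bool(w) and primitive_root(w) == w

def preserves_primitivity(x, y):
    """Decide whether the binary code {x, y} preserves primitivity,
    assuming the witness shape of Theorem 7 and the bound of Lemma 10."""
    assert x + y != y + x, "{x, y} must be a binary code"
    if len(x) < len(y):
        x, y = y, x                            # ensure |y| <= |x|
    if not is_primitive(x + x + y):            # the candidate witness [x, x, y]
        return False
    k_max = max(1, (len(x) - 4) // len(y) + 2) # Lemma 10 bounds k for x . y^k
    return all(is_primitive(x + y * k) for k in range(1, k_max + 1))

assert not preserves_primitivity("abba", "b")        # b.abba.b = (bab)^2
assert not preserves_primitivity("abbaab", "baabba") # x.y = (abba)^3
assert preserves_primitivity("ab", "a")
```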

The rest of the paper, and therefore also of the proof of Theorem 7, is organized as follows. In Sect. 4, we introduce a general theory of interpretations, which is behind the main idea of the proof. In Sect. 5, we apply it to the (relatively simple) case of a binary code with words of the same length. In Sect. 6, we characterize the unique disjoint extendable \(\{x,y\}\)-interpretation of the square of the longer word x. This is a result of independent interest, and also the cornerstone of the proof of Theorem 7 which is completed in Sect. 7 by showing that a list containing at least two x’s witnessing that \(\{x,y\}\) is not primitivity-preserving is conjugate with [xxy].

4 Interpretations and the Main Idea

Let X be a code, and let u be a factor of \(\texttt {concat}\,\,{\textbf{w}}\) for some \({\textbf{w}}\in \texttt {lists}\,\,{} X\). The natural question is to decide how u can be produced as a factor of words from X, or, in other words, how it can be interpreted in terms of X. This motivates the following definition.

Definition 12

Let X be a set of words over \(\Sigma \). We say that the triple \((p,s,{\textbf{w}}) \in \Sigma ^*\times \Sigma ^* \times \texttt {lists}\,\,{} X\) is an X-interpretation of a word \(u \in \Sigma ^*\) if

  • \({\textbf{w}}\) is nonempty;

  • \(p \cdot u \cdot s = \texttt {concat}\,\,{\textbf{w}}\);

  • \(p <_p \texttt {hd}\,\,{\textbf{w}}\) and

  • \(s <_s \texttt {last}\,\,{\textbf{w}}\).

The definition is illustrated by the following figure, where \({\textbf{w}}= [w_1,w_2,w_3,w_4]\):

The second condition of the definition motivates the notation \(p\, u\, s \sim _{{\mathcal {I}}} {\textbf{w}}\) for the situation when \((p, s, {\textbf{w}})\) is an X-interpretation of u.

Remark 13

For the sake of historical reference, we remark that our definition of X-interpretation differs from the one used in [1]. The formulation in [1] of the situation depicted by the above figure would be that u is interpreted by the triple \((s', w_2 \cdot w_3, p')\) where \(p\cdot s' = w_1\) and \(p'\cdot s = w_4\). This is less convenient for two reasons. First, the decomposition of \(w_2 \cdot w_3\) into \([w_2,w_3]\) is only implicit here (and even possibly ambiguous if X is not a code). Second, while it is required that the words \(p'\) and \(s'\) are a prefix and a suffix, respectively, of an element from X, the identity of that element is left open, and has to be specified separately.

If u is a nonempty element of \(\left\langle X \right\rangle \) and \(u = \texttt {concat}\,\,{\textbf{u}}\) for \({\textbf{u}}\in \texttt {lists}\,\,{} X\), then the X-interpretation \(\varepsilon \, u\, \varepsilon \sim _{{\mathcal {I}}} {\textbf{u}}\) is called trivial. Note that the trivial X-interpretation is unique if X is a code.

As nontrivial X-interpretations of elements from \(\left\langle X \right\rangle \) are of particular interest, the following two concepts are useful.

Definition 14

An X-interpretation \(p\, u\, s \sim _{{\mathcal {I}}} {\textbf{w}}\) of \(u = \texttt {concat}\,\,{\textbf{u}}\) is called

  • disjoint if \(\texttt {concat}\,\,{\textbf{w}}' \ne p \cdot \texttt {concat}\,\,{\textbf{u}}'\) whenever \({\textbf{w}}' \le _p {\textbf{w}}\) and \({\textbf{u}}' \le _p {\textbf{u}}\);

  • extendable if \(p \le _s w_p\) and \(s \le _p w_s\) for some elements \(w_p, w_s \in \left\langle X \right\rangle \).

Informally, an interpretation is disjoint if no “edge” between words in \({\textbf{u}}\) fits an “edge” between words in \({\textbf{w}}\) in the equality \(p\cdot \texttt {concat}\,\,{\textbf{u}}\cdot s = \texttt {concat}\,\,{\textbf{w}}\). That is, the equality does not split into two shorter ones. Let, for example, \(x = \texttt{01101}\) and \(y = \texttt{01}\) and let \({\textbf{u}}= [y,y]\). Then the interpretation

$$\begin{aligned} \texttt{011}\, \texttt{0101}\, \texttt{101} \sim _{{\mathcal {I}}} [x,x] \end{aligned}$$

of \(u = \texttt{0101} = \texttt {concat}\,\,[y,y]\) is not disjoint, see Fig. 2. Moreover, it is not extendable, since \(\texttt{101}\) is not a prefix of any word from \(\left\langle \{x,y\} \right\rangle \). We could also argue that it is not extendable because \(\texttt{011}\) is not a suffix of any word from \(\left\langle \{x,y\} \right\rangle \). In contrast, the interpretation in Fig. 5 is disjoint and extendable.

Fig. 2 Non-disjoint interpretation

Note that a disjoint X-interpretation is not trivial, and that being disjoint is relative to a chosen factorization \({\textbf{u}}\) of u (which is nevertheless unique if X is a code). It should be clear from the definition in which way an extendable interpretation of \(\texttt {concat}\,\,{\textbf{u}}\) can be “extended” into an X-interpretation of \(\texttt {concat}\,\,{\textbf{w}}\).

The definitions above are naturally motivated by the main idea of the characterization of sets X that do not preserve primitivity, which dates back to Lentin and Schützenberger [18]. If \({\textbf{w}}\) is primitive while \(\texttt {concat}\,\,{\textbf{w}}\) is imprimitive, say \(\texttt {concat}\,\,{\textbf{w}}= z^\ell \), \(\ell \ge 2\), then the shift by z provides a nontrivial and extendable X-interpretation of \(\texttt {concat}\,\,{\textbf{w}}\). (In fact, there are \(\ell -1\) such nontrivial interpretations). Moreover, the following lemma, formulated in a more general setting of two lists \({\textbf{w}}_1\) and \({\textbf{w}}_2\), implies that the X-interpretation is disjoint if X is a code.

Lemma 15

(shift-disjoint,shift-interpret) Let X be a code. Let \({\textbf{w}}_1, {\textbf{w}}_2 \in \texttt {lists}\,\,{} X\) be such that \(z \cdot \texttt {concat}\,\,{\textbf{w}}_1 = \texttt {concat}\,\,{\textbf{w}}_2 \cdot z\) where \(z \notin \left\langle X \right\rangle \). Then \(z \cdot \texttt {concat}\,\,{\textbf{v}}_1 \ne \texttt {concat}\,\,{\textbf{v}}_2\), whenever \({\textbf{v}}_1 \le _p {\textbf{w}}_1^n\) and \({\textbf{v}}_2 \le _p {\textbf{w}}_2^n\), \(n\in {\mathbb {N}}\).

In particular \(\texttt {concat}\,\,{\textbf{u}}\) has a disjoint extendable X-interpretation for any nonempty prefix \({\textbf{u}}\) of \({\textbf{w}}_1\).

Fig. 3 The situation prohibited by Lemma 15

Proof

First, note that \(z \cdot \texttt {concat}\,\,{\textbf{w}}_1^n = \texttt {concat}\,\,{\textbf{w}}_2^n \cdot z\) for any n. Let \({\textbf{w}}_1^n = {\textbf{v}}_1\cdot {\textbf{v}}_1'\) and \({\textbf{w}}_2^n= {\textbf{v}}_2\cdot {\textbf{v}}_2'\). If \(z \cdot \texttt {concat}\,\,{\textbf{v}}_1 = \texttt {concat}\,\,{\textbf{v}}_2\), then also \(\texttt {concat}\,\,{\textbf{v}}_2' \cdot z = \texttt {concat}\,\,{\textbf{v}}_1'\). This contradicts \(z \notin \left\langle X \right\rangle \) by the stability condition. We have proved the first part of the lemma, excluding the situation illustrated by Fig. 3. The corresponding lemma in the formalization is shift-disjoint.

The second part is covered in the formalization by shift-interpret. An extendable X-interpretation of \(\texttt {concat}\,\,{\textbf{u}}\) is induced by the fact that \(\texttt {concat}\,\,{\textbf{u}}\) is a factor of \(\texttt {concat}\,\,({\textbf{w}}_2 \cdot {\textbf{w}}_2)\). By Lemma 2, there are words q and r such that \(\texttt {concat}\,\,{\textbf{w}}_2 = rq\), \(\texttt {concat}\,\,{\textbf{w}}_1 = qr\) and \(z = (rq)^mr\) for some integer m. Since \(z \notin \left\langle X \right\rangle \), we deduce that also \(r\notin \left\langle X \right\rangle \), and the assumptions of the present lemma are satisfied for r. We can therefore assume that \(z = r\). In particular, \(|z| < |\texttt {concat}\,\,{\textbf{w}}_2|\) and \(z\cdot \texttt {concat}\,\,{\textbf{u}}\) is a prefix of \(\texttt {concat}\,\,{\textbf{w}}_2^{2}\). Let \({\textbf{p}}\), \({\textbf{v}}\) be such that

  • \({\textbf{p}}\cdot {\textbf{v}}\le _p {\textbf{w}}_2^{2}\);

  • \({\textbf{p}}\) is the maximum prefix of \({\textbf{w}}_2^{2}\) such that \(\texttt {concat}\,\,{\textbf{p}}\le _p z\); and

  • \({\textbf{p}}\cdot {\textbf{v}}\) is the minimum prefix of \({\textbf{w}}_2^{2}\) such that \(z \cdot \texttt {concat}\,\,{\textbf{u}}\le _p \texttt {concat}\,\,({\textbf{p}}\cdot {\textbf{v}})\),

and let p, s be words such that \(p \cdot \texttt {concat}\,\,{\textbf{u}}\cdot s = \texttt {concat}\,\,{\textbf{v}}\). The situation is illustrated by Fig. 4. Here p and s satisfy \(\texttt {concat}\,\,{\textbf{p}}\cdot p = z\) and \(z \cdot \texttt {concat}\,\,{\textbf{u}}\cdot s = \texttt {concat}\,\,({\textbf{p}}\cdot {\textbf{v}})\). The maximality of \({\textbf{p}}\) and the minimality of \({\textbf{p}}\cdot {\textbf{v}}\) also imply that \(p <_p \texttt {hd}\,\,{\textbf{v}}\) and \(s <_s \texttt {last}\,\,{\textbf{v}}\). Therefore we have \(p\, (\texttt {concat}\,\,{\textbf{u}})\, s \sim _{{\mathcal {I}}} {\textbf{v}}\). The interpretation is disjoint by the first part of the proof. It is also extendable since s is a prefix of \(\texttt {concat}\,\,{\textbf{s}}\), where \({\textbf{u}}\cdot {\textbf{s}}= {\textbf{w}}_1^2\), and p is a suffix of \(\texttt {concat}\,\,{\textbf{w}}_1\). \(\square \)

Fig. 4 Interpretation of \(\texttt {concat}\,\,{\textbf{u}}\) from Lemma 15

Let \(B = \{x,y\}\) be a binary code. In order to apply the above lemma to the imprimitive \(\texttt {concat}\,\,{\textbf{w}}= z^k\), \(2 \le k\), of a primitive \({\textbf{w}}\in \texttt {lists}\,\,{} B\), set \({\textbf{w}}_1= {\textbf{w}}_2 = {\textbf{w}}\). Let us verify the assumption \(z \notin \left\langle B \right\rangle \). If \(z = \texttt {concat}\,\,{\textbf{z}}\), with \({\textbf{z}}\in \texttt {lists}\,\,{} B\), then \(\texttt {concat}\,\,({\textbf{z}}\cdot {\textbf{w}}) = \texttt {concat}\,\,({\textbf{w}}\cdot {\textbf{z}})\), hence \({\textbf{z}}\cdot {\textbf{w}}= {\textbf{w}}\cdot {\textbf{z}}\) since B is a code. This implies that \({\textbf{w}}\) and \({\textbf{z}}\) have the same primitive root. Since \(\texttt {concat}\,\,{\textbf{z}}\) is strictly shorter than \(\texttt {concat}\,\,{\textbf{w}}\), we deduce that \({\textbf{w}}\) is not primitive, a contradiction.

See also Fig. 7 which illustrates our main application of the lemma with \({\textbf{u}}= [x,x]\), \({\textbf{w}}_1 = {\textbf{w}}_2\), and \({\textbf{v}}= [x,y,x]\).

5 Uniform Binary Codes

In this section, we use the main idea in a relatively simple case of uniform binary codes, that is, binary codes \(B = \{x,y\}\) with \(\left| x \right| = \left| y \right| \). The key ingredient is the following technical lemma characterizing possible \(\{x,y\}\)-interpretations of the word \(x \cdot y\) in the uniform case.

Lemma 16

(uniform-square-interp) Let \(B = \{x,y\}\) be a binary code with \(\left| x \right| = \left| y \right| \). Let \(p\ (x\cdot y)\ s \sim _{{\mathcal {I}}} {\textbf{v}}\) be a nontrivial B-interpretation. Then \({\textbf{v}}= [x,y,x]\) or \({\textbf{v}}= [y,x,y]\), and \(x\cdot y\) is imprimitive.

Proof

We have \(p \cdot x \cdot y \cdot s = \texttt {concat}\,\,{\textbf{v}}\), where p and s are not empty (otherwise the interpretation is trivial). Therefore \(0< \left| p\cdot s \right| < 2 \left| x \right| \), and a length argument yields that \(\left| {\textbf{v}} \right| \) is three. A straightforward way to prove the claim is to consider all eight possible candidates. If \({\textbf{v}}= [x,y,x]\) or \({\textbf{v}}= [y,x,y]\), then \(x \cdot y\) is a nontrivial factor of its square \((x \cdot y)\cdot (x \cdot y)\), which yields the imprimitivity of \(x \cdot y\) (see Sect. 2.2.3).

The remaining six cases can be easily excluded one by one. In each case we obtain \(x=y\), a contradiction to B being a code.

  • If \({\textbf{v}}= [x,x,x]\) then \(p \cdot x \cdot y \cdot s = x \cdot x \cdot x\) implies, by synchronization (see Sect. 2.2.3), that \(y \cdot s\) and x commute, that is, \(x \cdot y \cdot s = y\cdot s\cdot x\). This implies \(x = y\) due to \(\left| x \right| = \left| y \right| \). We can argue similarly for \({\textbf{v}}= [y,y,y]\).

  • If \({\textbf{v}}= [x,x,y]\), then \(p \cdot x \cdot y \cdot s = x \cdot x \cdot y\) implies that \(x = p \cdot t\) for some word t. Then \(p \cdot p \cdot t \cdot y \cdot s = p \cdot t \cdot p \cdot t \cdot y\) implies \(p \cdot t = t \cdot p\) and \(y \cdot s = t \cdot y\). Therefore \(x \le _p t \cdot x\) and \(y \le _p t \cdot y\). This implies that both x and y have a periodic root t, and since they are of the same length, we conclude \(x = y\). A symmetric argument deals with \({\textbf{v}}= [x,y,y]\).

  • Let now \({\textbf{v}}= [y,y,x]\). Then \(p \cdot x \cdot y \cdot s = y \cdot y \cdot x\) and \(y = p \cdot t\) for some word t. From \(p\cdot x \cdot p \cdot t \cdot s = p \cdot t \cdot p \cdot t \cdot x\) and \(\left| x \right| = \left| y \right| = \left| t\cdot p \right| \) we deduce \(x = t \cdot p\). The equality \(p \cdot t \cdot p \cdot y \cdot s = t \cdot p \cdot t \cdot y \cdot x\) now implies \(y = p\cdot t = t\cdot p = x\). The last case \({\textbf{v}}= [y,x,x]\) is again symmetric.

\(\square \)

The previous proof is interesting from the point of view of the formalization. It is a case analysis in which each case is easy. It is nevertheless not always easy to check that the case analysis is complete. See Sect. 8 for a custom method performing such a verification in our formalization. Let us also remark that showing that \({\textbf{v}}\) is indeed of length three, which needs no further justification in a human proof, requires some effort in the formalization. The difference reveals the often unreflected intuition behind human reasoning about small natural numbers.

Lemma 16 immediately implies the following characterization of primitive uniform binary morphisms (see [22, Theorem 10]).

Theorem 17

(bin-uniform-prim-morph) Let \(B = \{x,y\}\) be a binary code with \(\left| x \right| = \left| y \right| \). The code B is primitivity-preserving if and only if \(x \cdot y\) is primitive.

Proof

If B is primitivity-preserving, then \(x\cdot y\) is primitive by definition.

Assume now that \(x\cdot y\) is primitive and proceed by contradiction. Let \({\textbf{w}}\) be primitive of length at least two such that \(\texttt {concat}\,\,{\textbf{w}}\) is imprimitive. Then \({\textbf{w}}\) contains both letters x and y, hence it has either [xy] or [yx] as a factor. Conjugating \({\textbf{w}}\), we can assume that [xy] is a factor of \({\textbf{w}}\). The imprimitivity of \(\texttt {concat}\,\,{\textbf{w}}\) yields a nontrivial B-interpretation of \(x \cdot y\), which implies that \(x \cdot y\) is not primitive by Lemma 16, a contradiction. \(\square \)

The previous theorem can be reformulated as saying that an imprimitivity witness for a uniform binary code must be of length two, which means that \(x \cdot y\) (and hence also \(y\cdot x\)) must be imprimitive. Consequently, there is only one way to get a uniform binary code not preserving primitivity: take an odd power of a primitive word of even length, and split it in half. For example, the third power of the primitive word \(t = abba\) yields the code \(\{abbaab,baabba\}\), which is not primitivity-preserving.
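
A quick illustrative check of this construction in Python (with the naive primitivity test; none of this code is part of the formalization):

```python
def is_primitive(w):
    n = len(w)
    return bool(w) and all(w[:d] * (n // d) != w
                           for d in range(1, n) if n % d == 0)

t = "abba"                          # a primitive word of even length
x, y = (t * 3)[:6], (t * 3)[6:]     # split its third power in half
assert (x, y) == ("abbaab", "baabba")
assert x + y != y + x               # {x, y} is a (uniform) binary code
assert not is_primitive(x + y)      # x . y = t^3 is imprimitive
```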

Uniform binary codes also have the following property.

Lemma 18

(bin-uniform-imprim) Let \(B = \{x,y\}\) be a binary code with \(\left| x \right| = \left| y \right| \). If \(x\cdot y\) is not primitive, then both x and y are primitive.

Proof

Let \(x\cdot y\) be imprimitive. Then there are words r and s and a positive integer i such that \(r \cdot s\) is the primitive root of \(x\cdot y\), \(x = (r\cdot s)^i\cdot r\) and \(y = (s \cdot r)^i \cdot s\). Note that \(\{r,s\}\) is a uniform binary code. Since \(r\cdot s\) is primitive, the code \(\{r,s\}\) is primitivity-preserving by Theorem 17. Since both \([r,s]^i\cdot [r]\) and \([s,r]^i\cdot [s]\) are primitive, we conclude that x and y, their concatenations, are also primitive. \(\square \)

This immediately yields the following improved version of Theorem 17:

Theorem 19

(bin-uniform-prim-morph’) Let \(B = \{x,y\}\) be a binary code with \(\left| x \right| = \left| y \right| \). If \(x \cdot y\) is primitive or if at least one of x and y is imprimitive, then B is primitivity-preserving.

We will later need the following corollary.

Lemma 20

(bin-imprim-not-conjug) Let \(B = \{x,y\}\) be a uniform binary code which is not primitivity-preserving. Then x and y are not conjugate.

Proof

By Theorem 17, the word \(x\cdot y\) is imprimitive. Let x and y be conjugate, and let \(x = p \cdot q\) and \(y = q \cdot p\). Since \(x \cdot y = p \cdot q \cdot q \cdot p\) is imprimitive, also \(p \cdot p \cdot q \cdot q\) is imprimitive. Then p and q commute by the Lyndon–Schützenberger Theorem 3, a contradiction to B being a code. \(\square \)

6 Binary Interpretation of a Square

In Sect. 4, we pointed out the main idea of the proof of our main goal, Theorem 7: namely, the fact that an imprimitive \(\texttt {concat}\,\,{\textbf{w}}\) of a primitive \({\textbf{w}}\in \texttt {lists}\,\,\{x,y\}\) provides a disjoint extendable \(\{x,y\}\)-interpretation of \(\texttt {concat}\,\,{\textbf{u}}\) for any factor \({\textbf{u}}\) of \({\textbf{w}}\). More specifically, the plan is to apply this idea to \({\textbf{u}}= [x,x]\). Consequently, the core technical component of the proof is a characterization of disjoint extendable \(\{x,y\}\)-interpretations of the square \(x^2\), where \(\{x,y\}\) is a binary code, and \(\left| y \right| \le \left| x \right| \). This is a very nice result which is relatively simple to state but difficult to prove, and which is valuable on its own. As we mentioned already, it can be obtained from Théorème 2.1 and Lemme 3.1 of [1].

Theorem 21

(square-interp-ext.sq-ext-interp) Let \(B = \{x,y\}\) be a binary code such that \(\left| y \right| \le \left| x \right| \), both x and y are primitive, and x and y are not conjugate.

Let \(p\, (x\cdot x)\, s \sim _{{\mathcal {I}}} {\textbf{w}}\) be a disjoint extendable B-interpretation. Then

$$\begin{aligned} {\textbf{w}}&= [x,y,x],&s \cdot p&= y,&p \cdot x&= x \cdot s. \end{aligned}$$

In order to appreciate the connection of the theorem to the problem of preserving primitivity, note that the definition of interpretation implies

$$\begin{aligned} p \cdot x \cdot x \cdot s = x \cdot y \cdot x, \end{aligned}$$

hence \(x \cdot y \cdot x = (p \cdot x)^2\). This will turn out to be the only way in which primitivity can fail to be preserved if x occurs at least twice in \({\textbf{w}}\). Figure 5 shows an example with \(x = \texttt {01010}\) and \(y = \texttt {1001}\).

Fig. 5 Disjoint binary interpretation of a square

Proof of Theorem 21

By the definition of a disjoint interpretation, we have \(p\cdot x\cdot x \cdot s = \texttt {concat}\,\,{\textbf{w}}\), where \(p \ne \varepsilon \) and \(s \ne \varepsilon \). A length argument implies that \({\textbf{w}}\) has length at least three. Since a primitive word is not a nontrivial factor of its square, we have \({\textbf{w}}= [\texttt {hd}\,\,{\textbf{w}}] \cdot [y]^k \cdot [\texttt {last}\,\,{\textbf{w}}]\), with \(k \ge 1\). Since the interpretation is disjoint, we can split the equality into \(p \cdot x = \texttt {hd}\,\,{\textbf{w}}\cdot y^m \cdot u\) and \(x \cdot s = v \cdot y^\ell \cdot \texttt {last}\,\,{\textbf{w}}\), where \(y = u \cdot v\), both u and v are nonempty, and \(k = \ell + m + 1\), as in Fig. 6.

Fig. 6 Definition of u and v

We want to show \(\texttt {hd}\,\,{\textbf{w}}= \texttt {last}\,\,{\textbf{w}}= x\) and \(m = \ell = 0\). The situation is mirror symmetric so we can prove claims for \(\texttt {hd}\,\,\) and for \(\texttt {last}\,\,\) two at a time. We proceed by contradiction and exclude all unwanted situations. Some cases are concluded by showing that u and v commute, which contradicts the fact that y is primitive.

If \(\texttt {hd}\,\,{\textbf{w}}= \texttt {last}\,\,{\textbf{w}}= y\), then \(x^2\) and \(y^{k+2}\) share a factor of length at least \(\left| x \right| + \left| y \right| \). Since x and y are primitive, this implies that they are conjugate, a contradiction. A similar argument applies when \(\ell \ge 1\) and \(\texttt {hd}\,\,{\textbf{w}}= y\) (if \(m \ge 1\) and \(\texttt {last}\,\,{\textbf{w}}= y\) respectively). Therefore, in order to prove \(\texttt {hd}\,\,{\textbf{w}}= \texttt {last}\,\,{\textbf{w}}= x\), it remains to exclude the case \(\texttt {hd}\,\,{\textbf{w}}= y\), \(\ell = 0\) and \(\texttt {last}\,\,{\textbf{w}}= x\) (\(\texttt {last}\,\,{\textbf{w}}= y\), \(m = 0\) and \(\texttt {hd}\,\,{\textbf{w}}= x\) respectively). This is covered by one of the technical lemmas that we single out:

Lemma 22

(pref-suf-pers-short) Let \(x \le _p v \cdot x\), \(x \le _s r \cdot u \cdot v \cdot u\) and \(\left| x \right| > \left| v \cdot u \right| \) with \(r \in \left\langle \{u,v\} \right\rangle \). Then \(u \cdot v = v \cdot u\).

Let us first explain how this lemma is used. With the assumptions of the case we want to exclude, we obtain \(x\cdot s = v\cdot x\) and \(p\cdot x = y^{m+1}\cdot u = y^m \cdot u\cdot v\cdot u\). Since x is a factor of \(y^{m+2}\), the inequality \(\left| u\cdot v \right| = \left| y \right| \le \left| x \right| \) must be strict, since otherwise x and y would be conjugate (see Sect. 2.1.4), contrary to our assumptions. All hypotheses of Lemma 22 are therefore readily seen to be satisfied with \(r = y^m\), the conclusion yielding a contradiction.

Proof of Lemma 22

The conclusion is trivial if u or v is empty. Assume they are nonempty. From \(x \le _s r\cdot u\cdot v\cdot u\) and \(\left| x \right| > \left| v\cdot u \right| \), we obtain a nonempty word q such that \(x = q\cdot v\cdot u\) and \(q \le _s r\cdot u\). From \(x \le _p v\cdot x\) we have that x is a prefix of \(v^\omega \). The equality \(x = q\cdot v\cdot u\) now implies, by synchronization, that q commutes with v and \(u \le _p v \cdot u\). Then also \(u \le _p t \cdot u\), where t is the common primitive root of q and v. Moreover, t is a suffix of \(r \cdot u\), where \(r \in \left\langle \{u,t\} \right\rangle \). From the last fact, it is easy to see that t is a suffix of \(t \cdot u^k\) for some positive k, which implies \(t \le _s t \cdot u\). From \(u \le _p t \cdot u\) and \(t \le _s t \cdot u\), it follows that u commutes with t, and therefore also with v.

Lemma 22 exemplifies the virtues of formalization. First, the very fact that it is formulated as a general claim significantly improves the structure and readability of the proof of Theorem 21, in particular since it will be used several times. Second, note that we used Lemma 22 in the special case \(r = y^m\). Therefore, its independent formulation led to a generalization. Third, the proof of Lemma 22 is relatively simple since it is formulated in terms of elementary principles (which themselves are independent lemmas in the formalization).

Even the rest of the proof of Theorem 21, which we only sketch here, has a similarly modular structure in our approach. We now have \(\texttt {hd}\,\,{\textbf{w}}= \texttt {last}\,\,{\textbf{w}}= x\), hence \(p \cdot x = x \cdot y^m \cdot u\) and \(x \cdot s = v \cdot y^\ell \cdot x\). The natural way to describe this scenario is to observe that x has both the (prefix-)periodic root \(v \cdot y^\ell \) and the suffix-periodic root \(y^m \cdot u\).

Using Lemma 22 again, we exclude the situations when \(\ell = 0\) and \(m \ge 1\) (\(m = 0\) and \(\ell \ge 1\), respectively). It therefore remains to deal with the case when both m and \(\ell \) are positive. We divide this into four lemmas according to the size of the overlap that the prefix \(v\cdot y^\ell \) and the suffix \(y^m\cdot u\) have in x. More exactly, the cases are:

  • \(\left| v\cdot y^\ell \right| + \left| y^m\cdot u \right| \le \left| x \right| \) (no overlap)

  • \(\left| x \right| < \left| v\cdot y^\ell \right| + \left| y^m\cdot u \right| \le \left| x \right| + \left| u \right| \) (short overlap)

  • \(\left| x \right| + \left| u \right|< \left| v\cdot y^\ell \right| + \left| y^m\cdot u \right| < \left| x \right| + \left| u \cdot v \right| \) (medium overlap)

  • \(\left| x \right| + \left| u\cdot v \right| \le \left| v\cdot y^\ell \right| + \left| y^m\cdot u \right| \) (large overlap)

Each of them is solved by one auxiliary lemma. The first three cases yield that u and v commute, the first one being a straightforward application of the Periodicity Lemma 1 since both \(\left| v\cdot y^\ell \right| \) and \(\left| y^m\cdot u \right| \) are periods of x. The last one is also a straightforward application of the synchronization idea. Namely, if the prefix \(v\cdot y^\ell \) and the suffix \(y^m\cdot u\) of x overlap by at least \(\left| y \right| \), then x is a factor of \(y^\omega \). Moreover, since \(v\cdot u\) is both a prefix and a suffix of x, we obtain that x commutes with \(v\cdot u\), a contradiction to x and y being primitive and not conjugate.

The technical part of the whole proof is concentrated in the lemmas dealing with the second and the third case (see lemmas short-overlap and medium-overlap in the theory Binary-Square-Interpretation.thy), which show that under the length conditions referred to as “short overlap” and “medium overlap” above, the words u, v and x pairwise commute. The corresponding proofs are again further analyzed and decomposed into more general claims. The lemma short-overlap ultimately depends on the following technical claim, which is proved using elementary tools:

Lemma 23

(uvu-pref-uvv) Let \(p\cdot u\cdot v\cdot v\cdot s = u\cdot v\cdot u\cdot q\) where p is a prefix of u, both s and q are prefixes of some words from \(\left\langle \{u,v\} \right\rangle \), and \(\left| u \right| \le \left| s \right| \). Then u and v commute.

This lemma is a natural ingredient of the whole proof, since it forbids a certain kind of overlap within the language generated by the binary code \(\{u,v\}\). Observe that p commutes with \(u\cdot v\), since \(p \cdot u \cdot v\) is a prefix of \((u\cdot v)^2\). This in particular simplifies the equality in the lemma into \(p\cdot v\cdot s = u \cdot q\). We therefore once more formulate a general claim whose proof is a relatively simple and intuitive application of the idea of comparison of maximal prefixes:

Lemma 24

(comm-puv-pvs-eq-uq) Let \(p\cdot u\cdot v = u\cdot v\cdot p\) and \(p\cdot v\cdot s = u \cdot q\) where p is a prefix of u, both s and q are prefixes of some words from \(\left\langle \{u,v\} \right\rangle \), and \(\left| u \right| \le \left| s \right| \). Then u and v commute.

Proof

(sketch) We compare maximal t-prefixes of \(p\cdot v\cdot s\) and \(u \cdot q\), where t is the common primitive root of p and \(u \cdot v\). Assume that u and v do not commute and let \(\alpha = u\cdot v \wedge _p v \cdot u\). Since t is the primitive root of \(u\cdot v\), we have that the maximal t-prefix of \(v \cdot u\) is exactly \(\alpha \). Using the decoding delay principle, we deduce that the maximal t-prefix of \(p \cdot v \cdot s\) is \(p \cdot \alpha \). On the other hand, the word \(u \cdot \alpha \), which is a prefix of \(u \cdot q\), is also a prefix of \(t^\omega \) since it is in particular a prefix of \((u\cdot v)^2\). The equality \(p\cdot v\cdot s = u \cdot q\) therefore implies that \(p \cdot \alpha = u \cdot \alpha \), hence \(u = p\). Hence u and v commute. \(\square \)

We omit the proof of lemma medium-overlap which is of a similar nature and difficulty.

This completes the proof of \({\textbf{w}}= [x,y,x]\). A byproduct of the proof is the description of words x, y, p and s. Namely, there are non-commuting words r and t, and integers m, k and \(\ell \) such that

$$\begin{aligned} x&= (rt)^{m+1}\cdot r,&y&= (tr)^{k+1}\cdot (rt)^{\ell +1},&p&= (rt)^{k+1},&s&= (tr)^{\ell +1}\,. \end{aligned}$$

The claim \(y = s \cdot p\) is then equivalent to \(k = \ell \), and it is an easy consequence of the assumption that the interpretation is extendable. This completes the proof of Theorem 21.
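As a complement, the following Python sketch checks this parametric description on concrete instances; the choice \(r = a\), \(t = b\), the helper names and the enumeration bounds are ours and are not part of the formalization.

```python
# A sanity check of the parametric description above: for the non-commuting
# pair r = a, t = b, the words satisfy p.x.x.s = x.y.x, and y = s.p iff k = l.
r, t = "a", "b"

def instance(m, k, l):
    rt, tr = r + t, t + r
    x = rt * (m + 1) + r
    y = tr * (k + 1) + rt * (l + 1)
    p = rt * (k + 1)
    s = tr * (l + 1)
    return x, y, p, s

for m in range(3):
    for k in range(3):
        for l in range(3):
            x, y, p, s = instance(m, k, l)
            assert p + x + x + s == x + y + x     # the interpretation of x.x
            assert (y == s + p) == (k == l)       # y = s.p  iff  k = l
```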

7 The Witness with Two x’s

In this section, we characterize lists witnessing that \(\{x,y\}\) is not primitivity-preserving and containing at least two x’s.

Theorem 25

(bin-imprim-longer-twice) Let \(B = \{x,y\}\) be a binary code such that \(\left| y \right| \le \left| x \right| \). Let \({\textbf{w}}\in \texttt {lists}\,\,{} \{x,y\}\) be a primitive list which contains x at least twice such that \(\texttt {concat}\,\,{\textbf{w}}\) is imprimitive.

Then \({\textbf{w}}\sim [x,x,y]\) and both x and y are primitive.

We divide the proof into three steps.

7.1 The Core Case

We first prove the claim with two additional assumptions which will be subsequently removed. Namely, the following lemma shows how the knowledge about the B-interpretation of \(x \cdot x\) from the previous section is used. The additional assumptions are displayed as items.

Lemma 26

(bin-imprim-primitive) Let \(B = \{x,y\}\) be a code with \(\left| y \right| \le \left| x \right| \) where

  • both x and y are primitive,

and let \({\textbf{w}}\in \texttt {lists}\,\,{} B\) be primitive such that \(\texttt {concat}\,\,{\textbf{w}}\) is imprimitive, and

  • \([x,x]\) is a cyclic factor of \({\textbf{w}}\).

Then \({\textbf{w}}\sim [x,x,y]\).

Proof

Choosing a suitable conjugate of \({\textbf{w}}\), we can suppose, without loss of generality, that \([x,x]\) is a prefix of \({\textbf{w}}\). Now, we want to show \({\textbf{w}}= [x,x,y]\). By Lemma 20, we know that x and y are not conjugate.

Let z be the primitive root of \(\texttt {concat}\,\,{\textbf{w}}\), and let \(\texttt {concat}\,\,{\textbf{w}}= z^k\), \(2 \le k\). Since \({\textbf{w}}\) is primitive and \(\{x,y\}\) is a code, the word z is not in \(\left\langle \{x,y\} \right\rangle \). Lemma 15 yields a disjoint extendable B-interpretation of \(\texttt {concat}\,\,{\textbf{w}}\). In particular, the induced disjoint extendable B-interpretation of the prefix \(x \cdot x\) is of the form \(p\, (x \cdot x)\, s \sim _{{\mathcal {I}}} [x,y,x]\) by Theorem 21. Let \({\textbf{p}}\) be the (nonempty and strict) prefix of \({\textbf{w}}\) such that \(\texttt {concat}\,\,{\textbf{p}}\cdot p = z\). Then \({\textbf{p}}\cdot [x,y,x]\) is a prefix of \({\textbf{w}}^2\),

$$\begin{aligned} \texttt {concat}\,\,({\textbf{p}}\cdot [x,y]) = z \cdot xp,\ \text {and}\ \texttt {concat}\,\,[x,x,y] = (xp)^2, \end{aligned}$$
(2)

see Fig. 7.

Fig. 7: Interpretation of xx induced by the shift by z

We proceed by contradiction and assume \({\textbf{w}}\ne [x,x,y]\). Since both \({\textbf{w}}\) and \([x,x,y]\) are primitive, this implies \({\textbf{w}}\cdot [x,x,y] \ne [x,x,y] \cdot {\textbf{w}}\). Since \(\{x,y\}\) is a code, we also have

$$\begin{aligned} \texttt {concat}\,\,({\textbf{w}}\cdot [x,x,y]) \ne \texttt {concat}\,\,([x,x,y]\cdot {\textbf{w}}), \end{aligned}$$

which, using (2) and \(\texttt {concat}\,\,{\textbf{w}}= z^k\), yields that z and \(x\cdot p\) do not commute. Therefore \(\{z,xp\}\), as well as \(\{x,y\}\), is a binary code, and we can use its decoding delay property. Write

$$\begin{aligned} \alpha _{z,xp}&= z \cdot xp \wedge _p xp \cdot z,\ \text {and} \\ \alpha _{x,y}&= x \cdot y \wedge _p y \cdot x. \end{aligned}$$

Then

$$\begin{aligned} \begin{aligned} \alpha _{z,xp}&= z^k \cdot (xp)^2 \wedge _p (xp)^2 \cdot z^k = \\&= \texttt {concat}\,\,({\textbf{w}}\cdot [x,x,y]) \wedge _p \texttt {concat}\,\,([x,x,y] \cdot {\textbf{w}}) = \\&= \texttt {concat}\,\,({\textbf{w}}\cdot [x,x,y] \wedge _p [x,x,y] \cdot {\textbf{w}}) \cdot \alpha _{x,y}, \end{aligned} \end{aligned}$$
(3)

where the last equality follows from Lemma 5. Similarly, we have

$$\begin{aligned} \begin{aligned} z \cdot \alpha _{z,xp}&= z \cdot (z^k \wedge _p xp) = z^k \cdot z \cdot xp \wedge _p z \cdot xp \cdot z^k = \\&= \texttt {concat}\,\,({\textbf{w}}\cdot {\textbf{p}}\cdot [x,y]) \wedge _p \texttt {concat}\,\,({\textbf{p}}\cdot [x,y] \cdot {\textbf{w}}) = \\&=\texttt {concat}\,\,({\textbf{w}}\cdot {\textbf{p}}\cdot [x,y] \wedge _p {\textbf{p}}\cdot [x,y] \cdot {\textbf{w}}) \cdot \alpha _{x,y}. \end{aligned} \end{aligned}$$
(4)

Again, the last equality follows from Lemma 5, but we have to verify the hypothesis that \(\texttt {concat}\,\,({\textbf{w}}\cdot {\textbf{p}}\cdot [x,y])\) and \(\texttt {concat}\,\,({\textbf{p}}\cdot [x,y] \cdot {\textbf{w}})\) are not comparable. This is equivalent to \({\textbf{w}}\cdot {\textbf{p}}\cdot [x,y] \ne {\textbf{p}}\cdot [x,y] \cdot {\textbf{w}}\) since \(\{x,y\}\) is a code. This is further equivalent to \({\textbf{p}}\cdot [x,y] \ne {\textbf{w}}\) since \({\textbf{w}}\) is primitive, and \({\textbf{p}}\cdot [x,y]\) is a strict prefix of \({\textbf{w}}\cdot {\textbf{w}}\). The possibility \({\textbf{p}}= [x]\) leads to the claim we want to prove (and which we presently assume not to be true). If \({\textbf{p}}\ne [x]\), then \({\textbf{p}}\cdot [x,y] \ne {\textbf{w}}\) follows from \(\texttt {concat}\,\,{\textbf{p}}\cdot p = z\) and \(\texttt {concat}\,\,{\textbf{w}}= z^k\) by a length argument: the word \(\texttt {concat}\,\,{\textbf{w}}\) is longer than \(\texttt {concat}\,\,({\textbf{p}}\cdot [x,y])\).

Denote

$$\begin{aligned} {\textbf{v}}_1&= {\textbf{w}}\cdot [x,x,y] \wedge _p [x,x,y] \cdot {\textbf{w}},&{\textbf{v}}_2&= {\textbf{w}}\cdot {\textbf{p}}\cdot [x,y] \wedge _p {\textbf{p}}\cdot [x,y] \cdot {\textbf{w}}. \end{aligned}$$

From (3) and (4) we now have \( z \cdot \texttt {concat}\,\,{\textbf{v}}_1 = \texttt {concat}\,\,{\textbf{v}}_2\). Since \({\textbf{w}}\cdot [x,x,y] \ne [x,x,y] \cdot {\textbf{w}}\) and \({\textbf{w}}\cdot {\textbf{p}}\cdot [x,y] \ne {\textbf{p}}\cdot [x,y] \cdot {\textbf{w}}\), we have \({\textbf{v}}_1 \le _p {\textbf{w}}\cdot [x,x]\). Therefore both \({\textbf{v}}_1\) and \({\textbf{v}}_2\) are prefixes of \({\textbf{w}}^3\), a contradiction to Lemma 15. \(\square \)

7.2 Dropping the Primitivity Assumption

We first deal with the situation when x and y are not primitive. A natural idea is to consider the primitive roots of x and y instead of x and y. This means that we replace the list \({\textbf{w}}\) with \({\mathcal {R}\,}{\textbf{w}}\), where \({\mathcal {R}\,}\) is the morphism mapping [x] to \([\rho (x)]^{e_x}\) and [y] to \([\rho (y)]^{e_y}\) where \(x = \rho (x)^{e_x}\) and \(y = \rho (y)^{e_y}\). For example, if \(x = abab\) and \(y = aa\), and \({\textbf{w}}= [x,y,x] = [abab,aa,abab]\), then \({\mathcal {R}\,}{\textbf{w}}= [ab,ab,a,a,ab,ab]\).
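The following Python sketch (our own illustration, not the Isabelle definition of \({\mathcal {R}\,}\)) computes primitive roots and applies the morphism to the example above; the helper names are ours.

```python
# A sketch of the primitive root and of the morphism R on lists of words,
# checked on the example x = abab, y = aa.
def primitive_root(w: str) -> str:
    """The shortest t with w = t^k (w itself when w is primitive)."""
    n = len(w)
    return next(w[:d] for d in range(1, n + 1)
                if n % d == 0 and w[:d] * (n // d) == w)

def R(ws: list[str]) -> list[str]:
    """Replace each letter c (itself a word) by e_c copies of rho(c)."""
    out = []
    for c in ws:
        t = primitive_root(c)
        out += [t] * (len(c) // len(t))
    return out

x, y = "abab", "aa"
w = [x, y, x]
assert R(w) == ["ab", "ab", "a", "a", "ab", "ab"]
assert "".join(R(w)) == "".join(w)      # concat w = concat (R w)
```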

Let us check which hypotheses of Lemma 26 are satisfied in the new setting, that is, for the code \(\{\rho (x),\rho (y)\}\) and the list \({\mathcal {R}\,}{\textbf{w}}\). The following facts are straightforward:

  • \(\texttt {concat}\,\,{\textbf{w}}= \texttt {concat}\,\,({\mathcal {R}\,}{\textbf{w}})\);

  • if \([c,c]\) is a cyclic factor of \({\textbf{w}}\), where \(c\in \{x,y\}\), then \([\rho (c),\rho (c)]\) is a cyclic factor of \({\mathcal {R}\,}{\textbf{w}}\).

Let us consider the next required property:

  • if \({\textbf{w}}\) is primitive of length at least two, then \({\mathcal {R}\,}{\textbf{w}}\) is also primitive. (\(*\))

First, note that the above property does not necessarily hold if \({\textbf{w}}\) is a singleton. Indeed, the situation we are dealing with here, namely that x or y is imprimitive, means that also \({\mathcal {R}\,}\, [x]\) or \({\mathcal {R}\,}\, [y]\) is not primitive. However, we assume that some \([c,c]\) is a factor of \({\textbf{w}}\), which means that \({\textbf{w}}\) can be primitive only if it contains both letters. In our formalization, the required property is proved for a more general class of codes (whose introduction was triggered by the formalization of the present result). Namely, we define a locale for non-overlapping sets, sets in which two different elements have no overlap.

Definition 27

We say that a set C of words is non-overlapping if

  • \(\varepsilon \notin C\);

  • if a nonempty word z is a suffix of u and a prefix of v, \(u,v \in C\), then \(u = v\);

  • if u is a factor of v, \(u,v \in C\), then \(u = v\).

Note that a non-overlapping set is a code, even a bifix code, that is, no two distinct code words are prefix (suffix) comparable. From this it follows that \(\texttt {concat}\,\,{\textbf{u}}\le _p \texttt {concat}\,\,{\textbf{v}}\) implies \({\textbf{u}}\le _p {\textbf{v}}\) (\(\texttt {concat}\,\,{\textbf{u}}\le _s \texttt {concat}\,\,{\textbf{v}}\) implies \({\textbf{u}}\le _s {\textbf{v}}\)) for any \({\textbf{u}}, {\textbf{v}}\in \texttt {lists}\,\,{} C\). Moreover, a non-overlapping code has the following property, which can be seen as a weaker version of the self-synchronization property of codes (self-synchronizing codes are also called comma-free codes, see [3, p. 285]). The property is weaker just because an element of a non-overlapping code can be imprimitive, and hence a nontrivial factor of its own square.
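The following Python predicate is an illustrative reading of Definition 27 (a sketch of ours, not the Isabelle locale); words are represented as tuples of letters, where a “letter” may itself be a word, as in the application of Sect. 7.2.

```python
# Definition 27 as an executable predicate on a set of tuples.
def non_overlapping(C) -> bool:
    if () in C:                                   # the empty word is excluded
        return False
    for u in C:
        for v in C:
            if u == v:
                continue
            # no element of C is a factor of a distinct element
            if any(v[i:i + len(u)] == u for i in range(len(v) - len(u) + 1)):
                return False
            # no nonempty suffix of u is a prefix of a distinct v
            if any(u[len(u) - i:] == v[:i]
                   for i in range(1, min(len(u), len(v)) + 1)):
                return False
    return True

# The set used in Sect. 7.2, with letters "ab" and "a" (the primitive roots):
assert non_overlapping({("ab", "ab"), ("a", "a")})
# Read over a one-letter-per-symbol alphabet, the same strings do overlap:
assert not non_overlapping({tuple("abab"), tuple("aa")})
```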

Lemma 28

(non-overlapping.fac-concat-fac) Let C be a non-overlapping set. Assume that \({\textbf{u}}, {\textbf{v}}\in \texttt {lists}\,\,{} C\) and that \({\textbf{u}}\) contains at least two distinct elements of C. If \(p \cdot \texttt {concat}\,\,{\textbf{u}}\cdot s = \texttt {concat}\,\,{\textbf{v}}\), then there exist \({\textbf{p}}\) and \({\textbf{s}}\) such that \({\textbf{p}}\cdot {\textbf{u}}\cdot {\textbf{s}}= {\textbf{v}}\), \(\texttt {concat}\,\,{\textbf{p}}= p\) and \(\texttt {concat}\,\,{\textbf{s}}= s\).

Proof

Let \({\textbf{u}}_1\cdot {\textbf{u}}_2 = {\textbf{u}}\) be a factorization of \({\textbf{u}}\) into nonempty lists such that \(\texttt {last}\,\,{\textbf{u}}_1 \ne \texttt {hd}\,\,{\textbf{u}}_2\). Then \(\texttt {last}\,\,(\texttt {concat}\,\,{\textbf{u}}_1) \ne \texttt {hd}\,\,(\texttt {concat}\,\,{\textbf{u}}_2)\) since C is non-overlapping. The non-overlapping property now implies that the edge between \(\texttt {concat}\,\,{\textbf{u}}_1\) and \(\texttt {concat}\,\,{\textbf{u}}_2\) must correspond to an edge inside \(\texttt {concat}\,\,{\textbf{v}}\). That is, there are lists \({\textbf{v}}_1\) and \({\textbf{v}}_2\) such that

$$\begin{aligned} p \cdot \texttt {concat}\,\,{\textbf{u}}_1&= \texttt {concat}\,\,{\textbf{v}}_1,&\texttt {concat}\,\,{\textbf{u}}_2 \cdot s&= \texttt {concat}\,\,{\textbf{v}}_2, \quad \text {and} \quad {\textbf{v}}_1 \cdot {\textbf{v}}_2 = {\textbf{v}}\,. \end{aligned}$$

Since C is a bifix code, we deduce that \({\textbf{u}}_1\) is a suffix of \({\textbf{v}}_1\), and \({\textbf{u}}_2\) is a prefix of \({\textbf{v}}_2\), which concludes the proof. \(\square \)

A non-overlapping set is a primitivity-preserving set of words, as stated in the next theorem.

Theorem 29

(non-overlapping.prim-morph) Let C be a non-overlapping set. Let \({\textbf{w}}\in \texttt {lists}\,\,{} C\) be a primitive list of length at least 2. Then \(\texttt {concat}\,\,{\textbf{w}}\) is primitive.

Proof

Assume that \(\texttt {concat}\,\,{\textbf{w}}\) is not primitive. Hence there are k and z such that \(z^k = \texttt {concat}\,\,{\textbf{w}}\) with \(k \ge 2\). It follows that \(z \cdot \texttt {concat}\,\,{\textbf{w}}\cdot z^{k-1} =\texttt {concat}\,\,({\textbf{w}}\cdot {\textbf{w}})\). As \({\textbf{w}}\) is primitive and of length at least 2, it contains two distinct elements of C. Hence, since C is non-overlapping, by Lemma 28, there exists \({\textbf{v}}\) such that \({\textbf{v}}\le _p {\textbf{w}}\cdot {\textbf{w}}\) and \(\texttt {concat}\,\,{\textbf{v}}= z\). Therefore, \({\textbf{v}}^k \in \texttt {lists}\,\,{} C\). As \(\texttt {concat}\,\,({\textbf{v}}^k) = \texttt {concat}\,\,{\textbf{w}}\), we conclude that \({\textbf{v}}^k = {\textbf{w}}\), which is a contradiction since \({\textbf{w}}\) is primitive. \(\square \)
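For a concrete non-overlapping set, the statement of Theorem 29 can also be checked exhaustively for short lists; the following Python sketch (ours, purely illustrative) does so for the two-element set from the running example.

```python
# An exhaustive illustration of Theorem 29 on one concrete non-overlapping set:
# every primitive list over C of length >= 2 concatenates to a primitive word.
from itertools import product

def is_primitive(w) -> bool:
    w, n = tuple(w), len(w)
    return n > 0 and all(w[:d] * (n // d) != w
                         for d in range(1, n) if n % d == 0)

C = [("ab", "ab"), ("a", "a")]        # non-overlapping, cf. Definition 27
for length in range(2, 7):
    for ws in product(C, repeat=length):
        if is_primitive(ws):                          # primitive as a list over C ...
            flat = tuple(letter for u in ws for letter in u)
            assert is_primitive(flat)                 # ... concatenates to a primitive word
```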

The previous theorem implies that the morphism \({\mathcal {R}\,}\) has the required property (\(*\)). This conclusion is straightforward but slightly technical. We use it as an opportunity to illustrate several properties of morphisms on lists. We assume that x and y are of type ’a list. Hence \({\textbf{w}}\) is of type ’a list list and \({\mathcal {R}\,}\) is a morphism of type ’a list list \(\Rightarrow \) ’a list list. The morphism \({\mathcal {R}\,}\) is defined by its core mapping \({\mathcal {R}\,}^{{{\mathcal {C}}}}\) of type ’a list \(\Rightarrow \) ’a list list which maps \(x \mapsto [\rho (x)]^{e_x}\) and \(y \mapsto [\rho (y)]^{e_y}\). The relation between \({\mathcal {R}\,}\) and \({\mathcal {R}\,}^{{\mathcal {C}}}\) is given by

$$\begin{aligned} {\mathcal {R}\,}{\textbf{w}}= \texttt {concat}\,\,\left( \texttt {map}\,\,\, {\mathcal {R}\,}^{{\mathcal {C}}}\, {\textbf{w}}\right) . \end{aligned}$$

Since \({\mathcal {R}\,}^{{{\mathcal {C}}}}\, x \ne {\mathcal {R}\,}^{{{\mathcal {C}}}}\, y\), it is trivial that \(\texttt {map}\,\,{\mathcal {R}\,}^{{{\mathcal {C}}}} {\textbf{w}}\) (of type ’a list list list) is primitive if and only if \({\textbf{w}}\) (of type ’a list list) is primitive, for any \({\textbf{w}}\in \texttt {lists}\,\,{} \{x,y\}\). Using the above example \(x = abab\) and \(y = aa\), and \({\textbf{w}}= [x,y,x]\) we have

$$\begin{aligned} {\mathcal {R}\,}^{{\mathcal {C}}}x&= [ab,ab], \quad {\mathcal {R}\,}^{{\mathcal {C}}}y = [a,a], \quad \texttt {map}\,\,{} {\mathcal {R}\,}^{{\mathcal {C}}}{\textbf{w}}= \left[ [ab,ab],[a,a],[ab,ab]\right] , \\ {\mathcal {R}\,}{\textbf{w}}&= \texttt {concat}\,\,{} (\texttt {map}\,\,{} {\mathcal {R}\,}^{{\mathcal {C}}}{\textbf{w}}) = [ab,ab,a,a,ab,ab]. \end{aligned}$$

Since x and y do not commute, we have \(\rho (x) \ne \rho (y)\). Hence \(\left\{ [\rho (x)]^{e_x}, [\rho (y)]^{e_y} \right\} \) is a non-overlapping set. Theorem 29 now implies that \({\mathcal {R}\,}{\textbf{w}}\) is primitive if \({\textbf{w}}\) is primitive of length at least two as required in \((*)\).

Consequently, the only missing hypothesis preventing the use of Lemma 26 is \(\left| y \right| \le \left| x \right| \) since it may happen that \(\left| \rho (x) \right| < \left| \rho (y) \right| \). In order to solve this difficulty, we shall ignore for a while the length difference between x and y, and obtain the following intermediate lemma.

Lemma 30

(bin-imprim-both-squares, bin-imprim-both-squares-prim) Let \(B = \{x,y\}\) be a binary code, and let \({\textbf{w}}\in \texttt {lists}\,\,{} B\) be a primitive list such that \(\texttt {concat}\,\,{\textbf{w}}\) is imprimitive. Then \({\textbf{w}}\) cannot contain both \([x,x]\) and \([y,y]\) as cyclic factors.

Proof

Assume that \({\textbf{w}}\) contains both \([x,x]\) and \([y,y]\) as cyclic factors.

Consider the list \({\mathcal {R}\,}{\textbf{w}}\) and the code \(\{\rho (x),\rho (y)\}\). Since \({\mathcal {R}\,}{\textbf{w}}\) contains both \([\rho (x),\rho (x)]\) and \([\rho (y),\rho (y)]\), Lemma 26 implies that \({\mathcal {R}\,}{\textbf{w}}\) is conjugate either with the list \([\rho (x),\rho (x),\rho (y)]\) or with \([\rho (y),\rho (y),\rho (x)]\), which is a contradiction with the assumed presence of both squares. \(\square \)

7.3 Concluding the Proof by Gluing

It remains to deal with the existence of squares. We use an idea that is our main innovation with respect to the proof from [1], and that contributes significantly to the reduction of the length of the proof, and also to its increased clarity. Let \({\textbf{w}}\) be a list over a set of words X. The idea is to choose one of the words, say \(u \in X\), and to concatenate (or “glue”) blocks of u’s to words following them. For example, if \({\textbf{w}}= [u,v,u,u,z,u,z]\), then the resulting list is \([u\cdot v,\, u\cdot u\cdot z,\, u\cdot z]\). In the general case, this procedure is well defined on lists whose last “letter” is not the chosen one, and it leads to the new alphabet \(\{u^i\cdot v \mid v \in X \setminus \{u\},\ i \ge 0\}\), which is a code if and only if X is. This idea is used in an elegant proof of the Graph Lemma [2, 14]. Consider the binary case, which is of interest here. If \({\textbf{w}}\) does not contain a square of some letter, say \([x,x]\), then the new code is again binary, namely \(\{x\cdot y, y\}\). Moreover, the resulting glued list \({\textbf{w}}'\) has the same concatenation, and it is primitive if (and only if) \({\textbf{w}}\) is. Note that gluing is in this case closely related to the Nielsen transformation \(y \mapsto x^{-1}y\) known from the theory of automorphisms of free groups.
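A Python sketch of the gluing operation, in our own notation (not the Glued-Codes theory), may make the construction concrete.

```python
# Gluing: maximal blocks of a chosen letter u are glued to the letter that
# follows them; the list must not end with the chosen letter.
def glue(ws: list[str], u: str) -> list[str]:
    out, block = [], ""
    for c in ws:
        if c == u:
            block += c                  # accumulate a block of u's
        else:
            out.append(block + c)       # glue the block to the next letter
            block = ""
    assert block == "", "the list must not end with the glued letter"
    return out

w = ["u", "v", "u", "u", "z", "u", "z"]
assert glue(w, "u") == ["uv", "uuz", "uz"]
assert "".join(glue(w, "u")) == "".join(w)   # the concatenation is unchanged
# Binary case: if [x,x] does not occur, the glued list is over the code {x.y, y}.
x, y = "abab", "aa"
assert glue([y, x, y, y], x) == [y, x + y, y]
```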

Induction on \(\left| {\textbf{w}} \right| \) now easily leads to the proof of Theorem 25.

Proof of Theorem 25

If \({\textbf{w}}\) contains y at most once, then, since \({\textbf{w}}\) is primitive and contains x at least twice, it contains y exactly once, and we are left with the equation \(x^j\cdot y = z^\ell \), \(\ell \ge 2\). The equality \(j = 2\) follows from the Periodicity Lemma 1, see Case 2 in the proof of Theorem 8.

Assume for contradiction that y occurs at least twice in \({\textbf{w}}\). Lemma 30 implies that at least one of the squares \([x,x]\) and \([y,y]\) is missing as a cyclic factor. Let \(\{x',y'\} = \{x,y\}\) be such that \([x',x']\) is not a cyclic factor of \({\textbf{w}}\). We can therefore perform the gluing operation and obtain a new, strictly shorter list \({\textbf{w}}' \in \texttt {lists}\,\,{} \{x' \cdot y', y'\}\). The longer element \(x'\cdot y'\) occurs at least twice in \({\textbf{w}}'\), since the number of its occurrences in \({\textbf{w}}'\) is the same as the number of occurrences of \(x'\) in \({\textbf{w}}\), the latter list containing both words at least twice by assumption. Moreover, \({\textbf{w}}'\) is primitive, and \(\texttt {concat}\,\,{\textbf{w}}' = \texttt {concat}\,\,{\textbf{w}}\) is imprimitive. Therefore, by induction on \(\left| {\textbf{w}} \right| \), we have \({\textbf{w}}' \sim [x'\cdot y',x' \cdot y', y']\). To show that this is not possible, we can reuse the lemma imprim-ext-suf-comm mentioned in the proof of Lemma 9, this time for \(u = x'y'x'\) and \(v = y'\). The words u and v do not commute because \(x'\) and \(y'\) do not commute. Since \(u\cdot v = (x'\cdot y')^2\) is imprimitive, the lemma implies that the word \(u\cdot v\cdot v \sim \texttt {concat}\,\,{\textbf{w}}'\) is primitive, a contradiction with the imprimitivity of \(\texttt {concat}\,\,{\textbf{w}}'\). \(\square \)

This also completes the proof of our main goal, Theorem 7.

8 Additional Notes on the Formalization

The formalization is implemented in the proof assistant Isabelle/HOL [17, 23]. It is a part of a larger combinatorics on words formalization project. The project’s most recent version is published at GitLab [11] while the presented version is archived [12]. The formalization of the presented results relies on the project’s backbone session, called CoW, a version of which is also available in the Archive of Formal Proofs [10]. An overview of this session is available in [14]. The formalization itself is available in the Archive of Formal Proofs as a separate entry [13]. The backbone session covers all basic concepts used in this article, including the Periodicity Lemma 1, and many more elementary concepts of combinatorics on words. The concept of gluing used in Sect. 7.3 is covered by another session of the project, Graph Lemma, and its theory Glued-Codes.

The main results of this article are in a separate session in two dedicated theories: Binary-Square-Interpretation and Binary-Code-Imprimitive. The first contains lemmas and locales dealing with the \(\{x,y\}\)-interpretation of the square \(x\cdot x\) (for \(\left| y \right| \le \left| x \right| \)), culminating in Theorem 21. The latter contains Theorems 7 and 25. A third theory Binary-Imprimitive-Decision covers Lemma 10 and Example 11. However, the formalized proof of Lemma 10 is an alternative, simple proof independent of the parametric solution of Theorem 8.

8.1 Formalization Highlights

Let us give a few concrete highlights of the formalization. We start by showing selected statements and a definition. The main result, Theorem 7, is formalized as follows:

figure a

Definition 14 is not formalized directly as a definition, but as two locales:

figure b

Theorem 21 is then stated within the last locale:

figure c

The next highlight is the usage of the reversed attribute, a very useful tool which is part of the CoW session. The attribute produces a symmetrical fact where the symmetry is induced by the mapping rev, i.e., the mapping which reverses the order of elements in a list. For instance, the fact stating that if p is a prefix of v, then p is a prefix of \(v \cdot w\), is transformed by the reversed attribute into the fact saying that if s is a suffix of v, then s is a suffix of \(w \cdot v\). The attribute is based on rewriting and relies on ad hoc defined (possibly conditional) rules which induce the symmetry. In the example, the main reversal rule is

figure d
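The symmetry itself can be illustrated outside Isabelle; the following Python sketch (ours, not the Isabelle rewriting machinery) checks, on short binary words, that reversal indeed turns the prefix fact above into the corresponding suffix fact.

```python
# Reversing words turns "p <=p v  ==>  p <=p v.w" into "s <=s v  ==>  s <=s w.v".
from itertools import product

def is_prefix(p, v): return v.startswith(p)
def is_suffix(s, v): return v.endswith(s)

small = ["".join(t) for n in range(4) for t in product("ab", repeat=n)]
for v in small:
    for w in small:
        for k in range(len(v) + 1):
            p = v[:k]                                      # any prefix of v
            assert is_prefix(p, v + w)                     # original fact
            assert is_suffix(p[::-1], w[::-1] + v[::-1])   # its rev-image
```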

The attribute is used frequently in the present formalization. For instance, Fig. 8 shows the formalization of the proof of Cases 1 and 2 of Theorem 9. Namely, the proof of Case 2 is smoothly deduced from the lemma that deals with Case 1, avoiding writing down the same proof again up to symmetry.

Fig. 8: Highlights from the formalization in Isabelle/HOL

The third highlight of the formalization is the use of simple but useful proof methods. The first method, called primitivity-inspection, is able to show primitivity or imprimitivity of a given word.

Another method named list-inspection is used to deal with claims that consist of straightforward verification of some property for a set of words given by their length and alphabet. For instance, this method painlessly provides the case analysis needed in the proof of lemma bin-imprim-both-squares-prim. The method automatically divides the goal into eight (relatively easy) subgoals corresponding to eight possible words. This removes the tedious verification that the case analysis is complete, which is typically rather implicit in human proofs.

The last method we want to mention is mismatch. It is designed to prove that two words commute using the decoding delay property of a binary code explained in Sect. 2.2.4. Namely, if a product of words from \(\{x,y\}\) starting with x shares a prefix of length at least \(\left| xy \right| \) with another product of words from \(\{x,y\}\), this time starting with y, then x and y commute. Examples of usage of the attribute reversed and all three methods are given in Fig. 8.
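To illustrate the principle behind the mismatch method, the following Python sketch (ours, not the Isabelle proof method) brute-forces the claim for a few small non-commuting pairs: products starting with x and products starting with y never share a prefix of length \(\left| xy \right| \).

```python
# Decoding delay of a binary code, checked on small pairs: for non-commuting
# x, y, the length-|xy| prefixes of products starting with x and with y differ.
from itertools import product

def prefixes(first, other, bound):
    """Length-`bound` prefixes of products over {first, other} starting with `first`."""
    return {(first + "".join(tail))[:bound]
            for tail in product([first, other], repeat=4)}

for x, y in product(["a", "b", "ab", "ba", "aab", "aba"], repeat=2):
    if x + y == y + x:
        continue                                   # only non-commuting pairs
    bound = len(x + y)
    assert prefixes(x, y, bound).isdisjoint(prefixes(y, x, bound))
```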

We conclude with an example illustrating one of the main virtues of a good formalization, namely the identification of auxiliary claims and their independent formulation in a general form. As pointed out in the introduction, we see this aspect as our main contribution to the topic of the present paper. Consider the lemma imprim-ext-suf-comm mentioned twice above (see p. 12 and p. 28):

Lemma 31

(imprim-ext-suf-comm) Let both \(u\cdot v\) and \(u\cdot v\cdot v\) be imprimitive. Then u and v commute.

Proof

Since \(u\cdot v\) and \(u\cdot v \cdot v\) are imprimitive, also \(v\cdot u\) and \(v \cdot u \cdot v\) are imprimitive. Let \(z_1\) be the primitive root of \(v\cdot u\) and \(z_2\) the primitive root of \(v\cdot u\cdot v\). Note that \(z_1\) is a periodic root of \(v\cdot u\cdot v\). Hence \(z_1 = z_2\) by the Periodicity Lemma 1. Indeed, both \(z_1\) and \(z_2\) are periodic roots of \(v\cdot u\cdot v\), which is of length at least \(\left| z_1 \right| + \left| z_2 \right| \). Consequently, both \(v\cdot u\) and \(v\cdot u\cdot v\) are powers of \(z_1\), hence so is v, and therefore also u; thus u and v commute. \(\square \)
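As a sanity check, Lemma 31 can also be verified by brute force on short binary words; the following Python sketch (purely illustrative, with our own helper names) does exactly that.

```python
# Whenever u.v and u.v.v are both imprimitive, u and v commute (checked for
# all binary words of length at most 4).
from itertools import product

def is_primitive(w: str) -> bool:
    n = len(w)
    return n > 0 and all(w[:d] * (n // d) != w for d in range(1, n) if n % d == 0)

words = ["".join(t) for n in range(1, 5) for t in product("ab", repeat=n)]
for u in words:
    for v in words:
        if not is_primitive(u + v) and not is_primitive(u + v + v):
            assert u + v == v + u      # u.v and u.v.v imprimitive => u, v commute
```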

In the formalization, this elegant proof is divided into several claims. First, imprim-ext-suf-comm is proved as a consequence of imprim-ext-pref-comm which claims that u and v commute if \(v\cdot u\) and \(v \cdot u \cdot v\) are both imprimitive. The lemma imprim-ext-pref-comm, in turn, is proved using per-le-prim-iff:

Lemma 32

(per-le-prim-iff) Let u be of length at least twice the length of its periodic root r. Then u is imprimitive if and only if it commutes with r.

The latter lemma is a specific application of the weak version of the Periodicity Lemma 1 as explained in the above proof of imprim-ext-suf-comm. We want to compare this approach with the proof by Mitrana [22, Theorem 5, p. 278], within which a claim equivalent to our imprim-ext-suf-comm is proved ad hoc for two particular words. Consequently, the fact, though interesting in itself, cannot be reused; it makes the proof in which it is included harder to parse; and, moreover, it is proved using particular properties of the two words, which are not necessary for the claim (and which do not make the proof any easier).

8.2 Contribution to the Main Project

As mentioned above, the two dedicated theories containing the formalization of the main results of this article rely on backbone sessions of the project of formalization of combinatorics on words. In turn, the formalization presented in this paper led to an expansion of those backbone sessions. Let us briefly list the main components of that expansion.

  • extension of the reversed attribute in Reversal-Symmetry.thy to work with lists of lists;

  • substantial expansion of the elementary theory Equations-Basic.thy which provides auxiliary lemmas and definitions related to word equations;

  • expansion of the elementary theory Lyndon-Schutzenberger.thy by the parametric solution of the equation \(x^jy^k = z^\ell \), specifically Theorem 8 and Lemma 9;

  • substantial expansion of existing support for the idea of gluing as mentioned in Sect. 7 into a separate theory called Glued-Codes.thy (which is part of the session CoW-Graph-Lemma);

  • locale formalizing non-overlapping sets (Sect. 7.2).

9 Conclusion

The results presented in this paper have been known in some form since 1985 [1]. Nevertheless, their negligible use in the literature shows that they have not been fully absorbed by the combinatorics on words community. Our own experience indicates that this is so at least partly due to the complex nature and insufficiently clear structure of the original proof. Our effort is an example of the crucial role a formalization can play in approaching this kind of result. The formalization enforces a better proof structure, replaces steps based on vague intuition with clearly formulated lemmas, facilitates natural generalizations, and allows for transparent and systematic reuse of key ideas.

The broader context of the result presented in this paper is the study of relations between small sets of words or, seen from another perspective, of solutions of simple word equations [15]. As is often the case in combinatorics, and as the present paper shows, simply formulated problems of this kind can be very difficult. The work described in this paper is a natural starting point for exploring further problems from this area. A concrete example we have in mind, which served as a motivation for the present work, is the classification of binary equality words. This task is significantly more complex, and research unsupported by formalization may easily reach the borderline of what is humanly feasible [8].