Equivalence classes of circular codes induced by permutation groups

In the 1950s, Crick proposed the concept of so-called comma-free codes as an answer to the frame-shift problem that biologists encountered when studying the translation of a sequence of nucleotide bases into a protein. A little later it turned out that this proposal unfortunately does not correspond to biological reality. However, in the mid-90s, a weaker version of comma-free codes, so-called circular codes, was discovered in nature (J Theor Biol 182:45–58, 1996). Circular codes allow the reading frame to be retrieved during the translational process in the ribosome and, surprisingly, the circular code discovered in nature is even circular in all three possible reading frames ($C^3$-property). Moreover, it is maximal in the sense that it contains 20 codons, and it is self-complementary, which means that it consists of pairs of codons and corresponding anticodons. In further investigations, it was found that there are exactly 216 codes that have the same strong properties as the code originally found in J Theor Biol 182:45–58.
Using an algebraic approach, it was shown in J Math Biol, 2004 that the class of 216 maximal self-complementary $C^3$-codes can be partitioned into 27 equally sized equivalence classes by the action of a transformation group $L \subseteq S_4$ which is isomorphic to the dihedral group. Here, we extend the above findings to circular codes over a finite alphabet of even cardinality $|\Sigma| = 2n$ for $n \in \mathbb{N}$.
We describe the corresponding group $L_n$ using matrices and we investigate which classes of circular codes are split into equally sized equivalence classes under the natural equivalence relation induced by $L_n$. Surprisingly, this is not always the case. All results and constructions are illustrated by examples.


Introduction
In the 1950s, Crick proposed a class of trinucleotide codes, called comma-free codes, as nature's key to avoiding errors when translating the genetic code. In Crick's biological setting, comma-free codes used a subset of the 64 possible codons for coding the 20 amino acids in such a way that they allowed the detection of errors in the translation process from coding sequences to proteins. Thus, these codes not only raised interest among biologists but were also of great interest from the point of view of coding theory because they form a particular type of error-correcting code. Naturally, combinatorial properties of comma-free codes were studied extensively thereafter, passing from the biological setting to words of arbitrary fixed length over alphabets of arbitrary size (see Golomb et al. 1958a and Golomb et al. 1958b). A series of papers was inspired by these seminal works, mostly dealing with purely combinatorial aspects of comma-free codes and finally posing some challenging open problems (see Cummings 1976; Eastman 1965; Levenshtein 2004; Scholtz 1969; Tang et al. 1987; Ball and Cummings 1976b; Bilotta et al. 2013). Later on, strong comma-free codes (under the name of strongly regular codes or non-overlapping codes) were also investigated and gained interest in automata theory as well as in the theory of frame synchronization (see Blackburn 2015; Levenšteĭn 1964, 1970 as well as Bajić and Stojanović 2004; Bilotta et al. 2012, 2013; Chee et al. 2013; Guibas and Odlyzko 1978).

However, in the early 1960s, after the Poly-U experiment by Nirenberg and Matthaei, it became clear that Crick's proposal was wrong (Hayes 1998). In fact, there are 408 maximal comma-free codes (Golomb et al. 1958a) that code for at most 13 amino acids (Michel 2014). Nevertheless, recent works have shown that instead of comma-free codes, a weaker class of codes, called circular codes, is indeed used in protein-coding sequences. Circular codes are a less restrictive version of comma-free codes and can be used for retrieving the normal reading frame (see Arquès and Michel 1996; Michel et al. 2008; Michel 2020). A particular circular code, called X, was found by extensive statistical investigations in large samples of genetic data of archaea, plasmids and viruses, in addition to bacteria and eukaryotes (see Arquès and Michel 1996; Michel 2015, 2017). The code X contains the following 20 trinucleotides:

X = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC}.

Arquès and Michel discovered not only that this code is able to detect frame-shifts in the normal reading frame but also that it does so in the two shifted frames, and that it is self-complementary, which means that it is symmetric with respect to the double helix structure of the DNA. In Arquès and Michel (1996), it is proved that there exist exactly 216 such codes, called maximal self-complementary $C^3$-codes. Among these 216 codes, the maximal number of amino acids that can be coded is 14 (see Michel 2014), while comma-free codes which are self-complementary, or $C^3$, or $C^3$ self-complementary can contain at most 16 trinucleotides and code for at most 11 amino acids (Michel 2020). Fimmel et al. (2014) later showed that the class of 216 maximal self-complementary $C^3$-codes over the genetic four-letter alphabet $\Sigma = \{A, C, G, T\}$ can be partitioned into 27 equal-sized classes so that each of these equivalence classes has eight maximal self-complementary $C^3$-codes that are related by a subset of transformations $L \subseteq S_\Sigma$ of the symmetric group $S_\Sigma$.
The use of the symmetric group in order to study circular codes had already been initiated in Michel and Pirillo (2011). The transformations in $L$ are exactly those permutations of $\Sigma$ that preserve self-complementarity, and it was shown that $L$ is isomorphic to the dihedral group of order 8. The important implication of this result is that all codes in one equivalence class share the same error-detecting properties while those in different classes are stronger or weaker. Moreover, recent findings by Seligman and others show that applying a systematic change of bases to RNA, e.g. by applying a transformation from $S_{\{A,C,G,T\}}$, may lead to existing RNA, called Swinger RNA (Seligman 2016; Michel and Seligmann 2014). In particular, the transformations from $L$ turned out to yield such Swinger RNA. It is tempting to speculate that nature may use this mechanism in order to encode not only one set of information in DNA but 8 (the size of $L$) or even 24 (the size of $S_{\{A,C,G,T\}}$) sets at the same time.
In the present work, we extend the previous results to a finite alphabet of even cardinality $|\Sigma| = 2n$ for $n \in \mathbb{N}$ and generalize the group $L$ to $L_n$. The motivation to restrict ourselves to alphabets of even cardinality comes from biology, where the bases occur in complementary pairs. The presented approach could also be investigated for alphabets of odd size, but some of the constructions, e.g. in Theorem 4.4, would not work in that case. In the "Generalization of the group L" section, we describe $L_n$ and some of its properties using matrices. In the "Equivalence classes of codes induced by the action of the group $L_n$" section, we discuss equivalence classes of codes with respect to the action of the group $L_n$. In general, equally sized classes cannot be obtained for every class of codes. For instance, we will show that maximal self-complementary circular codes cannot be classified into equally sized equivalence classes by the action of $L_n$ (see Example 4.18). On the other hand, we prove that dinucleotide circular codes are divided into equally sized equivalence classes by the action of $L_n$ (see Lemma 4.19) and the same holds true for the general class of -maximum circular $C^l$-codes over general alphabets (see Theorem 4.20).

Definitions and Notions
Let $\Sigma$ be an arbitrary finite alphabet of size $m$. For a natural number $l \ge 2$, an $l$-letter code is simply a subset $X \subseteq \Sigma^l$, where $\Sigma^l$ is the set of all words of length $l$ over $\Sigma$ (the length of a word is the number of its letters, e.g. $x_1 \cdots x_n$ has length $n$). As usual, $\Sigma^*$ denotes the set of all words of finite length over $\Sigma$, i.e. $\Sigma^* = \bigcup_{n \in \mathbb{N}} \Sigma^n$, including the empty word $\varepsilon$. Given $v, w \in \Sigma^*$, we call $v$ a prefix of $w$ if $w = vv'$ for some $v' \in \Sigma^*$ and we call $v$ a suffix of $w$ if $w = v'v$ for some $v' \in \Sigma^*$. Moreover, if $w = x_1 \cdots x_n \in \Sigma^n$, then $\alpha_i(w) = x_{i+1} \cdots x_n \cdot x_1 \cdots x_i$ is called the $i$-th circular shift of $w$ for $1 \le i \le n-1$, and we put $\alpha_0(w) = w$. This notion obviously extends to sets, i.e. $\alpha_i(X) = \{\alpha_i(w) : w \in X\}$ for $X \subseteq \Sigma^n$.
We recall a few classical definitions of codes as follows:

Definition 2.1 Let $X \subseteq \Sigma^l$ be an $l$-letter code and $k \in \mathbb{N}$. We say that $X$ is
(1) a $k$-circular $l$-letter code if for any $m \le k$ and any concatenation $x_1 \cdots x_m$ of $l$-tuples from $X$ there is only one partition into $l$-tuples from $X$ when read on a circle. In other words, for any $1 \le i \le l-1$ the circular shift $\alpha_i(x_1 \cdots x_m)$ is not a concatenation of $l$-tuples from $X$, i.e. $\alpha_i(x_1 \cdots x_m) \notin X^m$;
(2) a circular $l$-letter code if it is a $k$-circular $l$-letter code for all $k \in \mathbb{N}$;
(3) a strong comma-free code if no $v \in \Sigma^*$, $v \ne \varepsilon$, appears both as a proper prefix and as a proper suffix in $X$. In other words, given any two not necessarily distinct elements $b_1 = x_1 \cdots x_l$ and $b_2 = y_1 \cdots y_l$ of $X$, for every $k \in \{1, \ldots, l-1\}$ we have $x_1 \cdots x_k \ne y_{l-k+1} \cdots y_l$;
(4) a comma-free code if for any two elements $x_1 \cdots x_l$ and $y_1 \cdots y_l$ in $X$, we have $x_{k+1} \cdots x_l \, y_1 \cdots y_k \notin X$ for every $k \in \{1, \ldots, l-1\}$;
(5) a maximal ($k$-)circular (comma-free, strong comma-free) $l$-letter code if it is not contained in a strictly larger ($k$-)circular (comma-free, strong comma-free) $l$-letter code;
(6) a maximum ($k$-)circular (comma-free, strong comma-free) $l$-letter code or, equivalently, a code of maximal size if $|Y| \le |X|$ whenever $Y$ is a ($k$-)circular (comma-free, strong comma-free) $l$-letter code over $\Sigma$.
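The notions of Definition 2.1 can be tested by brute force on small codes. The following is a minimal sketch, not taken from the paper: the function names are illustrative, codes are assumed to be non-empty sets of equal-length Python strings, and `is_k_circular` only checks $k$-circularity for the given $k$ by enumerating all concatenations, which is feasible only for small $k$.

```python
from itertools import product


def circular_shift(word, i):
    """Return the i-th circular shift: the first i letters move to the end."""
    return word[i:] + word[:i]


def is_strong_comma_free(code):
    """No non-empty proper prefix of a code word is a proper suffix of a code word."""
    words = set(code)
    length = len(next(iter(words)))
    prefixes = {w[:k] for w in words for k in range(1, length)}
    suffixes = {w[-k:] for w in words for k in range(1, length)}
    return prefixes.isdisjoint(suffixes)


def is_comma_free(code):
    """No code word straddles the boundary between two concatenated code words."""
    words = set(code)
    length = len(next(iter(words)))
    return all(
        x[k:] + y[:k] not in words
        for x in words
        for y in words
        for k in range(1, length)
    )


def is_k_circular(code, k):
    """Brute force: no shifted concatenation of m <= k code words splits into code words."""
    words = sorted(set(code))
    lookup = set(words)
    length = len(words[0])
    for m in range(1, k + 1):
        for parts in product(words, repeat=m):
            concatenation = "".join(parts)
            for i in range(1, length):
                shifted = circular_shift(concatenation, i)
                pieces = [shifted[j:j + length] for j in range(0, len(shifted), length)]
                if all(piece in lookup for piece in pieces):
                    return False
    return True


# The code X of Arquès and Michel as listed in the Introduction.
X = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}
```

For instance, `is_k_circular(X, 2)` confirms that no shifted concatenation of at most two codons of X decomposes into codons of X, while `is_comma_free(X)` fails because ACC appears across the boundary of AAC and CAG.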
Obviously, any strong comma-free code is also comma-free and hence also circular. Moreover, a maximum code is certainly also maximal. There is a general upper bound for the size of a maximum $l$-letter code (($k$-)circular, comma-free, strong comma-free), namely the maximal size of a 1-circular $l$-letter code over $\Sigma$. Such a code can contain at most one element from each of the complete classes $\{\alpha_i(w) : 0 \le i \le l-1\}$ with $w \in \Sigma^l$. Here complete means that the size of this set is equal to $l$. The number of such complete classes is given by

$$\frac{1}{l} \sum_{d \mid l} \mu(d)\, m^{l/d},$$

where $m = |\Sigma|$ and $\mu$ is the Möbius function, and it was shown in Fimmel et al. (2019) that all maximum circular $l$-letter codes over $\Sigma$ indeed have this size. However, this is not known for other classes of codes and therefore we add the following definition.
An interesting class of codes is also given by the so-called $C^n$-codes.

Definition 2.3 A circular code $X \subseteq \Sigma^n$ is called a $C^n$-code if $\alpha_i(X)$ is also circular for all $1 \le i \le n-1$. In other words, the shifted codes of $X$ are also circular.

Note that it is not known if a maximum $C^n$-code is also -maximum.
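The number of complete classes of circular shifts, $\frac{1}{l}\sum_{d \mid l}\mu(d)\,m^{l/d}$, can be evaluated directly and compared with an explicit enumeration. The sketch below is illustrative only; the helper names `mobius`, `complete_class_count` and `complete_class_count_brute` are assumptions and do not come from the paper.

```python
from itertools import product


def mobius(n):
    """Möbius function computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def complete_class_count(m, l):
    """Number of complete classes of l-letter words over an alphabet of size m."""
    total = sum(mobius(d) * m ** (l // d) for d in range(1, l + 1) if l % d == 0)
    return total // l


def complete_class_count_brute(alphabet, l):
    """Count the classes of circular shifts that contain exactly l distinct words."""
    classes = set()
    for letters in product(alphabet, repeat=l):
        word = "".join(letters)
        shifts = frozenset(word[i:] + word[:i] for i in range(l))
        if len(shifts) == l:
            classes.add(shifts)
    return len(classes)
```

For the genetic alphabet and $l = 3$, both functions give 20, the size of the code X.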
For the convenience of the reader, we give some examples in a biological setting choosing Σ = B = {A, C, T, G} to be the genetic alphabet. Here, A stands for adenine, C stands for cytosine, G stands for guanine and T stands for thymine (see Then, X ⊆ Σ 3 is a maximum (and hence maximal) strong comma-free triletter code.
We will also consider the so-called symmetric group acting on the elements of the alphabet $\Sigma$, which is defined as $S_\Sigma = \{\pi : \Sigma \to \Sigma \mid \pi \text{ is bijective}\}$, endowed with the usual group operation given by the composition of functions. The group $S_\Sigma$ has $|\Sigma|!$ elements and, for every $l \in \mathbb{N}$, any bijective mapping $\pi : \Sigma \to \Sigma$ can be applied componentwise to $x \in \Sigma^l$ and thus induces a bijective map $\Sigma^l \to \Sigma^l$, which is also called $\pi$. A bijection $\pi$ of $S_\Sigma$ is an involutory function (or an involution) if $\pi \circ \pi(x) = x$ for every $x \in \Sigma$, i.e. $\pi$ is of order 2. A fixed point of a bijection $\pi \in S_\Sigma$ is an element $x \in \Sigma$ such that $\pi(x) = x$. We state a remark here that is clear but important in the sequel of the paper since it justifies why we will restrict to alphabets of even cardinality in the next section.

Remark 2.5
If the cardinality $|\Sigma|$ of $\Sigma$ is even, then $S_\Sigma$ contains involutory bijections without fixed points.
We will need some further notation from general group theory (see Hall 1970; Rotman 1995 for further details). Given a group $G$ and a subset $S \subseteq G$, the centralizer $C_G(S)$ of $S$ in $G$ is the set of all elements of $G$ that commute with all elements of $S$, i.e. $C_G(S) = \{g \in G \mid g \circ s = s \circ g \text{ for all } s \in S\}$. Moreover, the normalizer of $S$ in $G$ is the set of all elements $g$ of $G$ that commute with $S$ as a set but not necessarily pointwise, i.e. $N_G(S) = \{g \in G \mid g \circ S = S \circ g\}$. Both the centralizer and the normalizer are subgroups of $G$.
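The difference between centralizer and normalizer can be seen on the genetic alphabet by enumerating all 24 permutations. This is a minimal sketch; representing permutations as Python dictionaries and all function names are assumptions made for illustration.

```python
from itertools import permutations

SIGMA = "ACGT"


def all_permutations():
    """All elements of the symmetric group on SIGMA, as dictionaries."""
    return [dict(zip(SIGMA, image)) for image in permutations(SIGMA)]


def compose(g, h):
    """Return the composition g∘h (apply h first, then g)."""
    return {x: g[h[x]] for x in SIGMA}


def transposition(a, b):
    """The permutation that swaps a and b and fixes all other letters."""
    p = {x: x for x in SIGMA}
    p[a], p[b] = b, a
    return p


def as_tuple(p):
    """Hashable form of a permutation, used to compare sets of permutations."""
    return tuple(p[x] for x in SIGMA)


def centralizer(subset):
    """Elements commuting with every element of the subset."""
    return [
        g for g in all_permutations()
        if all(compose(g, s) == compose(s, g) for s in subset)
    ]


def normalizer(subset):
    """Elements g with g∘S = S∘g as sets."""
    result = []
    for g in all_permutations():
        left = {as_tuple(compose(g, s)) for s in subset}
        right = {as_tuple(compose(s, g)) for s in subset}
        if left == right:
            result.append(g)
    return result
```

For $S = \{(AT), (CG)\}$ the centralizer has 4 elements and the normalizer has 8, whereas for the single involution $c = (AT)(CG)$ both coincide and have 8 elements.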
Motivated by recent results in mathematical biology related to the genetic code, we also define $\pi$-self-complementarity of a code for some involution $\pi \in S_\Sigma$. Fix such an involution $\pi$ and a code $X \subseteq \Sigma^n$; then $X$ is called $\pi$-self-complementary if $\overleftarrow{\pi(X)} = X$. Here, $\overleftarrow{\,\cdot\,}$ is the reversing operation, which assigns to a word $w = x_1 \cdots x_n \in \Sigma^n$ the reversed word $x_n \cdots x_1$.
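Self-complementarity is straightforward to check mechanically. The sketch below assumes that the involution is given as a Python dictionary; the names `reversed_image` and `is_self_complementary` are illustrative and not from the paper.

```python
# The involution c = (AT)(CG) on the genetic alphabet.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reversed_image(word, involution):
    """Apply the involution letter by letter and reverse the resulting word."""
    return "".join(involution[x] for x in reversed(word))


def is_self_complementary(code, involution):
    """True if the code equals the set of reversed images of its words."""
    words = set(code)
    return {reversed_image(w, involution) for w in words} == words


# The code X of Arquès and Michel as listed in the Introduction.
X = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}
```

Here `reversed_image("AAC", COMPLEMENT)` is the anticodon GTT, and X consists of ten such codon and anticodon pairs.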
Again for the convenience of the reader, we give an example in the biological setting, where $B = \{A, C, G, T\}$ is the genetic alphabet. Fix $c$ as the permutation $(AT)(CG)$ that switches C and G as well as A and T. It is easy to see that the code

X = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC}

is a $c$-self-complementary circular code.
We conclude this section with an easy result which shows that circularity, (strong) comma-freeness and also the $C^n$-property of codes are preserved under permutations of the alphabet $\Sigma$, i.e. under the action of $S_\Sigma$.

Proposition 2.6 Let $\Sigma$ be a finite alphabet and $X \subseteq \Sigma^n$ a circular (respectively, comma-free, strong comma-free, $C^n$-) code. If $\pi \in S_\Sigma$, then $\pi(X)$ is again a circular (respectively, comma-free, strong comma-free, $C^n$-) code.
Proof Easy. ◻

However, in contrast to the above proposition, it is not true in general that permutations from $S_\Sigma$ preserve $\pi$-self-complementarity of codes. In fact, it was shown in Fimmel et al. (2014) that in the setting of the genetic code, there are exactly eight permutations that preserve $c$-self-complementarity with $c$ from above.

Proof For the proof see Fimmel et al. (2014). ◻

In Fimmel et al. (2014), it turned out that the only permutations from $S_\Sigma$ (with $\Sigma = B = \{A, C, G, T\}$) that commute with $c = (AT)(CG)$ are the following eight permutations, which form a subgroup $L$ of the symmetric group $S_\Sigma$ that is isomorphic to the dihedral group $D_4$, the symmetry group of the square:

$$L = \{\mathrm{id},\ (AT),\ (CG),\ (AT)(CG),\ (AC)(GT),\ (AG)(CT),\ (ACTG),\ (AGTC)\}. \qquad (+)$$

This group will be studied later on in more detail and its multiplication table is given in Table 1. Consequently, the group $L$ acts on the class of $c$-self-complementary codes. In the sequel of the paper, we will generalize this result to arbitrary alphabets and circular codes.

Fimmel et al. (2014) showed that the class of 216 maximal $c$-self-complementary $C^3$-codes over $\Sigma = B = \{A, C, G, T\}$ can be partitioned into 27 equal-sized equivalence classes under the action of the subgroup $L$ of the symmetric group from (+). This representation had many implications for the study of these codes since the codes from the same equivalence class have the same error-detecting properties (Fimmel et al. 2014) and also share other properties. We now intend to generalize this approach and first define the group $L$ in the more general setting. Let $\Sigma$ be a finite alphabet as before.

Generalization of the group L
Lemma 3.1 Let $S_\Sigma$ be the symmetric group acting on the elements of the alphabet $\Sigma$ and let $\pi \in S_\Sigma$ be a permutation. Moreover, let $L^\pi_\Sigma \subseteq S_\Sigma$ be the set of all bijections which commute with $\pi$, i.e. $L^\pi_\Sigma = \{\sigma \in S_\Sigma \mid \sigma \circ \pi = \pi \circ \sigma\}$. Then, $L^\pi_\Sigma$ is a subgroup of $S_\Sigma$.
Proof Clearly, $L^\pi_\Sigma$ is the centralizer of $\pi$ in $S_\Sigma$ and hence a subgroup of $S_\Sigma$, as is well known in group theory (see, e.g. Hall 1970). ◻

We now want to determine the structure of $L^\pi_\Sigma$ when $\pi$ is an involution without fixed points. Thus, we will be assuming from now on and for the rest of the paper that $\Sigma$ denotes an alphabet of even cardinality with $|\Sigma| = 2n$ for some $n \in \mathbb{N}$. Moreover, we will assume that $c \in S_\Sigma$ is an involutory bijection without fixed points and we abbreviate $L^c_\Sigma$ by $L_n = L^c_\Sigma$.

Description of L n using matrices
In this subsection we first develop a description of the group $L_n$ using matrices. Recall that $S_\Sigma$ is the symmetric group acting on the elements of the alphabet $\Sigma$, where $|\Sigma| = 2n$ is even and $c \in S_\Sigma$ is an involution without fixed points. Moreover, let $\Delta$ be the ring of all $2 \times 2$-matrices over the field $\mathbb{F}_2 = \{0, 1\}$ with two elements.

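Lemma 3.2 The subgroup $L_n \subseteq S_\Sigma$ of all bijections which commute with $c$ is isomorphic to the group of all $n \times n$-matrices over $\Delta$ such that in each row and in each column there is exactly one non-trivial entry from $\Delta$, which is of the form $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ or $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Thus, we have $|L_n| = n!\,2^n$.

Proof Without loss of generality we may assume that $\Sigma = \{1, \ldots, 2n\}$ and $c = (1\ 2)(3\ 4) \cdots (2n-1\ \ 2n)$. Clearly, any $\pi \in S_\Sigma$ can be represented as a $(2n \times 2n)$-matrix with binary entries such that in every column and in every row there is exactly one entry equal to 1 and the remaining entries are 0. In this representation, $c$ is the block-diagonal matrix whose $n$ diagonal blocks are all equal to $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and whose remaining entries are 0. Now, the product $c \circ \pi$ is obtained from the matrix associated to $\pi$ by swapping the $(2i-1)$th and the $(2i)$th rows for $i = 1, \ldots, n$, while the product $\pi \circ c$ is obtained from the matrix associated to $\pi$ by swapping the $(2j-1)$th and the $(2j)$th columns for $j = 1, \ldots, n$. If $c \circ \pi = \pi \circ c$, then for every pair $i, j = 1, \ldots, n$ the $2 \times 2$ block of $\pi$ in block row $i$ and block column $j$ must be the same after swapping its two rows as after swapping its two columns. There are only three $2 \times 2$-matrices with binary entries having at most one 1 in every row and column fulfilling this condition:

$$E_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad E_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad E_3 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

Recalling that $\pi$ is represented by a binary matrix in which there is exactly one 1 in each row and column, it is obvious that the entire $2n \times 2n$-matrix consists of $n \times n$ blocks from $\Delta$ such that exactly $n$ of them have the shape $E_1$ or $E_2$ and the rest are the trivial matrices $E_3$. Clearly, there are $n!\,2^n$ possibilities for distributing these matrices and hence $|L_n| = n!\,2^n$. ◻

We would like to remark that there are alternative descriptions of the group $L_n$, e.g. using wreath products.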
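The statement of Lemma 3.2 can be checked by brute force for small $n$. The sketch below makes two illustrative assumptions that differ from the notation of the proof: the alphabet is $\{0, \ldots, 2n-1\}$ (0-based), and $c$ swaps $2i$ and $2i+1$. The function names are hypothetical.

```python
from itertools import permutations
from math import factorial

E1 = ((1, 0), (0, 1))
E2 = ((0, 1), (1, 0))
E3 = ((0, 0), (0, 0))


def centralizer_of_c(n):
    """Permutations of {0, ..., 2n-1} commuting with c = (0 1)(2 3)...(2n-2 2n-1)."""
    c = [x ^ 1 for x in range(2 * n)]  # flipping the last bit swaps 2i and 2i+1
    return [
        p for p in permutations(range(2 * n))
        if all(p[c[x]] == c[p[x]] for x in range(2 * n))
    ]


def blocks(p, n):
    """Split the permutation matrix M (with M[p[x]][x] = 1) into n x n blocks of size 2 x 2."""
    size = 2 * n
    matrix = [[0] * size for _ in range(size)]
    for x in range(size):
        matrix[p[x]][x] = 1
    return [
        [
            (
                (matrix[2 * i][2 * j], matrix[2 * i][2 * j + 1]),
                (matrix[2 * i + 1][2 * j], matrix[2 * i + 1][2 * j + 1]),
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


def has_block_form(p, n):
    """True if every block is E1, E2 or E3, with one non-zero block per block row and column."""
    grid = blocks(p, n)
    entries_ok = all(block in (E1, E2, E3) for row in grid for block in row)
    rows_ok = all(sum(block != E3 for block in row) == 1 for row in grid)
    cols_ok = all(sum(grid[i][j] != E3 for i in range(n)) == 1 for j in range(n))
    return entries_ok and rows_ok and cols_ok
```

For $n = 1, 2, 3$ the centralizer has $2$, $8$ and $48$ elements, in agreement with $|L_n| = n!\,2^n$, and every element has the block form described in the lemma.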

Remark 3.3
The group $L_n$, which we described above, has very interesting group-theoretical properties. For instance, it can be proven that $L_n \cong C_2 \wr_X S_n$, where $X = \{1, 2, \ldots, n\}$ and $C_2$ is the cyclic group of order 2, and at the same time $L_n \cong (C_2)^n \rtimes S_n$, where $\rtimes$ denotes the (outer) semidirect product with respect to the action of $S_n$ on $(C_2)^n$ by permuting the $n$ coordinates and $\wr_X$ is the so-called wreath product. A closer study of this group is interesting in its own right. However, this is outside the scope of the article.

Equivalence classes of codes induced by the action of the group L n
Recall from the previous sections that $\Sigma$ is a finite alphabet of even size $2n$ and $c \in S_\Sigma$ is an involution without fixed points that was used to define the group $L_n$ consisting of all permutations in $S_\Sigma$ that commute with $c$. It was pointed out in the "Definitions and Notions" section that in this case, the mappings from $L_n$ preserve all properties of codes given in Definition 2.1 including $c$-self-complementarity. It is thus natural to define an equivalence relation on codes by setting $X \sim X'$ if and only if $\pi(X) = X'$ for some $\pi \in L_n$, for codes $X$ and $X'$; the corresponding equivalence class of a code $X$ will be denoted by $[X]$. (Of course, if the codes are not $c$-self-complementary, then one could use the full symmetric group instead of $L_n$, but for now we will restrict to the group $L_n$.) In Fimmel et al. (2014), the 216 maximal $c$-self-complementary $C^3$-codes over the genetic alphabet were partitioned into 27 equivalence classes under the action of $L_2$, which are all of size 8, the size of the group $L_2$. However, in general, we cannot expect to get equivalence classes of the same size, as was demonstrated in Keller (2014) and Lemegne (2015).
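The equivalence class of a code under the action of $L_2$ can be computed as an orbit. The following minimal sketch works over the genetic alphabet with $c = (AT)(CG)$; the function names are illustrative assumptions, and the code X is the one listed in the Introduction.

```python
from itertools import permutations

SIGMA = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

X = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def group_L():
    """All permutations of SIGMA that commute with the complementarity map c."""
    candidates = (dict(zip(SIGMA, image)) for image in permutations(SIGMA))
    return [
        p for p in candidates
        if all(p[COMPLEMENT[x]] == COMPLEMENT[p[x]] for x in SIGMA)
    ]


def apply(p, code):
    """Apply a permutation of the alphabet letter by letter to every word of a code."""
    return frozenset("".join(p[x] for x in word) for word in code)


def equivalence_class(code):
    """The orbit [code] under the action of the group L."""
    return {apply(p, code) for p in group_L()}


def is_self_complementary(code):
    """True if the code is closed under reversed complementation."""
    return {"".join(COMPLEMENT[x] for x in reversed(w)) for w in code} == set(code)
```

The orbit `equivalence_class(X)` contains eight distinct codes, all of which are again $c$-self-complementary, which matches the class size 8 stated above.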
In the following, we will consider different classes of codes and determine if the action of $L_n$ divides the class into equivalence classes of the same size or of different sizes. As a basis, recall that Fimmel et al. (2019) calculated the size and number of maximal $l$-letter circular codes, diletter and triletter comma-free codes, maximum self-complementary comma-free triletter codes, $l$-letter strong comma-free codes and maximal strong self-complementary comma-free triletter codes over $\Sigma$.
Let us remark that in the trivial case $l = 1$, the code classes of maximal $c$-self-complementary strong comma-free, strong comma-free, $c$-self-complementary comma-free, comma-free, $c$-self-complementary circular and circular codes coincide. In this case, there is only one maximal 1-letter code of each kind, namely $X = \Sigma$ and, thus, its equivalence class consists of one element.

Equivalence classes of strong comma-free codes
Let us consider the class of maximal (c-self-complementary) strong comma-free codes.
We begin with a consideration of the case of the binary alphabet, in which the situation is rather unspectacular:

Proof Let $X$ be a strong comma-free $l$-letter code over $\Sigma$. If $x \in X$, then $x$ must start with 0 and end with 1 or vice versa due to strong comma-freeness (this follows immediately from the definition). Without loss of generality, assume that $x$ starts with 0 and ends with 1. However, if $c(X) = X$, then $X$ would also contain $c(x)$, which starts with 1 and ends with 0, a contradiction to strong comma-freeness. ◻

Before we continue our considerations about equivalence classes, we have to close a gap and count the number of maximal $c$-self-complementary strong comma-free diletter codes over an alphabet of even cardinality:

Lemma 4.2 Let $\Sigma$ be a finite alphabet with $|\Sigma| = 2n$ for some $n \in \mathbb{N}$ and let $L_n \subset S_\Sigma$ be the centralizer $C_{S_\Sigma}(c)$ of some involution $c$ without fixed points. Then, there are $2^n$ different maximal (= maximum) $c$-self-complementary strong comma-free diletter codes over $\Sigma$.
Proof In Fimmel et al. (2017b), the structure of a maximal diletter strong comma-free code over an arbitrary alphabet was described. For instance, if the alphabet has even cardinality, it is partitioned into two disjoint sets $T^+$ and $T^-$ of equal size so that each diletter from the code begins with a letter from $T^-$ and ends with a letter from $T^+$. In order to construct a $c$-self-complementary code, we have to ensure that for every $x \in \Sigma$, the letters $x$ and $c(x)$ belong to different sets. There are $n$ pairs of complementary letters; thus, we have $2^n$ possibilities to constitute $T^-$ and, correspondingly, $T^+$. ◻

In general, in the case of maximum strong comma-free codes, the equivalence classes can be smaller than $|L_n|$, as the following example shows:

Example 4.3 Let $B = \{A, C, G, T\}$ be the genetic alphabet. There are exactly eight maximal (= maximum) strong comma-free triletter codes $X_1, \ldots, X_8$ of size 9. Each of these codes is invariant under the permutations $(CG)$ and $(AT)$ from $L_2$. Thus, there are two equivalence classes induced by the action of $L_2$, namely

$$[X_8] = \{X_4, X_5, X_7, X_8\} \quad \text{and} \quad [X_1] = \{X_1, X_2, X_3, X_6\}.$$

The following theorem settles the question: for any word length and any even alphabet cardinality, the classes of maximal (self-complementary) strong comma-free codes contain $L_n$-induced equivalence classes that are strictly smaller than the order of $L_n$. The result is as general as possible, since it applies to $l$-letter words for any $l \ge 1$.
Theorem 4.4 Let $\Sigma$ be a finite alphabet with $|\Sigma| = 2n$ for some $1 < n \in \mathbb{N}$ and let $L_n \subset S_\Sigma$ be the centralizer of some involution $c$ without fixed points. Moreover, let $l \in \mathbb{N}$. Then, the action of $L_n$ on $\mathcal{C}$ induces some equivalence classes of size strictly less than $|L_n|$, where $\mathcal{C}$ is one of the following classes of codes:
(1) The class of all maximal (= maximum) strong comma-free $l$-letter codes;
(2) The class of all maximal (= maximum) $c$-self-complementary strong comma-free $l$-letter codes for $n \ge 3$.

Proof Let $\Sigma$ and $c$ as well as $L_n$ and $l$ be given as stated in the theorem. We start with the proof of (1), so let $\mathcal{C}$ be the class of all maximal strong comma-free $l$-letter codes over $\Sigma$.
(1) For l = 1 , there is only one maximal strong comma-free 1-letter code, namely X = Σ and, thus, its equivalence class consists of one element but |L n | > 1.
Let us now consider the case $l = 2$. In Fimmel et al. (2017b), it is shown that the number of maximal (= maximum) strong comma-free diletter codes over $\Sigma$ is equal to $\binom{2n}{n} = \frac{(2n)!}{(n!)^2}$. This number is not divisible by the order of the group $|L_n| = n!\,2^n$, hence there must be an equivalence class of size strictly smaller than $|L_n|$.
Let us now consider the case $l \ge 3$: By (Fimmel et al. 2019, Theorem 5.1), there is a bijection between the class $\mathcal{C}$ and the collection $\mathcal{P}$ of all sequences $((T^-_i, T^+_i))_{1 \le i \le l}$ of partitions satisfying the conditions (a) and (b) stated there. Note that partition means in particular that the sets $T^-_i$ and $T^+_i$ are disjoint and also note that in (a), it is required that $T^-_1$ and $T^+_1$ are both non-empty while $T^-_i$ and $T^+_i$ for $i > 1$ can be empty. The bijection above is given by sending such a partition sequence $((T^-_i, T^+_i))_{1 \le i \le l}$ to the code $X = \sum_{i=1}^{l-1} T^-_i T^+_{l-i}$. We now fix a permutation $\pi \in L_n$ that has order not equal to $2n$ (e.g. the permutation $c$; note that $n > 1$ and hence $2n > 2$). We aim to construct a code in $\mathcal{C}$ such that $\pi(X) = X$, so that the induced equivalence class under $L_n$ is of non-maximal size. Since the order of $\pi$ is smaller than $2n$, we can choose $T^-_1$ and $T^+_1$ nonempty and disjoint such that $\pi(T^-_1) = T^-_1$ and $\pi(T^+_1) = T^+_1$. By induction on $i$, we claim that we can also choose $T^-_i$ and $T^+_i$ such that (a) and (b) hold and also $\pi(T^-_i) = T^-_i$ and $\pi(T^+_i) = T^+_i$. It is now obvious that we can choose $T^-_i \ne \emptyset$ and Since $T^-_1$ and $T^+_1$ were both nonempty, it follows that $T^-_i$ does not cover all of It follows that $X = \sum_{i=1}^{l-1} T^-_i T^+_{l-i}$ then also satisfies $\pi(X) = X$ and so we have proved (1).
(2) We now prove (2). Assume now that C is the class of all maximal (=maximum) c-self-complementary strong comma-free l-letter codes. For l = 1 , there is only one maximal c-self-complementary strong comma-free 1-letter code, namely X = Σ and, thus, its equivalence class consists of one element but |L n | > 1.
Let us now consider the case $l = 2$. In Lemma 4.2, it is shown that the number of maximal $c$-self-complementary strong comma-free diletter codes over $\Sigma$ is equal to $2^n$. This number is not divisible by the order of the group $|L_n| = n!\,2^n$, hence there must be an equivalence class of size strictly smaller than that of $L_n$.
Let us now consider the case $l \ge 3$: Again we fix $\pi \in L_n$, so recall that $c \circ \pi = \pi \circ c$. However, this time we require the following extra condition: for all $N_1, N_2 \in \Sigma$, $s \ge 1$ (note that $N_1 = N_2$ is not excluded!).
For instance, if $\Sigma = \{a_i, b_i \mid c(a_i) = b_i, i \le n\}$, then we can choose $\pi = (a_1 a_2 \cdots a_n)(b_1 b_2 \cdots b_n)$. Clearly, this $\pi$ commutes with $c$ and hence $\pi \in L_n$. Moreover, $\pi$ has order at least 3 since $n \ge 3$ and thus the above condition holds. We now claim that also for all $N_1, N_2, \ldots, N_r \in \Sigma$ and $s < \mathrm{ord}(\pi)$. However, this is immediate since for $r$ odd, equality above would imply that $\pi^s(N_{\frac{r+1}{2}}) = c(N_{\frac{r+1}{2}})$, and that was excluded by assumption (+). Moreover, if $r$ is even, then equality would imply that $\pi^s(N_{\frac{r}{2}} N_{\frac{r}{2}+1}) = c(N_{\frac{r}{2}+1} N_{\frac{r}{2}})$, contradicting (+).
We proceed as in the proof of (1) in order to construct a code $X \in \mathcal{C}$ such that $\pi(X) = X$. In addition, we need to ensure that $X$ is also $c$-self-complementary. Therefore, we choose $T^-_1$ and $T^+_1$ as above but in addition we require that $\overleftarrow{c(T^-_1)} = T^+_1$ (and consequently also $\overleftarrow{c(T^+_1)} = T^-_1$).
For instance, we choose a letter $x \in \Sigma$ and take the orbit $\mathrm{Orb}(x)$ under $\pi$. Then, $\pi(\mathrm{Orb}(x)) = \mathrm{Orb}(x)$. Moreover, since $\pi$ commutes with $c$. So also $\overleftarrow{c(\mathrm{Orb}(x))}$ is invariant under $\pi$. We now have to ensure that But this follows from our property (++). It now follows that $\Sigma = \mathrm{Orb}(x) \cup \overleftarrow{c(\mathrm{Orb}(x))} \cup K$ is a disjoint union, where $K = \Sigma \setminus (\mathrm{Orb}(x) \cup \overleftarrow{c(\mathrm{Orb}(x))})$, and we can just split the rest $K$ into two disjoint parts $K_1$ and $K_2$ such that $\overleftarrow{c(K_1)} = K_2$ and both are invariant under $\pi$, and finally put $L_1 = \mathrm{Orb}(x) \cup K_1$ and $R_1 = \overleftarrow{c(\mathrm{Orb}(x))} \cup K_2$. An easy induction argument shows that we can continue this way as in the proof of (1), each time excluding the self-complementary $r$-letter words, and end with a code $X \in \mathcal{C}$ that satisfies both conditions $\overleftarrow{c(X)} = X$ and $\pi(X) = X$. ◻

We would like to remark that the above proof certainly also works for $\pi \in S_\Sigma$ of order less than $2n$ in case (1). It is not clear whether or not all of the codes are invariant under some permutation.
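The case $l = 2$ of Theorem 4.4 can be reproduced over the genetic alphabet ($n = 2$). The sketch below assumes, following the description cited in the proof of Lemma 4.2, that every maximum strong comma-free diletter code has the form $T^- T^+$ for a partition of the alphabet into two halves; the function names are illustrative and not from the paper.

```python
from itertools import combinations, permutations, product
from math import comb

SIGMA = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def group_L():
    """All permutations of SIGMA that commute with c = (AT)(CG)."""
    candidates = (dict(zip(SIGMA, image)) for image in permutations(SIGMA))
    return [
        p for p in candidates
        if all(p[COMPLEMENT[x]] == COMPLEMENT[p[x]] for x in SIGMA)
    ]


def maximum_strong_comma_free_diletter_codes():
    """Codes T- T+ for every split of SIGMA into two halves (assumed structure)."""
    half = len(SIGMA) // 2
    codes = []
    for t_minus in combinations(SIGMA, half):
        t_plus = [x for x in SIGMA if x not in t_minus]
        codes.append(frozenset(a + b for a, b in product(t_minus, t_plus)))
    return codes


def is_self_complementary(code):
    """True if the code is closed under reversed complementation."""
    return {"".join(COMPLEMENT[x] for x in reversed(w)) for w in code} == set(code)


def orbit(code, group):
    """The equivalence class of a code under the given group."""
    return frozenset(
        frozenset("".join(p[x] for x in word) for word in code) for p in group
    )


def orbit_sizes(codes, group):
    """Sorted sizes of the distinct equivalence classes among the given codes."""
    return sorted(len(o) for o in {orbit(code, group) for code in codes})
```

With these helpers there are $\binom{4}{2} = 6$ codes, of which $2^2 = 4$ are $c$-self-complementary. The six codes fall into classes of sizes 2 and 4, and the four self-complementary ones form a single class of size 4, all smaller than $|L_2| = 8$.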

Remark 4.5
For $n = 2$, part (2) of Theorem 4.4 above does not hold, as the following example shows. The construction in Theorem 4.4 leads to eight different maximal (= maximum) $c$-self-complementary strong comma-free triletter codes $X_1, \ldots, X_8$, which constitute a single equivalence class of maximal size 8:

$$[X_1] = \{X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8\}.$$

The reason is that in the proof of Theorem 4.4 part (2), it is essential to assume that $n \ge 3$ because for $n = 2$ it is not possible to choose $\pi$ with property (+): For instance, $\pi = (AG)(CT)$ would imply that $\pi(AC) = GT = \overleftarrow{c(AC)}$.
The following example shows how the construction from Theorem 4.4 works.

Finally, one puts
This code $X$ is a strong comma-free code which is $c$-self-complementary and invariant under $\pi$.
However, the result of Theorem 4.4 does not mean that all equivalence classes are smaller than |L n | . There can also be equivalence classes of maximal cardinality as the following example shows: Then, X is not invariant under any permutation from L 3 and hence X generates an equivalence class of size equal to the size of L 3 .
We briefly give an argument why the code in the above example is not invariant under any non-trivial permutation from $L_3$: Clearly, the size of $X$ is 18 and it is strong comma-free since all triletters in $X$ start with either $A$, $C$ or $O_2$ and end with either $G$, $T$ or $O_1$, so first letters and last letters are disjoint. Moreover, $A$ appears exactly nine times in the first position, $C$ exactly six times and $O_2$ exactly three times. Thus, any permutation from $L_3$ that leaves $X$ invariant must fix these three letters, and a similar argument on the last letters shows that it must also fix $G$, $T$ and $O_1$, hence it is the identity.
Remark 4.8 In the case of maximal ($c$-self-complementary) strong comma-free triletter codes, the negative result can also be obtained more easily: the numbers of maximum ($c$-self-complementary) strong comma-free triletter codes calculated in Fimmel et al. (2019) are not divisible by the order of the group $|L_n| = n!\,2^n$.

Equivalence classes of comma-free codes
We now consider the class of comma-free codes and start with a first observation.

Proposition 4.9
Let $\Sigma$ be a finite alphabet with $|\Sigma| = 2n$ for some $n \in \mathbb{N}$ and let $L_n \subset S_\Sigma$ be the centralizer of some involution $c$ without fixed points. Moreover, let $\mathcal{C}$ be the class of
1. Maximum diletter comma-free codes over $\Sigma$;
2. Maximum $c$-self-complementary comma-free triletter codes over $\Sigma$ with $n \ne 1$.
Then, the action of $L_n$ on $\mathcal{C}$ induces some equivalence classes of size strictly less than $|L_n|$.

Proof In order to show that $L_n$ induces equivalence classes of size smaller than $|L_n|$ when acting on the mentioned class $\mathcal{C}$ of codes, we show that the size of $L_n$ is not a divisor of the number of such codes.
1. The number of maximal comma-free diletter codes over Σ is where m ∶= 2n 3 and r = 2n − 3m (see Fimmel et al. 2019). Since the order of L n is 2 n n! , we expect that the size of an induced equivalence class is However, this number is not an integer. Hence, there must be equivalence classes of smaller size than the size of L n under the action of L n on maximal comma-free diletter codes. 2. The number of maximal c-self-complementary commafree triletter codes over Σ is 2 for n = 1 , 4 for n = 2 and 54 for n = 3 . Since in these cases, the order of L 1 , L 2 and L 3 is 2, 8 and 48, respectively, the order of L n is not a divisor of the number of codes except for n = 1.
For n ≥ 4 , we expect the size of an equivalence class to be However, this is obviously not an integer. Thus, there must be equivalence classes of smaller size than the size of L n under the action of L n on maximum self complementary comma-free triletter codes.
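The divisibility argument for the small cases of item 2 can be replayed in a few lines. The following is a minimal sketch that uses only the code counts quoted in the proof (2, 4 and 54); the function names `order_L` and `all_orbits_can_be_full` are hypothetical helpers introduced for illustration.

```python
from math import factorial


def order_L(n: int) -> int:
    """Order of L_n, the centralizer of a fixed-point-free involution on 2n letters."""
    return 2**n * factorial(n)


# Numbers of maximal c-self-complementary comma-free triletter codes
# for n = 1, 2, 3, as quoted in the proof of Proposition 4.9.
code_counts = {1: 2, 2: 4, 3: 54}


def all_orbits_can_be_full(n: int, count: int) -> bool:
    """Necessary condition for every orbit to have size |L_n|: |L_n| divides the count."""
    return count % order_L(n) == 0


for n, count in code_counts.items():
    print(n, order_L(n), count, all_orbits_can_be_full(n, count))
```

Only for n = 1 does the order of the group divide the number of codes, which matches the restriction n ≠ 1 in the statement.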
◻ We illustrate the above Proposition by an example, which also shows that not all equivalence classes need to be smaller than $|L_n|$ in the situation of Proposition 4.9. The equivalence classes of $X_1$, $X_2$, $X_3$ and $X_4$ are identical and coincide with the set $\{X_1, X_2, X_3, X_4\}$.
In Proposition 4.9 above, we have seen that in some cases equivalence classes can be smaller than $|L_n|$ because the number of codes in the class C is not divisible by the order of the group $L_n$. This is not the case for the maximum triletter comma-free codes: their number (compare Fimmel et al. 2019) is divisible by the order of $L_n$. However, even in this case, there are equivalence classes that are smaller than the order of $L_n$. To show this, we first look at the criterion for maximum triletter comma-free codes proven in Golomb et al. (1958b):
Theorem 4.11 (Golomb, Welsh 1958) Let Σ be a finite alphabet with $|\Sigma| = m$ and $X \subset \Sigma^3$ a triletter code over Σ with $|X| = (m^3 - m)/3$. For $m > 2$, the necessary and sufficient condition that X constitutes a maximum comma-free triletter code is that no initial diletter ever occurs as a final diletter.
For $m = 2$, the Theorem above is not true, as the following example shows (compare Golomb et al. 1958b):
In the following, we will only need the sufficient condition of Theorem 4.11, which we prove for the convenience of the reader:
Lemma 4.13 Let Σ be a finite alphabet with $|\Sigma| = m$ and $X \subset \Sigma^3$ a triletter code over Σ in which no initial diletter ever occurs as a final diletter. Then, X is comma-free.

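Proof Consider a concatenation $N_1N_2N_3\,N_4N_5N_6$ of two words $N_1N_2N_3, N_4N_5N_6 \in X$. It is clear that $N_2N_3N_4 \notin X$ and $N_3N_4N_5 \notin X$, since $N_2N_3$ is a final diletter and $N_4N_5$ is an initial diletter of a word in X, so there are no words in X beginning with $N_2N_3$ or ending with $N_4N_5$. Thus, X is comma-free. ◻
Theorem 4.14 Let $S_\Sigma$ be the symmetric group acting on the elements of the alphabet Σ with $|\Sigma| = 2n$, $n \in \mathbb{N}$, and let $c \in S_\Sigma$ be an involutory bijection without fixed points. Then, there is a maximum triletter comma-free code $X \subset \Sigma^3$ with $c(X) = X$.
Proof Without loss of generality, we may assume that $\Sigma = \{1, \dots, 2n\}$ and $c = (1\;2)(3\;4)\cdots(2n-1\;\,2n)$. Further, let $N_1N_2N_3 \in \Sigma^3 \setminus \{xxx \mid x \in \Sigma\}$, let $[N_1N_2N_3]$ be its (complete) cyclic equivalence class, and let $m := \min\{N_1, N_2, N_3\}$ and $M := \max\{N_1, N_2, N_3\}$. Let us remark that $M > m$ is always valid.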

Case 1: m is even, or m is odd and $m + 1 \notin \{N_1, N_2, N_3\}$
In this case, we choose $xyz \in [N_1N_2N_3]$ with $y = m$, $x \geq y = m$, $z > y = m$. This choice is possible because of the definition of m and because not all three letters $N_i$ are equal. Let us remark that in this case, no initial diletter ever occurs as a final diletter.

Case 2.1: m is odd and $M = m + 1$
In this case, we choose $xyz \in [N_1N_2N_3]$ with $y = x$, $z \neq x$, $|y - z| = 1$. Thus, $c(xyz)$ has the same shape again, and no initial diletter ever occurs as a final diletter in both Cases 1 and 2.1.

Case 2.2: m is odd, $m + 1 \in \{N_1, N_2, N_3\}$ and $M > m + 1$
In this case, we choose $xyz \in [N_1N_2N_3]$ with $x = M > m + 1$. Then, $c(x) > m + 1$ since $m + 1$ is even, and $c(y) = z$. Thus, $c(xyz)$ has the same shape again and no initial diletter ever occurs as a final diletter. The code $X \subset \Sigma^3 \setminus \{xxx \mid x \in \Sigma\}$ constructed in this way is comma-free according to Lemma 4.13: Let us take a look at $xyz \in X$. Due to the construction, we always have $x \geq y$ and $y \neq z$. Therefore, an initial diletter $xy$ with $x = y$ can never appear as a final diletter. If $x > y$, then $xy$ cannot appear as the final diletter of any word from Case 1, because there $y < z$. In Case 2, the final diletter always looks like $(m + 1)m$ or $m(m + 1)$ with an odd m. The constellation $(m + 1)m$ can only appear as an initial diletter in Case 1, with an even m. In summary, in the constructed code, an initial diletter can never appear as a final diletter, so the code is comma-free.
X is also maximum (in fact -maximum), because we have chosen exactly one element from each complete equivalence class. Furthermore, by construction, $c(X) = X$. ◻ Again for the convenience of the reader, we illustrate the above Theorem by an example.

Example 4.15
Let B = {A, C, G, T} be the genetic alphabet. If we assign 1 to A, 2 to T, 3 to C and 4 to G, the known complementarity transformation corresponds to the c defined in Theorem 4.14. The code X defined according to the theorem above then looks like this: X = {AAC, AAG, AAT, CAC, GAC, CTA, CAG, GAG, GTA, CAT, GAT, TTA, CCG, CTC, GGC, GTC, CTG, TTC, GTG, TTG}. The code is maximum comma-free (it contains $(4^3 - 4)/3 = 20$ words) and invariant under c.
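The construction behind this example can be sketched in code. The block below is an illustration under stated assumptions, not the authors' implementation: it assumes the labelling Σ = {1, …, 2n} with c swapping 2i−1 and 2i, and it reads the case distinction of the proof as Case 1 (m even, or m odd with m+1 absent), Case 2.1 (m odd, M = m+1) and Case 2.2 (m odd, m+1 present, M > m+1). All function names are hypothetical. For n = 2 the resulting code contains the words listed in the example together with AAC and AAG, the c-images of TTG and TTC.

```python
from itertools import product


def c(a: int) -> int:
    """Assumed involution without fixed points: swaps 2i-1 and 2i."""
    return a + 1 if a % 2 == 1 else a - 1


def representative(word):
    """Choose one word from the cyclic class of `word` (letters not all equal)."""
    m, M = min(word), max(word)
    rots = {word[i:] + word[:i] for i in range(3)}
    if m % 2 == 0 or (m + 1) not in word:      # Case 1: y = m, z > m
        picks = [r for r in rots if r[1] == m and r[2] > m]
    elif M == m + 1:                           # Case 2.1: x = y, z != x
        picks = [r for r in rots if r[0] == r[1] != r[2]]
    else:                                      # Case 2.2: x = M > m + 1
        picks = [r for r in rots if r[0] == M]
    assert len(picks) == 1, "the choice should be unique"
    return picks[0]


def build_code(n: int):
    """One representative per cyclic class of triletters over {1, ..., 2n}."""
    letters = range(1, 2 * n + 1)
    return {representative(w) for w in product(letters, repeat=3) if len(set(w)) > 1}


def is_comma_free(code) -> bool:
    """Brute force: no code word may appear in a shifted frame of two code words."""
    for u, v in product(code, repeat=2):
        w = u + v
        if w[1:4] in code or w[2:5] in code:
            return False
    return True


def no_initial_is_final(code) -> bool:
    """Sufficient criterion of Lemma 4.13: initial and final diletters are disjoint."""
    return {w[:2] for w in code}.isdisjoint({w[1:] for w in code})


def is_c_invariant(code) -> bool:
    return {tuple(c(a) for a in w) for w in code} == code


names = {1: "A", 2: "T", 3: "C", 4: "G"}
print(sorted("".join(names[a] for a in w) for w in build_code(2)))
for n in (1, 2, 3):
    code = build_code(n)
    print(n, len(code), is_comma_free(code), no_initial_is_final(code), is_c_invariant(code))
```

The brute-force comma-freeness test is kept separate from the diletter criterion so that the two can be compared on the same code.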
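As an immediate corollary we obtain
Corollary 4.16 Let Σ be a finite alphabet with $|\Sigma| = 2n$ for some $n \in \mathbb{N}$ and let $L_n \subset S_\Sigma$ be the centralizer of some involution c without fixed points. Moreover, let C be the class of maximum triletter comma-free codes over Σ. Then, the action of $L_n$ on C induces some equivalence classes of size strictly less than $|L_n|$.
Proof This follows immediately from Theorem 4.14: the involution c belongs to its own centralizer $L_n$ and fixes the code X constructed there, so the stabilizer of X in $L_n$ is nontrivial and the equivalence class of X has fewer than $|L_n|$ elements. ◻ Again we illustrate the above Corollary by an example.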
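For the genetic alphabet, the statement of the corollary can be checked by brute force. The sketch below computes the centralizer of the complementarity map in the symmetric group on {A, T, C, G} and the orbit of the code of Example 4.15 under it. The word list is an assumption: it is the list of the example completed by AAC and AAG, the c-images of TTG and TTC. The helper names `act` and `centralizer_of_c` are hypothetical.

```python
from itertools import permutations

ALPHABET = "ATCG"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}  # the involution c

# Code of Example 4.15, assumed to contain AAC and AAG so that c(X) = X.
X = frozenset({
    "AAC", "AAG", "AAT", "CAC", "GAC", "CTA", "CAG", "GAG", "GTA", "CAT",
    "GAT", "TTA", "CCG", "CTC", "GGC", "GTC", "CTG", "TTC", "GTG", "TTG",
})


def act(pi, code):
    """Apply a letter permutation (given as a dict) to every word of a code."""
    return frozenset("".join(pi[a] for a in w) for w in code)


def centralizer_of_c():
    """All permutations of the alphabet that commute with the involution c."""
    group = []
    for image in permutations(ALPHABET):
        pi = dict(zip(ALPHABET, image))
        if all(pi[COMPLEMENT[a]] == COMPLEMENT[pi[a]] for a in ALPHABET):
            group.append(pi)
    return group


L2 = centralizer_of_c()
stabilizer = [pi for pi in L2 if act(pi, X) == X]
orbit = {act(pi, X) for pi in L2}
print(len(L2), len(stabilizer), len(orbit))
```

By the orbit-stabilizer relation, the product of the last two printed numbers equals the first, so a nontrivial stabilizer forces an orbit smaller than the group.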

Equivalence classes of circular and C l -codes
As explained above, this article is motivated by the successful partition of the set of all maximum self-complementary $C^3$-codes into equally sized classes and the associated practical benefit for research on code classes. It was therefore natural to try the same with the class of maximum self-complementary circular codes. This failed (see Lemegne 2015). For instance, dropping the $C^3$-property, it turned out that for the class of maximal c-self-complementary circular codes, the action of $L_2$ induces 64 equivalence classes of size 8 but also 2 equivalence classes of size 4.
Lemma 4.19 Let Σ be a finite alphabet with $|\Sigma| = 2n$ for some $n \in \mathbb{N}$ and let $L_n \subset S_\Sigma$ be the centralizer of some involution c without fixed points. Let C be the class of all maximal diletter circular codes. Then, the action of $L_n$ induces equally sized equivalence classes (of size $|L_n|$) on C.
Proof To show that all equivalence classes indeed have the same size, we recall a result from Fimmel et al. (2019), where it was proved that any maximal diletter circular code X has a presentation of the form $X = \{N_iN_j \mid 1 \leq i < j \leq 2n\}$, where $\Sigma = \{N_1, \dots, N_{2n}\}$. Consequently, the first letter $N_1$ appears $2n - 1$ times as a prefix of words from X, the second letter $N_2$ appears $2n - 2$ times, and so on. The last but one letter $N_{2n-1}$ has only one occurrence as a prefix of some word from X, while the last letter $N_{2n}$ never occurs as a prefix. Now, assuming that $\pi \in L_n$ satisfies $\pi(X) = X$, we conclude that for every $i \leq 2n$ we must have $\pi(N_i) = N_i$, which means that $\pi = \mathrm{id}$. Thus, no maximal diletter circular code is invariant under a nontrivial $\pi \in L_n$. ◻
With the theorem below, we try to explain which code properties are responsible for the success or failure of a partition of a code class into classes of equal size. Recall that -maximum means that the code contains exactly one l-letter word from each complete equivalence class.
Theorem 4.20 Let Σ be a finite alphabet of arbitrary size m and let C be the class of all -maximum l-letter $C^l$ codes for some natural number l. Then, the action of $S_\Sigma$ induces equally sized equivalence classes on C.
Proof We prove the above theorem by showing that for any $X \in C$ and any $\pi \in S_\Sigma$ with $\pi \neq \mathrm{id}$ we have $\pi(X) \neq X$. First, we collect some facts that we would like to use. So assume Σ and $X \in C$ are given, i.e. X is a maximum l-letter $C^l$ code. Moreover, assume that $\pi \in S_\Sigma$ is such that $\pi(X) = X$, and denote by $\alpha_1, \dots, \alpha_{l-1}$ the circular shifts of l-letter words. Then, the following hold:
(i) Let $k = \mathrm{ord}(\pi)$, i.e. k is the smallest natural number s such that $\pi^s = e$, the identity. Then, also $\pi^s(X) = X$ for all $s \leq k$. Moreover, $\pi^{k-1} = \pi^{-1}$.
(ii) Since $\pi^s(X) = X$ and π obviously commutes with $\alpha_1, \dots, \alpha_{l-1}$, we also have $\pi^s(\alpha_i(X)) = \alpha_i(X)$ for all $s \leq k$ and $i \leq l - 1$.
(iii) Any l-letter word $N_1 \cdots N_l \in \Sigma^l$ must be contained in either X or one of the $\alpha_i(X)$ ($i \leq l - 1$) by maximality of X, provided $N_1 \cdots N_l$ generates a complete equivalence class.
(iv) If $x = N_1 \cdots N_l$ is an l-letter word such that for some i we have $N_i = N_{i+1}$ and for all $j \neq i$ we have $N_j \neq N_{j+1}$ (i.e. x has only one pair of identical consecutive letters, with the convention that $l + 1 = 1$), then x generates a complete equivalence class (because any shift of x moves the only two identical consecutive letters to another position).
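The prefix-counting argument can be illustrated by brute force. In the sketch below, the code is taken to be the set of all diletters N_i N_j with i < j over an ordered alphabet, which is how the presentation is used in the proof; the function names are hypothetical, and the stabilizer is computed in the full symmetric group, which contains L_n.

```python
from itertools import combinations, permutations


def staircase_code(letters: str):
    """X = {N_i N_j : i < j} for the alphabet `letters`, read in the given order."""
    return {a + b for a, b in combinations(letters, 2)}


def prefix_counts(code, letters: str):
    """How often each letter occurs as the first letter of a code word."""
    return [sum(w[0] == a for w in code) for a in letters]


def stabilizer(code, letters: str):
    """All letter permutations (as dicts) that map the code onto itself."""
    fixing = []
    for image in permutations(letters):
        pi = dict(zip(letters, image))
        if {pi[w[0]] + pi[w[1]] for w in code} == code:
            fixing.append(pi)
    return fixing


letters = "ACGT"
X = staircase_code(letters)
print(sorted(X))
print(prefix_counts(X, letters))
print(len(stabilizer(X, letters)))
```

Since the prefix counts are pairwise different, only the identity can preserve the code, so already the full symmetric group acts with trivial stabilizer.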
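Facts (ii) and (iv) lend themselves to a quick exhaustive check on short words. The following sketch assumes that the shifts act by rotating a word by i positions and that a permutation acts letter by letter; the helper names and the choice of a three-letter test alphabet are hypothetical.

```python
from itertools import product


def shift(word: str, i: int) -> str:
    """Circular shift of a word by i positions."""
    return word[i:] + word[:i]


def apply_perm(pi, word: str) -> str:
    """Apply a letter permutation (dict) to every letter of a word."""
    return "".join(pi[a] for a in word)


def cyclic_class(word: str):
    return {shift(word, i) for i in range(len(word))}


def is_complete(word: str) -> bool:
    """A class is complete if all l shifts of the word are different."""
    return len(cyclic_class(word)) == len(word)


def identical_neighbour_pairs(word: str) -> int:
    """Number of positions i with N_i = N_{i+1}, reading the word circularly."""
    l = len(word)
    return sum(word[i] == word[(i + 1) % l] for i in range(l))


pi = {"A": "C", "C": "G", "G": "A"}  # a 3-cycle used as a test permutation
for l in range(1, 6):
    for letters in product("ACG", repeat=l):
        w = "".join(letters)
        # Fact (ii): permuting letters commutes with shifting.
        assert all(apply_perm(pi, shift(w, i)) == shift(apply_perm(pi, w), i) for i in range(l))
        # Fact (iv): exactly one circular pair of equal neighbours implies a complete class.
        if identical_neighbour_pairs(w) == 1:
            assert is_complete(w)
print("facts (ii) and (iv) hold for all words of length at most 5 over three letters")
```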
We now write π as a product of disjoint cycles, i.e. $\pi = \pi_1 \cdots \pi_t$, where each $\pi_j$ is of the form $(N_1, \dots, N_{k(j)})$ for different $N_1, \dots, N_{k(j)} \in \Sigma$. Since the cycles are disjoint, it follows that for each $\pi_j$ conditions (i) to (iii) from above also hold when considered on $X \cap \{N_1, \dots, N_{k(j)}\}^l$. Thus, we assume without loss of generality that $\pi = \pi_1 = (N_1, \dots, N_k)$ is a cycle, keeping in mind that from now on all arguments have to use l-letter words over $\{N_1, \dots, N_k\}$ only. We now distinguish cases:
• Case 1: k is even.
In this case, let $s = k/2$. Then $\pi^s(N_1) = N_{s+1}$ and $\pi^s(N_{s+1}) = N_1$.
a) l is odd: Let $x = N_1N_{s+1} \cdots N_1N_{s+1}N_1$. By condition (iv), it follows that x generates a complete equivalence class (note that, read circularly, x has exactly one pair of identical consecutive letters $N_1$). Hence, we may assume without loss of generality that $x \in X$ by condition (iii). However, $\pi^s(x) = N_{s+1}N_1 \cdots N_{s+1}N_1N_{s+1} \in X$ then implies that the word $x\,\pi^s(x) = (N_1N_{s+1})^l$ has two decompositions, contradicting the circularity of X.
b) l is even: Let $x = N_1N_{s+1} \cdots N_1N_{s+1}\,N_{s+1}N_1 \cdots N_{s+1}N_1$, where each of the two parts consists of exactly $l/2$ letters. Again, by condition (iv), the word x generates a complete equivalence class (note that it has exactly two identical consecutive letters $N_{s+1}$) and by condition (iii), we may assume that $x \in X$. However, $\pi^s(x)$ is then in the same equivalence class as x, contradicting the circularity of X.
• Case 2: k is odd.
We need to distinguish cases again according to the size of l.
a) $l < k$: Let $x = (N_1 \cdots N_k)^l$. Since $l < k$, the word x is a concatenation of k words of length l, say $y_1 \cdots y_k$ with $y_1 = N_1 \cdots N_l$. Since all $N_i$ are different, $y_1$ generates a complete equivalence class and hence we may assume that $y_1 \in X$ by condition (iii). Moreover, $\pi^l(y_i) = y_{i+1}$ for all $i < k$ and $\pi^l(y_k) = y_1$. Thus, also $y_2, \dots, y_k \in X$. This shows that $x \in X^k$. However, π applied to x gives $\alpha_1(x)$, and a similar argument shows that also $\alpha_1(x) \in X^k$, a contradiction to the circularity of X.
b) $l \geq k$ and $l \not\equiv 0 \pmod{k}$: In this case, the same construction as in Case a) applies and yields a contradiction. Note that also in this case, the words $y_1, \dots, y_k$ generate complete equivalence classes.
c) $l \geq k$ and $l \equiv 0 \pmod{k}$: Let $l = mk$ and choose $x = N_1 \cdots N_1\,N_2 \cdots N_2 \cdots N_k \cdots N_k$, the concatenation of blocks of m copies of $N_i$. Then, clearly x generates a complete equivalence class and by (iii) we may assume that $x \in X$. However, then $\pi(x) = N_2 \cdots N_2\,N_3 \cdots N_3 \cdots N_k \cdots N_k\,N_1 \cdots N_1 \in X$ contradicts circularity, since obviously x and $\pi(x)$ are in the same equivalence class.
◻ A first corollary is immediate.
Corollary 4.21 Let Σ be a finite alphabet of even size 2n and let C be the class of all -maximum c-self-complementary l-letter $C^l$ codes for some natural number l, where $c \in S_\Sigma$ is an involution without fixed points. Then, the action of $L_n$ induces equally sized equivalence classes on C.
Proof Follows directly from Theorem 4.20. ◻ We would like to remark that it is an open question if maximum and -maximum are the same for C l -codes. However, it is true for circular codes. We have an immediate corollary that is well known. Proof Follows directly from the above Theorem 4.20 since maximum in this case is indeed the same as -maximum. ◻

Conclusions
In the present work, classes of l-letter codes over general alphabets Σ have been investigated with respect to their behaviour under the natural action of a specific subgroup L of the symmetric group $S_\Sigma$ acting on the letters of the alphabet. These codes all share some error-detecting and error-correcting properties of decreasing strength, from strong comma-freeness to comma-freeness to circularity. The group L was motivated by a biological context in which the class of maximal circular self-complementary $C^3$-codes had been found in nature and seems to play an important role for frame retrieval during the translation process in the ribosome (see Arquès and Michel 1996; Michel 2015, 2017). Self-complementarity originates from the double helix structure of the DNA, but in general it can be seen as a kind of correspondence between letters, e.g. in the binary case 0 and 1 correspond to each other. Based on these findings, several models of the evolution of the genetic code were developed, proposing strong comma-free or comma-free ancient predecessor codes of the current standard genetic code (see Fimmel et al. 2020). Passing from the biological context to coding theory and the field of signal processing, all classes of codes were deeply investigated with respect to their error-revealing properties using graph theory and combinatorics (see Ball and Cummings 1976a; Fimmel et al. 2014, 2016, 2017a, b, 2019, 2020; Levenshtein 2004). Three important observations motivated our research. The first one is that codes belonging to the same equivalence class under the action of the group L share identical error-detecting and error-correcting properties. Thus, it seemed reasonable to investigate how large such equivalence classes turn out to be. In the genetic code setting, it had already been observed that the 216 maximal self-complementary $C^3$-codes are divided into 27 equivalence classes of size |L|.
However, for general circular codes or comma-free codes, this is not the case (see Keller 2014; Lemegne 2015). The second motivation was given by several research studies showing that there are variants of the genetic code that are based on six bases, and other research studies proposing ancient genetic codes that used dinucleotides, tetranucleotides or even pentanucleotides for coding amino acids (see Demongeot and Seligmann 2020; Fimmel et al. 2020; Malyshev et al. 2014; Michel and Pirillo 2013). Therefore, it was reasonable to study codes not only in the triletter case over the genetic alphabet with four letters but general l-letter codes over larger alphabets. The last motivation came from a series of papers by Seligmann (see Seligmann 2020, 2019; Michel and Seligmann 2014; Seligman 2016), who discovered so-called Swinger RNA, which is RNA that can be obtained from different RNA by applying a systematic change of bases (i.e. by applying one of the transformations from L). It was speculated that nature may use this mechanism in order to encode not only one set of information in DNA but 8 (the size of L) or even 24 (the size of $S_{\{A,C,G,T\}}$) sets at the same time. These Swinger copies would then use the corresponding circular code in the equivalence class of codes under L for frame synchronization.
Our results completely clarify the situation for several classes of codes, showing the (non-)existence of equivalence classes of size |L| or of strictly smaller size. Besides the canonical application to the genetic code or the extended genetic code (with up to six coding nucleotide bases), the case of the binary alphabet is especially important for applications in signal processing. It proves to be a special case for the classes of maximal strong comma-free and maximum self-complementary comma-free trinucleotide codes: only in this case do the corresponding equivalence classes all have the maximum possible size.
Moreover, the results of the present investigation suggest that the code properties responsible for the maximal size of equivalence classes are that the codes are maximally large and retain their error-detecting properties in all frames ( C l property).
Funding Open Access funding enabled and organized by Projekt DEAL.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.