Circular Tessera Codes in the Evolution of the Genetic Code.

The origin of the modern genetic code and the mechanisms that have contributed to its present form raise many questions. The main goal of this work is to test two hypotheses concerning the development of the genetic code for their compatibility and complementarity and to see whether they can benefit from each other. On the one hand, Gonzalez, Giannerini and Rosa developed a theory based on four-base codons, which they called tesserae. This theory can explain the degeneracy of the modern vertebrate mitochondrial code. On the other hand, in the 1990s, so-called circular codes were discovered in nature, which seem to ensure the maintenance of a correct reading frame during the translation process. It turns out that the two concepts not only do not contradict each other but, on the contrary, complement and enrich each other.


Introduction
In 1986, John Maynard Smith stated: "We understand biological phenomena only when we have invented machines with similar properties" (Smith 1986, pp 99-100). This quotation explains the motivation of this work quite well. This paper was written in order to better understand the origin of the genetic code using such a machine. One possible machine, or rather a model, which gives a feasible explanation for an important aspect of the evolutionary processes of the genetic code was found by Gonzalez, Giannerini and Rosa. In their work "On the origin of degeneration in the genetic code" (Gonzalez et al. 2019) the authors concentrate on coding and especially on symmetry as an essential cause and consequence of the natural phenomenon of degeneracy. A famous example, which shows the importance of including symmetry considerations when studying natural phenomena, can be found in quantum mechanics. Here, symmetry describes more than just the patterns that matter takes: it is used to classify the nature of quantum states. This is by far not the only example of its kind. Noether's theorem even states a one-to-one connection between fundamental laws of nature, the so-called conservation laws, and the respective symmetries in nature. Taking these general considerations into account, Gonzalez, Giannerini, and Rosa argue that none of the theories regarding the origin of the genetic code pays the necessary attention to the idea of symmetry (Gonzalez et al. 2019). As a consequence, the concept of tessera codes was developed. The tesserae form a subset of all tetranucleotides, chosen in such a way that the degeneracy of the vertebrate mitochondrial genetic code can be explained from the symmetries of the tesserae (Gonzalez et al. 2012).
The other line of thought addressed by the current work is the theory of circular codes. This theory is intended to explain the noise immunity of the genetic code and is based on a proposal by Crick et al. (1957). They argue that the coding of amino acids requires only a subset of codons in which the correct reading frame is automatically and immediately recognizable, the so-called comma-free property. While Crick's theory was refuted experimentally (Nirenberg and Matthaei 1961), 40 years later so-called circular codes were discovered in nature (Arqués and Michel 1996). More specifically, it has been noticed that the set of codons which, together with their frame-shifts in the three potential reading frames, are the most commonly used across all species has very remarkable properties in terms of detecting the correct reading frame (Michel 2017). The comma-free codes proposed by Crick belong to the same family of circular codes, but within this family they have the strongest error-detecting properties (see, for instance, Fimmel et al. 2017). The natural circular codes have even more interesting structural properties, which makes it very doubtful that these structures play no role in biological processes (Arqués and Michel 1996).
The primary goal of this work is to combine the two concepts, tesserae and circular codes, and to see whether they can benefit from each other. Among other things, we specify a construction algorithm for circular tessera codes of maximal length. Furthermore, self-complementary tessera codes are characterized, and criteria for their self-complementarity are formulated and proved in the language of graph theory. The growth tables for circular and comma-free tessera codes are also presented for the first time. In summary, one result of the work is that the two concepts under scrutiny, tessera codes and circularity, have proved to be mutually compatible and complementary.
Thus, with this work we hope to bring more clarity into the possible role of tesserae in the evolutionary process of the genetic code and the mechanisms behind it.

Definitions and Notations
The genetic code is written with words of three letters called codons, built over an alphabet $B$ of four letters, the nucleotide bases Uracil (Thymine), Cytosine, Adenine, and Guanine, in short U (T), C, A, G. Clearly, the number of codons is $4^3 = 64$, and by $|B^3|$ we will denote the cardinality of the set $B^3$ of codons. Accordingly, $B^2$ denotes the set of 16 dinucleotides and $B^4$ the set of 256 tetranucleotides. It is hypothesized that during evolution the genetic code had several ancestors that might have consisted not only of trinucleotides but of dinucleotides or tetranucleotides or even combinations of these (see Baranov et al. 2009; Gonzalez et al. 2012; Seligmann 2014; Patel 2005; Wilhelm and Nikolajewa 2004; Wu et al. 2005). In particular, in Gonzalez et al. (2012) the tessera code was suggested as an ancestral code that might have been the origin of the mitochondrial code (see also Gonzalez et al. 2019). In order to define the tessera code we have to introduce some group theory and explain how it can be applied in the genetic setting.

Klein Four-Group and Equivalence Classes of Dinucleotides
The symmetric group on a set of elements is usually known as the group of permutations of these elements. Applying this to our genetic alphabet $B$, we define the symmetric group $S_B$ as $S_B = \{\pi : B \to B \mid \pi \text{ is bijective}\}$ with the usual group operation given by composition of functions. Recall that a group $(H, \circ)$ is a set $H$ together with an operation $\circ : H \times H \to H$ such that $\circ$ is associative and $H$ contains a neutral element $e$ as well as inverses $h^{-1}$ for all $h \in H$ (see Rotman 1995 for more details on groups). The group $S_B$ has $4! = 24$ elements and is trivially isomorphic to the symmetric group $S_4$ on four elements. We will use standard notation as can be found in Rotman (1995), e.g. we will write $\pi = (A, G, C, U)$ for the cyclic permutation with $\pi(A) = G$, $\pi(G) = C$, $\pi(C) = U$ and $\pi(U) = A$. Naturally, any permutation $\pi : B \to B$ can be applied to $n$-nucleotides of any length componentwise, i.e. if $x = b_1 \cdots b_n \in B^n$, then $\pi(x) = \pi(b_1) \cdots \pi(b_n)$. There is no danger of confusion when denoting the induced bijective map $B^n \to B^n$ by $\pi$ again for any natural number $n$. In Fimmel et al. (2014) and Fimmel et al. (2015) a subgroup $L$ of $S_B$ was identified that seems to play an important role in error-detection and error-correction mechanisms during the translation process. This group consists of all permutations from $S_B$ that preserve the codon-anticodon relation and can be geometrically interpreted as the symmetry group of a square. In particular, it contains the 4 bijective transformations
$$I = \mathrm{id}, \qquad SW : A \leftrightarrow U,\ C \leftrightarrow G, \qquad YR : A \leftrightarrow G,\ C \leftrightarrow U, \qquad KM : A \leftrightarrow C,\ G \leftrightarrow U.$$
In particular, the complementary map $c = SW$ is biologically important since it mirrors the hydrogen bonds $A \leftrightarrow T$ and $C \leftrightarrow G$ of the DNA double helix. Moreover, the transformation $r = KM$ carries codons of degeneracy class 4 to codons of degeneracy class less than 4 and vice versa, a symmetry property of the genetic code that was already observed by Rumer (see Fimmel et al. 2014, 2015 for more details). In the sequel we will denote the set of these four transformations as $V = \{I, SW, YR, KM\}$ (Fig. 1).
Equipped with the usual group operation of $S_B$, the set $V$ forms a subgroup of the symmetric group $S_B$ which is isomorphic to the so-called Klein four-group. It can easily be verified that the group $V$ is commutative, i.e. $\alpha \circ \beta = \beta \circ \alpha$ for all $\alpha, \beta \in V$, and that all permutations in $V$ are of order two, i.e. applying them twice yields the identity: $\alpha \circ \alpha = \mathrm{id}$ for every $\alpha \in V$.
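The two group properties just mentioned can be checked mechanically. The following is a minimal sketch, assuming the four transformations act on the bases as $SW: A \leftrightarrow U, C \leftrightarrow G$ and $KM: A \leftrightarrow C, G \leftrightarrow U$ (both taken from values used later in the text) and that $YR$ is their product; the helper names `compose` and `name_of` are hypothetical.

```python
from itertools import product

BASES = "ACGU"
# Assumed action of the four transformations on single bases.
V = {
    "I":  {"A": "A", "C": "C", "G": "G", "U": "U"},
    "SW": {"A": "U", "U": "A", "C": "G", "G": "C"},
    "YR": {"A": "G", "G": "A", "C": "U", "U": "C"},
    "KM": {"A": "C", "C": "A", "G": "U", "U": "G"},
}

def compose(f, g):
    """Return the permutation 'f after g' (apply g first, then f)."""
    return {b: f[g[b]] for b in BASES}

def name_of(perm):
    """Name of a permutation if it lies in V, otherwise None."""
    return next((n for n, p in V.items() if p == perm), None)

pairs = list(product(V.values(), repeat=2))
closed = all(name_of(compose(f, g)) is not None for f, g in pairs)
commutative = all(compose(f, g) == compose(g, f) for f, g in pairs)
involutions = all(compose(f, f) == V["I"] for f in V.values())
print(closed, commutative, involutions)
```

Under these assumptions all three checks succeed, which is exactly the defining behaviour of the Klein four-group.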
As we will see in the next section, the group $V$ is used to define the class of tesserae in mathematical terms. If we consider $V$ acting on the set of dinucleotides $B^2$, we obtain four orbits of size four. Recall that the orbit of an element $x$ (here a dinucleotide) under some group $H$ (here $V$) is defined as $[x] = \{h(x) : h \in H\}$. Each orbit represents an equivalence class under the natural equivalence relation $d_1 d_2 \sim d_1' d_2'$ if and only if $d_1' d_2' = \alpha(d_1 d_2)$ for some $\alpha \in V$.
We are now almost in a position to define the set of tesserae as introduced in Gonzalez et al. (2012), but first we need some more technicalities. Besides the group $S_B$ acting as a group of exchanges of bases, there is a second important group, which consists of transformations that permute the positions of single bases in a nucleotide sequence. Together with the usual composition $\circ$ of maps, these permutations again form a group that can once more be seen as a symmetric group $S_n$. For the convenience of the reader we only recall the biologically relevant permutations that will be of importance for us: the so-called reversing permutation and the $n - 1$ shift operations $\alpha_1, \ldots, \alpha_{n-1}$. Given an $n$-nucleotide $x = N_1 \cdots N_n$ we define the reversal and $\alpha_k$ for $k \le n - 1$ as
$$\overleftarrow{N_1 \cdots N_n} = N_n \cdots N_1, \qquad \alpha_k(x) = N_{k+1} \cdots N_n N_1 \cdots N_k,$$
which are the $n$-nucleotides obtained from $x$ by reversing or by a shift of $k$ positions, respectively. Explicitly, for $n = 4$ we have
$$\overleftarrow{N_1N_2N_3N_4} = N_4N_3N_2N_1, \quad \alpha_1(N_1N_2N_3N_4) = N_2N_3N_4N_1, \quad \alpha_2(N_1N_2N_3N_4) = N_3N_4N_1N_2, \quad \alpha_3(N_1N_2N_3N_4) = N_4N_1N_2N_3.$$
It is now obvious that the anti-$n$-nucleotide of some $n$-nucleotide $x$ can be described as $\overleftarrow{SW(x)}$ with the complementary map $SW$ from $V$. For trinucleotides (codons) it is well known that the anti-codon is always different from the codon. However, if $n$ is even it might happen that $\overleftarrow{SW(x)} = x$ for some $n$-nucleotide $x$. These $n$-nucleotides are called self-complementary.
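The orbits of $V$ on dinucleotides, the shift operations and the anti-nucleotide map can be illustrated with a few lines of code. This is only a sketch under the same assumption on the action of $V$ as stated in the comments; the function names `apply`, `orbit`, `shift` and `anti` are hypothetical.

```python
BASES = "ACGU"
# Assumed action of V on single bases (SW is the complement map).
V = {
    "I":  {"A": "A", "C": "C", "G": "G", "U": "U"},
    "SW": {"A": "U", "U": "A", "C": "G", "G": "C"},
    "YR": {"A": "G", "G": "A", "C": "U", "U": "C"},
    "KM": {"A": "C", "C": "A", "G": "U", "U": "G"},
}

def apply(perm, word):
    """Apply a base permutation componentwise to a nucleotide word."""
    return "".join(perm[b] for b in word)

def orbit(word):
    """Orbit of a word under V, i.e. its equivalence class."""
    return frozenset(apply(p, word) for p in V.values())

def shift(word, k):
    """Cyclic shift alpha_k: move the first k letters to the end."""
    return word[k:] + word[:k]

def anti(word):
    """Anti-n-nucleotide: complement with SW, then reverse."""
    return apply(V["SW"], word)[::-1]

orbits = {orbit(a + b) for a in BASES for b in BASES}
for o in sorted(sorted(o) for o in orbits):
    print(o)
print(shift("ACGU", 1), shift("ACGU", 2), shift("ACGU", 3))
print(anti("ACG"), anti("ACGU"))
```

The last line shows a codon whose anti-codon differs from it and a tetranucleotide that coincides with its own anti-tetranucleotide.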

Tesserae: Definition and Structure
Tesserae were motivated biologically in an evolutionary context in Gonzalez et al. (2012). Each tessera is a tetranucleotide of a particular form that comes from the symmetries induced by the group $V$. Let us give a definition of a tessera in mathematical terms (see also Gonzalez et al. 2012; Fimmel and Strüngmann 2019): Definition 2.1 A tessera is a tetranucleotide (four-letter word) $t \in B^4$ of the form $t = N_1 N_2\, \alpha(N_1)\, \alpha(N_2)$ with $N_1, N_2 \in B$ and $\alpha \in V$. The set of all valid tesserae is denoted by TESS.
The set TESS is also called the tessera code since it is a subset of $B^4$ and hence a code in the sense that every concatenation of words from TESS has a unique decomposition over TESS. Clearly, the size of TESS is 64 and so we have $|TESS| = |B^3|$. Table 2 shows the set of all tesserae together with their generating transformation.
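The size of the tessera code can be confirmed by direct enumeration. The sketch below assumes the form $N_1 N_2 \alpha(N_1) \alpha(N_2)$ for a tessera and the action of $V$ given in the comments; it also compares this with the equivalent description that the step from $N_1$ to $N_2$ is the same transformation as the step from $N_3$ to $N_4$. The helper `transformation` is a hypothetical name.

```python
BASES = "ACGU"
# Assumed action of V on single bases.
V = {
    "I":  {"A": "A", "C": "C", "G": "G", "U": "U"},
    "SW": {"A": "U", "U": "A", "C": "G", "G": "C"},
    "YR": {"A": "G", "G": "A", "C": "U", "U": "C"},
    "KM": {"A": "C", "C": "A", "G": "U", "U": "G"},
}

# Tesserae of the form N1 N2 alpha(N1) alpha(N2) with alpha in V.
TESS = {n1 + n2 + p[n1] + p[n2]
        for p in V.values() for n1 in BASES for n2 in BASES}

def transformation(x, y):
    """Name of the unique element of V that maps base x to base y."""
    return next(name for name, p in V.items() if p[x] == y)

# Alternative description: same transformation N1 -> N2 and N3 -> N4.
ALT = {a + b + c + d
       for a in BASES for b in BASES for c in BASES for d in BASES
       if transformation(a, b) == transformation(c, d)}

print(len(TESS), TESS == ALT, "ACGU" in TESS, "ACGA" in TESS)
```

Both descriptions agree because $V$ is commutative, so either one can be used when working with tesserae computationally.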
It is easy to see that a codon $N_1N_2N_3 \in B^3$ can be uniquely extended to a valid tessera $\mathrm{tess}(N_1N_2N_3) = N_1N_2N_3N_4$ by determining the unique permutation $\alpha \in V$ such that $\alpha(N_1) = N_3$ and letting $N_4 = \alpha(N_2)$. Since any three letters of a tessera determine the fourth, the tessera code TESS is 1-error-detecting, and it was shown in Fimmel and Strüngmann (2019) that TESS can be obtained as a linear code from $B^3$ and by the so-called Plotkin construction from $B^2$ (see Fimmel and Strüngmann 2019 for more details). In Gonzalez et al. (2012) the idea of symmetric primeval adaptor molecules that could recognize the normal reading frame in the coding strand in the 3'-5' direction, in the complementary strand in the 3'-5' direction, in the coding strand in the reverse 5'-3' direction and in the complementary strand in the reverse 5'-3' direction was utilized to propose an ancient model of tRNA adaptors that explains the reading mechanism and degeneracy distribution of the tesserae. In particular, since there exist self-complementary tesserae, e.g. ACGU, the tessera code allows degeneracy 2 and 4 only. Maintaining the degeneracy, an algorithm was suggested in Gonzalez et al. (2019) for passing from the tessera code back to the (mitochondrial) genetic code in the following way: we assign to each of the transformations from $V$ a letter of the genetic alphabet via $I \leftrightarrow A$, $SW \leftrightarrow U$, $KM \leftrightarrow C$ and $YR \leftrightarrow G$ and then perform the algorithm displayed in Fig. 2.
For instance, the tessera ACGU will be mapped to the codon CUU since $KM(A) = C$ and $SW(C) = G$. In the sequel we will denote by $\mathrm{cod}(N_1N_2N_3N_4)$ the corresponding codon under this algorithm. However, note that the two mappings $\mathrm{tess}(\cdot)$ and $\mathrm{cod}(\cdot)$ are not inverses of each other. We now aim for a better description of tesserae. Let us assume that $N_1N_2N_3N_4$ is a tessera. By definition there is an element $\alpha \in V$ such that $N_3N_4 = \alpha(N_1)\alpha(N_2) = \alpha(N_1N_2)$. This implies that $N_1N_2$ and $N_3N_4$ have to be in the same equivalence class displayed in Table 1. Thus, the tessera code can be split into four disjoint subsets, one for each equivalence class of dinucleotides.
Clearly, any subset $X \subseteq TESS$ has a similar induced decomposition where the components could be empty.
Definition 2.2 Let $X \subseteq B^4$ be a tessera code. Then $X = X_I \cup X_{SW} \cup X_{YR} \cup X_{KM}$, where $X_i$ consists of those tesserae $N_1N_2N_3N_4 \in X$ whose dinucleotides $N_1N_2$ and $N_3N_4$ belong to the equivalence class $i \in \{I, SW, YR, KM\}$. The above decomposition will be used in Sect. 4 for constructing all maximal circular tessera codes.
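The extension of a codon to a tessera and the induced four-part decomposition can be sketched as follows. The code assumes that an equivalence class of dinucleotides is labelled by the transformation carrying the first base to the second (an inference from the class labels used in the text, not something Table 1 confirms here); `tess` and `dinucleotide_class` are illustrative names.

```python
BASES = "ACGU"
# Assumed action of V on single bases.
V = {
    "I":  {"A": "A", "C": "C", "G": "G", "U": "U"},
    "SW": {"A": "U", "U": "A", "C": "G", "G": "C"},
    "YR": {"A": "G", "G": "A", "C": "U", "U": "C"},
    "KM": {"A": "C", "C": "A", "G": "U", "U": "G"},
}

def tess(codon):
    """Extend a codon N1N2N3 to the unique tessera N1N2N3N4."""
    n1, n2, n3 = codon
    alpha = next(p for p in V.values() if p[n1] == n3)
    return codon + alpha[n2]

def dinucleotide_class(d):
    """Label of the class of a dinucleotide (assumed convention)."""
    return next(name for name, p in V.items() if p[d[0]] == d[1])

TESS = {tess(a + b + c) for a in BASES for b in BASES for c in BASES}

# Split the tessera code according to the class of its first dinucleotide.
parts = {name: set() for name in V}
for t in TESS:
    parts[dinucleotide_class(t[:2])].add(t)

print(tess("ACG"), {name: len(p) for name, p in parts.items()})
```

Each of the four parts contains 16 tesserae, and in every tessera both halves fall into the same class, which is what makes the decomposition well defined.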

Graph Theoretical Approach
In this section we recall a graph-theoretical approach that turned out to be very useful for determining properties of circular codes (see Sect. 3 for the definition of circularity) and extend it to our setting of tesserae. To each subset $X \subseteq B^n$ a directed graph $G(X)$ will be associated as the union of disjoint components $C_j(X)$ where $1 \le j \le \lfloor n/2 \rfloor$. The vertices of such a component $C_j(X)$ will be initial segments and end segments of $n$-tuples from $X$ of lengths $j$ and $n - j$.
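Definition 2.3 Let $X \subseteq B^n$ and $1 \le j \le \lfloor n/2 \rfloor$. The graph component $C_j(X) = (V_j(X), E_j(X))$ with set of vertices $V_j(X)$ and set of arcs $E_j(X)$ is defined as follows:
$$V_j(X) = \{N_1 \cdots N_j,\ N_{j+1} \cdots N_n,\ N_1 \cdots N_{n-j},\ N_{n-j+1} \cdots N_n : N_1 \cdots N_n \in X\},$$
$$E_j(X) = \{[N_1 \cdots N_j \to N_{j+1} \cdots N_n],\ [N_1 \cdots N_{n-j} \to N_{n-j+1} \cdots N_n] : N_1 \cdots N_n \in X\}.$$
It is easy to see that the graph components $C_j(X)$ of a representing graph $G(X)$ are pairwise disjoint since their labels have different lengths. However, the components need not be connected. For the convenience of the reader and for a better illustration we give some examples for $n = 2, 3$ and $4$ (Figs. 3, 4 and 5).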
Since the tesserae are tetranucleotides, it follows that any set of tesserae has two (possibly empty) graph components in its representing graph, one with labels of length 1 and 3 and the other with labels of length 2. The graph approach has been used to characterize circularity of codes in terms of graph theory. We will consider circular tessera codes in the next section, but it seems reasonable to state the corresponding theorem in this section. For the technical definition of circularity see Definition 3.1. Theorem 2.4 Let $X \subseteq B^n$ be a code. Then the following statements are equivalent: (1) $X$ is a circular code; (2) the representing graph $G(X)$ is acyclic, i.e. does not contain any cycle.
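The graph criterion for circularity lends itself to a short computational check. The following sketch builds the arcs of the representing graph by splitting every word at every position (which yields the union of all components, assuming the usual prefix-to-suffix arcs) and tests acyclicity by depth-first search; `arcs_of` and `is_acyclic` are hypothetical helper names.

```python
def arcs_of(code):
    """Arcs prefix -> suffix for every split position of every word."""
    return {(w[:j], w[j:]) for w in code for j in range(1, len(w))}

def is_acyclic(arcs):
    """Depth-first search for a directed cycle; True if none exists."""
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    state = {}  # 1 = on the current search path, 2 = fully explored

    def visit(u):
        state[u] = 1
        for v in succ[u]:
            if state.get(v) == 1:
                return False          # back arc: a cycle was found
            if v not in state and not visit(v):
                return False
        state[u] = 2
        return True

    return all(visit(u) for u in succ if u not in state)

def is_circular(code):
    """Circularity test via acyclicity of the representing graph."""
    return is_acyclic(arcs_of(code))

print(is_circular({"AACC"}))
print(is_circular({"ACGU", "CAUG", "GUCA", "UGAC"}))
```

The second code contains the cycle AC, GU, CA, UG among its labels of length 2, so the test reports that it is not circular.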
In the particular case of tesserae we will use a second graph associated to a set that we shall utilize later on in order to construct maximal circular tessera codes.
Definition 2.5 Let $X \subseteq TESS$. The di-cut-graphs $T_{1,3}(X)$ and $T_{2,4}(X)$ associated to $X$ are defined as the representing graphs $G(X_{1,3})$ and $G(X_{2,4})$ of the sets
$$X_{1,3} = \{N_1N_3 : N_1N_2N_3N_4 \in X\} \quad \text{and} \quad X_{2,4} = \{N_2N_4 : N_1N_2N_3N_4 \in X\}.$$
To conclude this section we give an example of a di-cut-graph $T(X)$ of some tessera code $X$ (Fig. 6).

Circular Tessera Codes
In this section we consider circular tessera codes. Simply speaking, circularity means that a frame-shift in any concatenation of tesserae from the code will be detected. In the biological setting of the genetic code, a circular set of trinucleotides was first observed in Arqués and Michel (1996) and is supposed to play an important role in error-detection mechanisms during the translational process. We start with the definition of circularity for tesserae.
Definition 3.1 Let $n \in \mathbb{N}$. A tessera code $X \subseteq B^4$ is called $n$-circular if for any $m \le n$ and any tesserae $t_1, \ldots, t_m \in X$ the concatenation $t_1 \cdots t_m$ has a unique decomposition into tesserae from the code $X$ when considered on the circle. We will call a tessera code $X \subseteq B^4$ circular if it is $n$-circular for all $n \in \mathbb{N}$.
As noted before in Theorem 2.4, a tessera code $X$ is circular if and only if its representing graph $G(X)$ is acyclic. Moreover, it is easy to see that the code $X$ is $n$-circular if and only if for any concatenation $t_1 \cdots t_m$ of tesserae $t_1, \ldots, t_m$ from $X$ with $m \le n$ the shifted sequences $\alpha_i(t_1 \cdots t_m)$ for $1 \le i \le 3$ do not yield a valid sequence in $X^m$, i.e. $\alpha_i(t_1 \cdots t_m) \notin X^m$.
In particular, a tessera code $X$ is 1-circular if it does not contain the cyclically shifted tesserae of its members, i.e. $\alpha_i(t) \notin X$ for all $1 \le i \le 3$ and $t \in X$. Therefore, a circular code cannot contain any tessera that equals one of its shifts, e.g. $ACAC = \alpha_2(ACAC)$, and it makes sense to consider the equivalence classes that are formed by tesserae and their circular shifts. If all shifts are different, then the class is called complete. There are 12 such complete equivalence classes, each containing 4 elements. Four other classes each contain one element, and the remaining six classes each contain two elements (Table 3).
Since any circular code is also 1-circular and there are only 12 complete equivalence classes, it is obvious that a circular tessera code can contain at most 12 elements.
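The class sizes under cyclic shifts, and hence the bound of 12, can be recomputed by brute force. The sketch below assumes the form $N_1 N_2 \alpha(N_1) \alpha(N_2)$ for tesserae and the action of $V$ given in the comments.

```python
from collections import Counter

BASES = "ACGU"
# Assumed action of V on single bases.
V = {
    "I":  {"A": "A", "C": "C", "G": "G", "U": "U"},
    "SW": {"A": "U", "U": "A", "C": "G", "G": "C"},
    "YR": {"A": "G", "G": "A", "C": "U", "U": "C"},
    "KM": {"A": "C", "C": "A", "G": "U", "U": "G"},
}
TESS = {n1 + n2 + p[n1] + p[n2]
        for p in V.values() for n1 in BASES for n2 in BASES}

def shift(word, k):
    """Cyclic shift alpha_k of a word."""
    return word[k:] + word[:k]

# Every shift of a tessera is again a tessera, so the classes partition TESS.
classes = {frozenset(shift(t, k) for k in range(4)) for t in TESS}
sizes = Counter(len(c) for c in classes)
print(dict(sizes))
```

Only the classes of size 4 can contribute a tessera to a 1-circular code, and each of them can contribute at most one.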
Definition 3.2 A circular tessera code is called maximal if it contains exactly 12 elements.
We will show in Sect. 4 how to construct all maximal circular tessera codes and now give an example of a 1-circular tessera code that is not 2-circular. Example 3.3 Let $X = \{ACGU, CAUG, GUCA, UGAC\}$. Then $X$ is a 1-circular tessera code, but the word ACGUCAUG has two decompositions on a circle, namely $ACGU|CAUG$ and $GUCA|UGAC = \alpha_2(ACGUCAUG)$.
Thus $X$ is not 2-circular. In particular, the graph component $C_2(X)$ of the representing graph $G(X)$ of $X$ contains a cycle.
Moreover, the example below shows that the classes of 2- and 3-circular tessera codes are also different: Example 3.4 Let $X = \{CAGU, UGCA, GUUG\}$. Then $X$ is a 2-circular (by means of easy computations) but not a 3-circular tessera code, since the word CAGUUGCAGUUG has two decompositions on a circle:

$CAGU|UGCA|GUUG$ and $GUUG|CAGU|UGCA$.
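Both examples can be checked by brute force directly from the shift criterion for $n$-circularity. The sketch below enumerates all concatenations of at most $n$ tesserae and tests whether a shift by 1, 2 or 3 positions again splits into tesserae of the code; `in_power` and `is_n_circular` are hypothetical names, and the approach is only practical for small codes and small $n$.

```python
from itertools import product

def in_power(word, code):
    """True if the word splits into consecutive blocks of 4 letters from the code."""
    return all(word[i:i + 4] in code for i in range(0, len(word), 4))

def is_n_circular(code, n):
    """Brute-force test: no shifted concatenation of m <= n tesserae lies in X^m."""
    for m in range(1, n + 1):
        for tesserae in product(sorted(code), repeat=m):
            word = "".join(tesserae)
            if any(in_power(word[i:] + word[:i], code) for i in (1, 2, 3)):
                return False
    return True

X33 = {"ACGU", "CAUG", "GUCA", "UGAC"}   # Example 3.3
X34 = {"CAGU", "UGCA", "GUUG"}           # Example 3.4

print(is_n_circular(X33, 1), is_n_circular(X33, 2))
print(is_n_circular(X34, 2), is_n_circular(X34, 3))
```

The first code already fails for words of two tesserae, while the second one only fails once three tesserae are concatenated.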
We show next that the graph component $C_2(X)$ failing to be acyclic is not an accident; in fact, it is the only possibility for 1-circular codes not to be circular. In order to do so, recall that a cycle in a graph $G$ is a sequence $e_1 \to \cdots \to e_n \to e_1$ of distinct vertices $e_i$ ($i \le n$) in $G$. The length of this cycle is then defined to be $n$. Note that a cycle of length $n = 1$ is a loop.
Proposition 3.5 Let $X$ be a tessera code. Then the following hold: (i) The maximal length of a cycle in $C_1(X)$ is 2; in particular, the maximal length of a path that does not contain a cycle is 1; (ii) The maximal length of a cycle in $C_2(X)$ is 4; in particular, the maximal length of a path that does not contain a cycle is 3.
Proof Let $X$ be a tessera code. We first prove (i) by showing that any path in $C_1(X)$ of length 2 must contain a cycle. Hence assume that $C_1(X)$ contains a path of length 2. Without loss of generality we may assume that it starts with a nucleotide, e.g. $N_1 \to N_2N_3N_4 \to N_5$.
Then $N_1N_2N_3N_4$ and $N_2N_3N_4N_5$ are valid tesserae from $X$. By definition of tesserae the former tells us that there is a transformation $\alpha \in V$ such that $\alpha(N_2) = N_4$ and $\alpha(N_3) = N_1$. The latter, however, then implies that also $\alpha(N_3) = N_5$ and so $N_1 = N_5$, which shows that $\alpha_1(N_1N_2N_3N_4) = N_2N_3N_4N_1 \in X$ and that the path $N_1 \to N_2N_3N_4 \to N_1$ is a cycle of length 2. We now prove (ii) by showing that any path of length 4 in $C_2(X)$ contains a cycle. Assume that $C_2(X)$ contains a path of length 4, e.g. $N_1N_2 \to N_3N_4 \to N_5N_6 \to N_7N_8 \to N_9N_{10}$.
Consequently, the path itself is a cycle of length 4. As a corollary we obtain an important theorem. Note that part (ii) was also obtained in a bachelor-thesis (Cisowski 2015) with a much more technical proof.

Theorem 3.6 Let X be a tessera code. Then the following hold:
(i) If $X$ is 1-circular, then $C_1(X)$ is acyclic; (ii) The following two conditions are equivalent: (a) $X$ is circular; (b) $X$ is 3-circular.

Proof
We first prove (i). By Proposition 3.5 we know that the maximal length of a cycle in $C_1(X)$ is 2, hence a cycle would be of the form $N_1 \to N_2N_3N_4 \to N_1$, which contradicts 1-circularity since $\alpha_1(N_1N_2N_3N_4) = N_2N_3N_4N_1 \in X$. In order to prove (ii), note that by Proposition 3.5 the maximal length of a cycle in $G(X)$ is 4. However, a cycle of even length 2 is excluded by 1-circularity and a cycle of length 4, say $N_1N_2 \to N_3N_4 \to N_5N_6 \to N_7N_8 \to N_1N_2$, by 2-circularity, since the word

$N_1N_2N_3N_4N_5N_6N_7N_8$ has two decompositions on a circle, namely $N_1N_2N_3N_4|N_5N_6N_7N_8$ and $N_3N_4N_5N_6|N_7N_8N_1N_2$, a contradiction. Hence $G(X)$ does not contain any cycle of even length and the maximal length of an odd cycle is 3. By Theorem 2.3 from [13] we conclude that $X$ is circular if and only if it is 3-circular.
We conclude this section with a result that gives a handy criterion for constructing circular tessera codes and some application.

Theorem 3.7 Let $X \subseteq TESS$ be a tessera code. Then $X$ is circular if
• $X$ is 1-circular, and
• one of the di-cut graphs $T_{1,3}(X)$ and $T_{2,4}(X)$ is acyclic.
Proof Assume that $X$ is 1-circular and one of the di-cut graphs $T_{1,3}(X)$ and $T_{2,4}(X)$ is acyclic. Without loss of generality we assume that $T_{1,3}(X)$ is acyclic. Assume that $X$ is not circular. Then Proposition 3.5 and Theorem 3.6 imply that the component $C_1(X)$ is acyclic and the maximal length of a cycle in $C_2(X)$ is 4. Assume without loss of generality that $N_1N_2 \to N_3N_4 \to N_5N_6 \to N_7N_8 \to N_1N_2$ is a cycle in $G(X)$. Thus the tesserae $N_1N_2N_3N_4$, $N_3N_4N_5N_6$, $N_5N_6N_7N_8$ and $N_7N_8N_1N_2$ are in $X$. By definition of $T_{1,3}(X)$ it follows that $N_1N_3$, $N_3N_5$, $N_5N_7$ and $N_7N_1$ are dinucleotides in the set $X_{1,3}$ and hence $N_1$, $N_3$, $N_5$ and $N_7$ are vertices of $T_{1,3}(X)$. Moreover, $N_1 \to N_3 \to N_5 \to N_7 \to N_1$ is a cycle in $T_{1,3}(X)$, a contradiction to the fact that $T_{1,3}(X)$ is acyclic.
The converse of Theorem 3.7 does not hold, as the following example shows. Note, however, that the code $X_{1,3}$ (respectively $X_{2,4}$) can never contain dinucleotides of the form $NN$ since they would imply that there is a tessera of the form $NKNK$ in $X$, which contradicts 1-circularity.

Example 3.8 If $X = \{AGUC, GAAG, CAAC, GGCC, AGCU, UGCA, GUAC, UUAA, CGAU, GACU, CUUC, GUUG\}$,
then $X$ is a maximal circular tessera code but neither $T_{1,3}(X)$ nor $T_{2,4}(X)$ is acyclic.
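The claims about this example can be verified computationally. The sketch below assumes that the representing graph has an arc from every proper prefix of a word to the corresponding suffix and that the di-cut sets collect the letters at positions 1, 3 and 2, 4; the helper names are hypothetical.

```python
def arcs_of(code):
    """Arcs prefix -> suffix for every split position of every word."""
    return {(w[:j], w[j:]) for w in code for j in range(1, len(w))}

def is_acyclic(arcs):
    """Depth-first search for a directed cycle; True if none exists."""
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    state = {}  # 1 = on the current search path, 2 = fully explored

    def visit(u):
        state[u] = 1
        for v in succ[u]:
            if state.get(v) == 1 or (v not in state and not visit(v)):
                return False
        state[u] = 2
        return True

    return all(visit(u) for u in succ if u not in state)

def dicut(code, i, j):
    """Dinucleotides formed by the letters at (0-based) positions i and j."""
    return {t[i] + t[j] for t in code}

X = {"AGUC", "GAAG", "CAAC", "GGCC", "AGCU", "UGCA",
     "GUAC", "UUAA", "CGAU", "GACU", "CUUC", "GUUG"}

circular = is_acyclic(arcs_of(X))
t13_acyclic = is_acyclic(arcs_of(dicut(X, 0, 2)))
t24_acyclic = is_acyclic(arcs_of(dicut(X, 1, 3)))
print(len(X), circular, t13_acyclic, t24_acyclic)
```

For instance, the set of letters at positions 1 and 3 contains both AU and UA, which already produces a cycle in the corresponding di-cut graph.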
We now state an application of the above results in order to construct maximal circular tessera codes from circular dinucleotide codes. In fact, the constructed codes will even have stronger properties: Definition 3.9 A circular tessera code $X \subseteq TESS$ is called a $C^4$-code if the three shifted codes $\alpha_1(X)$, $\alpha_2(X)$ and $\alpha_3(X)$ are also circular.

is any linear ordering of the genetic alphabet B.
Proposition 3.10 Let D = {N 1 N 2 , N 1 N 3 , N 1 N 4 , N 2 N 3 , N 2 N 4 , N 3 N 4 } be a maximal circular dinucleotide code. Then Proof We first prove circularity of the code X . Clearly, T 1,3 (X ) = G(D). Since D is circular its graph G(D) is acyclic by Theorem 2.4 and thus we only need to verify that X is 1-circular by Theorem 3.7. But this is clear since the code contains exactly one tessera from each of the twelve complete equivalence classes from Table 5. Now let X (n) be the nth shift of X for n ≤ 3. Then we have N 1 N 3 , N 1 N 4 , N 3 N 4 , N 4 N 3 , N 2 N 3 , N 4 N 2 , N 3 N 1,3 = {N 2 N 1 , N 3 N 1 , N 4 N 1 , N 4 N 3 , N 3 N 4 , N 3 N 2 , N 2 N 4 , N 4 N 2 , N 4 N Clearly, X (2) 1,3 is a dinucleotide circular code since it is equal to ← − D , hence its representing graph G(X (2) 1,3 ) = T 1,3 (X (2) ) is acyclic and as above X (2) is 1-circular. By Theorem 3.7 we conclude that X (2) is a circular code. It remains to show that also X (1) and X (3) are circular. However, in this case which is circular and so Theorem 3.7 implies that also X (1) and also X (3) are circular. Hence X is a C 4 -code.
We would like to remark that the construction in the above proposition has some flexibility, e.g. the tesserae of the form $N_iN_iN_jN_j$ can be substituted by tesserae from the same equivalence class. However, it is not obvious how to construct all maximal circular tessera codes using this method. Nevertheless, in the next section we will give a way to obtain all such codes.

Construction of All Maximal Circular Tessera Codes
This section introduces one possibility to construct all maximal circular tessera codes. Recall that a circular tessera code is maximal if it contains exactly 12 elements. The construction will be accomplished in two major steps. Firstly, for each of the four equivalence classes from Table 1 we define a tournament on four vertices which are representing the single dinucleotides. Finally, we combine the four tournaments constructed in the previous step to construct maximal circular tessera codes. Recall that a tournament is a complete oriented graph (see e.g. Clark and Holton 1991). Figure 7 shows an example of a tournament.
As already proved in Theorem 3.6, either the graph component $C_1(X)$ associated to a tessera code $X$ has no path of length greater than 1, or $X$ is not circular. More precisely, if $C_1(X)$ is not acyclic, the code $X$ is not even 1-circular. Considering that, the construction of a maximal circular tessera code can almost be reduced to the problem of constructing a valid and acyclic $C_2$ which represents a correct tessera code $X$.
Step 1: In this step we construct four acyclic tournaments which together represent a tessera code $X$ of size 24 such that $C_2(X)$ is acyclic. Note that a tournament on 4 vertices has exactly 6 edges and, in order to be acyclic, has to be isomorphic to the tournament given in Fig. 7. Below we will show how to construct tournaments on four vertices that represent a correct (circular) tessera code, i.e. the tournaments will be acyclic. Together they form the desired code $X$ as
$$X = X_I \cup X_{SW} \cup X_{YR} \cup X_{KM} \qquad (1)$$
with
$$|X_I| = |X_{SW}| = |X_{YR}| = |X_{KM}| = 6 \quad \text{and, thus,} \quad |X| = 24. \qquad (2)$$
As can be seen from the construction, $C_2(X)$ is acyclic as it is the union of acyclic tournaments, while $C_1(X)$ is not. Yet, for this initial step we can ignore this fact. Since $C_2(X_I)$, $C_2(X_{SW})$, $C_2(X_{YR})$ and $C_2(X_{KM})$ are disjoint, it is sufficient that these subgraphs are acyclic to ensure the acyclicity of $C_2(X)$. As mentioned above, each of these subgraphs has to be isomorphic to the graph in Fig. 7.
Let us choose one of the equivalence classes $i \in \{I, SW, YR, KM\}$ and assign the numbers 1, 2, 3, 4 to the dinucleotides of $i$. Now we draw directed edges from each node to the nodes with a higher number. In this way we obtain four acyclic tournaments, each of which represents a circular tessera code of size 6. This gives $4!$ possible assignments per subgraph. Hence, there are altogether $(4!)^4 = 331776$ tessera codes of size 24 with an acyclic $C_2$-component.
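Step 1 can be sketched in code as follows. The four classes of dinucleotides are written out explicitly here as the orbits of $V$ (an assumption, since Table 1 is not reproduced in this section), and the ordering of each list plays the role of the numbering 1, 2, 3, 4; `tournament_code` is a hypothetical name.

```python
from itertools import permutations
from math import factorial

# Assumed content of the four equivalence classes of dinucleotides.
CLASSES = {
    "I":  ["AA", "CC", "GG", "UU"],
    "SW": ["AU", "UA", "CG", "GC"],
    "YR": ["AG", "GA", "CU", "UC"],
    "KM": ["AC", "CA", "GU", "UG"],
}

def tournament_code(order):
    """Tesserae d d' for every arc d -> d' from a lower to a higher position."""
    return {order[i] + order[j] for i in range(4) for j in range(i + 1, 4)}

# One code of size 24, using the listed order in every class.
X = set().union(*(tournament_code(order) for order in CLASSES.values()))

# Distinct orderings of one class give distinct acyclic tournaments.
per_class = len({frozenset(tournament_code(p))
                 for p in permutations(CLASSES["KM"])})

print(len(X), per_class, factorial(4) ** 4)
```

Because every arc points from a lower to a higher position in the chosen ordering, the component with labels of length 2 is acyclic by construction.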
Step 2: In this step, we use the 331776 tessera codes constructed in Step 1 to construct all possible maximal circular tessera codes. Since $C_2$ is already acyclic, it is sufficient to focus on $C_1$.

Lemma 4.1 Let $X$ be a tessera code constructed as above and let
$t = N_1N_2N_3N_4 = N_1N_2\,\gamma(N_1)\,\gamma(N_2) \in X$ for some $\gamma \in V$. Then the following hold: (1) $\alpha_2(t) \notin X$; (2) $\alpha_1(t) \in X$ or $\alpha_3(t) \in X$. Proof First we prove (1). Obviously, $t$ is represented by the arrow $N_1N_2 \to N_3N_4$ in the corresponding tournament, and $\gamma \ne \mathrm{id}$ since a tournament has no loops. Let us consider $\alpha_2(t) = N_3N_4N_1N_2$. It follows that $\alpha_2(t) \notin X$ since it would be represented in the same tournament by the oppositely directed arrow $N_3N_4 \to N_1N_2$, a contradiction. Now we claim that one of the remaining shifts of $t$, $\alpha_3(t) = N_4N_1N_2N_3$ or $\alpha_1(t) = N_2N_3N_4N_1$, is necessarily in the code $X$. Let us first assert that the dinucleotides $N_4N_1$ and $N_2N_3$ cannot be in the same equivalence class as $N_1N_2$ and $N_3N_4$, since in this case $N_4 = N_2$ and, thus, $\gamma = \mathrm{id}$. Consequently, one of the arrows $N_4N_1 \to N_2N_3$ or $N_2N_3 \to N_4N_1$ is drawn in the corresponding tournament and it follows that $\alpha_3(t) \in X$ or $\alpha_1(t) \in X$. This proves (2).
The above lemma shows that $X$ consists of 12 pairs of cyclically equivalent tesserae. To ensure that the codes are circular, one tessera of each cyclically equivalent pair must be removed. This has to be done for all 12 cyclically equivalent pairs in such a code $X$. It follows that each of the 331776 codes can be used to construct $2^{12}$ circular codes, with possible repetitions. It remains to prove that all maximal circular tessera codes can be obtained this way. Let $X$ be such a maximal code. As shown above, the $C_2$ component of each $X_i$, $i \in \{I, KM, SW, YR\}$, is a simple directed acyclic graph with a maximum of four nodes. According to Theorem 3.1 (Fimmel et al. 2017), such a graph can be embedded in an acyclic tournament. In Step 1, all possible acyclic tournaments are constructed.
Step 2 takes all possible subgraphs of each tournament and combines those. This ensures that all possible maximal circular tessera codes are represented in the construction.
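The two steps can be combined into a small experiment for a single choice of orderings. The sketch below builds one code of size 24 as in Step 1, groups its tesserae into cyclically equivalent pairs, and checks for all $2^{12}$ selections that the resulting code of size 12 has an acyclic representing graph. The class lists and helper names are assumptions made for illustration.

```python
from itertools import product

# Assumed content of the four equivalence classes of dinucleotides.
CLASSES = {
    "I":  ["AA", "CC", "GG", "UU"],
    "SW": ["AU", "UA", "CG", "GC"],
    "YR": ["AG", "GA", "CU", "UC"],
    "KM": ["AC", "CA", "GU", "UG"],
}

def tournament_code(order):
    """Tesserae d d' for every arc d -> d' from a lower to a higher position."""
    return {order[i] + order[j] for i in range(4) for j in range(i + 1, 4)}

def shift(word, k):
    return word[k:] + word[:k]

def arcs_of(code):
    return {(w[:j], w[j:]) for w in code for j in range(1, len(w))}

def is_acyclic(arcs):
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    state = {}  # 1 = on the current search path, 2 = fully explored

    def visit(u):
        state[u] = 1
        for v in succ[u]:
            if state.get(v) == 1 or (v not in state and not visit(v)):
                return False
        state[u] = 2
        return True

    return all(visit(u) for u in succ if u not in state)

# Step 1: one code of size 24 (listed order in every class).
X24 = set().union(*(tournament_code(o) for o in CLASSES.values()))

# Group the tesserae into classes of cyclically equivalent words.
groups = {}
for t in X24:
    groups.setdefault(min(shift(t, k) for k in range(4)), []).append(t)
pairs = [sorted(g) for g in groups.values()]

# Step 2: keep exactly one tessera of every pair, in all possible ways.
codes = [set(choice) for choice in product(*pairs)]
all_circular = all(is_acyclic(arcs_of(c)) for c in codes)
print(len(pairs), len(codes), all_circular)
```

Different orderings in Step 1 can lead to the same final code, which is why the text speaks of possible repetitions.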
The table below gives the exact numbers of circular codes and even $C^4$-codes (compare Definition 3.9) for all cardinalities from 1 to the maximum 12. Moreover, it also shows the number of comma-free codes. Recall that comma-free codes form a subclass of circular codes.
Definition 4.2 A code $X \subseteq B^\ell$ is called comma-free if any concatenation $x_1x_2$ of words $x_1, x_2 \in X$ does not contain any $x \in X$ as a substring except for $x_1$ (as initial segment) and $x_2$ (as end segment) themselves.
Clearly, a comma-free code is circular, and $X$ is comma-free if and only if its associated graph has no path of length more than 2 (Table 4).
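The comma-free property can be tested directly from Definition 4.2. The following sketch is written for tessera codes (words of length 4); the function name and the two small example codes are illustrative choices and do not come from the tables of this paper.

```python
from itertools import product

def is_comma_free(code):
    """No word of the code occurs strictly inside a concatenation x1 x2."""
    for x1, x2 in product(code, repeat=2):
        word = x1 + x2
        # Positions 1, 2, 3 are the reading frames that straddle x1 and x2.
        if any(word[i:i + 4] in code for i in (1, 2, 3)):
            return False
    return True

print(is_comma_free({"ACGU", "GUCA"}))
print(is_comma_free({"GAAG", "AGCU", "CUUC"}))
```

In the second code the concatenation GAAG CUUC contains AGCU in a shifted frame, which corresponds to the path GA, AG, CU, UC of length 3 among the labels of length 2.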

Self-Complementary Circular Tessera Codes
In this section we will discuss some properties of self-complementary tessera codes. In particular, we will determine all maximal self-complementary comma-free tessera codes and give a graph-theoretical characterization of self-complementarity for tessera codes.
Let us first recall the definition of self-complementarity of a code.

Definition 4.3
Let $X \subseteq B^\ell$ be an $\ell$-nucleotide code. We will call $X$ self-complementary if for each $\ell$-nucleotide $x \in X$ its anti-$\ell$-nucleotide $\overleftarrow{c(x)}$ also belongs to $X$. According to the above, a circular tessera code can contain a maximum of 12 tesserae. Such a code can even be self-complementary, as the next example shows. The next lemma gives the exact number of self-complementary 1-circular tessera codes.
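Self-complementarity is easy to test mechanically. The sketch below assumes that $c$ is the complement map $A \leftrightarrow U$, $C \leftrightarrow G$ and that tesserae have the form $N_1 N_2 \alpha(N_1) \alpha(N_2)$; the names `anti` and `is_self_complementary` are hypothetical.

```python
BASES = "ACGU"
# Assumed action of V on single bases (SW is the complement map c).
V = {
    "I":  {"A": "A", "C": "C", "G": "G", "U": "U"},
    "SW": {"A": "U", "U": "A", "C": "G", "G": "C"},
    "YR": {"A": "G", "G": "A", "C": "U", "U": "C"},
    "KM": {"A": "C", "C": "A", "G": "U", "U": "G"},
}
TESS = {n1 + n2 + p[n1] + p[n2]
        for p in V.values() for n1 in BASES for n2 in BASES}

def anti(word):
    """Anti-nucleotide: complement every base, then reverse the word."""
    return "".join(V["SW"][b] for b in word)[::-1]

def is_self_complementary(code):
    """True if the code contains the anti-nucleotide of each of its words."""
    return all(anti(x) in code for x in code)

# Tesserae that coincide with their own anti-tessera.
self_comp = sorted(t for t in TESS if anti(t) == t)

print(anti("AGUC"), is_self_complementary({"AGUC", "GACU"}))
print(len(self_comp), self_comp[:4])
```

A code is self-complementary exactly when it is a union of such fixed tesserae and of tessera/anti-tessera pairs.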

Lemma 4.5 The maximal size of a self-complementary 1-circular tessera code is 12 and the number of them is 4096.
Proof Firstly, Example 4.4 shows that there are self-complementary circular codes of size 12, which is maximal. Secondly, in order to calculate the exact number of self-complementary 1-circular codes, we first ascertain that for 6 conjugacy classes, the respective antitessera of a tessera from that class is found in another conjugacy class: the antitesserae of tesserae from class $D_2$ are all in class $D_5$, from class $D_3$ in class $D_6$ and from class $D_{10}$ in class $D_{12}$ and, of course, vice versa. Thus, we have $4^3$ possibilities to choose 6 tesserae from these conjugacy classes for a 1-circular self-complementary tessera code. As for the classes $D_1$, $D_4$, $D_7$, $D_8$, $D_9$, $D_{11}$, only the self-complementary tesserae can be chosen from these, since the other two form tessera-antitessera pairs and are cyclically equivalent. So we have a further $2^6$ possibilities for this. Altogether we have $2^6 \cdot 4^3 = 4096$ maximal self-complementary 1-circular codes.
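The counting argument of the proof can be retraced computationally. The sketch below groups the tesserae into classes of cyclic shifts, determines for each complete class the class of the anti-tesserae, and multiplies the number of admissible choices. It assumes the complement map $A \leftrightarrow U$, $C \leftrightarrow G$ and the form $N_1 N_2 \alpha(N_1) \alpha(N_2)$ for tesserae; the class numbering $D_1, \ldots, D_{12}$ of the paper is not reproduced.

```python
BASES = "ACGU"
# Assumed action of V on single bases (SW is the complement map).
V = {
    "I":  {"A": "A", "C": "C", "G": "G", "U": "U"},
    "SW": {"A": "U", "U": "A", "C": "G", "G": "C"},
    "YR": {"A": "G", "G": "A", "C": "U", "U": "C"},
    "KM": {"A": "C", "C": "A", "G": "U", "U": "G"},
}
TESS = {n1 + n2 + p[n1] + p[n2]
        for p in V.values() for n1 in BASES for n2 in BASES}

def shift(word, k):
    return word[k:] + word[:k]

def anti(word):
    return "".join(V["SW"][b] for b in word)[::-1]

classes = {frozenset(shift(t, k) for k in range(4)) for t in TESS}
complete = [c for c in classes if len(c) == 4]

count, self_paired, seen = 1, 0, set()
for c in complete:
    if c in seen:
        continue
    partner = frozenset(anti(t) for t in c)
    seen.update({c, partner})
    if partner == c:
        # Only a tessera equal to its own anti-tessera can be chosen.
        self_paired += 1
        count *= sum(anti(t) == t for t in c)
    else:
        # Any tessera of c can be chosen; its anti-tessera is then forced.
        count *= len(c)

print(len(complete), self_paired, count)
```

The product has six factors of 2 and three factors of 4, in line with the count stated in Lemma 4.5.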
The following example shows that not every 1-circular self-complementary tessera code is also circular (not even 2-circular). The self-complementary comma-free tessera codes of maximal length are listed in Table 6.
We now aim for a graph-theoretical characterization of self-complementarity for tessera codes. Let us start with some observations on self-complementary 1-circular tessera codes: where d 1 , d 2 ∈ SW . However, cyclically equivalent tesserae cannot be in the same 1-circular code.
The next property was discovered by examining maximal circular codes of codons (RNA triplets). Assume that $Y \subset B^3$ is a self-complementary trinucleotide code and $G(Y) = (V, E)$ the graph associated to $Y$. Then the following conditions are true: (1) $\overleftarrow{c(v)} \in V$ for all $v \in V$; (2) $d^+(v) = d^-(\overleftarrow{c(v)})$ for all $v \in V$, where $d^+(v)$ of a vertex $v$ denotes the number of outgoing edges (directed edges that start in $v$) and $d^-(v)$ denotes the number of ingoing edges, respectively. It was also shown that the conditions from above are not sufficient in general to ensure self-complementarity but only for circular codes of size at least 18. We will show next that in the case of tesserae or dinucleotides, the size of the code does not matter and that one can obtain a similar result. Let us first prove the claim for dinucleotides: Lemma 4.9 Let $X \subseteq B^2$ be a 1-circular dinucleotide code and $G(X) = (V, E)$ its associated graph. $X$ is self-complementary if and only if (1) $\overleftarrow{c(v)} \in V$ for all $v \in V$ and (2) $d^+(v) = d^-(\overleftarrow{c(v)})$ for all $v \in V$. Proof Let $X$ be a self-complementary dinucleotide code and $l_1l_2 \in X$ for some $l_1, l_2 \in B$. Due to self-complementarity of $X$ we have $c(l_2)c(l_1) \in X$, which implies both conditions (1) and (2). Conversely, assume that $X$ is a 1-circular code. Then its associated graph $G(X)$ can be embedded into a tournament on the four vertices $A, C, G, U \in B$ (compare Fimmel et al. 2017). Assume that $G(X)$ satisfies the conditions (1) and (2). The presence or absence of the self-complementary dinucleotides AU, UA, CG or GC in $X$ does not affect either the self-complementarity of $X$ or the conditions (1) and (2). Let us therefore focus on the non-self-complementary dinucleotides from $X$. Suppose without loss of generality that the dinucleotide AC, i.e. the arc $A \to C$, is in the code. For conditions (1) and (2) to be met, a dinucleotide $N_1U$ and a dinucleotide $GN_2$ must be in the code.
This can be achieved in three ways: (2) can now only be met if the dinucleotide $AG \in X$ and the code is self-complementary, or
• $N_1 = C$, $N_2 = A$. The condition (2) can now only be met if the dinucleotide $UG \in X$ and the code is self-complementary.
This proves that $X$ is self-complementary.
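The relation between self-complementarity and vertex degrees used in Lemma 4.9 can be illustrated with a minimal Python sketch. It is not the authors' procedure: the degree condition is written here as $d^+(v) = d^-(c(v))$, which is an assumed reading of the conditions (1) and (2), and the two example codes as well as all function names are hypothetical choices made for illustration.

```python
# Watson-Crick complement map c on the RNA alphabet B = {A, C, G, U}.
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reversed_complement(word):
    """Return the reversed complement of a word, e.g. AC -> GU."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def is_self_complementary(code):
    """A code is self-complementary if it contains the reversed
    complement of each of its words."""
    return all(reversed_complement(word) in code for word in code)


def degrees(code):
    """Out- and in-degree of every nucleotide in the graph that has
    one edge l1 -> l2 for each dinucleotide l1l2 of the code."""
    out_deg = {base: 0 for base in COMPLEMENT}
    in_deg = {base: 0 for base in COMPLEMENT}
    for first, second in code:
        out_deg[first] += 1
        in_deg[second] += 1
    return out_deg, in_deg


def degree_symmetry_holds(code):
    """Check d+(v) == d-(c(v)) for every nucleotide v.
    Assumption: this is how the degree condition is read here."""
    out_deg, in_deg = degrees(code)
    return all(out_deg[v] == in_deg[COMPLEMENT[v]] for v in COMPLEMENT)


# Hypothetical example codes, chosen for illustration only.
symmetric_code = {"AC", "GU", "AG", "CU"}   # closed under reversed complement
asymmetric_code = {"AC", "AG"}              # GU and CU are missing

for code in (symmetric_code, asymmetric_code):
    print(sorted(code), is_self_complementary(code), degree_symmetry_holds(code))
```

In both examples the direct test of self-complementarity and the degree test agree, which is the behaviour the lemma describes for 1-circular dinucleotide codes.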
In the case of tesserae we additionally have to consider the condition from Lemma 4.8, and we obtain a handy characterization of self-complementarity.
Theorem 4.10 Let $X \subseteq TESS$ be a 1-circular tessera code,
Proof One implication is analogous to the proof of Proposition 3.1 in  considering Lemma 4.8. Conversely, assume that $X \subseteq TESS$ is a 1-circular tessera code that satisfies all three conditions (1) and for each non-self-complementary tessera $T = d_1 d_2 \in X_i$, where $i \in \{I, SW, YR, KM\}$, its anti-tessera should be in the same component $X_i$ due to the fact that The rest of the proof can now be done analogously to the proof of Lemma 4.9.
In the theorem above, the condition of 1-circularity cannot be omitted, as the following example shows:
Example 4.11 Let us consider the following tessera code:
The code is obviously not 1-circular, and it is not self-complementary since, for instance, $\overleftarrow{c(AGAG)} = CUCU \notin X$. But all three conditions from Theorem 4.10 are fulfilled. In the picture below, the round and square nodes represent pairs of reversed-complementary dinucleotides.
We conclude this section with a second theorem that gives a graph-theoretical characterization for tessera codes that are not 1-circular, using the graph component $C_1(X)$ of a code $X$.
Theorem 4.12 Let $X \subseteq TESS$ be a tessera code, $C_1(X) = (V_1, E_1)$. $X$ is self-complementary if and only if
Proof Let us assume that $X \subseteq TESS$ satisfies properties (1) and (2) from Theorem 4.12. Hence, for any tessera $N_1 N_2 N_3 N_4 \in X$ we have that $N_2 N_3 N_4 \in V_1$ and, by property (1), also $c(N_4 N_3 N_2) \in V_1$. Property (2) then implies that $c(N_4 N_3 N_2) N_5 \in X$ for some base $N_5$. It is clear that $N_5$ has to be the complement of $N_1$ by the unique definition of tesserae. More precisely, let $\pi \in V$ be such that $N_2 = \pi(N_4)$, which implies that $c(N_2) = \pi(c(N_4))$ and thus $c(N_3) = \pi(N_5)$. Hence $N_5 = c(N_1)$. Therefore $c(N_4 N_3 N_2) N_5 = \overleftarrow{c(N_1 N_2 N_3 N_4)} \in X$ and $X$ is self-complementary.
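The step "$N_5$ has to be the complement of $N_1$" can be checked by brute force. The following Python sketch rests on two assumptions that are not spelled out in this section: the Klein four group $V = \{I, SW, YR, KM\}$ acts on the bases as given in the dictionary below, and a tessera is a tetranucleotide $d_1 d_2$ with $d_2 = \pi(d_1)$ for some $\pi \in V$ applied letter by letter. The helper names are hypothetical.

```python
from itertools import product

BASES = "ACGU"

# Assumed action of the Klein four group V = {I, SW, YR, KM} on the bases.
KLEIN_FOUR = {
    "I": {"A": "A", "C": "C", "G": "G", "U": "U"},
    "SW": {"A": "U", "U": "A", "C": "G", "G": "C"},  # complement map c
    "YR": {"A": "G", "G": "A", "C": "U", "U": "C"},
    "KM": {"A": "C", "C": "A", "G": "U", "U": "G"},
}


def transform(name, word):
    """Apply the transformation `name` to every letter of `word`."""
    return "".join(KLEIN_FOUR[name][base] for base in word)


def all_tesserae():
    """Assumed definition: d1 d2 with d2 = pi(d1) for some pi in V."""
    dinucleotides = ("".join(pair) for pair in product(BASES, repeat=2))
    return {d1 + transform(name, d1) for d1 in dinucleotides for name in KLEIN_FOUR}


def reversed_complement(word):
    """Reverse the word and complement every base."""
    return transform("SW", word[::-1])


TESSERAE = all_tesserae()


def completions(prefix):
    """All bases N such that the three-letter prefix followed by N is a tessera."""
    return [base for base in BASES if prefix + base in TESSERAE]


# For a tessera N1N2N3N4, the prefix c(N4 N3 N2) has exactly one completion,
# and that completion is c(N1), so the completed word is the reversed complement.
for tessera in sorted(TESSERAE):
    prefix = reversed_complement(tessera[1:])
    assert completions(prefix) == [KLEIN_FOUR["SW"][tessera[0]]]
    assert prefix + completions(prefix)[0] == reversed_complement(tessera)

print(len(TESSERAE), reversed_complement("AGAG"))
```

Under these assumptions every three-letter prefix extends to a tessera in exactly one way, which is the uniqueness the proof relies on.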
Let us make a final remark: A 1-circular tessera code $X$ represented by a tournament which is built on four dinucleotides of one of the equivalence classes (see Table 1) is self-complementary if and only if the numbers 1, 2, 3, 4 (see paragraph Construct a Tournament) are assigned to the dinucleotides so that 1 is complementary to 4 and 2 is complementary to 3, i.e. $d_1 = \overleftarrow{c(d_4)}$, $d_2 = \overleftarrow{c(d_3)}$. In order to see this, let the order on the dinucleotides be defined as described above, let $d_i d_j \in X$ with $i < j$, $i, j \in \{1, 2, 3, 4\}$, and let $d_k d_l = \overleftarrow{c(d_i d_j)}$. If $i = 1$ or $j = 4$, then it is obvious that $k < l$, since $k = 1$ or $l = 4$, and $d_k d_l \in X$. The only remaining case is $i = 2$, $j = 3$. But in this case $k = 2$, $l = 3$ holds by definition of the order on the dinucleotides, and $d_2 d_3 \in X$. The opposite direction: Let $d_1 = \overleftarrow{c(d_2)}$ and, correspondingly, is analogous. In both cases $X$ is not a self-complementary code. Here is an example. This shows that in the construction of all maximal circular tessera codes one can also identify and construct all maximal self-complementary circular codes.
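The effect of the numbering can be seen on a small example. The Python sketch below makes two assumptions that go beyond the text above: the tournament is read as contributing the tesserae $d_i d_j$ for all $i < j$, and the dinucleotides AC, UG, CA, GU are taken as one equivalence class (Table 1 is not reproduced here). The function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reversed_complement(word):
    """Reverse the word and complement every base."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def tournament_code(ordered_dinucleotides):
    """Tesserae d_i d_j for all i < j.
    Assumption: this is how the tournament construction is read here."""
    d = ordered_dinucleotides
    return {d[i] + d[j] for i in range(len(d)) for j in range(i + 1, len(d))}


def is_self_complementary(code):
    """True if the code contains the reversed complement of each word."""
    return all(reversed_complement(word) in code for word in code)


# Numbering with d1 = rc(d4) and d2 = rc(d3).
good_order = ["AC", "UG", "CA", "GU"]
# Numbering with d1 = rc(d2) and d3 = rc(d4).
bad_order = ["AC", "GU", "UG", "CA"]

for order in (good_order, bad_order):
    code = tournament_code(order)
    print(order, sorted(code), is_self_complementary(code))
```

With the first numbering the six tesserae are closed under the reversed complement. With the second numbering the tessera ACUG is in the code but its reversed complement CAGU is not.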

Conclusions
In this work we have identified and characterized circular tessera codes and their properties. In Gonzalez et al. (2012) and Gonzalez et al. (2019), Gonzalez, Giannerini and Rosa proposed an ancestor code of the universal genetic code that is based on 64 tetranucleotides built from dinucleotides by using the Klein four symmetry group. It was hypothesized that this tessera code existed before LUCA and even before the early genetic code that coded for 20 amino acids using all 64 codons. Possible primeval adaptor molecules that could decode the tesserae were also modelled, and it was shown that the tessera code mirrors exactly the degeneracy distribution of the mitochondrial genetic code.
We have combined the theory of tesserae with the theory of circular codes, which has been studied extensively during the last decades. Circular codes were found by an extensive statistical investigation in Arqués and Michel (1996) and seem to play an important role in the detection and correction mechanisms of the ribosome during translation. Moreover, it was hypothesized in [13] that ancestor codes of the universal genetic code might have used codons from a circular code only. Thus it was reasonable to investigate circular tessera codes, which could have existed between a primitive genetic code and the tessera code. Our results show that circular tessera codes can be of size 12 at most, and we have given construction methods for all circular tessera codes of this size. Moreover, the numbers of circular (comma-free, self-complementary) tessera codes of any size between 1 and 12 have been calculated.