Abstract
Comma-free codes constitute a class of circular codes that has been widely studied, in particular by Golomb et al. (Biologiske Meddelelser, Kongelige Danske Videnskabernes Selskab 23:1–34, 1958a, Can J Math 10:202–209, 1958b), Michel et al. (Comput Math Appl 55:989–996, 2008a, Theor Comput Sci 401:17–26, 2008b, Inf Comput 212:55–63, 2012), Michel and Pirillo (Int J Comb 2011:659567, 2011), and Fimmel and Strüngmann (J Theor Biol 389:206–213, 2016). Based on a recent graph-theoretic approach to the study of circular codes (Fimmel et al., Philos Trans R Soc A 374:20150058, 2016), a new class of circular codes, called strong comma-free codes, is identified. These codes detect a frameshift during the translation process immediately after a reading window of at most two nucleotides. We describe several combinatorial properties of strong comma-free codes: enumeration, maximality, self-complementarity and the \(CF^3\)-property (the comma-free property in all three possible frames). These combinatorial results also highlight some new properties of the genetic code and its evolution. Each amino acid in the standard genetic code is coded by at least one strong comma-free code of size 1. For 9 of the 20 amino acids, \(S=\{Asn,Asp,Gln,Gly,Lys,Met,Phe,Pro,Trp\}\), the synonymous trinucleotide set (necessarily excluding the periodic trinucleotides \(\{AAA,CCC,GGG,TTT\}\)) is a strong comma-free code. The primeval comma-free RNY code of Eigen and Schuster (Naturwissenschaften 65:341–369, 1978) is a self-complementary \(CF^3\)-code of size 16. Furthermore, it is the union of two strong comma-free codes of size 8 which are complementary to each other.
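The statements about the RNY code can be checked with a short script. The following is a minimal sketch, not the authors' implementation. The helper names (`is_comma_free`, `is_strong_comma_free`, `reverse_complement`) are hypothetical. The strong comma-free test rests on an assumed reading of the definition, namely that no suffix of length 1 or 2 of a codeword equals a prefix of the same length of any codeword. The split of RNY into RRY and RYY is likewise one choice assumed here for illustration.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word):
    """Return the reversed complement of a DNA word."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def is_comma_free(code):
    """Classical test: for every pair of codewords x, y, neither shifted
    trinucleotide of the concatenation xy belongs to the code."""
    code = set(code)
    for x, y in product(code, repeat=2):
        xy = x + y
        if xy[1:4] in code or xy[2:5] in code:
            return False
    return True


def is_strong_comma_free(code):
    """Assumed criterion (see lead-in): the prefixes and suffixes of the
    codewords share no word of length 1 and no word of length 2."""
    for k in (1, 2):
        prefixes = {w[:k] for w in code}
        suffixes = {w[-k:] for w in code}
        if prefixes & suffixes:
            return False
    return True


# R = purine, Y = pyrimidine, N = any nucleotide.
R, N, Y = "AG", "ACGT", "CT"
RNY = {a + b + c for a, b, c in product(R, N, Y)}
RRY = {a + b + c for a, b, c in product(R, R, Y)}
RYY = {a + b + c for a, b, c in product(R, Y, Y)}

print(len(RNY), is_comma_free(RNY), is_strong_comma_free(RNY))
print(is_strong_comma_free(RRY), is_strong_comma_free(RYY))
print({reverse_complement(w) for w in RRY} == RYY)
```

Under this reading, the same function can be applied to a synonymous trinucleotide set, for example `is_strong_comma_free({"AAC", "AAT"})` for Asn.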
References
Arquès DG, Michel CJ (1996) A complementary circular code in the protein coding genes. J Theor Biol 182:45–58
Canapa A, Cerioni PN, Barucca M, Olmo E, Caputo V (2002) A centromeric satellite DNA may be involved in heterochromatin compactness in gobiid fishes. Chromosom Res 10:297–304
Clark J, Holton DA (1991) A first look at graph theory. World Scientific, Singapore
Crick FH, Brenner S, Klug A, Pieczenik G (1976) A speculation on the origin of protein synthesis. Orig Life 7:389–397
Crick F, Griffith JS, Orgel LE (1957) Codes without commas. Proc Natl Acad Sci USA 43:416–421
Eigen M, Schuster P (1978) The hypercycle. A principle of natural self-organization. Part C: the realistic hypercycle. Naturwissenschaften 65:341–369
El Soufi K, Michel CJ (2014) Circular code motifs in the ribosome decoding center. Comput Biol Chem 52:9–17
El Soufi K, Michel CJ (2015) Circular code motifs near the ribosome decoding center. Comput Biol Chem 59:158–176
El Soufi K, Michel CJ (2016) Circular code motifs in genomes of eukaryotes. J Theor Biol 408:198–212
El Soufi K, Michel CJ (2017) Unitary circular code motifs in genomes of eukaryotes. Biosystems 153:45–62
Fimmel E, Giannerini S, Gonzalez D, Strüngmann L (2014) Circular codes, symmetries and transformations. J Math Biol. doi:10.1007/s00285-014-0806-7
Fimmel E, Strüngmann L (2015) On the hierarchy of trinucleotide n-circular codes and their corresponding amino acids. J Theor Biol 364:113–120
Fimmel E, Strüngmann L (2016) Maximal dinucleotide comma-free codes. J Theor Biol 389:206–213
Fimmel E, Michel CJ, Strüngmann L (2016) \(n\)-Nucleotide circular codes in graph theory. Philos Trans R Soc A 374:20150058
Frey G, Michel CJ (2006) Identification of circular codes in bacterial genomes and their use in a factorization method for retrieving the reading frames of genes. Comput Biol Chem 30:87–101
Gemayel R, Vinces MD, Legendre M, Verstrepen KJ (2010) Variable tandem repeats accelerate evolution of coding and regulatory sequences. Ann Rev Genet 44:445–477
Golomb SW, Delbruck M, Welch LR (1958a) Construction and properties of comma-free codes. Biologiske Meddelelser, Kongelige Danske Videnskabernes Selskab 23:1–34
Golomb SW, Gordon B, Welch LR (1958b) Comma-free codes. Can J Math 10:202–209
Michel CJ (2012) Circular code motifs in transfer and 16S ribosomal RNAs: a possible translation code in genes. Comput Biol Chem 37:24–37
Michel CJ (2013) Circular code motifs in transfer RNAs. Comput Biol Chem 45:17–29
Michel CJ (2015) The maximal \(C^3\) self-complementary trinucleotide circular code \(X\) in genes of bacteria, eukaryotes, plasmids and viruses. J Theor Biol 380:156–177
Michel CJ (2017) The maximal \(C^3\) self-complementary trinucleotide circular code \(X\) in genes of bacteria, archaea, eukaryotes, plasmids and viruses. Life 7(20):1–16
Michel CJ, Pirillo G (2011) Strong trinucleotide circular codes. Int J Comb 2011:659567. doi:10.1155/2011/659567
Michel CJ, Pirillo G, Pirillo MA (2008a) Varieties of comma free codes. Comput Math Appl 55:989–996
Michel CJ, Pirillo G, Pirillo MA (2008b) A relation between trinucleotide comma-free codes and trinucleotide circular codes. Theor Comput Sci 401:17–26
Michel CJ, Pirillo G, Pirillo MA (2012) A classification of 20-trinucleotide circular codes. Inf Comput 212:55–63
Nirenberg MW, Matthaei JH (1961) The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proc Natl Acad Sci USA 47:1588–1602
Shepherd JCW (1981) Method to determine the reading frame of a protein from the purine/pyrimidine genome sequence and its possible evolutionary justification. Proc Natl Acad Sci USA 78:1596–1600
Cite this article
Fimmel, E., Michel, C.J. & Strüngmann, L. Strong Comma-Free Codes in Genetic Information. Bull Math Biol 79, 1796–1819 (2017). https://doi.org/10.1007/s11538-017-0307-0
DOI: https://doi.org/10.1007/s11538-017-0307-0