An Extended RNA Code and its Relationship to the Standard Genetic Code: An Algebraic and Geometrical Approach

  • Marco V. José
  • Eberto R. Morgado
  • Tzipe Govezensky
Original Article


An algebraic and geometrical approach is used to describe the primaeval RNA code and a proposed Extended RNA code. The former consists of all codons of the type RNY, where R denotes a purine, Y a pyrimidine, and N any nucleotide. The latter comprises the 16 codons of the type RNY plus the codons obtained by reading the RNA code in the second (NYR type) and third (YRN type) reading frames. Each of these reading frames contains 16 triplets, which together form a set of 48 triplets that specify 17 of the 20 amino acids and include AUG, the start codon, and the three known stop codons. The remaining 16 codons do not belong to the Extended RNA code; they constitute the union of the triplets of types YYY and RRR, which we define as the RNA-less code. The codons in each of the three subsets of the Extended RNA code are represented by a four-dimensional hypercube, and the set of codons of the RNA-less code is portrayed as a four-dimensional hyperprism. Remarkably, the union of these four symmetrical, pairwise disjoint sets is precisely the already known six-dimensional hypercube of the Standard Genetic Code (SGC) of 64 triplets. These results suggest a plausible evolutionary path by which the primaeval RNA code could have given rise to the SGC, via the Extended RNA code plus the RNA-less code. We argue that the life forms that probably obeyed the Extended RNA code were intermediate between the ribo-organisms of the RNA World and the last common ancestor (LCA) of the Prokaryotes, Archaea, and Eucarya, that is, the cenancestor. A general encoding function, E, which maps each codon to its corresponding amino acid or to the stop signal, is also derived. In 45 of the 64 cases, this function takes the form of a linear transformation F, which projects the whole six-dimensional hypercube onto a four-dimensional hyperface formed by all triplets that end in cytosine. In the remaining 19 cases, the function E takes the form of an affine transformation, i.e., the composition of F with a particular translation. Graphical representations of the four local encoding functions and of E are presented and discussed. For every amino acid and for the stop signal, a single triplet, among those that specify it, is selected as a canonical representative. From this mapping, a graphical representation of the 20 amino acids and the stop signal is also derived. We conclude that the general encoding function E represents the SGC itself.
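The codon counts stated in the abstract can be checked by direct enumeration. The following is a minimal sketch, not the authors' algebraic construction: it classifies each of the 64 triplets by its purine/pyrimidine pattern and looks up amino acids in the standard genetic code table. The names `classify`, `subsets`, `extended`, and `missing` are illustrative choices that do not come from the paper.

```python
from itertools import product

BASES = "UCAG"
PURINES = set("AG")  # A and G are purines (R); U and C are pyrimidines (Y)

# Standard genetic code, one letter per codon, in UCAG order
# (first base varies slowest, third base fastest); '*' marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def pattern(codon):
    """Return the purine/pyrimidine pattern of a codon, e.g. 'AUG' -> 'RYR'."""
    return "".join("R" if b in PURINES else "Y" for b in codon)


def classify(codon):
    """Assign a codon to RNY, NYR, YRN, or the RNA-less set (RRR or YYY)."""
    p = pattern(codon)
    if p[0] == "R" and p[2] == "Y":
        return "RNY"
    if p[1] == "Y" and p[2] == "R":
        return "NYR"
    if p[0] == "Y" and p[1] == "R":
        return "YRN"
    return "RNA-less"


# Group the 64 codons into the four subsets.
subsets = {}
for codon in CODON_TABLE:
    subsets.setdefault(classify(codon), set()).add(codon)

# Extended RNA code: union of the three reading-frame subsets.
extended = subsets["RNY"] | subsets["NYR"] | subsets["YRN"]

# Amino acids specified by the Extended RNA code, and those left out.
extended_amino_acids = {CODON_TABLE[c] for c in extended} - {"*"}
missing = set(AMINO) - {"*"} - extended_amino_acids

for name in ("RNY", "NYR", "YRN", "RNA-less"):
    print(name, len(subsets[name]), sorted(subsets[name]))
print("Extended RNA code size:", len(extended))
print("Amino acids not covered:", sorted(missing))
```

Under this classification, the amino acids absent from the Extended RNA code are those whose codons all fall within the RRR or YYY patterns.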


Keywords: Primaeval RNA code; Standard genetic code; Evolution of the genetic code; Extended RNA codes; Algebra and geometry



Copyright information

© Society for Mathematical Biology 2006

Authors and Affiliations

  • Marco V. José (1, corresponding author)
  • Eberto R. Morgado (2)
  • Tzipe Govezensky (1)

  1. Theoretical Biology Group, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, México D.F., México
  2. Facultad de Matemática, Física y Computación, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba
