Abstract
The genetic code has a high level of error robustness. Using values of hydrophobicity scales as a proxy for amino acid character, and the mean square measure as a function quantifying error robustness, one can obtain a value for a genetic code that reflects the error robustness of that code. By comparing this value with a distribution of values belonging to codes generated by random permutations of amino acid assignments, one can quantify the level of error robustness of a genetic code. We present a calculation in which the standard genetic code is shown to be optimal. We obtain this result by (1) using recently updated values of polar requirement as input; (2) fixing seven assignments (Ile, Trp, His, Phe, Tyr, Arg, and Leu) based on aptamer considerations; and (3) using known biosynthetic relations of the 20 amino acids. This last point is reflected in an approach of subdivision (restricting the random reallocation of assignments to amino acid subgroups, the set of 20 being divided into four such subgroups). The three approaches to explain robustness of the code (specific selection for robustness, amino acid–RNA interactions leading to assignments, or a slow growth process of assignment patterns) are reexamined in light of our findings. We offer a comprehensive hypothesis, stressing the importance of biosynthetic relations, with the code evolving from an early stage with just glycine and alanine, via intermediate stages, towards 64 codons carrying today's meaning.

Notes
For a recent update on prebiotic synthesis, see Parker et al. (2011) and references therein.
In the original calculation, Haig and Hurst ignored the three ‘stop codons’ encoding chain termination.
When using the Freeland and Hurst weights (and hence the \(MS^{\rm FH}_0\) values), it is possible to fix another set of three amino acids (Phe, His, Trp) in order to make the SGC optimal.
References
Aboderin AA (1971) An empirical hydrophobicity scale for α-amino-acids and some of its applications. Int J Biochem 2(11):537–544
Alff-Steinberger C (1969) The genetic code and error transmission. Proc Natl Acad Sci USA 64(2):584–591
Ardell DH (1998) On error minimization in a sequential origin of the standard genetic code. J Mol Evol 47(1):1–13
Berg JM, Tymoczko JL, Stryer L (2007) Biochemistry, 6th edn. W.H. Freeman and Company, New York, p 664
Biou V, Gibrat JF, Levin JM, Robson B, Garnier J (1988) Secondary structure prediction: combination of three different methods. Protein Eng 2(3):185–191
Buhrman H, van der Gulik PTS, Kelk SM, Koolen WM, Stougie L (2011) Some mathematical refinements concerning error minimization in the genetic code. IEEE/ACM Trans Comput Biol Bioinf 8(5):1358–1372
Burkard R, Derigs U (1980) Assignment and matching problems: solution methods with FORTRAN-programs. Lecture notes in economics and mathematical systems. Springer-Verlag, Berlin. http://books.google.nl/books?id=0jwZAQAAIAAJ
Burkard RE, Rendl F (1984) A thermodynamically motivated simulation procedure for combinatorial optimization problems. Eur J Oper Res 17(2):169–174
Butler T, Goldenfeld N, Mathew D, Luthey-Schulten Z (2009) Extreme genetic code optimality from a molecular dynamics calculation of amino acid polar requirement. Phys Rev E 79(6):060901(R)
Caporaso JG, Yarus M, Knight R (2005) Error minimization and coding triplet/binding site associations are independent features of the canonical genetic code. J Mol Evol 61(5):597–607
Cornette JL, Cease KB, Margalit H, Spouge JL, Berzofsky JA, DeLisi C (1987) Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J Mol Biol 195(3):659–685
Crick FHC (1968) The origin of the genetic code. J Mol Biol 38(3):367–379
Crick FHC, Barnett L, Brenner S, Watts-Tobin RJ (1961) General nature of the genetic code for proteins. Nature 192(4809):1227–1232
Di Giulio M (1989) The extension reached by the minimization of the polarity distances during the evolution of the genetic code. J Mol Evol 29(4):288–293
Di Giulio M (2008) An extension of the coevolution theory of the origin of the genetic code. Biol Direct 3:37
Eigen M, Schuster P (1978) A principle of natural self organization. Part C: the realistic hypercycle. Naturwissenschaften 65(7):341–369
Eisenberg D, McLachlan AD (1986) Solvation energy in protein folding and binding. Nature 319(6050):199–203
Ellington AD, Szostak JW (1990) In vitro selection of RNA molecules that bind specific ligands. Nature 346:818–822
Eppstein D (2003) Setting parameters by example. SIAM J Comput 32(3):643–653
Erives A (2011) A model of proto-anti-codon RNA enzymes requiring L-amino acid homochirality. J Mol Evol 73:10–22. doi:10.1007/s00239-011-9453-4
Freeland SJ, Hurst LD (1998a) The genetic code is one in a million. J Mol Evol 47(3):238–248
Freeland SJ, Hurst LD (1998b) Load minimization of the genetic code: history does not explain the pattern. Proc R Soc B Biol Sci 265(1410):2111–2119
Freeland SJ, Knight RD, Landweber LF, Hurst LD (2000) Early fixation of an optimal genetic code. Mol Biol Evol 17(4):511–518
Freeland SJ, Wu T, Keulmann N (2003) The case for an error minimizing standard genetic code. Orig Life Evol Biosph 33(4-5):457–477
Gilis D, Massar S, Cerf NJ, Rooman M (2001) Optimality of the genetic code with respect to protein stability and amino-acid frequencies. Genome Biol 2(11):R49
Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science 185(4154):862–864
Grosjean H, de Crecy-Lagard V, Marck C (2010) Deciphering synonymous codons in the three domains of life: co-evolution with specific tRNA modification enzymes. FEBS Lett 584(2):252–264
Haig D, Hurst LD (1991) A quantitative measure of error minimization in the genetic code. J Mol Evol 33(5):412–417
Higgs PG (2009) A four-column theory for the origin of the genetic code: tracing the evolutionary pathways that gave rise to an optimized code. Biol Direct 4:16
Higgs PG, Pudritz RE (2009) A thermodynamic basis for prebiotic amino acid synthesis and the nature of the first genetic code. Astrobiology 9(5):483–490
Ikehara K (2002) Origins of gene, genetic code, protein and life: comprehensive view of life systems from a GNC-SNS primitive genetic code hypothesis. J Biosci 27(2):165–186
Ikehara K, Omori Y, Arai R, Hirose A (2002) A novel theory on the origin of the genetic code: a GNC-SNS hypothesis. J Mol Evol 54(4):530–538
Illangasekare M, Yarus M (2002) Phenylalanine-binding RNAs and genetic code evolution. J Mol Evol 54(3):298–311
Janas T, Widmann JJ, Knight R, Yarus M (2010) Simple, recurring RNA binding sites for L-arginine. RNA 16(4):805–816
Jensen RA (1976) Enzyme recruitment in evolution of new function. Annu Rev Microbiol 30:409–425
Johansson MJO, Esberg A, Huang B, Bjork GR, Bystrom AS (2008) Eukaryotic wobble uridine modifications promote a functionally redundant decoding system. Mol Cell Biol 28(10):3301–3312
Johnson DBF, Wang L (2010) Imprints of the genetic code in the ribosome. Proc Natl Acad Sci USA 107(18):8298–8303
Kawashima S, Ogata H, Kanehisa M (1999) AAindex: amino acid index database. Nucleic Acids Res 27(1):368–369
Knight RD, Freeland SJ, Landweber LF (1999) Selection, history and chemistry: the three faces of the genetic code. Trends Biochem Sci 24(6):241–247
Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157(1):105–132
Lehman N, Jukes TH (1988) Genetic code development by stop codon takeover. J Theor Biol 135(2):203–214
Li Y, Pardalos P, Resende M (1994) A greedy randomized adaptive search procedure for the quadratic assignment problem. Quadratic Assign Relat Probl 16:237–261
Lozupone C, Changayil S, Majerfeld I, Yarus M (2003) Selection of the simplest RNA that binds isoleucine. RNA 9(11):1315–1322
Majerfeld I, Chocholousova J, Malaiya V, Widmann J, McDonald D, Reeder J, Iyer M, Illangasekare M, Yarus M, Knight R (2010) Nucleotides that are essential but not conserved; a sufficient L-tryptophan site in RNA. RNA 16(10):1915–1924
Majerfeld I, Puthenvedu D, Yarus M (2005) RNA affinity for molecular L-histidine; genetic code origins. J Mol Evol 61:226–235
Majerfeld I, Yarus M (1994) An RNA pocket for an aliphatic hydrophobe. Nat Struct Biol 1(5):287–292
Majerfeld I, Yarus M (2005) A diminutive and specific RNA binding site for l-tryptophan. Nucleic Acids Res 33(17):5482–5493. doi:10.1093/nar/gki861
Massey SE (2006) A sequential "2-1-3" model of genetic code evolution that explains codon constraints. J Mol Evol 62(6):809–810
Massey SE (2008) A neutral origin for error minimization in the genetic code. J Mol Evol 67(5):510–516
Mathew DC, Luthey-Schulten Z (2008) On the physical basis of the amino acid polar requirement. J Mol Evol 66(5):519–528
MATLAB (2011) Version 7.12.0 (R2011a). The MathWorks Inc., Natick, Massachusetts
Meirovitch H, Rackovsky S, Scheraga HA (1980) Empirical studies of hydrophobicity. 1. Effect of protein size on the hydrophobic behavior of amino acids. Macromolecules 13(6):1398–1405
Miyazawa S, Jernigan RL (1985) Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18(3):534–552
Miyazawa S, Jernigan RL (1999) Self-consistent estimation of inter-residue protein contact energies based on an equilibrium mixture approximation of residues. Proteins 34(1):49–68
Noller HF (2004) The driving force for molecular evolution of translation. RNA 10(12):1833–1837
Novozhilov AS, Wolf YI, Koonin EV (2007) Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape. Biol Direct 2:24
Ohno S (1970) Evolution by gene duplication. Springer, Berlin
Oobatake M, Ooi T (1977) An analysis of non-bonded energy of proteins. J Theor Biol 67(3):567–584
Parker ET, Cleaves HJ, Dworkin JP, Glavin DP, Callahan M, Aubrey A, Lazcano A, Bada JL (2011) Primordial synthesis of amines and amino acids in a 1958 Miller H2S-rich spark discharge experiment. Proc Natl Acad Sci USA 108(14):5526–5531
Philip GK, Freeland SJ (2011) Did evolution select a nonrandom "alphabet" of amino acids? Astrobiology 11(3):235–240
Ponnuswamy PK, Prabhakaran M, Manavalan P (1980) Hydrophobic packing and spatial arrangement of amino acid residues in globular proteins. Biochim Biophys Acta 623(2):301–316
Rahman S, Bashton M, Holliday G, Schrader R, Thornton J (2009) Small molecule subgraph detector (SMSD) toolkit. J Cheminform 1(1):12. doi:10.1186/1758-2946-1-12 http://www.jcheminf.com/content/1/1/12
Rode BM, Son HL, Suwannachot Y, Bujdak J (1999) The combination of salt induced peptide formation reaction and clay catalysis: a way to higher peptides under primitive earth conditions. Orig Life Evol Biosph 29(3):273–286
Schwendinger MG, Rode BM (1989) Possible role of copper and sodium in prebiotic evolution of peptides. Anal Sci 5:411–414
Sweet RM, Eisenberg D (1983) Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure. J Mol Biol 171(4):479–488
Szostak JW (2012) The eightfold path to non-enzymatic RNA replication. J Syst Chem 3:2
Taylor FJR, Coates D (1989) The code within the codons. BioSystems 22(3):177–187
Turk RM, Chumachenko NV, Yarus M (2010) Multiple translational products from a five-nucleotide ribozyme. Proc Natl Acad Sci USA 107(10):4585–4589
van der Gulik P, Massar S, Gilis D, Buhrman H, Rooman M (2009) The first peptides: the evolutionary transition between prebiotic amino acids and early proteins. J Theor Biol 261(4):531–539
van der Gulik PTS, Hoff WD (2011) Unassigned codons, nonsense suppression, and anticodon modifications in the evolution of the genetic code. J Mol Evol 73(3-4):59–69
Vetsigian K, Woese C, Goldenfeld N (2006) Collective evolution and the genetic code. Proc Natl Acad Sci USA 103(28):10696–10701
Voet D, Voet JG (1995) Biochemistry, 2nd edn, Wiley, New York, p 773
Woese CR (1965) Order in the genetic code. Proc Natl Acad Sci USA 54(1):71–75
Woese CR (1967) The genetic code. Harper and Row, New York
Woese CR (1973) Evolution of the genetic code. Naturwissenschaften 60(10):447–459
Woese CR, Dugre DH, Dugre SA, Kondo M, Saxinger WC (1966a) On the fundamental nature and evolution of the genetic code. Cold Spring Harb Symp Quant Biol 31:723–736
Woese CR, Dugre DH, Saxinger WC, Dugre SA (1966b) The molecular basis for the genetic code. Proc Natl Acad Sci USA 55(4):966–974
Wolf YI, Koonin EV (2007) On the origin of the translation system and the genetic code in the RNA world by means of natural selection, exaptation, and subfunctionalization. Biol Direct 2:14
Wong JT (1975) A co-evolution theory of the genetic code. Proc Natl Acad Sci USA 72(5):1909–1912
Wong JT (1980) Role of minimization of chemical distances between amino acids in the evolution of the genetic code. Proc Natl Acad Sci USA 77(2 II):1083–1086
Wong JT (2007) Question 6: coevolution theory of the genetic code: a proven theory. Orig Life Evol Biosph 37(4-5):403–408
Wong JTF (2005) Coevolution theory of genetic code at age thirty. BioEssays 27(4):416–425
Yarus M (2011) The meaning of a minuscule ribozyme. Philos Trans R Soc B Biol Sci 366(1580):2902–2909
Yarus M, Widmann JJ, Knight R (2009) RNA-amino acid binding: a stereochemical era for the genetic code. J Mol Evol 69(5):406–429
Acknowledgements
We thank the EiC and two anonymous reviewers for suggestions which improved the manuscript. Part of this research has been funded by NWO-VICI Grant 639-023-302, by the NWO-CLS MEMESA Grant, by the Tinbergen Institute, and by an NWO-VENI Grant.
Appendices
Four further observations are reported here. Firstly, as explained in the ‘Introduction’ section, consideration of the biosynthetic pathways leading to the different amino acids suggests an aspect of organization of the SGC, in which GNN codons tend to be assigned to ‘prebiotic amino acids’, ANN codons to comparatively small, aspartate-derived amino acids, CNN codons to larger amino acids, and UNN codons to the largest amino acids or (in the case of cysteine) the most unstable and reactive one. In other words, the first position of the codon might be linked to the complexity of biochemistry: the UNN codons are the only ones encoding aromatic amino acids and the unstable cysteine, and may reflect the most advanced stage of biochemistry during the evolution of the genetic code (when biochemistry was sufficiently complex to handle cysteine and to build tryptophan). In ‘Appendix: Molecular Structure Matrix’, we study this link with the biosynthetic development of amino acids by measuring how many one-atom changes are required to transform one amino acid into another. With respect to this distance measure, amino acids derived from the same precursor (e.g. Ile and Thr) are comparatively close, because they share structural parts. Changing the second position of the codon (in the case of Ile and Thr: changing AUU to ACU) would then replace an amino acid by one with a comparatively similar structure, reflecting their membership of the same biosynthetic family. If the error-robustness calculation is performed with these molecular-structure distances, the SGC is found to provide error protection against substitution mutations in the second position (thereby grouping, e.g., ANU codons together). The results are given in ‘Appendix: Molecular Structure Matrix’.
Secondly, we tried to find numerical values for the 20 amino acids that make the SGC optimal, in terms of error robustness, among all possible genetic codes. Using a numerical optimization approach developed by Eppstein (2003), we were able to find such a set of 20 values; in fact, many different sets have this property. Details about these SGC-optimality calculations can be found in ‘Appendix: Inverse Parametric Optimization’.
Thirdly, we screened a large list of physico-chemical amino acid characteristics for their performance in our error-robustness calculations. Polar requirement was one of the best-performing measures. This strongly supports the remark by Haig and Hurst: ‘The natural code is very conservative with respect to polar requirement. The striking correspondence between codon assignments and such a simple measure deserves further study’ (Haig and Hurst 1991). The observation of Vetsigian, Woese, and Goldenfeld should also be mentioned: ‘Although we do not know what defines amino acid ‘similarity’ in the case of the code, we do know one particular amino acid measure that seems to express it quite remarkably in the coding context. That measure is amino acid polar requirement […]’ (Vetsigian et al. 2006). More details are given in ‘Appendix: Scan of Other Amino Acid Properties’.
Finally, we wondered whether, by fixing just one or two more assignments, the SGC would be optimal without using the subdivision leading to the historically reasonable set of possible codes (as explained in the ‘Introduction’ section). This was not the case. When working with Haig–Hurst weights (i.e. equal weighting), there exist 34 sets of 9 fixed assignments that do have this characteristic. However, none of these 34 sets consists of the seven fixed assignments based on aptamer considerations plus two more amino acids; the smallest set containing the seven has size 10. When working with Freeland–Hurst weights (see ‘Methods’ section), no sets of 8 or 9 fixed assignments with the required characteristic exist. This work is presented in ‘Appendix: Minimal Number of Fixed Assignments’.
Molecular Structure Matrix
Polar requirement is just one physico-chemical aspect of amino acids. The discovery that only 1 in 10,000 random codes has a lower error-robustness value than the SGC when polar requirement is used as an amino acid characteristic (Haig and Hurst 1991) is compelling evidence that error robustness is present in the SGC. If a conservative attitude is taken, and a phenomenon is considered noteworthy only when the probability of encountering it as a random effect is <0.1 %, the SGC is clearly noteworthy. If one considers the error-robustness values for the three positions separately [please refer to Buhrman et al. (2011) for details], the results in the left column of Fig. 2 are obtained. The third position is in the <0.1 % category and the first position in the <1 % category, while the second position, at about 22 %, is not even in the <5 % category and thus cannot be considered special.
Fig. 2 Histograms of the MS values of 10 million random samples using updated polar requirement (Mathew and Luthey-Schulten 2008) (four histograms on the left) and squared molecular-structure distances from Table 3 (four histograms on the right). The top row shows the \(MS_0\) value; the second row is the component from the first codon position (\(MS^{\rm core}_1\)); the third and fourth rows are the components from the middle (\(MS^{\rm core}_2\)) and last (\(MS^{\rm core}_3\)) codon positions. In contrast to the original definition of \(MS_i\) for \(i \geq 1\) (Haig and Hurst 1991), we have chosen to normalize \(MS^{\rm core}_i\) with the same constant as \(MS_0\), so that \(MS_0 = \sum_{i=1}^{3} MS^{\rm core}_i\). The dashed red line indicates the value of the SGC.
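The \(MS_0\) measure and the per-position decomposition described in the caption can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' software: it uses the standard genetic code with equal (Haig–Hurst) weights, counts every ordered single-base change between two sense codons (synonymous changes included, contributing zero; stop codons ignored), and divides all three positional components by the same total so that they sum to \(MS_0\). The Trp-only property in the usage lines is a hypothetical toy, not real polar requirement values.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop)
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): AA[i] for i, c in enumerate(product(BASES, repeat=3))}


def neighbour_pairs(code):
    """Yield (codon, mutant, position) for every single-base change between two sense codons."""
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if code[mutant] != "*":
                    yield codon, mutant, pos


def ms_components(code, prop):
    """Return (MS0, [MS_1, MS_2, MS_3]); all components share the MS0 normalising constant."""
    sums = [0.0, 0.0, 0.0]
    n_changes = 0
    for codon, mutant, pos in neighbour_pairs(code):
        diff = prop[code[codon]] - prop[code[mutant]]
        sums[pos] += diff * diff
        n_changes += 1
    parts = [s / n_changes for s in sums]
    return sum(parts), parts


# Hypothetical toy property: 1 for Trp, 0 for every other amino acid.
toy = {aa: (1.0 if aa == "W" else 0.0) for aa in set(AA) - {"*"}}
ms0, parts = ms_components(SGC, toy)
print(ms0, parts)
```

With this toy property only the 14 ordered changes into or out of Trp's single codon UGG contribute (6 at the first position, 4 at each of the other two), out of 526 ordered sense-to-sense changes, so \(MS_0 = 14/526\).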
This result is not entirely satisfactory, because the codons of several pairs of similar amino acids are related by second-position changes. For instance, a change from phenylalanine (Phe) to tyrosine (Tyr) is clearly a conservative change from a biological viewpoint. To capture this kind of amino acid relatedness, we introduce a new way of measuring amino acid similarity by one-atom changes, which yields a measure of similarity in terms of molecular structure. We should stress that this measure does not reflect actual chemical reactions or steps. As an example, we compute the distance between Phe and Tyr to be 3 as follows. As a first step, the hydrogen atom at the end of the side chain of Phe is taken off. As a second step, an oxygen atom is placed in the position previously occupied by that hydrogen atom. As the third and final step, the Tyr molecule is completed by adding a hydrogen atom to this oxygen atom, producing the hydroxyl group at the end of the side chain of Tyr. Generally, the distance between two molecules is defined as the minimal number of ‘allowed one-atom changes’ needed to transform one molecule into the other, where the allowed one-atom changes are the following:
- taking off or attaching an arbitrary single atom,
- creating or destroying a single bond (thereby possibly opening or closing a ring structure),
- changing a single bond to a double bond or vice versa.
It is not hard to see that an algorithmic way of computing the distance between two molecules m1 and m2 is to find the maximal common sub-graph mc of their molecular structure, and to sum up how many steps are required to go from m1 to mc and from m2 to mc. The distance matrix between the 20 amino acids in Table 3 has been obtained in this way, using the Small Molecule Subgraph Detector (SMSD) toolkit (Rahman et al. 2009) to find the maximal common subgraph and post-processing this information with a python script. The software code can be found in the supplemental information.
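The step counting described above can be sketched in Python. This is a hedged illustration, not the authors' script: the maximal common subgraph is assumed to be already known (the paper used SMSD for that step), and here it is encoded simply as atom identifiers shared by the two molecules. We also assume a counting convention consistent with the Phe/Tyr example: removing or attaching a pendant atom costs one step including its bond, every further bond created or destroyed costs one step, and each unit of bond-order change on a bond between common atoms costs one step. The fragments in the usage lines are hypothetical toys around the para carbon of the aromatic ring, not full structures.

```python
def one_atom_distance(m1, m2):
    """Number of allowed one-atom changes between two molecules with a given common subgraph.

    Each molecule is (atoms, bonds): atoms maps atom id -> element symbol, bonds maps
    frozenset({id_a, id_b}) -> bond order. Atom ids present in both molecules with the
    same element form the (assumed, precomputed) maximal common subgraph.
    """
    atoms1, bonds1 = m1
    atoms2, bonds2 = m2
    common = {a for a in atoms1.keys() & atoms2.keys() if atoms1[a] == atoms2[a]}

    def steps_to_common(atoms, bonds):
        removed_atoms = len(atoms.keys() - common)
        removed_bonds = sum(1 for b in bonds if not b <= common)
        # Assumption: each removed atom takes one bond with it; any extra bonds
        # (e.g. ring closures) must be destroyed separately.
        return removed_atoms + max(0, removed_bonds - removed_atoms)

    # Bonds between common atoms: create/destroy or change order, one step per unit of order.
    pairs = {b for b in bonds1.keys() | bonds2.keys() if b <= common}
    bond_steps = sum(abs(bonds1.get(b, 0) - bonds2.get(b, 0)) for b in pairs)
    return steps_to_common(atoms1, bonds1) + steps_to_common(atoms2, bonds2) + bond_steps


def bond(a, b):
    return frozenset({a, b})


# Hypothetical fragments: para carbon c4 of the ring with its neighbours c3 and c5.
phe_end = ({"c3": "C", "c4": "C", "c5": "C", "h": "H"},
           {bond("c3", "c4"): 1, bond("c4", "c5"): 2, bond("c4", "h"): 1})
tyr_end = ({"c3": "C", "c4": "C", "c5": "C", "o": "O", "ho": "H"},
           {bond("c3", "c4"): 1, bond("c4", "c5"): 2, bond("c4", "o"): 1, bond("o", "ho"): 1})

d = one_atom_distance(phe_end, tyr_end)
print(d, d ** 2)  # distance and the squared value used in the MS calculation
```

For these fragments the sketch reproduces the Phe/Tyr distance of 3 given in the text, and its square, 9, is the value used in the error-robustness calculation.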
In order to perform the error-robustness calculations, we followed the procedure of Haig and Hurst (1991) and considered the squared distances. In this way, the zeroes on the diagonal remain zero. The values for small changes become slightly larger (so the edge from Phe to Tyr gets a value of 9), while the values for large changes (like going from Gly to Tyr) become considerably larger (in the case of Gly to Tyr, 20 becomes 400). Large changes thus receive stronger emphasis (Di Giulio 1989). Whether squaring is the right way to perform these kinds of calculations has been discussed elsewhere (Ardell 1998; Freeland et al. 2000); here we simply want to compare molecular structure, as an input, with characteristics such as polar requirement, hydropathy, volume and isoelectric point, as studied by Haig and Hurst (1991). The histograms of the error robustness in terms of molecular structure are shown in the right column of Fig. 2.
Although, unlike polar requirement, it does not produce a result in the <0.1 % category, it is still remarkable that the SGC is, at 0.151 %, in the <1 % category when molecular structure is used as input. This means that this matrix performs better than volume or the hydropathy scale of hydrophobicity in the work of Haig and Hurst (1991). Even more remarkably, with this measure the error robustness comes mainly from the second position (Fig. 2).
Inverse Parametric Optimization
Instead of asking the question ‘What is the most error-robust genetic code in terms of e.g. polar requirement?’, one could also ask the question ‘Is there a set of numerical values for the 20 amino acids such that the SGC is the optimal code in terms of error robustness?’ If one particular set of 20 values turns out to have that property, one can compare this set with different sets of amino acid characteristics, and suggest which characteristic resembles the ‘ideal values’ best. This then might be the factor playing a selective role during evolution of the SGC.
Let A be the set of amino acids and let \(\mathcal{F}\) be the set of all codes. We aim at solving the following problem: find a non-trivial vector \({\bf x} \in \mathbb{R}_{\geq 0}^{A}\) of amino acid property values such that \({\rm SGC} \in \mathop{\rm arg\,min}\limits_{F \in \mathcal{F}} MS^{\alpha, {\bf x}}_0(F)\), i.e. \(MS^{\alpha, {\bf x}}_0({\rm SGC}) = \min_{F \in \mathcal{F}} MS^{\alpha, {\bf x}}_0(F)\).
To solve this problem, we used a modification of the method of Eppstein (2003). We define variables \({\bf x} \in \mathbb{R}^{20}\) and consider the following constraint satisfaction problem: Find x such that
Note that the number of inequalities (3) equals the size of the code space, which can be quite large. To deal with the potentially large number of constraints, we follow a cutting-plane approach. We work with intermediate solutions \(\overline{{\bf x}}_i\), start with i = 0, and set \(\overline{{\bf x}}_0\) to some random values that satisfy constraints (1) and (2). We then solve the separation problem for the class of constraints (3). That is, we have to find a code F such that \(MS^{\alpha,\overline{{\bf x}}_i}_0(F) < MS^{\alpha,\overline{{\bf x}}_i}_0({\rm SGC})\) or prove that no such code exists. We can answer this question by finding
\[ F^{*} = \mathop{\rm arg\,min}_{F \in \mathcal{F}} MS^{\alpha,\overline{{\bf x}}_i}_0(F) \]
using the quadratic assignment approach described in Buhrman et al. (2011). In fact, for the actual procedure it suffices to use much faster QAP heuristics, e.g. based on simulated annealing (Burkard and Rendl 1984) or the GRASP heuristic (Li et al. 1994), instead of full QAP solvers. If we find an F with \(MS^{\alpha,\overline{{\bf x}}_i}_0(F) < MS^{\alpha,\overline{{\bf x}}_i}_0({\rm SGC})\), we have found a violated inequality
\[ MS^{\alpha,{\bf x}}_0(F) \geq MS^{\alpha,{\bf x}}_0({\rm SGC}), \]
which we add to the constraint satisfaction problem. We solve this set of quadratic constraints using the non-linear constraint solver fmincon from MATLAB’s optimization toolbox (MATLAB 2011), obtain a new set of values \(\overline{{\bf x}}_{i+1}\) and iterate the process until no more violated inequalities can be separated. A final solution x* can be verified by a QAP solver such as Burkard and Derigs (1980). All software used is provided as supplemental information.
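Why each cut is a quadratic constraint can be seen by writing \(MS_0\) as a quadratic form in the property vector x: every ordered single-base change between sense codons with amino acids a and b contributes \((x_a - x_b)^2\), so \(MS_0(F) = {\bf x}^{\rm T} Q_F {\bf x}\) for a matrix \(Q_F\) that depends only on the code F, and the inequality for F reads \({\bf x}^{\rm T}(Q_F - Q_{\rm SGC}){\bf x} \geq 0\). The Python sketch below builds \(Q_F\) assuming equal (Haig–Hurst) weights rather than a general α, counting synonymous changes and ignoring stop codons. It does not reproduce the fmincon or QAP steps; the names `ms_matrix`, `quad` and `cut_matrix` are illustrative only.

```python
from itertools import product

BASES = "UCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): AA[i] for i, c in enumerate(product(BASES, repeat=3))}
AMINO_ACIDS = sorted(set(AA) - {"*"})
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def ms_matrix(code):
    """Q such that MS0(code, x) = x^T Q x, with x indexed like AMINO_ACIDS (equal weights)."""
    pairs = []
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    mutant_aa = code[codon[:pos] + base + codon[pos + 1:]]
                    if mutant_aa != "*":
                        pairs.append((INDEX[aa], INDEX[mutant_aa]))
    n = len(AMINO_ACIDS)
    q = [[0.0] * n for _ in range(n)]
    w = 1.0 / len(pairs)
    for a, b in pairs:
        # (x_a - x_b)^2 = x_a^2 + x_b^2 - 2 x_a x_b; synonymous pairs (a == b) cancel to zero
        q[a][a] += w
        q[b][b] += w
        q[a][b] -= w
        q[b][a] -= w
    return q


def quad(q, x):
    """Evaluate x^T Q x."""
    return sum(x[i] * q[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


def cut_matrix(code):
    """Matrix of the constraint MS0(code) >= MS0(SGC), i.e. x^T (Q_code - Q_SGC) x >= 0."""
    qf, qs = ms_matrix(code), ms_matrix(SGC)
    return [[qf[i][j] - qs[i][j] for j in range(len(qf))] for i in range(len(qf))]


# Usage with a hypothetical alternative code (Phe and Tyr blocks swapped) and toy values x.
swapped = {c: {"F": "Y", "Y": "F"}.get(aa, aa) for c, aa in SGC.items()}
x = [float(i) for i in range(len(AMINO_ACIDS))]
print("constraint violated:", quad(cut_matrix(swapped), x) < 0)
```

Because every row of \(Q_F\) sums to zero, adding a constant to all 20 values leaves \(MS_0\) unchanged, and multiplying them by a factor s scales it by \(s^2\); normalizing value sets to mean 0 and standard deviation 1, as in Fig. 3, therefore does not change which code is optimal.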
Using this procedure, we found many different sets of 20 values under which the SGC is optimal with respect to error-robustness. We steered the values towards the polar requirement values r by using the distance to r as the objective function in our approach. See Fig. 3 for an illustration of some of the solutions we found.
Fig. 3 Eight examples of sets of values for the 20 amino acids that make the SGC the most error-robust genetic code. The (artificial) values were found using inverse parametric optimization, as described in ‘Appendix: Inverse Parametric Optimization’. All sets have been normalized to have mean 0 and standard deviation 1. For comparison, we also show the original polar requirement values on the top row (1) and the updated polar requirement values on the second row (2). Value sets 3–6 make the SGC optimal with respect to \(MS_0\); value sets 7–10 make the SGC optimal with respect to \(MS^{\rm FH}_0\).
An analysis of the correlation coefficients of these ‘ideal’ values with a database of 744 known amino acid properties from the literature (AAindex: Kawashima et al. 1999) shows no correlation above 0.82 except with polar requirement. In other words, we do not know of any sets of straightforward physico-chemical amino acid properties which resemble one of these ‘ideal’ sets. This might suggest that a combination of several aspects of code evolution and amino acid properties [as suggested by e.g. Higgs (2009)] resulted in the configuration of the SGC.
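A screen of this kind can be sketched as below. The Python fragment computes the Pearson correlation between an ‘ideal’ value set and each property in a collection and keeps those whose absolute correlation exceeds a threshold (0.82 here, echoing the figure quoted above). Whether the original analysis used signed or absolute correlations is not stated, so using the absolute value is our assumption; the property names and values are hypothetical placeholders, not AAindex entries.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for constant sequences; never passes the threshold
    return sxy / math.sqrt(sxx * syy)


def screen(ideal, properties, threshold=0.82):
    """Return (name, r) for properties with |r| > threshold against `ideal`, strongest first.

    `ideal` and each property map amino acid -> value; only amino acids in `ideal` are used.
    """
    keys = sorted(ideal)
    target = [ideal[k] for k in keys]
    hits = []
    for name, values in properties.items():
        r = pearson(target, [values[k] for k in keys])
        if abs(r) > threshold:
            hits.append((name, r))
    return sorted(hits, key=lambda hit: -abs(hit[1]))


# Hypothetical three-residue example (names and numbers are made up).
ideal = {"A": 1.0, "G": 2.0, "S": 3.0}
properties = {
    "scaled copy": {"A": 2.0, "G": 4.0, "S": 6.0},
    "reversed": {"A": 3.0, "G": 2.0, "S": 1.0},
    "weakly related": {"A": 1.0, "G": 3.0, "S": 2.0},
}
print(screen(ideal, properties))
```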
Scan of Other Amino Acid Properties
We performed error-robustness calculations for all (complete) amino acid properties of the AAindex-database (Kawashima et al. 1999). For the purpose of comparison, we extended the database to include the original polar requirements (Woese et al. 1966a), and the updated polar requirements (Mathew and Luthey-Schulten 2008), as well as two sets of numerical values found by the procedure described in ‘Appendix: Inverse Parametric Optimization’.
In a first scan, 50,000 random codes were sampled from
1. all codes,
2. codes with the seven assignments of Phe, Tyr, Trp, His, Leu, Ile, and Arg fixed,
3. codes with seven fixed assignments and respecting the structure enforced by the constraint of the historically reasonable set of possible codes (all 11,520 codes were computed in this case).
For all three settings above, error-robustness values were computed using Haig–Hurst and Freeland–Hurst weights (the same random samples were used for both weight sets, so the results are statistically correlated).
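The random sampling behind these settings can be sketched as follows. This Python fragment is a hedged illustration rather than the authors' code: a random code is produced by permuting the amino acid labels of the 20 synonymous codon blocks of the SGC while keeping the block structure and the stop codons in place, optionally holding some amino acids fixed (here the seven aptamer-based ones, F, Y, W, H, L, I, R), and the fraction of samples whose \(MS_0\) (equal weights) is lower than that of the SGC is reported. The subgroup constraint of the historically reasonable set of codes is not modelled, and the property values in the usage lines are hypothetical.

```python
import random
from itertools import product

BASES = "UCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): AA[i] for i, c in enumerate(product(BASES, repeat=3))}
AMINO_ACIDS = sorted(set(AA) - {"*"})
APTAMER_FIXED = {"F", "Y", "W", "H", "L", "I", "R"}


def ms0(code, prop):
    """Mean squared property change over all single-base changes between sense codons."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    other = code[codon[:pos] + base + codon[pos + 1:]]
                    if other != "*":
                        total += (prop[aa] - prop[other]) ** 2
                        n += 1
    return total / n


def random_code(fixed, rng):
    """Shuffle the amino acids of the non-fixed SGC blocks; fixed amino acids and stops stay put."""
    free = [aa for aa in AMINO_ACIDS if aa not in fixed]
    shuffled = free[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(free, shuffled))
    return {codon: relabel.get(aa, aa) for codon, aa in SGC.items()}


def fraction_better(prop, fixed=frozenset(), samples=1000, seed=0):
    """Estimate the fraction of random codes with a lower MS0 than the SGC."""
    rng = random.Random(seed)
    target = ms0(SGC, prop)
    better = sum(ms0(random_code(fixed, rng), prop) < target for _ in range(samples))
    return better / samples


# Hypothetical property values (NOT polar requirement): one arbitrary number per amino acid.
toy = {aa: float(i % 7) for i, aa in enumerate(AMINO_ACIDS)}
print(fraction_better(toy, samples=200), fraction_better(toy, APTAMER_FIXED, samples=200))
```

Fixing more amino acids shrinks the set of codes that can be drawn, which is why the same SGC can look more or less exceptional depending on the setting.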
For the 55 best-performing properties, the same calculations as above were repeated with \(10^6\) samples. The 20 best-performing properties are presented in Table 4. Not surprisingly, our two sets of (artificial) numerical values found by inverse parametric optimization (described in ‘Appendix: Inverse Parametric Optimization’) end up at the top.
Furthermore, we observe that the SGC is error-robust in terms of several measures of polar requirement [as noted, e.g., in Vetsigian et al. (2006)]. One of these (for which this is not immediately obvious) is Grantham’s polarity scale (1974), which is a combination of Aboderin’s scale (1971) and polar requirement. It is especially noteworthy that the updated polar requirement (Mathew and Luthey-Schulten 2008) consistently ranks among the best four sets of numerical values. When the sets found by inverse parametric optimization are left out, the updated polar requirement values are the best set of values in all three settings (no blocks fixed, 7 blocks fixed, and the set of 11,520 codes resulting from 7 fixed blocks plus the constraint of the historically reasonable set of possible codes) when Freeland–Hurst weights are used.
Minimal Number of Fixed Assignments
In this appendix, we investigate how many amino acid assignments need to be fixed such that the SGC is the most error-robust genetic code with respect to the updated polar requirements (Mathew and Luthey-Schulten 2008), when we do not use the constraint of the historically reasonable set of possible codes.
For the case of the Haig–Hurst weights, there are 67 different minimal subsets \(S_1, S_2, \ldots, S_{67} \subseteq \{\hbox{Phe}, \hbox{Leu}, \hbox{Ile}, \ldots, \hbox{Ser}, \hbox{Gly}\}\) such that, for any \(i \in \{1,2,\ldots, 67\}\), fixing the assignments of all amino acids in \(S_i\) makes the SGC the most error-robust genetic code. Any superset of these 67 minimal subsets also has this property, because fixing more assignments only limits the number of possible genetic codes. Of the 67 minimal subsets, 34 are of size 9, 15 of size 10, 15 of size 11, and 3 of size 12.
When fixing the seven assignments of Phe, Tyr, Trp, His, Leu, Ile, and Arg (based on aptamer experiments) the minimal sets of assignments that need to be fixed in addition are: {Ser, Gln, Cys} or {Met, Ser, Gln}.
For the case of the Freeland–Hurst weights, there are 186 different minimal subsets: 2 subsets of size 10, 4 of size 11, 13 of size 12, 44 of size 13, 52 of size 14, 45 of size 15, 21 of size 16, and 5 of size 17. When fixing the seven assignments of Phe, Tyr, Trp, His, Leu, Ile, and Arg (based on aptamer experiments), there are 6 different minimal sets (of size 6) each of which can be fixed in addition in order to make the SGC the most error-robust genetic code.
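The enumeration of minimal subsets can be organized around the monotonicity noted above: once a set works, every superset works, so candidates can be generated by increasing size and supersets of sets already found can be skipped. The Python sketch below does this by brute force. It rests on an assumption: `is_optimal_when_fixed` is a hypothetical oracle standing in for an exact check (e.g. a QAP solve) that the SGC is the most error-robust code when the given assignments are fixed, and the toy oracle in the usage lines is invented for demonstration. The optional `base` argument mirrors the case of fixing the seven aptamer-based assignments first and asking which additional sets are minimal.

```python
from itertools import combinations


def minimal_fixing_sets(universe, is_optimal_when_fixed, base=frozenset()):
    """Inclusion-minimal sets S (disjoint from `base`) such that fixing base | S makes the SGC optimal.

    Assumes monotonicity: if fixing S works, fixing any superset of S also works.
    """
    candidates = sorted(set(universe) - set(base))
    minimal = []
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            s = frozenset(subset)
            if any(m <= s for m in minimal):
                continue  # works by monotonicity, but is not minimal
            if is_optimal_when_fixed(frozenset(base) | s):
                minimal.append(s)
    return minimal


# Hypothetical toy oracle: "optimal" iff c is fixed, or both a and b are fixed.
def toy_oracle(fixed):
    return "c" in fixed or {"a", "b"} <= fixed


print(minimal_fixing_sets({"a", "b", "c", "d"}, toy_oracle))
print(minimal_fixing_sets({"a", "b", "c", "d"}, toy_oracle, base={"a"}))
```

With 20 amino acids there are \(2^{20}\) (about \(10^6\)) candidate subsets, each needing an expensive optimality check, so in practice the pruning and a fast oracle matter far more than this simple loop.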
About this article
Cite this article
Buhrman, H., van der Gulik, P.T.S., Klau, G.W. et al. A Realistic Model Under Which the Genetic Code is Optimal. J Mol Evol 77, 170–184 (2013). https://doi.org/10.1007/s00239-013-9571-2

