Crick Wobble and Superwobble in Standard Genetic Code Evolution

Yarus, Michael

doi:10.1007/s00239-020-09985-7

Original Article
Open access
Published: 07 January 2021

Volume 89, pages 50–61, (2021)
Journal of Molecular Evolution

Michael Yarus (ORCID: 0000-0003-0295-9795)

Abstract

Wobble coding is inevitable during evolution of the Standard Genetic Code (SGC). It ultimately splits half of NN U/C/A/G coding boxes with different assignments. Further, it contributes to pervasive SGC order by reinforcing close spacing for identical SGC assignments. But wobble cannot appear too soon, or it will inhibit encoding and more decisively, obstruct evolution of full coding tables. However, these prior results assumed Crick wobble, NN U/C and NN A/G, read by a single adaptor RNA. Superwobble translates NN U/C/A/G codons, using one adaptor RNA with an unmodified 5′ anticodon U (appropriate to earliest coding) in modern mitochondria, plastids, and mycoplasma. Assuming the SGC was selected when evolving codes most resembled it, characteristics of the critical selection events can be calculated. For example, continuous superwobble infrequently evolves SGC-like coding tables. So, continuous superwobble is a very improbable origin hypothesis. In contrast, late-arising superwobble shares late Crick wobble’s frequent resemblance to SGC order. Thus late superwobble is possible, but yields SGC-like assignments less frequently than late Crick wobble. Ancient coding ambiguity, most simply, arose from Crick wobble alone. This is consistent with SGC assignments to NAN codons.

Introduction

Calculation of the Evolution of Individual Coding Tables

Information below comes from simulation of the process of SGC evolution (see ‘Methods’). An era of early triplet assignment, decay and capture of new triplets is followed to a finished code. Time elapses in passages, computer visits to an evolving genetic code table (Yarus 2020a), which are proportional to real-world time. During a passage initial assignments, decays, and mutational capture of new triplets occur with assigned probabilities. Repeated passages yield complete coding tables (with all 22 functions), full coding tables (with 64 triplets assigned), and even near-full codes that are also near completion, proximal to the SGC itself (Yarus 2020b). Variation of the rules and probabilities for codon assignment allows calculation of evolved SGC frequencies (see ‘Methods’). Such frequencies determine how many independent codes would have to be examined to find an SGC-like code (Yarus 2020b). One can thereby seek the most likely route to SGC-like codes. Here, Crick wobble (Crick 1966) and superwobble (Rogalski et al. 2008) are compared in this way.
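The passage loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the published program: the probabilities `p_init`, `p_decay` and `p_capture`, the restriction of capture to free triplets one mutation away, and all function names are hypothetical choices made here for clarity. Wobble is omitted.

```python
import itertools
import random

BASES = "UCAG"
TRIPLETS = ["".join(t) for t in itertools.product(BASES, repeat=3)]
N_FUNCTIONS = 22  # 20 amino acids plus start and stop, as in the text


def neighbors(triplet):
    """Triplets that differ from `triplet` at exactly one position."""
    return [triplet[:pos] + base + triplet[pos + 1:]
            for pos in range(3) for base in BASES if base != triplet[pos]]


def passage(table, rng, p_init=0.6, p_decay=0.04, p_capture=0.04):
    """One visit to an evolving table; all probabilities are assumed values."""
    if rng.random() < p_init:        # initial assignment of a free triplet
        free = [t for t in TRIPLETS if table[t] is None]
        if free:
            table[rng.choice(free)] = rng.randrange(N_FUNCTIONS)
    if rng.random() < p_decay:       # decay of an existing assignment
        used = [t for t in TRIPLETS if table[t] is not None]
        if used:
            table[rng.choice(used)] = None
    if rng.random() < p_capture:     # capture of a free single-mutation neighbor
        used = [t for t in TRIPLETS if table[t] is not None]
        if used:
            donor = rng.choice(used)
            targets = [n for n in neighbors(donor) if table[n] is None]
            if targets:
                table[rng.choice(targets)] = table[donor]


def evolve(n_passages, seed=0):
    """Evolve one coding table from an empty start for `n_passages` visits."""
    rng = random.Random(seed)
    table = {t: None for t in TRIPLETS}
    for _ in range(n_passages):
        passage(table, rng)
    return table


def summarize(table):
    """Return (triplets assigned, distinct functions encoded)."""
    assigned = sum(v is not None for v in table.values())
    encoded = len({v for v in table.values() if v is not None})
    return assigned, encoded
```

Running `evolve` many times with different seeds and tabulating `summarize` at each passage count would give the kind of population averages discussed in the Results, though with these placeholder probabilities the numbers would not match the published figures.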

First-Hand Information on Code Evolution

The genetic code evolves. Many evolutionarily recent departures from the near-universal code are known (Jukes and Osawa 1993), though only a minority of codon assignments have been seen to change. Often, universal stop codons are modified (Osawa and Jukes 1989). Limited observable change is understandable among a complex biota which must compete with other highly selected systems, so that code change is rare. Modern code evolution is therefore said to be “frozen” (Crick 1968), though it might be called chilled. Nonetheless, modern changes offer important information. Altered assignments define practical variations and thereby indicate low barriers that evolutionary revision might cross. Such indicators have their limits: They are most informative about the current nucleoprotein-based code, because they occurred in the molecular context of the modern SGC. So modern coding offers the most explicit information only about terminal stages of coding evolution. This is consistent with repeated recoding of stop codons, whose definitive encoding must have been late, after domain separation (Burroughs and Aravind 2019). However, modern changes necessarily also reflect the logic of coding itself, offering indirect guidance about the course of the likely ancestral, RNA-based code. Accordingly, modern coding variations are retroevolutionary pointers, defining usable routes toward the SGC.

Examples of Change: Reassigned Termination Codons

A protist parasite of insects, Blastocrithidia, has reassigned UAA, UAG and UGA, thus altering all its ‘universal’ stop codons (Záhonová et al. 2016). Apparently, UAA and UAG can be translated as both glutamine and stop, while UGA has become a tryptophan codon. Terminal mRNA structure may determine when ambiguous translation as stop rather than an amino acid occurs (Swart et al. 2016). Similar ambiguous stop translation is common today, seen even in metazoa, as for hundreds of Drosophila genes (Jungreis et al. 2011).

Examples of Change: Reassigned Termination Codons and New Amino Acids

Eubacterial selenium-containing enzymes have active sites translated using the ‘universal’ UGA stop as a codon for selenocysteine (the 21st amino acid). Encoding requires a dedicated aminoacyl-tRNA and special translation factor (Zinoni et al. 1990). Similarly, the Archaeal methanogen Methanosarcina uses the ‘universal’ UAG stop codon to co-translationally insert pyrrolysine (the 22nd amino acid) using a dedicated aminoacyl-RNA synthetase and tRNA (Polycarpo et al. 2004).

Examples of Change: Unassigned Amino Acid Codons

The Gram-positive bacterium Mycoplasma capricolum has no adaptor to translate ‘universal’ CGG arginine (Andachi et al. 1989).

Examples of Change: Reassigned Amino Acid Codons

The eukaryotic yeast Candida translates cytoplasmic ‘universal’ CUG leucine codons as serine, using a tRNA^Ser mutated to pair with the leucine codon CUG (Santos et al. 2011). The altered tRNA is mostly charged with serine, but is also acylated with a small minority of leucine. Coding reassignment may depend on evolutionary pressure from changing DNA base composition (Jukes and Osawa 1993) and/or an intermediate ambiguous encoding (Schultz and Yarus 1994). Such ambiguity is documented for Candida (Santos et al. 2011) and Blastocrithidia (Záhonová et al. 2016).

Examples of Change: Unassigned Amino Acid Codons and Termination Codons

The complete genome of the bacterium E. coli has been replaced with synthetic DNA, making no use of ‘universal’ UCA and UCG serine, and simultaneously removing ‘universal’ UAG stops. The resulting bacterium has three unused codons, as a result of 1.8 × 10⁴ genomic codon changes. This is particularly impressive, because no overt functional selection was applied. In minimal growth medium at 37 °C, the recoded cell is quite competent, doubling in 1.7 × the parental bacterium’s time (Fredens et al. 2019). Thus, partial codes, even when they do not meet a selected requirement, are viable and functional: that is, legitimate evolutionary intermediates. In fact, the altered E. coli code resembles a computed evolutionary intermediate with unassigned sense and stop codons (Yarus 2020b).

Alternate Wobbles

Informative coding changes extend beyond assignments, including also changed coding machinery. RNA adaptors, like aminoacyl-tRNAs, can pair to and translate more than one template codon using alternative base pairing, first recognized and called wobble by Francis Crick (Crick 1966) shortly after the genetic code was defined (e.g., Nirenberg et al. 1963). Nucleotide modifications enable a variety of such pairs with third codon nucleotides in modern coding (Grosjean and Westhof 2016). However, if one accepts a limitation to unmodified nucleotides, whose universal modern use makes a strong argument for ancient presence in the code, primordial wobble would include pairing to NN U/C and NN A/G codons, based on Crick’s (Crick 1966) G:U and U:G wobble pairs. Here this is termed Crick wobble, though this naming neglects Crick’s inosine wobble, because inosine is a modified A (as in Bass and Weintraub 1988).

Superwobbles

Yeast Saccharomyces mitochondria (Bonitz et al. 1980) and fungal Neurospora mitochondria (Heckman et al. 1980) have only one tRNA to translate unmixed family boxes; that is, with all four codons NN U/C/A/G assigned to a single amino acid. For example, all alanine GC U/C/A/G translation is carried out with a single tRNA, having an unmodified U at its anticodon wobble position. Sometimes called ‘superwobble’, the same wobble system appears in bacterial Mycoplasma (Andachi et al. 1989) and tobacco Nicotiana plastids (Rogalski et al. 2008).
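The difference between the two reading schemes can be made concrete with a small Python sketch. It assumes unmodified nucleotides only, with anticodons and codons both written 5′ to 3′, so that the first anticodon base pairs with the third codon base. The names `CRICK`, `SUPERWOBBLE` and `codons_read` are illustrative, not taken from the cited work.

```python
# Watson-Crick complements for the first two codon positions.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Third-position codon bases read by each unmodified anticodon wobble base.
# Crick wobble: G reads U/C and U reads A/G (the G:U and U:G wobble pairs).
CRICK = {"G": "UC", "U": "AG", "C": "G", "A": "U"}

# Superwobble: an unmodified wobble U reads all four third-position bases.
SUPERWOBBLE = dict(CRICK, U="UCAG")


def codons_read(anticodon, rules=CRICK):
    """Codons (5'->3') read by an anticodon (5'->3') under the given rules."""
    wobble, middle, third = anticodon
    prefix = COMPLEMENT[third] + COMPLEMENT[middle]
    return [prefix + base for base in rules[wobble]]


print(codons_read("GGC"))               # ['GCU', 'GCC']
print(codons_read("UGC"))               # ['GCA', 'GCG']
print(codons_read("UGC", SUPERWOBBLE))  # ['GCU', 'GCC', 'GCA', 'GCG']
```

For the alanine box mentioned above, Crick wobble needs two adaptors (anticodons GGC and UGC) to cover GC U/C/A/G, whereas a single UGC adaptor suffices under superwobble.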

The genetic mechanism has been extensively worked out in tobacco plastids (Alkatib et al. 2012). In plastids, superwobble always exists in unmixed family boxes. However, superwobble translation alone is inefficient compared with translation by pairs of Crick-wobbling tRNAs, or by Crick wobble for NN U/C together with overlapping superwobble (Rogalski et al. 2008). Superwobble would also be strikingly appropriate for primordial coding: Simpler adaptor sets suffice to cover 20 assigned functions (van der Gulik and Hoff 2011), are suited to fewer expressed genes, and are appropriate for reduced levels of gene products (Vernon et al. 2001). An emerging genetic code plausibly also required a simplified translation apparatus, expressing only a few functions, and initially might not demand exceptional amounts of product. There is also a more specific rationale for superwobble. Continuous Crick wobble evolution has intrinsic difficulty evolving full codes, with all triplets assigned (Yarus 2020a). Superwobble, which assigns four codons at once rather than one or two, might increase wobble assignments via greater rates, extents, or both.

Results

Late Crick Wobble

The panels of Figs. 1, 2 and 3 compare average kinetics for coding table evolution following three different histories (Yarus 2020a). In Fig. 1a, late Crick wobble history is used: This implies that the translational mechanics required to make third-position wobble specific and accurate (Moazed and Noller 1986; Ogle and Ramakrishnan 2005) evolve only after an initial group of single-triplet assignments. Thereafter, Crick wobble is quickly adopted wherever possible in the nascent code (Fig. 1a). Such late adoption of wobble is the preferred path to SGC-like codes, because it easily evolves full coding tables, and also allows more frequent access to SGC-like codes (Yarus 2020a). The alternative to late wobble is continuous wobble, where wobble exists throughout code evolution (Yarus 2020a).

For late Crick wobble, pyrimidine- and purine-ending codon groups, NN U/C and NN A/G, have the same assignment, but pyrimidine-ending codons can have different assignments from the purine-encoded triplets (Crick 1966). Such evolution (Yarus 2020a) easily approaches a full coding table (“assigned”, Fig. 1a) while simultaneously attaining coding capacity for 20 or more functions (“ ≥ 20 fn”, Fig. 1a), which becomes significant in a population after 60 passages. The average code evolves to a serviceable semifinal state, with sufficient codons left unassigned for later-evolving initiation and termination functions (Yarus 2020a), and perhaps a delayed amino acid (“encoded”, Fig. 1a).

Superwobble Implementation

To emulate modern superwobble (Alkatib et al. 2012), Crick wobble and superwobble overlap in the event called “superwobble” here. That is, a newly assigned triplet can adopt Crick wobble, given that its wobble partner is free for such coding. If the other two triplets in its family box are also free, then it can expand to be translated by superwobble, creating identical assignments for NN U/C/A/G. But if either of the additional two triplets is already assigned, then coding stops at Crick wobble: NN U/C or NN A/G. To complete this assignment list, a triplet assigned a unique meaning during a pre-wobble era can also retain it, persisting as a single, non-wobbling codon (possibly with a differently assigned neighbor) into the later post-wobble era. When an assignment decays, its absence frees all triplets previously read for reassignment.
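The assignment rule just described can be written out as a short Python sketch. It is a reading of the prose above, not the author's code: the dictionary representation (only assigned triplets are stored), the function names `assign` and `decay`, and the choice to leave a triplet single when its Crick partner is occupied are assumptions made here.

```python
BASES = "UCAG"
# Crick wobble partners at the third codon position: U with C, A with G.
WOBBLE_PARTNER = {"U": "C", "C": "U", "A": "G", "G": "A"}


def assign(table, triplet, function, superwobble=True):
    """Assign `function` to `triplet`, expanding by wobble where allowed.

    `table` maps triplet -> function and holds assigned triplets only.
    Returns the list of triplets read by the newly assigned adaptor.
    """
    if triplet in table:
        raise ValueError(f"{triplet} is already assigned")
    box, third = triplet[:2], triplet[2]
    read = [triplet]
    partner = box + WOBBLE_PARTNER[third]
    if partner not in table:
        read.append(partner)                    # Crick wobble: NN U/C or NN A/G
        others = [box + b for b in BASES if box + b not in read]
        if superwobble and all(t not in table for t in others):
            read.extend(others)                 # superwobble: NN U/C/A/G
    for t in read:
        table[t] = function
    return read


def decay(table, read):
    """Free every triplet that the decayed adaptor had been reading."""
    for t in read:
        table.pop(t, None)
```

For example, `assign({}, "GCU", "Ala")` fills the whole GC box, while the same call on a table that already holds GCA stops at GCU and GCC. Calling `assign` with `superwobble=False` reproduces Crick wobble alone.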

Late Superwobble

Figure 1b presents mean results of superwobble implementation at the cited times, in passages. The results are much like Fig. 1a, for Crick wobble. However, because each use of superwobble can assign four codons at a time, more codons are occupied (“assigned”, Fig. 1b) just after 60 passages, when codes with near-complete coding capacity (“ ≥ 20 fn”) begin to appear. Later behavior of Crick wobble and superwobble is nonetheless similar, with full coding tables and near-complete coding appearing for both histories.

Continuous Superwobble

Continuous superwobble, existing from the initiation of code evolution (Fig. 1c), is very different from late Crick and late superwobble, above. Marked differences appear in average codons occupied (“assigned”, Fig. 1c), in functions coded (“encoded”, Fig. 1c), and ultimately, in acquisition of near-full coding capacity (“ ≥ 20 fn”, Fig. 1c). All these indices of progress toward SGC capabilities are diminished or slowed.

Assignment of triplets does not approach full coding. Further, this average deficit stabilizes within the Figure. It is a property of the near-steady state—even given time, full assignment will not occur (Fig. 1c).

Capacity for near-complete encoding, ≥ 20 functions, accumulates very slowly. To make its kinetics visible, it is plotted at 1 × and 10 × its observed value in Fig. 1c. Whereas late Crick wobble and late superwobble populations evolve to more than 77% near-complete coding in the early times shown in Fig. 1, continuous superwobble allows ≈ 200-fold less accumulated capacity.

Mean encoded functions reach about 15.8 of 22 amino acids/start/stop in Fig. 1c and this value is near-steady; it will not improve greatly. This is not just true of the mean; even the complete tables at the upper tail of the distribution are quite rare at ≈ 1 in 10⁶.

The Difficulty with Continuous Superwobble

Figure 2 shows why the continuous wobble deficiency exists. It plots the fraction of coding tables that became capable of encoding 20 functions in bins of 25 passages, out to times of 2000 passages. Figure 2a is the relevant plot for any late wobble history, either Crick- or superwobble. It shows the acquisition of near-complete coding capacity during the early period of non-wobbling common to either late-wobbling scheme. Notably, near-complete coding occurs at a sharply defined early time. Late wobble evolution therefore quickly acquires, and virtually always confers, near-full coding capacity.

In contrast, Fig. 2b, for continuous superwobble, shows that coding capacity is delayed, and its average acquisition is at far later times than that for late wobble. Therefore, the probability that code evolution will reach this goal is small, at times when late wobble has already established near-full coding capacity.

Coding Capacity and SGC-Like Assignments Together

One can evolve capacity to encode all functions and still not be SGC-like, if assignments differ from the standard code. Therefore, to evaluate an evolutionary history one wants to know how often a scheme yields coding capacity and SGC-like assignments together. These data are in Fig. 3, for the same range of early times as in Figs. 1 and 2.

Coding Capacity with Accurate Assignment During Late Crick Wobble

Figure 3a shows joint competence for late Crick wobble alone, thus overlapping previously presented data (Yarus 2020a, 2020b). The plot for mean coding capacity, ≥ 20 functions, from Fig. 1a is shown again to facilitate comparisons. Coding capacity accompanied by accurate assignments is plotted in six accompanying curves.

Five of these plots result from counting assignments that differ from the SGC. Thus the data labeled “ ≥ 20 fn & mis ≤ 4” is the fraction of coding tables that encode 20 or more functions with less than or equal to 4 misassignments by comparison to the SGC. At their optimum, these capable ≤ 4 misassignment codes comprise 0.0153 or 1.53% of all late-wobbling coding tables.

Notably, these data also descend to small, but finite values for “ ≥ 20 fn & mis = 0”, which represent elevated coding capacity with no differences at all from SGC codon assignments. These are rarer, as expected: 0.00008 or 0.008% of late-wobbling evolutions.
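A criterion such as “ ≥ 20 fn & mis ≤ 4” can be illustrated with a minimal Python sketch. The standard code table is built from the conventional one-letter listing in U, C, A, G order. Two simplifications are assumptions of this sketch rather than statements about the published calculation: unassigned triplets (`None`) are not counted as misassignments, and start is not tracked separately from methionine, so the SGC here has 21 distinct meanings rather than 22 functions.

```python
import itertools

BASES = "UCAG"
# One-letter amino acids for UUU, UUC, UUA, UUG, UCU, ... GGG; '*' is stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
SGC = {"".join(c): a
       for c, a in zip(itertools.product(BASES, repeat=3), AMINO)}


def misassignments(table):
    """Count assigned triplets whose meaning differs from the SGC."""
    return sum(1 for t, f in table.items()
               if f is not None and f != SGC[t])


def functions_encoded(table):
    """Count distinct meanings present among assigned triplets."""
    return len({f for f in table.values() if f is not None})


def is_sgc_like(table, min_functions=20, max_mis=4):
    """True if the table meets both the capacity and accuracy thresholds."""
    return (functions_encoded(table) >= min_functions
            and misassignments(table) <= max_mis)


# A Candida-like variant: CUG read as serine instead of leucine.
variant = dict(SGC, CUG="S")
print(misassignments(variant))   # 1
print(is_sgc_like(variant))      # True
```

Setting `max_mis=0` gives the strictest class, in which every assigned triplet matches the SGC.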

Finally, evolution of joint competence is evaluated for encoding of 20 or more functions along with previous indices of SGC-like order (Yarus 2020a). Rather than counting misassignments, order is measured via SGC-like spacing in identical assignments, close spacing of assignments with similar side chain chemistry (Woese 1965; Mathew and Luthey-Schulten 2008), and mutational distance from the SGC. To be accounted “close”, a coding table must be ≥ 90% the distance from random codes to the SGC, for all three progress values (termed “jpr (joint progress) ≥ 0.9”). This is the topmost plot, showing “ ≥ 20 fn & jpr ≥ 0.9” achieved in 0.0183 or 1.83% of all evolutions.

Notably, coding capacity with accurate assignments and coding capacity with SGC-like order have overlapping maxima at an early time, as previously pointed out (Yarus 2020b). Both capacity-plus-order criteria then decrease at later times. So, there is an early optimal era during which late Crick-wobbling coding tables most resemble the SGC itself, using indices of both SGC-like order and codon assignment (Yarus 2020b).

Distribution Fitness for Late Crick Wobble

At the 120 passage maximum, coding capacity with SGC-like order exists in ≈ 1.8% of code evolutions; ≈ 1.5% have coding capacity with ≤ 4 differences from the SGC, 0.64% have capacity with ≤ 3 differences, and so on down to 0.008% with SGC assignments only. This defines a varied population that can be tested to select a code. The property called distribution fitness (Yarus 2020a) for late Crick wobble is established; very close relatives of the SGC are available. This is significant in itself, but also, such data from times across the optimum argue that if the SGC arose during the era when evolving codes most resemble the SGC, then a nascent SGC could have resembled codes evolved here (see 'Discussion').
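These frequencies translate directly into the number of independent codes that would have to be examined to find one of a given class. The short Python sketch below assumes independent evolutions, each succeeding with the stated frequency, so that the count follows a geometric distribution. The function names and the 95% confidence level are choices made here for illustration.

```python
import math


def expected_codes(frequency):
    """Mean number of independent codes examined to find the first success."""
    return 1.0 / frequency


def codes_for_confidence(frequency, confidence=0.95):
    """Smallest number of codes giving at least `confidence` of one success."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))


# Frequencies at the late Crick wobble optimum, as quoted in the text.
optimum = {
    ">= 20 fn & jpr >= 0.9": 0.0183,
    ">= 20 fn & mis <= 4": 0.0153,
    ">= 20 fn & mis <= 3": 0.0064,
    ">= 20 fn & mis = 0": 0.00008,
}
for label, f in optimum.items():
    print(f"{label}: mean {expected_codes(f):.0f}, "
          f"95% chance within {codes_for_confidence(f)}")
```

On this reading, a code with no differences from the SGC appears on average once in about 12,500 independent late Crick wobble evolutions, whereas a code with four or fewer differences appears once in fewer than a hundred.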

Coding Capacity with Accurate Assignment During Late Superwobble

Figure 3b shows data paralleling Fig. 3a, but for late superwobble rather than late Crick wobble. As in Fig. 1a, b, the two sets of data are somewhat similar. Crick and superwobble data are plotted against the same set of ordinates, and accompanied by their similar ≥ 20 function plots, to facilitate such comparison.

Notably, while progress values (order) and coding capacity are similar for the two histories, assignment accuracy differs. Superwobble reproducibly yields less accurate assignments. This difference is only slightly varied among 0, ≤ 1, ≤ 2, ≤ 3 or ≤ 4 misassignments, so Crick wobble evolution yields, on average, an optimum of ≈ 1.7-fold more frequent SGC-like assignment than superwobble at all levels of accuracy. In particular, this applies to the no-error, mis = 0 assignment identity class—1.6-fold more frequent for Crick wobble than for superwobble.

Coding Capacity with Accurate Assignment During Continuous Superwobble

Figure 3c parallels the first panels of Fig. 3 for late Crick and superwobble, but is instead computed for continuous superwobble as coding history. To appreciate the differences, note that ordinates in Fig. 3c are smaller than in the rest of Fig. 3; smaller by large, order-of-magnitude factors. Continuous superwobble radically reduces both coding capacity (as observed in Figs. 1c and 2b), and the resulting abundance of capable, accurately assigned coding tables. This deficit appears whether accuracy is assessed as overall order (joint progress; “ ≥ 20 fn & jpr ≥ 0.9”, Fig. 3c) or as literal assignment accuracy (“ ≥ 20 fn & mis…”, Fig. 3c).

An optimal time does not exist for continuous superwobble in the same sense as for late Crick and late superwobble histories (Fig. 3c). Varying amounts of wobble when it is instituted at different times create the optimum for late Crick wobble (Yarus 2020b) and late superwobble (Fig. 3b). Continuous superwobble does not share a comparable effect. But again, constant superwobble’s net effect is similar when measured at different levels of assignment accuracy (cf. Fig. 3a, c). So its effect can be summarized: continuous superwobble depresses the evolution of combined coding capacity and assignment accuracy, with respect to best late Crick wobble, by ≅ 100-fold.

Discussion

Here late Crick wobble, late superwobble and continuous superwobble are compared (see ‘Methods’), quantifying their effects on evolving coding tables. These effects are assessed throughout an early era when coding approaches the ≥ 20 function capacity required for an SGC (Fig. 1). The emphasis is: Does superwobble (NN U/C/A/G translation by one adaptor) aid SGC-like evolution?

Previous Implications are Strengthened

Comparison of continuous and late superwobble parallels prior work (Yarus 2020a), where continuous Crick wobble and late Crick wobble were compared. Late Crick wobble previously appeared superior, because it both allowed fuller coding, and created more frequent access to the SGC. Here again, late superwobble allows fuller coding (Fig. 1b, c) than does continuous superwobble, and also much more frequent access to full, accurate, SGC-like assignments (Figs. 2b and 3b, c). Moreover, while the greater span of superwobble coding ambiguity can slightly increase early assignment (Fig. 1a, b), it does not correct continuous wobble’s deficit in near-steady-state assignments (Fig. 1c). Finally, though late superwobble shares late Crick wobble’s approach to full and complete coding (Fig. 1a, b), its quadruple assignments do not increase overall code order (Fig. 3a, b). Continuous superwobble actually decreases codon assignment accuracy, measured as SGC-like assignments in near-complete codes (Fig. 3a–c). One’s impression is: wobble helps structure the code (Yarus 2020a), but too much such help is counterproductive. The best wobble is the least that is sufficient. Late wobble is better than continuous wobble, and Crick wobble is better than superwobble.

Notably, late superwobble shares late Crick wobble’s early maximum (≈ 120 passages), when both overall code order and accurate assignments appear maximally and nearly simultaneously (Yarus 2020b; Fig. 3a, b). Early selection of an SGC-like code, when it is most prevalent, is strengthened by these data, showing that such an optimum exists for different late wobble systems.