Abstract
The genetic code was likely complete in its current form by the time of the last universal common ancestor (LUCA). Several scenarios have been proposed to explain the code’s pre-LUCA emergence and expansion, and the relative order in which the amino acids used in translation appeared. One co-evolutionary model of genetic code expansion proposes that at least some amino acids were added to the code by the ancient divergence of aminoacyl-tRNA synthetase (aaRS) families. Of all the amino acids used within the genetic code, Trp is most frequently claimed to be a relatively recent addition. We observe that, since TrpRS and TyrRS are paralogous protein families retaining significant sequence similarity, the inferred sequence composition of their ancestor can be used to evaluate this co-evolutionary model of genetic code expansion. We show that ancestral sequence reconstructions of the pre-LUCA paralog ancestor of TyrRS and TrpRS have several sites containing Tyr, yet a complete absence of sites containing Trp. This is consistent with the paralog ancestor being specific for the utilization of Tyr, with Trp being a subsequent addition to the genetic code facilitated by a process of aaRS divergence and neofunctionalization. Only after this divergence could Trp be specifically encoded and incorporated into proteins, including the TyrRS and TrpRS descendant lineages themselves. This early absence of Trp is observed under both homogeneous and non-homogeneous models of ancestral sequence reconstruction. Simulations indicate that this observed absence of Trp is unlikely to be due to chance or model bias. These results suggest that the final stages of genetic code evolution occurred well within the “protein world,” and that the presence or absence of Trp within conserved sites of ancient protein domains is a likely measure of their relative antiquity, permitting the relative timing of extremely early events within protein evolution before LUCA.
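The chance argument in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' simulation procedure: it assumes every reconstructed site is independent and carries Trp with the same background frequency, and the function names, the frequency, and the number of sites are hypothetical values chosen only for illustration.

```python
import random


def prob_no_trp(trp_freq: float, n_sites: int) -> float:
    """Closed-form probability that none of n_sites independent sites is Trp.

    Assumes each site is Trp with the same probability trp_freq
    (a simplification, not the model used in the paper).
    """
    return (1.0 - trp_freq) ** n_sites


def simulate_no_trp(trp_freq: float, n_sites: int,
                    n_trials: int = 10000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability under the same assumptions."""
    rng = random.Random(seed)
    trials_without_trp = 0
    for _ in range(n_trials):
        # A site counts as Trp when the uniform draw falls below trp_freq.
        if all(rng.random() >= trp_freq for _ in range(n_sites)):
            trials_without_trp += 1
    return trials_without_trp / n_trials


if __name__ == "__main__":
    # Hypothetical inputs: a 1% background Trp frequency over 300 sites.
    print(prob_no_trp(0.01, 300))
    print(simulate_no_trp(0.01, 300))
```

Under these simplified assumptions, the probability of a Trp-free ancestor falls geometrically with the number of sites, which is why a complete absence across a long alignment is informative. Model bias is a separate concern that a calculation of this kind does not address.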
Acknowledgments
This work was supported by the National Science Foundation Grant 0936234, NASA Astrobiology Institute Grant NNA08CN84A, and an appointment from the NASA Postdoctoral Program to GPF at the Massachusetts Institute of Technology. We thank Mathieu Groussin and Bastien Boussau for helpful discussions and their assistance with implementing non-homogeneous ancestral reconstruction models.
Conflict of interest
The authors declare that they have no conflict of interest.
Electronic supplementary material
Below is the link to the electronic supplementary material.
239_2015_9672_MOESM1_ESM.xlsx
Homogeneous ancestral reconstruction of TyrRS/TrpRS paralog ancestor, per-site amino acid probabilities. Sites within majority topology regions are labeled V (vertical); sites within proposed partial HGT regions are labeled R1-R3 (recombined). Combined alignment sites refer to the native sequence alignment, and match the numbering used throughout the manuscript. Topology-specific alignment sites refer to the provided FASTA format sequence alignments for each topology region (V, R1, R2, R3). As R regions were removed from the V alignment, these numberings differ. Numbering in the “prob” column headings refers to each internal node reconstruction. The mapping of these nodes to the phylogeny is provided in Online Resource 11. For each node, “max” refers to the maximum-likelihood AA for each site. Supplementary material 1 (XLSX 10629 kb)
239_2015_9672_MOESM2_ESM.xlsx
Non-homogeneous ancestral reconstruction of TyrRS/TrpRS paralog ancestor, per-site amino acid probabilities. Sites within majority topology regions are labeled V (vertical); sites within proposed partial HGT regions are labeled R1-R3 (recombined). Combined alignment sites refer to the native sequence alignment, and match the numbering used throughout the manuscript. Topology-specific alignment sites refer to the provided FASTA format sequence alignments for each topology region (V, R1, R2, R3). As R regions were removed from the V alignment, these numberings differ. Numbering in the “prob” column headings refers to each internal node reconstruction. The mapping of these nodes to the phylogeny is provided in Online Resource X. For each node, “max” refers to the maximum-likelihood AA for each site. Supplementary material 2 (XLSX 10649 kb)
239_2015_9672_MOESM4_ESM.fasta
FASTA format alignment of TyrRS/TrpRS protein sequences, with partial HGT regions removed. Supplementary material 4 (FASTA 183 kb)
239_2015_9672_MOESM5_ESM.fasta
FASTA format alignment of TyrRS/TrpRS protein sequences, proposed partial HGT region R1. Supplementary material 5 (FASTA 7 kb)
239_2015_9672_MOESM6_ESM.fasta
FASTA format alignment of TyrRS/TrpRS protein sequences, proposed partial HGT region R2. Supplementary material 6 (FASTA 13 kb)
239_2015_9672_MOESM7_ESM.fasta
FASTA format alignment of TyrRS/TrpRS protein sequences, proposed partial HGT region R3. Supplementary material 7 (FASTA 5 kb)
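The per-site probability tables described above lend themselves to a simple summary: for a given internal node, take the maximum-likelihood residue at each site and count how often it is Tyr or Trp. The sketch below assumes the probabilities for one node have already been loaded into a list of dictionaries (one per site, mapping one-letter amino acid codes to posterior probabilities). It does not parse the XLSX files, and the function names, the probability threshold, and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

# One site: one-letter amino acid code -> posterior probability at a node.
SiteProbs = Dict[str, float]


def ml_state(site: SiteProbs) -> Tuple[str, float]:
    """Return the maximum-likelihood residue at a site and its probability."""
    residue = max(site, key=site.get)
    return residue, site[residue]


def count_ml_sites(sites: List[SiteProbs], residue: str,
                   min_prob: float = 0.5) -> int:
    """Count sites whose ML state is `residue` with probability >= min_prob."""
    count = 0
    for site in sites:
        best, prob = ml_state(site)
        if best == residue and prob >= min_prob:
            count += 1
    return count


def expected_count(sites: List[SiteProbs], residue: str) -> float:
    """Sum the posterior probability of `residue` over all sites."""
    return sum(site.get(residue, 0.0) for site in sites)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the supplementary files.
    toy_node = [
        {"Y": 0.9, "F": 0.1},
        {"L": 0.6, "I": 0.3, "W": 0.1},
        {"Y": 0.4, "F": 0.35, "H": 0.25},
    ]
    for aa in ("Y", "W"):
        print(aa, count_ml_sites(toy_node, aa), expected_count(toy_node, aa))
```

Reporting the summed posterior probability alongside the count of confident sites keeps low-probability Trp assignments visible, so that a zero count is not simply an artifact of the chosen threshold.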
Cite this article
Fournier, G.P., Alm, E.J. Ancestral Reconstruction of a Pre-LUCA Aminoacyl-tRNA Synthetase Ancestor Supports the Late Addition of Trp to the Genetic Code. J Mol Evol 80, 171–185 (2015). https://doi.org/10.1007/s00239-015-9672-1
DOI: https://doi.org/10.1007/s00239-015-9672-1