The Phylogenomic Roots of Translation

  • Derek Caetano-Anollés
  • Gustavo Caetano-Anollés


The natural history of translation is mysterious but central to our understanding of the origin and evolution of biochemistry and life. tRNA is at the center of this biological process. Its interactions with aminoacyl-tRNA synthetase enzymes define the specificities of the genetic code and those with the ribosome their accurate biosynthetic interpretation. Here we review structural phylogenomic explorations of thousands of genomes and molecular structures that reveal a ‘metabolic-first’ origin of proteins, the early history of tRNA in interaction with cognate synthetase enzymes, the late appearance of a functional ribosome, and the co-evolutionary history of rRNA and proteins during ribosomal growth. We also discuss how the history of amino acid charging and codon specificities is embedded in tRNA and is encoded in genomes. Results uncover a hidden link between the genetic code and protein flexibility and suggest that tRNA molecules are building blocks of ribosomes and genomes. We make explicit the need to understand processes of molecular growth of macromolecules that would explain a primordial ribosome with both biocatalytic and genetic memory storage functions.


Keywords: Genetic code; tRNA molecule; remote homology; nucleic acid molecule; peptidyl transferase center
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Computational biology is supported by grants from NSF (OISE-1172791 and DBI-1041233) and USDA (ILLU-802-909) to GCA. DCA is the recipient of NSF postdoctoral fellowship award 1523549.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, Plön, Germany
  2. Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, USA
