Journal of Molecular Evolution, Volume 72, Issue 1, pp 14–33

Proteome Evolution and the Metabolic Origins of Translation and Cellular Life

  • Derek Caetano-Anollés
  • Kyung Mo Kim
  • Jay E. Mittenthal
  • Gustavo Caetano-Anollés


The origin of life has puzzled molecular scientists for over half a century. Yet fundamental questions remain unanswered, including which came first, the metabolic machinery or the encoding nucleic acids. In this study, we take a protein-centric view and explore the ancestral origins of proteins. Protein domain structures in proteomes are highly conserved and embody molecular functions and interactions that are needed for cellular and organismal processes. Here we use domain structure to study the evolution of molecular function in the protein world. Timelines describing the age and function of protein domains at fold, fold superfamily, and fold family levels of structural complexity were derived from a structural phylogenomic census in hundreds of fully sequenced genomes. These timelines unfold congruent hourglass patterns in rates of appearance of domain structures and functions, functional diversity, and hierarchical complexity, and reveal a gradual build-up of protein repertoires associated with metabolism, translation, and DNA, in that order. The most ancient domain architectures were hydrolase enzymes, and the first translation domains had catalytic functions for the aminoacylation and the molecular switch-driven transport of RNA. Remarkably, the most ancient domains had metabolic roles, did not interact with RNA, and preceded the gradual build-up of translation. In fact, the first translation domains also had a metabolic origin and were only later followed by specialized translation machinery. Our results explain how the generation of structure in the protein world and the concurrent crystallization of translation and diversified cellular life created further opportunities for proteomic diversification.


Origin of life; Phylogenetic analysis; Protein domain structure; Ribonucleoprotein world; RNA world



Aminoacyl-tRNA synthetase
Fold superfamily
Fold family
Node distance
Ribosomal protein
Structural classification of proteins



A substantial portion of this work is part of DCA’s undergraduate thesis. We thank Ajith Harish and Feng-Jie Sun for providing data on RNA-protein interactions, Minglei Wang for phylogenomic reconstruction, and Rakhee Kalelkar for help with construction of Z-diagrams. Research was supported by the National Science Foundation (MCB-0749836), the Illinois C-FAR program, CREES-USDA, and the International Atomic Energy Agency in Vienna. Any opinions, findings, conclusions, and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

Supplementary material

239_2010_9400_MOESM1_ESM.pdf (734 kb)
Supplementary material 1 (PDF 733 kb)



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Derek Caetano-Anollés (1, 2)
  • Kyung Mo Kim (1)
  • Jay E. Mittenthal (3)
  • Gustavo Caetano-Anollés (1)
  1. Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, USA
  2. School of Molecular and Cellular Biology, University of Illinois, Urbana, USA
  3. Department of Cell and Developmental Biology, University of Illinois, Urbana, USA
