Abstract
The origin of life has puzzled molecular scientists for over half a century. Yet fundamental questions remain unanswered, including which came first, the metabolic machinery or the encoding nucleic acids. In this study we take a protein-centric view and explore the ancestral origins of proteins. Protein domain structures in proteomes are highly conserved and embody molecular functions and interactions that are needed for cellular and organismal processes. Here we use domain structure to study the evolution of molecular function in the protein world. Timelines describing the age and function of protein domains at fold, fold superfamily, and fold family levels of structural complexity were derived from a structural phylogenomic census of hundreds of fully sequenced genomes. These timelines revealed congruent hourglass patterns in rates of appearance of domain structures and functions, functional diversity, and hierarchical complexity, as well as a gradual build-up of protein repertoires associated with metabolism, translation, and DNA, in that order. The most ancient domain architectures were hydrolase enzymes, and the first translation domains had catalytic functions for the aminoacylation and the molecular switch-driven transport of RNA. Remarkably, the most ancient domains had metabolic roles, did not interact with RNA, and preceded the gradual build-up of translation. In fact, the first translation domains also had a metabolic origin and were only later followed by specialized translation machinery. Our results explain how the generation of structure in the protein world and the concurrent crystallization of translation and diversified cellular life created further opportunities for proteomic diversification.
Abbreviations
- aRS: Aminoacyl-tRNA synthetase
- F: Fold
- FSF: Fold superfamily
- FF: Fold family
- Nd: Node distance
- r-Protein: Ribosomal protein
- SCOP: Structural classification of proteins
Acknowledgments
A substantial portion of this work is part of DCA’s undergraduate thesis. We thank Ajith Harish and Feng-Jie Sun for providing data on RNA-protein interactions, Minglei Wang for phylogenomic reconstruction, and Rakhee Kalelkar for help with construction of Z-diagrams. Research was supported by the National Science Foundation (MCB-0749836), the Illinois C-FAR program, CREES-USDA, and the International Atomic Energy Agency in Vienna. Any opinions, findings, conclusions, and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.
Cite this article
Caetano-Anollés, D., Kim, K.M., Mittenthal, J.E. et al. Proteome Evolution and the Metabolic Origins of Translation and Cellular Life. J Mol Evol 72, 14–33 (2011). https://doi.org/10.1007/s00239-010-9400-9