Journal of Molecular Evolution, Volume 74, Issue 1, pp 1–34

The Phylogenomic Roots of Modern Biochemistry: Origins of Proteins, Cofactors and Protein Biosynthesis

  • Gustavo Caetano-Anollés
  • Kyung Mo Kim
  • Derek Caetano-Anollés

DOI: 10.1007/s00239-011-9480-1

Cite this article as:
Caetano-Anollés, G., Kim, K.M. & Caetano-Anollés, D. J Mol Evol (2012) 74: 1. doi:10.1007/s00239-011-9480-1


The complexity of modern biochemistry developed gradually on early Earth as new molecules and structures populated the emerging cellular systems. Here, we generate a historical account of the gradual discovery of primordial proteins, cofactors, and molecular functions using phylogenomic information in the sequence of 420 genomes. We focus on structural and functional annotations of the 54 most ancient protein domains. We show how primordial functions are linked to folded structures and how their interaction with cofactors expanded the functional repertoire. We also reveal that protocell membranes played a crucial role in early protein evolution and show that translation started with RNA and thioester cofactor-mediated aminoacylation. Our findings allow elaboration of an evolutionary model of early biochemistry that is firmly grounded in phylogenomic information and biochemical, biophysical, and structural knowledge. The model describes how primordial α-helical bundles stabilized membranes, how these were decorated by layered arrangements of β-sheets and α-helices, and how these arrangements became globular. Ancient forms of aminoacyl-tRNA synthetase (aaRS) catalytic domains and ancient non-ribosomal protein synthetase (NRPS) modules gave rise to primordial protein synthesis and the ability to generate a code for specificity in their active sites. These structures diversified, producing cofactor-binding molecular switches and barrel structures. Accretion of domains and molecules gave rise to modern aaRSs, NRPS, and ribosomal ensembles, first organized around novel emerging cofactors (tRNA and carrier proteins) and then around more complex cofactor structures (rRNA). The model explains how the generation of protein structures acted as a scaffold for nucleic acids and resulted in the crystallization of modern translation.


Aminoacyl-tRNA synthetases; Non-ribosomal protein synthesis; Origin of life; Phylogenetic analysis; Protein domain structure; Ribonucleoprotein world



  • Aminoacyl-tRNA synthetase
  • Coenzyme A
  • Fold superfamily
  • Fold family
  • Node distance
  • Peptidyl carrier protein
  • Ribosomal protein
  • Structural classification of proteins

Supplementary material

Supplementary material 1: 239_2011_9480_MOESM1_ESM.doc (DOC, 1.21 MB)

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Gustavo Caetano-Anollés (1)
  • Kyung Mo Kim (2)
  • Derek Caetano-Anollés (1)

  1. Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, USA
  2. Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Korea