Journal of Molecular Evolution, Volume 74, Issue 1, pp 1-34

The Phylogenomic Roots of Modern Biochemistry: Origins of Proteins, Cofactors and Protein Biosynthesis

  • Gustavo Caetano-Anollés (corresponding author), Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois
  • Kyung Mo Kim, Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB)
  • Derek Caetano-Anollés, Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois

The complexity of modern biochemistry developed gradually on early Earth as new molecules and structures populated the emerging cellular systems. Here, we generate a historical account of the gradual discovery of primordial proteins, cofactors, and molecular functions using phylogenomic information in the sequences of 420 genomes. We focus on structural and functional annotations of the 54 most ancient protein domains. We show how primordial functions are linked to folded structures and how their interaction with cofactors expanded the functional repertoire. We also reveal that protocell membranes played a crucial role in early protein evolution, and we show that translation started with RNA and thioester cofactor-mediated aminoacylation. Our findings allow the elaboration of an evolutionary model of early biochemistry that is firmly grounded in phylogenomic information and in biochemical, biophysical, and structural knowledge. The model describes how primordial α-helical bundles stabilized membranes, how these were decorated by layered arrangements of β-sheets and α-helices, and how these arrangements became globular. Ancient forms of aminoacyl-tRNA synthetase (aaRS) catalytic domains and ancient non-ribosomal protein synthetase (NRPS) modules gave rise to primordial protein synthesis and to the ability to generate a code for specificity in their active sites. These structures diversified, producing cofactor-binding molecular switches and barrel structures. The accretion of domains and molecules gave rise to modern aaRSs, NRPSs, and ribosomal ensembles. These were first organized around novel emerging cofactors (tRNA and carrier proteins) and then around more complex cofactor structures (rRNA). The model explains how the generation of protein structures acted as a scaffold for nucleic acids and resulted in the crystallization of modern translation.


Aminoacyl-tRNA synthetases; Non-ribosomal protein synthesis; Origin of life; Phylogenetic analysis; Protein domain structure; Ribonucleoprotein world