Journal of Molecular Evolution, Volume 65, Issue 4, pp 425–436

Signature of a Primitive Genetic Code in Ancient Protein Lineages



The genetic code is the syntactic foundation underlying the structure and function of every protein in the history of the biological world. Its highly ordered degenerate complexity suggests an incremental evolution, the result of a combination of selective, mechanistic, and random processes. These evolutionary processes are still poorly understood and remain an open question in the study of early life on Earth. We perform a compositional analysis of ribosomal proteins and ATPase subunits in bacterial and archaeal lineages, using conserved positions that came and remained under purifying selection before and up to the most recent common ancestor. An observable shift in amino acid usage at these conserved positions likely provides an untapped window into the history of protein sequence space, allowing events of genetic code expansion to be identified. We identify Cys, Glu, Phe, Ile, Lys, Val, Trp, and Tyr as recent additions to the genetic code, with Asn, Gln, Gly, and Leu among the more ancient. Our observations are consistent with a scenario in which genetic code expansion primarily favored amino acids that promoted an increase in polypeptide size and functionality. We propose that this expansion would have been critical in the takeover of many RNA-mediated processes, as well as the addition of novel biological functions inaccessible to an RNA-based physiology, such as crossing lipid membranes. Thus, expansion of the genetic code likely set the stage for the transition from RNA-based to protein-based life.
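The kind of compositional analysis described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the conservation threshold, the toy alignment, and the comparison against a background composition are all assumptions made for illustration. It finds gap-free alignment columns that are conserved across sequences, tallies amino acid usage at those columns, and reports how that usage differs from a reference composition.

```python
from collections import Counter

def conserved_columns(alignment, threshold=1.0):
    """Indices of gap-free columns whose most common residue
    occurs in at least `threshold` (fraction) of the sequences."""
    n = len(alignment)
    cols = []
    for i, column in enumerate(zip(*alignment)):
        if "-" in column:
            continue  # skip columns containing gaps
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / n >= threshold:
            cols.append(i)
    return cols

def composition(alignment, columns):
    """Relative amino acid frequencies over the selected columns."""
    counts = Counter(seq[i] for seq in alignment for i in columns)
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()} if total else {}

def usage_shift(conserved, background):
    """Per-residue difference: conserved frequency minus background frequency."""
    residues = sorted(set(conserved) | set(background))
    return {aa: conserved.get(aa, 0.0) - background.get(aa, 0.0)
            for aa in residues}

# Hypothetical toy alignment (equal-length, pre-aligned sequences).
toy = ["GLNKG", "GLNRG", "GLN-G", "GLNKG"]
cols = conserved_columns(toy)          # columns 0, 1, 2 and 4
comp = composition(toy, cols)          # G: 0.5, L: 0.25, N: 0.25
all_cols = list(range(len(toy[0])))
background = composition([s.replace("-", "X") for s in toy], all_cols)
shift = usage_shift(comp, background)
```

In this sketch a positive value in `shift` marks a residue that is over-represented at conserved positions relative to the whole alignment. A real analysis would need a phylogenetically informed definition of conservation and a statistical test for the shift.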


Evolution of the genetic code; RNA world; Most recent common ancestor; Ribosome; Ribosomal proteins; ATPase; Origin

Supplementary material

239_2007_9024_MOESM1_ESM.pdf (322 kb)



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Department of Molecular and Cell Biology, University of Connecticut, Storrs, USA
