Abstract
Twenty-seven protein sequence elements, six to nine amino acids long, were extracted from 15 phylogenetically diverse complete prokaryotic proteomes. Each element is present in all of these proteomes in at least one copy (omnipresent elements) and has presumably been conserved since the last universal common ancestor (LUCA). All these omnipresent elements are identified in crystallized protein structures as parts of highly conserved closed loops, 25–30 residues long, thus representing the closed-loop modules discovered in 2000 by Berezovsky et al. The omnipresent peptides form seven distinct groups. The two largest, Aleph and Beth, contain 18 and four related but distinct elements, respectively, while each of the other five groups is represented by a single element. The LUCA modules appear in one or several copies per protein molecule, in a variety of combinations that depend on the functional identity of the corresponding protein. The functional involvement of individual LUCA modules is outlined on the basis of known protein annotations. Analyses of all the related sequences in a large, formatted protein sequence space suggest that many, if not all, of the 27 omnipresent elements have a common sequence origin. This sequence space network analysis may lead to elucidation of the earliest stages of protein evolution.
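The notion of an omnipresent element can be illustrated with a minimal sketch. The abstract does not describe the extraction procedure, so the code below is not the authors' method: it assumes exact string matching and a naive set intersection, and the function names (`kmers`, `proteome_kmers`, `omnipresent_elements`) and the three toy "proteomes" are invented for illustration only.

```python
from typing import Dict, List, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return every distinct substring of length k in one protein sequence."""
    # range() is empty when the sequence is shorter than k.
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def proteome_kmers(proteins: List[str], k: int) -> Set[str]:
    """Collect the k-mers found in at least one protein of a proteome."""
    found: Set[str] = set()
    for protein in proteins:
        found |= kmers(protein, k)
    return found


def omnipresent_elements(
    proteomes: Dict[str, List[str]], k_min: int = 6, k_max: int = 9
) -> Dict[int, Set[str]]:
    """For each length k, keep the k-mers present in every proteome."""
    result: Dict[int, Set[str]] = {}
    for k in range(k_min, k_max + 1):
        per_proteome = [proteome_kmers(p, k) for p in proteomes.values()]
        result[k] = set.intersection(*per_proteome) if per_proteome else set()
    return result


# Hypothetical toy data: invented sequences, not real proteomes.
toy = {
    "organism_A": ["MKGSGKSTLLAA", "MPQRST"],
    "organism_B": ["TTGSGKSTWW"],
    "organism_C": ["GSGKSTQ", "AAAA"],
}

elements = omnipresent_elements(toy)
for k, found in elements.items():
    print(k, sorted(found))
```

On real proteomes, holding every k-mer in memory is costly; an indexed structure such as a generalized suffix tree would be a more scalable choice for the same intersection.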
References
Aharonovsky E, Trifonov EN (2005) Protein sequence modules. J Biomol Struct Dynam 23:237–242
Berezovsky IN, Grosberg AY, Trifonov EN (2000) Closed loops of nearly standard size: common basic element of protein structure. FEBS Lett 466:283–286
Berezovsky IN, Kirzhner A, Kirzhner VM, Rosenfeld VR, Trifonov EN (2003a) Protein sequences yield a proteomic code. J Biomol Struct Dynam 21:317–325
Berezovsky IN, Kirzhner A, Kirzhner VM, Trifonov EN (2003b) Spelling protein structure. J Biomol Struct Dynam 21:327–339
Berezovsky IN, Trifonov EN (2002) Flowering buds of globular proteins: transpiring simplicity of protein organization. Comp Funct Genom 3:525–534
Frenkel ZM, Trifonov EN (2005) Closed loops of TIM barrel protein fold. J Biomol Struct Dynam 22:643–655
Frenkel ZM, Trifonov EN (2007a) Walking through protein sequence space. J Theor Biol 244:77–80
Frenkel ZM, Trifonov EN (2007b) Walking through the protein sequence space: towards new generation of the homology modeling. Proteins 67:271–284
Frenkel ZM, Trifonov EN (2007c) Evolutionary networks in the formatted protein sequence space. J Comput Biol 14:1044–1057
Gusfield D (1997) Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, Cambridge
Maynard Smith J (1970) Natural selection and the concept of a protein space. Nature 225:563–564
Sobolevsky Y, Trifonov EN (2005) Conserved sequences of prokaryotic proteomes and their compositional age. J Mol Evol 61:591–596
Sobolevsky Y, Trifonov EN (2006) Protein modules conserved since LUCA. J Mol Evol 63:622–634
Trifonov EN (2004) The triplet code from first principles. J Biomol Struct Dynam 22:1–11
Trifonov EN (2006a) Theory of early molecular evolution: predictions and confirmations. In: Eisenhaber F (ed) Discovering biomolecular mechanisms with computational biology. Landes Bioscience, Georgetown, pp 107–116
Trifonov EN (2006b) Early molecular evolution. Isr J Ecol Evol 52:375–387
Trifonov EN, Berezovsky IN (2003) Evolutionary aspects of protein structure and folding. Curr Opin Struct Biol 13:110–114
Trifonov EN, Gabdank I, Barash D, Sobolevsky Y (2006) Primordia vita. Deconvolution from modern sequences. Orig Life Evol Biosph 36:559–565
Trifonov EN, Kirzhner A, Kirzhner VM, Berezovsky IN (2001) Distinct stages of protein evolution as suggested by protein sequence analysis. J Mol Evol 53:394–401
Walker JE, Saraste M, Runswick MJ, Gay NJ (1982) Distantly related sequences in the alpha-subunits and beta-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold. EMBO J 1:945–951
Acknowledgments
This work was supported in part by ISF grant 710/02-19.0 and by a Center for Complexity Science grant (GR2006-018). Z.M.F. is also supported by the Ministry of Absorption. Comments and suggestions by colleagues of the Genome Diversity Center are highly appreciated.
Cite this article
Sobolevsky, Y., Frenkel, Z.M. & Trifonov, E.N. Combinations of Ancestral Modules in Proteins. J Mol Evol 65, 640–650 (2007). https://doi.org/10.1007/s00239-007-9032-x
DOI: https://doi.org/10.1007/s00239-007-9032-x