Abstract
A combinatorial sequence space (CSS) model was introduced in which a sequence is represented as a set of overlapping k-tuples of fixed length k, each k-tuple corresponding to a point in the CSS. The aim was to analyze the clusterization of protein sequences in the CSS and to test hypotheses about its possible evolutionary basis. The authors developed an easy-to-use technique for revealing and analyzing such clusterization in a multidimensional CSS. Applying the technique revealed an unexpectedly high degree of clusterization among the points in the CSS that correspond to k-tuples from known proteins. The clusterization could not be inferred from nonuniform amino acid frequencies, nor could it be explained by the influence of homologous data. None of the evolutionary and structural factors tested could explain the observed clusterization either. It appeared as if certain protein sequence variations occurred and were fixed early in the course of evolution. Subsequent, predominantly neutral evolution allowed only a limited number of changes and new variants, which led to the preservation of certain k-tuples. This was consistent with the theory of exon shuffling and protein block structure evolution. Possible applications of the sequence space features found were also discussed.
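The k-tuple representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' technique, which the abstract does not specify. The function names (`k_tuples`, `distinct_fraction`, `shuffled`) and the choice of a within-sequence shuffle as a composition-preserving baseline are assumptions made here for illustration only.

```python
import random
from collections import Counter


def k_tuples(seq, k):
    """Return every overlapping k-tuple (window of length k) in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def distinct_fraction(seqs, k):
    """Distinct k-tuples divided by total k-tuple occurrences.

    A lower value means more occurrences fall on the same points of the
    sequence space, i.e. the points are more concentrated.
    """
    counts = Counter(t for s in seqs for t in k_tuples(s, k))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


def shuffled(seqs, rng):
    """Shuffle residues within each sequence, keeping its composition.

    Assumption: this serves as a simple baseline in which amino acid
    frequencies are unchanged but residue order is randomized.
    """
    out = []
    for s in seqs:
        chars = list(s)
        rng.shuffle(chars)
        out.append("".join(chars))
    return out
```

Comparing `distinct_fraction(seqs, k)` for a real set of sequences with the same statistic averaged over several calls to `shuffled(seqs, random.Random(seed))` gives a rough indication of whether k-tuples repeat more often than amino acid composition alone would suggest. Homology between the input sequences would also produce repeats, so a non-redundant set would be needed before drawing any conclusion.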
Correspondence to: H.A. Lim
Strelets, V.B., Shindyalov, I.N. & Lim, H.A. Analysis of peptides from known proteins: Clusterization in sequence space. J Mol Evol 39, 625–630 (1994). https://doi.org/10.1007/BF00160408