Analysis of peptides from known proteins: Clusterization in sequence space

Journal of Molecular Evolution

Abstract

A combinatorial sequence space (CSS) model was introduced in which a sequence is represented as a set of overlapping k-tuples of a fixed length, each corresponding to a point in the CSS. The aim was to analyze the clusterization of protein sequences in the CSS and to test various hypotheses about its possible evolutionary basis. The authors developed an easy-to-use technique that can reveal and analyze such clusterization in a multidimensional CSS. Application of the technique revealed an unexpectedly high clusterization of the points in the CSS corresponding to k-tuples from known proteins. The clusterization could not be inferred from nonuniform amino acid frequencies, nor could it be explained by the influence of homologous data. None of the evolutionary and structural factors tested could explain the observed clusterization either. It appeared as if certain protein sequence variations occurred and were fixed early in the course of evolution. Subsequent evolution (predominantly neutral) allowed only a limited number of changes and permitted new variants, which led to the preservation of certain k-tuples during the course of evolution. This was consistent with the theory of exon shuffling and protein block structure evolution. Possible applications of the sequence space features found were also discussed.
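The k-tuple representation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' technique, which the abstract does not specify; the function names `k_tuples` and `occupancy` and the toy sequences are hypothetical, and counting repeated k-tuples is only a crude stand-in for a clusterization measure.

```python
from collections import Counter

def k_tuples(sequence, k):
    """Return all overlapping k-tuples (substrings of length k) of a sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def occupancy(sequences, k):
    """Count how often each k-tuple (a point in the CSS) is observed."""
    counts = Counter()
    for seq in sequences:
        counts.update(k_tuples(seq, k))
    return counts

# Hypothetical toy sequences, chosen only for illustration.
toy = ["MKVLAAGMKVL", "GAMKVLT"]
counts = occupancy(toy, 3)
total = sum(counts.values())   # number of k-tuples observed
distinct = len(counts)         # number of distinct CSS points occupied
```

With 20 amino acids there are 20^k possible k-tuples, so the ratio of `distinct` to `total` gives a rough sense of how often the same points in the CSS are revisited; a real analysis would compare it against a suitable random model.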

Additional information

Correspondence to: H.A. Lim

About this article

Cite this article

Strelets, V.B., Shindyalov, I.N. & Lim, H.A. Analysis of peptides from known proteins: Clusterization in sequence space. J Mol Evol 39, 625–630 (1994). https://doi.org/10.1007/BF00160408

  • DOI: https://doi.org/10.1007/BF00160408
