Analysis of peptides from known proteins: Clusterization in sequence space

Strelets, Victor B.; Shindyalov, Ilya N.; Lim, Hwa A.

doi:10.1007/BF00160408

Published: December 1994

Journal of Molecular Evolution, volume 39, pages 625–630 (1994)

Victor B. Strelets¹,
Ilya N. Shindyalov² &
Hwa A. Lim¹

Abstract

A combinatorial sequence space (CSS) model was introduced to represent sequences as a set of overlapping k-tuples of some fixed length which correspond to points in the CSS. The aim was to analyze clusterization of protein sequences in the CSS and to test various hypotheses about the possible evolutionary basis of this clusterization. The authors developed an easy-to-use technique which can reveal and analyze such a clusterization in a multidimensional CSS. Application of the technique led to an unexpectedly high clusterization of points in the CSS corresponding to k-tuples from known proteins. The clusterization could not be inferred from nonuniform amino acid frequencies or be explained by the influence of homologous data. None of the tested possible evolutionary and structural factors could explain the clusterization observed either. It looked as if certain protein sequence variations occurred and were fixed in the early course of evolution. Subsequent evolution (predominantly neutral) allowed only a limited number of changes and permitted new variants which led to preservation of certain k-tuples during the course of evolution. This was consistent with the theory of exon shuffling and protein block structure evolution. Possible applications of sequence space features found were also discussed.
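The k-tuple representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' technique, whose details are not given on this page. It only shows how sequences decompose into overlapping k-tuples, how often each CSS point is hit, and how many distinct points would be expected if the same number of tuples were drawn uniformly at random from the 20^k possible points. The function names, the 20-letter alphabet, the toy sequences, and the uniform baseline are all assumptions made for illustration.

```python
from collections import Counter

# Assumed alphabet: the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_tuples(sequence, k):
    """Return all overlapping k-tuples of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def css_occupancy(sequences, k):
    """Count how many times each CSS point (k-tuple) is hit."""
    counts = Counter()
    for seq in sequences:
        counts.update(k_tuples(seq, k))
    return counts


def expected_distinct_uniform(n_tuples, k, alphabet_size=len(AMINO_ACIDS)):
    """Expected number of distinct points hit by n_tuples uniform draws.

    This is a deliberately naive baseline; the abstract states that
    nonuniform amino acid frequencies were also ruled out, which this
    function does not model.
    """
    n_points = alphabet_size ** k
    return n_points * (1.0 - (1.0 - 1.0 / n_points) ** n_tuples)


# Hypothetical toy sequences, not real protein data.
toy = ["MKVLAAGKVLA", "GKVLAMKV"]
occupancy = css_occupancy(toy, 3)
total = sum(occupancy.values())
print("distinct points:", len(occupancy))
print("total tuples:", total)
print("expected distinct (uniform):", expected_distinct_uniform(total, 3))
```

In this sketch, clusterization would show up as the observed number of distinct points falling below the random expectation, because some k-tuples recur more often than chance predicts.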

Author information

Authors and Affiliations

Computational Genetics & Biophysics, Supercomputer Computations Research Institute, Florida State University, 32306-4052, Tallahassee, FL, USA
Victor B. Strelets & Hwa A. Lim
Department of Biochemistry & Molecular Biophysics, Columbia University, 630 W. 168th Street, 10032, New York, NY, USA
Ilya N. Shindyalov

Additional information

Correspondence to: H.A. Lim

About this article

Cite this article

Strelets, V.B., Shindyalov, I.N. & Lim, H.A. Analysis of peptides from known proteins: Clusterization in sequence space. J Mol Evol 39, 625–630 (1994). https://doi.org/10.1007/BF00160408

Received: 08 January 1994
Accepted: 15 May 1994
Issue Date: December 1994
DOI: https://doi.org/10.1007/BF00160408
