Protein local conformations arise from a mixture of Gaussian distributions

Journal of Biosciences

Abstract

Classical approaches to protein structure prediction rely either on homology between the protein sequence and a template structure or on ab initio energy minimization. The former is limited by the availability of homologous template structures, the latter by an intractably large conformational search space. More recent fragment-library-based approaches first predict local structures, which can then be used in conjunction with the classical approaches. The accuracy of these predictions depends on the quality of the fragment library. In this work, we have constructed a library of local conformation classes based purely on geometric similarity. The local conformations are represented using geometric invariants (GIs), properties that remain unchanged under transformations such as translation and rotation, followed by dimension reduction via principal component analysis (PCA). The local conformations are then modeled as a mixture of Gaussian probability distribution functions (PDFs). Each Gaussian PDF corresponds to a conformational class, with its centroid representing the average structure of that class. We find 46 classes when an octapeptide is used as the unit of local conformation. The protein 3-D structure can thus be described as a sequence of local conformational classes. We further asked whether the local conformations can be predicted from the amino acid sequence; to that end, we have analyzed the correlation between sequence features and the conformational classes.
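The idea of describing a structure as a sequence of local conformational classes can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: pairwise Cα distances stand in for the paper's geometric invariants, the PCA step is skipped, and the two Gaussian components (isotropic, with hand-picked means and variances) are hypothetical rather than fitted by expectation-maximization. All function names and the toy helix and strand geometries are illustrative only.

```python
import math

WINDOW = 8  # octapeptide, the unit of local conformation named in the abstract


def pairwise_distances(coords):
    """All inter-point distances: unchanged by rotation and translation.
    (An assumed stand-in for the paper's geometric invariants.)"""
    return [math.dist(coords[i], coords[j])
            for i in range(len(coords)) for j in range(i + 1, len(coords))]


def rotate_z_and_shift(coords, angle, shift):
    """Rigid-body motion: rotate about the z axis, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


def log_density(x, mean, var):
    """Log density of an isotropic Gaussian with the given mean and variance."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq / var)


def assign_class(x, components):
    """Index of the mixture component with the highest posterior for x.
    components is a list of (weight, mean, variance) tuples."""
    scores = [math.log(w) + log_density(x, m, v) for w, m, v in components]
    return max(range(len(scores)), key=scores.__getitem__)


def class_sequence(ca_coords, components):
    """Slide an octapeptide window along the chain and label each window."""
    return [assign_class(pairwise_distances(ca_coords[i:i + WINDOW]), components)
            for i in range(len(ca_coords) - WINDOW + 1)]


def toy_chain(n, radius, rise, turn_deg):
    """Toy C-alpha trace on a regular helix (hypothetical geometry)."""
    t = math.radians(turn_deg)
    return [(radius * math.cos(i * t), radius * math.sin(i * t), rise * i)
            for i in range(n)]


helix = toy_chain(12, 2.3, 1.5, 100.0)   # roughly alpha-helix-like
strand = toy_chain(12, 1.0, 3.3, 180.0)  # roughly extended, zigzag

# Hypothetical two-class "library": centroids taken from the toy shapes.
components = [
    (0.5, pairwise_distances(helix[:WINDOW]), 1.0),
    (0.5, pairwise_distances(strand[:WINDOW]), 1.0),
]

moved = rotate_z_and_shift(helix, 0.7, (5.0, -3.0, 2.0))
print(class_sequence(helix, components))   # [0, 0, 0, 0, 0]
print(class_sequence(moved, components))   # [0, 0, 0, 0, 0]
print(class_sequence(strand, components))  # [1, 1, 1, 1, 1]
```

Because the descriptor is built only from distances, the rotated and shifted helix receives the same class labels as the original. In a real pipeline the component weights, means, and covariances would be estimated from data, and the descriptor would first be reduced by PCA.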

Abbreviations

GI: geometric invariants

GMM: Gaussian mixture model

PC: principal component

PCA: principal component analysis

PDF: probability distribution function


Author information

Corresponding author

Correspondence to Pramod P. Wangikar.

About this article

Cite this article

Tendulkar, A.V., Ogunnaike, B. & Wangikar, P.P. Protein local conformations arise from a mixture of Gaussian distributions. J Biosci 32 (Suppl 1), 899–908 (2007). https://doi.org/10.1007/s12038-007-0090-4

  • DOI: https://doi.org/10.1007/s12038-007-0090-4
