Abstract
Classical approaches to protein structure prediction rely either on homology between the protein sequence and a template structure or on ab initio energy minimization. The former is limited by the availability of homologous template structures, the latter by an intractably large conformational search space. More recent fragment-library-based approaches first predict local structures, which can then be used in conjunction with the classical approaches. The accuracy of these predictions depends on the quality of the fragment library. In this work, we construct a library of local conformation classes based purely on geometric similarity. Local conformations are represented using geometric invariants (GIs), properties that remain unchanged under transformations such as translation and rotation, followed by dimension reduction via principal component analysis (PCA). The local conformations are then modeled as a mixture of Gaussian probability distribution functions (PDFs). Each Gaussian PDF corresponds to a conformational class, with its centroid representing the average structure of that class. We find 46 classes when an octapeptide is used as the unit of local conformation. A protein 3-D structure can thus be described as a sequence of local conformational classes. To assess whether local conformations can be predicted from amino acid sequences, we have also analyzed the correlation between sequence features and the conformational classes.
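The idea of a geometric invariant can be illustrated with a minimal sketch. It assumes, purely for illustration, that a fragment is described by the pairwise distances between its eight C-alpha positions; the invariants actually used in this work are not specified in the abstract and may differ. The coordinates are made up, and the helper names `pairwise_distances` and `rotate_z_then_translate` are hypothetical.

```python
import math
import random


def pairwise_distances(coords):
    """Return all inter-point distances for a list of (x, y, z) points."""
    return [
        math.dist(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    ]


def rotate_z_then_translate(coords, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


# Hypothetical octapeptide: eight made-up C-alpha positions (not real data).
rng = random.Random(0)
fragment = [
    (rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(-5, 5))
    for _ in range(8)
]

# Apply a rigid-body motion: the coordinates change, the shape does not.
moved = rotate_z_then_translate(fragment, angle=1.1, shift=(3.0, -2.0, 7.5))

original_features = pairwise_distances(fragment)
moved_features = pairwise_distances(moved)

# 8 points give 8 * 7 / 2 = 28 distances; the two feature vectors should
# agree up to floating-point error even though the raw coordinates differ.
invariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(original_features, moved_features)
)
```

Because such a feature vector does not depend on where the fragment sits or how it is oriented, fragments can be compared directly without first superimposing them. Note that pairwise distances are also unchanged by reflection, so this stand-in cannot distinguish a fragment from its mirror image.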
Abbreviations
- GI: geometric invariants
- GMM: Gaussian mixture model
- PC: principal component
- PCA: principal component analysis
- PDF: probability distribution functions
Cite this article
Tendulkar, A.V., Ogunnaike, B. & Wangikar, P.P. Protein local conformations arise from a mixture of Gaussian distributions. J Biosci 32 (Suppl 1), 899–908 (2007). https://doi.org/10.1007/s12038-007-0090-4