Abstract
Protein 3D structures can be described with a library of 3D fragments known as a structural alphabet. Our structural alphabet is composed of 16 small protein fragments, each 5 Cα in length, called protein blocks (PBs). It allows an efficient approximation of 3D protein structures and a correct prediction of the local structure. The 72 most frequent series of 5 consecutive PBs, called structural words (SWs), cover more than 90% of the 3D structures. Successions of PBs are highly constrained: only a limited number of transitions between them are observed. In this study, we propose a new method, called the “pinning strategy”, that uses this specific feature to predict long protein fragments. Its goal is to define highly probable successions of PBs. It starts from the most probable SW and is then extended with overlapping SWs. Starting from an initial prediction rate of 34.4%, the use of SWs instead of PBs allows a gain of 4.5%. The pinning strategy simply applied to the SWs increases the prediction accuracy to 39.9%. In a second step, the sequence-structure relationship is optimized, and the prediction accuracy reaches 43.6%.
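The idea of starting from the most probable SW and extending it with overlapping SWs can be illustrated with a small greedy sketch. This is not the authors' implementation: the per-position score table, the function names `best_compatible` and `pin_and_extend`, the overlap of 4 PBs between adjacent SWs, and the `min_score` stopping threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

SW_LEN = 5  # a structural word (SW) is a series of 5 consecutive PBs

# Hypothetical input: scores[i] maps each candidate SW starting at
# position i to a probability-like score predicted from the sequence.
Scores = List[Dict[str, float]]


def best_compatible(cands: Dict[str, float], overlap: str,
                    side: str) -> Optional[Tuple[str, float]]:
    """Best-scoring candidate SW that shares `overlap` (4 PBs) with its neighbour."""
    if side == "right":
        # The next SW must begin with the last 4 PBs of the current chain.
        ok = {w: s for w, s in cands.items() if w[:-1] == overlap}
    else:
        # The previous SW must end with the first 4 PBs of the current chain.
        ok = {w: s for w, s in cands.items() if w[1:] == overlap}
    if not ok:
        return None
    word = max(ok, key=ok.get)
    return word, ok[word]


def pin_and_extend(scores: Scores, min_score: float = 0.2) -> Tuple[int, str]:
    """Pin the top-scoring SW, then grow the PB chain one PB at a time."""
    # 1. Pin the (position, SW) pair with the highest score overall.
    start, word = max(
        ((i, w) for i, cands in enumerate(scores) for w in cands),
        key=lambda iw: scores[iw[0]][iw[1]],
    )
    pbs = word
    left = right = start  # start positions of the outermost SWs used so far

    # 2. Extend to the right while a compatible SW scores high enough.
    while right + 1 < len(scores):
        hit = best_compatible(scores[right + 1], pbs[-(SW_LEN - 1):], "right")
        if hit is None or hit[1] < min_score:
            break
        pbs += hit[0][-1]
        right += 1

    # 3. Extend to the left in the same way.
    while left - 1 >= 0:
        hit = best_compatible(scores[left - 1], pbs[:SW_LEN - 1], "left")
        if hit is None or hit[1] < min_score:
            break
        pbs = hit[0][0] + pbs
        left -= 1

    return left, pbs


# Toy scores (invented values) over four start positions.
toy_scores: Scores = [
    {"klmmm": 0.3},
    {"lmmmm": 0.9},
    {"mmmmn": 0.5, "mmmmm": 0.4},
    {"mmmno": 0.1},
]
print(pin_and_extend(toy_scores))
```

On the toy table, the SW `lmmmm` at position 1 is pinned, and the chain grows by one PB in each direction before the low-scoring candidate at position 3 stops the extension. A greedy extension is only one possible reading of the strategy; the threshold and tie-breaking rules would need to follow the full method description.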
Abbreviations
- PB: protein block
- PSOWs: preferential succession of overlapping structural words
- SF: sequence family
- SWs: structural words
Additional information
Both authors contributed equally to this work.
Cite this article
de Brevern, A.G., Etchebest, C., Benros, C. et al. “Pinning strategy”: a novel approach for predicting the backbone structure in terms of protein blocks from sequence. J Biosci 32, 51–70 (2007). https://doi.org/10.1007/s12038-007-0006-3
DOI: https://doi.org/10.1007/s12038-007-0006-3