Propensities of Amino Acid Pairings in Secondary Structure of Globular Proteins

  • Cevdet Nacar


A class of secondary structure prediction algorithms uses statistics on the residue pairs found in secondary structural elements. Because the protein folding process is dominated by backbone hydrogen bonding, an approach based on backbone hydrogen-bonded residue pairings could improve the predictive capability of this class of algorithms. The reliability of such algorithms depends on the quality of the statistics and, therefore, on the quality of the data set. This study aimed to determine the propensities of backbone hydrogen-bonded residue pairings for the α-helix and β-sheet secondary structural elements of globular proteins, using a new and comprehensive data set created from the peptide chains deposited in the Worldwide Protein Data Bank. A master data set of 4882 globular peptide chains with resolution better than 2.5 Å, sequence identity below 25% and a length of at least 100 residues was created. Separate subsets for helix and sheet structures, containing 4594 and 4483 chains respectively, were derived from the master set. Backbone hydrogen-bonded residue pairings in helices and sheets were detected, and their propensities were represented in matrices as odds ratios (observed/[random or expected]). The propensities assigned in this study to residue pairings in secondary structural elements (helices, strands overall, parallel strands and antiparallel strands) differ from those of previous studies by 19 to 34%. These differences are substantial and could lead to further improvements in secondary structure prediction algorithms.


Keywords: Secondary structure prediction; Residue pairing; Residue propensity; Hydrogen bonding
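To illustrate the odds-ratio representation described in the abstract, the Python sketch below counts ordered residue pairs at the i/i+4 spacing of α-helical backbone hydrogen bonds (C=O of residue i to N-H of residue i+4) and divides each observed count by an expected count. It is a minimal sketch under stated assumptions, not the study's method: the function names `helix_pairs` and `odds_ratios`, the use of a DSSP-style string in which `H` marks helix, and the null model (expected count = number of pairs × product of the two residues' frequencies over all paired positions) are hypothetical. The abstract does not specify how the random or expected frequencies were defined, and the study detected the hydrogen-bonded pairings itself rather than inferring them from helix labels.

```python
from collections import Counter
from itertools import product


def helix_pairs(sequence, ss, step=4):
    """Yield ordered (residue i, residue i+step) pairs inside helical runs.

    Assumption: an alpha-helical C=O(i)...H-N(i+4) backbone hydrogen bond is
    inferred whenever residues i..i+step are all labelled 'H' in a DSSP-style
    string; no atomic coordinates are examined.
    """
    if len(sequence) != len(ss):
        raise ValueError("sequence and structure string differ in length")
    for i in range(len(sequence) - step):
        if all(label == "H" for label in ss[i : i + step + 1]):
            yield sequence[i], sequence[i + step]


def odds_ratios(pairs):
    """Return {(a, b): observed / expected} for ordered residue pairs.

    Hypothetical null model: expected(a, b) = N * f(a) * f(b), where N is the
    number of pairs and f(x) is the frequency of residue x over all 2N paired
    positions. Pairs never observed get a ratio of 0.0.
    """
    observed = Counter(pairs)
    n_pairs = sum(observed.values())
    if n_pairs == 0:
        return {}

    # Single-residue counts over both positions of every pair.
    singles = Counter()
    for (a, b), count in observed.items():
        singles[a] += count
        singles[b] += count
    total_positions = 2 * n_pairs

    ratios = {}
    for a, b in product(singles, repeat=2):
        expected = n_pairs * (singles[a] / total_positions) * (singles[b] / total_positions)
        ratios[(a, b)] = observed.get((a, b), 0) / expected
    return ratios


if __name__ == "__main__":
    # Toy helix: yields the pairs (A, A) and (L, L) at i/i+4 spacing.
    demo = odds_ratios(helix_pairs("ALALAL", "HHHHHH"))
    print(demo[("A", "A")], demo[("A", "L")])  # 2.0 0.0
```

A ratio above 1 marks a pairing seen more often than the null model predicts, and a ratio below 1 marks one seen less often. Sheet pairings would need a different pair-detection step that registers residues across neighbouring strands, while the odds-ratio function could be reused unchanged.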




Compliance with Ethical Standards

Conflict of interest

The author declares that he has no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by the author.

Supplementary material

Electronic supplementary material 1: 10930_2020_9880_MOESM1_ESM.pdf (PDF, 667 kb)
Electronic supplementary material 2: 10930_2020_9880_MOESM10_ESM.pdf (PDF, 297 kb)
Electronic supplementary material 3: 10930_2020_9880_MOESM11_ESM.pdf (PDF, 194 kb)
Electronic supplementary material 4: 10930_2020_9880_MOESM12_ESM.pdf (PDF, 80 kb)
Electronic supplementary material 5: 10930_2020_9880_MOESM2_ESM.pdf (PDF, 475 kb)
Electronic supplementary material 6: 10930_2020_9880_MOESM3_ESM.pdf (PDF, 485 kb)
Electronic supplementary material 7: 10930_2020_9880_MOESM4_ESM.pdf (PDF, 450 kb)
Electronic supplementary material 8: 10930_2020_9880_MOESM5_ESM.pdf (PDF, 610 kb)
Electronic supplementary material 9: 10930_2020_9880_MOESM6_ESM.pdf (PDF, 197 kb)
Electronic supplementary material 10: 10930_2020_9880_MOESM7_ESM.pdf (PDF, 193 kb)
Electronic supplementary material 11: 10930_2020_9880_MOESM8_ESM.pdf (PDF, 73 kb)
Electronic supplementary material 12: 10930_2020_9880_MOESM9_ESM.pdf (PDF, 303 kb)


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. Department of Biophysics, School of Medicine, Marmara University, Istanbul, Turkey
