Distributions of amino acids suggest that certain residue types more effectively determine protein secondary structure

Abstract

The number of available protein sequences is growing exponentially, far faster than the number of known structures. Fast and efficient methods for predicting protein secondary structure are therefore essential for a broad understanding of protein structures. Such methods can in turn facilitate protein tertiary structure prediction, since most tertiary structure methods start from secondary structure predictions. We recently developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are applied to these data to achieve better and faster convergence to more accurate secondary structure predictions. A five-fold cross-validated testing accuracy of 83.8 % and a segment overlap (SOV) score of 78.3 % are obtained in this study. Secondary structure predictions and their accuracies are usually reported for three secondary structure elements (α-helix, β-strand and coil), but the results have rarely been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to describe in detail how different amino acid types behave in secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position-specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The detailed results suggest several important ways in which secondary structure predictions can be improved in the future, which might in turn lead to improved protein design and engineering.
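As an illustration of the kind of per-residue analysis described above, the following is a minimal sketch, not the FLOPRED implementation, of how three-state prediction accuracy could be broken down by amino acid type. It assumes that the true and predicted secondary structures are given as aligned strings over the states H (α-helix), E (β-strand) and C (coil). The function names `q3_accuracy` and `accuracy_by_residue` and the toy sequence are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def q3_accuracy(true_ss, pred_ss):
    """Fraction of residues whose predicted state (H, E or C) equals the true state."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted strings must have equal length")
    if not true_ss:
        raise ValueError("input strings must not be empty")
    matches = sum(t == p for t, p in zip(true_ss, pred_ss))
    return matches / len(true_ss)


def accuracy_by_residue(sequence, true_ss, pred_ss):
    """Per-amino-acid accuracy as {residue: (n_correct, n_total, fraction_correct)}."""
    if not (len(sequence) == len(true_ss) == len(pred_ss)):
        raise ValueError("sequence and structure strings must have equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for aa, t, p in zip(sequence, true_ss, pred_ss):
        total[aa] += 1
        correct[aa] += int(t == p)
    return {aa: (correct[aa], total[aa], correct[aa] / total[aa])
            for aa in sorted(total)}


# Toy data, invented for illustration only (not taken from the study).
sequence = "MKVLAAGPGE"
true_ss = "CHHHHHCCCE"
pred_ss = "CHHHHCCCEE"

overall = q3_accuracy(true_ss, pred_ss)
per_residue = accuracy_by_residue(sequence, true_ss, pred_ss)

print(f"overall three-state accuracy: {overall:.2f}")
for aa, (n_ok, n_all, frac) in per_residue.items():
    print(f"{aa}: {n_ok}/{n_all} correct ({frac:.2f})")
```

On real data, the same tabulation would be accumulated over all test proteins, and the per-residue fractions could then be compared with composition or physico-chemical properties of each amino acid type.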

References

  1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) Nucleic Acids Res 28:235

  2. Qian N, Sejnowski TJ (1988) J Mol Biol 202:865

  3. Chou PY, Fasman GD (1974) Biochemistry 13:222

  4. Garnier J, Osguthorpe DJ, Robson B (1978) J Mol Biol 120:97

  5. Garnier J, Gibrat JF, Robson B (1996) Methods Enzymol 266:540

  6. Zvelebil MJ, Barton GJ, Taylor WR, Sternberg MJE (1987) J Mol Biol 195:957

  7. Kloczkowski A, Ting KL, Jernigan RL, Garnier J (2002) Proteins 49:154

  8. Salzberg S, Cost S (1992) J Mol Biol 227:371

  9. Yi TM, Lander ES (1993) J Mol Biol 232:1117

  10. Salamov AA, Solovyev VV (1995) J Mol Biol 247:11

  11. Salamov AA, Solovyev VV (1997) J Mol Biol 268:31

  12. Vapnik VN (2000) The nature of statistical learning theory (information science and statistics). Springer, New York

  13. Ward JJ, McGuffin LJ, Buxton BF, Jones DT (2003) Bioinformatics 19:1650

  14. Montgomerie S, Sundaraj S, Gallin W, Wishart D (2006) BMC Bioinforma 7:301

  15. Pollastri G, Martin A, Mooney C, Vullo A (2007) BMC Bioinforma 8:201

  16. Wang G, Zhao Y, Wang D (2008) Neurocomputing 72:262

  17. Malekpour SA, Naghizadeh S, Pezeshk H, Sadeghi M, Eslahchi C (2009) Math Biosci 217:145

  18. Palopoli L, Rombo SE, Terracina G, Tradigo G, Veltri P (2009) Inf Fusion 10:217

  19. Santiago-Gómez MP, Kermasha S, Nicaud JM, Belin JM, Husson F (2010) J Mol Catal B Enzym 65:63

  20. Yang B, Wei H, Zhun Z, Huabin Q (2009) Expert Syst Appl 36:9000

  21. Zhou Z, Yang B, Hou W (2010) Expert Syst Appl 37:6381

  22. Babaei S, Geranmayeh A, Seyyedsalehi SA (2010) Comput Methods Prog Biomed 100:237

  23. Rost B, Sander C (1993) J Mol Biol 232:584

  24. Rost B (1996) Methods Enzymol 266:525

  25. Cuff JA, Barton GJ (2000) Proteins 40:502

  26. Cheng H, Sen TZ, Jernigan RL, Kloczkowski A (2007) Bioinformatics 23:2628

  27. Cheng H, Sen TZ, Kloczkowski A, Margaritis D, Jernigan R (2005) Polymer 46:4314

  28. Sen TZ, Jernigan RL, Garnier J, Kloczkowski A (2005) Bioinformatics 21:2787. doi:10.1093/bioinformatics/bti408

  29. Sen TZ, Cheng H, Kloczkowski A, Jernigan R (2006) Prot Sci 15:2499

  30. Rost B, Yachdav G, Liu J (2004) Nucleic Acids Res 32:W321

  31. Eddy SR (1998) Bioinformatics 14:755

  32. Jones D (1999) J Mol Biol 292:195

  33. Kihara D (2005) Protein Sci 14:1955

  34. Madera M, Calmus R, Thiltgen G, Karplus K, Gough J (2010) Bioinformatics 26:596

  35. Yang B, Wu Q, Ying Z (2011) Knowl-Based Syst 24:304

  36. Koliński A (2004) Acta Biochim Pol 51:349

  37. Huang GB, Zhu QY, Siew CK (2006) Neurocomputing 70:489

  38. Saraswathi S, Jernigan RL, Koliński A, Kloczkowski A (2010) P IJCCI/ICNC 370–375

  39. Suresh S, Saraswathi S, Sundararajan N (2010) EAAI 23:1149

  40. Kennedy J, Eberhart RC (1995) P ICNN 4:1942

  41. Fernández-Martínez JL, García-Gonzalo E (2008) JAEA 2008:15

  42. Fernández-Martínez JL, García-Gonzalo E, Fernández-Alvarez JP (2008) IJCIR 4:93

  43. García-Gonzalo E, Fernández-Martínez JL (2009) In: P ICCMS, pp 1280–1290

  44. Fernández-Martínez JL, García-Gonzalo E (2010) In: P IJCCI/ICNC, pp 237–242

  45. Fernández-Martínez JL, García-Gonzalo E (2011) IEEE Trans Evol Comput 15:405

  46. Fernández-Martínez JL, García-Gonzalo E, Saraswathi S, Jernigan RL, Kloczkowski A (2011) ASILNCS 6728:1

  47. Rost B, Sander C (1994) Proteins 20:216

  48. Zemla A, Venclovas E, Fidelis K, Rost B (1999) Proteins Struct Funct Bioinforma 34:220

  49. Saraswathi S, Fernández-Martínez JL, Koliński A, Jernigan RL, Kloczkowski A (2012) JMM 18:4275

  50. Faraggi E, Zhang T, Yang Y, Kurgan L, Zhou Y (2012) J Comput Chem 33(3):259

  51. Kazemian M, Moshiri B, Nikbakht H, Lucas C (2007) Comput Biol Chem 31:44

  52. Costantini S, Colonna G, Facchiano AM (2006) BBRC 342:441

  53. Kabsch W, Sander C (1983) Biopolymers 22:2577

  54. Needleman SB, Wunsch CD (1970) J Mol Biol 48:443

  55. Henikoff S, Henikoff J (1992) Proc Natl Acad Sci U S A 89:10915

  56. Sander C, Schneider R (1991) Proteins 9:56

  57. Fernández-Martínez JL, García-Gonzalo E (2009) Swarm Intell Spec Publ PSO 3:245

  58. Kyte J, Doolittle RF (1982) J Mol Biol 157:105

Acknowledgments

The algorithm for generating the knowledge-based potentials data was developed by members of the Koliński lab [36]. This work was supported by National Science Foundation grants IGERT-0504304 and MSB-1021785 and by National Institutes of Health grants R01GM081680 and R01GM072014.

Author information

Corresponding author

Correspondence to A. Kloczkowski.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1

(PDF 508 kb)

About this article

Cite this article

Saraswathi, S., Fernández-Martínez, J.L., Koliński, A. et al. Distributions of amino acids suggest that certain residue types more effectively determine protein secondary structure. J Mol Model 19, 4337–4348 (2013). https://doi.org/10.1007/s00894-013-1911-z

  • DOI: https://doi.org/10.1007/s00894-013-1911-z

Keywords

  • Amino acids
  • Knowledge-based potentials
  • Machine learning
  • Neural networks
  • Particle swarm optimization
  • Protein secondary structure prediction