The Protein Journal, Volume 28, Issue 6, pp 273–280

Improved Prediction of Protein Binding Sites from Sequences Using Genetic Algorithm



We undertook this project in response to the rapidly increasing number of protein structures of unknown function in the Protein Data Bank. Here, we combined a genetic algorithm with a support vector machine to predict protein–protein binding sites. On a testing dataset of 50 hetero-complexes, we predicted the binding sites for 66% of the complexes. The combined classifier achieved higher sensitivity (60.17%), specificity (58.17%), accuracy (64.08%), F-measure (54.79%), and correlation coefficient (0.2502) than the support vector machine alone. These results can help guide biologists in designing specific experiments for protein analysis.
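The abstract does not say what the genetic algorithm encodes or how it is coupled to the support vector machine. One common design, assumed here and not confirmed by the source, is a wrapper in which each chromosome is a binary mask over input features and its fitness is the classifier's validation score. The minimal sketch below takes a caller-supplied `fitness` function so it runs without an SVM library; in practice that function could, for example, return the mean of scikit-learn's `cross_val_score` for an `SVC` trained on the selected columns. Tournament selection, single-point crossover, bit-flip mutation, one-individual elitism, and all names and parameter values (`ga_select`, `pop_size`, `mutation_rate`, the toy target mask) are illustrative assumptions, not the authors' settings.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # hypothetical encoding: 1 = feature kept, 0 = feature dropped


def tournament(pop: List[Mask], scores: List[float], rng: random.Random, k: int = 3) -> Mask:
    """Return the fittest of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a: Mask, b: Mask, rng: random.Random) -> Mask:
    """Single-point crossover: prefix of a, suffix of b (needs len(a) >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask: Mask, rate: float, rng: random.Random) -> Mask:
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in mask]


def ga_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Evolve binary feature masks; return the best mask and its fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    scores = [fitness(m) for m in pop]
    for _ in range(generations):
        # Elitism: the current best mask is copied unchanged into the next generation.
        elite = max(range(pop_size), key=lambda i: scores[i])
        new_pop = [pop[elite][:]]
        while len(new_pop) < pop_size:
            child = crossover(tournament(pop, scores, rng), tournament(pop, scores, rng), rng)
            new_pop.append(mutate(child, mutation_rate, rng))
        pop = new_pop
        scores = [fitness(m) for m in pop]
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


# Toy usage: fitness counts agreement with a made-up "ideal" mask
# (a stand-in for an SVM validation score).
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(mask: Mask) -> float:
    return float(sum(g == t for g, t in zip(mask, TARGET)))


best_mask, best_score = ga_select(len(TARGET), toy_fitness, pop_size=30, generations=50)
print(best_mask, best_score)
```

Because the fittest mask is carried into every new generation, the best fitness never decreases from one generation to the next; the cost is that each generation re-evaluates the whole population, which is expensive when every evaluation trains an SVM.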


Keywords
Protein–protein interaction sites; Genetic algorithm; Support vector machine; Protein sequence profile



Abbreviations
Protein Data Bank
False positive
Support vector machine
False negative
Genetic algorithm and support vector machine
Correlation coefficient
True positive
True negative
Homology-derived secondary structure of proteins
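The true/false positive and negative counts listed above combine into the metrics quoted in the abstract. The abstract does not define them, so this minimal sketch computes both common readings of "specificity": several protein-interface prediction papers use TP/(TP+FP) (what machine-learning texts call precision), while others use the textbook TN/(TN+FP). The correlation coefficient is assumed to be the Matthews correlation coefficient, and the F-measure is taken as the harmonic mean of sensitivity and precision. Whether the paper pools counts over all residues or averages per protein is not stated; this sketch computes pooled values from a single set of counts. The names `Confusion` and `binding_site_metrics` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # interface residues predicted as interface
    tn: int  # non-interface residues predicted as non-interface
    fp: int  # non-interface residues predicted as interface
    fn: int  # interface residues predicted as non-interface


def _div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binding_site_metrics(c: Confusion) -> Dict[str, float]:
    sensitivity = _div(c.tp, c.tp + c.fn)    # TP / (TP + FN), also called recall
    precision = _div(c.tp, c.tp + c.fp)      # TP / (TP + FP); "specificity" in some interface papers
    true_neg_rate = _div(c.tn, c.tn + c.fp)  # TN / (TN + FP); textbook specificity
    accuracy = _div(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)
    f_measure = _div(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient (assumed meaning of "correlation coefficient").
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = _div(c.tp * c.tn - c.fp * c.fn, denom)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "true_negative_rate": true_neg_rate,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Illustrative counts only; these are not data from the paper.
print(binding_site_metrics(Confusion(tp=30, tn=50, fp=20, fn=10)))
```

Returning 0.0 for an empty denominator keeps the function total for degenerate cases such as a protein with no predicted interface residues, at the cost of hiding that the metric is undefined there.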

Supplementary material

10930_2009_9192_MOESM1_ESM.xls (16 kb)



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei, China
