Abstract
We undertook this project in response to the rapidly increasing number of protein structures of unknown function in the Protein Data Bank. We combined a genetic algorithm with a support vector machine to predict protein–protein binding sites. On a testing dataset of 50 hetero-complexes, we predicted the binding sites for 66% of the complexes. The combined classifier achieved higher sensitivity (60.17%), specificity (58.17%), accuracy (64.08%), F-measure (54.79%), and correlation coefficient (0.2502) than the support vector machine alone. These results can guide biologists in designing specific experiments for protein analysis.
Abbreviations
- PDB: Protein Data Bank
- FP: False positive
- SVM: Support vector machine
- FN: False negative
- GA/SVM: Genetic algorithm and support vector machine
- CC: Correlation coefficient
- TP: True positive
- TN: True negative
- HSSP: Homology-derived secondary structure of protein
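The performance measures quoted in the abstract are conventionally derived from the TP, FP, TN, and FN counts defined above. The following is a minimal sketch of the textbook definitions, not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, and this page does not state the exact formulas used in the paper. In particular, some binding-site prediction papers use "specificity" to mean TP/(TP+FP), so both variants are returned here under separate names.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Textbook binary-classification metrics from confusion counts.

    Assumption: standard definitions; the paper's own formulas may differ.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on binding residues
    specificity = ratio(tn, tn + fp)   # recall on non-binding residues
    precision = ratio(tp, tp + fp)     # sometimes called "specificity" in this field
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    # Correlation coefficient (CC), i.e. the Matthews correlation coefficient.
    cc = ratio(tp * tn - fp * fn,
               math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "cc": cc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the paper.
    for name, value in binary_metrics(tp=40, fp=10, tn=30, fn=20).items():
        print(f"{name}: {value:.4f}")
```

Returning precision separately makes it easy to check which definition of specificity a reported figure follows before comparing numbers across papers.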
Acknowledgments
We would like to thank Dr. Chih-Jen Lin of National Taiwan University for providing the original LIBSVM tool. This work was supported by the Project of the Provincial Natural Scientific Fund of the Bureau of Education of Anhui Province (KJ2007B239) and the Project of the Doctoral Foundation of the Ministry of Education, China (200403057002).
Cite this article
Du, X., Cheng, J. & Song, J. Improved Prediction of Protein Binding Sites from Sequences Using Genetic Algorithm. Protein J 28, 273–280 (2009). https://doi.org/10.1007/s10930-009-9192-1
DOI: https://doi.org/10.1007/s10930-009-9192-1