The Protein Journal, Volume 28, Issue 6, pp 273–280

Improved Prediction of Protein Binding Sites from Sequences Using Genetic Algorithm



We undertook this project in response to the rapidly increasing number of protein structures with unknown functions in the Protein Data Bank. Here, we combined a genetic algorithm with a support vector machine to predict protein–protein binding sites. In an experiment on a testing dataset of 50 hetero-complexes, we predicted the binding sites for 66% of the complexes. The combined classifier achieved higher sensitivity (60.17%), specificity (58.17%), accuracy (64.08%), F-measure (54.79%), and correlation coefficient (0.2502) than the support vector machine alone. These results can guide biologists in designing specific experiments for protein analysis.
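The abstract reports its results as metrics derived from true/false positive and negative counts, but does not define them. As a minimal sketch, assuming the common textbook definitions (the function name `binary_metrics` and the example counts are hypothetical, not taken from the paper), they can be computed as follows. Note that some binding-site prediction papers use "specificity" for TP / (TP + FP), which is called precision here; the abstract does not say which convention applies, so both are returned.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts.

    These are textbook definitions, assumed for illustration; the paper's
    exact definitions are not given in the abstract.
    """

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)          # recall on interface residues
    specificity = ratio(tn, tn + fp)          # recall on non-interface residues
    precision = ratio(tp, tp + fp)            # sometimes reported as "specificity"
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    # Matthews-style correlation coefficient between prediction and truth.
    cc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "cc": cc,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=30, fp=20, tn=40, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Returning 0.0 for an empty denominator is a design choice of this sketch; a stricter version might return `float("nan")` so that undefined metrics are not mistaken for poor ones.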


Protein–protein interaction sites; Genetic algorithm; Support vector machine; Protein sequence profile



Protein Data Bank
False positive
Support vector machine
False negative
Genetic algorithm and support vector machine
Correlation coefficient
True positive
True negative
Homology-derived secondary structure of protein

Supplementary material

10930_2009_9192_MOESM1_ESM.xls (16 kb)


Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei, China

