Journal of Systems Science and Complexity, Volume 23, Issue 5, pp 1012–1023

Sequence-based protein-protein interaction prediction via support vector machine


    • College of Science, China Agricultural University
    • Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences
  • Jiguang Wang
    • Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
  • Zhixia Yang
    • College of Mathematics and Systems Science, Xinjiang University
  • Naiyang Deng
    • College of Science, China Agricultural University

DOI: 10.1007/s11424-010-0214-z

Cite this article as:
Wang, Y., Wang, J., Yang, Z. et al. J Syst Sci Complex (2010) 23: 1012. doi:10.1007/s11424-010-0214-z


This paper develops sequence-based methods for identifying novel protein-protein interactions (PPIs) by means of support vector machines (SVMs). The authors encode proteins not only at the gene level but also at the amino acid level, and design a procedure for selecting the negative training set to deal with the imbalance of the training data, i.e., known interacting protein pairs are scarce relative to the large number of non-interacting protein pairs. The proposed methods are validated on PPI data of Plasmodium falciparum and Escherichia coli, and yield predictive accuracies of 93.8% and 95.3%, respectively. Functional annotation analysis and database searches indicate that the novel predictions are worthy of future experimental validation. The new methods will be useful supplementary tools for future proteomics studies.
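The abstract does not specify the encoding or the negative-set selection procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain amino acid composition encoding (the gene-level encoding is not covered) and simple random undersampling of non-interacting pairs; the function names `composition`, `encode_pair`, and `undersample_negatives`, and the toy sequences, are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard letters are ignored
    return [c / total if total else 0.0 for c in counts]


def encode_pair(seq_a, seq_b):
    """Encode a protein pair by concatenating the two composition vectors."""
    return composition(seq_a) + composition(seq_b)


def undersample_negatives(positives, candidates, seed=0):
    """Draw as many presumed non-interacting pairs as there are positives.

    Assumption: random undersampling stands in for the paper's own
    negative-set selection procedure, which the abstract does not describe.
    """
    known = {tuple(sorted(p)) for p in positives}
    pool = [p for p in candidates if tuple(sorted(p)) not in known]
    rng = random.Random(seed)
    return rng.sample(pool, min(len(positives), len(pool)))


# Toy data (made up for illustration only).
proteins = {
    "P1": "MKTAYIAKQR",
    "P2": "GAVLIMCFWY",
    "P3": "MSTNPKPQRK",
    "P4": "DDEEKKRRHH",
}
positives = [("P1", "P2")]
candidates = [(a, b) for a in proteins for b in proteins if a < b]
negatives = undersample_negatives(positives, candidates)

# Balanced feature matrix and labels (+1 interacting, -1 non-interacting).
X = [encode_pair(proteins[a], proteins[b]) for a, b in positives + negatives]
y = [1] * len(positives) + [-1] * len(negatives)
```

The balanced feature vectors `X` and labels `y` could then be passed to any SVM implementation for training.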

Key words

Imbalance problem, protein-protein interactions, sequence-based, support vector machine

Copyright information

© Institute of Systems Science, Academy of Mathematics and Systems Science, CAS and Springer-Verlag Berlin Heidelberg 2010