Journal of Systems Science and Complexity, Volume 23, Issue 5, pp 1012–1023

Sequence-based protein-protein interaction prediction via support vector machine

  • Yongcui Wang
  • Jiguang Wang
  • Zhixia Yang
  • Naiyang Deng
Article

Abstract

This paper develops sequence-based methods for identifying novel protein-protein interactions (PPIs) by means of support vector machines (SVMs). The authors encode proteins not only at the gene level but also at the amino acid level, and design a procedure for selecting the negative training set to deal with the training dataset imbalance problem, i.e., interacting protein pairs are scarce relative to the large number of non-interacting protein pairs. The proposed methods are validated on PPI data of Plasmodium falciparum and Escherichia coli, and yield predictive accuracies of 93.8% and 95.3%, respectively. Functional annotation analysis and database searches indicate that the novel predictions are worthy of future experimental validation. The new methods will be useful supplementary tools for future proteomics studies.
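As a rough illustration of the kind of pipeline the abstract describes, the following sketch encodes a protein pair at the amino acid level and draws a negative set the same size as the positive set. The abstract does not specify the authors' encoding or their negative-selection procedure, so everything here is an assumption: amino-acid composition is used as a stand-in encoding, uniform random sampling of non-positive pairs as a stand-in balancing step, and the names `composition`, `encode_pair`, `sample_negatives`, and the toy sequences are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino-acid frequency vector of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def encode_pair(seq_a, seq_b):
    """Concatenate two composition vectors into one 40-dimensional feature vector."""
    return composition(seq_a) + composition(seq_b)


def sample_negatives(proteins, positives, n, seed=0):
    """Randomly draw up to n protein pairs that are not known to interact.

    Assumption: unobserved pairs are treated as non-interacting, which is
    only a stand-in for the selection procedure designed in the paper.
    """
    known = {frozenset(p) for p in positives}
    candidates = [
        pair for pair in combinations(sorted(proteins), 2)
        if frozenset(pair) not in known
    ]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


# Toy data (hypothetical sequences and interactions, for illustration only).
proteins = {
    "P1": "MKTAYIAK",
    "P2": "GAVLIPFW",
    "P3": "STCMNQDE",
    "P4": "KRHDEAGL",
}
positives = [("P1", "P2"), ("P3", "P4")]

# Balance the classes: as many negatives as positives.
negatives = sample_negatives(proteins, positives, n=len(positives))

X = [encode_pair(proteins[a], proteins[b]) for a, b in positives + negatives]
y = [1] * len(positives) + [-1] * len(negatives)
```

The resulting `X` and `y` could then be passed to any SVM implementation, for example `sklearn.svm.SVC` in scikit-learn. Enumerating all candidate pairs is only practical for small protein sets; a real proteome would need a sampling scheme that avoids materializing every pair.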

Key words

Imbalance problem, protein-protein interactions, sequence-based, support vector machine


Copyright information

© Institute of Systems Science, Academy of Mathematics and Systems Science, CAS and Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Yongcui Wang (1, 2)
  • Jiguang Wang (3)
  • Zhixia Yang (4)
  • Naiyang Deng (1)
  1. College of Science, China Agricultural University, Beijing, China
  2. Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
  3. Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  4. College of Mathematics and Systems Science, Xinjiang University, Urumuchi, China
