Sequence-based protein-protein interaction prediction via support vector machine

Journal of Systems Science and Complexity

Abstract

This paper develops sequence-based methods for identifying novel protein-protein interactions (PPIs) by means of support vector machines (SVMs). The authors encode proteins not only at the gene level but also at the amino acid level, and design a procedure for selecting the negative training set to deal with the training dataset imbalance problem, i.e., interacting protein pairs are scarce relative to the large number of non-interacting protein pairs. The proposed methods are validated on PPI data of Plasmodium falciparum and Escherichia coli, and yield predictive accuracies of 93.8% and 95.3%, respectively. Functional annotation analysis and database search indicate that the novel predictions are worthy of future experimental validation. The new methods will be useful supplementary tools for future proteomics studies.
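The abstract does not spell out the encoding or the negative-set selection procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: amino acid composition stands in for the amino-acid-level encoding, and uniform random undersampling of non-interacting pairs stands in for the authors' selection procedure. All function names and the toy sequences are hypothetical and are not taken from the paper.

```python
import random
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def encode_pair(seq_a, seq_b):
    """Encode a protein pair by concatenating the two composition vectors."""
    return composition(seq_a) + composition(seq_b)


def sample_negatives(proteins, positives, n, seed=0):
    """Assumed stand-in for negative selection: draw n pairs uniformly
    from all pairs that are not known to interact."""
    known = {frozenset(p) for p in positives}
    candidates = [
        pair for pair in combinations(sorted(proteins), 2)
        if frozenset(pair) not in known
    ]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


def build_training_set(sequences, positives, seed=0):
    """Build a balanced training set: one sampled negative per positive pair."""
    positives = list(positives)
    negatives = sample_negatives(sequences, positives, len(positives), seed)
    X = [encode_pair(sequences[a], sequences[b]) for a, b in positives + negatives]
    y = [1] * len(positives) + [-1] * len(negatives)
    return X, y


# Toy data (hypothetical sequences, for illustration only).
sequences = {"P1": "MKVLA", "P2": "GGSAK", "P3": "MKKLL", "P4": "AVWYT"}
positives = [("P1", "P2"), ("P3", "P4")]
X, y = build_training_set(sequences, positives)
print(len(X), len(X[0]), y)
```

The resulting feature matrix `X` and label vector `y` could then be handed to any SVM implementation. Balancing the classes by undersampling is only one of several ways to handle the imbalance the abstract mentions.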

Author information

Corresponding author

Correspondence to Yongcui Wang.

Additional information

This research is supported by the Key Project of the National Natural Science Foundation of China under Grant No. 10631070, the National Natural Science Foundation of China under Grant Nos. 10801112, 10971223, 11071252, and the Ph.D Graduate Start Research Foundation of Xinjiang University Funded Project under Grant No. BS080101.

About this article

Cite this article

Wang, Y., Wang, J., Yang, Z. et al. Sequence-based protein-protein interaction prediction via support vector machine. J Syst Sci Complex 23, 1012–1023 (2010). https://doi.org/10.1007/s11424-010-0214-z

  • DOI: https://doi.org/10.1007/s11424-010-0214-z
