Sequence-based protein-protein interaction prediction via support vector machine

Wang, Yongcui; Wang, Jiguang; Yang, Zhixia; Deng, Naiyang

doi:10.1007/s11424-010-0214-z

Published: 09 November 2010

Journal of Systems Science and Complexity, Volume 23, pages 1012–1023 (2010)

Yongcui Wang¹,², Jiguang Wang³, Zhixia Yang⁴ & Naiyang Deng¹


Abstract

This paper develops sequence-based methods for identifying novel protein-protein interactions (PPIs) by means of support vector machines (SVMs). The authors encode proteins not only at the gene level but also at the amino acid level, and design a procedure for selecting the negative training set to deal with the imbalance of the training data, i.e., interacting protein pairs are scarce relative to the large number of non-interacting protein pairs. The proposed methods are validated on PPI data of Plasmodium falciparum and Escherichia coli, and yield predictive accuracies of 93.8% and 95.3%, respectively. Functional annotation analysis and database searches indicate that the novel predictions are worthy of future experimental validation. The new methods should be useful supplementary tools for future proteomics studies.
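
The abstract does not spell out the encodings or the negative-set procedure, so the following is only a minimal sketch under stated assumptions: codon usage frequencies stand in for the gene-level encoding, amino acid composition for the amino-acid-level encoding, and uniform random sampling of unlabelled pairs for negative selection. All function names are hypothetical and none of this is claimed to be the authors' exact method.

```python
from collections import Counter
from itertools import combinations, product
import random

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # all 64 codons
CODON_SET = set(CODONS)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def codon_frequencies(cds):
    """Assumed gene-level encoding: relative frequency of each codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[c] / total for c in CODONS]


def amino_acid_composition(protein):
    """Assumed amino-acid-level encoding: relative residue frequencies."""
    counts = Counter(a for a in protein.upper() if a in AMINO_ACIDS)
    total = sum(counts.values()) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def encode_pair(a, b):
    """Feature vector for a protein pair; a and b are (cds, protein) tuples."""
    return (codon_frequencies(a[0]) + amino_acid_composition(a[1])
            + codon_frequencies(b[0]) + amino_acid_composition(b[1]))


def sample_negatives(protein_ids, positives, n, seed=0):
    """Assumed balancing step: draw up to n pairs not listed as interacting."""
    known = {frozenset(p) for p in positives}
    candidates = [pair for pair in combinations(sorted(protein_ids), 2)
                  if frozenset(pair) not in known]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))
```

Drawing as many negatives as there are positives gives a balanced training set whose pair vectors can be passed to any SVM implementation. Enumerating all candidate pairs is quadratic in the number of proteins, so this sketch suits small examples rather than whole proteomes.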

Author information

Authors and Affiliations

College of Science, China Agricultural University, Beijing, 100083, China
Yongcui Wang & Naiyang Deng
Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810008, China
Yongcui Wang
Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
Jiguang Wang
College of Mathematics and Systems Science, Xinjiang University, Urumuchi, 830046, China
Zhixia Yang

Corresponding author

Correspondence to Yongcui Wang.

Additional information

This research is supported by the Key Project of the National Natural Science Foundation of China under Grant No. 10631070, the National Natural Science Foundation of China under Grant Nos. 10801112, 10971223, 11071252, and the Ph.D Graduate Start Research Foundation of Xinjiang University Funded Project under Grant No. BS080101.

About this article

Cite this article

Wang, Y., Wang, J., Yang, Z. et al. Sequence-based protein-protein interaction prediction via support vector machine. J Syst Sci Complex 23, 1012–1023 (2010). https://doi.org/10.1007/s11424-010-0214-z

Received: 02 November 2009
Revised: 29 January 2010
Published: 09 November 2010
Issue Date: October 2010
DOI: https://doi.org/10.1007/s11424-010-0214-z
