Evaluation of Machine Learning Algorithms on Protein-Protein Interactions

  • Indrajit Saha
  • Tomas Klingström
  • Simon Forsberg
  • Johan Wikander
  • Julian Zubek
  • Marcin Kierczak
  • Dariusz Plewczynski
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 242)


Protein-protein interactions are important for most biological processes, and many computational methods have been developed to predict them from protein sequence, structural and genomic data. This motivated us to carry out a comparative study of several machine learning methods, training them on a set of known protein-protein interactions described by global and local attributes of the proteins. The classifiers were evaluated by cross-validation, and several performance measures were computed. The support vector machine outperformed the other classifiers, a result that was confirmed by the Wilcoxon rank sum test at the 5% significance level.
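To make the final statistical step concrete, the following is a minimal sketch of a two-sided Wilcoxon rank sum test applied to the per-fold cross-validation scores of two classifiers. It is not the authors' implementation: the function names `average_ranks` and `rank_sum_test` are hypothetical, the normal approximation is used without tie or continuity correction, and the scores in the demo are made-up placeholders rather than results from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (w, z, p), where w is the sum of the ranks of sample x
    in the pooled sample. No tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


if __name__ == "__main__":
    # Placeholder per-fold accuracies (illustrative only, NOT from the paper).
    scores_a = [0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89]
    scores_b = [0.70, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79]
    w, z, p = rank_sum_test(scores_a, scores_b)
    print(f"W = {w}, z = {z:.3f}, p = {p:.5f}")
    print("significant at the 5% level" if p < 0.05 else "not significant")
```

For small samples or heavily tied scores, an exact test or a tie-corrected variance would be the more careful choice; the approximation here is only meant to show the shape of the comparison.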


Keywords: bioinformatics, machine learning, protein-protein interactions





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Indrajit Saha (1)
  • Tomas Klingström (2)
  • Simon Forsberg (3)
  • Johan Wikander (4)
  • Julian Zubek (5)
  • Marcin Kierczak (3)
  • Dariusz Plewczynski (2)

  1. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
  2. Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw, Warsaw, Poland
  3. Department of Clinical Sciences, Computational Genetics Section, Swedish University of Agricultural Sciences, Uppsala, Sweden
  4. Bioinformatics Program, Faculty of Technology and Natural Sciences, Uppsala University, Uppsala, Sweden
  5. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
