Protein Features Identification for Machine Learning-Based Prediction of Protein-Protein Interactions

  • Khalid Raza
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 750)


A long-standing challenge of the post-genomic era and systems biology research is the computational prediction of protein-protein interactions (PPIs), which ultimately leads to the prediction of protein functions. The important research questions are how protein complexes with known sequence and structure can be used to identify and classify protein binding sites, and how to infer knowledge from these classifications, such as predicting PPIs of proteins with unknown sequence and structure. Several machine learning techniques have been applied to the prediction of PPIs, but the accuracy of their predictions depends wholly on the number of features used for training. In this paper, we survey the protein features used for the prediction of PPIs. Open research challenges and opportunities in the area are also discussed.


Protein-protein interactions; Machine learning; Supervised learning; Feature selection; Protein features



This work is financially supported by Jamia Millia Islamia, New Delhi, India under innovative research activities.



Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  1. Department of Computer Science, Jamia Millia Islamia, New Delhi, India
