Abstract
Protein–protein interactions (PPIs) are central to the study of protein functions and of the pathways involved in biological processes, and to understanding the cause and progression of diseases. Several high-throughput experimental techniques have been used to identify PPIs in a few model organisms, but a large gap remains between the interactions identified so far and the full set of possible binary PPIs in an organism. PPI prediction using machine-learning algorithms has therefore been used alongside experimental methods to discover novel protein interactions. The two most popular supervised machine-learning techniques for PPI prediction are support vector machines and random forest classifiers. Bayesian probabilistic inference has also been used, but mainly to assign confidence scores to high-throughput PPI datasets. Recently, deep-learning algorithms have been applied to sequence-based prediction of PPIs. Clustering methods such as hierarchical and k-means clustering serve as unsupervised machine-learning algorithms for predicting interacting protein pairs without explicit data labelling. In summary, machine-learning techniques have been widely used for the prediction of PPIs, allowing experimental researchers to study cellular PPI networks.
Acknowledgements
DS acknowledges the DBT-sponsored project titled ‘Centre of Excellence (CoE) in Bioinformatics Centre at Bose Institute’ for financial support. This work is dedicated to the Centenary of Bose Institute.
Additional information
Communicated by BJ Rao
Corresponding editor: BJ Rao
About this article
Cite this article
Sarkar, D., Saha, S. Machine-learning techniques for the prediction of protein–protein interactions. J Biosci 44, 104 (2019). https://doi.org/10.1007/s12038-019-9909-z