Machine-learning techniques for the prediction of protein–protein interactions
Protein–protein interactions (PPIs) are important for studying protein functions and the pathways involved in different biological processes, as well as for understanding the cause and progression of diseases. Several high-throughput experimental techniques have been employed to identify PPIs in a few model organisms, yet a large gap remains in identifying all possible binary PPIs in an organism. PPI prediction using machine-learning algorithms has therefore been used in conjunction with experimental methods to discover novel protein interactions. The two most popular supervised machine-learning techniques for PPI prediction are support vector machines and random forest classifiers. Bayesian probabilistic inference has also been used, but mainly for scoring the confidence of high-throughput PPI datasets. Recently, deep-learning algorithms have been used for sequence-based prediction of PPIs. Clustering methods such as hierarchical and k-means clustering serve as unsupervised machine-learning algorithms for predicting interacting protein pairs without explicit data labelling. In summary, machine-learning techniques have been widely used for the prediction of PPIs, thus allowing experimental researchers to study cellular PPI networks.
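As an illustration of what a sequence-based input to such classifiers can look like, the following minimal sketch encodes a protein pair as the concatenated amino-acid compositions of its two sequences. This encoding is an assumption chosen for simplicity, not the feature set of any particular study summarised here; the function names and the two toy sequences are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino-acid composition (fractions) of a sequence."""
    counts = Counter(seq.upper())
    # Only standard residues contribute; ambiguous codes such as X are ignored.
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Concatenate both composition vectors into one 40-dimensional pair vector."""
    return composition(seq_a) + composition(seq_b)


# Toy sequences (hypothetical, for illustration only).
features = pair_features("MKTAYIAKQR", "GSHMLEDPVA")
print(len(features))
```

Vectors of this kind, labelled as interacting or non-interacting, could in principle be passed to a support vector machine or random forest classifier. Note that this simple concatenation is order-dependent: swapping the two proteins yields a different vector, so a practical pipeline would need to handle pair symmetry.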
Keywords: Clustering; deep learning; decision tree; machine-learning techniques; protein–protein interaction; support vector machine
DS acknowledges the DBT-sponsored project titled, ‘Centre of Excellence (CoE) in Bioinformatics Centre at Bose Institute’ for financial support. This work is dedicated to the Centenary of Bose Institute.