Abstract
Protein–protein interactions (PPIs), such as those in protein–protein inhibitor complexes, antibody–antigen complexes, and supercomplexes, play diverse and important roles in cells. Recent advances in structural analysis methods, including cryo-EM, for determining protein complex structures have been remarkable. Nevertheless, much room remains for improving and using computational methods to predict PPIs because of the large number and great diversity of unresolved complex structures. This review introduces a wide array of computational methods, including our own, for predicting PPIs, including antibody–antigen interactions, and offers both historical and forward-looking perspectives.
Introduction
Protein–protein interactions (PPIs) play fundamentally important roles in cellular functions and biological processes, and a structural understanding of PPIs is important for elucidating those functions (Jones and Thornton 1995, 1996). In 2001, the Critical Assessment of PRediction of Interactions (CAPRI, 2022) began as a community-wide experiment designed to assess methods for predicting PPIs based on the estimation of PPIs for previously solved structures of protein complexes. The latest experiment (Round 54) was conducted during May–August 2022 (CAPRI Round 54, 2022). A recent report of a CAPRI experiment (Lensink et al. 2020) indicated that the growing number of protein complex structures deposited in the Protein Data Bank (PDB) enables their use as templates for predicting the structures of other protein complexes, particularly in homo protein–protein docking. That finding implies that classical docking may no longer be strictly necessary for the prediction of homo PPIs. Nevertheless, these methods have not yet been successfully applied to the prediction of hetero PPIs, including antibody–antigen interactions, and this area of research has room for improvement. In this review, we focus on computational docking-based approaches and AI-based approaches, introducing a wide array of methods, including our own, for PPI prediction.
Protein–protein docking
Traditional protein–protein docking methods have been of central importance for sampling the conformational space of protein complexes (Smith and Sternberg 2002). In the last 10 years, sophisticated high-precision docking methods such as HADDOCK (van Zundert et al. 2015), ClusPro (Desta et al. 2020), ZDOCK (Pierce et al. 2014), and LightDock (Jiménez-García et al. 2018) have been developed and continually improved. To sample conformations more efficiently, they have used not only three-dimensional structural information in both modeling and scoring steps, but also evolutionary information and information obtained from experiments of several types (van Noort et al. 2021). Evolutionary information has been used for the prediction of PPIs based on the idea that mutations of the residues involved in the interaction on a protein engender mutations of the interface residues on the partner protein (Jothi et al. 2006). The InterEvDock docking pipeline integrates a coarse-grained potential accounting for interface coevolution based on multiple sequence alignments (MSAs) of paired proteins (Yu et al. 2016). For many targets in recent CAPRI experiments, this method predicted several high-quality models (Lensink et al. 2020).
Professor Nakamura and his colleagues, including one of the authors of the present review (Y.T.), participated in CAPRI about 10 years ago (Fleishman et al. 2011; Moretti et al. 2013; Lensink et al. 2014). For each CAPRI experiment target, the two component (mainly unbound) protein structures were docked to generate many complex models, and the position and relative geometry of the predicted interfaces were evaluated using a scoring function. We constructed docking methods with and without evolutionary trace analysis based on conservation information; the conservation information contributed to model selection, particularly for enzyme and signal transduction protein targets (Kanamori et al. 2007).
Recently, low-resolution shape data obtained by small-angle scattering methods such as SAXS have also been used to filter docked models in several docking methods (van Noort et al. 2021), e.g., pyDockSAXS (Jiménez-García et al. 2015) and HADDOCK (van Zundert et al. 2015).
Scoring models
After sampling complex models, the most “appropriate” model(s) must usually be selected using some method such as scoring function, clustering, or consensus (Lensink and Wodak 2010). In CAPRI, the “scorer” performance is measured and reported as part of the competition (Lensink and Wodak 2010). Predicted models for each target were pooled and provided for the scoring experiment. Based on the idea that the accumulation of complementary interactions on local (residue or atomic level) surface regions forms the PPI between two proteins and strengthens their interaction, we specifically examined the complementarities of physicochemical properties such as electrostatic potential and hydrophobicity, and shapes of the surfaces to assess PPIs on the molecular surfaces of proteins (Tsuchiya et al. 2006a, b).
Machine-learning (ML) and AI-based scoring methods have been developed in the past few years (Lensink et al. 2020). The ML-based iScore is a linear combination of the GraphRank score with HADDOCK energetic terms (Geng et al. 2020). The GraphRank score is based on a support vector machine (SVM) classification between the interface graphs of native and non-native protein–protein interfaces, where each graph consists of nodes (interface residues with evolutionary conservation scores) and edges (their contacts) (Geng et al. 2020). iScore separated native from non-native interfaces much better than either the GraphRank or the HADDOCK score used alone, when the GraphRank contribution was weighted higher than the energetic terms. Das and Chakrabarti also developed an SVM classification-based scoring scheme that specifically addresses differences between native and non-native protein–protein interface features such as binding energy (ΔG), frequencies of hydrogen bonds and salt bridges, and accessible and buried surface areas (Das and Chakrabarti 2021). Their findings identified the accessible surface area as the most discriminating feature. In native interfaces, the buried surface areas of Phe, Tyr, and Ile are markedly high whereas that of Lys is low, and hydrogen bonds between Arg and Asp or Glu are more important (Das and Chakrabarti 2021). Scoring based on these features led to higher accuracy than that provided by PatchDock scores (Schneidman-Duhovny et al. 2005). Additionally, we have developed Protein Interface Analysis using COvarying signals (PIACO 2019), a method for identifying biological interfaces in protein crystal structures. PIACO is a statistical classifier that uses covariation calculated from MSAs as input (Fukasawa and Tomii 2019).
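The idea of linearly combining a graph-based score with energetic terms can be illustrated with a minimal Python sketch. The term names (`graph_rank`, `e_vdw`, `e_elec`, `e_desolv`), the weights, and the convention that a lower combined score is better are all assumptions made for illustration; they are not the published iScore terms or parameters.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DecoyTerms:
    """Per-decoy score terms (hypothetical names, arbitrary units)."""
    graph_rank: float  # graph-kernel classifier output (assumed: lower = more native-like)
    e_vdw: float       # van der Waals energy
    e_elec: float      # electrostatic energy
    e_desolv: float    # desolvation energy


# Hypothetical weights chosen only so that the graph term dominates.
ILLUSTRATIVE_WEIGHTS = {
    "graph_rank": 1.0,
    "e_vdw": 0.1,
    "e_elec": 0.05,
    "e_desolv": 0.05,
}


def combined_score(terms, weights=ILLUSTRATIVE_WEIGHTS):
    """Weighted sum of the individual terms (a plain linear combination)."""
    values = asdict(terms)
    return sum(weights[name] * values[name] for name in weights)


def rank_decoys(decoys, weights=ILLUSTRATIVE_WEIGHTS):
    """Return decoy names ordered from best (lowest score) to worst."""
    return sorted(decoys, key=lambda name: combined_score(decoys[name], weights))
```

In practice the weights would be fitted on a training set of native and non-native interfaces rather than set by hand as in this sketch.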
Another recently developed AI-based scoring method, known as DOVE, is based on the use of three-dimensional (3D) convolutional neural networks (Wang et al. 2020). It first maps the inputted decoy structures into a 3D grid, then scans and examines the protein–protein interfaces in terms of inter-atom interaction patterns and their energetic contributions. Finally, DOVE judges whether the decoy structure is close to the native structure, or not.
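The first step of such a pipeline, mapping atoms onto a 3D grid, can be sketched as follows. This is a toy voxelization under stated assumptions: the channel labels, grid size, and spacing are hypothetical, and the function only counts atoms per voxel, whereas DOVE's actual input features are not reproduced here.

```python
from collections import Counter
from math import floor


def voxelize(atoms, origin, spacing=1.0, size=4):
    """Count atoms per (channel, i, j, k) voxel of a cubic grid.

    atoms   : iterable of (x, y, z, channel) tuples, e.g. channel = element symbol
    origin  : (x, y, z) of the grid corner with the smallest coordinates
    spacing : voxel edge length (same unit as the coordinates)
    size    : number of voxels along each axis
    """
    grid = Counter()
    for x, y, z, channel in atoms:
        index = tuple(floor((c - o) / spacing) for c, o in zip((x, y, z), origin))
        # Atoms that fall outside the cube are ignored.
        if all(0 <= i < size for i in index):
            grid[(channel, *index)] += 1
    return grid
```

A 3D convolutional network would then take one such grid per channel as input; centering the cube on the interface is a typical design choice, stated here as an assumption.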
Hot spots
In PPI prediction, a hot spot is a prominent feature that allows one to distinguish between native and non-native interface models (DeLano 2002). Typically, a hot spot is a small set of residues that contribute greatly to PPI formation (Thorn and Bogan 2001). Conversely, blocking hot spot residues might disrupt the PPI (Lu et al. 2020). Hot spots are fundamentally identified based on changes in binding free energy (ΔΔG) obtained by alanine scanning mutagenesis, although computational methods for accurately predicting hot spots have also been developed (Ovek et al. 2022). Ovek et al. introduce two interesting examples in their recent review (Ovek et al. 2022): disease-causing single mutations are likely to occur at hot spots (Ozdemir et al. 2018), and a disease-causing PPI can be disrupted by a small molecule that binds to the target hot spots on the PPI (Lim et al. 2019). These findings suggest that hot spot information might contribute to the discovery of drugs that disrupt disease-causing PPIs (Ovek et al. 2022).
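The ΔΔG criterion can be made concrete with a short sketch. It assumes that ΔΔG is computed as the binding free energy of the alanine mutant minus that of the wild type, and that a residue is called a hot spot when ΔΔG is at least 2.0 kcal/mol; this cutoff is a commonly used convention adopted here as an assumption, and the residue labels and energies in the usage example are invented.

```python
HOT_SPOT_THRESHOLD = 2.0  # kcal/mol; assumed cutoff, not taken from this review


def ddg(dg_wild_type, dg_mutant):
    """Change in binding free energy upon mutation (mutant minus wild type)."""
    return dg_mutant - dg_wild_type


def hot_spots(dg_wild_type, alanine_scan, threshold=HOT_SPOT_THRESHOLD):
    """Return the residues whose alanine mutation weakens binding by >= threshold.

    alanine_scan maps a residue label to the binding free energy (kcal/mol)
    measured or computed for the corresponding alanine mutant.
    """
    return sorted(
        residue
        for residue, dg_mutant in alanine_scan.items()
        if ddg(dg_wild_type, dg_mutant) >= threshold
    )
```

For example, with a wild-type ΔG of −12.0 kcal/mol, a mutant at −9.0 kcal/mol gives ΔΔG = 3.0 kcal/mol and is flagged, whereas a mutant at −11.5 kcal/mol (ΔΔG = 0.5 kcal/mol) is not.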
When we participated in the CAPRI experiment, several groups tried to predict hot spots to select near native models. Of those, Fernandez-Recio’s group predicted hot spots using the normalized interface propensity values derived from rigid-body docking, with electrostatics and desolvation scoring, which required no structural information of protein–protein complexes (Grosdidier and Fernández-Recio 2008). The same group recently constructed a method to detect protein–protein inhibitor (small molecule) binding sites by integrating molecular dynamics simulations for the generation of transient cavities on the interface with hot-spot predictions (Rosell and Fernández-Recio 2020). It generated and selected the transient cavities that were similar to known inhibitor binding sites (Rosell and Fernández-Recio 2020).
Template-based method
As mentioned above, protein–protein docking has been centrally important to PPI prediction. However, as the number of solved complex structures has increased, using them as templates for PPI prediction has been found to be more helpful. As mentioned earlier, Lensink et al. (2020) found that the use of solved complex structures as templates is particularly appropriate for homo PPI prediction because of the abundance of such homo PPI structures within the PDB.
As an example, we introduce work from our own group, in which we developed a profile–profile comparison method, the Fold Recognition TEchnique (FORTE) (Tomii and Akiyama 2004). The method was developed originally for predicting protein structures. Through participation in past CAPRI-CASP assembly prediction experiments, we recognized that detecting and using complex templates is effective for PPI prediction, even for hetero PPIs in some cases (Lensink et al. 2016, 2018; Nakamura et al. 2018). We also found that FORTE is helpful for selecting an appropriate form of the protein complex, even when multiple forms of complexes are observed among homologous proteins of a target (Nakamura et al. 2018). Indeed, as we discuss below, most AI-based approaches that utilize template information are powerful for PPI prediction, except in the case of antibody–antigen interaction prediction (Ambrosetti et al. 2020).
As a general rule, the prediction capability for generating accurate target structures depends on the existence of available templates in the PDB (Lafita et al. 2018). In CAPRI-CASP12, according to the assessors’ definition, there were three levels of difficulty for estimating target complexes—EASY, MEDIUM, and HARD. We showed that template-based models based on profile–profile comparison methods are useful for predicting protein complexes, even for MEDIUM/HARD targets (Nakamura et al. 2018; Lafita et al. 2018; Lensink et al. 2018). This finding implies that PPIs also tend to be conserved and/or limited for many cases, although it is considered that complex structures are often not conserved during evolution (Poupon and Janin 2010).
Early attempts to apply AlphaFold2 to PPI prediction
When the great achievement of AlphaFold2 (AF2) in modeling monomeric protein structures in CASP14 was announced, it was assumed that AF2 would also be useful for modeling protein–protein complexes, even though it was originally trained on individual protein chains (Jumper et al. 2021). RoseTTaFold, the first attempt to replicate AF2 before the release of the AF2 code, confirmed that a deep neural network model trained for monomer structure prediction can predict protein–protein complex models with some degree of accuracy (Baek et al. 2021). Instead of a monomer sequence, the authors used as input a pseudo-multimer sequence, i.e., two or more sequences separated by a gap (and templates). AF2 and RoseTTaFold have since been used for predicting PPIs in the proteomes of human (Burke et al. 2022) and yeast (Humphreys et al. 2021).
Tricks similar to that developed for RoseTTaFold were also applied to AF2, such as AF2-Gap (in ColabFold (Mirdita et al. 2022)) and AF2-Linker (Moriwaki 2021; Evans et al. 2021). Other groups applied similar protocols to protein–peptide docking (Ko and Lee 2021; Lei et al. 2021; Tsaban et al. 2022). All of these used as input a pseudo-multimer sequence in which the peptide is linked to the protein via a poly-glycine linker. When tested on a benchmark set, the protocol showed higher performance than the previous state-of-the-art method (Tsaban et al. 2022). FoldDock, another sophisticated AF2-based multimer prediction method that uses a paired MSA, was developed and tested on a large benchmark set of heterodimers (Bryant et al. 2022a). These results underscore the superiority of AF2-based methods over other docking methods. Although the methods described above are powerful for dimers (and perhaps trimers), they might have limitations because of the difficulty of preparing good MSAs of larger complexes as input. In contrast, MolPC is an ambitious attempt to use AF2 to build larger complexes (more than 10 chains) by combining predicted subcomponents using Monte Carlo tree search (Bryant et al. 2022b). AF2Complex uses the MSAs of the individual chains, padded with gaps, and templates as inputs, and generates complex models with the AF2 deep learning model after multiple recycling steps (Gao et al. 2022).
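The two input tricks described above can be sketched in a few lines of Python. The first function builds a single pseudo-multimer sequence by joining chains with a flexible linker; the second leaves the sequences unlinked and instead offsets the residue numbering between chains. The function names, the default linker, and the offset of 200 residues are assumptions for illustration and do not reproduce any specific tool's code.

```python
def link_chains(chains, linker_unit="G", linker_length=21):
    """Join chain sequences with a repeated linker unit.

    Returns the linked sequence and the 0-based, half-open (start, end)
    span of each original chain, so the linker can be removed from the model.
    """
    if linker_length % len(linker_unit) != 0:
        raise ValueError("linker_length must be a multiple of the unit length")
    linker = linker_unit * (linker_length // len(linker_unit))
    spans, position = [], 0
    for chain in chains:
        spans.append((position, position + len(chain)))
        position += len(chain) + len(linker)
    return linker.join(chains), spans


def gapped_residue_index(chain_lengths, gap=200):
    """Residue numbering with a jump of `gap` between consecutive chains.

    The gap value is an assumed, illustrative choice.
    """
    index, offset = [], 0
    for length in chain_lengths:
        index.extend(range(offset, offset + length))
        offset += length + gap
    return index
```

For instance, `link_chains(["MKT", "AVL"], linker_unit="GGS", linker_length=21)` inserts seven Gly-Gly-Ser repeats between the two toy chains, and the returned spans indicate which residues of the predicted model belong to each partner.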
A combination of AF2 and an existing docking method was also tested (Ghani et al. 2021). The AF2-ClusPro method was applied to the protein–protein docking benchmark 5.0 (BM5) (Vreven et al. 2015) and then to 17 selected heteromers that appeared in the PDB after May 2020 (the last date of structures used in the AF2 training set). In a systematic benchmark of 152 diverse heterodimers from the BM5.5 data set (Vreven et al. 2015; Guest et al. 2021), which expands BM5 by adding more antibody–antigen cases, the results revealed a significant ability of AF2-based methods in all categories of complex modeling except antibody–antigen complexes (Yin et al. 2022). These approaches represent early attempts at AF2-based multimer prediction, made mainly during the period between the release of the original AF2 (ver. 2.0.0) and that of AF2 specifically trained for multimer prediction (ver. 2.1.0).
Multimeric version of AF2 and recent attempts to go beyond AF2
The multimeric version of AF2, AF2-Multimer, appeared recently (Evans et al. 2021). Benchmarking results obtained for the 17 selected heteromers and for a large data set composed of 4433 recently released complexes were compared with those of AF2-Gap, AF2-Linker, and AF2-ClusPro; AF2-Multimer showed improved multimer prediction accuracy compared with those input-adjustment AF2-based multimer predictions. It was also demonstrated that performance is generally higher for homomeric prediction than for heteromeric prediction, and that it is poor for prediction of antibody–antigen complexes. The latter result was also shown by the benchmark of another group (Yin et al. 2022).
More recently, two avenues have been pursued to go beyond AF2 and AF2-Multimer: replication of AF2 with a larger training dataset, and development of predictors that use a protein language model (pLM) instead of an MSA. Among the follow-up replication studies of AF2, such as OpenFold (Ahdritz et al. 2022), Uni-Fold (Li et al. 2022a, b), MegaFold (Liu et al. 2022), and HelixFold (Wang et al. 2022), only Uni-Fold and Uni-Fold-symmetry (Li et al. 2022b) succeeded not only in monomer prediction but also in multimer prediction using the original trained parameters or protocols. No third-party group has yet published a large benchmark result. Meanwhile, pLM-based predictors such as OmegaFold (Wu et al. 2022), ESMFold (Lin et al. 2022), and IgFold (Ruffolo et al. 2022) are anticipated as the next breakthroughs in structure prediction because they omit the construction of an MSA, which is crucially important for performance and is the most time-consuming part of AF2. In spite of these expectations, no MSA-free method has achieved performance equal to that of AF2, and none explicitly implements multimer modeling, with the exception of IgFold, which was built specifically for antibody modeling.
Although no explicit implementation of multimer treatment in OmegaFold has been described, it is relatively straightforward to use pseudo-multimer sequence inputs, as in AF2-Linker; we designate this approach OmegaFold-linker. Figure 1 presents multimer modeling results in terms of the DockQ score (Basu and Wallner 2016) using methods developed after AF2, such as OmegaFold-linker, AF2-Multimer (v2.1.1) with parameters released in Nov. 2021, and AF2-Multimer (v2.2.0) with parameters released in Mar. 2022. In the construction of Fig. 1, AF2 was run with “max_template_date = 2020-10-01”. The targets are taken from CASP14; they are not included in the training sets of AF2 and AF2-Multimer, although two of the structures (PDB IDs: 6N64 and 6YA2) were released before 2020-10-01. They are all dimers, and the target IDs are T1032 (PDB ID: 6N64), T1038 (PDB ID: 6YA2), T1054 (PDB ID: 6V4V), T1078 (PDB ID: 7CWP), H1045 (PDB ID: 6XOD), and H1065 (PDB ID: 7M5F). For OmegaFold-linker, T1032, which has more than 500 residues, was omitted because of memory limitations. For OmegaFold-linker and AF2-Linker, the input sequences were connected via a 21-residue poly-(Gly-Gly-Ser) linker, following previous research (Evans et al. 2021). For comparison, DockQ scores of docked structures calculated using ZDOCK (Pierce et al. 2014) are also shown. The monomer models predicted by AF2-Multimer (v2.2.0) were used as the initial monomer structures for ZDOCK. Basically, the prediction difficulty depends on the target. For instance, all methods except ZDOCK show DockQ scores of 0.77 or more for H1065, whereas all methods show DockQ < 0.04 for T1054. These trends, except for T1038 (6YA2), whose structure was released before 2020-10-01, are generally similar to those observed in CASP14. Although this may imply that the prediction difficulty of PPIs depends on the availability of “good” templates, the results accord with those reported from benchmark research (Evans et al. 2021; Bryant et al. 2022a): AF2-Multimer shows the highest performance, and AF2-based methods using parameters trained for monomers are still comparable in some cases.
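For readers unfamiliar with the DockQ score used above, the following sketch shows how it combines three CAPRI quality measures into one value between 0 and 1, following the definition of Basu and Wallner (2016) as we understand it: the fraction of native contacts (Fnat), the interface RMSD (iRMS), and the ligand RMSD (LRMS), with the RMSD terms rescaled using the constants 1.5 Å and 8.5 Å. The function names are illustrative, and computing Fnat, iRMS, and LRMS from coordinates is outside the scope of this sketch.

```python
def scaled_rms(rms, scale):
    """Map an RMSD (in Å) onto (0, 1]; 1 means a perfect fit."""
    return 1.0 / (1.0 + (rms / scale) ** 2)


def dockq(fnat, irms, lrms, irms_scale=1.5, lrms_scale=8.5):
    """Average of Fnat and the two rescaled RMSD terms."""
    return (fnat + scaled_rms(irms, irms_scale) + scaled_rms(lrms, lrms_scale)) / 3.0


def quality_class(score):
    """CAPRI-like quality class for a DockQ score."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"
```

Under these cutoffs, a DockQ score of 0.77 corresponds to a medium-quality model, whereas a score below 0.04 corresponds to an incorrect model.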
Antibody–antigen interaction
The prediction of PPIs between antibody and antigen proteins is not an easy task because of the flexibility of the antibody’s hypervariable loops, particularly the complementarity-determining region 3 loop of the heavy chain (CDR-H3 loop). Four high-precision docking software suites, ClusPro (Brenke et al. 2012), LightDock (Jiménez-García et al. 2018), ZDOCK (Pierce et al. 2014), and HADDOCK (van Zundert et al. 2015), were used to examine which structural information contributes to the accuracy of model building of an antibody–antigen complex: the structural information of CDR loops, the paratope (antigen-binding residues on an antibody CDR), the antigen surface, or the epitope (antibody-binding residues on an antigen) (Ambrosetti et al. 2020). The findings showed that, for all docking methods, the overall performance decreased without epitope information and improved when low-resolution epitope information was considered. Accurate modeling of the structure of the long CDR-H3 loop remains challenging. However, the flexible refinement of HADDOCK improved the prediction accuracy of CDR-H3 loop conformations when epitope information was available, even at low resolution (Ambrosetti et al. 2020).
PECAN is an AI-based method developed for predicting antibody–antigen interfaces (Pittala and Bailey-Kellogg 2020). Here, the antibody and antigen structures are each represented as graphs. They are input to a neural network that consists of graph convolution, attention, and fully connected layers. The network classifies antibody and antigen residues as interface or non-interface residues. On the datasets provided with the epitope (Krawczyk et al. 2014) and paratope (Daberdaku and Ferrari 2019) prediction methods, PECAN achieved higher precision and recall than the methods of the dataset providers. It is noteworthy that the providers’ methods also predict epitope and paratope regions with high accuracy. The epitope prediction program EpiPred predicts epitopes based on geometric fitting and knowledge-based asymmetric antibody–antigen scoring (Krawczyk et al. 2014). The paratope prediction method uses an SVM classifier to distinguish interface surface patches from non-interface ones based on 3D Zernike descriptors that represent global and local protein surface shapes and physicochemical properties on the surfaces (Daberdaku and Ferrari 2019). Again, note that even recent AI-based approaches, such as AF2-Multimer, suffer from an inability to predict accurate antibody–antigen complex structures (as mentioned above).
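Because interface predictors of this kind are compared by precision and recall, a minimal sketch of how these two quantities are computed at the residue level may be helpful. The residue labels in the usage note are invented, and returning 0.0 for an empty prediction or an empty reference set is a convention assumed here.

```python
def precision_recall(predicted, actual):
    """Residue-level precision and recall of a predicted interface.

    predicted : residues labeled as interface by the method
    actual    : residues observed at the interface in the complex structure
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```

For example, if a method predicts four paratope residues of which two are among three true interface residues, the precision is 2/4 and the recall is 2/3.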
To obtain information about antibody-specific epitopes, some improvements in the accuracy of docking and affinity predictions must be achieved. For this purpose, a large and non-redundant benchmark set for antibody–antigen docking and affinity prediction was constructed. It includes camelid nanobodies, therapeutic monoclonal antibodies, and broadly neutralizing antibodies that target viral glycoproteins (Guest et al. 2021).
Another possible approach is the search of similar regions on proteins to known antibody-binding epitopes. This approach is based on the fact that some antibodies can cross-reactively recognize different antigen proteins having similar surface regions in the structures and properties (Vieths et al. 2006; Negi and Braun 2017). Information about cross-reactivity might provide information about the repurposing of antibody drugs. We are striving to develop a database of known and putative epitopes on proteins (PoSSuMAg 2022), which has a similar scheme to that of the PoSSuM database (PoSSuM 2021), which is helpful to detect putative pockets that are similar to known ligand-binding sites on protein structures (Tabei et al. 2010; Ito et al. 2012b, a, 2015). Our new database presents information about putative epitopes that are similar to known epitopes on the antigen proteins in complex with antibodies, including antibody drugs. Information about known and putative epitopes on SARS-CoV-2 proteins will also be available. Although the current version includes information of putative epitopes on non-redundant protein structures, we expect to increase the data through future studies.
Concluding remarks
Here we have introduced and discussed a diverse range of computational methods, from docking-based to AI-based approaches, for PPI prediction. Along with the increase in the amount of information related to protein sequences and structures, AI-based approaches, especially those based on evolutionary information and templates, are expected to become more powerful and useful. Yet, prediction of hetero PPIs leaves room for improvement. Particularly, antibody–antigen interaction predictions remain very limited in terms of their accuracy, although many sophisticated prediction methods have been developed (Ambrosetti et al. 2020).
References
Ahdritz G, Bouatta N, Kadyan S, Xia Q, Gerecke W, O’Donnell TJ, Berenberg D, Fisk I, Zanichelli N, Zhang B, Nowaczynski A, Wang B, Stepniewska-Dziubinska MM, Zhang S, Ojewole A, Guney ME, Biderman S, Watkins AM, Ra S, Lorenzo PR, et al (2022) OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. bioRxiv 2022.11.20.517210. https://doi.org/10.1101/2022.11.20.517210
Ambrosetti F, Jiménez-García B, Roel-Touris J, Bonvin AMJJ (2020) Modeling antibody-antigen complexes by information-driven docking. Structure 28:119-129.e2. https://doi.org/10.1016/j.str.2019.10.011
Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, Wang J, Cong Q, Kinch LN, Schaeffer RD, Millán C, Park H, Adams C, Glassman CR, DeGiovanni A, Pereira JH, Rodrigues AV, van Dijk AA, Ebrecht AC, Opperman DJ et al (2021) Accurate prediction of protein structures and interactions using a three-track neural network. Science 373:871–876. https://doi.org/10.1126/science.abj8754
Basu S, Wallner B (2016) DockQ: a quality measure for protein-protein docking models. PLoS ONE 11:e0161879. https://doi.org/10.1371/journal.pone.0161879
Brenke R, Hall DR, Chuang G-Y, Comeau SR, Bohnuud T, Beglov D, Schueler-Furman O, Vajda S, Kozakov D (2012) Application of asymmetric statistical potentials to antibody-protein docking. Bioinformatics 28:2608–2614. https://doi.org/10.1093/bioinformatics/bts493
Bryant P, Pozzati G, Elofsson A (2022a) Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun 13:1265. https://doi.org/10.1038/s41467-022-28865-w
Bryant P, Pozzati G, Zhu W, Shenoy A, Kundrotas P, Elofsson A (2022b) Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. bioRxiv. https://doi.org/10.1101/2022.03.12.484089
Burke DF, Bryant P, Barrio-Hernandez I, Memon D, Pozzati G, Shenoy A, Zhu W, Dunham AS, Albanese P, Keller A, Scheltema RA, Bruce JE, Leitner A, Kundrotas P, Beltrao P, Elofsson A (2022) Towards a structurally resolved human protein interaction network. bioRxiv. https://doi.org/10.1101/2021.11.08.467664
CAPRI (2022) CAPRI: Critical Assessment of PRediction of Interactions. https://www.ebi.ac.uk/pdbe/complex-pred/capri/ (Accessed 2nd December 2022)
CAPRI Round 54 (2022) CASP15-CAPRI assembly prediction experiment. https://www.ebi.ac.uk/pdbe/complex-pred/capri/round/54/ (Accessed 2nd December 2022)
Daberdaku S, Ferrari C (2019) Antibody interface prediction with 3D Zernike descriptors and SVM. Bioinformatics 35:1870–1876. https://doi.org/10.1093/bioinformatics/bty918
Das S, Chakrabarti S (2021) Classification and prediction of protein-protein interaction interface using machine learning algorithm. Sci Rep 11:1761. https://doi.org/10.1038/s41598-020-80900-2
DeLano WL (2002) Unraveling hot spots in binding interfaces: progress and challenges. Curr Opin Struct Biol 12:14–20. https://doi.org/10.1016/S0959-440X(02)00283-X
Desta IT, Porter KA, Xia B, Kozakov D, Vajda S (2020) Performance and its limits in rigid body protein-protein docking. Structure 28:1071-1081.e3. https://doi.org/10.1016/j.str.2020.06.006
Evans R, O’Neill M, Pritzel A, Antropova N, Senior A, Green T, Žídek A, Bates R, Blackwell S, Yim J, Ronneberger O, Bodenstein S, Zielinski M, Bridgland A, Potapenko A, Cowie A, Tunyasuvunakool K, Jain R, Clancy E, Kohli P et al (2021) Protein complex prediction with AlphaFold-Multimer. bioRxiv. https://doi.org/10.1101/2021.10.04.463034
Fleishman SJ, Whitehead TA, Strauch E-M, Corn JE, Qin S, Zhou H-X, Mitchell JC, Demerdash ONA, Takeda-Shitaka M, Terashi G, Moal IH, Li X, Bates PA, Zacharias M, Park H, Ko J, Lee H, Seok C, Bourquard T, Bernauer J et al (2011) Community-wide assessment of protein-interface modeling suggests improvements to design methodology. J Mol Biol 414:289–302. https://doi.org/10.1016/j.jmb.2011.09.031
Fukasawa Y, Tomii K (2019) Accurate classification of biological and non-biological interfaces in protein crystal structures using subtle covariation signals. Sci Rep 9:12603. https://doi.org/10.1038/s41598-019-48913-8
Gao M, Nakajima An D, Parks JM, Skolnick J (2022) AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat Commun 13:1744. https://doi.org/10.1038/s41467-022-29394-2
Geng C, Jung Y, Renaud N, Honavar V, Bonvin AMJJ, Xue LC (2020) iScore: a novel graph kernel-based function for scoring protein–protein docking models. Bioinformatics 36:112–121. https://doi.org/10.1093/bioinformatics/btz496
Ghani U, Desta I, Jindal A, Khan O, Jones G, Kotelnikov S, Padhorny D, Vajda S, Kozakov D (2021) Improved docking of protein models by a combination of Alphafold2 and ClusPro. bioRxiv. https://doi.org/10.1101/2021.09.07.459290
Grosdidier S, Fernández-Recio J (2008) Identification of hot-spot residues in protein-protein interactions by computational docking. BMC Bioinformatics 9:447. https://doi.org/10.1186/1471-2105-9-447
Guest JD, Vreven T, Zhou J, Moal I, Jeliazkov JR, Gray JJ, Weng Z, Pierce BG (2021) An expanded benchmark for antibody-antigen docking and affinity prediction reveals insights into antibody recognition determinants. Structure 29:606-621.e5. https://doi.org/10.1016/J.STR.2021.01.005
Humphreys IR, Pei J, Baek M, Krishnakumar A, Anishchenko I, Ovchinnikov S, Zhang J, Ness TJ, Banjade S, Bagde SR, Stancheva VG, Li X-H, Liu K, Zheng Z, Barrero DJ, Roy U, Kuper J, Fernández IS, Szakal B, Branzei D et al (2021) Computed structures of core eukaryotic protein complexes. Science 374:eabm4805. https://doi.org/10.1126/science.abm4805
Ito J-I, Tabei Y, Shimizu K, Tomii K, Tsuda K (2012a) PDB-scale analysis of known and putative ligand-binding sites with structural sketches. Proteins Struct Funct Bioinforma 80:747–763. https://doi.org/10.1002/prot.23232
Ito J-I, Tabei Y, Shimizu K, Tsuda K, Tomii K (2012b) PoSSuM: a database of similar protein-ligand binding and putative pockets. Nucleic Acids Res 40:D541–D548. https://doi.org/10.1093/nar/gkr1130
Ito J, Ikeda K, Yamada K, Mizuguchi K, Tomii K (2015) PoSSuM vol 2.0: data update and a new function for investigating ligand analogs and target proteins of small-molecule drugs. Nucleic Acids Res 43:D392–D398. https://doi.org/10.1093/nar/gku1144
Jiménez-García B, Pons C, Svergun DI, Bernadó P, Fernández-Recio J (2015) pyDockSAXS: protein–protein complex structure by SAXS and computational docking. Nucleic Acids Res 43:W356–W361. https://doi.org/10.1093/nar/gkv368
Jiménez-García B, Roel-Touris J, Romero-Durana M, Vidal M, Jiménez-González D, Fernández-Recio J (2018) LightDock: a new multi-scale approach to protein–protein docking. Bioinformatics 34:49–55. https://doi.org/10.1093/bioinformatics/btx555
Jones S, Thornton JM (1995) Protein-protein interactions: a review of protein dimer structures. Prog Biophys Mol Biol 63:31–65. https://doi.org/10.1016/0079-6107(94)00008-w
Jones S, Thornton JM (1996) Principles of protein-protein interactions. Proc Natl Acad Sci U S A 93:13–20. https://doi.org/10.1073/pnas.93.1.13
Jothi R, Cherukuri PF, Tasneem A, Przytycka TM (2006) Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. J Mol Biol 362:861–875. https://doi.org/10.1016/j.jmb.2006.07.072
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589. https://doi.org/10.1038/s41586-021-03819-2
Kanamori E, Murakami Y, Tsuchiya Y, Standley DM, Nakamura H, Kinoshita K (2007) Docking of protein molecular surfaces with evolutionary trace analysis. Proteins Struct Funct Bioinforma 69:832–838. https://doi.org/10.1002/prot.21737
Ko J, Lee J (2021) Can AlphaFold2 predict protein-peptide complex structures accurately? bioRxiv. https://doi.org/10.1101/2021.07.27.453972
Krawczyk K, Liu X, Baker T, Shi J, Deane CM (2014) Improving B-cell epitope prediction and its application to global antibody-antigen docking. Bioinformatics 30:2288–2294. https://doi.org/10.1093/bioinformatics/btu190
Lafita A, Bliven S, Kryshtafovych A, Bertoni M, Monastyrskyy B, Duarte JM, Schwede T, Capitani G (2018) Assessment of protein assembly prediction in CASP12. Proteins Struct Funct Bioinforma 86:247–256. https://doi.org/10.1002/prot.25408
Lei Y, Li S, Liu Z, Wan F, Tian T, Li S, Zhao D, Zeng J (2021) A deep-learning framework for multi-level peptide-protein interaction prediction. Nat Commun 12:5465. https://doi.org/10.1038/s41467-021-25772-4
Lensink MF, Wodak SJ (2010) Docking and scoring protein interactions: CAPRI 2009. Proteins Struct Funct Bioinforma 78:3073–3084. https://doi.org/10.1002/prot.22818
Lensink MF, Moal IH, Bates PA, Kastritis PL, Melquiond ASJ, Karaca E, Schmitz C, van Dijk M, Bonvin AMJJ, Eisenstein M, Jiménez-García B, Grosdidier S, Solernou A, Pérez-Cano L, Pallara C, Fernández-Recio J, Xu J, Muthu P, Praneeth Kilambi K, Gray JJ et al (2014) Blind prediction of interfacial water positions in CAPRI. Proteins Struct Funct Bioinforma 82:620–632. https://doi.org/10.1002/prot.24439
Lensink MF, Velankar S, Kryshtafovych A, Huang SY, Schneidman-Duhovny D, Sali A, Segura J, Fernandez-Fuentes N, Viswanath S, Elber R, Grudinin S, Popov P, Neveu E, Lee H, Baek M, Park S, Heo L, Lee GR, Seok C, Qin S et al (2016) Prediction of homoprotein and heteroprotein complexes by protein docking and template-based modeling: a CASP-CAPRI experiment. Proteins Struct Funct Bioinforma 84:323–348. https://doi.org/10.1002/prot.25007
Lensink MF, Velankar S, Baek M, Heo L, Seok C, Wodak SJ (2018) The challenge of modeling protein assemblies: the CASP12-CAPRI experiment. Proteins Struct Funct Bioinforma 86:257–273. https://doi.org/10.1002/prot.25419
Lensink MF, Nadzirin N, Velankar S, Wodak SJ (2020) Modeling protein‐protein, protein‐peptide, and protein‐oligosaccharide complexes: CAPRI 7th edition. Proteins Struct Funct Bioinforma 88:916–938. https://doi.org/10.1002/prot.25870
Li Z, Liu X, Chen W, Shen F, Bi H, Ke G, Zhang L (2022a) Uni-Fold: an open-source platform for developing protein folding models beyond AlphaFold. bioRxiv. https://doi.org/10.1101/2022.08.04.502811
Li Z, Yang S, Liu X, Chen W, Wen H, Shen F, Ke G, Zhang L (2022b) Uni-Fold Symmetry: harnessing symmetry in folding large protein complexes. bioRxiv. https://doi.org/10.1101/2022.08.30.505833
Lim H, Chun J, Jin X, Kim J, Yoon J, No KT (2019) Investigation of protein-protein interactions and hot spot region between PD-1 and PD-L1 by fragment molecular orbital method. Sci Rep 9:16727. https://doi.org/10.1038/s41598-019-53216-z
Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, dos Santos Costa A, Fazel-Zarandi M, Sercu T, Candido S, Rives A (2022) Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv. https://doi.org/10.1101/2022.07.20.500902
Liu S, Zhang J, Chu H, Wang M, Xue B, Ni N, Yu J, Xie Y, Chen Z, Chen M, Liu Y, Patra P, Xu F, Chen J, Wang Z, Yang L, Yu F, Chen L, Gao YQ (2022) PSP: million-level protein sequence dataset for protein structure prediction. ArXiv. https://doi.org/10.48550/arxiv.2206.12240
Lu H, Zhou Q, He J, Jiang Z, Peng C, Tong R, Shi J (2020) Recent advances in the development of protein-protein interactions modulators: mechanisms and clinical trials. Signal Transduct Target Ther 5:213. https://doi.org/10.1038/s41392-020-00315-3
Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M (2022) ColabFold: making protein folding accessible to all. Nat Methods 19:679–682. https://doi.org/10.1038/s41592-022-01488-1
Moretti R, Fleishman SJ, Agius R, Torchala M, Bates PA, Kastritis PL, Rodrigues JPGLM, Trellet M, Bonvin AMJJ, Cui M, Rooman M, Gillis D, Dehouck Y, Moal I, Romero-Durana M, Perez-Cano L, Pallara C, Jimenez B, Fernandez-Recio J, Flores S et al (2013) Community-wide evaluation of methods for predicting the effect of mutations on protein-protein interactions. Proteins Struct Funct Bioinforma 81:1980–1987. https://doi.org/10.1002/prot.24356
Moriwaki Y (2021) Twitter post: AlphaFold2 can also predict heterocomplexes. All you have to do is input the two sequences you want to predict and connect them with a long linker. https://twitter.com/Ag_smith/status/1417063635000598528
Nakamura T, Oda T, Fukasawa Y, Tomii K (2018) Template-based quaternary structure prediction of proteins using enhanced profile–profile alignments. Proteins Struct Funct Bioinforma. https://doi.org/10.1002/prot.25432
Negi SS, Braun W (2017) Cross-React: a new structural bioinformatics method for predicting allergen cross-reactivity. Bioinformatics 33:1014–1020. https://doi.org/10.1093/bioinformatics/btw767
Ovek D, Abali Z, Zeylan ME, Keskin O, Gursoy A, Tuncbag N (2022) Artificial intelligence based methods for hot spot prediction. Curr Opin Struct Biol 72:209–218. https://doi.org/10.1016/j.sbi.2021.11.003
Ozdemir ES, Gursoy A, Keskin O (2018) Analysis of single amino acid variations in singlet hot spots of protein-protein interfaces. Bioinformatics 34:i795–i801. https://doi.org/10.1093/bioinformatics/bty569
PIACO (2019) Protein Interface Analysis using COvarying signals. https://github.com/yfukasawa/piaco (Accessed 2 December 2022)
Pierce BG, Wiehe K, Hwang H, Kim B-H, Vreven T, Weng Z (2014) ZDOCK server: interactive docking prediction of protein-protein complexes and symmetric multimers. Bioinformatics 30:1771–1773. https://doi.org/10.1093/bioinformatics/btu097
Pittala S, Bailey-Kellogg C (2020) Learning context-aware structural representations to predict antigen and antibody binding interfaces. Bioinformatics 36:3996–4003. https://doi.org/10.1093/bioinformatics/btaa263
PoSSuM (2021) Pocket Similarity Search using Multiple-Sketches. https://possum.cbrc.pj.aist.go.jp/PoSSuM/ (Accessed 2 December 2022)
PoSSuMAg (2022) Pocket Similarity Search using Multiple-Sketches (Antigen) (in preparation)
Poupon A, Janin J (2010) Analysis and prediction of protein quaternary structure. Methods Mol Biol 609:349–364. https://doi.org/10.1007/978-1-60327-241-4_20
Rosell M, Fernández-Recio J (2020) Docking-based identification of small-molecule binding sites at protein-protein interfaces. Comput Struct Biotechnol J 18:3750–3761. https://doi.org/10.1016/j.csbj.2020.11.029
Ruffolo JA, Chu L-S, Mahajan SP, Gray JJ (2022) Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. bioRxiv. https://doi.org/10.1101/2022.04.20.488972
Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ (2005) PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res 33:W363–W367. https://doi.org/10.1093/nar/gki481
Smith GR, Sternberg MJE (2002) Prediction of protein-protein interactions by docking methods. Curr Opin Struct Biol 12:28–35. https://doi.org/10.1016/S0959-440x(02)00285-3
Tabei Y, Uno T, Sugiyama M, Tsuda K (2010) Single versus multiple sorting for all pairs similarity search. In: The Second Asian Conference on Machine Learning (ACML2010), Tokyo, Japan. pp 145–160
Thorn KS, Bogan AA (2001) ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics 17:284–285. https://doi.org/10.1093/bioinformatics/17.3.284
Tomii K, Akiyama Y (2004) FORTE: a profile-profile comparison tool for protein fold recognition. Bioinformatics 20:594–595. https://doi.org/10.1093/bioinformatics/btg474
Tsaban T, Varga JK, Avraham O, Ben-Aharon Z, Khramushin A, Schueler-Furman O (2022) Harnessing protein folding neural networks for peptide-protein docking. Nat Commun 13:176. https://doi.org/10.1038/s41467-021-27838-9
Tsuchiya Y, Kinoshita K, Ito N, Nakamura H (2006a) PreBI: prediction of biological interfaces of proteins in crystals. Nucleic Acids Res 34:W320–W324. https://doi.org/10.1093/nar/gkl267
Tsuchiya Y, Kinoshita K, Nakamura H (2006b) Analyses of homo-oligomer interfaces of proteins from the complementarity of molecular surface, electrostatic potential and hydrophobicity. Protein Eng Des Sel 19:421–429. https://doi.org/10.1093/protein/gzl026
van Noort CW, Honorato RV, Bonvin AMJJ (2021) Information-driven modeling of biomolecular complexes. Curr Opin Struct Biol 70:70–77. https://doi.org/10.1016/j.sbi.2021.05.003
van Zundert GCP, Rodrigues JPGLM, Trellet M, Schmitz C, Kastritis PL, Karaca E, Melquiond ASJ, van Dijk M, de Vries SJ, Bonvin AMJJ (2015) The HADDOCK2.2 web server: user-friendly integrative modeling of biomolecular complexes. J Mol Biol 428:720–725. https://doi.org/10.1016/j.jmb.2015.09.014
Vieths S, Scheurer S, Ballmer-Weber B (2006) Current understanding of cross-reactivity of food allergens and pollen. Ann N Y Acad Sci 964:47–68. https://doi.org/10.1111/j.1749-6632.2002.tb04132.x
Vreven T, Moal IH, Vangone A, Pierce BG, Kastritis PL, Torchala M, Chaleil R, Jiménez-García B, Bates PA, Fernandez-Recio J, Bonvin AMJJ, Weng Z (2015) Updates to the integrated protein–protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2. J Mol Biol 427:3031–3041. https://doi.org/10.1016/j.jmb.2015.07.016
Wang X, Terashi G, Christoffer CW, Zhu M, Kihara D (2020) Protein docking model evaluation by 3D deep convolutional neural networks. Bioinformatics 36:2113–2118. https://doi.org/10.1093/bioinformatics/btz870
Wang G, Fang X, Wu Z, Liu Y, Xue Y, Xiang Y, Yu D, Wang F, Ma Y (2022) HelixFold: an efficient implementation of AlphaFold2 using PaddlePaddle. ArXiv. https://doi.org/10.48550/arxiv.2207.05477
Wu R, Ding F, Wang R, Shen R, Zhang X, Luo S, Su C, Wu Z, Xie Q, Berger B, Ma J, Peng J (2022) High-resolution de novo structure prediction from primary sequence. bioRxiv. https://doi.org/10.1101/2022.07.21.500999
Yamamori Y, Tsuchiya Y, Tomii K (2022) PPI prediction results for six CASP14 targets using AF2-related methods. https://doi.org/10.6084/m9.figshare.21716330
Yin R, Feng BY, Varshney A, Pierce BG (2022) Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Protein Sci 31:e4379. https://doi.org/10.1002/pro.4379
Yu J, Vavrusa M, Andreani J, Rey J, Tufféry P, Guerois R (2016) InterEvDock: a docking server to predict the structure of protein-protein interactions using evolutionary information. Nucleic Acids Res 44:W542–W549. https://doi.org/10.1093/nar/gkw340
Acknowledgements
We thank the organizers of the issue for extending this opportunity to us.
Funding
This study was supported by the Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS)) from AMED under Grant Number JP22ama121028.
Author information
Contributions
YT, YY, and KT contributed to the design of the study and wrote the manuscript. PPI predictions in this study were performed by YY. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Tsuchiya, Y., Yamamori, Y. & Tomii, K. Protein–protein interaction prediction methods: from docking-based to AI-based approaches. Biophys Rev 14, 1341–1348 (2022). https://doi.org/10.1007/s12551-022-01032-7
DOI: https://doi.org/10.1007/s12551-022-01032-7