Introduction

Protein–protein interactions are crucial in many biological processes like signal transduction, intracellular trafficking, and immune recognition. Among all protein–protein interactions, a significant portion is formed by symmetric homo-oligomers (Andre et al. 2008; Goodsell and Olson 2000; Poupon and Janin 2010). According to the Protein Data Bank (PDB) (Berman et al. 2000), more than one third of the proteins have some types of symmetry. For example, many transmembrane proteins like ion channels are formed by symmetric homo-oligomer assemblies. The symmetry of homo-oligomeric proteins is thought to be associated with many potential benefits like greater stability, reduced aggregation, and robustness to errors in synthesis (Andre et al. 2008; Goodsell and Olson 2000). The interface between symmetric homo-oligomers is often the targeting site for regulating the biological processes (Petsalaki and Russell 2008). Therefore, determining the complex structure of symmetric proteins is important (Lensink et al. 2016, 2018). Theoretically, one can use a general protein–protein docking approach to predict the complex structure of symmetric homo-oligomers by docking one monomer against the other (Comeau et al. 2004; de Vries et al. 2010, 2015; Torchala et al. 2013; Tovchigrechko and Vakser 2006). However, such a general docking strategy is not efficient for symmetric homo-oligomers. On one hand, the general protein–protein docking approach treats two interacting partners as different proteins and therefore often don’t generate the complex structures with strict symmetry; On the other hand, general protein–protein docking normally don’t consider the symmetry restraints during the docking process, and therefore is not computationally efficient. Therefore, specialized protein–protein docking algorithms are needed for predicting the complex structure of symmetric protein homo-oligomers.

One important symmetry in proteins is cyclic symmetry (Cn), for which the oligomeric structure can be constructed by n consecutive rotations of 360°/n around a single rotational axis of one subunit (Andre et al. 2008). Despite the importance of symmetric protein homo-oligomers, only a few algorithms have been developed for symmetric protein docking. Wolfson et al. developed a fast docking algorithm for cyclically symmetric complexes through local feature matching, which is referred to as SymmDock (Schneidman-Duhovny et al. 2005). SymmDock constructs the symmetric homo-oligomer complexes by restricting the search to symmetric cyclic transformations. The Weng group developed an FFT-based algorithm for symmetric protein–protein docking by restricting the search space with cyclic symmetry (M-ZDOCK) (Pierce et al. 2005). Based on the symmetric protein complexes in the PDB, several web servers that use template-based methods like ROBETTA (DiMaio et al. 2011), SWISS-MODEL (Biasini et al. 2014), and GalaxyGemini (Lee et al. 2013) have also been proposed to predict the homo-oligomeric structure. In addition, Ritchie and Grudinin presented a fast docking algorithm, which is named SAM, for predicting the symmetrical models of protein complexes with arbitrary point group symmetry through a spherical polar FFT-based algorithm (Ritchie and Grudinin 2016). Very recently, the Seok group has developed a combination modeling approach, GalaxyHomomer, for homo-oligomer structure prediction from a monomer sequence or structure by template-based modeling if homologous complexes are available in the PDB or ab initio docking (Baek et al. 2017).

However, despite the significant progress in the development of symmetric docking algorithms, there is still much room in improving the docking accuracy. Recently, we have developed a new pairwise shape-based scoring approach to consider long-range interactions (LSC) of protein atoms by an exponential form in FFT-based protein–protein docking. Tested on general protein–protein complexes, our LSC approach showed a significant advantage over the traditional grid-based method (Yan and Huang 2018). Extending the LSC approach to symmetric complexes, we have here developed a fast ab initio docking approach for the symmetric docking of homo-oligomers with Cn symmetry by an FFT-based search algorithm with LSC, which is referred to as CHDOCK.

Results and discussion

Comparison with other programs

We have tested our symmetric docking algorithm CHDOCK on the bound and unbound structures of our symmetric protein docking benchmark of 212 Cn targets (Yan and Huang 2019). Table 1 lists the success rates of CHDOCK in binding mode predictions for bound and unbound docking on the 212 cases with Cn symmetry when the top 1, 10, and 100 predictions are considered. The corresponding results are also shown in Fig. 1. For comparison, Table 1 and Fig. 1 also give the corresponding results of three other Cn symmetric docking algorithms, M-ZDOCK (Pierce et al. 2005), SymmDock (Schneidman-Duhovny et al. 2005), and SAM (Ritchie and Grudinin 2016), on this benchmark, in which the same clustering criteria have been applied to their final binding modes during the calculation of success rates. It can be seen from Table 1 and Fig. 1 that CHDOCK obtained a significantly better performance than the other three docking methods for bound docking and achieved a success rate of 55.19%, 72.17%, and 90.57% for top 1, 10, and 100 predictions, respectively, in comparison to those of 45.76%, 65.09%, and 89.15% for M-ZDOCK, 38.85%, 54.25%, and 84.91% for SAM, and 16.04% 31.60%, and 67.45% for SymmDock.

Table 1 The success rates (%) predicted by our CHDOCK and three other symmetric docking programs on our protein docking benchmark of 212 Cn symmetric complexes when the top 1, 10, and 100 predictions were considered
Fig. 1
figure 1

The success rates of our CHDOCK and three other symmetric docking methods in binding mode predictions on our protein docking benchmark of 212 Cn symmetric complexes for bound docking (A) and unbound docking (B). For each method, from left to right are for the results of top 100, 10, and 1 prediction, respectively

Similar trends can also be observed in the results for unbound docking, though the performance differences among different algorithms are not as much as those for bound docking due to the impact of conformational changes in the unbound structures. Namely, CHDOCK also performed significantly better than the other three methods for unbound docking and obtained a success rate of 30.66%, 44.81%, and 68.40% for top 1, 10, and 100 predictions, respectively, in comparison to those of 26.42%, 36.79%, and 66.51% for M-ZDOCK, 19.34%, 31.60%, and 63.68% for SAM, and 11.79%, 30.66%, and 58.49% for SymmDock (Table 1 and Fig. 1).

Besides the success rate of docking, we have also compared the average root mean square deviation (RMSD) of ‘hit(s)’ (i.e., successful binding mode predictions) for both bound docking and unbound docking with the other three programs when the top 1, 10, 100 predictions were considered. The results are listed in Table 2 and the corresponding results are shown in Fig. 2. From Table 2 and Fig. 2, we can see that CHDOCK also performed much better and obtained more accurate binding modes than the other three programs for both bound docking and unbound docking. For bound docking, CHDOCK obtained an average RMSD of 1.10, 1.51 and 2.11 Å for top 1, 10 and 100 predictions, respectively, in comparison to those of 2.27, 2.80 and 3.46 Å for the second-best method M-ZDOCK. As for unbound docking, similar results can also be observed. CHDOCK obtained an average RMSD of 2.54, 3.12 and 4.07 Å for top 1, 10, 100 predictions, respectively, while M-ZDOCK obtained a higher RMSD of 3.26, 3.86 and 5.07 Å. Interestingly, one can also note that among the four docking programs, if a method performs better in the success rate of binding mode prediction, it also performs better in the average RMSD of ‘hits’. That means, the performance comes from both the number and the quality of successful predictions.

Table 2 The average LRMSD (Å) of ‘hit(s)’ predicted by our CHDOCK and three other symmetric docking programs on our protein docking benchmark of 212 Cn symmetric complexes when the top 1, 10, and 100 predictions were considered
Fig. 2
figure 2

The average RMSD of first ‘hit(s)’ of our CHDOCK and three other symmetric docking methods tested on our protein docking benchmark of 212 Cn symmetric complexes for bound docking (A) and unbound docking (B). For each method, from left to right are for the results of top 100, 10, and 1 prediction, respectively

Performance of scoring function

To investigate the performance of our scoring function, we also tested our pure FFT-based docking, named CHDOCK_lite, on the benchmark, which only uses the shape complementarity to filter and sort docking poses. The docking results for bound docking and unbound docking are shown in Fig. 3. It can be seen from the figure that CHDOCK performed much better than CHDOCK_lite. With the help of our scoring function ITScorePP (Huang and Zou 2008), the success rate of bound docking for top 1 prediction increased from 21.70% to 55.19% and for unbound docking, the success rate increased from 11.32% to 30.66%. The great improvement of CHDOCK compared to CHDOCK_lite demonstrates the important role of our scoring function.

Fig. 3
figure 3

The success rate as a function of the number of top predictions for our CHDOCK and CHDOCK_lite tested on our protein docking benchmark of 212 Cn symmetric complexes for bound docking (A) and unbound docking (B)

Discussions

CHDOCK and M-ZDOCK are both the three-dimensional (3D) FFT-based docking algorithms and adopt the similar sampling strategy. However, the difference between CHDOCK and M-ZDOCK is that CHDOCK adopts a better shape complementarity score LSC (Yan and Huang 2018) and a more powerful scoring function ITScorePP (Huang and Zou 2008). In our previous study on hetero protein complexes (Yan and Huang 2018), LSC has shown its better performance than PSC (Chen and Weng 2003) used in M-ZDOCK. ITScorePP also showed a better performance in scoring decoys and finding the near native structures (Huang and Zou 2008). Therefore, the better performance of CHDOCK than M-ZDOCK would be attributed to both the shape complementarity score LSC and our scoring function ITScorePP. Although CHDOCK has achieved better performance than the other three docking programs, the success rate for top 1 prediction is still not high, especially for unbound docking. There are much room to improve the existing methods and develop new docking programs in the future.

Examples of the docking model

Figure 4 shows the top binding modes predicted by our CHDOCK for both bound and unbound docking on three example targets. It can be seen from the figure that the predicted complexes overlap well with the experimental native structures, and give a ligand RMSD of 0.42 and 4.03 Å for C2 symmetric target 1MSC, 0.92 and 3.38 Å for C4 symmetric target 1OVO, and 0.95 and 1.20 Å for C6 symmetric target 1KQ1, respectively. The good consistency between the predicted and native structures in both bound and unbound docking demonstrates the reliability of our CHDOCK.

Fig. 4
figure 4

Comparisons between the top predicted binding modes and native structures for three targets 1MSC (C2 symmetry) (A), 1OVO (C4 symmetry) (B) and 1KQ1 (C6 symmetry) (C). The native structure is colored in pink and the predicted structure is colored by chains. For each column, the upper and lower ones are for bound docking and unbound docking, respectively

Conclusion

We have developed a hierarchical docking algorithm for predicting the complex structures of homo-oligomers with Cn symmetry, which referred to as CHDOCK. The Cn symmetric binding modes were first generated by an FFT-based docking algorithm, in which a shape complementarity scoring function was used to consider long-range interactions. Then, the binding modes with best shape complementarity were optimized with our iterative scoring function for protein–protein interactions. Our symmetric docking algorithm CHDOCK was evaluated on a diverse benchmark of 212 Cn symmetric protein complexes from the PDB, and was compared with three state-of-the-art symmetric docking approaches including M-ZDOCK, SAM, and SymmDock. It shows that CHDOCK achieved a significantly better performance than the other three docking methods in both the number and the quality of successful predictions for bound docking and unbound docking. The results demonstrate the strong predictive power of our hierarchical docking algorithm CHDOCK in modeling Cn symmetric protein complexes.

Materials and methods

FFT-based translational search

The putative symmetrical complexes were constructed from a monomer or subunit in 3D translational space by a modified version of our general FFT-based docking algorithm (Yan et al. 2017; Yan and Huang 2018). Specifically, we first made two copies of the subunit or monomer. One was called “receptor” subunit and the other “ligand” subunit. For docking with Cn symmetry, the receptor subunit was fixed and the ligand subunit was rotated by an angle of 360°/n around the z-axis. To perform an FFT-based search, both the receptor and ligand subunits needed to be mapped onto a 3D grid of \(N \times N \times N\) grid points (Chen and Weng 2003; Katchalski-Katzir et al. 1992). The grid points within the VDW radius of any protein atoms were considered inside the molecule, and the others were considered as outside the protein. Here, the VDW radii for standard protein atoms were taken from the study by Li and Nussinov (1998). Then, the inside-protein grid points were divided into three parts: surface layer, near-surface layer, and core region. It is defined that a grid point belonged to the surface layer if any of its neighboring grid points is outside the protein. Similarly, a grid point belonged to the near-surface layer if any of its neighbors is in the surface layer. All the other grid points except the surface and near-surface layers inside the protein were defined as the core region. According to the above definitions, one can see that the near-surface layer and core region were normally occupied by the protein atoms, and the surface layer separated the inside protein from the outside space. Then, each grid point for the receptor (R) and ligand (L) subunits was assigned a complex value as:

$$R(l,m,n) = \left\{ {\begin{array}{lll} &{ - \sum\limits_{i,j,k} {\exp [ - {{(r - 1)}^2}] + J} }&\quad{{\rm{(for\,surface\,layer),}}}\\ &{ - 1 + 2J \times \sum\limits_{i,j,k} {\exp ( - {r^2})} }&\quad{{\rm{(for\,near\,surface\,layer),}}}\\ &{ - 1 + 10J}&\quad{{\rm{(for\,protein\,core),}}}\\ &0&\quad{{\rm{(outside\,the\,protein),}}} \end{array}} \right.$$
(1)

and

$$L(l,m,n) = \left\{ {\begin{array}{lll}&{ 1 - J }&\quad{{\rm{(for\,surface\,layer),}}}\\&{ 1 + 2J \times\sum\limits_{i,j,k} {\exp ( - {r^2})} }&\quad{{(\rm{for\,near\,surface\,layer),}}}\\&{ 1 - 10J}&\quad{{(\rm{for\,protein\,core),}}}\\ &0&\quad{{\rm{(outside\,the\,protein),}}}\end{array}} \right.$$
(2)

where \({J^2} = - 1\), l, m, and n are the indices of the 3D grid (\(l,m,n = 1, \ldots ,N\)), and r is the distance between the grid points of (i, j, k) and (l, m, n). Here, \(i \in \left[ {l - 3,l + 3} \right]\), \(j \in \left[ {m - 3,m + 3} \right]\) and \(k \in \left[ {n - 3,n + 3} \right]\) for the surface layer, and \(i \in \left[ {l - 1,l + 1} \right]\), \(j \in \left[ {m - 1,m + 1} \right]\) and \(k \in \left[ {n - 1,n + 1} \right]\) for the near-surface layer, respectively. And also, the grid point (i, j, k) should belong to near-surface layer or protein core.

With the above mapping of the proteins on the grid, the shape complementarity score between two neighboring subunits of a symmetric complex around the z-axis can be generally expressed by the following equation (Chen and Weng 2003; Katchalski-Katzir et al. 1992):

$$\begin{array}{ll} E\left( {o,p} \right) = & {\mathop{\text Re}\nolimits} \left[ {\sum\limits_{l = 1}^N {\sum\limits_{m = 1}^N {\sum\limits_{n = 1}^N {R\left( {l,m,n} \right) \times L\left( {l + o,m + p,n} \right)} } } } \right]\\ &- {\mathop{\text Im}\nolimits} \left[ {\sum\limits_{l = 1}^N {\sum\limits_{m = 1}^N {\sum\limits_{n = 1}^N {R\left( {l,m,n} \right) \times L\left( {l + o,m + p,n} \right)} } } } \right], \end{array}$$
(3)

where o and p are the number of grid points by which the ligand (L) is shifted with respect to the receptor (R) in the xy plane, respectively. There is no shift in the z-axis because the rotational axis is parallel to the z-axis, which reduces the sampling space in one translational dimension. The correlation of Eq. 3 can be calculated by an FFT-based algorithm. A higher correlation score means a better shape complementarity between two grids for a relative translation of (o, p) (Katchalski-Katzir et al. 1992).

Rotational sampling strategy

To perform a global sampling approach for putative binding modes, one needs to search the six-dimensional (i.e., 3 translational + 3 rotational) space. The exhaustive search in 3D translational space can be performed by an FFT-based approach, as described in the previous section. The exhaustive search in the rotational space will be conducted in the space of Euler angles by taking into the Cn symmetry restriction account. Specifically, the  monomer subunit is rotated by an interval of Euler angles (ϕ=0, Δθ, Δψ) in the rotational space, where the angular definition is based on the so-called “x-axis convention”. Namely, \(\phi\) is the first rotation about the z-axis, \(\theta \in \left[ {0,{\uppi \mathord{\left/ {\vphantom {\uppi 2}} \right. \kern-\nulldelimiterspace} 2}} \right]\) is the second rotation about the former x-axis (now \(x'\)), and ψ ∈ (0,2π] is the third rotation about the former z-axis (now \(z'\)). It is unnecessary to sample the ϕ angles as the rotational axis is z-axis. In addition, θ only needs to be sampled within \(\left[ {0,{\uppi \mathord{\left/ {\vphantom {\uppi 2}} \right. \kern-\nulldelimiterspace} 2}} \right]\) instead of \(\left[ {0,{{\pi }}} \right]\) because of the rotational symmetry. All these reduce the sampling space in the rotational space.

Then, for each rotation of the  monomer subunit, an FFT-based algorithm was used to calculate the shape complementarity between the grids of the receptor and the ligand in the translational space. During the docking calculation, an angle interval of 10° was used for rotational sampling, and a spacing of 1.2 Å was adopted in discretizing proteins onto grids for FFT-based translational search. Evenly distributed Euler angles were used for the rotational search, resulting in a total of 360 orientations in the rotational space. For each rotation, up to the top 100 translations with best shape complementarities were kept and optimized by our scoring function ITScorePP (Huang and Zou 2008). The binding mode that corresponds to the best energy score in an FFT-based translational search was kept for each rotation of the ligand subunit, yielding a total of 360 ligand binding modes for a docking run. Our FFT-based docking algorithm is computationally efficient and on average can complete a full docking calculation in 30 min on a 2.6 GHZ Intel CPU core.

Scoring function

All the binding modes generated from the initial stage were evaluated by ITScorePP (Huang and Zou 2008) and minimized according to their binding scores by a SIMPLEX optimization method. The binding energy score is a summation of the binding scores over all the interfaces between the subunits of the predicted complex. The final ranked binding modes were clustered with an RMSD cutoff of 5 Å, where the RMSD was calculated using the backbone atoms. If two binding modes have a ligand RMSD of <5 Å, the one with the better score is kept.

Benchmark

Based on the protein complexes in the PDB, we have also constructed a non-redundant benchmark for our symmetric protein–protein docking. Briefly, all the homo-oligomeric protein complexes with Cn symmetry were collected from the crystal structures with resolution better than 2.5 Å. The symmetry type of a complex was determined by its biological unit. The symmetric homo-oligomer complexes were then clustered according to their SCOP (version 1.75) family IDs (Lo Conte et al. 2000). For the complexes belonging to the same family, the one with the best resolution was selected as the representative, corresponding to a bound case of our benchmark, in which each subunit was called the bound structure of the complex. For the bound structure in each bound case, the unbound structure was identified by searching against the PDB database for the asymmetric structures using the BLASTP (protein–protein BLAST) algorithm of the BLAST package (Camacho et al. 2009). If an asymmetric structure had >95% sequence identity with the bound structure and covered >95% of the sequence alignment, the asymmetric structure was regarded as a candidate of the unbound structure. If there were multiple unbound structures for a subunit protein, the one with the high resolution was selected as the representative. This yielded a total of 212 homo-oligomeric protein complexes with Cn symmetry (http://huanglab.phys.hust.edu.cn/SDBenchmark/) (Yan and Huang 2019). All the structures in the benchmark have their original coordinates without any random rotation.

Evaluation criteria

The quality of a predicted binding mode was measured by the ligand RMSD (LRMSD). Here, the RMSD was calculated based on the backbone atoms of the ligand subunit after optimal superimposition of the receptor subunit and the native structure. The docking performance was evaluated by the success rate, i.e., the fraction of the targets with at least one hit in the test set when a certain number of top predictions were considered. Here, a hit is a prediction with a ligand RMSD of <10 Å (Huang 2014).