Introduction

Computer-aided drug design (CADD) is a powerful methodology for early-stage drug discovery [1]. In particular, there is much interest in the use of molecular simulation methods to support drug-discovery efforts [2], for instance via investigation of protein folding mechanisms [3, 4] or of ligand modulation of millisecond time-scale conformational changes in proteins [5]. Another application of molecular simulations in CADD is potency prediction, which aims to decrease the time and cost of the hit-to-lead and lead optimization stages needed before molecules may be progressed towards clinical studies [6]. This requires an accurate description of ligand–protein energetics, which is increasingly sought via free energy calculation methods.

Among the various existing free energy calculation methodologies, alchemical free energy (AFE) calculations have attracted much interest in recent years [7,8,9] due to their robust grounding in statistical physics. AFE calculations capture non-additivity of structure–activity relationships in congeneric series that is overlooked by empirical scoring methods [10], and have given useful potency estimates for a range of protein–ligand systems [11,12,13]. AFE methods may also be used to predict physical properties, such as lipophilicity coefficients [14,15,16]. In spite of encouraging successes, important technical hurdles remain. Common concerns include finite-sampling effects, which introduce statistical errors [17,18,19,20], and inaccuracies in potential energy functions, which contribute to systematic errors [21]. Additionally, algorithmic decisions for the handling of long-range electrostatic interactions and finite-size artefacts affect simulation results in ways that are still poorly understood, with effects particularly apparent in the modelling of charged species [22,23,24]. Thus, it is important to improve the robustness of AFE protocols to enable their reliable application to structure-based drug design problems.

Blinded prediction competitions offer a valuable resource to reduce bias in validation studies and to test the practical utility of a methodology in a setting that more closely resembles CADD in practice [25]. The D3R Grand Challenges have proven a popular blinded competition, with a focus on validating computational methods for modelling of protein–ligand interactions [25, 26]. The Statistical Assessment of Modelling of Proteins and Ligands (SAMPL) is also a well-established blinded competition for free energy science in drug discovery [27]. The SAMPL challenge was founded in 2007 and usually asks participants to predict physicochemical properties, such as binding affinities for host–guest systems or hydration free energies of small drug-like molecules [28, 29]. Host–guest systems are attractive since they provide more tractable milestones towards validation of protocols for modelling protein–ligand binding energetics [30].

The 6th SAMPL (SAMPL6) competition was launched in September 2017. Our group focused on the host–guest leg of this contest, which requested predictions of standard free energies of binding for 27 guests across three different hosts [31]. The host molecules consisted of two octa-acids, OA and TEMOA [32,33,34,35], and a cucurbituril ring clip CB8 [36,37,38,39], as shown in Fig. 1. The octa-acid systems (Fig. 1a) are basket shaped; OA contains four flexible propionate side chains bearing two rotatable single bonds each, while TEMOA contains four methyl groups, which alter the shape of the hydrophobic cavity. CB8 (Fig. 1b) is a heteroaromatic multicyclic molecule, chemically related to the cucurbiturils, made of eight glycoluril units linked by methylene bridges. CB8 is considered a more flexible host than OA and TEMOA, though the latter two also contain flexible groups at the top and bottom of their cavities [38, 39]. Additionally, SAMPL6 introduced a SAMPLing challenge focused on evaluating convergence and reproducibility (across codes) of free energy predictions. To this end, input files for the parameterized host–guest systems OA–G3, OA–G6 and CB8–G3 were provided, and participants were requested to evaluate convergence of their binding free energy estimates.

Fig. 1 Depiction of the SAMPL6 host–guest dataset. (a) OA and TEMOA host–guest systems. (b) CB8 host–guest systems

This report summarizes the performance of our free energy code Sire/OpenMM Molecular Dynamics (SOMD) against the SAMPL6 host–guest dataset, as well as the lessons learned for continuing efforts to improve the robustness of AFE methods in CADD.

Theory and methods

Definition of binding affinity

The reversible binding of a ligand L to a receptor P can be written as:

$$P + L \mathop{\rightleftharpoons}\limits^{\Delta G^\circ _{{bind}} } PL$$
(1)

where ΔG°bind is the standard free energy of binding of ligand L to receptor P. A statistical thermodynamics treatment leads to Eq. 2 [40]:

$$\Delta G_{bind}^{\circ} = -k_{B}T \ln \frac{Z_{PL,solv}\, Z_{solv}\, V}{Z_{L,solv}\, Z_{P,solv}\, V_{0}}$$
(2)

where ZPL,solv, Zsolv, ZL,solv and ZP,solv are the configuration integrals for the solvated complex, the solvent molecules, the solvated ligand and the solvated protein respectively; V is the volume of binding, namely the volume available to the ligand to bind the protein; and V0 is the standard state volume, which is usually taken as 1661 Å3 per molecule (corresponding to a standard concentration of 1 mol L−1).
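
The quoted standard state volume follows directly from a 1 mol L−1 reference concentration. The short sketch below reproduces the arithmetic; the function and constant names are illustrative and are not taken from SOMD.

```python
# Standard-state volume per molecule for a 1 mol/L reference concentration.
AVOGADRO = 6.02214076e23        # molecules per mole (exact SI value)
ANGSTROM3_PER_LITRE = 1.0e27    # 1 L = 1 dm^3 and 1 dm = 1e9 Å

def standard_state_volume(conc_mol_per_l: float = 1.0) -> float:
    """Volume in Å^3 available per molecule at the given molar concentration."""
    return ANGSTROM3_PER_LITRE / (conc_mol_per_l * AVOGADRO)

v0 = standard_state_volume()
print(f"V0 = {v0:.1f} Å^3 per molecule")
```

Rounded to the nearest integer, this gives the 1661 Å3 per molecule used in Eq. 2.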

Computing free energies of binding through models A, B, C

Equation 2 can be applied to estimate the binding free energy for host–guest systems. Computationally, the free energy is evaluated from molecular dynamics (MD) simulations by means of a double annihilation technique [13, 41, 42]. Figure 2 shows how this approach is used to evaluate ΔG°bind by means of a thermodynamic cycle. In the first step (discharging step) the charges of the guest’s atoms are turned off both in the solvated phase and in the bound phase, providing the discharging free energy changes \(\Delta G_{elec}^{solv}\) and \(\Delta G_{elec}^{host}\) respectively. In the second step (vanishing step) a “non-interacting” guest is obtained by switching off the van der Waals parameters of the discharged guest both in the solvated phase and in the bound phase, giving the vanishing free energy changes \(\Delta G_{vdW}^{solv}\) and \(\Delta G_{vdW}^{host}\) respectively. To prevent the guest from drifting away from the host cavity, a series of flat-bottom distance restraints is defined between the guest atom j closest to the center of mass of the guest and four host atoms i. The restraint potential is given by Eq. 3 [13]:

$$U_{(d_{j1},\ldots,d_{jN_{host}})}^{restr} = \sum\limits_{i=1}^{N_{host}} \begin{cases} 0 & \text{if } \left| d_{ji} - R_{ji} \right| \le D_{ji} \\ \kappa_{ji} \left( \left| d_{ji} - R_{ji} \right| - D_{ji} \right)^{2} & \text{if } \left| d_{ji} - R_{ji} \right| > D_{ji} \end{cases}$$
(3)

where \(U_{(d_{j1},\ldots,d_{jN_{host}})}^{restr}\) is the potential energy of the restraint as a function of the distances dji between the guest atom j and each host atom i, |·| denotes the absolute value, Dji is the restraint deviation tolerance, Rji is the reference distance between host and guest atoms, κji is the restraint force constant and Nhost is the number of host atoms that contribute to the restraint.
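
To make the form of Eq. 3 concrete, the following minimal sketch evaluates the flat-bottom restraint energy for one guest atom and a set of host atoms. It is an illustration rather than the SOMD implementation: the function name and the coordinates are hypothetical, and a single R, D and κ is assumed to be shared by all guest–host pairs.

```python
import math

def flat_bottom_restraint(guest_xyz, host_xyzs, r_ref, d_tol, kappa):
    """Restraint energy of Eq. 3 in kcal mol^-1.

    guest_xyz : (x, y, z) of the restrained guest atom j, in Å
    host_xyzs : list of (x, y, z) for the host atoms i, in Å
    r_ref     : reference distance R, in Å
    d_tol     : deviation tolerance D, in Å
    kappa     : force constant, in kcal mol^-1 Å^-2
    """
    energy = 0.0
    for host_xyz in host_xyzs:
        distance = math.dist(guest_xyz, host_xyz)
        excess = abs(distance - r_ref) - d_tol
        if excess > 0.0:          # outside the flat region: harmonic penalty
            energy += kappa * excess ** 2
    return energy

# Hypothetical geometry: guest atom at the origin, host atoms at 5, 6, 8 and 2 Å.
hosts = [(5.0, 0.0, 0.0), (0.0, 6.0, 0.0), (0.0, 0.0, 8.0), (-2.0, 0.0, 0.0)]
energy = flat_bottom_restraint((0.0, 0.0, 0.0), hosts, r_ref=5.0, d_tol=2.0, kappa=10.0)
print(f"restraint energy = {energy:.1f} kcal/mol")
```

With R = 5 Å and D = 2 Å, distances between 3 and 7 Å incur no penalty, so only the host atoms at 8 Å and 2 Å contribute in this example.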

Fig. 2 Thermodynamic cycle for standard binding free energy calculations. Firstly, the fully interacting guest is simulated in a free phase (top left) and a bound phase (top right), then the charges and the van der Waals terms are switched off, resulting in a non-interacting guest in water (bottom left), and bound to the host (bottom right)

From the closure of the thermodynamic cycle (Fig. 2) the binding free energy ΔGbind is given by Eq. 4:

$$\Delta G_{bind}^{ModelA} = \left( \Delta G_{elec}^{solv} + \Delta G_{vdW}^{solv} \right) - \left( \Delta G_{elec}^{host} + \Delta G_{vdW}^{host} \right)$$
(4)

Free energies of binding computed with Eq. 4 will be referred to as ModelA binding energies.
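
As an illustration of how the four legs of the cycle combine in Eq. 4, the sketch below assembles a ModelA estimate from per-leg free energy changes. The numerical values are hypothetical, and combining the per-leg uncertainties in quadrature assumes that the legs are statistically independent, which is an assumption made for this example.

```python
import math

def model_a(solv_elec, solv_vdw, host_elec, host_vdw):
    """Combine the four legs of the cycle according to Eq. 4.

    Each argument is a (value, uncertainty) tuple in kcal mol^-1.
    Uncertainties are added in quadrature (assumes independent legs).
    """
    legs = (solv_elec, solv_vdw, host_elec, host_vdw)
    value = (solv_elec[0] + solv_vdw[0]) - (host_elec[0] + host_vdw[0])
    error = math.sqrt(sum(leg[1] ** 2 for leg in legs))
    return value, error

# Hypothetical per-leg results (value, uncertainty) in kcal/mol.
dg_a, dg_a_err = model_a(
    solv_elec=(60.0, 0.1),
    solv_vdw=(-2.0, 0.1),
    host_elec=(62.0, 0.1),
    host_vdw=(3.0, 0.1),
)
print(f"ModelA binding free energy = {dg_a:.1f} +/- {dg_a_err:.1f} kcal/mol")
```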

ModelA does not take into account the contribution of long-range dispersion interactions because of the use of non-bonded cutoffs. Thus, to improve over ModelA, a long-range dispersion correction term is added to the free energy of binding by post-processing of the end-state trajectories [43]. Additionally, a free energy correction term is introduced to relate the volume available to the restrained but non-interacting ligand to standard state conditions. This leads to Eq. 5 for predictions of binding free energies via ModelB.

$$\Delta G_{bind}^{0,ModelB} = \Delta G_{bind}^{ModelA} + \left( \Delta G_{LJLRC}^{host} - \Delta G_{LJLRC}^{solv} \right) + \Delta G_{restr}^{0}$$
(5)

\(\Delta G_{{LJLRC}}^{{host}}\) is the long range correction term for the bound phase, and \(\Delta G_{{LJLRC}}^{{solv}}\) is the LRC term for the solvated phase. Details for the evaluation of these terms have been provided elsewhere [13]. \(\Delta G_{{restr}}^{0}\) is the free energy cost for imposing the host–guest restraint which is given by Eq. 6:

$$\Delta G_{restr}^{0} = -k_{B}T \ln \left( \frac{Z_{H \cdot \cdot G_{ideal}}}{Z_{H,solv}\, Z_{G,gas}} \right)$$
(6)

where \({Z_{H \cdot \cdot {G_{ideal}}}}\) is the configuration integral for the restrained decoupled guest bound to the host, \({Z_{H,solv}}\) is the configuration integral for the solvated host and \({Z_{G,gas}}\) is the configuration integral for the guest in an ideal thermodynamic state. Equation 6 is evaluated by numerical integration as described elsewhere [13].

Finally, ModelC was constructed by devising an empirical correction term to account for systematic errors due to finite size artefacts and inaccuracies in potential energy functions. Linear regression models were obtained by correlating past SAMPL5 binding free energies computed with SOMD to experimental data, leading to Eq. 7 to compute ModelC binding free energies:

$$\Delta G_{{bind}}^{{0,ModelC}}=\frac{{\Delta G_{{bind}}^{{0,ModelB}} - \beta }}{\alpha }$$
(7)

where α and β are the slope and intercept of the linear regression model. SAMPL5 featured the same hosts OA and TEMOA but a different host, CB7, in place of CB8. Separate regression models were therefore determined for use with the OA, TEMOA and CB8 hosts; the parameters are given in Table S1.
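
The relationship between the three models in Eqs. 4, 5 and 7 can be summarised in a few lines of code. This is a minimal sketch: the function names are illustrative, and the α, β and correction values below are hypothetical placeholders, not the Table S1 parameters. Read literally, Eq. 7 inverts a linear fit of the form ΔG(ModelB) = α ΔG(ModelC) + β.

```python
def model_b(dg_model_a, lrc_host, lrc_solv, dg_restr):
    """Eq. 5: add dispersion long-range and standard-state corrections to ModelA."""
    return dg_model_a + (lrc_host - lrc_solv) + dg_restr

def model_c(dg_model_b, alpha, beta):
    """Eq. 7: rescale a ModelB estimate with regression slope alpha and intercept beta."""
    if alpha == 0.0:
        raise ValueError("regression slope alpha must be non-zero")
    return (dg_model_b - beta) / alpha

# Hypothetical inputs in kcal/mol; alpha and beta are NOT the Table S1 values.
dg_b = model_b(dg_model_a=-7.0, lrc_host=-0.5, lrc_solv=-0.2, dg_restr=1.5)
dg_c = model_c(dg_b, alpha=1.5, beta=-1.0)
print(f"ModelB = {dg_b:.2f} kcal/mol, ModelC = {dg_c:.2f} kcal/mol")
```

Because each host has its own regression model, α and β would be looked up per host before calling the ModelC function.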

Preparation of host–guest input files for free energy calculations

The SAMPL6 organizers provided mol2 files for the hosts OA, TEMOA and CB8 and for the guests depicted in Fig. 1. Each file had the same Cartesian frame of reference, and docking was performed with the OpenEye toolkit [44,45,46] to predict the most likely binding mode. Experimental measurements for OA and TEMOA were done at pH 11.7 ± 0.1 and 298 K in the presence of a 10 mM Na3PO4 buffer. CB8 was measured at pH 7.4 ± 0.1 and 298 K with a 25 mM Na3PO4 buffer. To understand the influence of the buffer on binding free energy predictions, two different sets of input files were prepared, leading to no-buffer and buffer setups.

Input files for the no-buffer setup

In the no-buffer simulations, the presence of the additional Na3PO4 buffer was neglected. OA, TEMOA and CB8 host–guest systems were parametrized starting from the host and guest mol2 files. The force field parameters for the OA and TEMOA hosts were taken from a preceding study of host–guest binding energies carried out for the SAMPL5 contest [13]. To create the host–guest complex input files, the utilities parmed and tleap were used [47, 48]. The combined host–guest complex mol2 file was loaded in tleap along with the host force field parameters and the GAFF1.8 and AM1/BCC parameters for the guest, as generated by antechamber from the AMBER16 release [49, 50]. The system was solvated in a cubic box of TIP3P water molecules [51], with a minimum distance of 12 Å between the solute and the box edge. Counter ions were added to neutralize the total net charge. The same approach was followed for parameterizing the guest in the solvated phase.

Next, an equilibration protocol was applied to relax the box size. Initially, energy minimization of the entire system was performed with 100 steps of steepest descent, using sander. Then, solute molecules were position restrained with a force constant of 10 kcal mol−1 Å−2 while water molecules were allowed to equilibrate in an NVT ensemble for 200 ps at 298 K, followed by an NPT equilibration for a further 200 ps at 1 atm pressure. Finally, a 2 ns NPT MD simulation was run with the SOMD software (revision 2017.1.0) to reach a final density of about 1 g cm−3 [52, 53]. The final coordinate files were retrieved with cpptraj. The edge length of the host–guest boxes was about 50 Å, whereas the solvated guest boxes had an edge length of about 35 Å.

Input files for the buffer setup

For the second set of simulations, additional counter ions were added to mimic the presence of a buffer in the experiments. However, Na3PO4 was modelled by NaCl as force field parameters for multivalent ions were not readily available. Thus, for the OA and TEMOA systems, the 10 mM sodium phosphate buffer was modelled with 60 mM NaCl to match the ionic strength of the solution used for the experiments. Starting from the complex phase files, created as described previously, 4 additional Na+ and 4 Cl− ions were added to each system using tleap. The equilibration protocol described previously was reapplied to adjust the placement of the counter ions. For the preparation of the solvated phase, the host molecule was extracted from an equilibrated host–guest box and the host’s heavy atoms were replaced with water molecules. After equilibration, the final solvated phase system had the same number of Na+ and Cl− ions as the host–guest complex system, and similar box dimensions. The same procedure was followed for CB8. In this case, 25 mM Na3PO4 was matched with 150 mM NaCl, thus 8 Na+ and 8 Cl− ions were added to each CB8 host–guest system.
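
The NaCl concentrations quoted above can be checked with the usual ionic strength formula, I = ½ Σ c z². The sketch below does so under the assumption that Na3PO4 is treated as fully dissociated into three Na+ and one PO4 3− ion, which is an idealisation made for this example; the helper names are illustrative.

```python
def ionic_strength(ions):
    """Ionic strength I = 1/2 * sum(c_i * z_i^2).

    ions: iterable of (concentration in mM, integer charge); returns I in mM.
    """
    return 0.5 * sum(conc * charge * charge for conc, charge in ions)

def trisodium_phosphate(conc_mm):
    # Assumes complete dissociation into 3 Na+ and 1 PO4(3-).
    return [(3.0 * conc_mm, +1), (conc_mm, -3)]

def sodium_chloride(conc_mm):
    return [(conc_mm, +1), (conc_mm, -1)]

for buffer_mm in (10.0, 25.0):
    strength = ionic_strength(trisodium_phosphate(buffer_mm))
    print(f"{buffer_mm:.0f} mM Na3PO4 -> I = {strength:.0f} mM")
```

For a 1:1 electrolyte such as NaCl the ionic strength equals the salt concentration, so matching I fixes the NaCl concentration directly.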

SAMPL6 simulation protocols

For the octa-acid hosts, the discharging step was run with nine equidistant λ windows in both the bound and solvated phases. Twelve λ windows (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0) were employed for the vanishing step in both the bound and solvated phases. For the CB8 host, the bound and solvated phase discharging steps were also run with nine equidistant λ windows. The solvated vanishing step was carried out with the same window setup as for the octa-acid guests. The bound vanishing step was carried out with 16 λ windows (0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.70, 0.85, 1.00), as preliminary runs indicated the need for a greater number of windows to obtain reliable free energy changes.
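
The λ schedules described above can be written down explicitly, which also makes the simulation cost per guest easy to tally. In this sketch the variable names are illustrative, and the nine equidistant windows are assumed to include both end points (λ = 0 and λ = 1).

```python
def equidistant_schedule(n_windows):
    """Evenly spaced lambda values from 0 to 1 inclusive (end points assumed)."""
    step = 1.0 / (n_windows - 1)
    return [round(i * step, 6) for i in range(n_windows)]

DISCHARGE = equidistant_schedule(9)
VANISH_DEFAULT = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0]
VANISH_CB8_BOUND = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35,
                    0.40, 0.45, 0.50, 0.55, 0.60, 0.70, 0.85, 1.00]

def windows_per_guest(bound_vanish):
    # Discharging in both phases, vanishing in solvent, vanishing in the bound phase.
    return 2 * len(DISCHARGE) + len(VANISH_DEFAULT) + len(bound_vanish)

NS_PER_WINDOW = 8  # simulation length per window in the SAMPL6 protocol
for host, bound in (("OA/TEMOA", VANISH_DEFAULT), ("CB8", VANISH_CB8_BOUND)):
    n_windows = windows_per_guest(bound)
    print(f"{host}: {n_windows} windows, {n_windows * NS_PER_WINDOW} ns per run")
```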

All the simulations were run for a duration of 8 ns with SOMD in an NPT ensemble. Temperature control was achieved with an Andersen thermostat with a coupling constant of 10 ps−1 [54]. Pressure control was maintained by a Monte Carlo barostat that attempted isotropic box edge scaling every 100 fs. A 12 Å atom-based cutoff distance was used for the non-bonded interactions, with a Barker–Watts reaction field with a dielectric constant of 78.3 [55]. In the bound phase the restraint parameters of Eq. 3 were Rji = 5 Å, Dji = 2 Å and κji = 10 kcal mol−1 Å−2 for all the octa-acid systems, while Rji = 7 Å, Dji = 2 Å and κji = 10 kcal mol−1 Å−2 were chosen for the CB8 simulations. The guest atom j was taken as the atom closest to the center of mass of the guest. The guest atom names in the input files were, for OA: G0 = C6; G1 = C2; G2 = C9; G3 = C6; G4 = C1; G5 = C5; G6 = C6; G7 = C6. For TEMOA: G1 = C5; G2 = C9; G3 = C6; G4 = C7; G5 = C5; G6 = C6; G7 = C6. The four host atoms i in OA and TEMOA were C45, C51, C57 and C63. The Rji and Dji parameters were selected on the basis of an average distance of 4.9 Å measured between these four host atoms and the guest atom C6 in G0 in the input files provided by the organizers. For CB8 the guest and host atom names, listed with the guest atom first, were: G0 = (C11, C3, C10, C18, C26), G1 = (C20, C4, C12, C22, C31), G2 = (C13, C8, C18, C26, C32), G3 = (C18, C2, C10, C16, C24), G4 = (C5, C4, C10, C16, C24), G5 = (C7, C6, C14, C22, C28), G6 = (C5, C6, C14, C22, C32), G7 = (C6, C4, C10, C16, C24), G8 = (C10, C4, C14, C20, C26), G9 = (C6, C4, C12, C20, C28), G10 = (C7, C2, C10, C16, C24). The Rji and Dji parameters were selected on the basis of an average distance of 6.6 Å measured between the four host atoms and the guest atom in G0 in the input files provided by the organizers.

SAMPLing simulation protocols

For the SAMPLing leg of the challenge, topology and coordinate files for five replicates of OA–G3, OA–G6 and CB8–G3 were provided by the organizers for both the complex phase and the solvated phase simulations. All simulations were run for a duration of 20 ns per window with SOMD, with other simulation parameters identical to those used for SAMPL6 unless otherwise mentioned.

Estimation of free energy of binding and evaluation of dataset metrics

Free energy changes were computed with the multistate Bennett acceptance ratio (MBAR) method [43]. To achieve a more robust estimation of free energies, each simulation was repeated multiple times, using different initial velocities drawn from the Maxwell–Boltzmann distribution. Unless otherwise mentioned, the reported binding free energies are the mean of three runs, and statistical uncertainties are given as one standard error of the mean.
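
For reference, the mean and standard error of the mean over replicate runs can be computed as in the sketch below. The three input values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption of this example.

```python
import math
import statistics

def mean_and_sem(values):
    """Mean and standard error of the mean of replicate free energies."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for an error estimate")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))  # sample std, n - 1
    return mean, sem

# Hypothetical binding free energies from three repeats, in kcal/mol.
replicates = [-6.2, -6.5, -6.4]
dg_mean, dg_sem = mean_and_sem(replicates)
print(f"binding free energy = {dg_mean:.2f} +/- {dg_sem:.2f} kcal/mol")
```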

As described in Ref. [13], for each model a population distribution for the determination coefficient R2, the mean unsigned error MUE and the Kendall τ parameter was computed by bootstrapping the free energy predictions for each host–guest dataset ten thousand times. The resulting distributions may not be symmetric around the mean, thus uncertainties are reported as 95% confidence intervals.
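
One way to realise such a bootstrap is sketched below using only the Python standard library. It assumes that each resample draws every predicted free energy from a Gaussian centred on its mean with its statistical uncertainty as standard deviation; this resampling scheme, the function names and the data are assumptions of the example and may differ from the procedure of Ref. [13].

```python
import random
import statistics

def mue(pred, expt):
    """Mean unsigned error between predictions and experiment."""
    return statistics.fmean(abs(p - e) for p, e in zip(pred, expt))

def r_squared(pred, expt):
    """Square of the Pearson correlation coefficient."""
    mp, me = statistics.fmean(pred), statistics.fmean(expt)
    spe = sum((p - mp) * (e - me) for p, e in zip(pred, expt))
    spp = sum((p - mp) ** 2 for p in pred)
    see = sum((e - me) ** 2 for e in expt)
    return spe * spe / (spp * see)

def kendall_tau(pred, expt):
    """Kendall tau-a; tied pairs count as neither concordant nor discordant."""
    n = len(pred)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pred[i] - pred[j]) * (expt[i] - expt[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)

def bootstrap(pred, pred_err, expt, metric, n_iter=10_000, seed=0):
    """Return (mean, lower, upper) of a metric, with a 95% percentile interval."""
    rng = random.Random(seed)
    samples = sorted(
        metric([rng.gauss(p, s) for p, s in zip(pred, pred_err)], expt)
        for _ in range(n_iter)
    )
    lower = samples[int(0.025 * n_iter)]
    upper = samples[int(0.975 * n_iter) - 1]
    return statistics.fmean(samples), lower, upper

# Hypothetical predictions, uncertainties and experimental values (kcal/mol).
pred = [-5.0, -6.0, -8.0, -4.2]
pred_err = [0.2, 0.3, 0.4, 0.2]
expt = [-4.5, -6.5, -7.0, -4.0]
for name, metric in (("MUE", mue), ("R2", r_squared), ("tau", kendall_tau)):
    mean, lower, upper = bootstrap(pred, pred_err, expt, metric, n_iter=2000)
    print(f"{name}: {mean:.2f} [{lower:.2f}, {upper:.2f}]")
```

Reporting percentile bounds rather than a standard deviation accommodates distributions that are skewed, for example when R2 or τ sits close to 1.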

Additionally, for the SAMPLing leg of the challenge, binding free energies were evaluated using ModelB by skipping the first 1.5 ns of each window and using from 1 to 100% of the remaining data. Uncertainties were taken as the standard deviation output from pymbar and were propagated to obtain an uncertainty for the reported standard free energy of binding. The total wall-clock time was also estimated by summing the wall-clock time for each λ window, in each phase and simulated process. The number of iterations was retrieved as the sum of the number of time-steps for each simulated process. For each host–guest replica 459,995,400 energy evaluations were carried out, with an average wall-clock time of 245 h for CB8 systems and 190 h for OA. All input files for the SAMPL6 and SAMPLing protocols are publicly available in the repository https://github.com/michellab/SAMPL6inputs.
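
The truncation scheme used for the convergence analysis can be expressed as a simple index calculation. In the sketch below the function name is illustrative and the spacing between saved samples (10 ps) is a hypothetical value chosen for the example, since the output frequency is not stated here.

```python
def analysis_window(n_frames, ps_per_frame, discard_ns=1.5, fraction=1.0):
    """Return (start, stop) frame indices for one lambda window.

    The first `discard_ns` of data are skipped, then the leading `fraction`
    (0 < fraction <= 1) of the remaining frames is kept.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    start = min(n_frames, round(discard_ns * 1000.0 / ps_per_frame))
    remaining = n_frames - start
    kept = max(1, round(fraction * remaining)) if remaining > 0 else 0
    return start, start + kept

# 20 ns per window saved every 10 ps (hypothetical) gives 2000 frames.
for fraction in (0.2, 1.0):
    start, stop = analysis_window(2000, 10.0, fraction=fraction)
    print(f"fraction {fraction:.0%}: frames {start} to {stop}")
```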

Results

SAMPL6 challenge

Results for the full SAMPL6 dataset are shown in Fig. 3 for each model, without and with a buffer setup. As judged by mean unsigned error, ModelA/no-buffer is the least accurate protocol, with an MUE of ca. 5.7 kcal mol−1. ModelA/buffer offers a small improvement, with the MUE decreasing to ca. 5.1 kcal mol−1. Addition of the long-range dispersion and standard state correction terms in ModelB decreases errors further (MUE ca. 3.9 and 3.4 kcal mol−1 for the no-buffer and buffer setups respectively). ModelC improves over ModelB, with MUE values of ca. 1.4 and 1.6 kcal mol−1 for the no-buffer and buffer setups respectively. Thus, the additional counter-ions in the buffer setup improve accuracy for ModelA and ModelB but not for ModelC. This could be because the SAMPL5 calculations were carried out with a no-buffer setup [13], so that the empirical correction terms used in ModelC do not transfer to a buffer setup.

Fig. 3 Comparison of the predicted and measured binding free energies for (a) ModelA/no-buffer, (b) ModelA/buffer, (c) ModelB/no-buffer, (d) ModelB/buffer, (e) ModelC/no-buffer, (f) ModelC/buffer for the 27 host–guest systems. The grey line denotes perfect correlation between predictions and measurements, while the yellow shaded region indicates a ± 1 kcal mol−1 error bound. OA systems are colored in blue, TEMOA in green and CB8 in red

Ranking of the protocols according to correlation with experimental data yields a different outcome. ModelA/no-buffer and ModelB/no-buffer perform similarly well with R2 and τ values ca. 0.6, and a small decrease in predictive power is observed for ModelC/no-buffer but this is only significant for R2. This drop is observed because the empirical correction term works well to bring the OA host–guest binding energies in line with the experimental values, but leads to a tendency to underestimate the CB8 binding energies. The use of a buffer also appears detrimental to predictive power, with all buffer protocols giving significant decreases in R2 and τ parameters with respect to the equivalent no-buffer protocol.

Inspection of the results for the OA subset (Tables 1, 2) shows that ModelB and ModelC significantly improve the MUE over ModelA, but not the R2 or τ metrics, which remain ca. 0.7 and 0.5 respectively. The buffer protocol worsens the MUE relative to the no-buffer protocol but does not influence predictive power. The same picture holds for the TEMOA subset, where only the MUE improves upon switching from ModelA to ModelB and ModelC. Switching from no-buffer to buffer significantly worsens the MUE for ModelA and ModelB. The R2 and τ metrics are high throughout (ca. 0.9 and 0.8) and insensitive to the various protocols. For the CB8 subset, dramatic improvements in MUE are also observed as correction terms are introduced (ModelA/no-buffer MUE ca. 7.3 kcal mol−1 vs. ModelC/no-buffer MUE ca. 1.6 kcal mol−1). Unlike for the octa-acid guests, switching from a no-buffer to a buffer setup significantly improves the MUE for ModelA and ModelB, but not for ModelC, where the MUE worsens. Thus, the buffer effects are host–guest dependent. For the OA and TEMOA hosts, the guests are negatively charged acids and explicit modelling of a buffer favors the binding process (average change in binding energies of −0.9 kcal mol−1 for ModelB). For the CB8 host, the guests are positively charged amines and explicit modelling of a buffer disfavors the binding process (average change in binding energies of +3.1 kcal mol−1 for ModelB). The effect is particularly pronounced for some CB8 guests, e.g. the binding energies of G3, G4 and G7 increase by more than 4 kcal mol−1 upon switching from a no-buffer to a buffer protocol. None of the models tested yields significant predictive power for CB8, with R2 and τ metrics ca. 0.1.

Table 1 Results for all three models (no-buffer protocol) for individual host–guest families
Table 2 Results for all three models (buffer protocol) for individual host–guest families

The largest outliers for CB8 are guests G3, G4, G5 and G8. In particular, the binding free energies of G3, G5 and G8 are lower than the experimental data by about 10 kcal mol−1 with ModelA/no-buffer or ModelB/no-buffer. The statistical errors are also larger than for the octa-acids, suggesting greater challenges for converging free energy changes in CB8 over the simulated time-scales. Switching to a buffer protocol weakens the computed binding, making the binding free energies less negative by up to ca. 5 kcal mol−1 for G3 and G8.

Among octa-acids the models correctly capture interesting trends in the experimental data. For instance, the models correctly predict that G7 binds significantly worse to TEMOA than to OA. The bulkiness of the two methyl groups β to the carboxylic acid moiety hinders positioning of the guest in the smaller TEMOA cavity (Fig. 1a). The most significant outlier is G2 for which the models are unable to reproduce the significantly decreased binding energetics for TEMOA versus OA. A possible reason for this discrepancy is that the different ring puckering motions of the cyclohexenyl moiety in G2 may have been poorly sampled with the simulation protocols employed here.

SAMPLing challenge

Convergence plots for the calculated binding free energies of the three host–guest systems CB8–G3, OA–G3 and OA–G6 are presented in Fig. 4. Figure 4a shows that for CB8–G3 the binding free energy estimate obtained using the full simulation dataset is −13.8 ± 0.7 kcal mol−1. Although the uncertainties are high, the mean free energy rapidly settles around −14 kcal mol−1, and similar estimates would have been obtained with about 20% of the simulation duration. The calculated binding free energies are consistent with those obtained for this host–guest system with the SAMPL6 protocol (−13.0 ± 2.1 kcal mol−1, Table 2). The SAMPLing reference binding free energy computed by the organizers using the software YANK is significantly different and more precise (−10.8 ± 0.2 kcal mol−1) [56]. The reference value is also in better agreement with the experimental value (−6.5 ± 0.1 kcal mol−1), though a substantial difference remains. It appears that at least 60% of the simulation duration is needed to eliminate drifts in the running average for the reference calculation.

Fig. 4 Comparison of standard binding free energies computed with SOMD (red) to SAMPLing reference values (blue) for CB8–G3 (a), OA–G3 (b) and OA–G6 (c). Bold lines denote the average free energy from five replicate simulations started from different coordinates. Shaded areas denote ± 1σ. The SAMPL6 and experimental results are depicted with green and black lines respectively, and the dotted lines denote ± 1σ

For OA–G3 (Fig. 4b) the binding free energies computed with SOMD and by the organizers are similarly precise and converge to −5.7 ± 0.1 kcal mol−1 and −6.7 ± 0.1 kcal mol−1 respectively. The SOMD SAMPLing free energies are as precise as, but more accurate than, the SOMD SAMPL6 free energies (−6.4 ± 0.1 kcal mol−1, Table 2) in comparison with experimental data (−5.2 ± 0.1 kcal mol−1). The running average for both protocols is stable after ca. 20% of the simulation duration. For OA–G6 the SOMD and organizers’ free energies rapidly converge to very similar values (−6.9 ± 0.1 kcal mol−1 vs. −7.1 ± 0.1 kcal mol−1 respectively). These figures are in better agreement with experiment (−5.0 ± 0.1 kcal mol−1) than the SAMPL6 SOMD free energies (−8.1 ± 0.2 kcal mol−1).

Overall comparison of free energies estimated from the SAMPL6 and SAMPLing protocols shows that averaging results over multiple starting host–guest structures improved agreement of predictions with experiment for OA–G3 and OA–G6 but not CB8. No clear reason emerges to explain differences in binding free energies computed by SOMD and YANK.

Conclusions

AFE calculations were employed to estimate standard binding free energies for 27 host–guest systems in the SAMPL6 competition. Protocols similar to those used in the SAMPL5 competition were adopted (ModelA/no-buffer and ModelB/no-buffer) [13], leading to results of comparable performance to SAMPL5 (SAMPL6 ModelB/no-buffer R2 ca. 0.6, MUE 3.9 kcal mol−1, N = 27 vs. SAMPL5 ModelC R2 ca. 0.7, MUE 3.4 kcal mol−1, N = 22). The reasons for the systematic overestimation of free energies of binding remain unclear; possible causes include the neglect of a long-range correction term for electrostatics and the use of non-polarizable force fields.

Additionally, an empirical correction term derived by linear regression against SAMPL5 data was devised to correct for systematic errors in the free energy calculation protocol (ModelC/no-buffer). This leads to significant improvements in mean unsigned error but a slight decrease in correlation with experimental trends (MUE ca. 1.4 kcal mol−1, R2 ca. 0.5). High accuracy predictions and correlations with experimental data were achieved for the OA and TEMOA hosts, but CB8 proved more challenging, with significantly higher uncertainties in the computed binding free energies and poor correlation with experiment.

The influence of the modelled buffer on the computed binding free energies was also investigated. The main finding is that explicit modelling of the buffer enhances binding of negatively charged guests to OA and TEMOA, and weakens binding of positively charged guests to CB8. Overall, the MUE for the dataset (ModelA and ModelB) decreases by about 0.6 kcal mol−1 because the CB8 binding energies are more in line with experimental data. However, this improvement is accompanied by a drop of ca. 0.2 in R2. The empirical correction term derived against SAMPL5 data is incompatible with a protocol that explicitly models a buffer, presumably because no buffer was modelled in the SAMPL5 calculations [13].

With respect to other SAMPL6 submissions the results obtained with SOMD were encouraging and among the top performing models for OA and TEMOA as judged by R2 and MUE metrics. CB8 proved challenging for most participating groups. SOMD ModelC/no-buffer gave the lowest MUE values among all submissions (ca. 1.5 kcal mol−1), but the predictive power was insignificant (R2 ca. 0.1) [31].

The OA–G3 and OA–G6 binding free energies computed with the SAMPLing protocol differed significantly from those computed with the SAMPL6 protocol (by 0.7 and 1.2 kcal mol−1 respectively). A standard practice in our group is, at a minimum, to estimate uncertainties in computed binding free energies from triplicate runs initiated from the same input coordinates. This gives a reasonable estimate of the extent to which free energies are reproducible given a starting condition, but can also give a misleading impression of convergence. Where multiple reasonable poses can be produced, efforts are better spent evaluating free energies with simulations started from different input coordinates. Comparison of SOMD’s free energies with the reference values (YANK) provided by the organizers yields a mixed picture, with a substantial and significant difference (CB8–G3, 3 ± 0.7 kcal mol−1), a moderate difference (OA–G3, 1 ± 0.2 kcal mol−1), and an insignificant difference (OA–G6, 0.2 ± 0.2 kcal mol−1). There are several algorithmic differences between the two codes that could explain the discrepancies, a notable one being an atom-based Barker–Watts reaction-field treatment of long-range electrostatics (SOMD) versus PME (YANK). Other differences exist around the treatment of soft-cores, the coupling of non-bonded and bonded interactions with the λ schedule, and electrostatic correction terms for charged guests. More systematic reproducibility studies on larger datasets will be needed to isolate the factors that contribute to the observed variability. Such efforts are important to validate the robustness and transferability of molecular simulation algorithms.