Introduction

Estimating the affinity between a small molecule and a biomacromolecule is important in many parts of chemistry, especially in drug design [1, 2]. Therefore, numerous computational methods have been developed with this aim [1], ranging from simple scoring approaches for ligand docking [3], via end-point approaches, like linear interaction energy [4] and MM/PBSA (molecular mechanics combined with Poisson–Boltzmann and surface area solvation) [5, 6], to strict approaches based on free-energy perturbation (FEP) [7, 8] with free energies calculated by exponential averaging (EA) [9], thermodynamic integration [10] or the Bennett acceptance ratio (BAR) approach [11].

The latter methods should in principle be limited only by the accuracy of the potential-energy function and the sampling of the phase space, although uncertainties in the nature of the simulated system (e.g. the protonation state of all involved molecules and residues) may also affect the results [7, 8]. To reduce the sampling problem and allow for a better control of the actual chemical state, there has been quite some interest to study simpler systems, in particular the binding of small molecules to organic molecules of intermediate size (a few hundred atoms), i.e. host–guest systems [12, 13].

Most free energy simulations are performed by empirical potentials in the form of molecular mechanics (MM) force fields. However, during the latest decades, there has been an increasing interest in employing quantum mechanical (QM) calculations to obtain more accurate binding free energies [14]. Such calculations can be performed at many levels of approximation. Owing to the much larger computational cost of QM calculations, most such studies are based either on single-point QM calculations on structures obtained by MM sampling or on structures minimised by QM [14,15,16] or by combined QM and MM calculations (QM/MM) [17, 18]. Only a few studies have involved sampling at the QM/MM level, typically using a semiempirical QM (SQM) method [19,20,21,22,23].

Most computational studies of ligand-binding affinities are performed on systems for which the experimental affinities are known. Of course, this introduces the risk that the results are biased towards the experimental data. Therefore, prospective studies, in which the experimental results are not known when the calculations are performed, provide a more unbiased view of the performance of various methods. In this regard, the statistical assessment of the modelling of proteins and ligands (SAMPL) blind-test competitions have been invaluable to compare the true predictive value of various computational methods. Since SAMPL3, it has involved host–guest systems [24] and since SAMPL4, it has involved the binding of ligands to the octa-acid deep-cavity host (OAH, shown in Fig. 1) [25, 26], developed by the Gibb group [27, 28].

Fig. 1
figure 1

a Ligands involved in SAMPL6 challenge (G0–G7) and five ligands added to make the perturbations smaller and connected to experimental data (MeHx) and A5–A8. b The OAH and c OAM host molecules

In a series of publications, we have studied the binding of nine cyclic carboxylate guest molecules to OAH with computational methods at both the MM and QM/MM level [15, 22, 23, 29,30,31]. In the SAMPL4 challenge, we used FEP to calculate the relative affinities of the nine guests at the MM level [15], which gave the best results in the competition [25], with a mean absolute deviation (MAD) of 3.6 ± 0.2 kJ/mol and a correlation (R2) to experimental data of 0.84 ± 0.04. We also tried to improve the FEP results by performing QM calculations with density functional theory (DFT) on snapshots from the MM simulations, using large QM systems involving ~ 310 atoms (1800 DFT calculations for each ligand). However, the difference between the MM and QM potentials was so large that no converged results could be obtained. Therefore, the results were very poor with an uncertainty of 6–32 kJ/mol and MADs of 17–27 kJ/mol.

However, by using smaller QM systems (less water molecules and with the acidic groups on OAH removed) and SQM calculations with the PM6-DH2X method, we were able to obtain converged results with a precision of 1 kJ/mol for all relative free energies, using 700,000 QM calculations for each ligand [29]. Unfortunately, the results were still worse than the MM-FEP results, with a MAD of 4.9 ± 0.4 kJ/mol and a vanishing correlation. These results were obtained without any sampling at the QM/MM level, but in our next study such sampling was performed (with semiempirical PM6-DH+ calculations and only the ligand in the QM system) [22]. This gave even better results with a precision of 0.5–0.9 kJ/mol, a MAD of 4.7 ± 0.2 and a R2 correlation of 0.86 ± 0.04. Recently, we have shown that similar results can be obtained with approximately a fourth of the computational effort using multiple short QM/MM simulations [23] or by using non-equilibrium simulations and Jarzynski’s equality [32,33,34].

In the SAMPL4 study, we also tried to estimate OAH binding affinities with minimised QM structures, using a variant of the method suggested by Grimme and coworkers [15, 16]. We optimised the structures of the complexes with three different DFT approaches (in vacuum, in a continuum solvent and in a continuum solvent with four explicit water molecules). Then, binding free energies were obtained with a vacuum DFT calculation with large basis set and empirical dispersion corrections, combined with a COSMO-RS estimation of the solvation free energy and with thermostatistical corrections from a frequency calculation at the MM level. This approach gave absolute binding affinities of an intermediate accuracy with MADs of 7–14 kJ/mol and R2 of 0.60–0.78. After removing systematic errors (the mean signed deviation, MSD), the MADs (called MADtr in the following) were 5–9 kJ/mol. Similar results were obtained also by Sure and Grimme on the same system [35]. An attempt to improve the energies by local coupled-cluster calculations gave much worse results with R2, MAD and MADtr of 0.28, 37 and 14 kJ/mol, respectively [15, 30].

In SAMPL5, we employed a similar approach to calculate binding affinities of six more diverse guest molecules (with either a carboxylate or a trimethylammonium group) [31] to OAH and also to its tetra-endo-methyl variant (OAM) [36]. The calculations were improved by keeping the structures as symmetric as possible, reducing the charge and flexibility of the ligand and performing a restricted sampling of the complexes. Disappointingly, the results were worse than for SAMPL4 with MADtr of 11–22 kJ/mol and R2 below 0.30. The reason for this is probably the larger diversity of the ligands but also problems with some of the geometry optimisations (the guest carboxylate groups become too buried inside the host). The results were not improved by employing DLPNO–CCSD(T) calculations [37] (MADtr = 16–20 kJ/mol and R2 = 0–0.15). The best results in the SAMPL5 competition were obtained for free-energy simulations at the MM level, dragging the ligand out of the host [26].

In this paper, we study the binding of eight carboxylic ligands to both the OAH and OAM hosts in the SAMPL6 challenge [38, 39] with four different methods: FEP at the MM level, FEP at the PM6-DH+/MM level, as well as optimised structures at the PM6-DH+/MM and DFT/MM levels of theory. For the latter, we used more extensive sampling at the MM level and QM/MM optimised structures with explicit solvent. We also re-examine the SAMPL4 and 5 test cases with the third method to show that it gives improved results. For the first time, we get a slight improvement of the MM FEP results by employing the PM6-DH+/MM correction.

Methods

Setup of the studied systems

We have considered the eight ligands of the SAMPL6 blind-test competition, G0–G7 [38, 39], as well as four aliphatic carboxylates with 5–8 carbon atoms (called A5–A8) and the MeHx ligand from SAMPL4 [40]. They are shown in Fig. 1, together with the OAH and OAM host structures. For some test calculations, we used also the nine cyclic carboxylate ligands from SAMPL4 [40] and the four carboxylate ligands from SAMPL5 [41] (shown in Figures S1 and S2).

The host–guest complexes for the calculations were built from the coordinates for the octa-acid host with the guest molecules from previous blind-prediction challenges [15, 31]. The guest molecules were prepared and modified using the Avogadro software [42] and the geometry of the guest molecules was optimised with the UFF force field [43]. The OAM was constructed by adding four methyl groups at the corresponding hydrogen positions on the upper rim of OAH.

Both the host and the guest molecules were treated with the general AMBER force field (GAFF) [44], whereas the TIP3P model was used for water molecules [45]. Charges for the two host molecules have been reported before [15, 31]. Charges for the ligands were obtained with the same restrained electrostatic potential approach [46]: The molecules were optimised with the SQM AM1 method [47], followed by a single-point calculation at the Hartree–Fock/6-31G* level to obtain the electrostatic potentials, sampled with the Merz–Kollman scheme [48], but at a higher-than-default density (10 layers with 17 points per unit area, giving ~ 2000 points per atom). These calculations were performed with the Gaussian 09 software [49]. The potentials were then used by antechamber to fit the charges. The charges and atom types of all ligands are given in Table S8 in the supplementary material.

A few parameters missing in the force field were estimated with the Seminario approach [50]: The geometry of the ligands was optimised at TPSS/def2-SV(P) level [51, 52], followed by a frequency calculation using the aoforce module of the Turbomole software [53]. From the resulting Hessian matrix, parameters for the missing dihedrals were extracted with the Hess2FF program [54]. These parameters are given in Table S1 in the supplementary material.

Molecular dynamics simulation

All molecular dynamics (MD) simulations and FEP calculations were run with the AMBER 16 software suite [55]. Each host–guest complex was solvated in an octahedral box of water molecules extending at least 10 Å from the guest molecules using the tleap module, so that 1504–1513 water molecules were included in the simulations. All nine carboxylic groups on the host and guest molecules were assumed to be deprotonated because the binding affinities were measured at a pH of 11.7 [39]. Thus, the net charge of the host–guest complexes were − 9. No counter ions were used in the simulations, as our previous studies have shown that they have a small effect on the calculated free energies [15].

Each complex was subjected to 10,000 steps of conjugate-gradient minimisation, followed by 20 ps constant-volume equilibration and 20 ps constant-pressure equilibration, all performed with heavy non-water atoms restrained towards the starting structure with a force constant of 209 kJ/mol/Å2. Finally, the system was equilibrated for 2 ns without any restraints and with constant pressure, followed by 10 ns of production simulation, during which coordinates were saved every 10 ps. For each host–guest complex, 10 (OAH) or 20 (OAM) independent simulations were run, employing different TIP3P solvation boxes and different starting velocities [56]. Consequently, the total simulation time for each complex was 100 or 200 ns.

All bonds involving hydrogen atoms were constrained to the equilibrium value using the SHAKE algorithm [57], allowing for a time step of 2 fs. The temperature was kept constant at 300 K using Langevin dynamics [58], with a collision frequency of 2 ps−1. The pressure was kept constant at 1 atm using a weak-coupling isotropic algorithm [59] with a relaxation time of 1 ps. Long-range electrostatics were handled by particle-mesh Ewald summation [60] with a fourth-order B spline interpolation and a tolerance of 10−5. The cut-off radius for Lennard‒Jones interactions was set to 8 Å.

FEPs

The guest molecules were manually mapped for the FEP simulations as is shown in Fig. 2, keeping the perturbations as small as possible. To this aim and to connect the relative FEP calculations to experimental data [40, 61], we included also the A5–A8 and MeHx ligands. The FEP simulations were run with the pmemd module of AMBER 16 [37], using the dual-topology scheme with both ligands in the topology file. Each ligand transformation was divided into 13 steps, employing a linear transformation of the force-field potentials with the coupling parameter λ = 0.00, 0.05, 0.10, 0.20, …, 0.80, 0.90, 0.95 and 1.00. Electrostatic and van der Waals interactions were perturbed concomitantly, using soft-core potentials for both types of interactions [62, 63]. For the simplest perturbations, involving a H → CH3 perturbation (A5 → A6, A6 → A7, A7 → A8, G5 → G7 and G6 → A5), soft-core potentials were used only for the differing atoms. However, for the other seven perturbations, soft-core potentials were used for all atoms in the ligands, except those in the carboxylate group, to allow for larger differences in the dynamics of the perturbed groups (atoms without soft-core potentials have identical coordinates in the perturbations).

Fig. 2
figure 2

Ligand alchemical transformations studied with FEP

For each λ value, 100 steps of conjugate-gradient minimisation were performed with the heavy atoms of the host and ligand restrained towards the starting structure with a force constant of 418 kJ/mol/Å2. This was followed by 20 ps constant-volume equilibration with the same restraints and 1 ns constant-pressure equilibration without any restraints. Finally, a 2-ns production simulation was run (still with constant pressure), during which structures and energies were sampled every 2 ps.

Relative binding free energies between two ligands, L0 and L1 (∆∆Gbind), were estimated using a thermodynamic cycle that relates ∆∆Gbind to the free energy of alchemically transforming L0 into L1 when they were either bound to the host, ∆Gbound(L0 → L1), or free in solution, ∆Gfree(L0 → L1) [64, 65]:

$$\Delta \Delta {G_{{\text{bind}}}}=\Delta {G_{{\text{bind}}}}\left( {{{\text{L}}_1}} \right)-\Delta {G_{{\text{bind}}}}\left( {{{\text{L}}_0}} \right)=\Delta {G_{{\text{bound}}}}({{\text{L}}_0} \to {{\text{L}}_1})-\Delta {G_{{\text{free}}}}({{\text{L}}_0} \to {{\text{L}}_1})$$
(1)

Gbound(L0 → L1) and ∆Gfree(L0 → L1) were estimated by the multi-state Bennett acceptance-ratio (MBAR) method, using the pyMBAR software [66], including only statistically non-correlated energies in the calculations. All FEP calculations were repeated three times using different TIP3P solvation boxes and different starting velocities [56]. Reported free energies are the average over these three calculations, whereas the reported uncertainty is either the standard deviation over these three calculations divided by the square root of three or the square root of the sum of the variances of the three individual estimates divided by three, depending on which of the two values was largest.

QM/MM FEP calculations

Relative QM/MM binding affinities between two ligands, L0 and L1, were estimated by the reference-potential method with QM/MM sampling (RPQS) [22, 23]. In this approach, the ∆∆Gbind free energies, calculated at the MM level, as described in the previous section, are corrected by a FEP calculation for each ligand in the method space, from the MM potential to the QM/MM potential, as is shown by the thermodynamic cycle in Fig. 3. This was done both for the ligand bound to the host and when free in solution. For each state (s = bound or free), the QM/MM corrected free energy was calculated from

$$\Delta G_{{{{\text{L}}_0} \to {{\text{L}}_1},{\text{s}}}}^{{{\text{QM}}/{\text{MM}}}}=\Delta G_{{{{\text{L}}_0} \to {{\text{L}}_1},{\text{s}}}}^{{{\text{MM}}}}-\Delta G_{{{{\text{L}}_0},{\text{s}}}}^{{{\text{MM}} \to {\text{QM}}/{\text{MM}}}}+\Delta G_{{{{\text{L}}_1},{\text{s}}}}^{{{\text{MM}} \to {\text{QM}}/{\text{MM}}}}.$$
(2)
Fig. 3
figure 3

Thermodynamic cycle used for the RPQS calculations

Finally, the net binding free energies were calculated from

$$\Delta \Delta G_{{{\text{bind}}}}^{{{\text{QM}}/{\text{MM}}}}=\Delta G_{{{{\text{L}}_0} \to {{\text{L}}_1},{\text{bound}}}}^{{~{\text{QM}}/{\text{MM}}}}-\Delta G_{{{{\text{L}}_0} \to {{\text{L}}_1},{\text{free}}}}^{{{\text{QM}}/{\text{MM}}}}$$
(3)

All MM → QM/MM FEP simulations were performed with the AMBER 16 software [55] and for all host–guest systems shown in Fig. 1 except the MeHx ligand. In the QM/MM calculations, only the guest molecule was included in the QM region and it was treated at the SQM PM6-DH+ level of theory [67,68,69]. The MM → QM/MM free energies were calculated based on the energy function \(E\left( \Lambda \right)=\left( {1-\Lambda } \right){E_{{\text{MM}}}}+\Lambda {E_{{\text{QM}}/{\text{MM}}}}\), where \({E_{{\text{MM}}}}\) is the MM energy, \({E_{{\text{QM}}/{\text{MM}}}}\) is the QM/MM energy and \({\Lambda}\) is a coupling parameter going from 0 to 1. Based on our previous study of OAH with the SAMPL4 ligands [22], we performed calculations at four \({\Lambda}\) values: 0.0, 0.333, 0.666, and 1.0. If the overlap with four Λ values was unsatisfactory (see below), additional \({\Lambda}\) values were employed (0.1667, 0.5 or 0.8333; cf. Table S7).

For each Λ value, we performed 100 steps of conjugate-gradient minimisation with the heavy atoms of the host and guest molecules restrained towards the starting structure with a force constant of 418 kJ/mol/Å2. This was followed by 20 ps constant-volume equilibration with the same restraints and 0.5 ns constant-pressure equilibration without any restraints. Finally, a 1 ns production simulation was run, during which structures and energies were sampled every 1 ps. The QM/MM MD simulations were performed in the same manner as described in the “Molecular dynamics simulations” section. Free energies were estimated by MBAR, using the pyMBAR software [66], including only statistically non-correlated energies in the calculations.

Absolute binding free energies from QM/MM minimised structures

Absolute binding free energies were calculated using the method suggested by Grimme [16, 70], in which the binding free energy is composed of three terms:

$${\Delta}{G_{{\text{tot}}}}={\Delta}{E_{{\text{QM}}}}+{\Delta}{G_{{\text{solv}}}}+{\Delta}{G_{{\text{therm}}}}$$
(4)

where ∆EQM is a single-point vacuum QM energy, which also includes the dispersion energy, ∆Gsolv is the solvation free energy and ∆Gtherm is a thermostatistical correction term. The binding affinity was obtained as the difference in this free energy between the complex, host and guest:

$${\Delta}{G_{{\text{bind}}}}={\Delta}{G_{{\text{tot}}}}\left( {{\text{complex}}} \right) - {\Delta}{G_{{\text{tot}}}}\left( {{\text{host}}} \right) - {\Delta}{G_{{\text{tot}}}}\left( {{\text{guest}}} \right)$$
(5)

Structures for the free host and guest molecules were taken from the structures of the complexes, without further optimisation (rigid binding free energies; some tests were performed to calculate structure-relaxation energies, but they did not lead to any improvement). The calculations were performed at two levels of QM theory and based on two sets of structures. The two approaches will be called SQM and DFT in the following.

From each of the independent MD simulations of the host–guest complexes at the MM level, the last snapshot was minimised at the PM6-DH+/MM level of theory [67,68,69] using the AMBER 16 software suite [55]. The quantum system consisted of the host and guest molecules, as well as four water molecules that formed hydrogen bonds with the guest (viz. the two closest water molecules to each of the ligand carboxylate oxygen atoms). It had a net-charge of − 9. The solvation box from the MD simulations was kept in all calculations. Conjugate-gradient minimisations were run for 2000 steps without any bond-length restraint for any molecule and with no periodicity (for technical reasons). This gave 10 different host–guest structures for each guest bound to the OAH host and 20 different structures for the OAM host. The resulting structures were used directly for the SQM calculations.

The QM energy for the SQM structures (only isolated host and guest, with all water molecules removed) was calculated as a PM6-DH+ single-point energy using the AMBER sqm program [55]. This method includes dispersion and hydrogen-bond corrections [67,68,69] and is among the best SQM methods available in the Amber software.

Solvation free energies in water solution were calculated with the conductor-like solvent model (COSMO) [71, 72] real-solvent (COSMO-RS) approach [73, 74] using the COSMOTHERM software [75]. These calculations were based on two single-point BP86 calculations [76, 77] with the TZVP basis set [78], one performed in a vacuum and the other in the COSMO solvent with an infinite dielectric constant. Owing to the extensive negative charge of the hosts, we had to use the undocumented ADEG option to force the program to accept that the solvation energy is very large.

Thermal corrections to the Gibbs free energy at 298 K and 1 atm pressure (∆Gtherm), including zero-point vibrational energy, entropy and enthalpy corrections, were calculated by an ideal-gas rigid-rotor harmonic-oscillator approach [79] from vibrational frequencies calculated at the MM level (i.e. with the GAFF force field and the same charges as in the MD simulations). The frequency calculations were preceded by a minimisation at the same level of theory. To obtain more stable results, low-lying vibrational modes were treated by the free-rotor approximation, using the interpolation model suggested by Grimme with ω0 = 100 cm−1 [16]. The translational entropy was corrected by 7.99 kJ/mol for the change in the standard state from 1 atm to 1 M (used in the experiments [39]). Unfortunately, we discovered after the submission of the results that for most complexes, the ligand dissociated from the host during the MM minimisation before the frequency calculations. Therefore, the thermal corrections were recalculated after the submission, using a restraint to the starting structure during the geometry optimisation.

For the SQM calculations, energies obtained according to Eqs. (4 and 5) were calculated for all 10 or 20 snapshots and the final absolute ∆Gbind energy was obtained by either taking the minimum value, the average value or the Boltzmann-weighted average value.

In the second (DFT) approach, the structure with the most favourable SQM ∆Gbind energy was further optimised at the QM/MM level with the host, the ligand and four water molecules in the QM system, treated with the TPSS-D3/def2-SV(P) method [51, 52]. These calculations were performed with the ComQum program [80, 81], which is an interface between AMBER [55] and the QM software Turbomole software [53]. In these calculations, the MM system was kept fixed. The minimisations were run until the energy change between two iterations was less than 0.003 kJ/mol and the maximum norm of the Cartesian gradients was below 10−3 a.u. All complexes converged within 150 geometry iterations.

For the optimised structures, ∆EQM was calculated with the TPSS functional and the def2-QZVP’ basis set (the def2-QZVP basis set [52] with the f-type functions on hydrogen and the g-type functions on the other atoms deleted). The dispersion energy was included using the DFT-D3 approach [82] with Becke–Johnson damping [83] and third-order terms included. All DFT calculations were sped up by expanding the Coulomb interactions in auxiliary basis sets with the resolution-of-identity approximation (RI), using the corresponding auxiliary basis sets [84, 85]. The multipole-accelerated resolution-of-identity J approach was also employed [86]. All DFT calculations were performed using the Turbomole 7.1 or 7.2 software [53]. Finally, absolute ∆Gbind energies were obtained with Eqs. (4 and 5), using the same approach to get ∆Gsolv and ∆Gtherm as for the SQM structures. However, the final ∆Gbind was based on a single DFT structure.

Geometric measures

We have used several geometric measures [15] to analyse the structures of the host–guest systems, as described below. Atom names used in the descriptions are shown in Figure S3.

  • r Dm measures how deep the guest is inside the host; it is defined as the closest distance between any guest atom and the average of the coordinates of the four HD atoms at the bottom of the host (AD).

  • α T shows the orientation of the ligand inside the host and is defined as the angle between the guest C1–C2 vector (C1 is the carboxylate carbon and C2 is the carbon bound to C1) and the host AD–AB vector, where AB is the averaged coordinates of the four HB atoms on the top of the host.

  • r O1 and rO2 describe how much the guest reaches out of the host. They are the distance between ligand carboxylate O1 or O2 atoms and the average plane defined by the four CC atoms of the host. ΔrO = |rO1 − rO2| and shows how tilted the carboxylate group is relative to the host.

  • ΔrBB is defined as the difference in distance between two sets of opposite HB host atoms and measures the distortion of the host.

  • rC1 and rC2 describe the orientation of the host carboxylate groups. They are defined as the distance between two opposite CO atoms. rCav is the average of rC1 and rC2.

Error estimates, quality and overlap measures

All reported uncertainties are standard errors of the mean (standard deviations divided by the square root of the number of samples). The uncertainty of the MBAR free energies calculated at each λ or Λ value was estimated by bootstrapping using the PYMBAR software [66] and the total uncertainty was obtained by error propagation.

The performance of the free-energy estimates was quantified by the mean signed deviation (MSD), the mean absolute deviation (MAD), the MAD after removal of the MSD (MADtr), the root-mean-square deviation (RMSD), the maximum error (Max), the correlation coefficient (R2), the slope of the best correlation line and Kendall’s rank correlation coefficient (τ) compared to experimental data [39]. For relative affinities, τ was calculated only for the transformations that were explicitly studied, not for all combinations that can be formed from these transformations (this is marked by calling it τr). Moreover, it was also evaluated considering only differences (both experimental and calculated) that are statistically significant at the 90% level (τ90 and τr,90 for absolute and relative affinities, respectively) [87]. Note that R2 and the slope for relative affinities depend on the direction of the perturbation (i.e. whether L0 → L1 or L1 → L0 is considered, which is arbitrary). This was solved by considering both directions (both forward and backward) for all perturbations when these two measures were calculated.

The standard deviation of the quality measures was obtained by a simple simulation approach [88]. For each transformation, 1000 Gaussian-distributed random numbers were generated with the mean and standard deviation equal to the MBAR and experimental results [39] for that transformation. Then, the quality measures were calculated for each of these 1000 sets of simulated results and the standard error over the 1000 sets is reported as the uncertainty.

For all λ and Λ values of all FEP calculations, we have monitored five overlap measures, to ensure that the overlap of the studied distributions is satisfactory, viz. the Bhattacharyya coefficient Ω [89], the Wu and Kofke overlap measures of the energy probability distributions (KAB) [90] and their bias metrics (Π) [90], the weight of the maximum term in the exponential average (wmax) [91] and the difference of the forward and backward exponential average estimate (ΔΔGEA) [92]. If Π < 0 or two of the following criteria were not fulfilled: Ω > 0.7, KAB > 0.7, Π > 0.5, wmax < 0.3, ΔΔGEA < 4 kJ/mol, additional λ or Λ values were included. Overlap measures obtained in the various simulations are listed in Table S7.

Result and discussion

In this study, we have calculated the free energies for the binding of the eight ligands G0–G7 in Fig. 1 to the normal (OAH) and methylated (OAM) deep-cavity octa-acid hosts (also shown in Fig. 1) within the SAMPL6 blind-test challenge [38, 39]. Thus, the experimental data were not known when the calculations were performed and were revealed only after the predictions were submitted. The experimental affinities were measured by isothermal calorimetry in aqueous 10 mM sodium phosphate buffer at pH 11.7 and 298 K [39]. We employed four different methods and submitted four data sets. The methods are summarised in Fig. 4. First, we performed standard relative FEP calculations at the MM level. Second, we performed QM/MM-FEP calculations using the reference-potential approach with explicit QM/MM sampling (RPQS) [22] at the PM6-DH+ level of theory (only ligand treated by QM). Third, absolute binding free energies were estimated by PM6-DH+/MM optimisations on 10–20 snapshots from MD simulations (host, ligand and 4 water molecules treated by QM), with energies supplemented by continuum solvation and thermostatistical corrections (Eq. 4). Fourth, the most energetically favourable of the latter structures were reoptimised with DFT and energies were calculated with DFT and large basis sets. The results are described below in separate subsections.

Fig. 4
figure 4

Schematic description of the four different approaches employed. H, G and 4W represent the molecules included in the QM system: host, guest and four water molecules

FEP calculations at the MM level

We have calculated the relative binding free energies of the SAMPL6 ligands G0–G7 by FEP calculations at the MM level. As can be seen in Fig. 1, the eight ligands contain a carboxylic group and six to ten carbon atoms. G0 and G2 involve a five- or six-membered ring and all except G0 and G6 have one or two double bonds. Ligands G2, G4 and G5 are chiral and they were used in the isomers shown in Fig. 1 (since the host is achiral, the actual form should not matter for the binding affinities).

We developed a FEP scheme, shown in Fig. 2, in which the eight ligands are connected, keeping the change as small as possible. This was partly accomplished by adding four extra ligands, which are the aliphatic carboxylates with five to eight carbon atoms, A5–A8. Thereby, the perturbations are restricted to the introduction of a double bound, the conversion of a H atom to a methyl group, the closure of a ring, or in one case (G2), formation of a cyclohexene ring by the addition of two carbon atoms. The aliphatic ligands were employed also because experimental binding affinities are available for A6 and A8 to OAH [61], giving us the opportunity to convert the relative energies to absolute affinities. For the same reason, the MeHx ligand from SAMPL4 (shown in Fig. 1a) was also added and connected to G2. To connect the calculations of OAH and OAM, and to obtain absolute affinities for the OAM ligands, we also converted OAH to OAM with and without the A6 ligand bound (adding the four methyl groups at the same time).

The calculated relative affinities are listed in Table 1 (\(\Delta \Delta G_{{{\text{bind}}}}^{{{\text{MM}}}}\) column; free energies calculated with MBAR). It can be seen that the statistical uncertainty for most ∆∆Gbind estimates is low, 0.1–0.5 kJ/mol, owing to the use of three independent FEP calculations. This reflects that the three estimates give similar results, with a variation of up to 1.0 kJ/mol for OAH and 1.6 kJ/mol for OAM. However, for two perturbations with both hosts, G2 → MeHx and G4 → A8, the variation is much larger (9–20 kJ/mol) and therefore the precision is much worse, 1.5–2.6 kJ/mol even if we employed six independent simulations for these perturbations.

Table 1 Calculated relative binding free energies (kJ/mol) for the OAH and OAM hosts, obtained with FEP at the MM and PM6-DH+/MM levels for the perturbation scheme in Fig. 2

Besides the G5 → G7 perturbation, the results in Table 1 are not directly comparable to the experimental data, because they involve the A5–A8 and MeHx ligands that are not involved in the SAMPL6 measurements. We have used two different approaches to solve this problem. For the submitted data, we employed previously published experimental data for A6, A8 and MeHx in OAH [40, 61] to calculate absolute affinities for all ligands. This is a bit risky, because ∆∆Gbind measured in different studies (at slightly different conditions) vary somewhat. For example, the experimental free energy of the Hx ligand (Figure S1) binding to OAH, involved in SAMPL4, vary between 21.1 and 23.5 kJ/mol in two publications by the same group [40, 93] and the results for A6, A8 and A10 vary by 1.5–2.8 kJ/mol [61, 93] (we employed the newer data in this article).

Our initial calculations along these lines showed that the calculated data were somewhat problematic: As can be seen in Fig. 2, the A6 and A8 ligands are connected by two perturbations, A6 → A7 → A8. However, the initial result for these perturbations was quite poor, − 11.3 ± 0.3 kJ/mol, compared to the difference in the experimental ∆∆Gbind for A6 and A8, − 4.9 or − 6.2 kJ/mol in the two experimental studies [61, 93]. We therefore rerun these two perturbations with the whole ligand included in the perturbed group (instead of only the differing atoms). For A7 → A8, this did not change the results significantly, as can be seen in Table 1 (entry “entire ligand”). However, for A6 → A7, the result changed by 4 kJ/mol, bringing the A6 → A8 estimate closer to experiments, − 7.8 ± 1.2 kJ/mol. Unfortunately, we did not have time to rerun all the other perturbations with the whole ligand in the soft-core group, but we used the latter results for the A6 → A7 perturbation and also corrected the corresponding results for OAM with the difference between the two A6 → A7 perturbations for OAH.

Absolute affinities calculated this way are shown in Table 2 (MM columns), together with the reference ligands (from which the experimental data was taken, because there are several possibilities) and the experimental data for SAMPL6 [39] (revealed after submission our results). As can be seen in Fig. 5a, the agreement is rather good with errors of 1.9–9.7 kJ/mol for the 16 predictions. However, the MAD is rather high, 5.6 ± 0.3 and 6.2 ± 0.3 kJ/mol for the two hosts. For most of the ligands, the predicted affinities are less negative than the experimental ones—the MSD is 5.0 ± 0.3 and 2.0 ± 0.4 kJ/mol for the two hosts. Yet, for G4 in both hosts and G2 in OAM, the opposite is true (note that these two ligands were involved in transformations with a poor precision in Table 1). If the systematic error is removed, the MAD is improved significantly. For OAH, the MADtr is good, 2.6 ± 0.3 kJ/mol, whereas it is worse for OAM, 5.2 ± 0.4 kJ/mol. The reason for this is probably that the experimental data employed to calculate the absolute affinities were all for OAH, so the results for OAM involve more perturbations and therefore the possibility of accumulation of errors.

Table 2 Calculated absolute binding free energies (kJ/mol) for the SAMPL6 ligands in the OAH and OAM hosts obtained with FEP at the MM and PM6-DH+/MM levels
Fig. 5
figure 5

Comparison of the experimental and calculated absolute affinities obtained with a MM-FEP and b QM/MM-FEP methods. The line shows the perfect correlation

On the other hand, the correlation between the experimental and calculated results is better for OAM (R2 = 0.85 ± 0.02) than for OAH (0.77 ± 0.05), although the difference is not fully significant. The same applies also for τ90, which is 0.84 ± 0.02 and 0.79 ± 0.02 for OAM and OAH, respectively.

Alternatively, we instead considered only relative affinities. These were obtained by combining two or three perturbations so that they go between only the G0–G7 ligands. This can be done in a few different ways and one connected and consistent set of seven relative energies are shown in Table 3 (MM columns). It can be seen that the results are quite similar to those of the absolute affinities. The errors vary between 0.4 and 7.3 kJ/mol, except for the G0 → G2 difference in OAM, for which the error is as much as 13.6 kJ/mol (the calculated result overestimates the true difference, but with the correct sign). Consequently, the MAD is larger for OAM (5.1 ± 0.2 kJ/mol) than for OAH (3.1 ± 0.2 kJ/mol). R2 is also better for OAH (0.87 ± 0.02, compared to 0.61 ± 0.04). On the other hand, τr,90 is perfect for OAM (all statistically significant differences have the correct sign), whereas it is 0.71 for OAH (one difference has the incorrect sign). The single perturbation that involves only SAMPL6 ligands (G5 → G7) gives errors of the same size as the combined perturbations (3–7 kJ/mol), indicating that the results are not biased by poor performance of the added A5–A8 and MeHx ligands. Similar results are also obtained if relative energies involving the G0–G7 ligands, as well as the A6, A8 and MeHx ligands are considered (Table S2).

Table 3 Calculated relative binding free energies (kJ/mol) for the SAMPL6 ligands in the OAH and OAM hosts obtained with FEP at the MM and PM6-DH+/MM levels

FEP calculations at the QM/MM level

Next, we used the RPQS approach to calculate all the relative binding affinities at the QM/MM level. For this, we performed MM → QM/MM FEP calculations for all G0–G7 and A5–A8 ligands both when bound to the host and free in solution (cf. Fig. 3). The results are shown in Table 4. The individual MM → QM/MM free energies calculated when the ligand is bound to the host \(\left( {\Delta G_{{{\text{L}},{\text{bound}}}}^{{{\text{MM}} \to {\text{QM}}/{\text{MM}}}}} \right)\) or free in solution \(\left( {\Delta G_{{{\text{L}},{\text{free}}}}^{{{\text{MM}} \to {\text{QM}}/{\text{MM}}}}} \right)\), ranged from − 507 to − 691 kJ/mol, except for G4 (around − 260 kJ/mol) and G7 (around − 962 kJ/mol). However, for each ligand, the values in the host and in solution were of a similar size, and the resulting \({\text{MM}} \to {\text{QM}}/{\text{MM}}\) correction to ∆Gbind (\(\Delta \Delta G_{{{\text{bind}},{\text{L}}}}^{{{\text{MM}} \to {\text{QM}}/{\text{MM}}}}\), shown in Table 4) ranged between − 8.8 and + 5.7 kJ/mol.

Table 4 Calculated \({\text{MM}} \to {\text{QM}}/{\text{MM}}\) free energies (kJ/mol) for ligands G0–G7 and A5–A8 (\(\Delta \Delta G_{{{\text{bind}},\;{\text{L}}}}^{{{\text{MM}} \to {\text{QM}}/{\text{MM}}}}=\Delta \Delta G_{{{\text{L}},{\text{bound}}}}^{{{\text{MM}} \to {\text{QM}}/{\text{MM}}}} - \Delta \Delta G_{{{\text{L}},{\text{free}}}}^{{{\text{MM}} \to {\text{QM}}/{\text{MM}}}}\))

The standard errors were between 0.2 and 0.4 kJ/mol, except for G1 and G2 bound to OAM, for which they were 0.9 and 1.2 kJ/mol. For G0–G7 in OAM, we run duplicate calculations and for these two ligands, the results differed by 1.9 and 2.4 kJ/mol, whereas for the other ligands, they agreed within 0.5 kJ/mol. In fact, the large variation came from the \(\Delta G_{{{\text{L}},{\text{bound}}}}^{{{\text{MM}} \to {\text{QM}}/{\text{MM}}}}~\) term, which varied by 2.6 and 3.2 kJ/mol for these two ligands, but less than 0.2 kJ/mol for the other ligands. For the \(\Delta G_{{{\text{L}},{\text{free}}}}^{{{\text{MM}} \to {\text{QM}}/{\text{MM}}}}\) term, for which we have two or three samples of each, the variation was 0.1–0.8 kJ/mol, except for G2 and G6 (1.2 and 2.0 kJ/mol).

We used five overlap measures (described in the Method section and shown in Table S7) to check that the calculated MM → QM corrections are reliable. Based on these, we added intermediate Λ values for some of the ligands, as is shown in the last two columns of Table 4.

Next, the \(\Delta \Delta G_{{{\text{bind}},~{\text{L}}}}^{{{\text{MM}} \to {\text{QM}}/{\text{MM}}}}\) corrections in Table 4 were combined with the results of the FEP calculations at the MM level (\(\Delta \Delta G_{{{\text{bind}},{{\text{L}}_0} \to {{\text{L}}_1}}}^{{{\text{MM}}}}\) in Table 1) to get the final PM6-DH+/MM relative binding free energies \(\left( {\Delta \Delta G_{{{\text{bind}},{{\text{L}}_0} \to {{\text{L}}_1}}}^{{{\text{QM}}/{\text{MM}}}}} \right)\). These results are also included in Table 1. It can be seen that most MM → QM corrections are rather small, 0.1–3.7 kJ/mol (average 2.2 kJ/mol). However, for the G1 → A6 and G2 → A7 perturbations they are − 5.4 to − 10.7 kJ/mol.

These relative energies were then recalculated to absolute affinities in the same way as for the FEP results at the MM level. These results are shown in Table 2 (QM/MM columns) and in Fig. 4b. They differ from the MM results by 2 kJ/mol on average with a maximum difference of 5.7 kJ/mol. For OAH, the results are consistently less negative than the experimental results, by 5.0–9.5 kJ/mol for all ligands except G4 (0.6 kJ/mol; MSD = 6.7 ± 0.3 kJ/mol). Therefore, the MAD is rather high, 6.7 ± 0.3 kJ/mol, but the MADtr is excellent, 2.4 ± 0.4 kJ/mol. For OAM, the deviation is less systematic and more varying with a MSD of 1.9 ± 0.4 kJ/mol, MAD = 5.5 ± 0.4 kJ/mol, MADtr = 5.0 ± 0.5 kJ/mol and a maximum error of 10.2 kJ/mol for G4. However, the correlation is better for OAM (R2 = 0.93 ± 0.02, compared to 0.81 ± 0.04 for OAH) and τ90 is perfect for OAM, but 0.84 ± 0.02 for OAH. Compared to the MM-FEP results, the performance for OAH is similar (MAD, MSD and Max are worse, but MADtr, R2 and τ90 are better). However, for OAM, the QM/MM-FEP results are clearly better for all quality measures, except for the maximum error.

We also made the corresponding analysis for the relative energies in Table 3 (QM/MM columns). The results, are similar to those obtained for the absolute energies: The MAD is lower for OAH than for OAM (3.9 ± 0.4 compared to 4.9 ± 0.4 kJ/mol). However, the correlation coefficient (R2) is better for OAM, 0.73 ± 0.04, compared to 0.56 ± 0.08. τr,90 is perfect for both hosts. Compared to the MM-FEP results, the two methods have a similar performance for OAH (MAD, R2 and Max are better for MM-FEP, MSD and τr,90 is better for QM/MM-FEP), but QM/MM-FEP is better (or equal) for OAM for all quality measures.

Absolute binding affinities from minimised semi-empirical structures

Next, we tried to calculate absolute binding affinities for all the SAMPL6 host–guest complexes with QM-optimised structures, using a variation of an approach developed by Grimme [16, 70]. In the SAMPL5 study [31], we noticed that vacuum optimisations led to structures that had the guest carboxylate groups too much buried inside the host, forming hydrogen bonds with the host, rather than with water. This could only partly be remedied by running the optimisations with an implicit solvent method, such as COSMO [71, 72] or by including four explicit water molecules in the calculations. Therefore, in this study, we decided to base the calculations on snapshots from long MD simulations of the complex, employing QM/MM optimised structures with explicit water molecules in the MM system and including the host, guest and four water molecules (that form hydrogen bonds with the carboxylate group of the guest) in the QM system. We performed 100 or 200 ns MD simulations for each host–guest complex and extracted 10 or 20 snapshots from these.

To make the calculations rapid, allowing for calibration also on the previous SAMPL4 and SAMPL5 structures, we chose to employ the semiempirical dispersion- and hydrogen-bond-corrected PM6-DH+ method for the QM calculations. This reduced the computational effort to 3–5 h (single-core) for the QM/MM minimisations, compared to 2–4 weeks for the previous DFT optimisations. This could in principle be further sped up by using parallel calculations or by keeping the MM system fixed or restrained during the minimisation. After the minimisation, single-point PM6-DH+ energies were calculated for the isolated host–guest complex and these energies were combined with COSMO-RS solvation energies and thermostatistical corrections from a MM frequency calculation, according to Eq. 4. The PM6-DH+ energy and MM frequency calculations took only some tens of seconds to complete, leaving the COSMO-RS solvation energy calculations as the computational bottleneck, as these can take as much as 1 day to converge (besides the initial MD simulations, which took about 5 h per 10 ns on one GPU).

We started by testing the protocol on the nine cyclic carboxylates binding to OAH in the SAMPL4 competition [25] (Figure S1), the four carboxylic ligands binding to OAH and OAM in SAMPL5 [26] (S5–G1, S5–G2, S5–G4 and S5–G6, shown in Figure S2; we omitted the two positively charged ligands as all SAMPL6 ligands have a single negative charge), as well as A6, A8 and A10 binding to OAH [61].

As described above, the binding energies were obtained from 10 to 20 snapshots from the MD simulations. Therefore, we need to decide how these binding energies should be combined to single final estimate. To this end, we compared three different approaches: the averaged energy, the minimum energy and the Boltzmann-weighted averaged energy. The results are presented in Tables 5 and 6 and are shown in Fig. 6. It can be seen that all three methods give too weak binding (the calculated ∆Gbind is less negative than the experimental one). Of course, this underestimation is largest for the averaged energies (MSD = 24 kJ/mol for both sets) and smallest for the minimum energies (11–13 kJ/mol). Besides this systematic error, the minimum and Boltzmann-averaged energies give similar results (the former is slightly better for SAMPL4, whereas the opposite is true for SAMPL5). Moreover, the averaged energies actually give the lowest MADtr and the best R2 and τ90 results for SAMPL4. Theoretically, Boltzmann averaging is the preferred approach and it gave the best results in SAMPL5 [36], so we therefore used this approach for the submitted energies. The better performance of the plain averages for SAMPL4 may indicate that the sampling was incomplete. The averaged energies have the advantage of giving an uncertainty. It is quite high for all ligands, 2–6 kJ/mol, again showing that much more snapshots are needed to reach reliable results.

Table 5 ∆Gbind (kJ/mol) calculated for the SAMPL4 ligands with SQM approach and three different ways to combine the 10 energies from different snapshots, plain average, the minimum energy (Min) or the Boltzmann average (Boltz). The second column shows the experimental results [40]
Table 6 ∆Gbind (kJ/mol) calculated for the SAMPL5 ligands with SQM approach and three different ways to combine the 10 energies from different snapshots, plain average, the minimum energy (Min) or the Boltzmann average (Boltz). The third column shows the experimental values
Fig. 6
figure 6

Comparison of the experimental and calculated absolute affinities obtained with the SQM method and three different ways to combine the 10 energies from different snapshots, plain average (Av), the minimum energy (Min) or the Boltzmann average (Boltz) for the a SAMPL4 and b SAMPL5 ligands. The line shows the perfect correlation

We have previously studied the same systems with minimised QM structures, but using more expensive DFT-D3 methods. For SAMPL4, the new results are of a similar quality (after removing the systematic error): The MADtr is 5.1–6.5 kJ/mol, which is similar or better than the previous DFT-D3 results, 4.6–8.6 kJ/mol. On the other hand, R2 is lower, 0.10–0.29, compared to 0.60–0.78, and τ90 is also worse (0.23–0.58, compared to 0.71–0.77). However, for SAMPL5, all the new results are much better than the old DFT-D3 results: MADtr = 3.9–4.6 kJ/mol, compared to 11–21 kJ/mol, R2 = 0.17–0.52, compared to 0–0.30 (and in many cases negative correlation), and τ90 = 0.33–0.41, compared to − 0.33 to 0.33. Still, it should be remembered that we did not include in this study the two ligands with trimethylammonium groups, which gave problems in the previous study. Yet, we believe that the present approach involves an important advantage: The geometries were optimised with PM6-DH+/MM in water, including four water molecules in the quantum system, which resulted in a lower repulsion of the negative carboxylate groups and a more realistic binding pose of the guests (with the guests always above the upper rim of the hosts and the carboxylate groups pointing upwards, forming hydrogen bonds with water molecules), shown in Figure S4 and described in Table S3. However, the vibrational frequencies still seem to be a problem, giving too positive binding affinities. In fact, the results without the frequency term gave a better MADtr for SAMPL4, but not for SAMPL5.

The results for the SAMPL6 ligands with the SQM approach using Boltzmann-weighted energies are collected in Table 7 and shown in Fig. 7a. It can be seen that they are quite similar to the results obtained for the SAMPL4 and SAMPL5 tests, with a systematic underestimation of the binding by MSD = 12–14 kJ/mol. For OAH, the MADtr is quite high 7.8 kJ/mol. The correlation is poor (R2 = 0.07), as is τ90 = 0.07. However, for the OAM host, all results are better: MADtr = 2.9 kJ/mol is excellent, and R2 and τ90 are also improved, 0.42 and 0.43, respectively.

Table 7 ∆Gbind (kJ/mol) calculated for the SAMPL6 ligands with SQM and DFT approaches and three different ways to combine the 10 energies from different snapshots, plain average, the minimum energy (Min) or the Boltzmann average (Boltz)
Fig. 7
figure 7

Comparison of the experimental and calculated absolute affinities obtained with the a SQM and b DFT methods, the former with two different ways to combine the 10–20 energies from different snapshots, plain average (Av) or the Boltzmann average (Boltz) for the SAMPL6 ligands. The line shows the perfect correlation. In a OAH energies are shown with squares and OAM energies with crosses

Interestingly, all relative results would have been improved if we had selected to submit the results of the pure average instead. For OAH, they give a MADtr of 5.1 ± 1.4 kJ/mol, a correlation of 0.16 ± 0.21 and a perfect τ90 of 1.00 ± 0.07. For OAM, MADtr = 2.3 ± 0.5 kJ/mol, R2 = 0.88 ± 0.08 and τ90 = 1.00 ± 0.05.

In Table S4, the various energy components are shown for the SQM calculations for the snapshot with the best binding energy. It can be seen that the thermostatistical term shows a small variation, 46–58 kJ/mol for OAH and 55–78 kJ/mol for OAM. It shows a weak anti-correlation with the experimental binding energies, R2 = 0.55, for OAH and 0.28 for OAM. The QM term is always large and positive, somewhat lower for OAH than for OAM, 798–942 and 758–865 kJ/mol. It shows a poor correlation with the experimental data for OAH (R2 = 0.16), but appreciably better for OAM (0.66). It is more than compensated by the solvation energy, which is again is larger in magnitude for OAH than for OAM, − 875 to − 995 and − 843 to − 932 for OAM. It shows a similar (anti-)correlation to the experimental data as the QM term, 0.12 for OAH, but 0.68 for OAM. The sum of the latter two terms shows an improved correlation to the experimental data for OAH (0.28) but a worse correlation for OAM (0.24). Adding the thermostatistical correction deteriorates the correlation for OAH, but improves it for OAM.

Figure S5 shows the variation of the individual SQM ∆Gbind results for the eight ligands in the two hosts. It can be seen that it is 20–35 kJ/mol for most ligands. However, G4 and G6 in OAH, as well as G5 and G7 in OAM show a larger variation 40–64 kJ/mol. There is little correlation between the variation and the strength of the binding or the type of host.

Absolute binding affinities from minimised DFT structures

Finally, we tried to improve the absolute binding affinities by using DFT calculations both in the geometry optimisations and in the energy estimates. Thus, we selected the minimum-energy snapshot according to SQM calculations and performed DFT/MM optimisation with the surrounding water included as a fixed MM system. We then calculated energies of the resulting structures in a way similar to that used for SQM (Eq. 4), still using thermostatistical corrections from MM vibrational frequencies and COSMO-RS solvation energies, but with TPSS-d3/def2-QZVP’ energies instead of the PM6-DH+ energies.

The DFT results are also included in Table 7 and they are shown Fig. 7b. They are less positive than the SQM results. In fact, the MSD for OAH is actually slightly negative, − 2.4 kJ/mol, whereas it is 24 kJ/mol for OAM. The solvation energies are of a similar magnitude in the two sets of calculations, whereas the energies are somewhat more positive for the PM6-DH+ calculations on OAH (by 13 kJ/mol on average), but less positive for OAM (by 9 kJ/mol on average). The thermostatistical correction is of a similar magnitude. The MADtr is 7.0–7.8 kJ/mol for the two hosts, R2 = 0.38–0.73 and τ90 = 0.29–0.63 (better for OAM than for OAH), i.e. mostly within the range of the SQM methods.

As mentioned in the “Methods” section, the ligand dissociated in most of the original MM minimisations (to calculate the frequencies for the thermostatistical corrections). This was not discovered until after the submissions. For the DFT calculations, the ligand did not dissociate, but by mistake, the calculations were performed with zeroed partial charges on all atoms of the host and the ligand. The submitted SQM and DFT results are provided in Table S4. The original SQM submission gave a much lower systematic error (MSD), but a similar performance in terms of the relative quality measures (MADtr, R2 and τ).

Finally, we compare structures of the complexes obtained in the MD simulations and after minimisation with either SQM or DFT, employing a number of geometric measures, which are described in the "Methods" section. The results are collected in Table S6 and it can be seen that the guests always bind inside the host, with the carboxylate group 0.7–4.6 Å above the upper rim, forming hydrogen bonds with the water molecules and not with the host. The average distances between the carboxylate atoms of the guests and the upper rim of the hosts, rO1 and rO2 are always slightly smaller for the SQM than the MD structures, but only by 0.1–0.7 Å, whereas the results for the DFT structures are more varying (probably because they are based on a single structure). Likewise, the ligand is always less deeply buried in the host in the SQM calculations than in the MD simulations, by up to 0.6 Å and the benzoate groups are less tilted (rCav is 0.1–0.3 Å smaller). In most cases, the tilt angle (αT) of the ligand was also somewhat smaller (4° on average) with SQM than in MD. All these differences probably reflect differences in the potential-energy method and the fact that the SQM structures are minimised and not from a MD simulation rather than a systematic error of the SQM structures, observed in our previous approaches [15, 31]. Figure S6 compares the DFT and SQM structures, showing that they are very similar.

Comparison with other submissions

There were 43 submissions for the SAMPL6 octa-acid challenge from eight research groups. Of course, the results depend on whether the two hosts are considered separately or together and how the various measures are combined and weighted. Here, we discuss the results based on six quality measures (MAD, RMSD, MSD, R2, τ and slope), provided by the organisers for the combined OAH and OAM results and give a final ranking based on the sum of the ranks for these six measures. Irrespectively of how the ranking is done, three pairs of calculations always come out among the best. Two submissions from the Merz group, gave the lowest MAD and RMSD (3.2 and 4.0 kJ/mol, respectively). They also gave quite good results for the other measures, e.g. R2 = 0.60–0.85 and τ = 0.37–0.74. The best calculation employed potential-of-mean-force umbrella-sampling simulations (i.e. dragging the ligand out of the host) and scaled the results based on the corresponding results obtained for the SAMPL5 ligands (without the scaling, the results were appreciably worse, ranking around position 14). The other calculation used the movable-type approach, fitted to the former result. In fact, this group submitted 27 data sets, which ranked from the best to the third worst.

A FEP study of absolute free energies by the Michel group, employing GAFF with AM1-BCC charges, TIP3P water and with or without counter ions also gave good results, but only after a linear fit employing the results from the SAMPL5 competition. They obtained MAD and RMSD of 5.3–5.6 and 6.7–7.4 kJ/mol, respectively, R2 = 0.78–0.79 and τ = 0.70, making them the fourth and sixth best methods. Without the linear fit, the results ranked 19–35.

Our MM-FEP and QM/MM-FEP gave similar results with MAD = 5.6–6.1 kJ/mol and RMSD = 6.3–6.8 kJ/mol, R2 = 0.66–0.71 and τ = 0.62–0.77. In fact, τ for MM-FEP was best among all submissions. Based on the sum of the ranks for all six quality measures, MM-FEP gave the third best results and QM-FEP the fifth among all 43 submissions. In particular, they were clearly the best submissions using only the raw SAMPL6 data, without any fit to the SAMPL5 data.

The DFT and SQM results ranked slightly below the middle, positions 28 and 29, with MAD = 9–11 kJ/mol and RMSD = 11–13 kJ/mol, R2 = 0.1–0.5 and τ = 0.3–0.4. However, the performance may have improved if relative quality measures were considered, like MADtr, and they would also have improved if we had submitted the average results, instead of the Boltzmann-weighted results. Still, it is quite satisfying that for the first time, a QM approach, QM/MM-FEP, come within the best six submissions. Moreover, both SQM and DFT gave decent results, better than many of the MM FEP results, e.g. a submission employing FEP with the polarisable AMOEBA force field [94] and the absolute FEP calculations without the linear fit. In particular, they are appreciably better than the other purely QM submission, employing B3PW91 calculations with complete basis sets and a SMD continuum solvent [95], which performed poorly.

Conclusions

We have studied the binding of eight ligands to two variants of the octa-acid deep-cavity host in the SAMPL6 blind-test competition [38, 39]. We have employed four different approaches (cf. Fig. 4), three of which are based on QM methods. First, we performed standard relative FEP calculations at the MM level with free energies calculated with MBAR and employing the GAFF+TIP3P force fields and RESP charges. Second, we used the reference-potential approach with explicit QM/MM sampling to obtain relative FEP free energies at the PM6-DH+/MM level of theory for the ligand. Third, we employed the same SQM method to obtain QM/MM optimised structures with the ligand, the host and four water molecules in the QM system, for which free energies were calculated by combining the PM6-DH+ interaction energies with COSMO-RS solvation free energies and thermostatistical corrections calculated at the MM level. We employed 10–20 structures taken from a MD simulation of the host–guest complexes. Finally, we reoptimised the best structures from the previous approach with the TPSS-D3/MM method and calculated QM energies with a large basis set, which were then combined with COSMO-RS and thermostatistical corrections.

The MM- and QM/MM-FEP methods gave excellent results for OAH, with MADtr of 2.4–2.6 kJ/mol and R2 of 0.77–0.81. For OAM, the MADtr was somewhat larger, 5.0–5.2 kJ/mol, but the R2 was better, 0.85–0.93. For the former, the two approaches gave similar results, whereas for OAM, QM/MM-FEP was clearly better. These results were among the five best submissions to SAMPL6 and they were actually the best ones using no fit to data from SAMPL5.

The results obtained with QM/MM optimised structures were somewhat worse, especially for OAH; MADtr = 2.3–5.1 kJ/mol and R2 = 0.16–0.88. Unfortunately, we selected to submit SQM results based on Boltzmann-averaged, rather than plain averaged energies, which gave somewhat worse results, MADtr = 2.9–7.8 kJ/mol and R2 = 0.07–0.42. However, these methods gave similar results as our previous calculations with DFT-optimised structures in SAMPL4 and much better results for SAMPL5. Compared to the other submissions, these results were mediocre, but still comparable to many approaches employing MM-FEP methods. In particular, they gave better performance than other submissions employing QM-optimised structures.

The present results are quite satisfying because for the first time we are able to improve MM-FEP results for the octa-acid host with QM/MM methods and these results are among the best five submissions. These results were obtained with the simple and cheap SQM PM6-DH+ method, demonstrating that appropriate sampling and properly converging the MM → QM/MM FEP is of greater importance than using more rigorous QM methods. However, with a functional QM/MM-FEP approach, our next challenge will be to extend it to more accurate QM methods and larger QM systems.

For the QM-minimised structures, we have shown that the results are improved by employing QM/MM-optimised structures, rather than QM structures optimised in vacuum or in a continuum solvent. This also made the calculations significantly faster. However, there are still several problems to solve with this approach. In particular, there seems to be a problem with absolute free energies, probably related to the entropy term, which vary by 10–40 kJ/mol, depending on what method is used for the geometries and the frequencies. In particular, we observe that the simple PM6-DH+ method gives lower MADtr than the inherently more accurate TPSS-D3 approach. It would be more satisfactory if the same method is used for the geometries and the frequencies. Moreover, much more sampling seems to be needed before the results are stable and reliable. With 10–20 snapshots, plain averages gave better results (but with large uncertainties, 1–5 kJ/mol). Finally, improved methods to estimate the strain energies of the host and the guest in the complexes are needed.

Still it is very satisfying that QM-based methods finally start to have some impact on calculated binding affinities for host–guest systems.