Binding free energies in the SAMPL5 octaacid host–guest challenge calculated with DFT-D3 and CCSD(T)
Abstract
We have tried to calculate the free energy for the binding of six small ligands to two variants of the octaacid deep cavitand host in the SAMPL5 blind challenge. We employed structures minimised with dispersion-corrected density-functional theory with small basis sets, and energies were calculated using large basis sets. Solvation energies were calculated with continuum methods and thermostatistical corrections were obtained from frequencies calculated at the HF-3c level. Care was taken to minimise the effects of the flexibility of the host by keeping the complexes as symmetric and similar as possible. In some calculations, the large net charge of the host was reduced by removing the propionate and benzoate groups. In addition, the effect of a restricted molecular dynamics sampling of structures was tested. Finally, we tried to improve the energies by using the DLPNO–CCSD(T) approach. Unfortunately, results of quite poor quality were obtained, with no correlation to the experimental data, systematically too positive affinities (by ~50 kJ/mol) and a mean absolute error (after removal of the systematic error) of 11–16 kJ/mol. DLPNO–CCSD(T) did not improve the results, so the accuracy is not limited by the energy function. Instead, four likely sources of errors were identified: first, the minimised structures were often incorrect, owing to the omission of explicit solvent. They could be partly improved by performing the minimisations in a continuum solvent with four water molecules around the charged groups of the ligands. Second, some ligands could bind in several different conformations, requiring sampling of reasonable structures. Third, there is an indication that the continuum-solvation model has problems accurately describing the binding of both the negatively and positively charged guest molecules. Fourth, different methods to calculate the thermostatistical corrections gave results that differed by up to 30 kJ/mol and there is an indication that HF-3c overestimates the entropy term. In conclusion, it is a challenge to calculate binding affinities for this octaacid system with quantum–mechanical methods.
Keywords
Ligand-binding affinities, Host–guest systems, Density-functional theory, Dispersion corrections, COSMO-RS, DLPNO–CCSD(T), SAMPL5
Introduction
One of the most important challenges for computational chemistry is to accurately predict the free energy for the binding of a small molecule to a biomacromolecule. This could involve the binding of a drug candidate to its target receptor, with obvious applications in pharmaceutical chemistry. Consequently, a large number of methods have been developed for this purpose, including statistics-based docking and scoring methods, molecular-mechanics (MM) simulations, and free-energy simulations (FES) [1, 2, 3, 4, 5, 6]. The binding free energy has contributions from a large number of interactions, such as bonded terms, dispersion, exchange-repulsion, electrostatics, polarisation, charge transfer, charge penetration, solvation, and entropy. As MM force fields have inherent limitations in treating several of these interactions, there has been a growing interest in using quantum–mechanical (QM) methods to improve binding-affinity calculations [7, 8, 9, 10, 11, 12, 13, 14].
Protein–ligand complexes are very large, involving thousands of atoms, and often present major problems in predicting binding affinities, e.g. owing to conformational changes of the protein during ligand binding or changes in the protonation states of the ligand and the receptor. In contrast, organic macrocycles with a few hundred atoms have a much smaller configurational freedom and chemical diversity. Still, the binding of small molecules to such systems involves the same type of interactions as protein–ligand binding, allowing the study of ligand binding in a simpler context. Therefore, there has been quite some interest in such host–guest systems in recent years [15, 16, 17, 18, 19, 20].
In particular, host–guest systems have been studied in blind-test challenges, in which the experimental binding affinities are not known beforehand, which reduces bias towards the experimental data. For example, the SAMPL3 blind test involved the binding of eleven different guest molecules to three host molecules [21]. Ten research groups provided predictions, but none of them could obtain both a good correlation and a low root-mean-squared deviation (RMSD) from the experimental binding affinities.
In the SAMPL4 competition, two hosts were involved, together with 25 guest ligands [22]. For the cucurbit[7]uril host, the best results were obtained with either FES or the much simpler and faster solvent interaction-energy method [23], both at the MM level of theory, with an RMSD of 8 kJ/mol and a correlation coefficient (R^{2}) of 0.6–0.8 [22, 24]. For the octaacid deep-cavity host [25, 26], even better results were obtained by FES calculations at the MM level, giving a correlation (R^{2}) of 0.9 and an RMSD of 4 kJ/mol [22, 27]. This was partly owing to the fact that the ligands were ideally suited for FES calculations of relative energies, with a high degree of similarity and a conserved single negative charge.
We have tried to improve the approach in three aspects: by controlling the structural variation, by reducing the charge and flexibility of the host, and by employing a restricted molecular dynamics sampling. In addition, we also tried to improve the QM calculations with the domain-based local pair natural orbital coupled-cluster singles and doubles with perturbatively treated triples approach (DLPNO–CCSD(T)) [34].
Methods
Studied systems
We have studied the two octaacid host–guest systems [26, 35] in the SAMPL5 blind challenge [29], shown in Fig. 1. As the name indicates, the hosts have eight negative charges: four benzoic groups at the upper rim of the cavitand and four propionate groups at the lower part of the host. The chemical structure has a fourfold symmetry. The two hosts differ only in that the benzoic groups have either a hydrogen atom or a methyl group in the para position of the carboxylate group (i.e. the position directed towards the cavity). The two hosts will be abbreviated OAH and OAM in the following. Another set of hosts was constructed by replacing the benzoic carboxylate groups and the full propionate groups with hydrogen atoms. These neutralised hosts will be called NOH and NOM, depending on whether they lack or carry the methyl groups. They are also shown in Fig. 1.
The six guest molecules are shown in Fig. 2 and will be called G1–G6 below. G1, G2, G4, and G6 have a carboxylic group and therefore a single negative charge. G2 and G6 have a benzoic group, like the nine guests in the SAMPL4 challenge. G1 instead has a hexyne group and G4 an adamantane group. The other two guest molecules, G3 and G5, have a trimethylammonium group, giving them a single positive charge (independent of pH). G3 contains a hexane chain, whereas G5 involves an ethylbenzene group. The binding affinities were measured at pH 11.3–11.5 in order to ensure that all carboxylic groups are fully charged [29].
Structures of the hosts, guests, and complexes were built manually, based on structures obtained by MM and QM for the SAMPL4 ligands. The isolated hosts were forced to be symmetric and for the complexes we also tried to keep an approximate symmetry, thereby making the structures as similar as possible.
DFT-D3 calculations
All DFT calculations were performed with the TURBOMOLE 7.0 software [36, 37]. All structures (complexes, as well as isolated hosts and guests) were optimised with the TPSS-D3 method [38] and the def2-SV(P) basis set [39] in a vacuum. Dispersion was included by the DFT-D3 approach [40], with default damping. For each optimised structure, more accurate QM energies were calculated with both the TPSS and PBE [41] functionals and the def2-QZVP’ basis set, i.e. the def2-QZVP basis set [39] with the f-type functions on hydrogen and the g-type functions on the other atoms deleted [30]. In these calculations, the DFT-D3 dispersion with Becke–Johnson damping and third-order terms included was calculated with the dftd3 software [42]. All DFT calculations were sped up by expanding the Coulomb interactions in auxiliary basis sets with the resolution-of-identity approximation (RI), using the corresponding auxiliary basis sets [43, 44]. The def2-QZVP’ calculations also employed the multipole-accelerated resolution-of-identity J approach [45].
Solvation free energies in water solution were calculated with the conductor-like screening model (COSMO) [46, 47] real-solvent (COSMO-RS) approach [48, 49] using the COSMOTHERM software [50]. These calculations were based on two single-point BP86 [51, 52] calculations with the TZVP basis set [53], one performed in a vacuum and the other in the COSMO solvent with an infinite dielectric constant. For the OAH and OAM hosts with their extensive negative charge, we had to use the undocumented ADEG option to force the program to accept that the solvation energy is very large.
Thermal corrections to the Gibbs free energy (including the zero-point vibrational energy) were calculated at 298 K and 1 atm pressure using an ideal-gas rigid-rotor harmonic-oscillator approach [54] from vibrational frequencies calculated at the HF-3c level [55] after a geometry optimisation at the same level of theory. The frequencies were scaled by a factor of 0.86 [30]. To obtain more stable results, low-lying vibrational modes (below 100 cm^{−1}) were treated by a free-rotor approximation, using the interpolation model suggested by Grimme and implemented in the thermo program [30]. The translational entropy and therefore also the free energy were corrected by 7.9 kJ/mol for the change in the standard state from 1 atm (used in the thermo program) to 1 M (used in the experiments). For the symmetry number, it was assumed that all isolated hosts have a fourfold symmetry and that the isolated G2 has a twofold symmetry, whereas all other guests and the complexes have a unit symmetry number.
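The size of the standard-state correction can be checked with a few lines of Python. This is a minimal sketch assuming ideal-gas behaviour, so that the correction is RT ln(V_m), where V_m is the molar volume in litres at 1 atm; the function name is ours and T = 298.15 K is used for the quoted 298 K.

```python
import math

R = 8.314462618       # gas constant, J/(mol K)
ATM_IN_PA = 101325.0  # 1 atm expressed in Pa


def standard_state_correction(temperature_k=298.15):
    """Magnitude (kJ/mol) of the 1 atm -> 1 M standard-state shift per species."""
    # Ideal-gas molar volume at 1 atm, converted from m^3/mol to L/mol.
    molar_volume_l = R * temperature_k / ATM_IN_PA * 1000.0
    return R * temperature_k * math.log(molar_volume_l) / 1000.0


correction = standard_state_correction()
print(f"{correction:.1f} kJ/mol")  # prints "7.9 kJ/mol"
```

For an association reaction (two species forming one complex), the shift makes the binding free energy more negative by this amount.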
Strictly, the binding free energy should be calculated for optimised structures of all three terms in this equation. However, more stable energies are obtained if the host and guest structures are taken from that in the complex by simply deleting the other moiety (rigid binding free energies) [27, 56]. The latter energies can be corrected by the guest relaxation energy (∆E_{Grlx}), calculated at the TPSS/def2-QZVP’ level of theory.
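The equations referred to in this paragraph are not reproduced in the present text. Judging from the columns of Table 2 and from the statement that the four terms are summed to give the net binding free energy, they presumably have the following form; this is a reconstruction rather than the original typesetting, and the labels HG, H, and G for the complex, host, and guest are our assumption.

```latex
\begin{align}
\Delta G_\mathrm{tot} &= \Delta E_\mathrm{QM} + \Delta E_\mathrm{disp}
                        + \Delta G_\mathrm{solv} + \Delta G_\mathrm{therm} \\
\Delta X &= X(\mathrm{HG}) - X(\mathrm{H}) - X(\mathrm{G})
\end{align}
```

The tabulated values are consistent with the guest relaxation energy being subtracted, i.e. ∆G_{tot,Grlx} = ∆G_{tot} − ∆E_{Grlx}, with ∆E_{Grlx} reported as a negative number.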
Coupled-cluster calculations
DLPNO–CCSD(T) calculations [57, 58, 59] were performed with a development version of the ORCA suite of programs (based on version 3.0.3) [60]. We used the def2-TZVPP and def2-QZVPP basis sets with the corresponding auxiliary basis sets [39, 53, 61]. For all calculations that involved ligand G4 (which contains bromine), the scalar-relativistic zeroth-order regular approximation (ZORA) [62] and consistently segmented all-electron relativistically contracted (SARC) basis sets were used [63]. Basis sets of atoms that belong to negatively charged functional groups were replaced by the corresponding minimally augmented basis sets [64]. All calculations were counterpoise corrected [65]. Hartree–Fock and correlation energies were extrapolated to the complete basis set limit [66]. A combination of NormalPNO thresholds (intramolecular interactions of host and guest molecule) and TightPNO thresholds (intermolecular interactions between host and guest molecule) was used [67, 68]. To obtain binding free energies, we simply replaced the ∆E_{QM} + ∆E_{disp} terms in Eq. 1 with the DLPNO–CCSD(T) energy.
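To illustrate the basis-set extrapolation step, the sketch below applies a widely used two-point inverse-power formula to the correlation energy, with cardinal numbers 3 (def2-TZVPP) and 4 (def2-QZVPP). The text does not state the formula or exponents of ref. [66], so the functional form, the exponent, the function name, and the energies are all assumptions made for illustration.

```python
def extrapolate_correlation(e_small, e_large, x_small=3, x_large=4, power=3.0):
    """Two-point extrapolation assuming E(X) = E_CBS + A * X**(-power)."""
    w_small = x_small ** power
    w_large = x_large ** power
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)


# Hypothetical correlation energies (hartree) that follow the assumed model exactly.
e_cbs_true, a = -1.25, 0.54
e_tz = e_cbs_true + a / 3 ** 3
e_qz = e_cbs_true + a / 4 ** 3
e_cbs = extrapolate_correlation(e_tz, e_qz)
print(round(e_cbs, 6))  # recovers -1.25 for data that obey the model
```

The Hartree–Fock energy is normally extrapolated separately with a different (typically exponential) form, which is not shown here.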
MD simulations
To study the structure of the complexes in water solution and to sample a set of relevant structures, all twelve host–guest systems were studied by molecular dynamics (MD) simulations. The simulations were performed only for the NOH and NOM systems and they were started from the optimised TPSS-D3/def2-SV(P) structures. The hosts and guests were solvated in a truncated octahedral box of explicit TIP4P-Ewald water molecules [69] extending 10 Å from the solute using the tleap module, giving a total of 1120–1296 atoms.
All MD simulations were performed using the Amber 14 software [70] with the GAFF force field [71] for the host and ligands. Parameters for NOH have been described before [72] and the parameters for the other host and the guest molecules were determined in the same way: the molecules were geometry optimised at the AM1 [73] level, followed by a calculation of the electrostatic potential at the HF/6-31G* [74] level of theory at points sampled around the molecule according to the Merz–Kollman scheme [75]. These calculations were performed with the Gaussian09 [76] software. Finally, restrained electrostatic-potential (RESP) charges [77] were fitted to the electrostatic potential using the antechamber program in the Amber 14 suite. The charges were symmetrised to reflect the (approximate) C_{4v} symmetry of the host molecules. One missing dihedral parameter for G2 was obtained from vibrational frequencies calculated at the B3LYP/def2-SV(P) level of theory using the Seminario approach [78], implemented in the Hess2FF program [79]. The Amber topology files for NOM and the ligands, as well as the added force-field parameters, are given in the Supplementary material.
In all simulations, periodic boundary conditions were employed. For each complex, 10,000 steps of minimisation were used, followed by 20 ps constant-volume equilibration and 2 ns constant-pressure equilibration. In order to allow for a time step of 2 fs, the SHAKE algorithm [80] was used to constrain bonds involving hydrogen atoms to their equilibrium values. The temperature was kept constant at 300 K using Langevin dynamics [81], with a collision frequency of 2 ps^{−1}, and the pressure was kept constant at 1 atm using a weak-coupling isotropic algorithm [82] with a relaxation time of 1 ps. Long-range electrostatics were handled by the particle-mesh Ewald (PME) method [83] with a fourth-order B-spline interpolation and a tolerance of 10^{−5}. The cut-off radius for Lennard–Jones interactions was set to 8 Å. No counterions were used in the calculations, because we have previously shown that they only have a minor (~2 kJ/mol) influence on the binding free energies [84].
In the first simulations, G4 dissociated from the OAM host. Therefore, a restraint of 209 kJ/mol/Å^{2} was added between one of the hydrogen atoms of the host that points into the cavity and the Br atom of the guest. This ensured that the guest stayed inside the host throughout the simulation.
Previous FES calculations for the nine SAMPL4 ligands of the OAH host have shown that the deletion of the benzoic and propionate groups has only a minor influence on the relative binding free energies (less than 2 kJ/mol difference for the relative free energies) [72]. We also tested an intermediate host molecule, still with benzoic groups, but with the propionate groups removed. However, it gave almost identical results to the NOH host (within 1 kJ/mol; shown in Table S1 in the Supplementary material). Therefore, this host molecule was not further tested for the SAMPL5 ligands.
Geometric measures

r _{Dm} measures how deep the guest is inside the host and is defined as the closest distance between the average of the coordinates of the four HD atoms of the host (AD) and any guest atom.

α_{t} shows the orientation of the ligand inside the host and is defined as the angle between the C1–C2 (G1, G2, G4, and G6) or N–C1 (G3 and G5) vectors and the host AD–AB vectors, where AB is the average coordinate of the four HB atoms.

r _{O1} and r _{O2} or r _{N} describe how much the guest carboxylate or trimethylammonium group reaches out of the host. They are the distance between the guest O1 and O2 (for G1, G2, G4, and G6) or N (for G3 and G5) atoms and the average plane defined by the four CC atoms. A positive distance indicates that the atom is outside the host.

Δr _{BB} measures the distortion of the host and is defined as the difference of the distances between two opposite HB atoms on the host.

r _{C1} and r _{C2} describe the orientation of the benzoic (OAH and OAM) or benzene (NOH and NOM) groups. They are calculated as the distance between opposite host CO or HO atoms. r _{Cav} is the average of these two distances.

r _{min1} and r _{min2} are the two shortest distances between the guest carboxyl oxygen atoms (i.e. only for guests G1, G2, G4, and G6) and a hydrogen atom of the host. They indicate whether there are any CH–O hydrogen bonds.
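Two of the measures defined above, r_{Dm} and Δr_{BB}, can be sketched in a few lines. This is an illustrative implementation on made-up coordinates, not the script used in the study; the function names, the ordering of the HB atoms around the rim, and the use of an absolute difference are our assumptions.

```python
import math


def centroid(points):
    """Average of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def r_dm(hd_atoms, guest_atoms):
    """Closest distance between the HD centroid (AD) and any guest atom."""
    ad = centroid(hd_atoms)
    return min(math.dist(ad, g) for g in guest_atoms)


def delta_r_bb(hb_atoms):
    """Absolute difference between the two opposite HB-HB distances.

    Assumes hb_atoms is ordered around the rim, so 0/2 and 1/3 are opposite.
    """
    d02 = math.dist(hb_atoms[0], hb_atoms[2])
    d13 = math.dist(hb_atoms[1], hb_atoms[3])
    return abs(d02 - d13)


# Hypothetical coordinates in Å: a square of HD atoms and an elliptical HB rim.
hd = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
hb = [(4, 0, 0), (0, 3, 0), (-4, 0, 0), (0, -3, 0)]
guest = [(0, 0, 3), (0, 0, 5)]
print(r_dm(hd, guest), delta_r_bb(hb))  # 3.0 2.0
```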
Quality estimates
The quality of the binding-affinity estimates compared to experimental data [84] was measured using the mean absolute deviation after removal of the systematic error (i.e. the mean signed deviation; MADtr), the correlation coefficient (R^{2}), and Kendall’s rank correlation coefficient (τ). Following the overview article, we employed the NMR data for all complexes, except for G6 and for G4 in OAH, for which ITC data were used [29]. ∆G_{bind} of the two experimental data sets differ by 0.3–3.3 kJ/mol (1.5 kJ/mol on average).
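The three quality measures can be written compactly as follows. This is a minimal sketch with hypothetical data; the function names are ours, Kendall's τ is implemented in its simplest form (tau-a, without tie corrections), and R^{2} is taken as the plain square of Pearson's coefficient, which may differ from the convention behind the tabulated values.

```python
def madtr(calc, expt):
    """Mean absolute deviation after subtracting the mean signed deviation."""
    errors = [c - e for c, e in zip(calc, expt)]
    msd = sum(errors) / len(errors)
    return sum(abs(err - msd) for err in errors) / len(errors)


def r_squared(calc, expt):
    """Square of Pearson's correlation coefficient."""
    n = len(calc)
    mc, me = sum(calc) / n, sum(expt) / n
    cov = sum((c - mc) * (e - me) for c, e in zip(calc, expt))
    var_c = sum((c - mc) ** 2 for c in calc)
    var_e = sum((e - me) ** 2 for e in expt)
    return cov * cov / (var_c * var_e)


def kendall_tau(calc, expt):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(calc)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (calc[i] - calc[j]) * (expt[i] - expt[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical binding free energies in kJ/mol (not data from this study).
calc = [10.0, 20.0, 30.0, 40.0]
expt = [-25.0, -20.0, -5.0, -10.0]
print(madtr(calc, expt), round(r_squared(calc, expt), 2), round(kendall_tau(calc, expt), 2))
```

With six ligands per host there are 15 ligand pairs, so τ can only take values that are multiples of 1/15.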
Results and discussion
In this study we have tried to estimate the binding affinities of the twelve octaacid host–guest systems in the SAMPL5 blind challenge [29]. The octaacid hosts form a hydrophobic cavity that has been shown to bind various small molecules by hydrophobic interactions inside the cavity [25, 26]. The two variants of the octaacid cavitand differ in the absence (OAH) or presence (OAM) of four methyl groups on the rim of the cavity, as is shown in Fig. 1. The six guest molecules are shown in Fig. 2. Four of them are negatively charged with a carboxylate group and the other two (G3 and G5) are positively charged with a trimethylammonium group. Three of the guests contain a benzene ring, one an adamantane group, whereas the other two have linear chains, hexane or pentyne.
Compared to the nine octaacid OAH–guest systems in the SAMPL4 competition [22], the present ligands show a much larger diversity, both in the general structure and in the net charge. This makes them less suitable for FES calculations of relative binding free energies, which were successfully used by us and other groups in that challenge [22, 27]. Therefore, we decided to instead use the QM approach based on minimised structures, developed by Grimme [30, 31], which was also employed in SAMPL4, giving results of an intermediate quality (R^{2} = 0.6–0.8 and a mean absolute deviation, MAD, of 5–9 kJ/mol) [27, 33].

Restricting the uncertainty caused by the flexibility of the host molecule by a strict control of the minimisation.

Reducing the uncertainty caused by the large charge of the host molecules (and also the flexibility) by removing the propionate and benzoic carboxylate groups.

Testing the effect of a restricted MD sampling.

Improving the QM method by using the DLPNO–CCSD(T) [34] approach.
Controlled minimisation
Our MD studies of OAH with the nine ligands in the SAMPL4 competition showed that there are two motions that give rise to major variations in the structure of the octaacid–ligand complexes [27]. The first is a breathing motion of the host, varying the entrance of the cavity from symmetric and circular to elongated and ellipsoidal. It can be described by the ∆r _{BB} measure. ∆r _{BB} varies by up to 8 Å on a time scale of less than 0.1 ns, but during the minimisation, the distortion is typically frozen into the structure, giving large variations in the minimised structures.
The second motion is in the propionate chains, which have two sp^{3}-hybridised dihedrals with three minima of similar energies. Unfortunately, this rotation is rather slow, on the 1–10 ns scale, so very long simulations are needed to sample all possible conformations. Therefore, different conformations are again frozen into the minimised structures and, owing to the negatively charged carboxylate group at the end of the chains, the conformations may significantly affect the binding affinities.
To minimise the effect of these two movements, we decided to control the minimisations much more strictly than in the SAMPL4 challenge. We assumed that none of the ligands should have any particular preference for the host distortion or the propionate conformations. Therefore, we tried to get structures for all ligands that are as similar as possible with regard to the host distortion and the propionate conformations. This was done by first optimising the OAH and OAM hosts with enforced fourfold symmetry. Then, the guests were inserted as symmetrically as possible and the structure was carefully optimised in order to keep the geometry close to the starting point.
At this stage, we also had to decide how to perform the optimisation. In SAMPL4, we used three different approaches [27]: the optimisation was performed either in a vacuum, in a COSMO continuum solvent (with a dielectric constant of 80), or in the same COSMO solvent, but with four explicit water molecules forming hydrogen bonds to the carboxylate group of the ligand (present in all nine ligands). The three methods gave some systematic variations in the obtained structures, especially regarding the orientation of the benzoate and propionate groups and how far the ligand reached out of the host. However, somewhat unexpectedly, the vacuum structures gave the most stable binding energies, especially if relaxed interaction energies were considered, probably because the strong electrostatic repulsion between the propionate carboxylate groups in vacuum gave them a similar conformation in all structures. Therefore, we decided to use vacuum-optimised structures also in the present investigation (but the other two methods were also tested for the MD snapshots, see below).
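Table 1 Geometric measures for G1–G6 bound to the four hosts after optimisation at the TPSS-D3/def2-SV(P) level (distances in Å, α_{t} in degrees)

| Guest | Host | r_{Dm} | α_{t} | r_{N}/r_{O1} | r_{O2} | Δr_{BB} | r_{Cav} | r_{min1} | r_{min2} |
|---|---|---|---|---|---|---|---|---|---|
| G1 | OAH | 2.6 | 82.8 | −2.0 | 0.2 | 0.5 | 19.6 | 1.9 | 2.7 |
| G1 | NOH | 3.0 | 76.5 | −2.2 | −0.2 | 6.3 | 17.6 | 2.4 | 3.8 |
| G1 | OAM | 3.1 | 47.1 | 0.6 | 1.5 | 0.5 | 20.1 | 2.5 | 2.6 |
| G1 | NOM | 2.5 | 83.3 | −0.8 | −1.3 | 2.2 | 17.8 | 2.1 | 2.2 |
| G2 | OAH | 3.2 | 47.9 | 1.6 | 0.0 | 2.9 | 18.3 | 2.1 | 2.1 |
| G2 | NOH | 4.0 | 39.9 | 1.5 | 1.4 | 3.8 | 17.6 | 1.9 | 1.9 |
| G2 | OAM | 4.1 | 12.6 | 3.3 | 2.8 | 1.0 | 20.1 | 3.0 | 3.6 |
| G2 | NOM | 2.6 | 1.0 | 1.8 | 1.9 | 3.6 | 17.7 | 2.5 | 2.6 |
| G3 | OAH | 2.7 | 50.2 | 0.6 | | 0.9 | 17.8 | | |
| G3 | NOH | 2.7 | 47.6 | 1.2 | | 5.4 | 17.6 | | |
| G3 | OAM | 2.8 | 48.4 | 1.5 | | 0.1 | 20.0 | | |
| G3 | NOM | 3.1 | 51.7 | 1.7 | | 7.6 | 17.5 | | |
| G4 | OAH | 4.5 | 65.3 | 1.2 | 1.3 | 0.5 | 19.8 | 2.0 | 2.0 |
| G4 | NOH | 4.1 | 65.2 | 0.8 | 0.9 | 1.9 | 17.9 | 1.9 | 1.9 |
| G4 | OAM | 5.2 | 13.6 | 3.5 | 3.0 | 1.2 | 19.9 | 2.7 | 3.2 |
| G4 | NOM | 5.4 | 61.7 | 2.5 | 1.7 | 0.3 | 17.2 | 2.3 | 2.4 |
| G5 | OAH | 2.3 | 115.8 | 1.2 | | 0.4 | 18.2 | | |
| G5 | NOH | 3.6 | 26.2 | 1.0 | | 2.3 | 17.7 | | |
| G5 | OAM | 3.3 | 27.7 | 2.2 | | 1.1 | 19.9 | | |
| G5 | NOM | 3.5 | 27.8 | 2.5 | | 1.9 | 17.9 | | |
| G6 | OAH | 5.9 | 35.3 | 4.6 | 3.1 | 0.6 | 20.0 | 4.4 | 5.0 |
| G6 | NOH | 3.8 | 59.1 | 1.1 | −0.3 | 6.6 | 17.4 | 2.0 | 2.2 |
| G6 | OAM | 6.0 | 49.3 | 2.7 | 4.3 | 0.5 | 20.1 | 2.7 | 3.0 |
| G6 | NOM | 7.4 | 53.9 | 1.6 | 2.9 | 1.8 | 17.8 | 2.3 | 2.5 |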
Strangely, G2 did not bind inside the OAH host with the standard method of optimisation. We had to run the optimisation in a COSMO continuum solvent with a dielectric constant of 80 to obtain a bound structure. The results presented in this paper are obtained with that structure. Likewise, G4 tended to dissociate from the OAM host in the initial optimisations, but this could be solved by using carefully designed starting structures.
Neutralised hosts
The −8 charge of the OAH and OAM hosts gives rise to very large solvation free energies (up to −6620 kJ/mol). These are to a large extent cancelled when the difference in solvation energy between the complex, the host, and the guest is calculated (to around −1580 kJ/mol) and then further cancelled when combined with the QM binding energy, which includes the electrostatic repulsion or attraction between the host and the negatively or positively charged ligands, respectively, giving a net binding free energy of −10 to −39 kJ/mol. Therefore, both the continuum-solvation and the QM methods need to be extremely accurate to give a proper accuracy of the final estimates.
To avoid these problems, we recently suggested and showed for the SAMPL4 ligands that both the benzoic carboxylate group and the full propionate chain can be replaced by hydrogen atoms (giving the NOH and NOM hosts in Fig. 1), without changing the relative binding free energies of the ligands by more than 2 kJ/mol [72]. In fact, as shown in Table S1 in the Supplementary material, the 2 kJ/mol difference comes mainly from the propionate ligand. Another advantage of removing the propionate groups is that the problem with the conformational sampling of these groups is also avoided.
Binding free energies
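Table 2 Calculated energy components and absolute binding free energies (kJ/mol) obtained from TPSS-D3/def2-SV(P) optimised structures

| Host | Guest | ∆E_{QM} | ∆E_{disp} | ∆G_{solv} | ∆G_{therm} | ∆G_{tot} | ∆E_{Grlx} | ∆G_{tot,Grlx} | ∆G_{tot,rlx} | ∆G_{tot,CC} |
|---|---|---|---|---|---|---|---|---|---|---|
| OAH | G1 | 948.0 | −112.5 | −873.4 | 77.9 | 40.0 | −22.7 | 62.8 | 57.4 | |
| OAH | G2 | 1017.0 | −121.0 | −957.8 | 71.3 | 9.5 | −1.9 | 11.4 | −5.1 | |
| OAH | G3 | −1062.1 | −168.5 | 1155.9 | 96.2 | 21.4 | −37.2 | 58.6 | 43.1 | |
| OAH | G4 | 973.3 | −154.4 | −906.3 | 101.1 | 13.7 | 0.0 | 13.7 | 4.8 | |
| OAH | G5 | −1043.3 | −173.8 | 1136.7 | 90.1 | 9.7 | −7.6 | 17.3 | 18.7 | |
| OAH | G6 | 909.5 | −71.4 | −899.1 | 72.2 | 11.2 | −1.7 | 12.9 | 1.6 | |
| NOH | G1 | −72.4 | −127.1 | 186.7 | 75.6 | 62.7 | −16.4 | 79.1 | 67.6 | 83.9 |
| NOH | G2 | −19.9 | −122.1 | 101.4 | 80.2 | 39.5 | −0.9 | 40.4 | 63.2 | 46.2 |
| NOH | G3 | 10.2 | −149.7 | 73.4 | 88.0 | 21.9 | −35.1 | 57.0 | 59.4 | 57.4 |
| NOH | G4 | −26.1 | −163.4 | 129.2 | 81.1 | 20.8 | 0.0 | 20.8 | 19.7 | 19.7 |
| NOH | G5 | 18.3 | −165.0 | 68.7 | 81.3 | 3.2 | −4.6 | 7.8 | 19.4 | 15.1 |
| NOH | G6 | −14.3 | −143.0 | 112.3 | 79.1 | 34.1 | −2.3 | 36.4 | 56.6 | 49.7 |
| OAM | G1 | 953.0 | −104.4 | −888.3 | 87.8 | 48.1 | −6.4 | 54.5 | 39.8 | |
| OAM | G2 | 950.1 | −99.3 | −904.5 | 77.1 | 23.4 | −0.3 | 23.7 | 37.7 | |
| OAM | G3 | −955.6 | −163.6 | 1065.1 | 90.2 | 36.1 | −24.0 | 60.1 | 57.1 | |
| OAM | G4 | 1013.3 | −173.8 | −888.1 | 95.1 | 46.5 | 0.0 | 46.5 | 60.4 | |
| OAM | G5 | −949.2 | −170.1 | 1121.3 | 83.7 | 85.8 | −7.4 | 93.2 | 18.9 | |
| OAM | G6 | 911.2 | −74.8 | −890.6 | 92.2 | 38.1 | −7.8 | 45.9 | 29.0 | |
| NOM | G1 | −63.7 | −123.0 | 197.0 | 81.7 | 91.9 | −13.5 | 105.4 | 98.7 | 103.3 |
| NOM | G2 | 2.7 | −131.0 | 89.3 | 78.4 | 39.4 | −3.8 | 43.2 | 79.5 | 50.6 |
| NOM | G3 | 30.0 | −153.5 | 66.3 | 96.9 | 39.7 | −23.1 | 62.8 | 71.2 | 53.7 |
| NOM | G4 | 28.5 | −168.0 | 93.9 | 98.5 | 52.9 | −5.8 | 58.7 | 83.4 | 79.1 |
| NOM | G5 | 27.7 | −147.4 | 56.8 | 86.0 | 23.1 | −4.6 | 27.7 | 36.8 | 37.0 |
| NOM | G6 | −29.1 | −94.3 | 81.1 | 80.0 | 37.7 | −1.6 | 39.3 | 44.2 | 30.4 |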
The first term is the single-point vacuum TPSS/def2-QZVP’ binding energy. For the OAH and OAM hosts, this term is large, owing to the electrostatic interaction between the host with a −8 charge and the ligands with a −1 or +1 charge, 910–1017 or −949 to −1062 kJ/mol, respectively. The energy is ~100 kJ/mol less negative in OAM than in OAH for the positively charged ligands, whereas for the other guests there is no consistent difference. For the neutralised hosts, ∆E_{QM} is much smaller, −72 to +30 kJ/mol. It is 9–20 kJ/mol more positive in NOM than in NOH for the positively charged ligands, but again without any consistent trend for the negatively charged ligands.
∆E_{QM} is more than compensated by the COSMO-RS solvation energy. ∆G_{solv} is very large for the OAH and OAM hosts, −873 to −958 kJ/mol for the negatively charged guests but 1065–1156 kJ/mol for G3 and G5. Consequently, the sum of these two terms is always positive, 10–172 kJ/mol, largest for G5 in OAM and lowest for G6 in OAH. For the neutralised hosts, ∆G_{solv} is always positive, 57–197 kJ/mol, without any clear difference between the guests with different charges. The sum of the two terms is also always positive, 52–133 kJ/mol.
The dispersion energy is always negative, −71 to −174 kJ/mol. It is more negative for the two positively charged ligands and the bulky G4 ligand than for the other three ligands. It is typically least negative for G6, reflecting that G6 does not bind deeply in the host (except in NOH; cf. Figs. 4, 5). Interestingly, ∆E_{disp} is always more negative in the neutralised hosts for the negatively charged ligands, but the opposite is true for the positively charged ligands.
The thermal corrections vary only slightly among the various systems. They are always positive, 71–101 kJ/mol, reflecting the loss of translational and rotational entropy when the guest molecule binds to the host. There are no consistent differences for the various hosts and no correlation between the results obtained with the charged and neutralised hosts.
Summing the four terms gives the net binding free energy, ∆G _{tot}. Somewhat disappointingly, it is positive for all ligands, 3–92 kJ/mol. There is no consistent difference between the positively or negatively charged ligands. However, ∆G _{tot} is more positive for the truncated hosts than for the fully charged ones (by 0–30 kJ/mol), except for G6. There is no correlation between the results obtained for the charged and neutralised hosts, R ^{2} = 0.1.
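The summation can be checked directly against a few rows of Table 2. The sketch below copies four rows from the table; the function name is ours, and small differences of about 0.1 kJ/mol are expected because the tabulated components are rounded.

```python
# (ΔE_QM, ΔE_disp, ΔG_solv, ΔG_therm, tabulated ΔG_tot) in kJ/mol, from Table 2.
rows = {
    ("OAH", "G1"): (948.0, -112.5, -873.4, 77.9, 40.0),
    ("OAH", "G2"): (1017.0, -121.0, -957.8, 71.3, 9.5),
    ("NOH", "G1"): (-72.4, -127.1, 186.7, 75.6, 62.7),
    ("NOM", "G5"): (27.7, -147.4, 56.8, 86.0, 23.1),
}


def total_binding_free_energy(e_qm, e_disp, g_solv, g_therm):
    """Net binding free energy as the sum of the four tabulated components."""
    return e_qm + e_disp + g_solv + g_therm


for (host, guest), (*terms, tabulated) in rows.items():
    total = total_binding_free_energy(*terms)
    print(f"{host} {guest}: sum = {total:6.1f}, table = {tabulated:6.1f}")
```

The OAH rows illustrate the cancellation discussed above: terms of about 1000 kJ/mol with opposite signs leave a net value of only a few tens of kJ/mol.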
Table 3 Quality measures (compared to experimental data [84]) of the three total binding free energies obtained with TPSS-D3/def2-SV(P) minimised structures (TPSS), MD sampled structures (MD), and structures minimised with HF-3c in a COSMO continuum solvent without (Cos) or with four water molecules (Wat)

| Measure | Host | TPSS ∆G_{tot} | TPSS ∆G_{tot,Grlx} | TPSS ∆G_{tot,rlx} | TPSS ∆G_{tot,CC} | MD ∆G_{tot} | Cos ∆G_{tot} | Wat ∆G_{tot} |
|---|---|---|---|---|---|---|---|---|
| MADtr | OAH | 11.2 | 19.3 | 18.8 | | | | |
| MADtr | NOH | 14.2 | 17.0 | 15.0 | 16.5 | 14.8 | 16.1 | 11.3 |
| MADtr | OAM | 13.9 | 17.5 | 12.0 | | | | |
| MADtr | NOM | 15.9 | 21.6 | 19.8 | 19.8 | 16.6 | 16.0 | 11.8 |
| R^{2} | OAH | 0.01 | 0.05 | 0.05 | | | | |
| R^{2} | NOH | 0.03 | 0.08 | 0.30 | 0.15 | 0.01 | 0.00 | 0.04 |
| R^{2} | OAM | 0.14 | 0.01 | 0.01 | | | | |
| R^{2} | NOM | −0.02 | −0.09 | −0.02 | 0.00 | −0.08 | −0.06 | −0.05 |
| τ | OAH | −0.33 | −0.07 | −0.07 | | | | |
| τ | NOH | 0.20 | 0.20 | 0.33 | 0.07 | 0.20 | 0.07 | 0.20 |
| τ | OAM | 0.33 | −0.07 | −0.33 | | | | |
| τ | NOM | −0.33 | −0.47 | −0.20 | −0.20 | −0.20 | −0.20 | −0.20 |
The ligand relaxation energy (∆E_{Grlx} in Table 2) is less than 8 kJ/mol for most of the ligands, except G1 and G3 with the linear chains (up to 37 kJ/mol). Including this energy (∆G_{tot,Grlx} in Tables 2, 3) of course makes the binding free energy even more positive. This does not change the correlation significantly, but MADtr increases for all hosts. Thus, the results are not improved by including the ligand relaxation energy. If all energy terms are calculated for the fully relaxed host and guest molecules (∆G_{tot,rlx} in Tables 2, 3), the correlation improves slightly for NOH, R^{2} = 0.3, and MADtr improves compared to ∆G_{tot,Grlx}, but at 12–20 kJ/mol it is still worse than for ∆G_{tot} for all hosts except OAM. Therefore, we will only discuss the rigid results in the following.
Compared to the corresponding results for the SAMPL4 octaacid challenge [27], the present calculations give appreciably worse results (R^{2} = 0.0–0.1 and MADtr = 11–15 kJ/mol, compared to 0.6–0.8 and 5–9 kJ/mol). In particular, all the present binding affinities are positive, whereas this was the case only for one ligand in the SAMPL4 set. The only difference between the two sets of calculations is the use of the HF-3c method for the ∆G_{therm} term, rather than MM. This term is 71–101 kJ/mol for the SAMPL5 complexes, but it was only 44–57 kJ/mol for the SAMPL4 MM results. For the Bz complex in SAMPL4 [27], the difference in the ∆G_{therm} calculated with MM and HF-3c is 30 kJ/mol, indicating a significant difference in the results obtained with the two methods. The difference comes entirely from the vibrational part and it is dominated by the entropy contribution (showing that it is caused mainly by the low-frequency vibrations), but both the enthalpy and zero-point energy parts are also significantly different (4–6 kJ/mol), although the enthalpy part counteracts the other two contributions. Test calculations indicated that the scale factor of the HF-3c frequencies (0.86) had only a minor influence on the results (a scale factor of 1.0 changed the results by only 3 kJ/mol). Sure and Grimme recommended the HF-3c method to obtain vibrational frequencies for host–guest binding affinities [13], but in this case this method seems to give significantly worse results than MM.
Therefore, we recalculated the ∆G _{therm} term for all the OAH and OAM complexes with MM (no scaling of the frequencies). The results are shown in Table S2 in the Supplementary material. It can be seen that ∆G _{therm} in general is larger when calculated with HF-3c, by 11 kJ/mol on average, but the difference varies from −2 to 32 kJ/mol. In particular, even when calculated with MM, ∆G _{therm} is larger for the present ligands, 58–98 kJ/mol, than for the SAMPL4 ligands. This indicates that the difference in the thermostatistical corrections between the SAMPL4 and SAMPL5 sets comes primarily from differing properties of the ligands, rather than from the change in the method used to calculate the vibrational frequencies.
The difference in the rigid guest energies between the methylated and non-methylated hosts indicates how the methylation affects the guest binding. The ∆E _{QM} energy differences are rather small for most of the ligands (up to 6 kJ/mol), but 12–16 kJ/mol for G3 in both hosts and G1 in the charged hosts. For both ligands, the methylated hosts give the smaller distortion of the guest. This indicates that we may have studied suboptimal structures of the flexible G1 and G3 guests in the less crowded unmethylated hosts, i.e. a sampling problem.
We also calculated binding free energies using PBE/def2-QZVP’ calculations and dispersion parameters for PBE (because this approach gave better results than TPSS for other host–guest systems [31]). The PBE/def2-QZVP’ binding energies were 30 kJ/mol more favourable than the TPSS energies on average, but this was compensated by the dispersion energies, so that the net binding free energies differed by −4 to +11 kJ/mol (3 kJ/mol on average, i.e. PBE gave a slightly weaker binding). This did not change the results significantly and therefore only TPSS results will be discussed in the following.
Coupled-cluster calculations
Finally, we recalculated the QM energies with the more accurate DLPNO–CCSD(T) method. This approach can provide CCSD(T) energies with extrapolations to a complete basis set using def2-TZVPP and def2-QZVPP calculations even for the present complexes of up to 184 atoms. The calculations were based on the TPSS-D3/def2-SV(P) optimised structures (only neutralised hosts) and the DLPNO–CCSD(T) rigid interaction energies were combined with the DFT solvation energies and the HF-3c thermostatistical corrections (∆G _{solv} and ∆G _{therm} in Table 2) to give net binding free energies. The results are given in the last column (ΔG _{tot,CC}) in Table 2.
The raw DLPNO–CCSD(T) rigid interaction energies differ from the TPSS-D3/def2-QZVP’ ∆E _{QM} + ∆E _{disp} energies by 1–36 kJ/mol. In general, the CCSD(T) energies are somewhat more positive (by 13 kJ/mol on average); the TPSS-D3 energies are more positive only for G4 in NOH and G6 in NOM (by 1 and 7 kJ/mol, respectively). The largest differences are found for G1 and G3 in NOH and for G4 in NOM (21, 36, and 26 kJ/mol), whereas for the other ligands, the difference is up to 16 kJ/mol. The DLPNO–CCSD(T) energies are reasonably converged with respect to the basis set: a basis-set extrapolation based on the smaller def2-SVP and def2-TZVPP basis sets gave results that differed by less than 8 kJ/mol (4 kJ/mol on average).
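The extrapolation formula itself is not given in this section. As a hedged illustration of what a two-point basis-set extrapolation does, the sketch below uses the generic inverse-power form E(X) = E_CBS + A·X^(−β) for the correlation energy, with cardinal numbers X = 2, 3, 4 for double-, triple- and quadruple-zeta basis sets. The function name `cbs_two_point`, the exponent β = 3 and the numbers in the example are assumptions for illustration, not the scheme or data of the present calculations.

```python
def cbs_two_point(e_small, e_large, x_small, x_large, beta=3.0):
    """Two-point extrapolation assuming E(X) = E_CBS + A * X**(-beta).

    e_small, e_large: correlation energies with the smaller and larger basis set
    x_small, x_large: their cardinal numbers (e.g. 3 for TZ, 4 for QZ)
    beta: assumed convergence exponent (3.0 is a common generic choice)
    """
    ws = x_small ** beta
    wl = x_large ** beta
    return (wl * e_large - ws * e_small) / (wl - ws)


# Hypothetical correlation energies (arbitrary units) that follow the model exactly
e_tz = -1.0 + 0.5 / 3 ** 3
e_qz = -1.0 + 0.5 / 4 ** 3
print(cbs_two_point(e_tz, e_qz, 3, 4))
```

Comparing the result of a 2/3 extrapolation with that of a 3/4 extrapolation, as done in the text for the interaction energies, gives a practical estimate of the remaining basis-set error.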
Unfortunately, the DLPNO–CCSD(T) calculations did not improve the results compared to experiments (Table 3): there is still no correlation between the experimental and calculated results, and the MADtr increased slightly compared to the TPSS-D3 energies (16–20 kJ/mol). Based on benchmark calculations and previous studies of intermolecular interactions with DLPNO–CCSD(T), it is reasonable to expect that the results are within 4 kJ/mol of those of canonical CCSD(T) [85, 86, 87]. Since the latter method is known to be accurate for such interaction energies between organic closed-shell molecules, we believe that the electronic energies from the coupled-cluster calculations are close to chemical accuracy (4 kJ/mol relative to the exact solution of the electronic Schrödinger equation at fixed geometry). While this is certainly a methodological achievement, these results highlight the importance of the other terms that enter the free energy and demonstrate that the accuracy of the calculated binding free energies is not limited by the energy calculations.
MD sampled structures
One of the largest problems with the present approach is the use of single minimised structures. For large flexible molecules, it is hard to find the global minimum and it is possible that several conformations have a low energy, all contributing to the binding free energy. We have already taken several precautions to reduce this problem, using rigid interaction energies, keeping all complexes as symmetric and similar as possible, and removing the flexible propionate groups.
As an alternative and more general approach, we also tested using a set of structures sampled from MD simulations. For each host–guest system, we ran a 10 ns MD simulation of the explicitly solvated complex, started from the TPSS/def2-SV(P) structures. From these, we took ten regularly spaced snapshots, which were minimised, and energies were then calculated using the same four energy terms in Eq. 2 as for the original minimised structures. To save time, the calculations were performed on the neutralised NOH and NOM hosts and the minimisations were performed at the HF-3c level. Moreover, test calculations showed that the ∆G _{therm} term did not change significantly for the various structures, so we used the same value (in Table 2) for all snapshots. Only rigid interaction energies were considered. Of course, this is a rather primitive approach to include some effects of the conformational flexibility of the complexes and more accurate approaches exist [88, 89]. However, it will give a first indication of the importance of structure sampling within the present optimisation approach.
Owing to the change in the energy function (MM in the simulations, but the final energies are calculated at the TPSS/def2-QZVP’ level), pure averages should not be used when evaluating the binding affinities. Instead, the snapshots should be Boltzmann-weighted, giving higher weights to the structures with the lowest energies of the complexes. These Boltzmann-weighted averages are shown as crosses in Fig. 7 and it can be seen that in general, the most favourable complexes also give the most favourable binding energies, reducing the influence of high-energy outliers. On the other hand, it strongly increases the importance of the structures with the most favourable binding free energy and in four cases the final binding affinities are determined from one single structure (G1, G3, and G5 for NOH and G6 in NOM). For G1 in NOH, this structure is a low-energy outlier, viz. the only structure in which the carboxylate group of G1 is not buried inside the host, giving it a ~50 kJ/mol more favourable ∆G _{solv} than the other structures.
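The weighting described above can be sketched as follows. This is a minimal illustration, assuming that each snapshot carries a total energy of the complex (used for the weight) and a binding free energy (the quantity averaged), both in kJ/mol, and that the temperature is 298.15 K. The function name and the numbers in the example are hypothetical.

```python
import math

RT_KJ_PER_MOL = 8.314462618e-3 * 298.15  # assumed temperature of 298.15 K


def boltzmann_weighted_binding(complex_energies, binding_energies,
                               rt=RT_KJ_PER_MOL):
    """Average binding energies with weights exp(-E_complex / RT)."""
    e_min = min(complex_energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in complex_energies]
    total = sum(weights)
    return sum(w * dg for w, dg in zip(weights, binding_energies)) / total


# Hypothetical snapshots: the second complex is 50 kJ/mol less stable
e_complex = [0.0, 50.0]
dg_bind = [-20.0, 10.0]
print(boltzmann_weighted_binding(e_complex, dg_bind))  # close to -20
```

Because RT is only about 2.5 kJ/mol at room temperature, a snapshot whose complex is a few tens of kJ/mol more stable than the rest receives essentially all the weight, which is why a single structure can determine the final affinity.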
Second, G1 in NOM has a much too positive binding affinity. In fact, there is a similar problem for G1 in NOH, but a single low-energy outlier provided a reasonable binding free energy after Boltzmann averaging (Fig. 7). As discussed above, this is related to the vacuum optimisations, which give structures with the carboxylate groups buried too deeply in the host (Figs. 4, 5). Third, G6 in NOM also has a somewhat too favourable binding, but this is mainly caused by the Boltzmann averaging; the pure average is instead above the expected correlation line.
Inspired by the problem with G1, we tried to improve the structures by performing the optimisation in a COSMO continuum solvent with a dielectric constant of 80, either without or with four water molecules interacting with the carboxylate or trimethylammonium group of the ligands (as was also done in our SAMPL4 study [27]). The water molecules were deleted before the binding energies were calculated and the ∆G _{therm} term was not recalculated. To enhance the chance to obtain reasonable structures, the optimisations were started from one random snapshot from the MD simulations. The optimisation was performed at the HF-3c level of theory.
Comparison of structures
Geometric measures of complexes obtained from the MD snapshots with different methods: average value over the ten MD snapshots (MM), average value over the ten MD snapshots optimised in vacuum with HF-3c (Vac), or one MD snapshot optimised with HF-3c in a COSMO continuum solvent without (Cos) or with four explicit water molecules (Wat). Distances are in Å and the α_{t} tilt angle is in degrees; a dash marks an entry that is not given.
Host  Guest  Method  r _{Dm}  α_{t}  r _{N}/r _{O1}  r _{O2}  Δr _{BB}  r _{Cav}  r _{min1}  r _{min2}
NOH  G1  MM  3.0  42.1  1.3  1.5  1.1  17.8  3.3  3.9
NOH  G1  Vac  2.2  90.9  −1.3  −0.1  3.0  17.7  1.8  2.9
NOH  G1  Cos  2.2  80.5  −0.2  1.8  2.6  17.7  1.8  2.8
NOH  G1  Wat  2.4  57.8  1.2  1.2  0.3  17.7  2.3  4.4
NOH  G2  MM  4.4  32.2  2.7  2.4  1.9  17.7  2.5  3.1
NOH  G2  Vac  3.7  36.7  −1.3  −0.1  3.8  17.4  1.9  1.9
NOH  G2  Cos  3.7  34.3  −0.2  1.8  3.7  17.4  2.0  2.0
NOH  G2  Wat  3.7  32.9  1.2  1.2  2.8  17.4  2.1  2.1
NOH  G3  MM  3.4  24.1  3.1  –  1.2  17.6  –  –
NOH  G3  Vac  3.0  30.0  2.7  –  2.8  17.5  –  –
NOH  G3  Cos  3.1  19.0  2.9  –  0.9  17.4  –  –
NOH  G3  Wat  2.9  20.3  2.7  –  2.1  17.6  –  –
NOH  G4  MM  5.8  33.6  3.0  2.9  1.0  17.9  3.5  4.4
NOH  G4  Vac  4.6  64.2  1.5  1.3  0.9  17.6  1.7  2.7
NOH  G4  Cos  4.0  62.7  0.0  0.4  0.5  17.6  1.8  3.0
NOH  G4  Wat  4.1  59.8  0.1  0.9  0.1  17.5  2.0  3.3
NOH  G5  MM  3.9  31.8  2.6  –  1.3  17.9  –  –
NOH  G5  Vac  3.5  30.8  2.3  –  3.1  17.5  –  –
NOH  G5  Cos  3.5  28.4  2.2  –  0.7  17.4  –  –
NOH  G5  Wat  3.5  29.7  2.2  –  0.7  17.4  –  –
NOH  G6  MM  5.1  64.9  2.6  2.7  4.5  17.4  3.0  3.9
NOH  G6  Vac  4.4  68.4  0.7  1.4  3.6  17.4  1.8  3.0
NOH  G6  Cos  4.7  57.5  1.6  1.9  4.3  17.3  2.0  3.6
NOH  G6  Wat  4.3  50.6  1.8  1.8  5.7  17.2  2.4  4.1
NOM  G1  MM  3.1  40.9  2.2  2.4  1.5  17.8  3.0  3.1
NOM  G1  Vac  1.9  64.7  0.2  1.0  2.7  17.5  2.0  2.1
NOM  G1  Cos  2.1  54.8  0.9  2.4  1.2  17.5  2.1  2.1
NOM  G1  Wat  2.4  41.8  2.1  0.9  1.4  17.5  2.5  2.8
NOM  G2  MM  4.3  12.6  3.4  3.2  1.8  17.7  3.3  3.6
NOM  G2  Vac  2.5  14.1  2.0  1.9  0.3  17.6  2.6  2.9
NOM  G2  Cos  2.6  11.7  1.8  2.3  –  17.5  2.8  2.9
NOM  G2  Wat  2.6  14.7  1.8  2.2  –  17.5  2.6  3.1
NOM  G3  MM  3.5  36.2  3.5  –  1.7  17.7  –  –
NOM  G3  Vac  2.8  33.5  2.9  –  4.3  17.4  –  –
NOM  G3  Cos  3.0  28.2  2.6  –  2.8  17.5  –  –
NOM  G3  Wat  2.9  30.2  2.7  –  2.3  17.5  –  –
NOM  G4  MM  5.4  19.7  2.9  2.8  0.7  17.2  2.9  3.4
NOM  G4  Vac  5.2  27.9  2.6  2.6  1.3  16.9  2.4  3.3
NOM  G4  Cos  5.1  25.7  2.6  2.8  1.7  16.9  2.2  3.5
NOM  G4  Wat  4.9  26.8  2.3  2.9  1.8  16.8  2.3  3.6
NOM  G5  MM  4.5  21.7  3.5  –  1.6  17.6  –  –
NOM  G5  Vac  3.7  21.9  2.6  –  4.8  17.3  –  –
NOM  G5  Cos  3.6  20.0  2.8  –  3.7  17.4  –  –
NOM  G5  Wat  3.7  13.7  2.8  –  1.2  17.7  –  –
NOM  G6  MM  5.0  22.4  3.2  2.5  2.0  17.5  3.1  3.4
NOM  G6  Vac  3.8  40.0  2.2  0.5  1.0  17.3  2.2  2.5
NOM  G6  Cos  4.5  16.4  3.2  1.9  1.2  17.5  2.8  3.7
NOM  G6  Wat  3.5  28.8  2.4  1.0  0.2  17.3  3.1  3.2
Owing to the charges of both the ligands and the hosts, it is likely that the MD simulations in explicit solvent give the most realistic structures. The largest difference between the MD structures and the vacuum-optimised structures is that the charged groups of the ligands reach further out into the solvent in the MD structures. This is most clearly seen from r _{O1/N}, which is positive and rather large for all MD structures, 1.3–3.5 Å, whereas it is always smaller and often negative for the optimised structures. The effect is larger for the negatively charged ligands than for G3 and G5. It can partly be cured by optimising in COSMO with or without explicit water molecules, but the results vary and the same approach does not always give the best structures. There is often a large difference between the TPSS results obtained with the full or neutralised hosts, reflecting the problem of using only a single structure.
The MD structures also give larger r _{min} distances (2.5–3.5 Å) than the minimised structures, reflecting that the carboxylate atoms form hydrogen bonds with water rather than with the host CH atoms. Again, this can be improved by the use of explicit water molecules in the optimisation, but the improvement is only partial and for some complexes, the difference is still 1.5 Å. The ligand is typically also more deeply buried in the host in the optimised structures than in the MD structures, as is illustrated by the r _{Dm} distance (3.0–5.8 Å for the latter structures), but the variation is quite large between the various complexes.
For some complexes, there is also a large difference in the orientation of the guest in the host between the MD and optimised structures, indicated by the α_{t} tilt angle. The difference is particularly large for G1 in both hosts (41–42° in MD compared to 65–91° in the minimised structures) and G4 in OAH or NOH (34° compared to 63–65°). For the former, the results are improved with COSMO and explicit water molecules, especially for NOM, but not for G4. On the other hand, there is no consistent difference in the distortion of the host or the orientation of the benzyl groups between the MD and optimised structures.
For many of the OAH and OAM structures optimised with TPSS in vacuum, the benzoate groups are tilted upwards or outwards (cf. Fig. 4), whereas in the MD structures, these groups are tilted downwards. This is an effect of the missing solvation in the vacuum-optimised structures: if the structures are instead optimised in the COSMO continuum solvent, the benzoate groups tilt downwards, as was observed in our SAMPL4 study [27]. Likewise, if the benzoate groups are deleted, as in the NOH and NOM structures, the remaining benzene rings tilt downwards. However, the tilt of these groups seems to have little influence on the guest binding energies, considering that structures optimised in vacuum or with COSMO solvation gave similar binding energies in SAMPL4 [27].
There are few general differences between the structures obtained for hosts with or without the methyl groups. For all ligands, except G3, the α_{t} tilt angle is smaller for the methylated hosts. This is most pronounced for G6, for which the difference is 42° in the MD structures, whereas it is 20° for G2 and around 10° for G4 and G5. Structures obtained with the other approaches typically show qualitatively similar differences, but with larger variations, especially for G1 and G6. This indicates that the methyl groups force the ligand to bind more upright.
The methylated hosts also in general give a larger r _{O1/N} distance (by 0.4–0.9 Å for the MD structures), with few exceptions (the most important is the MD structures of G4, for which NOM gives a 0.1 Å smaller distance on average). This indicates that the methyl groups make the ligands protrude somewhat more from the host. However, the ligands still reach to a similar depth into the hosts, with a difference in r _{Dm} ranging from −0.4 Å (G4) to 0.6 Å (G5) for the MD structures.
There are quite extensive variations within the snapshots taken from the MD simulations. In particular, α_{t} varies by 16–52° in the various simulations (more with NOM than with NOH) and the ∆r _{BB} distortion by 1–5 Å. The r _{Dm} and r _{O1/N} distances show a smaller variation of 0.6–2.9 and 0.9–3.6 Å, respectively. In the HF-3c structures optimised from the MD snapshots, the variation can be either reduced or enhanced. For example, all HF-3c structures of G2 in NOM are essentially identical, whereas they show an extensive variation in the MD snapshots (e.g. a variation in α_{t} of 3–31° and 14–15° before and after the optimisation, respectively). On the other hand, the variation of ∆r _{BB} is only 2 Å for the MD structures of G5 in NOH, but 7 Å for the HF-3c structures started from these snapshots.
G6 shows two low-energy conformations in the optimised structures in both hosts, characterised by α_{t} = 45–57° or 97–102° in NOH and 17–18° or 45–49° in NOM. They represent two orientations of the nitro group inside the host, as can be seen in Fig. 8. There is also an extensive variation of the distortion of the host and in how deep the ligand binds in the host, but both variables are independent of the change in the ligand conformation. All this variation in the structure gives rise to the extensive variation in the binding energies seen in Fig. 7.
Submitted results
Three sets of data (relative binding free energies) were submitted for each host: DFT energies based on the TPSS-D3/def2-SV(P) structures with either the OAH/OAM or the NOH/NOM hosts, as well as DLPNO–CCSD(T) energies, based on the latter structures (called DFT-charged, DFT-neutral, and CCSD(T)-neutral, respectively, in the overview article [29]). The other binding affinities discussed in this article were not finished at the time of the submission. Moreover, G2 had dissociated from OAH and G4 bound outside the cavity with the carboxylate group directed into the cavity for both the OAM and NOM hosts. All energies included the ∆E _{Grlx} term (i.e. they were ΔG _{tot,Grlx}), except those for OAH, which were fully relaxed free energies (∆G _{tot,rlx}; the rigid host energies were not finished before submission). Furthermore, the DLPNO–CCSD(T) energies were based only on the def2-SVP/TZVPP basis-set extrapolation and the 7.9 kJ/mol correction for the change in reference state was omitted. Finally, some of the solvation energies were incorrect. The submitted data are shown in Table S5 in the Supplementary material.
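The size of the reference-state correction mentioned above can be checked with a short calculation. The sketch assumes that the correction corresponds to moving from an ideal gas at 1 atm to a 1 M solution at 298.15 K, i.e. RT ln(V_m / 1 L), where V_m is the ideal-gas molar volume; these conditions and the function name are assumptions, since this section states only the value. The sign with which the term enters the binding free energy depends on the convention of Eq. 2 and is not addressed here.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def standard_state_correction(temperature=298.15, pressure_pa=101325.0,
                              conc_mol_per_l=1.0):
    """Magnitude (kJ/mol) of RT*ln(V_m * c) for a gas -> solution reference change."""
    # Ideal-gas molar volume in litres at the assumed temperature and pressure
    molar_volume_l = R * temperature / pressure_pa * 1000.0
    return R * temperature * math.log(molar_volume_l * conc_mol_per_l) / 1000.0


print(round(standard_state_correction(), 2))
```

Under these assumptions the result rounds to 7.9 kJ/mol, in agreement with the value quoted in the text.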
The submitted data gave much better correlation to the experimental results than the data presented in Table 3 (R ^{2} = 0.1–0.5), but worse MADtr (20–43 kJ/mol). The results in this article provide the correct data for the current methods. It is clear that these methods are not competitive compared to the best methods for this test case, which gave R ^{2} = 0.7–0.8 and MADtr = 4–6 kJ/mol (but very few methods gave both good R ^{2} and MADtr and also good results for both hosts).
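The two quality measures used throughout can be sketched as follows, assuming that MADtr is the mean absolute deviation after the mean signed error (the systematic error) has been subtracted and that R ^{2} is the square of Pearson's correlation coefficient. The function names and the numbers in the example are hypothetical and are not data from this study.

```python
from statistics import mean


def mad_tr(calc, expt):
    """Mean absolute deviation after removing the mean signed error."""
    errors = [c - e for c, e in zip(calc, expt)]
    shift = mean(errors)  # systematic error
    return mean(abs(err - shift) for err in errors)


def r_squared(calc, expt):
    """Square of Pearson's correlation coefficient."""
    mc, me = mean(calc), mean(expt)
    cov = sum((c - mc) * (e - me) for c, e in zip(calc, expt))
    var_c = sum((c - mc) ** 2 for c in calc)
    var_e = sum((e - me) ** 2 for e in expt)
    return cov * cov / (var_c * var_e)


# Hypothetical affinities (kJ/mol): calculated values shifted by exactly +50
expt = [-20.0, -25.0, -30.0]
calc = [e + 50.0 for e in expt]
print(mad_tr(calc, expt), r_squared(calc, expt))
```

A constant offset, such as affinities that are uniformly too positive, therefore affects neither measure; both probe only how well the relative affinities are reproduced.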
Conclusions
As a part of the SAMPL5 host–guest competition, we have tried to estimate the free energy for the binding of six small, but diverse ligands to two variants of the octaacid cavitand. Our aim was to test and improve a method, originally suggested by Grimme [30, 31], employing DFT calculations with large basis sets, empirical dispersion corrections (DFT-D3) [32, 40], continuum estimates of the solvation free energy [48, 49], as well as enthalpy and entropy corrections from vibrational frequencies [30, 54], all estimated from single minimised DFT structures. This approach was used for the same host in the SAMPL4 competition by both Grimme and us, giving results of intermediate quality [27, 33].
Based on those calculations, we tried to improve the calculations in four ways. First, we reduced the effect of the flexibility of the host by a strict control of the host molecules, keeping them as symmetric and similar as possible during the geometry optimisations. In particular, we controlled the breathing motion of the host and the conformation of the propionate groups. Moreover, we employed geometry optimisation in vacuum, which enhances the repulsion between the propionate and benzoate groups, thereby increasing the symmetry of the complexes [27]. We also calculated rigid interaction energies, using the geometry of the host and guest from the complex also for the isolated moieties, because this gave somewhat more stable energies (but it was less important than in our previous study [27]). Thereby, we could obtain quite symmetric complexes for all the negatively charged ligands, but for the two ligands with the bulky trimethylammonium group, the complexes were still quite distorted.
Second, we performed calculations also on host molecules for which we had removed all the propionate and benzoate groups, thereby both reducing the flexibility and deleting the large negative charge, which gives rise to very large QM and solvation energy terms that need to cancel very accurately to give reliable final results. Test calculations on the SAMPL4 ligands with FES methods showed that these charged groups had only minimal effects (<2 kJ/mol) on the relative binding free energies [72]. Unfortunately, the structures of the neutralised host were more distorted than those of the charged host, probably owing to the repulsion of the charged groups in the vacuum optimisation.
Third, we tested a restricted conformational sampling by employing ten snapshots from an MD simulation. The results (Fig. 7) showed a rather limited variation in the binding free energies calculated from the various snapshots for most of the ligands, except G6. The variation was typically smaller in the methylated host, owing to a more restricted binding site.
Fourth, we tried to improve the QM energies with the DLPNO–CCSD(T) approach [34]. In SAMPL4, we employed local LCCSD(T0) calculations [90], but we needed to use fractionation methods [91] for the large complexes [27, 92]. With the DLPNO–CCSD(T) approach, such approximations could be avoided, and no strong deterioration of the results was observed, as in the previous studies. However, the results were not improved, indicating that the performance is not limited by the accuracy of the QM method.

The use of vacuum-optimised structures is a major problem, giving structures that differ significantly from those obtained in MD simulations in explicit solvent. The problem can be partly reduced by performing the optimisations in a continuum solvent with a few explicit water molecules around the charged group of the ligands (MADtr = 11–12 kJ/mol). However, the structures are still not fully satisfactory for all ligands.

Conformational sampling is still a problem (especially in combination with the optimised structures) for some of the ligands, especially G6. It can be solved by using more snapshots from MD, but the optimisation method is still a problem.

There are indications that the COSMO-RS method has problems providing solvation energies that are comparable for both the negatively and positively charged ligands. In particular, the positively charged G3 and G5 ligands give large errors.

Thermostatistical corrections from HF-3c structures are more positive than those obtained by MM methods (as in our SAMPL4 calculations). These corrections seem to be the prime cause of the systematic error of the present calculations.
In conclusion, it currently seems hard to obtain accurate ligand-binding affinities with QM methods and minimised structures. In particular, the QM methods are not competitive with FES methods, based on MM sampling. The problem is not the DFT-D or DLPNO–CCSD(T) energy functions, but rather the sampling, geometry optimisation, as well as the solvation and thermostatistical corrections. The octaacid system with its large negative charge seems to pose a large problem for the QM approach and this is further enhanced by ligands of a varying net charge.
An alternative approach to obtain binding free energies with QM methods is to use free-energy simulations with reference-potential methods (i.e. performing the MD simulations at the MM level and then performing perturbations or reweighting from MM to QM) [14, 72, 93, 94]. Unfortunately, the overlap between the MM and QM potential surfaces is so poor that very many QM calculations are needed to obtain converged results, e.g. 720,000 QM calculations for each of the SAMPL4 octaacid ligands to obtain a precision of 1 kJ/mol [72]. This is ~4000 times more than an approach with single minimised structures, showing that such an approach may remain competitive even with quite extensive sampling, provided that the problems with the optimisation, solvation, and thermal corrections can be solved.
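The reweighting from MM to QM mentioned above is commonly done with one-sided exponential averaging, ΔG(MM→QM) = −RT ln⟨exp[−(E_QM − E_MM)/RT]⟩_MM, where the average runs over snapshots from the MM simulation. The sketch below is a minimal illustration of that formula. The function name, the temperature of 298.15 K and the energies in the example are assumptions, and the procedures used in the cited studies may differ in detail.

```python
import math

RT_KJ_PER_MOL = 8.314462618e-3 * 298.15  # assumed temperature of 298.15 K


def mm_to_qm_correction(e_mm, e_qm, rt=RT_KJ_PER_MOL):
    """Exponential-averaging free-energy correction from MM to QM (kJ/mol).

    e_mm, e_qm: energies of the same MM-sampled snapshots at the two levels.
    """
    diffs = [q - m for q, m in zip(e_qm, e_mm)]
    d_min = min(diffs)  # factor out the smallest difference for stability
    avg = sum(math.exp(-(d - d_min) / rt) for d in diffs) / len(diffs)
    return d_min - rt * math.log(avg)


# Hypothetical snapshot energies (kJ/mol)
print(mm_to_qm_correction([0.0, 1.0, 2.0], [5.0, 6.0, 7.0]))  # constant shift of 5
print(mm_to_qm_correction([0.0, 0.0], [0.0, 10.0]))           # dominated by one snapshot
```

When the energy differences spread over many RT, the average is dominated by a few snapshots, which is the poor-overlap problem that drives up the number of QM calculations needed for convergence.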
Notes
Acknowledgments
This investigation has been supported by grants from the Swedish Research Council (Project 20145540), and from the Knut and Alice Wallenberg Foundation (KAW 2013.0022). The computations were performed on computer resources provided by the Swedish National Infrastructure for Computing (SNIC) at Lunarc at Lund University and HPC2N at Umeå University. F. N. and C. R. gratefully acknowledge financial support of this work by the Max Planck Society.
Supplementary material
References
1. Gohlke H, Klebe G (2002) Angew Chem Int Ed 41:2644
2. Jorgensen WL (2009) Acc Chem Res 42:724
3. Zhou HX, Gilson MK (2009) Chem Rev 109:4092
4. Michel J, Essex JW (2010) J Comput Aided Mol Des 24:639
5. Christ CD, Mark AE, van Gunsteren WF (2010) J Comput Chem 31:1569
6. Wereszczynski J, McCammon JA (2012) Quart Rev Biophys 45:1
7. Söderhjelm P, Ryde U (2009) J Phys Chem A 113:617
8. Cavalli A, Carloni P, Recanatini M (2006) Chem Rev 106:3497
9. Raha K, Peters MB, Wang B, Yu N, Wollacott AM, Westerhoff LM, Merz KM (2007) Drug Discov Today 12:725
10. Söderhjelm P, Kongsted J, Genheden S, Ryde U (2010) Interdiscip Sci Comput Life Sci 2:21–37
11. Söderhjelm P, Genheden S, Ryde U (2012) In: Gohlke H (ed) Methods and principles in medicinal chemistry, vol 53. Wiley-VCH, Weinheim, pp 121–143
12. Antony J, Grimme S (2012) J Comput Chem 33:1730
13. Sure R, Grimme S (2015) J Chem Theory Comput 11:3785–3801
14. Ryde U, Söderhjelm P (2016) Ligand-binding affinity estimates supported by quantum-mechanical methods. Chem Rev 116:5520–5566
15. Houk KN, Leach AG, Kim SP, Zhang XY (2003) Angew Chem Int Ed 42:4872–4897
16. Moghaddam S, Inoue Y, Gilson MK (2009) J Am Chem Soc 131:4012–4021
17. Monroe JI, Shirts MR (2014) J Comput Aided Mol Des 28:401–415
18. Hsiao YW, Söderhjelm P (2014) J Comput Aided Mol Des 28:443–454
19. Muddana HS, Yin J, Sapra NV, Fenley AT, Gilson MK (2014) J Comput Aided Mol Des 28:463–474
20. Jensen JH (2015) Phys Chem Chem Phys 17:12441–12451
21. Muddana HS, Varnado CD, Bielawski CW, Urbach AR, Isaacs L, Geballe MT, Gilson MK (2012) J Comput Aided Mol Des 26:475
22. Muddana HS, Fenley AT, Mobley DL, Gilson MK (2014) J Comput Aided Mol Des 28:305–317
23. Naïm M, Bhat S, Rankin KN, Dennis S, Chowdhury SF, Siddiqi I, Drabik P, Sulea T, Bayly CI, Jakalian A (2007) J Chem Inf Model 47(1):122–133
24. Hogues H, Sulea T, Purisima E (2014) J Comput Aided Mol Des 28:417–427
25. Sun H, Gibb CLD, Gibb BC (2008) Supramol Chem 20:141
26. Gibb CLD, Gibb BC (2009) Tetrahedron 65:7240
27. Mikulskis P, Cioloboc D, Andrejic M, Khare S, Brorsson J, Genheden S, Mata RA, Söderhjelm P, Ryde U (2014) J Comput Aided Mol Des 28:375–400
28. Gibb CL, Gibb BC (2004) J Am Chem Soc 126:11408–11409
29. Yin J, Henriksen NM, Slochower DR, Chiu MW, Mobley DL, Gilson MK (2016) J Comput Aided Mol Des (in press)
30. Grimme S (2012) Chem Eur J 18:9955
31. Antony J, Sure R, Grimme S (2015) Chem Commun 51:1764–1774
32. Grimme S (2011) WIREs Comput Mol Sci 1:211–228
33. Sure R, Antony J, Grimme S (2014) J Phys Chem B 118:3431–3440
34. Riplinger C, Neese F (2013) J Chem Phys 138:034106
35. Gan H, Benjamin CJ, Gibb BC (2011) J Am Chem Soc 133:4770–4773
36. Ahlrichs R, Bär M, Häser M, Horn H, Kölmel C (1989) Chem Phys Lett 162:165
37. Treutler O, Ahlrichs R (1995) J Chem Phys 102:346
38. Tao J, Perdew JP, Staroverov VN, Scuseria GE (2003) Phys Rev Lett 91:146401
39. Weigend F, Ahlrichs R (2005) Phys Chem Chem Phys 7:3297–3305
40. Grimme S, Antony J, Ehrlich S, Krieg H (2010) J Chem Phys 132:154104
41. Perdew JP, Burke K, Ernzerhof M (1996) Phys Rev Lett 77:3865–3868
42.
43. Eichkorn K, Treutler O, Öhm H, Häser M, Ahlrichs R (1995) Chem Phys Lett 240:283–290
44. Eichkorn K, Weigend F, Treutler O, Ahlrichs R (1997) Theor Chem Acc 97:119–126
45. Sierka M, Hogekamp A, Ahlrichs R (2003) J Chem Phys 118:9136
46. Klamt A, Schüürmann G (1994) J Chem Soc Perkin Trans 2:799–805
47. Schäfer A, Klamt A, Sattel D, Lohrenz JCW, Eckert F (2000) Phys Chem Chem Phys 2:2187–2193
48. Klamt A (1995) J Phys Chem 99:2224–2235
49. Eckert F, Klamt A (2002) AIChE J 48:369–385
50. Eckert F, Klamt A (2010) COSMOtherm, Version C30, Release 1301. COSMOlogic GmbH & Co KG, Leverkusen (Germany)
51. Becke AD (1988) Phys Rev A 38:3098–3100
52. Perdew JP (1986) Phys Rev B 33:8822–8824
53. Schäfer A, Horn H, Ahlrichs R (1992) J Chem Phys 97:2571–2577
54. Jensen F (1999) Introduction to computational chemistry. Wiley, Chichester, pp 298–303
55. Sure R, Grimme S (2013) J Comput Chem 34:1672–1685
56. Genheden S, Ryde U (2015) Expert Opinion Drug Discov 10:449–461
57. Riplinger C, Neese F (2013) J Chem Phys 138:034106
58. Riplinger C, Sandhoefer B, Hansen A, Neese F (2013) J Chem Phys 139:134101
59. Riplinger C, Pinski P, Becker U, Valeev EF, Neese F (2016) J Chem Phys 144:024109
60. Neese F (2012) WIREs Comput Mol Sci 2:73–78
61. Weigend F (2006) Phys Chem Chem Phys 8:1057–1065
62. van Wüllen C (1998) J Chem Phys 109:392–399
63. Pantazis DA, Chen XY, Landis CR, Neese F (2008) J Chem Theory Comput 4:908–919
64. Zheng J, Xu X, Truhlar DG (2010) Theor Chem Acc 128:295–305
65. Boys SF, Bernardi F (1970) Mol Phys 19:553–566
66. Neese F, Valeev EF (2011) J Chem Theory Comput 7:33–43
67. Liakos DG, Sparta M, Kesharwani MK, Martin JML, Neese F (2015) J Chem Theory Comput 11:1525–1539
68. Sparta M, Marius R, Peter P, Ute B, Christoph R, Neese F (2016) in preparation
69. Horn HW, Swope WC, Pitera JW, Madura JD, Dick TJ, Hura GL, Head-Gordon T (2004) J Chem Phys 120:9665–9678
70. Case DA, Berryman JT, Betz RM, Cerutti DS, Cheatham TE III, Darden TA, Duke RE, Giese TJ, Gohlke H, Goetz AW, Homeyer N, Izadi S, Janowski P, Kaus J, Kovalenko A, Lee TS, LeGrand S, Li P, Luchko T, Luo R, Madej B, Merz KM, Monard G, Needham P, Nguyen H, Nguyen HT, Omelyan I, Onufriev A, Roe DR, Roitberg A, Salomon-Ferrer R, Simmerling CL, Smith W, Swails J, Walker RC, Wang J, Wolf RM, Wu X, York DM, Kollman PA (2014) AMBER 14. University of California, San Francisco
71. Wang JM, Wolf RM, Caldwell KW, Kollman PA, Case DA (2004) J Comput Chem 25:1157–1174
72. Olsson MA, Söderhjelm P, Ryde U (2016) J Comput Chem 37:1589–1600
73. Dewar MJS, Zoebisch EG, Healy EF, Stewart JJP (1985) J Am Chem Soc 107:3902–3909
74. Hehre WJ, Ditchfield R, Pople JA (1972) J Chem Phys 56:2257–2261
75. Besler BH, Merz KM, Kollman PA (1990) J Comput Chem 11:431–439
76. Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA, Cheeseman JR, Scalmani G, Barone V, Mennucci B, Petersson GA, Nakatsuji H, Caricato M, Li X, Hratchian HP, Izmaylov AF, Bloino J, Zheng G, Sonnenberg JL, Hada M, Ehara M, Toyota K, Fukuda R, Hasegawa J, Ishida M, Nakajima T, Honda Y, Kitao O, Nakai H, Vreven T, Montgomery JA Jr, Peralta JE, Ogliaro F, Bearpark M, Heyd JJ, Brothers E, Kudin KN, Staroverov VN, Kobayashi R, Normand J, Raghavachari K, Rendell A, Burant JC, Iyengar SS, Tomasi J, Cossi M, Rega N, Millam JM, Klene M, Knox JE, Cross JB, Bakken V, Adamo C, Jaramillo J, Gomperts R, Stratmann RE, Yazyev O, Austin AJ, Cammi R, Pomelli C, Ochterski JW, Martin RL, Morokuma K, Zakrzewski VG, Voth GA, Salvador P, Dannenberg JJ, Dapprich S, Daniels AD, Farkas O, Foresman JB, Ortiz JV, Cioslowski J, Fox DJ (2009) Gaussian 09, revision A02. Gaussian Inc, Wallingford CT
 77.Bayly CI, Cieplak P, Cornell WD, Kollman PA (1993) J Phys Chem 97:10269–10280CrossRefGoogle Scholar
 78.Seminario JM (1996) Int J Quant Chem 60:1271CrossRefGoogle Scholar
 79.Nilsson K, Lecerof D, Sigfridsson E, Ryde U (2003) Acta Crystallogr D 59:274–289CrossRefGoogle Scholar
80. Wang JM, Wolf RM, Caldwell JW, Kollman PA, Case DA (2004) J Comput Chem 25:1157–1174
81. Wu XW, Brooks BR (2003) Chem Phys Lett 381:512–518
82. Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A, Haak JR (1984) J Chem Phys 81:3684–3690
83. Darden T, York D, Pedersen L (1993) J Chem Phys 98:10089–10092
84. Rustenburg AS, Dancer J, Lin B, Ortwine DF, Mobley DL, Chodera JD (2016) J Comput Aided Mol Des (submitted)
85. Pinski P, Riplinger C, Valeev EF, Neese F (2015) J Chem Phys 143:034108
86. Liakos DG, Sparta M, Kesharwani MK, Martin JML, Neese F (2015) J Chem Theory Comput 11:1525–1539
87. Liakos DG, Neese F (2015) J Chem Theory Comput 11:2137–2143
88. Chang CE, Potter MJ, Gilson MK (2003) J Phys Chem B 107:1048–1055
89. Cecchini M, Krivov SV, Spichty M, Karplus M (2009) J Phys Chem B 113:9728–9740
90. Hampel C, Werner HJ (1996) J Chem Phys 104:6286–6297
91. Söderhjelm P, Ryde U (2009) J Phys Chem A 113:617–627
92. Andrejic M, Ryde U, Mata RA, Söderhjelm P (2014) Chem Phys Chem 15:3270–3281
93. Luzhkov V, Warshel A (1992) J Comput Chem 13:199–213
94. König G, Boresch S (2011) J Comput Chem 32:1082–1090
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.