The ability to accurately compute free energy is fundamental to most (if not all) of rational drug design [1, 2]. Quantities such as binding affinities, acidity constants, and partition coefficients are intrinsically tied to free energy differences and only highlight this demand. Numerous computational approaches have come to fruition to answer the call for reliable free energy predictions [3,4,5]. The breadth of rigor encompassed by these techniques within the computational chemistry community is vast, ranging from purely empirical to strictly ab-initio based.

Unfortunately, in practice, the sheer number of ways available to compute free energy differences makes selecting an appropriate method an almost unwieldy task, as test-sets across methodological benchmarks tend to be dissimilar, making “apples-to-apples” comparisons impossible. To address the need for unbiased evaluations of the various free energy methodologies peddled throughout the computational community, the Statistical Assessment of the Modelling of Proteins and Ligands (SAMPL) challenge seeks to systematically appraise the current state of computational approaches for computing free energy [6,7,8,9,10,11]. One of the most popular components of the SAMPL challenges is the prediction of host–guest binding affinities. The ability to accurately perform a virtual screening for a library of ligands against a potential target protein or properly ranking potential therapeutics against a known binding compound is vital to the drug design process. Thus, it is no surprise that the reliable prediction of binding free energies lies at the heart of rational drug design.

Accomplishing this goal requires a faithful and versatile energetic description that encapsulates various chemical moieties and nuanced covalent/non-covalent interactions. Mixed quantum mechanical/molecular mechanical (QM/MM) methods are well-suited for this task, but employing QM/MM Hamiltonians in free energy simulations is often prohibitive, as the cost of performing the relevant dynamics is quite egregious. Even if the computational expense can be ameliorated, the need for alchemical tricks (e.g., soft-core potentials [12, 13]) essentially necessitates the use of MM. Thus, the indirect approach to free energy is a popular strategy for achieving QM/MM free energies by performing the brunt of calculations at a classical level and simply “correcting” to a QM/MM level of theory (see Fig. 1) [14,15,16,17,18,19]. To keep the requisite free energy differences between levels of theory tenable, it is often desirable to use the Zwanzig equation (e.g., Free Energy Perturbation, FEP, see Eq. 1) [20], as the sampling is all performed classically and the QM/MM energetics can be trivially post-processed. It should be noted that Eq. 1 only works whenever there is sufficient configurational overlap between the MM and QM/MM levels of theory.

$$\begin{aligned} \varDelta {A}^{\mathbf{MM }\rightarrow {\mathbf{QM/MM }}}= -\frac{1}{\beta } \log { \left\langle \exp { \left[ -\beta {(U^{\mathbf{QM/MM }} - U^{\mathbf{MM }})} \right] }\right\rangle _{\mathbf{MM }}} \end{aligned}$$
Fig. 1
figure 1

Themodynamic cycle used for computing QM/MM free energy indirectly. By computing each of the legs (ii), (iii), and (iv), it is possible to use the cycle closure to obtain \((iii)+(iv)-(ii)=(i)\), the desired QM/MM free energy difference

This work presents a culmination of QM/MM-based attempts at computing binding free energies for the SAMPL8 drugs-of-abuse challenge. This particular challenge involved predicting the binding free energy of several narcotics, such as cocaine, morphine, and fentanyl to curcurbit-[8]-uril [21, 22]. A list of all seven challenge compounds, as well as the host molecule, can be seen in Fig. 2 (see [23]).

Fig. 2
figure 2

Molecules of interest for the SAMPL8 drugs of abuse CB[8] host–guest binding challenge. Image was provided by the SAMPL coordinators [23, 24]

Results herein are obtained in a two step process. First, classical parameters are obtained through QM intra-molecular force-matching [25,26,27,28,29,30,31,32,33], the rationale being that selecting parameters that best reproduce forces at the desired level of QM theory will produce better results through the indirect cycles (i.e., convergence of Eq. 1 will be easier to achieve). Binding free energies are then computed with the force-matched (FM) parameters, and corrections are computed at thermodynamic end states. This manuscript solely focuses on indirectly obtained QM/MM binding free energies. Further discussion on the classical results (e.g., with FM parameters) can be found in a companion work.



All parameters herein are based on the potential energy function utilized by the CHARMM forcefield [34]. Initial host/guest CHARMM parameters were obtained with the Paramchem server [35] and designated herein as the “C36” parameter set. The host parameters were reused from a previous (SAMPL6) competition, designated as “S6” CB[8]. Details of the S6 parameterization process, as well as their overall performance, can be found in [36, 37]. For molecule G5 (Ketamine), we opted to model both the R & S enantiomers of ketamine, which in principal should give identical outcomes when bound to an achiral host in achiral solvent. Since the experimental results were obtained at a pH of 7.4 [38], and the experimental pH of ketamine is about 7.5 [39], both protonated and neutral ketamine were considered, giving a total of 4 parameter sets for G5 (i.e., either R or S enantiomer, and either protonated or neutral). The guest compound p\(K_{\text {a}}\)values, found in a corresponding literature search, are summarized in SI (Table S1).

As the use of FM potentials proved beneficial in SAMPL6, we opted to incorporate a similar workflow into the present SAMPL8 challenge [37]. That is, all parameters governing bonded degrees of freedom (e.g., bonds, angles, dihedrals, etc.) were fit via QM intramolecular force-matching. Lennard-Jones (LJ) terms were carried over from CGenFF [40] (based on Paramchem assignment), and partial charges were fit via QM charge fitting. The goal of using FM based parameters is that by forsaking transferability, one can create parameters that have higher configurational overlap with the target Hamiltonian of interest (e.g., QM/MM) and thereby allow easier convergence of free energy differences between levels of theory [37].

Assigning non-bonded parameters

Using Q-Chem and following along the flowchart shown in Fig. 3, all the guest molecules were subjected to geometry optimizations with MP2/6-31G* [41, 42], whereas the host was optimized with B3LYP/6-31G* [43, 44] to reduce computational expense. We attempted to obtain charges that would be most similar to what would be found in the CGenFF force field, thus a benchmark of 19 small rigid molecules found in CGenFF was performed with various charge fitting schemes and ranked based on RMSE from the original CGenFF charges. The results of the benchmark (details can be found in SI, section S1) demonstrated CM5-symmetrized [45] charges with HF/6-311G** and PCM [46] implicit solvent provided the lowest RMSE from the CGenFF reference, and thus was used in this study. As mentioned prior, LJ parameters were kept consistent with CGenFF via the initial assignment based on the Paramchem server.

Fig. 3
figure 3

Flowchart used in the intramolecular force-matching process

Assigning bonded parameters

Employing the C36 parameter sets for the guest (and S6 for the host), 100 ns of gas-phase Langevin Dynamics (LD) without non-bonded interaction cutoffs was generated via CHARMM at 300 K with a collision frequency of 5 \(\hbox {ps}^{-1}\), a timestep of 1 fs, and a coordinate snapshot saving frequency of 10 ps [34]. From the 10,000 collected configurations per molecule, QMFootnote 1 force calculations were performed on each snapshot at the MP2/6-31G(d) [41] (B3LYP/6-31G(d) for the host), \(\omega\)B97X-D/def2-SVP, PM6-D3H4, and GFN-2 levels of theory using Psi4 [51], MOPAC [52], and XTB [48] respectively. Classical non-bonded forces (LJ and Coulomb) were removed from the QM-computed force to focus the fit solely on the intramolecular terms (i.e., bonds, angles, dihedrals, etc). Force matching proceeded via the ForceSolve program, which incorporates a Bayesian formalism that obtains parameters by minimizing a log-likelihood function of the observed QM forces [37, 53]. Urey-Bradley terms were omitted from fits, as numerical issues can arise in the force-matching process [54], and the functional form of the potential (e.g., dihedral multiplicities, improper dihedrals where present, etc.) were carried over from the initial Paramchem assignment. In a few cases where force constants for dihedrals were unfeasibly high (e.g., greater than 50 kcal/mol), the multiplicities were tweaked to obtain viable torsional parameters while maintaining reasonably similar residuals. Manual adjustments to dihedrals were based on the FM parameters obtained with PM6-D3H4 and kept consistent across other FM parameter set (i.e., dihedral multiplicities were kept identical during force-matching across different QM/MM Hamiltonian for each molecule). A summary of the various FM parameters can be found in Table 1.

Table 1 Breakdown of the different classical Hamiltonians relevant to the QM/MM submissions for the SAMPL8 challenge

Pose generation and system setup

Each of the guest molecules was docked into CB[8] using GalaxyDock [55, 56], with 3–5 unique poses identified per host–guest pairing. Pose refinements were obtained by employing GBMV implicit solvent model in CHARMM with a center-of-mass restraint between the guest and host (vide infra), as well as RMSD restraints on the host and guest individually. Successive short MD simulations followed in which the force constants of the restraints were relaxed.

Implicit solvent refinement details

Generated poses were placed into a cylindrical restraint that protruded through the CB8 host, with the z-axis of the cylinder defined by the two midpoints of the oxygens along the top and the bottom of the CB[8] host, and a cylinder radius of 6.2 Å. Initial conformations of bound poses were maintained with a heavy-atom RMSD restraint of 200 kcal/mol\(\cdot\)Å\(^{2}\) and with a maximum allowable RMSD of up to 1 Å, whereas an absolute RMSD restraint held the host fixed in Cartesian space about the origin with a force constant of 999 kcal/mol\(\cdot\)Å\(^{2}\). The guest was then further dragged into the host via a flat bottom harmonic restraint between center-of-masses of the host and the guest to keep the COM distance below a specified value (i.e., 2 Å for G3 and G4, 6 Å for G2, and 3 Å for G1, G5, G6, and G7). With an initial force constant of 0.001 kcal/mol\(\cdot\)Å\(^2\), the RMSD restraint on the guest is removed and a series of 5 ps GBMV implicit solvent LD simulations are ran, with the next subsequent simulation increasing the host–guest COM distance force constant by a factor of 10, up to a maximum of 10 kcal/mol\(\cdot\)Å\(^2\) (i.e. five sequential simulations). All of these simulations were run using a 5 \(\hbox {ps}^{-1}\) coefficient of friction, without non-bonded cutoffs and a timestep of 1 fs.

System setup

Both the bound poses and the individual guests were solvated in a 55 Å cubic water box of TIP3 water molecules. Sodium cations were added as needed to neutralize the system, and 2 additional sodium/chlorine ion pairs were added in order to replicate the experimental ionic strength of 0.15 M. All simulations herein were performed using LD with a timestep of 1 fs, a temperature of 298.15 K, a friction coefficient of 5 \(\hbox {ps}^{-1}\), a 1 ps coordinate saving frequency, particle-mesh Ewald for electrostatic treatment, and a polynomial potential switching function between 10 and 12 Å for LJ interactions. Simulations were performed with OpenMM version 7.4.2 on GPUs [57]. No bond or angle constraints were used on either the host or guest. Host–guest bound complexes were maintained via a flat bottom potential with a force constant of 1.5 kcal/mol\(\cdot\)Å\(^2\) with an offset of 5.5 Å to ensure the guest remained bound. Each system underwent a brief 1 ns NPT LD simulation using a Monte-Carlo barostat set to 1 atm with a frequency of 25 steps.

Free energy simulations

All classical binding free energies were obtained through the scheme outlined in Fig. 4.Starting from the guest–host complex (GH, top right, Fig. 4), the guest \({\mathbf {G}}\) is bound within the host \({\mathbf {H}}\) (generation of binding poses is described in the next section). To ensure that the guest remains within the host, a flat-bottom harmonic COM restraint is applied between the guest and the host, corresponding to a free energy contribution \(\varDelta A^{{{\mathbf {G}}}{{\mathbf {H}}}}_\text {restr off}\) (Eq. 4). The so-called “double-decoupling” method (DDM) [58, 59], in which electrostatics and van der Waals interactions are sequentially disabled on the guest compound, is used to make the guest “vanish” from inside the host. Then, the free energy of deactivating the COM restraint for the unbound guest is accounted for via the free energy of changing to a standard state \(\varDelta A^{{\mathbf {G}}+{\mathbf {H}}}_\text {restr off}\) (Eq. 6) [36]. Since the guest has no intermolecular interactions, the environment is switched from guest–host complex to guest free in aqueous solution without incurring any energetic penalty. At this point, the guest has intermolecular interactions restored through a second DDM calculationFootnote 2.

$$\begin{aligned} \varDelta {A}^{\text {bind}}&= \varDelta {A}^{\mathbf {G+H}}_{\text {DDM}} -\varDelta {A}^{\mathbf {G+H}}_{\text {restr off}} - \varDelta {A}^{{{\mathbf {G}}}{{\mathbf {H}}}}_{\text {DDM}} + \varDelta {A}^{{{\mathbf {G}}}{{\mathbf {H}}}}_{\text {restr off}} \end{aligned}$$

Using the DDM approach, electrostatics on the guest molecules were gradually “shut-off” over 5 simulations (\(\lambda\)-states 0–5), followed by the deactivation of guest van der Waals interactions over an additional 10 simulations (\(\lambda\)-states 5–15). Details of the 16 successive \(\lambda\)-states are listed in Table 2. Deactivation of the guest electrostatics was accomplished by simply scaling partial charges on the guest solute, whereas van der Waals deactivation was performed with the so called “soft-core” LJ potential as given by the following equation,

$$\begin{aligned}&U_{\lambda _{\mathrm {vdw}}}^{\mathrm {LJ}} (r_{ij}) = 4 \lambda _\text {vdw}\varepsilon _{ij} \left( \left( \frac{\sigma }{r_{ij} + \sigma _{ij}(1-\lambda _{\mathrm {vdw}})} \right) ^{12} \right. \nonumber \\&\quad \left. - \left( \frac{\sigma }{r_{ij} + \sigma _{ij}(1-\lambda _{\mathrm {vdw}})} \right) ^{6} \right) . \end{aligned}$$

All 16 alchemical simulations were performed independently and simultaneously. Alchemical functionality was provided through a specially modified variant of the Alchemy module in OpenMM Tools based on fixes provided in one of the last SAMPL challenges [60].

Table 2 \(\lambda\)-state schedule, with accompanying decoupling parameters for van der Waals (\(\lambda _\mathrm {vdW}\)) and electrostatic (\(\lambda _\mathrm {elec}\)) interactions defining each \(\lambda\)-state
Fig. 4
figure 4

The cycle used for computing the classical binding free energy. The guest bound-state is designated as GH, whereas the guest free in solution (or rather, infinitely seperated from the host) is labeled as G+H. The term “off” is used to indicate the deactivation of non-bonded (e.g., electrostatic and van der Waals) interactions on the guest compound. Subscripts ”restr” denotes a restraining potential is active, and “restr off” refers to the deactivation of the restraining potential . See Eq. 2 for a summary equation

Starting from snapshots obtained from the final step of the aforementioned NPT equilibration, each lambda-state underwent a 500 steps minimization. An NVT equilibration run of 1 ns followed, based on the consensus box-length obtained from the second half of the NPT equilibration. From this, production simulations were run for 15 ns, giving a total of 15,000 snapshots per lambda-state. Energies for each lambda-state were calculated on the fly per snapshot using the AlchemyFlow module from the RickFlow in-house code [61], and the deactivation free energies \(\varDelta {A}_{\text {DDM}}\) were computed via MBAR [62].

Restraint free energy

To account for the effect of the flat-bottom potential on the binding free energy, we consider the two contributions to the free energy: (1) \(\varDelta {A}^\mathbf{GH }_{\text {restr off}}\), the free energy of turning the restraint off at \(\lambda =0\) and (2) \(\varDelta {A}^{\mathbf{G }+\mathbf{H }}_{\text {restr off}}\), the free energy of turning off the restraint at \(\lambda =15\).

For \(\lambda =0\), host–guest interactions are unperturbed, and for a well-bound system the COM distance between the guest and host stays fairly stable. In this situation, the free energy of turning off the restraint can be determined using the FEP method, which yields

$$\begin{aligned} \varDelta {A}^\mathbf{GH }_{\text {restr off}}=-\frac{1}{\beta } \log {\left<\exp \left[ -\beta (\varDelta {U}_{\text {restr}}^{\text {on}\rightarrow \text {off}})\right] \right>_{\lambda =0}} \end{aligned}$$


$$\varDelta {U}_{\text {restr}}^{\text {on}\rightarrow {\text {off}}}(r) = \left\{ \begin{array}{ll} 0 \quad & {\text {if}} \quad r < 5.5 \, \text{\AA} \\ -\frac{1}{2}K(r-5.5)^2 \quad & {\text {if}} \quad r \ge 5.5 \, \text{\AA} \end{array} \right.$$

At \(\lambda =15\), the guest electrostatics and van der Waals terms are deactivated (i.e., \(\lambda _{\text {vdw}}=\lambda _{\text {elec}} = 0\)). Therefore, the flat-bottom restraint will only act as a restriction of the space the guest explores. Hence, the free energy of deactivating the COM restraint \(\varDelta {A}^{\mathbf{G }+\mathbf{H }}_{\text {restr off}}\) corresponds to the free energy of changing to standard state (e.g., changing from the effective volume explored through the COM restraint, \(V_{\text {eff}}\), to the concentration corresponding to the standard state, \(V_{0}\)). Based on the work of [63], we can compute the free energy of releasing the restraint as

$$\begin{aligned} \varDelta {A}^{\mathbf{G }+\mathbf{H }}_{\text {restr off}}=-\frac{1}{\beta }\log {\frac{V_{0}}{V_\text {eff}}}. \end{aligned}$$

where \(V_0=1661\) \(\hbox {AA}^3\) as the volume corresponding to standard state concentration. Using this, and approximating (albeit, an overestimation) \(V_{\text {eff}}\) as

$$\begin{aligned} V_\text {eff}&=\int _{{\mathbb {R}}^3} e^{-\beta {U_\text {restr}(\mathbf {{\mathbf {r}}})}}d\mathbf {{\mathbf {r}}} \\&\approx \frac{4}{3}\pi \left( r_\text {max}^3 - r_\text {min}^3\right) , \end{aligned}$$

we arrive at

$$\begin{aligned} \varDelta {A}^{\mathbf{G }+\mathbf{H }}_{\text {restr off}} =-\frac{1}{\beta }\log {\left( \frac{V_0}{\frac{4}{3}\pi (r_{\text {max}}^{3}-r_{\text {min}}^{3})}\right) } \end{aligned}$$

where \(r_{\text {max}}\) is the maximum observed COM distance between the host and guest, and \(r_{\text {min}}\) the minimum observed COM distance between host and guest [59, 63,64,65]. Values for \(r_\text {min}\) and \(r_\text {max}\) are found in SI Table S5, with the largest \(r_\text {min}\) at 0.27 Å, and thus having a maximum contribution of about 0.08 Å3.

G5 p\(K_{\text {a}}\) correction

As mentioned prior, both protonated and neutral ketamine (G5) were considered as to incorporate relevant p\(K_{\text {a}}\) effects on the binding free energy. Given that the ketamine p\(K_{\text {a}}\) in solution is known (p\(K_{\text {a}}\)(aq) = 7.5, which is close to the experimental pH of 7.4), the p\(K_{\text {a}}\) of bound-state ketamine can be indirectly obtained through computing both the protonated and neutral binding free energies (\(\varDelta {A}^{G^+}_{bind}\) and \(\varDelta {A}^{G}_{bind}\), respectively), and applying the thermodynamic cycle seen in Fig. 5.

Fig. 5
figure 5

Thermodynamic cycle used to indirectly compute the bound ligand p\(K_{\text {a}}\). Quantities (ii) and (iii) are binding free energies, computed as described in section ‘Free energy simulations’, and (iv) is known from the guest p\(K_{\text {a}}\)

Simple manipulation allows us to find \(\varDelta {A}^{G^+\rightarrow G}_{GH} = (iv)+(iii)-(ii)\), and therefore we arrive at the following expression for the bound state p\(K_{\text {a}}\):

$$\begin{aligned} \text {p}K_{\text {a}}(\text {G,}{} \textit{bnd})= \text {p}K_{\text {a}}(\text {G,}{} \textit{aq}) +\frac{\varDelta \varDelta {A}_{\text {bind}}^{G^+\rightarrow G}}{RT\ln {10}}. \end{aligned}$$

Finally, the pH dependent binding free energy is determined by the equation [66, 67],

$$\begin{aligned} \varDelta {A}^{\text {pH}=7.4}_{\text {bind}} =&\varDelta {A}_{\text {bind}}^{\text {G}^+} - \frac{1}{\beta }\log {\left[ \frac{1+10^{7.4-\text {p}K_{\text {a}}(\text {G,}{} \textit{bnd})}}{1+10^{7.4-\text {p}K_{\text {a}}(\text {G,}{} \textit{aq})}}\right] } \end{aligned}$$
$$\begin{aligned} =&\varDelta {A}_{\text {bind}}^{\text {G}} - \frac{1}{\beta }\log {\left[ \frac{1+10^{\text {p}K_{\text {a}}(\text {G,}{} \textit{bnd})-7.4}}{1+10^{\text {p}K_{\text {a}}(\text {G,}{} \textit{aq})-7.4}}\right] } \end{aligned}$$

Indirect QM/MM practical scheme

After calculating the classical binding free energies with FM potentials, we then aimed to compute QM/MM binding free energies. The cycle of Fig. 1 can be extended for obtaining QM/MM binding free energies as shown in Fig. 6.

Fig. 6
figure 6

The cycle used for the indirect calculation of QM/MM binding free energies. H and G respectively designate the host and guest systems, and (G H) the guest–host complex

In this case, we chose to only perform the QM/MM correction on the guest compounds. Although there is no conceptual barrier for correcting the host as well, the size of the host would make the convergence of these QM/MM free energy corrections dubious at best. The contributions required to obtain the indirect QM/MM binding free energy are summarized in the following:

$$\begin{aligned}&\varDelta {A}^{\mathbf{QM/MM }/(\mathbf{MM })MM}_{\text {bind}} = \varDelta {A}^{\mathbf{MM }/MM}_{\text {bind}}\nonumber \\&\quad +\varDelta {A}^{(\mathbf{MM }\rightarrow \mathbf{QM/MM })/MM}_{\mathbf{G },\text {GH}} -\varDelta {A}^{(\mathbf{MM }\rightarrow \mathbf{QM/MM })/MM}_{\mathbf{G }} \end{aligned}$$

Calculating the free energy difference between the MM and QM/MM levels of theory is done through FEP (Eq. 1). To this end, 10 ns of NVT LD simulation were performed for every guest both in bound state and free in solution, saving out a total of 10,000 configurational snapshots per end-state per guest. Four QM/MM methods were selected as target levels of theory, consisting of two SQM (PM6-D3H4 and GFN-2) and two DFT (\(\omega\)B97X-D and BLYP) approaches. For the DFT based methods, the MM regions were treated as either a field of point charges or by “smearing” charges through fitting to delocalized Gaussians (i.e., Gaussian blurring) [68]. A blur width of 1 Å was used for all QM/MM calculations with Gaussian blurring. A summary of the various levels of theory, MM charge treatments, and software used for computation can be found in Table 3. Standard deviations of the free energy difference were computed via block averaging using 10 blocks of 1000 snapshots each.

Table 3 List of target QM/MM levels of theory and the MM levels of theory through which the corrections are computed

Since QM/MM Ewald is not fully supported for all the QM levels of theory employed, QM/MM energies were evaluated without cutoffs or Ewald. Instead, following the logic of Refs. [37] and [70], classical energies were also obtained with and without cutoffs or Ewald, and the difference of the two was used to obtain the so-called “PBC” energy (\(U^{\text {MM}}_{\text {PBC}}=U^{\text {MM}}_{\text {w/PBC}} - U^{\text {MM}}_{\text {no\,PBC}}\)). From this, the relevant energies for evaluating the Eq. 1 were setup as \(U^{\text {MM}}=U^{\text {MM}}_{\text {no\,PBC}}+U^{\text {MM}}_{\text {PBC}}\) and \(U^{\text {QM/MM}}=U^{\text {QM/MM}}_{\text {no\,PBC}} + U^{\text {MM}}_{\text {PBC}}\), giving \(\varDelta {U}^{\text {MM}\rightarrow \text {QM/MM}}=\varDelta {U}^{\text {MM}\rightarrow \text {QM/MM}}_{\text {no\,PBC}}\). Inherently, the underlying assumption is that the contributions from Ewald for QM/MM and MM do not differ drastically. In the case of the two SQM/MM calculations, the energy of the MM environment as well as the QM/MM van der Waals interactions between solute-environment (which are modelled with classical van der Waals potentials) were added in post-processing through CHARMM.

QM/MM correction metrics

Although a plethora of methods exist for evaluating the convergence quality of a free energy calculation, most require sampling both end states of interest. Since sampling the QM/MM surface is prohibitive, we restrict our focus to metrics applicable to “one-sided” (i.e., only requiring sampling of the FM/MM configurational space) approaches. In previous work, it has been shown that the standard deviation of potential energy difference between FM/MM and QM/MM, as well as the so-called bias measure, can give insight as to when free energy calculations with FEP fail. In regards to \(\sigma {\left[ \varDelta {U}^{(\text {FM}\rightarrow \text {QM)/MM}}_{\text {FM/MM}}\right] }\) (i.e., the standard deviation of the potential energy difference from the FM/MM ensemble), larger values are associated with poor convergence due to a sparse collection of snapshots truly contributing to the exponential average (i.e., snapshots occupying the lower energy tail of the \(\varDelta {U}\) distribution). As a rule of thumb, keeping \(\sigma {\left[ \varDelta {U}^{(\text {FM}\rightarrow \text {QM)/MM}}_{\text {FM/MM}}\right] }\le 4k_{B}T\) (\(\approx 2.4\) kcal/mol) is desirable (the smaller the better).

The second metric, the bias measure, indicates the presence of bias in a free energy estimate [71,72,73]. By assuming that the spread of \(\varDelta {U}^{(\text {FM}\rightarrow \text {QM)/MM}}\) is similar between FM/MM and QM/MM surfaces, the bias measure can be applied to our one sided approach, taking the form

$$\begin{aligned} \varPi&=\sqrt{{\mathbf {W}}_{L}\left[ \frac{1}{2\pi }\left( N-1\right) ^2\right] }\nonumber \\&\quad - \sqrt{2\beta \left( \left<\varDelta {U}^{\left( \text {FM}\rightarrow \text {QM}\right) /\text {MM}}\right>_{\text {FM/MM}}-\varDelta {A}^{\left( \text {FM}\rightarrow \text {QM}\right) /\text {MM}}\right) } \end{aligned}$$

where \({\mathbf {W}}_L\) is the Lambert function, and \(\varDelta {A}^{\left( \text {FM}\rightarrow \text {QM}\right) \text {/MM}}\) is the FEP estimate evaluated with a configurational sampling of size N. Following the recommendations of Refs. [72, 73], \(\varPi\) should be greater than 0.5 (the larger the better). Take note that this condition is necessary but not sufficient: although failing to have a \(\varPi >0.5\) or a \(\sigma {\left[ \varDelta {U}^{(\text {FM}\rightarrow \text {QM)/MM}}_{\text {FM/MM}}\right] }\le 4k_{B}T\) can cast doubt onto the converge of the FEP estimate, satisfying both criteria does not guarantee convergence.

Results and discussion

Overview of results

Results for the 6 QM/MM submissions, as well as the three FM-based submissions they are corrected from, are found in Table 4 and Fig. 7. From a cursory glance, a few observations can be made. First, the PM6-D3H4 submission is easy to discern as our best QM/MM submission (as well as our best submission overall), with a RMSE of 2.4 kcal/mol. Similar success was achieved using PM6-D3H4 in the SAMPL6 challenge by the Ryde Group [74], and served as inspiration for utilitizing it in SAMPL8. Predictions based on GFN-2 also performed rather wellFootnote 3 with a RMSE of 2.9 kcal/mol. Although correlation statistics are comparatively high amongst the QM/MM submissions (\(\rho\) is between 0.58–0.79 [75] and Kendall-\(\tau\) is between 0.43–0.62 [76]), RMSE is the more indicative metric of success due to the rather small sample size of 7.

Fig. 7
figure 7

Statistical measurements ranked for all QM/MM submissions. Non-NIH method’s names were given as follows in the SAMPL Github (see [24]). S1: GFN2-xTB/MetaMD/GBSA/ensemble/Nobuffer, S2: GAFF-RESP/TIP3P/MD/xtb-GFN2B/Boltz-Avg, S3: GAFF-RESP/TIP3P/MD-Classical/xtb-GFN2B

Table 4 Binding free energies obtained for each guest molecule, in kcal/mol, and statistical quantities for each methods

Some noteworthy observations can be made about the effects of correcting FM to QM/MM on our SAMPL8 submissions. First, FM(PM6-D3H4) did reasonably well, but improved remarkably after the QM/MM correction step. This is not the first time that PM6-D3H4 has shown good results for binding free energies in the SAMPL challenges [36]. The semi-empirical GFN-2 was a popular QM/MM choice in this challenge, with 3 other GFN-2 based methods submitted by other groups. However, our GFN-2 results outperformed these in every single metric of interest (see Table 5).

Table 5 Table showing the change in the correction metrics \(\varPi\) and \(\sigma\) for computing \(\varDelta {A}^{\text {MM}\rightarrow \text {QM}}\) in gas-phase, with \(\varDelta \varPi =\varPi ^{\text {FM/MM}}-\varPi ^{\text {C36/MM}}\) and \(\varDelta \sigma =\sigma ^{\text {FM/MM}}-\sigma ^{\text {C36/MM}}\)

The FM(GFN-2) results were somewhat poor, but drastically improved upon correction (RMSE went from 5.16 to 2.94 kcal/mol). The results for correcting FM to QM/MM with DFT, however, seemed to either offer little improvement (e.g., going from FM(\(\omega\)B97X-D) to \(\omega\)B97X-D/def2-SVP), or worsen results. Of particular note is the systematically poor performance of DFT in predicting the G7 binding free energy. In previous work, we found underestimates of this magnitude, to the point of suggesting non-binding, were indicative of a problem with the FM parameters (vide infra section ‘Metrics for the indirect approach’).

Charge blurring, as a whole, worsened results. For BLYP/6-31G* calculations, there was a net increase in RMSD of about 0.5 kcal/mol, whereas for the \(\omega\)B97X-D/def2-SVP calculations the RMSD went up by about 2.1 kcal/mol after using blurring. These findings are not entirely unexpected, as results are often dependent on selecting the correct blur width, for which the correct choice is not always transferable among different QM levels of theory (e.g., the appropriate blur width for BLYP/6-31G* and \(\omega\)B97X-D/def2-SVP will not be the same) [68]. Overall, the SQM performance was markedly better than the DFT results. It is worth noting BLYP was included because of its good performance when computing solvation free energies in past works, but it is generally considered less robust in comparison to common methods (e.g., \(\omega\)B97X-D, B3LYP, etc.), and its poor results here are not too surprising [77]. The results for \(\omega\)B97X-D/def2-SVP, excluding G7 (RMSE \(\approx\) 2.86 kcal/mol), show promise for this level of theory if executed correctly. A summary of pre- and post-QM/MM correction binding free energies can be found in Fig. 8 and plots illustrating the associated change in binding free energy for each guest can be found in Fig. 9.

Fig. 8
figure 8

Bar graphs showing the relevant statistics for each QM/MM submission, as well as the associated classical (FM) submission for which QM/MM corrections were added. Results for the FM submissions are outlined in black, and the QM/MM blurred results are translucent

Fig. 9
figure 9

Plots of the experimental results vs. computed binding free energies before and after employing the QM/MM correction scheme. All energies are given in kcal/mol

Metrics for the indirect approach

One of the most important steps in indirectly computing QM/MM free energy is to determine the degree to which results are converged. In particular, one way to check how well force-matching improved our indirect scheme is to compare how the metrics \(\sigma _{\varDelta {U}}\) and \(\varPi\) change when using a standard force field (e.g., CGenFF) versus a FM description. In principle, changing from CGenFF to FM should improve both statistics, with \(\varPi\) getting large and \(\sigma _{\varDelta {U}}\) getting smaller (i.e., \(\varDelta \varPi >0\) and \(\varDelta \sigma _{\varDelta {U}}<0\)).

Since the force-matching is performed in gas-phase, one would expect improvements to be much larger in gas-phase than in solution. However, we anticipate that trends observed based on comparing the change in metrics when computing \(\varDelta {A}_{\text {gas}}^{\text {MM}\rightarrow \text {QM}}\) should be consistent with \(\varDelta {A}_{\text {aq}}^{\text {MM}\rightarrow \text {QM/MM}}\) (e.g., if the calculation fails when going from FM to QM in gas-phase, it will most likely fail for FM/MM to QM/MM in solution as well). The resulting metric changes for each guest in gas-phase is shown in Table 5 for PM6-D3H4 and GFN-2. Overall, the changes are somewhat modest, with guests G1, G2, G3, G4, and G6 demonstrating decent improvement. For guest G7, the quality of correction deteriorated when using a FM description, indicating a bad fit most likely stemming from overenthusiastic manual tweaking of dihedral multiplicities. This observation also explains why the DFT results were rather poor for G7.

The results for the four G5 compounds are rather perplexing. Although, in principle, the parameter sets between enantiomers with the same charge should be identical, the force-matching procedure produced subtle differences in the resulting fit for each enantiomer pair. This discrepancy is likely due to the inherent limitation of finite sampling. That is, the sampling of configurations between the two enantiomers is similar, but not perfectly identical due to different initial starting conditions (such as different random starting velocities). In the limit of infinite sampling, the ensembles would theoretically be identical. Coupling the initial deviations with the need to tweak dihedral multiplicities for each parameter set (e.g., the adjustments to multiplicities for R-G5 were different than the adjustments for S-G5), these factors only exacerbated the differences in the quality of FM fit. However, one must note that the quality of correction based on the metrics of Table 5 for the G5 derivatives seem to be Hamiltonian dependent, with GFN-2 based parameters only improving the correction as compared to the deterioration observed for the R enantiomers with the \(\omega\)B97X-D/def2-SVP based parameters. Since dihedral forms remain constant across Hamiltonians, this likely indicates that the fitting of dihedrals in the force-matching procedure will potentially have a large dependence on the Hamiltonian of interest (e.g., different levels of theory might require different dihedral multiplicities).

Small inconsistencies found within the ForceSolve software during the fitting procedure also attracted our attention, specifically regarding the need to remove electrostatic and van der Waals forces before the FM fitting. For neutral compounds, ForceSolve is able to remove all non-bonded forces internally. For charged compounds however, ForceSolve will arbitrarily neutralize the charge and therefore produce the wrong electrostatic forces. Thus, electrostatic forces should be removed prior to invoking ForceSolve, and then the force-matching should be performed with all partial charges set to 0. Whether the electrostatic forces are removed prior to invoking ForceSolve or handled by ForceSolve internally should not affect the fit. Unfortunately, it seems that intramolecular fitting results are better whenever ForceSolve is allowed to handle the electrostatic forces, as can be seen by the differences in residuals obtained for force-matching the CB[8] host (see SI, Table S4). Although the average differences is small (\(\sim\) 0.3 kcal/mol\(\cdot\)Å), it will still contribute to less effective parameters for use in the indirect approach.

Lessons learned

A few things become rather clear in retrospective. First, the level of QM/MM theory required to accurately compute binding free energies is not necessarily high. Rather, use of well-suited semi-empirical QM methods can perform fairly well at a fraction of the cost of DFT/ab-initio QM methods. Going forward, efforts towards employing SQM methods, such as PM6-D3H4 and GFN-2, and their furthering development into relevant software (e.g., including GFN-2 support into Psi4, and interfacing Psi4 with CHARMM) will be central to future challenges.

Better methods for computing the QM/MM correction, \(\varDelta {A}^\mathbf{MM \rightarrow \mathbf{QM}/MM }\). In particular, faster and more accessible SQM Hamiltonians open the door to more sophisticated methods for computing free energy differences between different levels of theory. Specifically, use of non-equilibrium approaches for computing the free energy between MM and QM/MM has shown much promise. Methods such as fast-switching non-equilibrium work [78], in conjunction with the Jarzynski equality, will, in principle, guarantee the production of a converged free energy difference (so long as the switching lengths are long enough to bridge disparity between configurational spaces). By combining non-equilibrium approaches with force-matched potentials, the length of non-equilibrium switching simulations required to converge free energy differences between levels of theory can be drastically reduced [37].

Force-Matching procedure can be improved The poor overlap between MM and QM/MM is the biggest obstacle to the mainstream use of the indirect approach to QM/MM free energy. The use of force-matching seeks to address this problem by providing an improved “launching-point” for performing one-sided free energy calculations between levels of theory, but critical consideration is required to correctly ascertain FM parameters. The largest limitation to FM based approaches is the underlying sampling of the training set. Specifically, it is important to ask how representative of the target space the sampled configurations are. Although we have shown that our force-matching methods generally improved our FEP calculations, it stands to reason that classical gas-phase ensembles may be a poor training choice for improving overlap with QM/MM aqueous/bound state phase space. Thus, we seek to shift our force-matching to train on simulations of condensed phase systems in future work.

We also seek to explore the fitting of partial charges in future challenges. Potentially, charge assignment can be provided in a somewhat arbitrary, yet uniform manner (as was the case for the above force-matching procedure). Alternatively, partial charges could be assigned as a pre-step of the force-matching process, where charge populations are evaluated at the same time forces are computed on the relevant training set (e.g., taking the average of the partial charges on all snapshots). The fitting of partial charges in force-matching is a non-linear fit, which not only slows the procedure, but is unclear as to whether it can provide improvements for the use of FM in indirect QM/MM approaches [79].

And finally, as demonstrated, highly expensive QM/MM levels of theory are not necessarily required to produce excellent binding free energies. Using computationally less demanding SQM methods paves the way for using force-matching approaches on larger, more complex molecules of interest (e.g., generating FM potentials for both host and guest, and performing QM/MM corrections for host, guest, and host–guest complex).

Better MM is needed to pair with better QM An important concern regarding QM/MM approaches in binding free energies is how well the QM level of theory works with the classical force field. Often times, more sophisticated QM approaches are out-performed by simpler QM methods (e.g., semi-empirical QM, BLYP, etc.) due to beneficial “error cancellation” (e.g., TIP3 water tends to be over-polarized, whereas BLYP tends to underestimate polarization) [77]. This highlights the unfortunate truth that often, the classical MM description is the limiting factor for the accuracy of QM/MM calculations. With the goal of employing more sophisticated QM Hamiltonians in QM/MM descriptions, it is only natural to turn ourselves to polarizable molecular mechanics [80], which represent one more step to close the gap between MM and QM/MM.


In this SAMPL challenge, we aimed to evaluate how well our indirect QM/MM free energy schemes performed on a blind dataset (particularly as QM/MM methods have historically not performed well in previous iterations of the SAMPL challenges). Our submissions showed how semi-empirical methods (GFN2, PM6-D3H4) can successfully provide accurate results at relatively low computational expense. The bridging capability of the indirect scheme coupled with force-matching procedure shone brightest at this intermediate level, allowing the correction scheme to reduce the RMSE of the binding free energies by more than 30%.

This work also sheds light on some of the more nuanced aspects of force-matching for indirect QM/MM free energies, and parameterization in general. In particular, adjusting torsional parameters in force-matched descriptions should be performed with great trepidation, as bad dihedral terms can still provide reasonable residuals at fit time, but skew overlaps with the targeted level of theory. It also illuminated the need for a critical evaluation of “best-practices” for force-matching with the goal of improving indirect QM/MM calculations in condensed phase systems.

And finally, the dominance of SQM Hamiltonians in our results highlights the feasibility of more sophisticated methods for computing free energy differences between levels of theory. Specifically, the ease and expedience with which modern SQM methods can be performed perfectly lends itself to the use of non-equilibrium methods for computing free energy between levels of theory. Pairing better one-sided approaches with enhancements to our force-matching methods will only yield improvement in the future.