The submitted hydration free energies are given in Table 1. Table 2 lists the root-mean-square (RMS) error, mean unsigned error (MUE), and mean signed error (MSE) of every submission, ordered by submission number. Table 3 orders the submissions by MUE and also gives the paired t-test p-values for method comparisons. These errors and p-values are referred to throughout this section.
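For readers reproducing the entries in Table 2, the three error statistics can be computed as in the minimal sketch below. It assumes errors are signed as predicted minus experimental, which matches the signed errors quoted later in this section. The function name `error_metrics` is our own illustration, not part of any submission pipeline. The example inputs are the OmegaZap predictions and experimental values for the three outliers discussed later (SAMPL4_051, SAMPL4_001, SAMPL4_035), so the resulting numbers describe only those three compounds, not the full set.

```python
import math

def error_metrics(predicted, experimental):
    """Return (RMS, MUE, MSE) in the units of the inputs (kcal/mol here).

    Errors are signed as predicted - experimental (assumed convention).
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - e for p, e in zip(predicted, experimental)]
    n = len(errors)
    rms = math.sqrt(sum(err * err for err in errors) / n)  # root-mean-square error
    mue = sum(abs(err) for err in errors) / n              # mean unsigned error
    mse = sum(errors) / n                                  # mean signed error
    return rms, mue, mse

# OmegaZap and experimental values for SAMPL4_051, SAMPL4_001, SAMPL4_035.
omegazap = [-16.38, -24.30, -9.09]
experiment = [-9.53, -23.62, -4.68]
rms, mue, mse = error_metrics(omegazap, experiment)
print(f"RMS={rms:.2f} MUE={mue:.2f} MSE={mse:.2f}")
```

Because the RMS squares each error before averaging, a single large outlier raises it much more than it raises the MUE. This is the behaviour discussed under "Causes of outliers in single-conformer methods".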
Discussion of single-conformer AM1BCC/PBSA methods
The single-conformer AM1BCC/PBSA methods were among the best performers overall. Only an expensive quantum mechanical method with post-processing of functional groups had a better MUE than FreeformSolv. The MUEs for FreeformSolv, FreeformSolvNoSym, and OmegaZap were 0.94, 0.98, and 1.08 kcal/mol, respectively. All three methods were statistically indistinguishable, even when the paired nature of the data was taken into account. The paired t-test p-value of 0.33 between FreeformSolv and FreeformSolvNoSym was surprisingly low. It is lower than the p-values of FreeformSolv and FreeformSolvNoSym against OmegaZap (0.41 and 0.57, respectively), even though the MUE differences with OmegaZap are larger. Nevertheless, all of these p-values are far too large to reject, at any conventional significance level, the null hypothesis that the methods perform equally well.
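A paired t-test compares two methods compound by compound rather than through their averages alone. The sketch below shows one way to compute it with only the Python standard library. The two-sided p-value comes from Student's t distribution via the regularized incomplete beta function, evaluated with a Numerical Recipes-style continued fraction. Two points are assumptions on our part, since the text does not specify them: which per-compound quantity was paired (we assume absolute errors, since Table 3 ranks by MUE) and which software produced Table 3. In practice, `scipy.stats.ttest_rel` performs the same test.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges fastest.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t statistic with df degrees of freedom."""
    return reg_inc_beta(0.5 * df, 0.5, df / (df + t * t))

def paired_t_test(x, y):
    """Paired t-test on per-compound values x[i] and y[i]; returns (t, p)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0.0:
        raise ValueError("differences have zero variance")
    t = mean / math.sqrt(var / n)
    return t, t_two_sided_p(t, n - 1)
```

With two lists of per-compound signed errors `errors_a` and `errors_b` for the same compounds, a call such as `paired_t_test([abs(e) for e in errors_a], [abs(e) for e in errors_b])` would give the comparison under our absolute-error assumption.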
The MSEs for FreeformSolv and FreeformSolvNoSym were 0.34 and 0.48 kcal/mol, respectively. The MSE for OmegaZap was −0.22 kcal/mol, although its uncertainty of ±0.24 kcal/mol means that the range overlaps zero. As discussed in the Methods section, the FreeformSolv methods intentionally search for the best gas-phase conformer, whereas OmegaZap intentionally searches for a suitable solution-phase conformer. Once a single conformer has been chosen, the methods are virtually identical. We therefore conclude that the choice of conformer is the primary cause of the difference in signed errors between them.
Causes of outliers in single-conformer methods
The RMS errors for the single-conformer methods were quite large given their excellent MUEs: 1.23, 1.30, and 1.58 kcal/mol for FreeformSolv, FreeformSolvNoSym, and OmegaZap, respectively. The RMS weights large errors heavily, so these relatively large values were caused primarily by a few catastrophic failures. Because FreeformSolv and FreeformSolvNoSym are so similar, the remaining discussion focuses on FreeformSolv and OmegaZap.
The worst error of these two single-conformer methods was for 1-amino-4-hydroxy-9,10-anthraquinone (SAMPL4_051) with the OmegaZap method. The experimental hydration free energy for this compound is −9.53 kcal/mol and the OmegaZap result was −16.38 kcal/mol, an error of −6.85 kcal/mol. By contrast, the FreeformSolv result was −10.94 kcal/mol, a much smaller error of −1.41 kcal/mol. The cause of this large difference is readily apparent from the OmegaZap and FreeformSolv structures shown in Figs. 1 and 2. In the FreeformSolv structure, the hydroxyl rotor points towards the carbonyl to form a strong electrostatic interaction, whereas in the OmegaZap structure it points away from the carbonyl. This hydroxyl position is an unfortunate consequence of the OmegaZap conformer selection methodology. Because intramolecular electrostatics are disabled during conformer selection, the very favorable gas-phase conformer is passed over in favor of a more easily solvated one. The selection methodology in fact did exactly what it was designed to do: it found a conformer with a solvation energy more than 5 kcal/mol more negative. However, this gain came at too high a cost in intramolecular electrostatic energy and degraded the prediction.
This effect was analyzed with the MMFF94 force field by decomposing the total energy into elastic strain and electrostatic terms. Table 4 gives the gas-phase MMFF94 energy differences between the FreeformSolv and OmegaZap conformers for all of the molecules discussed in this section. The first data column is the elastic strain energy difference, i.e. the stretching and bending terms, torsions, and the van der Waals interaction. The second data column is the Coulomb interaction alone. For 1-amino-4-hydroxy-9,10-anthraquinone, the two structures have very similar elastic strain energies. However, the Coulomb interaction favors the FreeformSolv conformer by an enormous 13.36 kcal/mol in the gas phase. The OmegaZap conformer was more favorable by 5.44 kcal/mol in solvation energy, but that is clearly not enough to overcome the strong intramolecular electrostatic interaction.
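A back-of-the-envelope balance makes this comparison explicit. In the sketch below, each difference is written as FreeformSolv minus OmegaZap. The elastic strain difference is taken to be negligible, as the text indicates. The solvation difference is taken from the two single-conformer predictions, which differ only in the conformer used. Adding a gas-phase MMFF94 energy to a PB solvation energy is our own rough approximation, because the two come from different models.

$$
\Delta E_{\text{Coul}} + \Delta\Delta G_{\text{solv}} \approx -13.36 + \left[(-10.94) - (-16.38)\right] = -13.36 + 5.44 = -7.92\ \text{kcal/mol}
$$

On this crude estimate, the FreeformSolv conformer remains favored by roughly 8 kcal/mol.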
The next largest error was for mannitol (SAMPL4_001) with the FreeformSolv method. The experimental hydration free energy for this compound was measured to be −23.62 kcal/mol and the FreeformSolv result was −19.40 kcal/mol, yielding an error of 4.22 kcal/mol. The OmegaZap result for this compound was −24.30 kcal/mol, yielding a very small error of −0.68 kcal/mol. Once again, the cause of the large difference in errors is readily apparent when visualizing the OmegaZap and FreeformSolv structures, as shown in Figs. 3 and 4. The FreeformSolv structure is much more compact in order to form a chain of hydrogen bonds, whereas the OmegaZap structure is significantly more extended.
As shown in Table 4, the FreeformSolv conformer has more favorable electrostatics by an enormous 14.91 kcal/mol, very similar to the 13.36 kcal/mol for the previous molecule. Yet, unlike for 1-amino-4-hydroxy-9,10-anthraquinone, the calculated solvation energy for the OmegaZap conformer is very close to the experimental value rather than far too negative. Although the FreeformSolv conformer has a higher elastic strain energy by 2.33 kcal/mol, it is still 12.58 kcal/mol more favorable when all MMFF94 terms are included. Despite the excellent agreement of the OmegaZap result with experiment, we have no physics-based evidence that the conformer used to model the transfer energy is actually the dominant conformer in solution.
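For mannitol, the total MMFF94 difference follows directly from the two columns of Table 4. In the equation below, each difference is written as $E_{\text{FreeformSolv}} - E_{\text{OmegaZap}}$. This sign convention is our assumption, and Table 4 itself may tabulate the opposite sign.

$$
\Delta E_{\text{MMFF94}} = \Delta E_{\text{strain}} + \Delta E_{\text{Coul}} = (+2.33) + (-14.91) = -12.58\ \text{kcal/mol}
$$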
There is no obvious explanation for the poor performance of the FreeformSolv conformer on this molecule. A similar molecule in the SAMPL2 challenge, glucose, was also problematic for this method. One contribution to the large error for glucose was the lack of hydrogen sampling. This left a hydroxyl rotor in an inferior electrostatic interaction, which in turn yielded a structure that was not the optimal gas-phase conformer. Hydrogen sampling has since been added to the OMEGA algorithm. Retrospectively enabling this option with FreeformSolv showed that the hydroxyl rotors of mannitol had already been oriented optimally, so missing hydrogen sampling does not explain the mannitol error. The OmegaZap conformation is unreasonably high in energy when all force field terms are included, but other well-solvated conformations that are low in total energy may exist. Data are sparse for highly flexible molecules, so it is not well known under what circumstances the single-conformer model breaks down. Further research is required to understand why this method has large errors for sugars.
The third and final compound with an unsigned error greater than 3 kcal/mol for either method is 2-hydroxybenzaldehyde (SAMPL4_035). The experimental hydration free energy for this compound was measured to be −4.68 kcal/mol and the OmegaZap result was −9.09 kcal/mol, yielding an error of −4.41 kcal/mol. The FreeformSolv result for this compound was −6.32 kcal/mol, yielding a much better error of −1.64 kcal/mol. The OmegaZap and FreeformSolv structures for this compound are shown in Figs. 5 and 6. In a similar fashion to 1-amino-4-hydroxy-9,10-anthraquinone, the OmegaZap method ignores the very favorable hydroxyl-carbonyl electrostatic interaction and selects a conformer with a more favorable solvation energy.
The MMFF94 energy comparison does not favor FreeformSolv as lopsidedly as it does for the previous two molecules. When all of the force field terms are included, the FreeformSolv conformation is more favorable in the gas phase by only 3.39 kcal/mol. Given the favorable solvation of the OmegaZap conformer, it is possible that this conformer is actually the dominant species in solution. However, the OmegaZap method did not account for the penalty paid in intramolecular electrostatic energy, which led to an error of −4.41 kcal/mol.
Recognition of likely failures
Detecting and correcting likely failure cases would have a substantial impact on the overall results. Had we recognized the mannitol failure in FreeformSolv and simply used the OmegaZap result for that compound, the RMS error would have improved from 1.23 to 1.07 kcal/mol. Recognizing the two OmegaZap failures discussed in the previous section and using the FreeformSolv results instead would have improved the RMS error by nearly 0.5 kcal/mol, from 1.58 to 1.09 kcal/mol.
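These what-if RMS values can be checked from the reported numbers alone. The reported RMS fixes the total squared error, and each substitution removes one squared error and adds another. The sketch below assumes a set size of N = 47 compounds, which reproduces both quoted values but is an assumption here. Because the input RMS values are rounded to two decimals, the result is approximate. The helper `rms_after_swaps` is hypothetical.

```python
import math

def rms_after_swaps(rms_old, n, swaps):
    """Recompute an RMS error after replacing some per-compound errors.

    rms_old -- reported RMS error (kcal/mol) over n compounds
    swaps   -- list of (old_error, new_error) pairs in kcal/mol
    The result is approximate because rms_old is itself rounded.
    """
    sum_sq = n * rms_old ** 2            # total squared error implied by the RMS
    for old, new in swaps:
        sum_sq += new ** 2 - old ** 2    # drop the old error, add its replacement
    return math.sqrt(sum_sq / n)

N = 47  # assumed number of compounds in the SAMPL4 set

# FreeformSolv: use the OmegaZap error for mannitol (SAMPL4_001).
freeform = rms_after_swaps(1.23, N, [(4.22, -0.68)])
# OmegaZap: use the FreeformSolv errors for SAMPL4_051 and SAMPL4_035.
omegazap = rms_after_swaps(1.58, N, [(-6.85, -1.41), (-4.41, -1.64)])
print(f"FreeformSolv {freeform:.2f}, OmegaZap {omegazap:.2f}")
```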
We have found that the difference between the OmegaZap and FreeformSolv predictions is a good candidate metric for detecting catastrophic failures. Vastly different results from the two methods indicate that the solvation energy depends strongly on conformation. This conformational dependence is likely caused by strongly interacting polar groups. However, in large, flexible molecules, shielding of the solvent by nonpolar groups might also cause this effect.
Fig. 7 plots the signed errors of OmegaZap and FreeformSolv along with the absolute value of the difference between the two methods. The three largest differences correspond to the three failure cases discussed in the previous section. For the SAMPL4 set, the largest error on a compound where OmegaZap and FreeformSolv did not substantially disagree is 2.62 kcal/mol. All three errors larger than this coincided with substantial differences between the two methods, suggesting that the false negative rate for this metric is low. This effect will need to be studied on a much larger set before statistically significant conclusions can be drawn.
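The disagreement metric is simple to apply. The sketch below flags every compound whose two predictions differ by more than a threshold and orders the flagged compounds from most to least discrepant. The text does not specify a threshold, so the values used here (2.5 and 3.0 kcal/mol) are purely hypothetical. The inputs are the predictions quoted in the previous section for the three outliers; a real run would pass predictions for the full set.

```python
def flag_disagreements(predictions_a, predictions_b, threshold):
    """Return IDs of compounds whose two predictions differ by more than
    threshold (kcal/mol), sorted by decreasing disagreement."""
    shared = predictions_a.keys() & predictions_b.keys()
    diffs = {cid: abs(predictions_a[cid] - predictions_b[cid]) for cid in shared}
    flagged = [cid for cid, d in diffs.items() if d > threshold]
    return sorted(flagged, key=lambda cid: diffs[cid], reverse=True)

omegazap = {"SAMPL4_051": -16.38, "SAMPL4_001": -24.30, "SAMPL4_035": -9.09}
freeformsolv = {"SAMPL4_051": -10.94, "SAMPL4_001": -19.40, "SAMPL4_035": -6.32}

# Hypothetical thresholds; choosing one trades false negatives for false positives.
print(flag_disagreements(omegazap, freeformsolv, threshold=2.5))
print(flag_disagreements(omegazap, freeformsolv, threshold=3.0))
```

The differences here are 5.44, 4.90, and 2.77 kcal/mol. A 3.0 kcal/mol threshold would therefore miss 2-hydroxybenzaldehyde, which illustrates why any cutoff would need validation on a much larger set.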
Discussion of single-conformer models
In general, our observation from several SAMPL meetings is that our very simple approach does surprisingly well. It is far from clear why a single low-energy conformation should perform as well as it does when other conformations are available, although it has been suggested that polarization causes internally interacting states to be lower in energy than expected. It also seems unlikely that the dispersion term in PBSA, i.e. a term proportional to the accessible surface area of the molecule with a single coefficient, is very accurate. In fact, it is known to be misleading when the van der Waals interactions between solvent and solute are heterogeneous. Some of this variation is parameterized into the radii. For instance, fluorine probably has a large radius in ZAP9 to decrease its contribution to solvation, thereby mimicking the lack of dispersion interactions between water and this halogen. However, such artificial adjustments are very crude and cannot hope to capture the realities of solvent–solute interaction the way an all-atom simulation might.
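To make the limitation concrete, a generic continuum model of this kind writes the hydration free energy as an electrostatic (Poisson-Boltzmann) term plus a nonpolar term. In this generic form, the nonpolar term is a single coefficient $\gamma$ multiplying the solvent-accessible surface area $A_{\text{SAS}}$. The symbols and the absence of an offset term are our assumptions for illustration, not a statement of the exact ZAP9 functional form.

$$
\Delta G_{\text{hyd}} \approx \Delta G_{\text{elec}}^{\text{PB}} + \gamma\, A_{\text{SAS}}
$$

Because $\gamma$ is the same for every exposed atom, chemically different surfaces contribute equally per unit area. Element-specific behavior can enter only indirectly, through the radii that define $A_{\text{SAS}}$ and the dielectric boundary.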
One possible explanation is that the experiments are less accurate than believed. If so, methods that include more of the physics of solvent interaction than PBSA could be running up against a ‘glass ceiling’, i.e. they cannot outperform simpler methods because of errors in the experimental measurements. In previous SAMPLs, reexamining the literature for compounds on which both all-atom simulations and PBSA had very large errors has led to corrections of the proposed experimental value, as for glycerol in SAMPL1. Several revisions and deletions were made to the current SAMPL4 dataset. These arose from incorrect compounds, reanalysis of the experimental data, and a mistake in a published table of experimental data [1, 2]. We have assumed that such problems are unusual and that most experimental values have the reported experimental error. At this stage, however, it might be worth challenging this assumption. It is very unfortunate for the field that such measurements are no longer routinely made. Still, it might be possible to search the literature for discrepancies between experimental groups, or to investigate correlations in errors between theoretical methods to see whether these point to systematic biases in any techniques. We have noticed that larger prediction errors seem to correspond to larger-magnitude solvation energies, which are more difficult to measure. In particular, we would welcome new experimental measurements of the solvation energies of the sugars that have appeared in recent SAMPL challenges.