Introduction

For more than a decade the SAMPL blind prediction challenges (Statistical Assessment of Modeling of Proteins and Ligands) [1] represent an optimal testbed for evaluating and optimizing the performance of computational models to predict experimental reference data. Our group participated in the past in a number of challenges on small molecule physicochemical properties, starting with SAMPL2 on tautomerization free energies in water [2], SAMPL5 on cyclohexane-water distribution coefficients (log D7.4) [3], SAMPL6.1 on aqueous pKa values [4], and SAMPL6.2 on octanol–water partition coefficients (log P) [5]. The methodology employed throughout was the embedded cluster reference interaction site model (EC-RISM) developed by us on the basis of combining three-dimensional (3D) RISM theory [6,7,8] as a solvation model with quantum–mechanical (QM) calculations [9]. This computational model allows for the calculation of Gibbs energies of species in solution that can be combined in thermodynamic cycles to yield derived quantities such as the previous SAMPL challenge targets mentioned above. The challenges themselves triggered further development of the model in terms of identifying and optimizing methodical details throughout the history, as has been discussed in broad detail in a recent overview paper [10]. Briefly summarizing the key results, we expect a pKa accuracy on the order of 1 and octanol–water log P accuracy below 1 pK units. log D7.4 values at the pH given as subscript have only been computed thus far for cyclohexane-water distributions, yielding expected errors on the order of 2 pK units, despite considerably better performance of the underlying pKa and log P models. This finding even holds for a re-evaluation of the older SAMPL5 dataset with the most highly optimized EC-RISM setup, giving rise to speculations about fundamental inconsistencies of the computational representation of experimental reality [10]. These issues have not been resolved yet as related but methodically different QM-based log D models typically exhibit similar error margins.

The latest SAMPL7 physical property challenge [11] represents a continuous further development, as participants were this time asked to predict both aqueous pKa values similar to SAMPL6.1 and octanol–water partition coefficients, log P, as during SAMPL6.2. Both quantities could be combined in the usual way to compute log D7.4 values. Experimental reference data on these quantities have been provided after the submission deadline although these were not part of the challenge. Based on our earlier experiences we decided to essentially apply our established models from SAMPL6.1 and 6.2 [4, 5]. Slight variations to be described below were not projected to influence the expected performance. As will be demonstrated, the performance of the acidity model even surpassed expectations while the partition coefficient results were significantly worse than found before for both training and SAMPL6.2 test set data, merging to an overall still satisfactory result for log D7.4 predictions. This inspired us in the post-submission phase to generate a new set of conformations to be tested as a potential source of uncertainty. Results of the original submission and the variation including consensus values are discussed in the following, also in comparison with data from other participants who submitted both pKa and log P predictions.

Computational details

As the RISM solvation Gibbs energy parametrizations for water and octanol as well as the optimized pKa model were taken from previous SAMPL challenges [4, 5] (with one minor adjustment for octanol described below), we here focus on comparing the different schemes for generating conformations of the challenge compounds that had been employed in the past.

For the submission, the workflow originally developed during the SAMPL5 challenge was applied to all microstates, including the additional relevant microstates complementing the set during the submission phase [1, 3, 10]. For each individual microstate, 200 conformations were generated starting from the original structures with the EmbedMultipleConfs utility of RDKit [12, 13]. If the molecule contained fewer than 7 rotatable bonds only 50 conformations were generated instead to reduce the computational cost for compounds with less conformational degrees of freedom. All conformations generated this way were optimized using the antechamber tool of the Amber12 suite [14], parametrized with AM1-BCC charges and GAFF version 1.7 parameters for bonded and non-bonded terms [14,15,16,17]. Solvation in water and octanol was simulated using an ALPB implicit solvation model with dielectric constants of 78.5 for water and 9.86294 for octanol, yielding two separate sets of 50 or 200 conformations each [18]. After the optimization an energy-filtered structural root mean square differences (RMSD) based clustering was applied to reduce the number of conformations to a more manageable number. Structures with a force field energy 20 kcal mol−1 above the apparent global minimum structure of a given microstate were discarded, with the minimum structure seeding the first cluster. All other conformations were then compared to the minimum structure in the order of increasing force field energies by using the GetBestRMS function of RDKit to calculate the RMSDs. If a structure had an RMSD of less than 0.5 Å it was discarded, while structures with a larger RMSD were added as additional cluster representatives. The resulting cluster representatives were optimized quantum-chemically using the B3LYP/6–311 + G(d,p)/IEFPCM level of theory implemented in Gaussian 16 Rev. C.01 [19]. After the quantum-chemical optimization another purely RMSD-based clustering using a cutoff of 0.5 Å was employed to remove conformations that reached the same minima during optimization. Up to five conformations with the lowest quantum-chemical energy were used in EC-RISM calculations to determine the Gibbs energy in solution per microstate by computing a partition function average. The compounds’ microstate Gibbs energies in the respective solvents \(G_{t}^{{{\text{sol}}}}\) was computed with the approach used in the SAMPL6 log P challenge by taking the sum of the electronic energy of the polarized wave function \(E_{tc}^{{{\text{sol}}}}\) and the corrected excess chemical potential \(\mu_{{tc{\text{,corr}}}}^{{{\text{ex}}}}\) of all conformations c per microstate t as

$$G_{t}^{{{\text{sol}}}} = - \beta^{ - 1} \ln \sum\limits_{c} {\exp [ - \beta (E_{tc}^{{{\text{sol}}}} + \mu_{{tc{\text{,corr}}}}^{{{\text{ex}}}} )]}$$
(1)

with \(\beta = (RT)^{ - 1}\) representing an inverse temperature. Detailed descriptions of how the electronic energies and excess chemical potentials are calculated and the specific corrections used for water and octanol can be found in previously publicized works [3,4,5]. The partition coefficient then follows from

$$\log P = - \frac{{\Delta_{{{\text{trans}}}} G^{0} }}{RT\ln 10} = \frac{{G_{{{\text{wat}}}}^{0} - G_{{{\text{oct}}}}^{0} }}{RT\ln 10}$$
(2)

with

$$G^{{0,\{ {\text{wat}}|{\text{oct}}\} }} = - \beta^{ - 1} \ln \sum\limits_{t} {\sum\limits_{c} {\exp [ - \beta (E_{tc}^{{{\text{sol,}}\{ {\text{wat}}|{\text{oct}}\} }} + \mu_{{tc{\text{,corr}}}}^{{{\text{ex,}}\{ {\text{wat}}|{\text{oct}}\} }} )]} }$$
(3)

After the original submission, the conformer generation approach used during the SAMPL6 challenges was also applied to the microstates of the SAMPL7 challenge to investigate if another set of conformations yields different results [4, 5]. In this case we generated the initial structures for QM optimization by using a force field-based sampling procedure. Structures of each microstate were taken as SMILES strings provided by the organizers. The flipper utility that is part of Omega [20] was used to perform a full enumeration of stereoisomers (i.e. generation of both formal E/Z isomers in cases they were not specified in the SMILES string), and initial 3D coordinates were generated using Corina [21]. For compounds bearing a sulfoxide moiety, additional stereoisomers with inverted chirality at the sulfur atom were added manually. The subsequent conformational analysis of all states was performed using Maestro 12.5 and Macromodel 12.9 as included in the 2020–3 release of the Schrödinger software suite [22]. Default parameters were used unless stated otherwise. We used the mixed torsional/low-mode conformational search algorithm and employed the OPLS3 force field in conjunction with an implicit water model. Conformational search up to a maximum of 1000 steps was carried out with 100 steps per rotatable bond present in the microstate. For saving conformations an energy window of 5 kcal mol−1 was used and redundant conformations were eliminated based on a RMSD cutoff of 1.5 Å. All resulting microstate conformations were forwarded to QM-based geometry optimization on the B3LYP/6–311 + G(d,p)/IEFPCM level of theory, and again up to 5 highest-ranking (lowest free energy) structures were selected for further processing by EC-RISM. Unlike the RDKit-based workflow employed for submission where different conformational sets for water and octanol were obtained und reoptimized, the sampling approach yielded only one set of conformations representative for water while final structural ensembles again differed slightly between solvents due to IEFPCM optimization mimicking the respective water and octanol environments.

For the EC-RISM calculations similar settings and solvent susceptibilities to those used in the SAMPL6 log P challenge were employed here to calculate the Gibbs energies of the compounds in solution, with one minor adjustment already pointed out as a perspective in our SAMPL6.2 paper [5]. Here, the water-saturated octanol solvent susceptibility was generated using the experimental number densities of 1.3598·10–3 Å−3 for water and 3.65787·10–3 Å−3 for octanol sites, and a dielectric permittivity of 8.41. As discussed in the original paper this is not expected to lead to significant deviations from the original water-saturated octanol model. Parametrization results and slightly changed resulting parameters for correcting the RISM excess chemical potential are shown in Fig. S1 and Table S1 in Online Resource (OR) 1. The 3D RISM calculations were conducted utilizing the PSE-2 closure [23] for water and the PSE-1 (Kovalenko-Hirata) closure for octanol. The RISM equations were solved on a cubic periodic grid of fixed size consisting of 1283 grid points and 0.3 Å grid spacing. The partial molar volumes entering the free energy correction expression [5] were calculated with the experimental compressibility of 0.761·10–9 Pa−1 for octanol and the 1D RISM estimate of the isothermal compressibility of 0.717062·10–9 Pa−1 for water [18, 24] from the total correlation function route. All EC-RISM calculations were done using the MP2/6–311 + G(d,p) level of theory within Gaussian 09 Rev. E.01 [25] using exact electrostatics taken directly from the wave function [4]. As in previous works, a more recent version of Gaussian was used for optimizations to take advantage of performance improvements [3, 5].

Aqueous pKa values were calculated from the optimized model developed in our SAMPL6.1 publication [4] for each pair of microstates separated by one unit charge difference and transformed, along with tautomer Gibbs energy differences, to the standard reaction free energy format required by the organizers by referring to a specific microstate reference [11]. As will be shown elsewhere in the SAMPL7 overview paper [26], the transformation from microstate pKa values (or corresponding standard reaction free energies) to the macrostate pKa values is equivalent to the “state transition” (ST) formalism analyzed by us recently [27, 28], so these values were submitted along with the microstate standard reaction free energies from microstate-specific Gibbs energies calculated according to Eq. (1). In the following we also compare these results to the “partition function” (PF) approach [27] using the same input data for state Gibbs energies. Gas phase energies were not needed, neither for pKa nor for log P calculations, as these cancel exactly because the gas phase ensembles of compounds evaporating from the water or the octanol phases are identical [10]. Finally, log D7.4 predictions were derived from calculated pKa and log P data in the usual way [3, 10].

Results and discussion

General outline and pK a predictions

We not only present our own data but also try to put the results into context by comparison with other participants. Here we chose only those submissions for which the final quantity, log D7.4 could in principle be calculated, i.e. challenge contributions containing both, ranked pKa and log P predictions. Without going into methodical detail, the following 5 submissions satisfied the conditions, termed according to the submission nomenclature (1) “MD (CGenFF/TIP3P)|Gaussian_corrected”, (2) “TFE-SMD-solvent-opt|DFT_M06-2X_SMD_explicit_water”, (3) “TFE-NHLBI-TZVP-QM|TZVP-QM”, (4) “TFE IEFPCM MST|IEFPCM/MST”, (5) “TFE b3lypd3|DFT_M05-2X_SMD” [11]. The first part in front of the pipe symbol refers to the log P model, the second to the pKa approach. Accordingly, our own models are termed (0) “EC_RISM_wet|EC_RISM”. As outlined in the preceding section, besides data from the original structure set (“orig”) we also report results from the new set of geometries (“new”) separately and from a combination (“comb”) by simply augmenting the microstate partition function with the new energies, ignoring the possibility of duplicates. In the following analysis of acidity constants, the state transition approach [27] was used for deriving macroscopic pKa values from submitted free energies throughout for all submissions.

All pKa models agreed in the choice of the relevant ionization state change related to the observed macroscopic pKa values, going from a neutral acid to a negatively charged base, which greatly simplified the analysis. Transitions from charged acids were accompanied throughout by negative pKa predictions and could be ignored for comparison with experiment. Results for macroscopic acidity constants are shown in Table 1 and Fig. 1 with individual compound data summarized in Table 2. Apparently, EC-RISM outperformed other methods, exceeding expectations from earlier challenges and the training set performance (ca. 1 pK unit RMSE) with a submission RMSE of 0.72 pK units. High correlation measured by R2 and a regression slope near one, small absolute and signed errors indicate an overall robust model. The new set of conformations performed slightly inferior, though still in line with the metrics of the original set and not overlapping with prediction statistics of other models. Somewhat unexpectedly it turned out that the combined set of conformations did not lead to improvement. This means that the new conformations do not fully overlap with the old ones but add some new low-energy structures to the partition function that yield larger deviations in terms of their pKa performance. The only conclusion at this point is that the observed discrepancy between different conformation sets can be taken as a measure of model uncertainty (not to be confused with expected prediction uncertainty).

Table 1 Statistical metrics for predicted acidity constants pKa (root mean square error RMSE, mean absolute error MAE, mean signed error MSE, slope m′, intercept b′, and coefficient of determination R2 from descriptive regression) using EC-RISM and the other models discussed in this work
Fig. 1
figure 1

Macroscopic acidity constants for the SAMPL7 set calculated using EC-RISM (A) and other models discussed in this work (B) with different sets of conformations in panel (A) encoded by symbol and line colors, original: black, (0) EC_RISM (orig); new: orange, (0) EC_RISM (new); combined: green, (0) EC_RISM (comb). The model comparison in panel (B) is color-coded as black: (0) EC_RISM (orig), orange: (1) Gaussian_corrected, green: (2) DFT_M06-2X_SMD_explicit_water, yellow: (3) TZVP-QM, magenta: (4) IEFPCM/MST, blue: (5) DFT_M05-2X_SMD. Two data points of submission (2) DFT_M06-2X_SMD_explicit_water are not shown in panel (B) as they lie far outside the experimental range. Dashed lines indicate descriptive linear regression results. Raw data are provided as OR3 for structures and OR4 for energies. Macroscopic pKa values for other participants’ models were taken from the SAMPL7 repository [11] and are additionally collected in OR7

Table 2 Experimental and calculated data for individual compound pKa values from the different EC-RISM-based approaches

Individual compound data in Table 2 further illustrates the prediction balance, with the largest deviation between prediction and experiment on the order of 1.6 pK units found for SM34 and SM39. For completeness, we there also show results from applying the partition function approach [27] which – as expected – only marginally differs from the state transition results.

log P and log D 7.4 predictions

Given the successful application of the EC-RISM model to octanol–water phase partitioning of neutral compounds during SAMPL6.2 [5] (training and test set RMSEs of ca. 1.5 and 0.5 pK units), we expected similar performance for the SAMPL7 compound set. However, numbers reported in Tables 3 (statistical metrics) and 4 (individual compound data) and illustrated in Figs. 2 and 3 for log P and log D7.4, respectively, show a satisfactory, yet worse than expected overall result. With a log P RMSE for the original conformations of 1.84 pK units the upper limit of our expectation was slightly exceeded, and the non-zero MSE and regression intercept indicates a systematic trend to overestimate log P values, which has not been observed with our models before. Adding new conformations here somewhat improves the results, unlike the pKa case, but not to an extent that we would assume to have pinpointed the origin of the discrepancies. It is possible that the specific chemistry of the SAMPL7 set is so different from earlier datasets tested that our model development is not yet robust enough to capture very diverse systems. One candidate for deeper investigation is the element sulfur which is not well represented in our reference datasets and which could have implications for the chosen theoretical level of theory, most likely the basis set.

Table 3 Statistical metrics for partition (log P) and distribution coefficient predictions (log D7.4) (root mean square error RMSE, mean absolute error MAE, mean signed error MSE, slope m′, intercept b′, and coefficient of determination R2 from descriptive regression) using EC-RISM and the other models discussed in this work
Fig. 2
figure 2

Partition coefficients for the SAMPL7 set calculated using EC-RISM (A) and other models discussed in this work (B) with different sets of conformations in panel (A) encoded by symbol and line colors, original: black, (0) EC_RISM_wet (orig); new: orange, (0) EC_RISM_wet (new); combined: green, (0) EC_RISM_wet (comb). The model comparison in panel (B) is color-coded as black: (0) EC_RISM_wet (orig), orange: (1) MD (CGenFF/TIP3P), green: (2) TFE-SMD-solvent-opt, yellow: (3) TFE-NHLBI-TZVP-QM, magenta: (4) TFE IEFPCM MST, blue: (5) TFE b3lypd3. Dashed lines indicate descriptive linear regression results. The signs of log P predictions for model (3) TFE-NHLBI-TZVP-QM have been inverted as accidentally the wrong reaction direction has been submitted. The log P values for other participants’ models were taken from the SAMPL7 repository [11]. Raw data are provided as OR5 for structures and OR6 for energies and are additionally collected in OR7

Fig. 3
figure 3

Distribution coefficients for the SAMPL7 set calculated using EC-RISM (A) and other models discussed in this work (B) with different sets of conformations in panel (A) encoded by symbol and line colors, original: black, (0) EC_RISM_wet (orig); new: orange, (0) EC_RISM_wet (new); combined: green, (0) EC_RISM_wet (comb). The model comparison in panel (B) is color-coded as black: (0) EC_RISM_wet (orig), orange: (1) MD (CGenFF/TIP3P)|Gaussian_corrected, green: (2) TFE-SMD-solvent-opt|DFT_M06-2X_SMD_explicit_water, yellow: (3) TFE-NHLBI-TZVP-QM|TZVP-QM, magenta: (4) TFE IEFPCM MST|IEFPCM/MST, blue: (5) TFE b3lypd3|DFT_M05-2X_SMD. Dashed lines indicate descriptive linear regression results. One data point of submission (2) TFE-SMD-solvent-opt|DFT_M06-2X_SMD_explicit_water is not shown in panel (B) as it lies far outside the experimental range. The signs of underlying log P predictions for model (3) TFE-NHLBI-TZVP-QM|TZVP-QM have been inverted as accidentally the wrong reaction direction has been submitted. Calculated data are provided as OR7

Compared to the other log P models analyzed in this work our results rank average, with the best performing model (4) yielding an RMSE of ca. 1 pK unit. However, all models analyzed, including our own, show very little degree of correlation measured by R2, despite relatively reasonable regression slopes. This can be clearly traced back to a number of substantial outliers (e.g. SM42, SM43, see Table 4), for which there is no apparent explanation. The RMSE-wise best model (4) yields even a smaller value for this metric than ours, hinting at the possibility that chance plays a large role for obtaining good results.

Table 4 Experimental and calculated data for individual compound log P and log D7.4 values from the different EC-RISM-based approaches. “log D7.4,exp (indirect)” denotes numbers reconstructed from experimental log P and pKa values

Results from log D7.4 predictions are slightly better, being even below our expectation of more than 2 pK units deviation with an RMSE of 1.69 pK units, ranking second (by a very small margin to the third) in the field of challenge participants with the best model (4) reaching 1.27. Here, adding new conformations again slightly worsened results due to weaker performance already observed for pKa values. Scatter is, however, still large, so it is not possible to draw some general performance conclusions for this small and chemically focused dataset. One trend is obvious: Physics-based models such as those analyzed and compared in this work, that perform reasonably well and balanced in different prediction domains, will also perform well in combined model problems such as log D7.4 predictions. Still, log D7.4 remains a challenging property to be examined further in order to understand and improve model weaknesses. There is also room for improvement on the experimental side. We noted in some cases (see Table 4) that originally measured and reconstructed log D7.4 from pKa and log P differ. Although there is apparently no correlation with prediction performance or failure, this could at least stimulate questions to further converge computational representations to match experimental reality.

Concluding discussion

The most remarkable finding in this work is that apparently different conformational search or sampling strategies even for rather small molecules like those of the SAMPL7 set yield quite different results. Time did not permit deeper analysis of individual conformations, but it is clear that extended effort is needed for developing more consistent conformational sampling workflows. It is very likely that the problem originates already from the initial force field sampling stage as further QM-based optimization including a solvation model did not yield converged conformational ensembles.

However, our results show that conformational uncertainties alone are not responsible for the observed errors in thermodynamic quantities, which in our case imply an overestimated hydrophobicity. For water, results appear to be more reliable than for octanol, despite our earlier findings during SAMPL6.1 and SAMPL6.2 from which we expected better performance for log P than for pKa predictions. In light of the different chemistry of SAMPL7 compared to SAMPL6 compounds, this hints at a possibly problematic description of sulfur-octanol interactions which could be related to the QM level of theory and/or sulfur-octanol dispersion interactions that are not modeled by first principle methods but by empirical Lennard–Jones terms. In the SAMPL7 challenge each compound contains a sulfone moiety whereas this functional group is represented by only one single MNSOL database entry, (sulfolane, test2027). This compound was predicted with an error of 4.83 kcal mol−1 for octanol, the largest in the entire training set [5]. For water the error is only 3.63 kcal mol−1, so it is likely that the error cancellation within the same solvent, as seen for the acid/base pair within pKa predictions, does no longer apply for transfer free energies between different solvents. However, more solvent-specific experimental data, such as solvation free energies are necessary to confirm this hypothesis.

Another remarkable observation is that log D values taken directly from experiment or from a reconstruction based on experimental acidity and partition coefficients do not yield identical numbers in all cases. In cases where the two approaches differ significantly, i.e. for SM25, 26, 41–43, the reconstructed distribution coefficient is smaller, i.e. more negative than the direct measurement. This means that possibly a higher amount of the compound is dissolved in the aqueous phase than expected from neutral state partitioning alone if we take the reconstructed data as correct. If we, however, accept the direct experimental result then the opposite conclusion would emerge, namely that a larger compound fraction is dissolved in the organic phase. In other words, this could be interpreted as a missing contribution of charged species in the organic phase in our calculations where, via the standard formula for converting log P to log D, the presence of charged microstates in the nonaqueous phase is by definition excluded. This statement should in any case be viewed with caution as a range of alternative explanations could come into play, such as aggregation, nonideality effects due to insufficient dilution and so forth. However, observed inconsistencies are again a source and stimulus of deeper analysis including the correct agreement between experimental reality and its computational model representation.