Introduction

The partition coefficient P is a widely used quantity for understanding the transport and distribution of chemicals in biological, industrial and environmental systems [1, 2]. It expresses the relative ability of a solute molecule to dissolve in two different solvents, which are immiscible and in contact at an interface. The base-10 logarithm logP is directly related to the Gibbs free energy of transfer \(\Delta G^{\text {transfer}}_{\text{X(B,A)}}\) from solvent A to solvent B using

$$\begin{aligned} -{\rm{logP}}\ln (10) k_{\rm{B}} T= & {} \Delta G^{{\text{transfer}}}_{\text{X(B,A)}} \nonumber \\= & {} \Delta G_{\rm{X(B)}}^{\text{solvation}} - \Delta G_{\rm{X(A)}}^{\text{solvation}} \end{aligned}$$
(1)

where ln(10) is a base conversion factor, \(k_{\rm{B}}\) is Boltzmann’s constant, and T is the temperature. Equation 1 makes clear that logP can also be thought of as a relative solvation free energy: that of solute X in solvent B, \(\Delta G_{\rm{X(B)}}^{\text{solvation}}\), minus that in solvent A, \(\Delta G_{\rm{X(A)}}^{\text{solvation}}\). Values of logP are relatively straightforward to measure by the “Shake-Flask” method, followed by slow-stirring and reverse phase High Performance Liquid Chromatography [3, 4], and recently, by more accurate methods such as potentiometric titration [5]. Nonetheless, they take time and material to measure, often give highly variable results [6] and provide little insight into the values obtained. Thus, predictive methods for logP have a valuable role to play: they can save time, lower costs, and facilitate the more rational development of new chemicals, especially in the pharmaceutical industry with its long and expensive development times.
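As a minimal sketch of how Equation 1 is used in practice, the conversion between a transfer free energy and logP can be written as below. The function names are hypothetical, and the molar form of \(k_{\rm{B}}\) (the gas constant, in kcal mol\(^{-1}\) K\(^{-1}\)) and the default temperature of 298 K are assumptions chosen to match the units used later in the text.

```python
import math

# Assumption: energies are molar, so k_B is used as the gas constant
# in kcal mol^-1 K^-1.
K_B_KCAL = 0.0019872041


def logp_from_transfer_free_energy(dg_transfer, temperature=298.0):
    """Equation 1 rearranged: logP = -dG_transfer / (ln(10) k_B T).

    dg_transfer is the free energy of transfer from solvent A to
    solvent B in kcal/mol; a negative value favours solvent B.
    """
    return -dg_transfer / (math.log(10) * K_B_KCAL * temperature)


def transfer_free_energy_from_logp(logp, temperature=298.0):
    """Inverse of the above: dG_transfer = -logP ln(10) k_B T."""
    return -logp * math.log(10) * K_B_KCAL * temperature


# One logP unit corresponds to roughly 1.36 kcal/mol at 298 K.
print(transfer_free_energy_from_logp(1.0))
```

A negative transfer free energy therefore maps to a positive logP, that is, a preference for solvent B.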

There is now a wide range of methods to predict logP, building on methods to calculate solvation free energy. Firstly, there are many knowledge-based [7, 8] and machine-learning methods [9, 10] which draw on the large amount of logP data available in the literature. Secondly, many continuum solvent models have been developed in combination with electronic-structure methods to calculate solvation free energies, whose difference gives \(\Delta G^{{\text{transfer}}}_{\text{X(B,A)}}\). The most common are the Polarizable Continuum Model (PCM), the series of Solvation Models (SMx), the Solvation Model based on Density (SMD), and the Conductor-like Screening Model (COSMO) [11, 12]. The most accurate are the COSMO-RS and COSMO-SAC methods, which have the further advantage of being applicable to many types of molecules and solvents [12,13,14], as in the COSMOmic variant for micelles and lipid bilayers [15]. Molecular-mechanics methods, which are faster than electronic-structure methods but more approximate, are better suited to calculating logP in explicit solvent. They consider ensembles of configurations generated in molecular dynamics (MD) simulations and require a force field, such as GAFF, GAFF-DC, OPLS-AA or CHARMM, the choice of which affects the value of logP [16, 17], but they mostly have no other parameters. They are most commonly applied in the alchemical formulation, yielding \(\Delta G^{{\text{transfer}}}_{\text{X(B,A)}}\) from the solvation free energies for decoupling the solute from each solvent. Methods such as exponential averaging, Thermodynamic Integration (TI) and the Bennett Acceptance Ratio (BAR) can all yield accurate results [16,17,18,19,20], even with a coarse-grained force field [21]. Less commonly implemented are formulations that yield the free energy of each system directly, whose difference gives \(\Delta G^{{\text{transfer}}}_{\text{X(B,A)}}\). Two widely used methods in biomolecular studies are the Molecular Mechanics-Poisson Boltzmann Surface Area (MM-PBSA) and its Generalized-Born variant (MM-GBSA) [22, 23], but they have not been used to calculate logP and are not as accurate as electronic-structure methods at reproducing solvation free energies in a range of solvents. More successful approaches to calculating logP directly from free energies have been the 3D Reference Interaction Site Model (3D-RISM) [24] and grid-based inhomogeneous solvation theory (GIST) [25]. These methods have the advantage of being general for any kind of solvent free energy but still only account for the solvation contribution.

We have developed a general method to evaluate free energy directly from an MD simulation for all molecules in the system, both solvent and solute alike, and over a large range of length scales [26,27,28]. Called Energy-Entropy Multiscale Cell Correlation (EE-MCC), it takes the energy directly from the simulation and evaluates the entropy over a series of units at multiple length scales, treating units as correlated if they are covalently bonded and as lying in a mean-field cell otherwise. Entropy is combined with energy to give free energy. Notably, entropy is calculated from the probability distribution over all quantum states of the system relating to all degrees of freedom of all molecules. MCC has been progressively developed for liquids [26, 27, 29], solutions [30,31,32,33], chemical reactions [34], and proteins [28, 35, 36]. As well as being general, MCC has the advantage of providing a detailed breakdown of entropy over all degrees of freedom of the system. Here we test MCC to calculate logP and understand the values obtained. We test it on a series of 22 N-acylsulfonamide bioisosteric compounds, shown in Fig. 1, taken from the SAMPL7 Physical Properties Blind Challenge, part of the “Statistical Assessment of the Modelling of Proteins and Ligands” (SAMPL) series.

Fig. 1 Structures of the 22 N-acylsulfonamide bioisosteres in the SAMPL7 Physical Properties Challenge [37]

SAMPL is a series of blind challenges [13, 38,39,40,41,42] that serves to encourage, promote and compare different methods to predict quantities relevant to drug design, such as logP; the experimental data are made publicly available only at the end of the submission period. In SAMPL5, which had the first Physical Properties Blind Challenge [38], the cyclohexane/water distribution coefficient (logD) was challenging to compute for most participants, given that logD depends on logP, protonation state and associated counter-ions. The subsequent SAMPL6 challenges therefore separated the prediction into pKa and logP, which combine to give logD. The top-performing classes of methods were quantum-mechanics, empirical and mixed approaches, while molecular-mechanics results were more variable, given the large differences in simulation protocols. SAMPL7 follows a similar protocol to SAMPL6, and here we seek to calculate only logP values.

Methods

LogP calculation

The water-octanol partition coefficient logP of solute X is defined in Equation 1 in terms of the transfer Gibbs free energy \(\Delta G^{\text {transfer}}_{\text{X(oct,wat)}}\) of X from water to octanol. In the EE method, \(\Delta G^{\text {transfer}}_{\text{X(oct,wat)}}\) is evaluated as the difference of the Gibbs free energies of each system

$$\begin{aligned} \Delta G^{\text {transfer}}_{\text{X(oct,wat)}} = (G_{\text{X(oct)}} + G_{\text{wat}}) - ( G_{\text{oct}} + G_{\text{X(aq)}} ) \end{aligned}$$
(2)

where X(oct) and X(aq) denote X in octanol and in water, respectively, and wat and oct denote the respective pure liquids. The Gibbs free energy of each system is calculated using \(G = H - TS\) where H is the enthalpy, S the entropy and T the temperature. The enthalpy is taken as the energy calculated directly from the potential and kinetic energies in a molecular dynamics (MD) simulation, ignoring the pressure-volume term, which is small at ambient pressure and in any case almost entirely cancels in the transfer process. Entropy is calculated using MCC [26, 28, 43], explained next.
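The bookkeeping of Equation 2 can be sketched as follows. This is only an illustration under stated assumptions: the `SystemAverages` container, the function names and the numerical values are hypothetical and are not taken from the simulations reported here.

```python
from dataclasses import dataclass


@dataclass
class SystemAverages:
    """Simulation averages for one system (hypothetical container)."""
    enthalpy: float  # kcal/mol, potential + kinetic energy (PV term ignored)
    entropy: float   # kcal/mol/K, e.g. from MCC


def gibbs(system, temperature=298.0):
    """G = H - T S for a single system."""
    return system.enthalpy - temperature * system.entropy


def transfer_free_energy(x_oct, wat, oct_, x_aq, temperature=298.0):
    """Equation 2: (G_X(oct) + G_wat) - (G_oct + G_X(aq))."""
    products = gibbs(x_oct, temperature) + gibbs(wat, temperature)
    reactants = gibbs(oct_, temperature) + gibbs(x_aq, temperature)
    return products - reactants


# Illustrative numbers only, chosen to show the small-difference-of-large-
# numbers character of the calculation.
dg = transfer_free_energy(
    x_oct=SystemAverages(-1010.0, 1.000),
    wat=SystemAverages(-500.0, 0.500),
    oct_=SystemAverages(-1000.0, 0.990),
    x_aq=SystemAverages(-508.0, 0.505),
)
print(dg)
```

Each term entering the difference is large compared with the result, which is why small relative errors in the individual energies matter so much.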

Multiscale cell correlation (MCC)

Entropy is calculated from MD simulations in a multiscale fashion in terms of cells of correlated units. The total entropy is calculated as a sum of components \(S_{ab}^{cd}\) using

$$\begin{aligned} S = \sum _a^\text {molecule} \sum _b^\text {level} \sum _c^\text {motion} \sum _d^\text {minima} S_{ab}^{cd} \end{aligned}$$
(3)

In this equation, S is calculated for each kind of molecule a, at different length scales b of each molecule, in terms of translational or rotational motion c over all units at that level, and in terms of vibration or topography d for each type of motion.

Molecular entropy

The relevant molecules for a logP calculation are the solutes and the solvents water and octanol. We only consider pure solvents here, neglecting the small dissolution of water in octanol that occurs in experiment. In the solutions, only the solvent molecules in the first solvation shell of the solute are considered, because the entropies of the remaining solvent molecules change little upon solute transfer and because their sum over so many molecules is not well converged. Solvation shells are defined using the Relative Angular Distance (RAD) algorithm [44, 45] based on the center-of-mass of each molecule. In each pure liquid, the same number of solvent molecules is considered as in the solute’s first solvation shell to balance stoichiometry, but the averaging of data is done over all molecules in the pure liquid to give better statistics.

Entropy for each level

For the solutes and octanol, two levels of hierarchy are used: molecule (M) and united atom (UA), where a united atom is each non-hydrogen atom with all its bonded hydrogens as a single rigid body. Water molecules are treated only at the molecule level, which is equivalent to the united-atom level.

Entropy for each type of motion

The axes of a molecule are taken as its principal axes with the origin at the molecular center of mass. All molecules considered here, being non-linear, have three translational and three rotational degrees of freedom. The origin of a united atom is taken as the heavy atom and the axes are defined with respect to the covalent bonds to other heavy atoms [26]. A united atom has three translational degrees of freedom, and its number of rotational degrees of freedom is three if it is non-linear (\(\ge 2\) hydrogens), two if it is linear (one hydrogen), and zero if it is a point (no hydrogens).
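The counting rule for united atoms stated above can be expressed as a small lookup. The function name is hypothetical and the sketch simply encodes the rule as written in the text.

```python
def united_atom_degrees_of_freedom(n_hydrogens):
    """Return (translational, rotational) degrees of freedom of a united atom.

    A united atom is a heavy atom plus its bonded hydrogens treated as a
    rigid body: a point (no H), linear (one H) or non-linear (two or more H).
    """
    if n_hydrogens < 0:
        raise ValueError("number of hydrogens cannot be negative")
    if n_hydrogens == 0:
        rotational = 0   # point: no rotational degrees of freedom
    elif n_hydrogens == 1:
        rotational = 2   # linear: two rotational degrees of freedom
    else:
        rotational = 3   # non-linear: three rotational degrees of freedom
    return 3, rotational


# Examples: a methyl carbon, a hydroxyl oxygen and a carbonyl carbon.
for label, n_h in [("CH3", 3), ("OH", 1), ("C=O carbon", 0)]:
    print(label, united_atom_degrees_of_freedom(n_h))
```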

Entropy over minima

The potential energy surface is discretised into energy wells, leading to two contributions: vibrational, related to the average size of energy wells for that unit, and topographical, linked to the probability of each energy well for that unit. Vibrational entropy of each kind of motion and unit is calculated in the harmonic approximation for a quantum harmonic oscillator

$$\begin{aligned} S^{\rm{vib}}=k_{{\rm{B}}} \sum _{i=1}^{N_{\rm{vib}}}\left( \frac{h v_{i} / k_{{\rm{B}}} T}{\rm{e}^{h v_{i} / k_{{\rm{B}}} T}-1}-\ln \left( 1-\rm{e}^{-h v_{i} / k_{{\rm{B}}} T}\right) \right) \end{aligned}$$
(4)

where h is Planck’s constant, \(N_\text {vib}\) is the number of vibrations, and \(v_i\) are the vibrational frequencies, which are derived using

$$\begin{aligned} v_{i}=\frac{1}{2 \pi } \sqrt{\frac{\lambda _{i}}{k_{{\rm{B}}} T}} \end{aligned}$$
(5)

where \(\lambda _i\) are the eigenvalues of the \(N_{\text{transvib}} \times N_{\text{transvib}}\) mass-weighted force covariance matrix for translational vibration or \(N_{\text{rovib}} \times N_{\text{rovib}}\) moment-of-inertia-weighted torque covariance matrix for rotational vibration. Forces and torques are halved in the mean-field approximation except for the UA force covariance matrix [26, 27, 43, 46] because UA correlations are directly accounted for in the molecule reference frame. The six lowest-frequency vibrations for the UA force covariance matrix are removed to avoid double-counting entropy at the molecule level.
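A minimal sketch of Equations 4 and 5 is given below. It assumes that the eigenvalues of the mass-weighted force (or moment-of-inertia-weighted torque) covariance matrix have already been obtained and are supplied in SI units; the diagonalisation, the halving of forces and the removal of the six lowest-frequency united-atom modes are not shown. The function names are hypothetical, and skipping non-positive eigenvalues is an assumption made here for numerical safety.

```python
import math

H_PLANCK = 6.62607015e-34   # J s
K_B = 1.380649e-23          # J/K
N_A = 6.02214076e23         # 1/mol


def frequencies_from_eigenvalues(eigenvalues, temperature=298.0):
    """Equation 5: nu_i = (1 / 2 pi) sqrt(lambda_i / (k_B T)), in Hz.

    Eigenvalues are assumed to be in SI units (N^2 kg^-1 for the
    mass-weighted force covariance). Non-positive values are skipped.
    """
    kt = K_B * temperature
    return [math.sqrt(lam / kt) / (2.0 * math.pi)
            for lam in eigenvalues if lam > 0.0]


def vibrational_entropy(frequencies, temperature=298.0):
    """Equation 4: quantum harmonic oscillator entropy in J K^-1 mol^-1."""
    total = 0.0
    for nu in frequencies:
        x = H_PLANCK * nu / (K_B * temperature)
        # x / (e^x - 1) - ln(1 - e^-x), using expm1/log1p for accuracy
        total += x / math.expm1(x) - math.log1p(-math.exp(-x))
    return K_B * N_A * total


# Example: a single mode with h nu = k_B T at 298 K.
nu_example = K_B * 298.0 / H_PLANCK
print(vibrational_entropy([nu_example]))
```

For a classical harmonic oscillator the mean-square force is the force constant times \(k_{\rm{B}} T\), so the mass-weighted eigenvalue divided by \(k_{\rm{B}} T\) is the squared angular frequency, which is why Equation 5 recovers the frequency.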

Topographical entropy at the molecule level manifests as positional and orientational entropy for translation and rotation. At the united-atom level it is only conformational entropy for translation, because rotational topographical entropy of united atoms is assumed to be negligible due to rigidity, symmetry or strong correlation with the solvent. Positional entropy for a dilute solute in a solvent is calculated by discretising the volume \(V^\circ \) available to the molecule at its concentration by the volume of a solvent molecule \(V_{\text{solvent}}\), giving [30, 31, 47]

$${S^{\text{transtopo}}_{\text{M}} \equiv S^{\text{pos}}} = k_{\text{B}} \ln \frac{V^\circ }{V_{\text{solvent}}} $$
(6)

\(V_{\text{solvent}}\) is taken as the volume of a simulation box of pure solvent divided by the number of solvent molecules, and \(V^\circ \) is taken as the same in both solvents and so cancels for the partition coefficient. Orientational entropy is calculated by discretising the rotational volume of the molecule about its three rotational axes according to the number of molecules in the molecule’s first solvation shell \(N_{\text{c}}\) [26, 27], weighted by the probability \(p(N_{\text{c}})\) of each \(N_{\text{c}}\) using

$$\begin{aligned}&{S^{\text{rotopo}}_{\text{M}} \equiv S^{\text{ or } }}\nonumber \\&\quad =k_{{\rm{B}}} \sum _{N_{{\text{c}}}} p\left( N_{{\text{c}}}\right) \ln \left[ \max \left( 1,\left( N_{{\text{c}}}^{3} \pi \right) ^{1/2} / \sigma \right) \right] \end{aligned}$$
(7)

where taking the maximum ensures that the number of orientations is at least 1, and \(\sigma \) is the symmetry number of the molecule, taken as 1 for octanol and the 22 solutes and 2 for water. First-shell molecules are defined using the RAD algorithm [44, 45], as used above when defining the solvent affected by the solute. For water, an additional factor of 1/4 is included inside the logarithm of Equation 7 to account for correlations arising from hydrogen-bond directionality [26]. Conformational entropy is calculated using

$$\begin{aligned} {S^{\text{transtopo}}_{\text{UA}} \equiv } S^{\text{ conf }} = k_{{\rm{B}}} \sum _{i} \lambda _{i} \ln {\left( \frac{1}{\lambda _{i}}\right) } \end{aligned}$$
(8)

where \(\lambda _i\) are the eigenvalues of a \(N_{\text{conf}} \times N_{\text{conf}}\) correlation matrix of conformations [27]. \(N_{\text{conf}}\) is the number of conformations over all flexible dihedrals in the molecule involving united-atoms, whose number ranged from 3 to 6 for the solutes. Conformations for each flexible dihedral are defined from the maxima in their probability distribution. The correlation matrix accounts for correlations between different dihedrals within the same molecule.
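The two topographical terms of Equations 7 and 8 can be sketched as below, in molar units. The function names are hypothetical; the coordination-number probabilities and the eigenvalues of the conformation correlation matrix are assumed to be given. Where exactly the extra factor of 1/4 for water enters relative to the maximum is not specified above, so applying it inside the maximum is an assumption of this sketch.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1 (molar form of k_B)


def orientational_entropy(shell_probabilities, symmetry_number=1,
                          extra_factor=1.0):
    """Equation 7 in J K^-1 mol^-1.

    shell_probabilities maps coordination number N_c to its probability
    p(N_c). extra_factor is 1/4 for water (assumed to act inside the max).
    """
    if abs(sum(shell_probabilities.values()) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    total = 0.0
    for n_c, p in shell_probabilities.items():
        n_orientations = (math.sqrt(n_c ** 3 * math.pi)
                          / symmetry_number * extra_factor)
        total += p * math.log(max(1.0, n_orientations))
    return R * total


def conformational_entropy(eigenvalues):
    """Equation 8 in J K^-1 mol^-1; zero eigenvalues contribute nothing."""
    return R * sum(lam * math.log(1.0 / lam)
                   for lam in eigenvalues if lam > 0.0)


# Example: a water-like molecule with 4 or 5 neighbours, and a molecule
# with two equally likely, uncorrelated conformations.
print(orientational_entropy({4: 0.6, 5: 0.4}, symmetry_number=2,
                            extra_factor=0.25))
print(conformational_entropy([0.5, 0.5]))
```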

Assembling all these terms, Equation 3 written in full for total entropy of the water solutions up to the first solvation shell of solute X becomes

$$\begin{aligned}&S_{\text{X(aq)}} = S_{\text{X,M}}^{\text{transvib}} +S_{\text{X,M}}^{\text{rovib}} + S_{\text{X}}^{\text{pos}} + S_{\text{X}}^{\text{or}} + S_{\text{X,UA}}^{\text{transvib}} +S_{\text{X,UA}}^{\text{rovib}} \nonumber \\&\quad + S_{\text{X}}^{\text{conf}} +N_{\text{c,X}} \left( S_{\text{wat,M}}^{\text{transvib}} +S_{\text{wat,M}}^{\text{rovib}} + S_{\text{wat}}^{\text{or}} \right) \end{aligned}$$
(9)

and for octanol solutions

$$\begin{aligned}&S_{\text{X(oct)}} = S_{\text{X,M}}^{\text{transvib}} +S_{\text{X,M}}^{\text{rovib}} + S_{\text{X}}^{\text{pos}} + S_{\text{X}}^{\text{or}} + S_{\text{X,UA}}^{\text{transvib}} +S_{\text{X,UA}}^{\text{rovib}} \nonumber \\&\quad + S_{\text{X}}^{\text{conf}} + N_{\text{c,X}} \left( S_{\text{oct,M}}^{\text{transvib}} +S_{\text{oct,M}}^{\text{rovib}} + S_{\text{oct}}^{\text{or}} + S_{\text{oct,UA}}^{\text{transvib}} \right. \nonumber \\&\left. + S_{\text{oct,UA}}^{\text{rovib}} + S_{\text{oct}}^{\text{conf}} \right) \end{aligned}$$
(10)

The corresponding equations for the pure liquids are the same but omit the solute terms.
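The assembly in Equations 9 and 10 amounts to adding the solute terms once and the per-molecule solvent terms \(N_{\text{c,X}}\) times. A minimal sketch, with hypothetical dictionary keys and purely illustrative values, is:

```python
def first_shell_entropy(solute_terms, solvent_terms, n_shell):
    """Total entropy up to the first solvation shell (Equations 9 and 10).

    solute_terms:  entropy components of the solute (empty for pure liquid)
    solvent_terms: per-molecule entropy components of the solvent
    n_shell:       number of solvent molecules in the first shell, N_c,X
    """
    return sum(solute_terms.values()) + n_shell * sum(solvent_terms.values())


# Illustrative values only (J K^-1 mol^-1); keys are hypothetical labels.
solute = {"transvib_M": 50.0, "rovib_M": 40.0, "pos": 30.0}
water = {"transvib_M": 20.0, "rovib_M": 10.0, "or": 5.0}

solution_total = first_shell_entropy(solute, water, n_shell=10)
pure_total = first_shell_entropy({}, water, n_shell=10)
print(solution_total, pure_total)
```

Passing an empty solute dictionary gives the pure-liquid counterpart with the same number of solvent molecules, which is what keeps the stoichiometry of Equation 2 balanced.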

Simulation protocol

The pdb files for the 22 solutes were constructed using Avogadro [48] from their SMILES strings provided in the SAMPL7 GitHub repository [37]. They are labelled SM25 to SM46. Only the neutral tautomer (micro000) was considered for each solute. Four kinds of simulation box were prepared: pure water, pure octanol, one solute in water, and one solute in octanol. Cubic boxes with side \(\approx \)34 Å were created using Packmol [49] for both pure solvents and solutions, containing either 150 octanol molecules or 1300 water molecules, plus 1 solute molecule per box in the case of the solutions. Simulations were set up using antechamber [50] and LEaP in AmberTools 18 [51] with the GAFF force field with AM1-BCC charges [52] for octanol and the solutes and TIP4P-Ew [53] for water. All systems were equilibrated with 5000 steps of steepest-descent minimisation, 200 ps of NVT (constant number, volume and temperature) MD simulation at 298 K using a Langevin thermostat with a collision frequency of 5.0 ps\(^{-1}\), followed by 25 ns of NPT (constant number, pressure and temperature) simulation at a pressure of 1 bar using the Berendsen barostat [54] with a relaxation time constant of 2 ps. Data collection was run for 100 ns, saving data every 40 ps, giving 2500 frames for analysis. MD simulations were run using pmemd.cuda in AMBER 18 [55,56,57] with a 10 Å cut-off for non-bonded interactions, a time step of 2 fs and the SHAKE algorithm for covalent bonds involving hydrogen. Simulations lasted 5–8 hours on 8 CPU cores or 1 GPU.
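The durations quoted above translate into step counts as in the following sketch, which uses integer arithmetic in femtoseconds. The variable and function names are hypothetical; only the durations and the 2 fs time step come from the protocol.

```python
TIME_STEP_FS = 2  # MD time step in femtoseconds


def n_steps(duration_fs):
    """Number of MD steps needed to cover a duration given in fs."""
    steps, remainder = divmod(duration_fs, TIME_STEP_FS)
    if remainder:
        raise ValueError("duration is not a whole number of time steps")
    return steps


nvt_steps = n_steps(200 * 1_000)             # 200 ps NVT equilibration
npt_steps = n_steps(25 * 1_000_000)          # 25 ns NPT equilibration
production_steps = n_steps(100 * 1_000_000)  # 100 ns data collection
save_interval_steps = n_steps(40 * 1_000)    # one frame every 40 ps

n_frames = production_steps // save_interval_steps
print(nvt_steps, npt_steps, production_steps, save_interval_steps, n_frames)
```

The frame count obtained this way agrees with the 2500 frames stated for analysis.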

Performance assessment

The performance of the MD-based EE-MCC method to obtain logP values for the SAMPL7-logP data set is assessed by calculating the mean absolute error (MAE) and the root-mean-square error (RMSE) defined as

$$\begin{aligned} \rm{MAE}= N^{-1} \sum _{j}\left| \Delta _{j}\right| \end{aligned}$$
(11)
$$\begin{aligned} \rm{RMSE}=\sqrt{N^{-1} \sum _{j} \Delta _{j}^{2}} \end{aligned}$$
(12)

where \(\Delta _j = {\text{logP}}_{{\text{EE-MCC}},j} - {\text{logP}}_{{\text{experiment}},j}\) for the j-th value and N is the total number of values analysed. Each simulation was done in triplicate to assess the statistical uncertainty of the model, yielding a Standard Error of the Mean (SEM) calculated as

$$\begin{aligned} \text {SEM}=\frac{s}{\sqrt{n}} \end{aligned}$$
(13)

where s is the standard deviation and n the number of repetitions. The final energies and entropies are averaged over the values from all three simulations.
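Equations 11 to 13 can be sketched with the standard library alone. The function names are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) for s is an assumption, since the text does not specify which form was used.

```python
import math
import statistics


def mean_absolute_error(predicted, reference):
    """Equation 11: MAE = N^-1 sum_j |Delta_j|."""
    deltas = [p - r for p, r in zip(predicted, reference)]
    return sum(abs(d) for d in deltas) / len(deltas)


def root_mean_square_error(predicted, reference):
    """Equation 12: RMSE = sqrt(N^-1 sum_j Delta_j^2)."""
    deltas = [p - r for p, r in zip(predicted, reference)]
    return math.sqrt(sum(d * d for d in deltas) / len(deltas))


def standard_error_of_mean(values):
    """Equation 13: SEM = s / sqrt(n), with s the sample standard deviation
    (an assumption; the population form would use statistics.pstdev)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Illustrative numbers only, not data from this study.
predicted = [1.0, 2.0, 3.0]
reference = [1.0, 1.0, 5.0]
print(mean_absolute_error(predicted, reference))
print(root_mean_square_error(predicted, reference))
print(standard_error_of_mean([1.0, 2.0, 3.0]))  # e.g. three repeat runs
```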

The model uncertainty is 1.3 kcal \(\rm{mol}^{-1}\), based on the root-mean-square error of the energy due to GAFF as found in the literature [58], which corresponds to an uncertainty in logP of 0.95. This can be used to assess the accuracy of the method prior to comparison with experimental measurements.
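As a rough check of this conversion, dividing the energy uncertainty by the factor \(\ln (10) k_{\rm{B}} T\) from Equation 1, and assuming molar units with \(k_{\rm{B}} T \approx 0.592\) kcal \(\rm{mol}^{-1}\) at 298 K, gives

$$\begin{aligned} \delta {\rm{logP}} = \frac{\delta G}{\ln (10)\, k_{\rm{B}} T} \approx \frac{1.3}{2.303 \times 0.592} \approx 0.95 \end{aligned}$$

where \(\delta G\) denotes the uncertainty in the transfer free energy, a symbol introduced here only for this check.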

Results and discussion

LogP values versus experiment

The octanol–water logP values computed by EE-MCC using Equations 1, 2 and 3 are presented in Fig. 2 versus experiment for all 22 SAMPL7 compounds, together with the error metrics MAE, RMSE and SEM given by Equations 11–13.

Fig. 2 EE-MCC octanol–water logP values versus experiment with SEM error bars for the 22 solutes

The predicted logP values are of the right order of magnitude, but the correlation with experiment is weak and the range of predicted values, from \(-2\) to 5, exceeds the experimental range of 0.5 to 3. Evidently, there are sizeable sources of error. To probe this further, Table 1 lists the predicted and experimental logP values, together with the corresponding \(\Delta H\), \(T\Delta S\) and \(\Delta G\) values (see Tables S4 and S5 for the actual simulation values).

Table 1 \(\Delta H\), \(T\Delta S\) and \(\Delta G\) (kcal mol\(^{-1}\)) and computed and experimental octanol–water logP values for the 22 solutes

Table 1 makes clear that the larger contribution to \(\Delta G^{\text{transfer}}_{\text{X(oct,wat)}}\) comes from the enthalpy rather than the entropy, although there are cases where entropy dominates, such as SM27, SM29 or SM40. \(\Delta H^{\text{transfer}}_{\text{X(oct,wat)}}\) is mostly negative and \(T\Delta S^{\text{transfer}}_{\text{X(oct,wat)}}\) is mostly positive, consistent with the favourable transfer of the solutes to octanol. The large size of the fluctuations in enthalpy is evident from the SEM over the simulation repetitions: averaged over the solutes, the SEM of \(\Delta H^{\text{transfer}}_{\text{X(oct,wat)}}\) is 1.47 kcal mol\(^{-1}\), compared with 0.31 kcal mol\(^{-1}\) for \(T\Delta S^{\text{transfer}}_{\text{X(oct,wat)}}\), demonstrating that energy fluctuations are more responsible for deviations from experiment than the entropy calculated by MCC. Indeed, Table S1 lists the SEMs for the enthalpy and entropy changes for the individual solutes and shows that the SEM on the total enthalpy ranges from 0.4 to 2.7 kcal mol\(^{-1}\) across the solutes. This is comparable in size to \(\Delta H^{\text{transfer}}_{\text{X(oct,wat)}}\) itself, even for simulations on the order of 100 ns for fairly small system sizes. Even though energies appear well converged in time (Figs. S1 and S2), this suggests that longer simulations, more repetitions, or more frequent saving of output would be needed to drive down the errors in energy. Lower errors could also be achieved by considering the energy only of the solvent molecules in the solute’s solvation shell, but this quantity was not readily available from the standard energy output of AMBER. Alternatively, mapped averaging, a recent method developed by Kofke and co-workers [59,60,61], could substantially reduce the noise in these values once adapted to liquids.

Entropy components

Even though the logP values produced have substantial errors, largely because of statistical errors in the energy, the MCC components can be used to better understand how the entropy and associated molecular flexibility is being affected for all molecules, solute and solvent, in the transfer process. We first consider changes in the entropy components. Figure 3 illustrates the changes in each entropy component in the transfer of each solute from water to octanol.

Fig. 3 Changes in entropy components as given in Eqs. 9 and 10 for water (top), octanol (middle) and the solutes (bottom). The molecule-level changes are blue for water, red for octanol, and green for the solutes. The united-atom changes are coloured orange for octanol and pink for the solutes. Each of these components is subdivided further into transvibrational, rovibrational and topographical components at each level, indicated by shading from dark to light, respectively

Data in each case are shown for only one of the three simulations. The most striking trend as each solute moves from water to octanol is the entropy gain of water and the entropy loss of octanol, with the latter in general being slightly smaller in magnitude. The change in water is well known, particularly for hydrophobic molecules. The component analysis shows that the entropy gain of water is primarily orientational but offset partially by decreases in transvibrational and rovibrational entropy, consistent with earlier studies [30,31,32,33]. This is because water surrounded by water has more neighbours able to form hydrogen bonds and the hydrogen bonds are stronger. The change for octanol is less well known but not unexpected, given that the reduction in symmetry for molecules adjacent to solutes tends to constrain solvent molecules. The component analysis shows that essentially all terms are negative. Most of the decrease is orientational, indicating that octanol molecules have disrupted structure and fewer neighbours in the presence of the solute. There are smaller losses in united-atom topographical entropy, which is conformational, and in molecule vibration, with smaller reductions still in united-atom rovibration but a tiny gain in united-atom transvibration. The changes in solute entropy are smaller and variable in direction, indicating that the solvent dominates the change in overall entropy. Most solutes have a loss in united-atom conformational entropy and a gain in molecular entropy, primarily orientational but also vibrational. Changes for united-atom vibration are more variable. One term left out of this plot is the change in positional entropy. For dilute solutions this depends only on the molecular volumes of the solvents and so has a constant value of \(-18\) J K\(^{-1}\) mol\(^{-1}\), reflecting that there are fewer solute positions in octanol at a given concentration because of the larger volume of the octanol molecule.
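The constant positional term can be checked from Equation 6, since \(V^\circ \) cancels and only the ratio of solvent molecular volumes remains. The sketch below assumes the nominal box side of 34 Å and the molecule counts from the simulation setup; the equilibrated NPT box volumes would differ slightly, so this is an estimate rather than the exact calculation. The function name is hypothetical.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1 (molar form of k_B)


def positional_entropy_change(v_solvent_from, v_solvent_to):
    """Change in Equation 6 on moving a dilute solute between solvents.

    S_pos = k_B ln(V0 / V_solvent), so at equal V0 the change is
    k_B ln(V_from / V_to); volumes are per solvent molecule.
    """
    return R * math.log(v_solvent_from / v_solvent_to)


# Assumption: both pure-solvent boxes have the nominal 34 Angstrom side.
box_volume = 34.0 ** 3            # cubic Angstrom
v_water = box_volume / 1300       # volume per water molecule
v_octanol = box_volume / 150      # volume per octanol molecule

delta_s_pos = positional_entropy_change(v_water, v_octanol)
print(round(delta_s_pos, 1))  # close to -18 J K^-1 mol^-1
```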

A greater understanding of the components comes from looking at the absolute entropies. Fig. 4 illustrates the entropy components for the 22 solutes in octanol and in water and Fig. 5 shows the corresponding entropy components for all solvent molecules in the first solvation shell of each solute for water or octanol as solvent. Data for each solute are shown for only one of the three simulations. The corresponding values of the entropy components are given in Figs. S6 and S7 and their SEMs are given in Tables S2 and S3. The most obvious difference between Figs. 4 and 5 is that the total entropy of the first-shell solvent is much larger than that of the solute, being \(\sim \)5 times larger for water and \(\sim \)14 times larger for octanol. This is one of the main reasons why the entropy of the solvent dominates the overall entropy change. The next clear trend is that the changes in entropy going from water to octanol, given explicitly in Fig. 3, are tiny compared to the total entropy values. As for energy in EE methods, the changes are small differences between large and comparable numbers. Nonetheless, the errors in the entropy components are much smaller than those in the energy, as noted earlier. The plots show that the vibrational entropy contributes the most to the total entropy for all compounds while topographical entropy contributes the least, consistent with earlier work [26,27,28,29,30,31,32,33, 35]. The molecule-level vibrational entropy is near-identical for all solutes but varies slightly for the surrounding solvent. The united-atom entropy terms are larger and more variable, both for the solutes and for octanol.

Fig. 4 Total entropy and entropy components of each solute in octanol (left) and water (right). Components are coloured as for Fig. 3 for the molecule and united-atom levels and transvibrational, rovibrational, and topographical components

Fig. 5 Total entropy and entropy components for all the solvent molecules in the solvation shell of each solute (right) and the equivalent contribution of bulk solvent without solute (left). Colouring is as for Fig. 3 for the molecule and united-atom levels and transvibrational, rovibrational, and topographical components

Concerning the entropy of the different bioisosteric solutes in Fig. 1, there is a general dependence on the size of each solute, with SM39 having the largest entropy and SM44 the smallest. All but the first four solutes can be divided into six groups, each of which has three compounds which differ by a methyl, phenyl or dimethylamine functional group attached to the sulfonyl group. The groups are G1 = SM29-SM31, G2 = SM32-SM34, G3 = SM35-SM37, G4 = SM38-SM40, G5 = SM41-SM43 and G6 = SM44-SM46. A recurring trend within each group that is evident in Fig. 4 is that the entropy of the solute with methyl is smaller than the other two solutes because of methyl’s smaller size. Another distinctive trend in the solute entropies in Fig. 4 is the lower entropies of the G5 and G6 groups of molecules. This occurs because these molecules are smaller and less flexible, primarily because they have a heteroaryl ring in place of the ethyl fragment that connects the common phenyl ring. However, these trends for the solutes do not carry over to the solvent entropy terms, the changes in entropy or to the overall logP values.

Conclusions

The EE-MCC method to calculate the free energy of a system directly from MD simulation has been used to calculate the octanol–water logP values of 22 N-acylsulfonamide bioisosteres in the SAMPL7 Physical Properties Challenge. The mean error versus experiment was 1.8 logP units and the standard error of the mean was 1.0 logP units for three separate calculations. These errors are primarily due to the difficulty of obtaining sufficiently converged energies to give accurate differences of large numbers, particularly for solvent molecules of large size and flexibility such as octanol. However, this is also an issue for entropy. Other sources of error are approximations in the force field and MCC theory, the neglect of water in the octanol phase, and different tautomeric states of the solute. The main advantages of EE-MCC are its wide applicability to many systems and that it explains the entropy in terms of all the degrees of freedom and all molecules in the system in a consistent and intuitive framework, which is superior to standard structural methods that only assess molecular flexibility for a subset of all degrees of freedom. The enthalpy of transfer from water to octanol is mostly favourable, consistent with the hydrophobic nature of the solutes. The predominant gain in entropy comes mostly from a large increase in the orientational entropy of water and a small increase in solute vibrational and orientational entropy. This is offset by unfavourable changes in the orientational entropy of octanol, the vibrational entropy of both solvents, and the positional and conformational entropy of the solute. This study makes clear the feasibility of Energy-Entropy methods for logP calculations, which areas need improvement, and how they might be applied to other systems more generally.