Introduction

“Water, water, every where, ne any drop to drink”, the ancient mariner cries out in Samuel Coleridge’s poem [1,2,3,4,5]. And, in fact, water is one of the most ubiquitous molecules on earth. 71% of earth’s surface are covered by water, and most biochemical reactions occur in aqueous solution [6, 7]. However, most water is not pure, but part of a solution. Sea water is undrinkable due to its high salt concentration, and the human serum contains a multitude of organic and inorganic solutes. Furthermore, salt strongly impacts the structure and function of biomolecules, such as proteins [8] or DNA [9, 10].

There has been large scientific interest in predicting the interactions of water with dissolved substances [9]. In the field of environmental research, solubility of organic compounds in sea water determines their evaporation as well as their bioaccumulation [11]. In pharmaceutical research, hydration properties of large molecules are fundamental to the understanding of aggregation [12], solubility [13], viscosity, and protein expression.

Many biochemically relevant processes depend not only on the solute-water interaction, but also on the salt concentration. For example, ions influence the melting temperature [14] and conformations [15] of DNA, as well as the stability of proteins in solution, an effect that is described by the Hofmeister series [16]. They also affect many biochemical assays, such as Hydrophobic Interaction Chromatography (HIC) [17], which is routinely applied to detect surface hydrophobicity of biopharmaceuticals [18], and utilizes a salt gradient to control elution. It has been shown that results from Affinity-Capture Self-Interaction Nanoparticle Spectroscopy (AC-SINS) [19] correlate with HIC experiments only when sufficient salt concentrations are used. This is expected since long-range electrostatic interactions are shielded, while hydration of hydrophobic side chains as well as the protein backbone is disfavored by the salting-out effect [20].

To improve the understanding of biomolecular solvation, it is crucial to develop computational methods that can describe the effect of salts on hydration properties. For in-depth analysis of solvation effects, it is desirable that those methods provide separate contributions of energy and entropy to the free energy of solvation, as well as a spatial resolution of the computed properties [21].

In this work, we extend the well-known GIST [22] method, which computes spatially resolved contributions to the free energy of solvation based on molecular dynamics (MD) simulations, to take salt effects into account. In the following, we will give a short introduction to the salting-out effect, before reviewing existing computational approaches to describe molecular solvation properties.

The salting-out effect

Setschenow showed a linear correlation between the logarithm of the solubility S of a compound and the salt concentration in the solvent [23], called the Setschenow relation.

$$\mathrm{log}\frac{S}{{S}_{0}}={K}_{S}\times {c}_{\text{salt}}$$
(1)

here, KS is the salting-out constant and depends on the nature of the solute as well as the solvent, and c is the salt concentration. S0 is the solubility in of the compound in pure water.

Some trends have been observed regarding KS. Generally, small and highly charged ions tend to reduce the solubility of hydrophobic solutes, while larger ions are less strongly hydrated and tend to be more favoring to hydration of hydrophobic compounds [24, 25]. Charged and polar compounds tend to have a weaker salting-out effect, or even a salting-in effect (negative KS) [24]. There is also an approximately linear dependence between the size of a solute and its KS.

Multiple methods, including scaled-particle theory [26], Kirkwood–Buff theory [27], and test particle insertion [28, 29], have been used to describe the salting-out effect of nonpolar gases.

Here, we use the salting-out effect as a test system for our generalized GIST approach, since it can be directly linked to the free energy of solvation of a compound in dependence of the salt concentration. For more details, we refer to the Supporting Information (SI).

Computational approaches to solvation

In the field of large biomolecules, most methods dispense with any physical description of the solvent, instead treating quantities like solubility or hydrophobicity as intrinsic to the solute. These tools have proven useful at highly complex tasks such as reducing the aggregation propensity of antibodies [30, 31]. However, they do not provide a physical description of the solvent and are therefore inadequate to describe the effect of the solvent composition and salt concentration.

The solubility of organic substances is frequently described using implicit solvent models, that treat both the water and the dissolved ions as a homogeneous continuum. Those methods provide a simple, yet effective approach to model different salt concentrations. For example, the statistical associating fluid theory (SAFT) [32], as well as the conductor-like screening model (COSMO) in combination with quantum mechanics [33, 34] have been used successfully to predict the effect of salt concentration on solubility. However, it has been shown that the accuracy of implicit solvent models is limited compared to explicit methods based on MDs simulation [35, 36].

Multiple theories have been developed to describe solvation of hydrophobic molecules, many of which are able to incorporate salts as part of the solvent. Notable examples include scaled particle theory [37], Kirkwood–Buff theory [27, 38], as well as approaches based in information theory [39] and test particle insertion [27, 40]. Typically, those methods focus on hydration of purely hydrophobic compounds, though they might be complemented by additional energy terms [41].

Thermodynamic integration (TI) [42,43,44] is a method that calculates free energy differences based on alchemical transformations, such as replacement of a chemical group. It can accurately compute absolute hydration free energies by slowly removing the solute from the simulation box. This is a very general method and has been successfully applied to solvents containing ions [45]. However, care has to be taken when the solute molecule carries a net charge [46,47,48]. Furthermore, the applicability of TI is limited to rather small molecules.

Methods that explicitly model the statistical mechanics of the solvent can be classified broadly into those based on the Ornstein–Zernike (OZ) equation [49] and those based on inhomogeneous solvation theory (IST) [50]. Methods based on the OZ equation, such as 3D-RISM [51, 52], compute the distribution of particle densities from a molecular mechanics force field in combination with the pair distribution function, and can treat both pure solvents and salt–water mixtures [53]. It has also been shown that 3D-RISM can be used to compute salting-out coefficients [54].

In contrast, methods based on IST [50, 55] use an ensemble of solvent conformations, typically obtained from MD simulation, to estimate the partition sum of the solvent molecules in the proximity of the solute. A discretized implementation of IST, called grid inhomogeneous solvation theory (GIST), has been applied to a wide range of biomolecular systems, such as solvation thermodynamics of the cucurbit[7]uril receptor [22], unfavorable hydration sites on the surface of proteins in the context of protein–drug interactions, binding characteristics of serine proteases [56], or hydrophobicity of antibodies [57]. Recent developments include improved performance due to GPU-acceleration [56] or PME-based energy calculation [58], as well as an extension to solvents other than water [59].

Further methods based on IST include WaterMap [60], SSTMap [61] and STOW [62]. However, none of the IST-based methods has been extended to account for the presence of salt in the solvent. This is problematic not only due to the large number of biochemical processes that are affected by the salt content of the solvent, but also because MDs simulations of charged systems are commonly performed using counter ions. Prior GIST studies either did not use counter ions [57], or restrained them as part of the solute [63].

Here, we present an extension of the GIST algorithm that explicitly treats ions as part of the solvent and test its validity on the solvation free energy of carbazole in 1 M NaCl solution. We compute first and second order entropies and test the viability of the Kirkwood superposition approximation (KSA) for a simplified computation of second order entropies. Furthermore, we compute a set of salting-out coefficients from previous work by Endo et al. [34]. This allows us to estimate the accuracy of our method and to show opportunities for prospective improvements of the entropy estimation in GIST.

Theory

GIST

Here, we shortly review the theory behind existing GIST implementations. The extension to multiple solvents, as well as the second order entropy estimate, will be discussed below.

GIST is a method devised by Nguyen et al. [22, 64]. It uses IST [50] to compute thermodynamic quantities of water in the vicinity of a solute, but replaces the spatial integrals by discrete sums over voxels in a 3-dimensional grid.

The input of a GIST calculation is the trajectory of an MD simulation of the solvent molecules surrounding a restraint solute.

The free energy of solvation ΔAsolv is split into separate contributions for energy and entropy.

$$\Delta{A}_{solv}=\Delta{E}_{solv}-T\Delta{S}_{solv}$$
(2)

here, Esolv denotes the energy contribution to solvation, while Ssolv is the entropy contribution. T is the system temperature, and the Δ denotes that quantities are calculated relative to their respective bulk value.

The energy is further split into contributions for the solute–solvent (Euv) and solvent–solvent (Evv) interactions.

$$\Delta {E}_{solv}=\Delta {E}_{solv}^{uv}+\Delta {E}_{solv}^{vv}$$
(3)

Both energy terms can be readily computed from the force field. For Euv, interaction energies are calculated between each solvent molecule and the solute, and assigned to the grid voxel that currently holds the solvent molecule. The same is done in every frame.

Similarly, Evv is calculated from the interaction energies between all pairs of solvent molecules. Half of the energy contribution is assigned to each of the two molecules, to avoid double counting.

The solute–solvent contribution goes to zero with increasing distance from the solute, i.e., Euv is zero in the bulk. In contrast, Evv tends to a constant value (Evvbulk) in the bulk. This value must be subtracted to obtain the energy change of the solvent upon addition of the solute.

$$\Delta {E}_{solv}^{vv}={E}_{solv}^{vv}-{E}_{bulk}^{vv}$$
(4)

The entropy contribution can be calculated in two different ways. First, it can be further separated into contributions for the translational (Strans) and the rotational (Sorient) entropy.

$$\Delta {S}_{solv}\cong \Delta {S}_{\text{solv}}^{\text{trans}}+\Delta {S}_{\text{solv}}^{\text{orient}}$$
(5)

Alternatively, a single entropy estimate (Ssix) can be computed from all six degrees of freedom of the solvent molecule.

$$\Delta{S}_{\text{solv}}\cong \Delta{S}_{\text{solv}}^{\text{six}}$$
(6)

In practice, all three entropy quantities (trans, orient, and six) are calculated using a nearest-neighbor estimate. For each voxel k:

$${S}_{k}^{\text{trans}}=R\left(\gamma +\frac{1}{{N}_{k}}\sum_{i=1}^{{N}_{k}}{\text{ln}}\frac{{N}_{f}{\rho }^{0}4\pi \cdot {d}_{trans}^{3}}{3}\right)$$
(7)
$${S}_{k}^{\text{orient}}=R\left(\gamma +\frac{1}{{N}_{k}}\sum_{i=1}^{{N}_{k}}{\text{ln}}\frac{{N}_{k}{\left(\Delta {\omega }_{i}\right)}^{3}}{6\pi } \right)$$
(8)
$${S}_{k}^{\text{six}}=R\left(\gamma +\frac{1}{{N}_{k}}{\sum }_{i=1}^{{N}_{k}}{\text{ln}}\frac{{N}_{f}{\rho }^{0}\pi {\left(\Delta {\omega }_{i}^{2}+{d}_{i,trans}^{2}\right)}^{3}}{48}\right)$$
(9)

here, R denotes the gas constant, γ is the Euler–Mascheroni constant, which accounts for an asymptotic bias, Nk is the number of solvent molecules that are seen in voxel k throughout the simulation, Nf is the number of frames, ρ0 is the number density of bulk solvent, dtrans is the translational distance of a solvent molecule to its nearest neighbor, and Δω is the rotational distance to its nearest neighbor.

Note that the nearest neighbor distances are calculated between solvent molecules from all frames. Therefore, they tend towards zero in the limit of perfect sampling.

Since this entropy estimate only takes first-order terms—such as the position and orientation of solvent molecules—into account, while omitting higher-order terms such as correlations between the orientation of neighboring solvent molecules, the entropy estimates go to zero in the bulk, at least in the case of perfect sampling. Therefore, no reference values are needed for the entropy in theory. However, limited sampling and inaccuracies in the reference density might lead to a small entropy contribution that is left in the bulk. In previous works, this has sometimes been corrected by subtracting a small reference entropy [58].

Extension of GIST to mixtures

The aim of this work is to extend GIST to solvation in salt–water mixtures, i.e., solvent mixtures containing a regular solvent (e.g., water), a positively charged ionic species (i.e., Na+), and a negatively charged ionic species (i.e., Cl). In the following, we denote those compounds as wat (water), M+ (cation) and X (anion). We note, however, that the same considerations apply to arbitrary mixtures of solvents, and that, in combination with the generalization of GIST to multiple solvents presented in [59], GIST calculations of arbitrary solvent mixtures should indeed become possible.

Throughout this work, we consider the salt to be a part of the solvent rather than the solute. Therefore, the reference state for bulk solvent properties must also be a salt-water mixture.

Energy terms

As in the original GIST implementation, the energy terms can be calculated from the employed force field in a straight-forward manner. Since the ions are considered part of the solvent, all ion–solute interactions are added to the solute–solvent energy Euv of the voxel currently holding the ion. All water–ion and ion–ion interactions are split by two and added to the solvent–solvent energy Evv of the voxels holding the respective molecules, consistently with the solvent–solvent interactions in the original GIST implementation.

Subtracting the appropriate reference energy for the solvent–solvent term, however, is less trivial. We consider a reference state where all solvent molecules have been transferred to the bulk. The average energy attributed to such a system can be written as:

$$\langle {E}_{vv,ref}\rangle =\langle {\sum }_{i}^{\text{solvent molecules}}{E}_{i,{\text{bulk}}}\rangle = {\sum }_{i}^{\text{solvent molecules}}\langle {E}_{i,{\text{bulk}}}\rangle$$
(10)

here, Evv,ref denotes the reference energy, which is equal to the energy that the solvent molecules had if they were located in the bulk, Ei,bulk denotes the same energy for a single molecule, and the angle brackets 〈.〉 denote the ensemble average. This method of summation assumes that all energies have already been divided by two to avoid double counting, as discussed in [65].

In the original GIST algorithm, this term is expressed by a constant per-molecule energy term. However, the energy contributions of different molecular species must be treated separately. This leads to the following expression:

$${\langle {E}_{vv,{\text{ref}}}\rangle }_{k}={\sum }_{i}^{{N}_{k,{\text{wat}}}}\langle {E}_{\text{wat}}\rangle +{\sum }_{i}^{{N}_{k, {\text{M}}^{+}}}\langle {E}_{{\text{M}}^{+}}\rangle +{\sum }_{i}^{{N}_{k, {\text{X}}^{-}}}\langle {E}_{{\text{X}}^{-}}\rangle ={N}_{k,{\text{wat}}}\times \langle {E}_{\text{wat}}\rangle +{N}_{k, {\text{M}}^{+}}\times \langle {E}_{{\text{M}}^{+}}\rangle +{N}_{k, {\text{X}}^{-}}\times \langle {E}_{{\text{X}}^{-}}\rangle$$
(11)

here, Nk,i denotes the number of molecules of type i in voxel k. We treat the expectation value of the energy of each compound in bulk as a constant. We determine those constants by an ordinary least squares (OLS) fit of the solute–solute energy in bulk solution, treating the Nk,i values as independent variables. This fit must be done separately for each solvent composition since different ion concentrations lead to different expectation values of the energy per molecule. To ensure that the solvent is sufficiently bulk-like, we can either perform separate simulations of the solvent without solute or select grid voxels far away from the solute molecule. Throughout this work, the second approach was employed, with a minimum distance of 16 Å from the solute.

We then calculate the reference energy of each voxel k in the vicinity of the solvent by evaluating Eq. 11 using the respective molecule counts in voxel k, as well as the previously determined expectation values of the bulk energy.

Entropy terms

Traditionally, GIST only considers first-order entropy terms, omitting all solvent–solvent correlations. This allows us to split the entropy into separate terms for each component of the solvent mixture:

$${S}_{k}={S}_{k,{\text{wat}}}+{S}_{k,{\text{M}}^{+}}+{S}_{k,{\text{X}}^{-}}$$
(12)

This applies to all three types of entropy terms (translational, orientational, and the six-dimensional entropy). We calculate all terms according to Eqs. 79 but use the appropriate number density for each species.

The ions (M+ and X) do not have any orientational entropy contribution since they are monoatomic. Therefore, only Strans needs to be evaluated for them.

Second order entropy

Current GIST implementations do not contain the second-order entropy terms, which describe the change in solvent–solvent structure upon addition of the solute. Recently, it has been shown empirically, in the case of organic molecules in pure water, that a rough estimate of the higher order entropy terms may be obtained by scaling the first order entropy by a constant factor [66].

Since this has been shown only empirically, we cannot assume that this approach can be extended to solvent mixtures. Therefore, we need to estimate \({g}^{inh}\) from MD simulation data. The exact second order entropy in IST is:

$$\frac{\Delta {S}^{2nd}}{k}=-\frac{1}{2}{\sum }_{\nu }{\sum }_{{\nu }{^{\prime}}}{\rho }_{\nu }^{\infty }{\rho }_{{\nu }{^{\prime}}}^{\infty }{\int }_{local}d{\varvec{r}} d{{\varvec{r}}}{^{\prime}} {G}_{s\nu }\left({\varvec{r}}\right)\left[{G}_{s{\nu }{^{\prime}}}\left({{\varvec{r}}}{^{\prime}}\right) \left({g}_{\nu ,{\nu }{^{\prime}}}^{inh}\left({\varvec{r}}, {{\varvec{r}}}{^{\prime}}\right)\mathrm{ln}\left({g}_{\nu ,{\nu }{^{\prime}}}^{inh}\left({\varvec{r}}, {{\varvec{r}}}{^{\prime}}\right)\right)-{g}_{\nu ,{\nu }{^{\prime}}}^{inh}\left({\varvec{r}},{{\varvec{r}}}{^{\prime}}\right)+1\right)-\left({g}_{\nu {\nu }{^{\prime}}}^{0}\left({{\varvec{r}},{\varvec{r}}}^{\boldsymbol{^{\prime}}}\right)\mathrm{ln}{g}_{\nu {\nu }{^{\prime}}}^{0}\left({{\varvec{r}},{\varvec{r}}}^{\boldsymbol{^{\prime}}}\right)- {g}_{\nu {\nu }{^{\prime}}}^{0}\left({{\varvec{r}},\boldsymbol{ }{\varvec{r}}}^{\boldsymbol{^{\prime}}}\right)+1\right)\right]$$
(13)

here, ν and ν′ denote the solvent species, \({\uprho }^{\infty }\) denotes the solvent density distant to the solute, \(G\) is the solute–solvent pair correlation defined with respect to \({\uprho }^{\infty }\). \({g}^{0}\) is the bulk solvent–solvent pair correlation function, which only depends on the relative position and orientation, and \({g}^{inh}\) is the inhomogeneous pair correlation function, which depends on the positions and orientations of both solvent molecules relative to the solute.

In its full form, \({g}^{inh}\) is a 12-dimensional function. In a previous work, Nguyen et al. [67] estimate the translational part of \({g}^{inh}\) on a grid, which is still a 6-dimensional problem and computationally highly challenging. In our case, it is even more difficult to obtain sufficient sampling because the ion density is much lower than that of water. Therefore, we reduce \({g}^{inh}\) to a 1-dimensional radial distribution function (rdf) at each grid voxel, which is still essentially a 4-dimensional function.

Consistently, we also reduce \({g}^{0}\) to a 1D representation \({g}^{0}(d)\). This is exact in the case of spherical solvents.

Denoting the radial distance as \(d\), and substituting \(\frac{dr}{4{d}^{2}\pi }=\mathrm{d}d\) and \({G}_{s,{\nu }{^{\prime}}}\left({\varvec{r}}, d\right)={\int }_{\left|{r}{^{\prime}}-r\right|=d}{G}_{s,{\nu }{^{\prime}}}\left({{\varvec{r}}}^{\boldsymbol{^{\prime}}}\right) d{{\varvec{r}}}{^{\prime}}/(4{d}^{2}\pi )\), we rewrite Eq. 13 as

$$\frac{\Delta {S}^{2nd}}{k}=-\frac{1}{2}{\sum }_{\nu }{\sum }_{{\nu }{^{\prime}}}{\rho }_{\nu }^{\infty }{\rho }_{{\nu }{^{\prime}}}^{\infty }{\int }_{\mathrm{local}}\mathrm{d}{\varvec{r}} \mathrm{d}d 4{d}^{2}\pi {G}_{s\nu }\left({\varvec{r}}\right)\left[{G}_{s{\nu }{^{\prime}}}\left({\varvec{r}}, d\right) \left({g}_{\nu ,{\nu }{^{\prime}}}^{\mathrm{inh}}\left({\varvec{r}}, d\right)\mathrm{ln}\left({g}_{\nu ,{\nu }{^{\prime}}}^{\mathrm{inh}}\left({\varvec{r}}, d\right)\right)-{g}_{\nu ,{\nu }{^{\prime}}}^{\mathrm{inh}}\left({\varvec{r}},d\right)+1\right)-\left({g}_{\nu {\nu }{^{\prime}}}^{0}\left(d\right)\mathrm{ln}{g}_{\nu {\nu }{^{\prime}}}^{0}\left(d\right)- {g}_{\nu {\nu }{^{\prime}}}^{0}\left(d\right)+1\right)\right]$$
(14)

Furthermore, we employ the KSA [42] to obtain an estimate of the second order entropy in the same 4D representation. This is the same approach that was used by Lazaridis in his original work on IST [50]. Lately, the KSA has been heavily criticized [68]. Here, we test its validity by comparing between the conditional rdfs and the KSA.

In terms of IST, the KSA assumes that the inhomogeneous pair correlation function between solvent molecules in the vicinity of the solute \({g}^{inh}\) is equal to the bulk pair correlation function \({g}^{0}\).

The KSA simplifies above equation to:

$$\frac{\Delta {S}^{2nd,KSA}}{k}= -\frac{1}{2}{\sum }_{\nu }{\sum }_{{\nu }{^{\prime}}}{\rho }_{\nu }^{\infty }{\rho }_{{\nu }{^{\prime}}}^{\infty }{\int }_{\mathrm{local}}\mathrm{d}{\varvec{r}} \mathrm{d}d 4{d}^{2}\pi {G}_{s\nu }\left({\varvec{r}}\right)\left[\left({G}_{s{\nu }{^{\prime}}}\left({\varvec{r}}, d\right)-1\right)\left({g}_{\nu {\nu }{^{\prime}}}^{0}\left(d\right)\mathrm{ln}{g}_{\nu {\nu }{^{\prime}}}^{0}\left(d\right)- {g}_{\nu {\nu }{^{\prime}}}^{0}\left(d\right)+1\right)\right]$$
(15)

Since the ions are monoatomic, their pair correlation functions are 1-dimensional. Therefore, the 1D representation is accurate in the case of ion–ion KSA entropies.

We compute the local integral up to a distance of 10 Å, since the pair distribution functions are well converged at this distance.

Reference densities

It is well known that the pair distribution function \(g\) does not converge to 1 in the canonical ensemble [69], since the central reference particle takes up a certain volume, thereby decreasing the volume that is left to the other particles. IST uses a modified distribution function \(G\), that is defined with respect to the density distant to the solute \({\rho }^{\infty }\), rather than the density of an unperturbed bulk solution \({\rho }^{0}\).

Furthermore, IST requires the density of a solvent reference system without the central solute particle. The IST equations contain correction terms that can be applied when the system density does not exactly match the reference density. These terms arise from the enthalpy and entropy of the reference system and are represented as a part of the liberation terms in the final IST equations. The equations may be found in the supporting information or in Lazaridis’ original work [50].

In current GIST implementations, the reference density is needed only for the entropy calculations and must be supplied as an input parameter. When the solvent is pure water, this only needs to be computed once for each combination of water model, temperature, and pressure. With mixtures, the reference density depends on the exact molar fraction of each species, as well as the density of the mixture.

Furthermore, a change in molar composition around the solute will be compensated by more distant regions of the solvent box. For example, small and strongly hydrated species such as F are strongly disfavored around apolar compounds, which is compensated by a slightly higher density in the “bulk” regions of the solvent box (\({\rho }^{\infty }\)). The change in bulk density depends on the ratio between the excluded and total volumes and is generally small as long the size of the simulation box is sufficient. Nevertheless, this creates a small, but unphysical dependence of the solvation entropy on the box volume.

Here, we avoid this problem by re-calculating the reference densities \({\rho }^{\infty }\) in each GIST calculation from the bulk-like regions of the solvent box, employing a 12 Å distance-cutoff from the solute. Physically, this implies an infinite system at the bulk solvent composition of the respective solvent box, containing a single solute molecule. This only changes the density by a small amount, because most of the solvent box is bulk-like, due to the small and neutral nature of the molecules in our test set.

We note that this approach implies that the composition of the reference solvent is not known exactly until after the simulation. However, this does not limit the applicability of our method, since it is also trivial to compute the correction terms (the equations are shown in the SI) if the exact solvent composition is required.

Implementation details

Typically, the ion concentration is much lower than the water concentration. The current GIST implementations apply a grid-based approach to find the nearest neighbor of each molecule and assume that this neighbor must be in the same or in a neighboring voxel. This is only appropriate if each voxel is visited by multiple molecules during the simulation. However, this is not the case when concentrations much lower than the bulk concentration of water (55.5 M) are used. Therefore, the current implementation of Strans is inappropriate to calculate ion entropies. To solve this problem, we created a more general version of the Strans calculation using the KD-Tree implementation in either SciPy [70] or pykdtree (https://github.com/storpipfugl/pykdtree) for the nearest-neighbor search. This algorithm works correctly with low concentrations, while yielding identical results as the original implementation for the water entropy. However, it cannot replace the original algorithm due to its lower performance.

For the first order entropy of water, we patched our previously published GPU-accelerated GIST implementation so that the water entropy is calculated correctly in the presence of other solvent molecules. To do so, no algorithmic changes were necessary, but we had to make sure that no M+ or X are counted as water in the entropy calculation. Combining this entropy with the ion entropy from our Python implementation, we obtain the total first order entropy.

For the second order entropy, we discretize \({G}_{s{\nu }{^{\prime}}}({\varvec{r}},d)\) and \({g}_{\nu ,{\nu }{^{\prime}}}^{inh}\left({\varvec{r}}, d\right)\) on a 3-dimensional grid of voxels \(k\). Furthermore, we discretize \(d\) by radial bins denoted as \(l\). We write the discretized quantities as \({G}_{s{\nu }{^{\prime}},kl}\) and \({g}_{\nu {\nu }{^{\prime}},kl}^{inh}\). For every \(\nu\) and \({\nu }{^{\prime}}\), we compute the intermolecular distances between all molecules of type \(\nu\) that are within the grid, and all molecules of type \({\nu }{^{\prime}}\) in the entire simulation box, using the minimum image convention. We then assign a radial bin \(l\) based on the exact distance and increment the respective counter at the respective voxel \(k\) holding the first molecule. The resulting histograms are divided by \({N}_{\nu k}{V}_{l}{G}_{s{\nu }{^{\prime}},kl}{\rho }_{\nu }{\rho }_{{\nu }{^{\prime}}}\), to obtain a radial distribution function (rdf). Here, \({N}_{\nu k}\) is the number of molecules \(\nu\) found at voxel \(k\), \({V}_{l}\) is the bin volume, and \({G}_{{s\nu }{^{\prime}},kl}\) is the density of molecules \({\nu }{^{\prime}}\) in a discretized distance \(l\) to voxel \(k\).

\({G}_{s{\nu }{^{\prime}},kl}\) is computed from a grid representation of the density distribution \({G}_{s{\nu }{^{\prime}}}({{\varvec{r}}}^{\boldsymbol{^{\prime}}})\) by integrating in spherical shells around the center of voxel \(k\). To avoid inaccuracies due to the grid spacing, a very fine grid of 0.25 Å spacing is used at first, and the final \({G}_{{s\nu }{^{\prime}},k}(d)\) is re-binned to the desired resolution.

Given the discrete representations of \({g}_{\nu {\nu }{^{\prime}},kl}^{\mathrm{inh}}\), \({g}_{{\nu \nu }{^{\prime}},l}^{0}\) and \({G}_{{s\nu }{^{\prime}},kl}\), the second order entropy can be implemented in a straight-forward manner using Eq. 14. The KSA is used by inserting \({g}_{{\nu \nu }{^{\prime}},l}^{0}\) for \({g}_{\nu {\nu }{^{\prime}},kl}^{inh}\).

Combining the water entropy from the GIST output with the ion entropies calculated in our python implementation, we obtain the total solvation entropy.

The second order entropy and the first order ion entropy are implemented as a series of Python scripts and may be obtained from our group’s GitHub account (https://github.com/liedllab/second-disorder). The updated first order entropy of water is implemented in the newest version of GIGIST.

Salting-out coefficients

The Salting-Out coefficient KS, also called the Setschenow constant [11, 34], describes the dependence of the solubility of a compound on the salt concentration. It is defined via the Setschenow equation (Eq. 1).

Since solubility is linked to the free energy of solvation ΔGsolv [71, 72], the Setschenow equation may be formulated as follows:

$$\Delta{G}_{\text{solv}}\left({c}_{\text{salt}}\right)=\Delta{G}_{\text{solv}}^{0}+RT{\text{ln}}(10){K}_{S}\times {c}_{\text{salt}}$$
(16)

here, T refers to the system temperature in K, and R to the gas constant. Therefore, KS can be obtained from the slope of the free energy of solvation plotted against the salt concentration, divided by RT ln(10). A more detailed derivation of Eq. 16 may be found in the SI.

As described above, we re-calculate the bulk density after the simulations. For consistency, we compute the salt concentration from the GIST reference density. The salt concentration is related to the reference density by Avogadro’s constant NA, as well as a conversion factor of 1027 from [Å−3] to [L−1]. Since the concentrations obtained for Na+ and Cl are not numerically equal, the average was used to compute the salting-out coefficient.

Methods

Starting structures

We chose 16 rigid molecules out of the dataset by Endo et al. [34]. We excluded all flexible molecules since they require separate GIST calculations for each relevant conformation. Those 16 molecules were used for short GIST calculations (as described below) to compute salting-out coefficients. Furthermore, long simulations of carbazole were run for in-depth analysis of the entropy contributions.

Initial structures for the 16 molecules were obtained from the PubChem database [73]. The structures were optimized using Gaussian 16 [74] at HF/6-31G* level, and charges were obtained via a RESP fit [75]. All other parameters were taken from the Gaff2 force field [76] and assigned using the antechamber and parmchk2 programs in AmberTools19 [77].

All molecules were solvated in cuboid TIP3P [78] water boxes with a minimum distance of 20 Å between the molecule and the walls, using the tleap program from AmberTools. Assuming a constant particle density, solutions at different salt concentrations were produced by replacing a fraction of 1/55*csalt of all water molecules for Na+, and the same fraction for Cl, again using tleap. Since this method of setting the salt concentration is not exact, we recompute the exact salt concentration for the calculation of KS, as described in “Theory” section (“Reference densities” subsection). The Joung-Cheatham ion parameters were used [79].

During all equilibration steps, the solute heavy atoms were restrained using a force constant of 1000 kcal/(mol Å2). First, only the hydrogen positions were optimized using 500 steps of steepest descent, followed by 500 conjugate gradient steps. Then, the same was done on all atoms except the solute heavy atoms. The system was then heated from 100 to 300 K within 100 ps, using a time step of 1 fs and a Langevin thermostat [80] with a coupling constant of 2 ps−1. Finally, the pressure was equilibrated within 220 ps using a Berendsen barostat [81] with a time constant of 2 ps, at 300 K.

Molecular dynamics (MD) simulations for GIST

After the equilibration, we performed 200 ns classical MD simulations in the NpT ensemble. The CUDA-accelerated pmemd program in Amber 18 was used. Electrostatic interactions were treated using the PME algorithm [82] with a 8.0 Å cut-off for the real-space contributions. The temperature was set to 300 K using a Langevin thermostat [80] with a collision frequency of 2 ps−1, and the pressure was set to 1 bar using the Berendsen barostat [81] with a coupling time of 1 ps. All bond lengths involving hydrogen were constraint using the SHAKE algorithm, [83] allowing for a timestep of 2 fs. Furthermore, all solute heavy atom positions were kept at their initial values using a harmonic restraint of 1000 kcal/(mol Å2). Coordinates were stored to disk every 10 ps, resulting in 20,000 frames used in the GIST analysis.

For the second order entropy calculations, we performed a 10 µs MD simulation of water in carbazole using the same starting structure. We stored 10 million frames (1 frame per ps) and kept all other simulation settings equal.

GIST calculations

After centering the solute coordinates at the origin, GIST analyses were performed using a grid of 81 × 81 × 81 voxels with a spacing of 0.5 Å in every direction. Both the grid and the solute were centered at the coordinate origin. The reference densities were chosen as described in the Theory section and can be found in the SI. A version of the GPU-enabled GIST implementation [22, 50, 59, 84, 85] was used for all contributions except the ion entropy, with minor patches to ensure correct counting of water molecules in the presence of ions. The first order ion entropy and the second order entropies were calculated using our Python implementation (see “Implementation details” in the Theory section).

The free energy of hydration was calculated by integrating the GIST results over all voxels within 6 Å of any heavy atoms of the solute. The six-dimensional entropy estimate was used in favor of the translational and orientational contributions.

TI calculations

To assess the accuracy of our method in terms of absolute free energy values, we also performed a Thermodynamic Integration (TI) calculation of carbazole in 1.0 M salt solution. The results are shown in “Results and discussion” section, in the subsection titled “Free energy of solvation”.

We used a 2-step TI protocol consisting of an electrostatics part, where the charges of the molecule were removed, and a VdW part, where the already decharged molecule was decoupled from the system using softcore potentials [86, 87].

The 2 steps were started independently. For the electrostatics part, the same equilibrated structure as for the GIST calculations was used. For the VdW part, an extra equilibration step of 25 ns was run before the first lambda window, to account for the loss of charges after the initial equilibration. Soft-core potentials were used with an α parameter of 0.5.

Both steps were composed of 11 equally spaced lambda windows. For the VdW part, we used a simulation time of 25 ns with a time step of 1 fs for each lambda window. Of each window, we discard the first 10 ns as equilibration and use the other 15 ns for analysis. For the electrostatics part, we used 100 ns simulation time at a timestep of 2 fs, discarding the first 40 ns as equilibration.

Furthermore, we ran gas-phase TI calculations as a reference for the electrostatic contribution, using the periodic box of the equilibrated structure without a barostat. This is not required for the VdW part due to the way softcore interactions are handled in Amber.

Free energies profiles were computed separately for the two steps, using the MBAR method as implemented in PyMBAR [88].

K S values

From each GIST or TI run, the salting-out coefficient KS and its standard error were computed using Eq. 16 from the slope of ΔGsolv plotted against the salt concentration c.

Results and discussion

Energy and first-order ion entropy

We performed a 10 μs simulation of a 1 M NaCl solution around carbazole, storing 10 million frames. We used all frames to obtain first-order ion entropies, and 100,000 frames to compute the GIST energy contributions. We use a reference density of 0.03244 \({\AA }^{-3}\) for water and \(6.190\times {10}^{-4}\) and \(6.145\times {10}^{-4}\) for Na+ and Cl, respectively. We reference the energy using Eq. 11 and subtract a reference value of 0.0024 kcal/mol from \(T\Delta {S}^{six}\). Figure 1 shows slices of the 3-dimensional contributions to those quantities around the carbazole.

Fig. 1
figure 1

TΔS of the first order ion entropy (first row) and solvent–solvent and solute–solvent energy around carbazole (second row). To improve visibility of smaller values, the colorbar in each row is scaled so that the highest absolute value exceeds the color limit 25-fold. The integrals were computed within 10 Å of any carbazole atom

We find that the Na+ entropy around carbazole is consistently positive, indicating a reduced density compared to bulk. The Cl entropy is also mostly positive but features a strongly negative region around the nitrogen atom, indicating that the density there is increased compared to bulk. In total, the first order entropy contributes very little to the solvation free energy, with integrals of 0.05 and 0.01 kcal/mol for Na+ and Cl, respectively.

The energy contributions are much larger. As expected, we find that the solute–solvent energy is negative, while the referenced solvent–solvent energy is positive. The effect of the hydrophobic moiety on the energy contributions is much smaller than that of the nitrogen atom, which is a hydrogen bond donor. This is also true for the solvent–solvent contribution, confirming that the hydrogen bond network around a hydrophobic solvent stays largely intact, while a hydrogen bond donor reduces the number of water–water hydrogen bonds.

Second order entropy with and without KSA

We use Eq. 14 to compute the second order (3-body) entropy of all combinations of water, Na+ and Cl around carbazole at 1 M salt concentration. We compute the entropy using histograms of the radial solvent–solvent distance on a three-dimensional grid. We use a grid resolution of 1 × 1 × 1 Å3 and a radial bin size of 0.125 Å. All entropy values shown are multiplied by \({k}_{B}T\) to give (negative) free energy contributions, and properly referenced to bulk. To assert that the bin size of 0.125 is sufficient, we re-computed the water–water, water–cation and water­–anion entropy at a coarser bin size of 0.25 Å. The result is very similar and is shown in SI Fig. 1.

The total contributions of all first- and second-order entropies are shown in Fig. 2. The largest contributions to the total entropy are the first order water entropy and the second order water–cation and water–anion entropies.

Fig. 2
figure 2

Waterfall plot showing the contributions of different entropy terms to the solvation free energy of carbazole in 1 M NaCl solution

Furthermore, we computed all second order entropies using the KSA. This allows us to validate this approximation by plotting the KSA values per voxel against the respective values from the conditional histograms, as shown in Fig. 3.

Fig. 3
figure 3

Scatter plots of the KSA entropy values per voxel against the values from the conditional histograms, for every combination of solvents. All values are in kcal/(mol Å3)

We find that the predictions of the KSA are qualitatively wrong whenever the second molecule \({v}{^{\prime}}\) is water. This is seen most clearly in the cation–water entropy, but the correlations for the anion–water and water–water entropies are similarly bad, except for few voxels with very high entropies.

On the other hand, the entropies where \({v}{^{\prime}}\) is an ion are predicted much better by the KSA. This finding is consistent with literature stating that the KSA is valid at sufficiently low density [89]. Especially the water–cation and water–anion correlations are predicted almost perfectly. This is especially important since those contributions show very large absolute numbers compared to the ion–ion entropies.

As seen in Fig. 2, the water–water entropy shows the highest individual voxel values. However, the integral of this contribution is very small, since the positive and negative values cancel out. We note that this does not take rotational contributions into account, which are likely to be significant for the water–water entropy, due to the high anisotropy of water.

To gain a better understanding of why the KSA works for the water–ion contributions but not for the water–water contribution, we plot the individual \({g}^{inh}\), \({g}^{0}\), and \({G}_{s{\nu }{^{\prime}}}\) values that are required to compute the second order entropy. Figure 4 shows the water–water and water–cation distribution functions at a grid voxel close to the carbazole molecule (centered at [0.5, 4.5, 0.5] Å). Using the KSA amounts to assuming that ginh and g0 are equal.

Fig. 4
figure 4

\({\mathrm{g}}^{\mathrm{inh}}\), \({\mathrm{g}}^{0}\), and \({\mathrm{G}}_{\mathrm{s}{\upnu }^{{{\prime}}}}\) distribution functions for water–water (left panel) and water–Na+ (right panel) systems, at a grid voxel close to the carbazole molecule, centered at (0.5, 4.5, 0.5) Å in the same coordinate system as used below

In terms of GIST, the assumption underlying the KSA is that, given solvent types \(\nu\) and \({\nu }{^{\prime}}\), a molecule \(\nu\) in a region with little \({\nu }{^{\prime}}\) will be poorly coordinated by \({\nu }{^{\prime}}\) molecules. In the case of water–water this assumption is violated, as shown in the left panel of Fig. 4. \({G}_{s{\nu }{^{\prime}}}\) indicates that there should be rather little water at a distance of 0.3 nm to the reference voxel, since a part of the volume at this distance is excluded by the solute. Therefore, the KSA assumes that a water molecule at this position would be poorly coordinated. However, the high peak in \({g}^{inh}\) shows that this is not the case, and that a water molecule at this position is indeed coordinated rather strongly, resulting in a second-order entropy that is lower than in bulk. This is expected, since it is well known [90] that water molecules near small hydrophobic solutes tend to keep up their hydrogen bond network at the cost of an entropic penalty.

In the case of the water–cation entropy, however, the assumption of the KSA holds true. The right panel of Fig. 4 shows that the coordination of a water molecule with Na+ is reduced in the vicinity of a hydrophobic solute. For this specific test voxel, it is even lower than expected by the KSA, indicated by a lower peak in \({g}^{inh}\).

In Fig. 5, we visualize the spatially resolved water–water, water–cation, and water–anion entropy contributions around carbazole. The carbazole molecule may be divided in two regions, where one is a large hydrophobic moiety formed by the aromatic carbon atoms, and the other is the nitrogen atom, which is a hydrogen bond donor and therefore creates a region of increased water and Cl density.

Fig. 5
figure 5

Spatially resolved water–water, water–cation, and water–anion entropies around carbazole. The left column shows the values computed from the KSA, while the right column shows the values from the conditional histogram approach. The outline of carbazole is superposed on the image for reference. The integrals were computed within 10 Å of any carbazole atom

For the water–water contribution (first row in Fig. 5), we find that the KSA is qualitatively wrong over the whole hydrophobic moiety of carbazole, predicting the wrong sign of the water–water entropy contribution. Close to the hydrogen bond donor, the agreement is better. This is because the carbazole nitrogen already satisfies one of the water hydrogen bonds, and thereby reduces the coordination of water more effectively than the hydrophobic region.

In contrast, the KSA performs very well overall for the water–cation contribution (second row in Fig. 5). The highest values are found close to the hydrogen bond donor group, where the water density is very high, but the Na+ density is very low. Similarly, the KSA performs well for the water–anion contribution. Here, we find an additional region with entropy lower than bulk, indicating that water in this region may be coordinated to a Cl that is next to the nitrogen atom, resulting in an increased \({G}_{s{\nu }{^{\prime}}}\).

Free energy of solvation

We performed TI calculations of carbazole in 1.0 M NaCl solution, which result in a solvation free energy of −7.4 kcal/mol. Summing up all the energies and first- and second-order entropies calculated above results in a free energy of solvation of carbazole of +4.2 kcal/mol. As shown above, the effect of the water–water entropy is very small, but it is known from previous work on GIST in pure water that the second and higher orders should be on the order of −0.4 times the first order water entropy. If we omit the second-order entropy and use this approximation instead, we arrive at a more realistic value of −2.34 kcal/mol.

This shows that orientational water–water correlations are more relevant to the final hydration free energy than the radial contribution described here. However, an analog linear correction does not seem to be possible for the ionic contributions, since e.g., the water–ion correlations are larger than the total ion entropy.

For prospective works, this implies that the water–water entropy should be estimated either via a simple linear correction, or via a higher-dimensional approximation of the second-order entropy. Since the KSA performs well on water–ion entropies, while strongly reducing the amount of sampling required, it should be feasible to compute ionic entropies via the KSA in a higher-dimensional representation. This would allow for fast and accurate estimation of the ionic entropy. A full 3-body treatment of the ionic contributions, however, would be prohibitively challenging in terms of sampling requirements. Therefore, such a higher-dimensional representation is out-of-scope for the present work.

K S values from GIST

To test the validity of our method on a more diverse set of structures, we used GIST to calculate KS values of 16 rigid small molecules, for which experimental KS values are available from the literature [34]. We only use the rigid molecules from the dataset, since GIST is defined with respect to a rigid solute structure, and flexible molecules require individual calculations for each relevant conformation.

For each molecule, we performed 11 simulations at different salt concentrations from 0 to 1.0 M and used GIST to obtain the free energy of solvation (ΔGsolv). We did not perform the full 3-body calculation but estimated the second order entropy contributions using the KSA (excluding the water–water contribution).

ΔGsolv increases linearly with the salt concentration. We compute KS from the slope of ΔGsolv and use the uncertainty of the fit as a measure of accuracy.

In Fig. 6, we compare the results to experimental KS values [34].

Fig. 6
figure 6

Comparison of computed KS values with experimental values. Blue: with second order entropy contributions as computed by the KSA. Orange: without second order entropy

Without the second order entropy, we find a Pearson correlation of 0.74 between the calculated and experimental values. However, the individual values are substantially higher than the experimental ones. This shows that the missing higher order entropies counteract the energy contribution in terms of the salting out effect.

Subtracting the second order integrals of TΔS, we find that the absolute values are significantly closer to the experimental ones, and the Pearson correlation (R) slightly improves from 0.74 to 0.78. This indicates that the second order entropy, and especially the water–ion entropy, present a large negative contribution to (and thereby reduce) the salting-out effect.

There is still a significant overprediction of the salting-out effect. One likely reason for this is the simplified 1-dimensional representation of the pair distribution functions in the second-order entropy. However, higher-order entropy contributions might also play a role.

In Fig. 6, the KS value of bisphenol A (BPA) is significantly overpredicted. This has already been noted in prior work by Misin et al. [54]. They argue that this overprediction is due to a relatively stable dimer between BPA and a Na+ ion, which is hard to describe using conventional force fields.

Conclusion

The thermodynamics of solvation are crucial in many fields, such as the pharmaceutical industry or environmental research. While most implicit solvent models treat the ionic strength as an integral part of solvation, many methods based on explicit solvent models lack in this regard.

While the GIST method has been previously applied to a wide range of topics ranging from drug discovery [91, 92] to biophysical properties of antibodies [57], it was previously limited to pure solvents composed of rigid molecules. Here, we extend it towards salt-water systems and, prospectively, to arbitrary solvent mixtures. Since many biochemical processes and methods rely on high salt concentrations, we believe that this will greatly increase the applicability of GIST, and that the detailed resolution of free energy values computed by this method will be valuable for in-depth analysis of solvation effects. In cases where ions are only used to neutralize the simulation box, the ionic entropy contributions will be negligibly small due to the low concentration. However, our approach provides a physical way of treating the ion–water energy in those cases.

While the extension of the energy calculation is relatively straight-forward, the entropy calculation is more challenging. We benchmark our algorithm by presenting spatially resolved contributions to the solvation entropy of carbazole in a 1 M NaCl solution. We compute the second order entropy from a 1-dimensional histogram representation on a 3D grid and test the possibility of quickly estimating those contributions using the KSA.

Our results show that the first-order ion entropy, as well as the second order ion–ion and ion–water entropies, are very minor contributions to the solvation entropy. On the other hand, the water–ion entropies contribute strongly to the free energy of solvation. The water–water contribution largely cancels out, even though the individual voxel values are rather high. Combined with literature results on solvation in pure water showing that the higher order entropies are roughly −0.4 times the first order entropy, this indicates that orientational contributions also play a significant role.

The good performance of the KSA at computing water–ion entropies is crucial for practical applications of our method, since it allows a fast estimation of the ionic contributions to the entropy of solvation, which would otherwise be very challenging computationally. In future works, using a higher-dimensional representation of the water–ion pair distribution function for the KSA computation might also be feasible.

However, the KSA should not be used to compute the water–water entropy, and will probably also fail when the salt concentration significantly exceeds the 1 mol/L used here. Instead, the relatively high density of water makes it easier to obtain converged results for the 3-body entropy calculation. The most effective way of obtaining the full solvation entropy in a salt-water mixture would be a combination of KSA calculations for the ionic terms and full 3-body computation for the water–water entropy.

We further benchmark our algorithm by calculating salting-out coefficients for a set of 16 small molecules. We find a compelling correlation to experimental values, though the individual values are strongly overpredicted. The second order reduces this overprediction roughly by a factor of 2, indicating that the missing degrees of freedom are also relevant.

Due to the wide applicability of GIST, we expect that our extension will provide an effective tool to explain salt-dependent processes that govern the function of biomolecular systems.