Introduction

Charge sensitivity analysis (CSA) [1], the formalism employed in this study, originates from density functional theory (DFT) and is a part of conceptual DFT [2–4]. It may be considered a generalization of the electronegativity equalization (EE) method [5]. EE-based methods [6–12] were mainly applied to derive the charge distribution inside non-interacting molecules. Rappe and Goddard [8] introduced such a static model into molecular dynamics as an initial-guess generator (charge distribution generator). The extended Lagrangian method broadened the application of the EE method to interacting molecules [13, 14]. However, this model introduced unphysical thermal fluctuations, and an additional thermostat was required to cool down the “polarization” degrees of freedom. In contrast to EE methods, CSA describes polarization directly, without resorting to the extended Lagrangian method. In addition, it allows one to define a wide range of sensitivities for each chemically interesting partitioning of the system, so that the progress of a given chemical process can be monitored. Static applications have demonstrated that CSA has great potential in reactivity theory [15–17]. The dynamic aspect of CSA may therefore provide much additional information.

The charge sensitivity, \( P_{AB}^{\mathcal{R}} = \left( \partial p_{A} / \partial t_{B} \right)_{\mathcal{R}} \), is a differential quantity that represents the response of a parameter p characterizing the equilibrium state of fragment A to a displacement of a parameter t of the equilibrium state of another fragment B (see Chapter 1 of Ref. [1]). This response is measured under specific constraints imposed on the molecular remainder \( \mathcal{R} = (C, D, E, \ldots, X, Y, Z) \). The remainder can be further divided into freely relaxing and frozen parts: by convention, broken vertical bars (¦) are placed between relaxing fragments, while solid bars (|) are placed between frozen fragments. Based on this convention, one can define rigid [\( \mathcal{R} = (C|D|E|\ldots|X|Y|Z) \)], relaxed [\( \mathcal{R} \) = (C¦D¦E¦…¦X¦Y¦Z)] and any intermediate sensitivities. In this way, the whole hierarchy of \( P_{AB}^{\mathcal{R}} \) can be computed.

So far, CSA has routinely been applied as a supplementary tool to semiempirical or ab initio calculations. Our main intention is to extend its range of applications by coupling CSA with molecular dynamics (MD) simulations. Prior to this, the method must be parameterized for a given force field. This will also be a step towards polarizable force fields, since standard force fields used in molecular modeling describe electrostatic interactions in terms of fixed, atom-centered charges. In addition, CSA will introduce “dynamic” quantitative structure–activity relationship (QSAR) or quantitative structure–property relationship (QSPR) models into MD simulations.

The article is organized as follows. First, the CSA methodology is outlined. Next, the optimization procedure is described. Afterwards, the results are discussed. Finally, conclusions and future prospects are briefly presented.

A short survey of charge sensitivity analysis

CSA in atomic resolution is based on a second-order Taylor expansion of the system energy \( (E_{\text{M}}) \) with respect to the atomic charges \( [\mathbf{q} = (q_{1}, q_{2}, \ldots, q_{N})] \):

$$ \begin{aligned} {\text{d}}E_{\text{M}} & = \sum\limits_{i = 1}^{N} {\left( {\frac{{\partial E_{\text{M}} }}{{\partial q_{i} }}} \right){\text{d}}q_{i} } + \frac{1}{2}\sum\limits_{i = 1}^{N }{\sum\limits_{j = 1}^{N} {\left( {\frac{{\partial^{2} E_{\text{M}}}}{{\partial q_{i} \partial q_{j} }}} \right){\text{d}}q_{i} {\text{d}}q_{j} } } \\ \quad \quad \, & = \sum\limits_{\Upomega } {\sum\limits_{i = 1}^{{n_{\Upomega } }} {\left( {\frac{{\partial E_{\text{M}} }}{{\partial q_{i} }}} \right){\text{d}}q_{i} } } +\frac{1}{2}\sum\limits_{\Upomega } {\sum\limits_{i = 1}^{{n_{\Upomega } }} {\sum\limits_{\Upxi } {\sum\limits_{j = 1}^{{n_{\Upxi } }} {\left( {\frac{{\partial^{2} E_{\text{M}}}}{{\partial q_{i} \partial q_{j} }}} \right){\text{d}}q_{i}{\text{d}}q_{j} } } } }, \,\quad \\ \quad & \, \equiv \sum\limits_{\Upomega } {\sum\limits_{i = 1}^{{n_{\Upomega } }} {{{\upchi}}_{i} {\text{d}}q_{i} } } + \frac{1}{2}\sum\limits_{\Upomega } {\sum\limits_{i = 1}^{{n_{\Upomega } }} {\sum\limits_{\Upxi } {\sum\limits_{j = 1}^{{n_{\Upxi } }} {{\text{d}}q_{i} \eta_{ij} {\text{d}}q_{j} } } } }, \,\quad \quad \, (\Upomega ,\;\Upxi ) \in \{ A,B, \ldots Z\}.\\ \end{aligned} $$
(1)

The sums over fragments \( \Upomega \) and \( \Upxi \) are introduced for further derivations. The numbers of atoms in fragments Ω and Ξ are denoted by \( n_{\Upomega} \) and \( n_{\Upxi} \), respectively, and the overall number of atoms in the system is N. All differentiations are carried out at a fixed external potential due to the nuclei, \( \mathbf{v} \), with all atomic charges frozen except those distinguished in the derivatives. The atomic electronegativities \( {\varvec{\chi}} = (\chi_{1}, \chi_{2}, \ldots, \chi_{N}) = ({\varvec{\chi}}_{A}, {\varvec{\chi}}_{B}, \ldots, {\varvec{\chi}}_{Z}) \) and the elements of the hardness matrix \( {\varvec{\eta}} = \{ \eta_{ij} \} = \{ {\varvec{\eta}}_{\Upomega \Upxi} \} \) [\( (i, j) \in \{ 1, 2, \ldots, N \} \); \( (\Upomega, \Upxi) \in \{ A, B, \ldots, Z \} \)] are the main CSA parameters, which are adjusted to reproduce the atomic charges of a reference set of molecules. Both \( {\varvec{\chi}} \) and \( {\varvec{\eta}} \) are totally rigid quantities since, by definition, the molecular system M is divided into N mutually closed atoms.

In fragment resolution, the following set of equations:

$$ \begin{array}{c} \chi_{1}^{A} = \chi_{2}^{A} = \cdots = \chi_{n_{A}}^{A} = \chi_{A} \equiv \left( \partial E_{\text{M}} / \partial q_{A} \right) \\ \chi_{1}^{B} = \chi_{2}^{B} = \cdots = \chi_{n_{B}}^{B} = \chi_{B} \equiv \left( \partial E_{\text{M}} / \partial q_{B} \right) \\ \vdots \\ \chi_{1}^{Z} = \chi_{2}^{Z} = \cdots = \chi_{n_{Z}}^{Z} = \chi_{Z} \equiv \left( \partial E_{\text{M}} / \partial q_{Z} \right) \end{array} , $$
(2)

marks the intra-fragment equilibrium. In general, the fragment electronegativities differ: \( \chi_{A} \ne \chi_{B} \ne \cdots \ne \chi_{Z} \). These equations are complemented by the charge conservation conditions:

$$ \sum\limits_{i \in \Upomega} {\text{d}}q_{i} = {\text{d}}q_{\Upomega}, \quad \Upomega \in \{ A, B, \ldots, Z \}. $$
(3)
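The step from Eqs. 1–3 to the matrix form below can be made explicit with a short derivation, added here for clarity (it is our sketch, not taken from Ref. [17]). Assuming the neutral-atom reference (\( {\text{d}}q_{i} = q_{i} \)), the quadratic energy of Eq. 1 is made stationary under the fragment charge constraints of Eq. 3, with one Lagrange multiplier \( \lambda_{\Upomega} \) per fragment:

$$ \begin{aligned} \mathcal{L} &= \sum_{i=1}^{N} \chi_{i} q_{i} + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} q_{i} \eta_{ij} q_{j} - \sum_{\Upomega} \lambda_{\Upomega} \Big( \sum_{i \in \Upomega} q_{i} - q_{\Upomega} \Big), \\ \frac{\partial \mathcal{L}}{\partial q_{i}} &= \chi_{i} + \sum_{j=1}^{N} \eta_{ij} q_{j} - \lambda_{\Upomega} = 0 \qquad (i \in \Upomega). \end{aligned} $$

The stationarity condition states that the in-molecule electronegativity of every atom of fragment Ω, \( \chi_{i} + \sum_{j} \eta_{ij} q_{j} \), takes one common value \( \lambda_{\Upomega} \), which is therefore the fragment electronegativity \( \chi_{\Upomega} \) of Eq. 2. Written as \( -\chi_{\Upomega} + \sum_{j} \eta_{ij} q_{j} = -\chi_{i} \) and supplemented by \( \sum_{i \in \Upomega} q_{i} = q_{\Upomega} \), these conditions are exactly the rows of Eq. 4.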

The whole formalism can be summarized in a single matrix equation [17]:

$$ \left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} 0 & 0 & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \\ \end{array} } & {\begin{array}{*{20}c} {\begin{array}{*{20}c} {{\mathbf{1}}_{A} } \\ \vdots \\ {{\mathbf{0}}_{A} } \\ \end{array} \quad } & {\begin{array}{*{20}c} {{\mathbf{0}}_{B} } \\ \vdots \\ {{\mathbf{0}}_{B} } \\ \end{array} \quad } & \cdots & {\begin{array}{*{20}c} {{\mathbf{0}}_{{\mathbf{Z}}} } \\ \vdots \\ {{\mathbf{1}}_{Z} } \\ \end{array} } \\ \end{array} } \\ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {{\mathbf{1}}_{A}^{{\text{\dag }}} } & \cdots & {{\mathbf{0}}_{A}^{{\text{\dag }}} } \\ \end{array} } \\ {\begin{array}{*{20}c} {{\mathbf{0}}_{B}^{{\text{\dag }}} } & \cdots & {{\mathbf{0}}_{B}^{{\text{\dag }}} } \\ \end{array} } \\ \vdots \\ {\begin{array}{*{20}c} {{\mathbf{0}}_{{\mathbf{Z}}}^{{\text{\dag }}} } & \cdots & {{\mathbf{1}}_{{\mathbf{Z}}}^{{\text{\dag }}} } \\ \end{array} } \\ \end{array} } & {\begin{array}{*{20}c} {{\varvec{\eta}}_{AA} } & {{\varvec{\eta}}_{AB} } & \cdots & {{\varvec{\eta}}_{AZ} } \\ {{\varvec{\eta}}_{BA} } & {{\varvec{\eta}}_{BB} } & \cdots & {{\varvec{\eta}}_{BZ} } \\ \vdots & \vdots & \ddots & \vdots \\ {{\varvec{\eta}}_{ZA} } & {{\varvec{\eta}}_{ZB} } & \cdots & {{\varvec{\eta}}_{ZZ} } \\ \end{array} } \\ \end{array} } \right)\;\left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} { - \chi_{A} } \\ \vdots \\ { - \chi_{Z} } \\ \end{array} } \\ {\begin{array}{*{20}c} {{\mathbf{q}}_{A} } \\ {{\mathbf{q}}_{B} } \\ \vdots \\ {{\mathbf{q}}_{Z}^{{}} } \\ \end{array} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} {q_{A} } \\ \vdots \\ {q_{Z} } \\ \end{array} } \\ {\begin{array}{*{20}c} { - {\varvec{\chi}}_{A}^{{}} } \\ { - {\varvec{\chi}}_{B}^{{}} } \\ \vdots \\ { - {\varvec{\chi}}_{Z}^{{}} } \\ \end{array} } \\ \end{array} } \right), $$
(4)

where the vectors \( \mathbf{1}_{\Upomega} \) and \( \mathbf{0}_{\Upomega} \), of length \( n_{\Upomega} \), are filled with ones and zeros, respectively, and the dagger denotes transposition. To simplify this equation, the neutral-atom limit, \( {\text{d}}q_{i} = q_{i} \) and \( {\text{d}}q_{\Upomega} = q_{\Upomega} \), has been used. By inverting the matrix in Eq. 4, the charge distributions inside the fragments are obtained:

$$ \left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} { - \chi_{A} } \\ \vdots \\ { - \chi_{Z} } \\ \end{array} } \\ {\begin{array}{*{20}c} {{\mathbf{q}}_{A} } \\ {{\mathbf{q}}_{B} } \\ \vdots \\ {{\mathbf{q}}_{Z}^{{}} } \\ \end{array} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} { - \eta_{AA} } & \cdots & { - \eta_{AZ} } \\ \vdots & \ddots & \vdots \\ { - \eta_{ZA} } & \cdots & { - \eta_{ZZ} } \\ \end{array} } & {\begin{array}{*{20}c} {\begin{array}{*{20}c} {{\mathbf{f}}_{A,A} } \\ \vdots \\ {{\mathbf{f}}_{Z,A} } \\ \end{array} \quad } & {\begin{array}{*{20}c} {{\mathbf{f}}_{A ,B} } \\ \vdots \\ {{\mathbf{f}}_{Z,B} } \\ \end{array} \quad } & \cdots & {\begin{array}{*{20}c} {{\mathbf{f}}_{A,Z} } \\ \vdots \\ {{\mathbf{f}}_{Z,Z} } \\ \end{array} } \\ \end{array} } \\ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {{\mathbf{f}}_{A,A}^{{\text{\dag }}} } & \cdots & {{\mathbf{f}}_{A,Z}^{{\text{\dag }}} } \\ \end{array} } \\ {\begin{array}{*{20}c} {{\mathbf{f}}_{B,A}^{{\text{\dag }}} } & \cdots & {{\mathbf{f}}_{B,Z}^{{\text{\dag }}} } \\ \end{array} } \\ \vdots \\ {\begin{array}{*{20}c} {{\mathbf{f}}_{Z,A}^{{\text{\dag }}} } & \cdots & {{\mathbf{f}}_{Z,Z}^{{\text{\dag }}} } \\ \end{array} } \\ \end{array} } & {\begin{array}{*{20}c} { - {\varvec{\beta}}_{AA} } & { - {\varvec{\beta}}_{AB} } & \cdots & { - {\varvec{\beta}}_{AZ} } \\ { - {\varvec{\beta}}_{BA} } & { - {\varvec{\beta}}_{BB} } & \cdots & { - {\varvec{\beta}}_{BZ} } \\ \vdots & \vdots & \ddots & \vdots \\ { - {\varvec{\beta}}_{ZA} } & { - {\varvec{\beta}}_{ZB} } & \cdots & { - {\varvec{\beta}}_{ZZ} } \\ \end{array} } \\ \end{array} } \right)\;\left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} {q_{A} } \\ \vdots \\ {q_{Z} } \\ \end{array} } \\ {\begin{array}{*{20}c} { - {\varvec{\chi}}_{A}^{{}} } \\ { - {\varvec{\chi}}_{B}^{{}} } \\ \vdots \\ { - {\varvec{\chi}}_{Z} } \\ \end{array} } \\ \end{array} } \right). $$
(5)
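The following NumPy sketch shows one way Eqs. 4 and 5 can be set up for a molecule partitioned into fragments. The helper names (`eq4_matrix`, `solve_fragment_equilibrium`, `fragment_response`) and the toy χ and η values, in arbitrary consistent units, are hypothetical and not part of the authors' CSA package. A fragment membership matrix plays the role of the \( \mathbf{1}_{\Upomega} \) and \( \mathbf{0}_{\Upomega} \) vectors; solving the system reproduces the intra-fragment equalization of Eq. 2, and inverting it yields the \( \varvec{\eta}^{\text{frg}} \), FF and polarization blocks whose normalizations are discussed next.

```python
import numpy as np


def eq4_matrix(eta, frag_of_atom):
    """Bordered matrix of Eq. 4, [[0, C], [C^T, eta]], with C the fragment membership."""
    eta = np.asarray(eta, dtype=float)
    frag_of_atom = np.asarray(frag_of_atom)
    n_atoms, n_frag = eta.shape[0], int(frag_of_atom.max()) + 1
    C = np.zeros((n_frag, n_atoms))
    C[frag_of_atom, np.arange(n_atoms)] = 1.0   # rows act as the 1_Omega / 0_Omega vectors
    return np.block([[np.zeros((n_frag, n_frag)), C], [C.T, eta]]), n_frag


def solve_fragment_equilibrium(chi, eta, frag_of_atom, frag_charges):
    """Solve Eq. 4; returns fragment electronegativities chi_Omega and atomic charges q."""
    lhs, n_frag = eq4_matrix(eta, frag_of_atom)
    rhs = np.concatenate([np.asarray(frag_charges, float), -np.asarray(chi, float)])
    sol = np.linalg.solve(lhs, rhs)              # unknowns: (-chi_A..-chi_Z, q_1..q_N)
    return -sol[:n_frag], sol[n_frag:]


def fragment_response(eta, frag_of_atom):
    """Invert the Eq. 4 matrix and split the inverse into the blocks of Eq. 5."""
    lhs, n_frag = eq4_matrix(eta, frag_of_atom)
    inv = np.linalg.inv(lhs)
    eta_frg = -inv[:n_frag, :n_frag]             # eta_AB = d chi_A / d q_B
    f = inv[n_frag:, :n_frag]                    # f[i, B] = d q_i / d q_B
    beta = -inv[n_frag:, n_frag:]                # Eq. 5 lists this block as -beta
    return eta_frg, f, beta


# Toy four-atom system (arbitrary consistent units) split into two neutral fragments
chi = np.array([0.20, 0.30, 0.25, 0.15])
eta = np.array([[1.00, 0.30, 0.20, 0.15],
                [0.30, 0.90, 0.25, 0.20],
                [0.20, 0.25, 1.10, 0.30],
                [0.15, 0.20, 0.30, 0.95]])
frags = np.array([0, 0, 1, 1])

chi_frag, q = solve_fragment_equilibrium(chi, eta, frags, [0.0, 0.0])
eta_frg, f, beta = fragment_response(eta, frags)

# Eq. 2: chi_i + sum_j eta_ij q_j equals chi_Omega within each fragment
print(chi + eta @ q, chi_frag)
# Summing f over the atoms of fragment A gives 1 for B == A and 0 otherwise
print(f[frags == 0].sum(axis=0), f[frags == 1].sum(axis=0))
```

With neutral fragments, the charges also follow directly from the inverse as \( \mathbf{q} = \varvec{\beta}\,\varvec{\chi} \), which is a convenient consistency check between the two routes.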

The elements of the inverse matrix are [17]: the diagonal \( (\mathbf{f}_{A,A} = \{ \partial q_{i}^{A} / \partial q_{A} \}) \) and off-diagonal \( (\mathbf{f}_{A,B} = \{ \partial q_{i}^{A} / \partial q_{B} \}) \) Fukui function (FF) vectors; the diagonal \( ({\varvec{\beta}}_{A,A} = \{ \partial q_{i}^{A} / \partial v_{j}^{A} \}) \) and off-diagonal \( ({\varvec{\beta}}_{A,B} = \{ \partial q_{i}^{A} / \partial v_{j}^{B} \}) \) polarization matrices; and the hardness matrix in fragment (frg) resolution, \( {\varvec{\eta}}^{\text{frg}} = \{ \eta_{AB} = \partial \chi_{A} / \partial q_{B} \} \). Diagonal FF vectors are normalized to unity \( (\mathbf{f}_{A,A} \mathbf{1}_{A}^{\dagger} = 1) \), while off-diagonal ones are normalized to zero \( (\mathbf{f}_{A,B} \mathbf{1}_{A}^{\dagger} = 0) \), because a change in the charge of fragment B leaves the net charge of fragment A unchanged. Both diagonal and off-diagonal polarization matrices are normalized to zero, \( \sum_{i \in \Upomega} ({\varvec{\beta}}_{\Upomega \Upxi})_{ij} = 0 \) for \( (\Upomega, \Upxi) \in \{ A, B, \ldots, Z \} \), since a perturbation of the external potential at the position of atom j belonging to fragment Ξ, \( v_{j}^{\Upxi} \), does not change the net charge of any fragment. When one is interested in the global equilibrium \( (\chi_{A} = \chi_{B} = \cdots = \chi_{Z} = \chi) \), Eqs. 4 and 5 take a simpler form:

$$ \left( {\begin{array}{*{20}c} 0 & {\begin{array}{*{20}c} {1\quad } & {1\quad } & { \cdots \,\;} & 1 \\ \end{array} } \\ {\begin{array}{*{20}c} 1 \\ 1 \\ \vdots \\ 1 \\ \end{array} } & {\begin{array}{*{20}c} {{{\upeta}}_{11} } & {{{\upeta}}_{12} } & \cdots & {{{\upeta}}_{1N} } \\ {{{\upeta}}_{21} } & {{{\upeta}}_{22} } & \cdots & {{{\upeta}}_{2N} } \\ \vdots & \vdots & \ddots & \vdots \\ {{{\upeta}}_{N1} } & {{{\upeta}}_{N2} } & \cdots & {{{\upeta}}_{NN} } \\ \end{array} } \\ \end{array} } \right)\;\left( {\begin{array}{*{20}c} { - {{\upchi}}} \\ {\begin{array}{*{20}c} {q_{1} } \\ {q_{2} } \\ \vdots \\ {q_{N}^{{}} } \\ \end{array} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} q \\ {\begin{array}{*{20}c} { - {{\upchi}}_{1}^{{}} } \\ { - {{\upchi}}_{2}^{{}} } \\ \vdots \\ { - {{\upchi}}_{N}^{{}} } \\ \end{array} } \\ \end{array} } \right), $$
(6)
$$ \;\left( {\begin{array}{*{20}c} { - {{\upchi}}} \\ {\begin{array}{*{20}c} {q_{1} } \\ {q_{2} } \\ \vdots \\ {q_{N}^{{}} } \\ \end{array} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} { - {{\upeta}}} & {\begin{array}{*{20}c} {f_{1} \;\;} & {f_{2} } & {\quad \; \cdots \,} & {f_{N} } \\ \end{array} } \\ {\begin{array}{*{20}c} {f_{1} } \\ {f_{2} } \\ \vdots \\ {f_{N} } \\ \end{array} } & {\begin{array}{*{20}c} { - {{\upbeta}}_{11} } & { - {{\upbeta}}_{12} } & \cdots & { - {{\upbeta}}_{1N} } \\ { - {{\upbeta}}_{21} } & { - {{\upbeta}}_{22} } & \cdots & { - {{\upbeta}}_{2N} } \\ \vdots & \vdots & \ddots & \vdots \\ { - {{\upbeta}}_{N1} } & { - {{\upbeta}}_{N2} } & \cdots & { - {{\upbeta}}_{NN} } \\ \end{array} } \\ \end{array} } \right)\left( {\begin{array}{*{20}c} q \\ {\begin{array}{*{20}c} { - {{\upchi}}_{1}^{{}} } \\ { - {{\upchi}}_{2}^{{}} } \\ \vdots \\ { - {{\upchi}}_{N}^{{}} } \\ \end{array} } \\ \end{array} } \right), $$
(7)

where \( \chi = \partial E_{\text{M}} / \partial q \) is the global electronegativity [18], \( \eta = \partial^{2} E_{\text{M}} / \partial q^{2} = \partial \chi / \partial q \) is the global hardness [19], \( {\varvec{\beta}} = \{ \beta_{ij} = \partial q_{i} / \partial v_{j};\ i, j = 1, 2, \ldots, N \} \) is the polarization (linear response) matrix, and \( \mathbf{f} = \{ f_{i} = \partial q_{i} / \partial q;\ i = 1, 2, \ldots, N \} \) is the FF vector [20]. The FF vector is normalized to unity \( (\mathbf{f}\,\mathbf{1}^{\dagger} = 1) \), and the elements in every row or column of \( {\varvec{\beta}} \) sum to zero \( (\sum_{i}^{N} \beta_{ij} = \sum_{i}^{N} \beta_{ji} = 0) \). Equations 6 and 7 describe the transition from the constrained equilibrium to the global equilibrium. In the constrained equilibrium, all atoms in the system M are closed to each other, M = (1|2|…|N); that is, charge transfer (CT) among atoms is not allowed. Removing the barriers to CT restores the global equilibrium, M = (1¦2¦…¦N). Intermediate, hypothetical equilibria are described by Eqs. 4 and 5.
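A minimal NumPy sketch of Eqs. 6 and 7, assuming the effective electronegativity vector and hardness matrix are already known; the helper name `global_csa` and the toy values are hypothetical. The bordered matrix of Eq. 6 is inverted once, its blocks are read off in the layout of Eq. 7, and the normalizations of \( \mathbf{f} \) and \( \varvec{\beta} \) quoted above can be checked numerically.

```python
import numpy as np


def global_csa(chi, eta, total_charge=0.0):
    """Global equilibrium of Eqs. 6 and 7 (hypothetical helper).

    Returns chi_M (global electronegativity), eta_M (global hardness),
    f (FF vector), beta (polarization matrix) and q (atomic charges).
    """
    chi = np.asarray(chi, dtype=float)
    eta = np.asarray(eta, dtype=float)
    n = chi.size

    # Bordered matrix of Eq. 6: [[0, 1^T], [1, eta]]
    lhs = np.zeros((n + 1, n + 1))
    lhs[0, 1:] = 1.0
    lhs[1:, 0] = 1.0
    lhs[1:, 1:] = eta

    # Its inverse has the layout of Eq. 7: [[-eta_M, f^T], [f, -beta]]
    inv = np.linalg.inv(lhs)
    eta_M = -inv[0, 0]
    f = inv[1:, 0]
    beta = -inv[1:, 1:]

    # Eq. 7 written out row by row
    chi_M = eta_M * total_charge + f @ chi      # -chi_M = -eta_M q + f.(-chi)
    q = f * total_charge + beta @ chi           # q_i = f_i q + sum_j beta_ij chi_j
    return chi_M, eta_M, f, beta, q


chi = np.array([0.20, 0.30, 0.25, 0.15])
eta = np.array([[1.00, 0.30, 0.20, 0.15],
                [0.30, 0.90, 0.25, 0.20],
                [0.20, 0.25, 1.10, 0.30],
                [0.15, 0.20, 0.30, 0.95]])
chi_M, eta_M, f, beta, q = global_csa(chi, eta)

# Normalizations quoted in the text
print(f.sum(), beta.sum(axis=0), beta.sum(axis=1))
# Global equalization: chi_i + sum_j eta_ij q_j = chi_M for every atom
print(chi + eta @ q, chi_M)
```

Inverting the matrix once, rather than solving for a single right-hand side, is what gives access to the sensitivities \( \eta \), \( \mathbf{f} \) and \( \varvec{\beta} \) in addition to the charges.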

The empirical parameters of CSA are the vector χ and the hardness matrix η. The off-diagonal hardnesses, which have the meaning of two-electron integrals, can be related to the diagonal hardnesses by empirical interpolation formulas. We have demonstrated previously [21] that the Ohno interpolation formula [22]:

$$ {{\upeta}}_{ij} = \frac{1}{{\sqrt {a_{ij}^{2} + R_{ij}^{2} } }}, $$
(8)

where \( a_{ij} = 2/(\eta_{ii} + \eta_{jj}) \) and \( R_{ij} \) is the interatomic separation, gave the sharpest distribution around the reference ab initio charges; it is therefore employed in this article. For isolated atoms, \( \chi_{i} \) and \( \eta_{ii} \) depend solely on the atomic number [2, 23]. In molecules, however, they are also influenced by the atom’s hybridization and nearest chemical environment [24, 25]. Throughout the article, these quantities are called the effective electronegativity \( (\chi_{i}) \) and the effective hardness \( (\eta_{i} \equiv \eta_{ii}) \). They are distinct from the global electronegativity \( (\chi) \) and global hardness \( (\eta) \); matching rules must be applied to compute \( \chi \) (\( \eta \)) from \( \chi_{i} \) (\( \eta_{i} \)). The resultant atomic hardnesses, \( \overline{\eta}_{i} = (\partial \mu / \partial N_{i}) = \sum_{j}^{N} \eta_{ij} f_{j} = \eta \), which fulfill the hardness equalization principle, differ from the effective atomic hardnesses. More about equalization principles in fragment and global resolutions can be found in Ref. [26].
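A NumPy sketch of Eq. 8 together with the hardness equalization just described. It assumes atomic units throughout (hardness in hartree/e², distances in bohr), in which Eq. 8 holds as written; parameters expressed in eV would require a unit conversion that is not detailed here. The water-like geometry, the diagonal hardnesses and the helper name `ohno_hardness_matrix` are hypothetical illustration values, not the fitted parameters of Tables 2–4.

```python
import numpy as np


def ohno_hardness_matrix(eta_diag, coords):
    """Hardness matrix from the Ohno formula, Eq. 8 (atomic units assumed)."""
    eta_diag = np.asarray(eta_diag, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Interatomic distance matrix R_ij
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # a_ij = 2 / (eta_ii + eta_jj); for i == j, R_ii = 0 recovers eta_ii
    a = 2.0 / (eta_diag[:, None] + eta_diag[None, :])
    return 1.0 / np.sqrt(a**2 + r**2)


# Hypothetical water-like geometry (bohr) and illustrative hardnesses (hartree/e^2)
coords = [[0.00, 0.00, 0.0],    # O
          [1.81, 0.00, 0.0],    # H
          [-0.45, 1.75, 0.0]]   # H
eta = ohno_hardness_matrix([0.50, 0.45, 0.45], coords)

# Global FF vector and hardness from the inverse of Eq. 6:
# f = eta^-1 1 / (1^T eta^-1 1),  eta_global = 1 / (1^T eta^-1 1)
x = np.linalg.solve(eta, np.ones(len(eta)))
eta_global = 1.0 / x.sum()
f = x * eta_global

# Hardness equalization: sum_j eta_ij f_j takes the same value for every atom i
print(eta @ f, eta_global)
```

The resultant atomic hardnesses \( \sum_{j} \eta_{ij} f_{j} \) all equal the global hardness even though the effective hardnesses on the diagonal differ, which is the distinction drawn in the paragraph above.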

The effective quantities are determined by reproducing the gas-phase charge distributions of a set of training molecules calculated with an ab initio method. In our previous article [21], we determined CSA parameters based on the atomic types of Weiner’s original AMBERff84 force field [27]. Two population analyses were used independently to calculate the reference ab initio charge distributions of the training molecules, namely Mulliken population analysis (MPA) [28] and electrostatic potential fitting (ChelpG scheme) [29, 30], calculated at the HF/6-31G* level of theory. MPA charges were shown to be much better reproduced by CSA than ChelpG charges, which indicates their better transferability from system to system. In this study, we extend force-field-resolved CSA to other population analyses, employing Hirshfeld [31], Voronoi [32, 33], natural population analysis (NPA) [34] and Bader’s atoms-in-molecules (AIM) [35] charges. We have also introduced additional atomic types consistent with the AMBERff99 [36, 37] parameterization. The goal of this study is to explore to what extent these charges can be reproduced by CSA.

Methods

All ab initio charges were calculated at the B3LYP/6-31G* level of theory. The Gaussian [38] program was used to calculate Hirshfeld and NPA charges. Voronoi and AIM charges were calculated with the Amsterdam Density Functional (ADF) [39] package and a DZP basis set. For comparison, we have also performed a parameterization with Mulliken charges computed at the B3LYP/6-31G* level of theory.

The training set consisted of a hundred small and medium-sized organic molecules with 1617 atoms in total. Most of the molecules were of biological importance, in order to cover the area of application of the AMBER force field; in particular, standard amino acids and DNA/RNA bases were included. All molecular structures were optimized at the HF/6-31G* level of theory. Details of these molecules can be found in Ref. [21].

Our training set contains 38 AMBERff99 atom types, so the number of CSA parameters (one electronegativity and one hardness per type) is 76. For this reason, no systematic search of any kind could be employed; instead, we used evolutionary algorithms (EA). Nonetheless, simultaneous optimization of so many parameters is still difficult, which is why a sequential procedure was used. First, parameters were optimized for the elements H, C, N, O and S (trivial-atom resolution). Then sp² and sp³ hybridization states of carbon, nitrogen and oxygen were introduced (hybridized-atom resolution); the parameters obtained at the trivial-atom resolution were perturbed by no more than ±30 % of their starting values. Finally, AMBERff99 atom types were introduced. At this stage, each element was parameterized in turn, perturbing the parameter values obtained at the hybridized-atom resolution. Hardnesses were perturbed within ±30 % of their initial (hybridized-atom) values. Since electronegativities had exhibited much lower variability in the previous runs, their range was narrowed to ±15 % of the initial values. The fitness function was defined as:

$$ S^{2} = - \sum\limits_{A} {\sum\limits_{\alpha \in A} {\left( {q_{A,\alpha }^{\text{B3LYP}} - q_{A,\alpha }^{\text{CSA}} } \right)^{2} } } , $$
(9)

where \( \mathbf{q}_{A}^{\text{B3LYP}} \) and \( \mathbf{q}_{A}^{\text{CSA}} \) denote vectors collecting the atomic charges of the A-th molecule calculated with the ab initio method and with CSA, respectively. The outer sum in Eq. 9 runs over all training molecules and the inner one over the atoms α of each molecule. \( S^{2} \) tends to zero in the ideal case of \( \mathbf{q}_{A}^{\text{CSA}} \) being identical to \( \mathbf{q}_{A}^{\text{B3LYP}} \); the negative sign is introduced so that the fitness increases as the agreement improves. The details of the genetic-algorithm calculations were the same as in our previous study [21]. The GAUL library [40] was used for the EA calculations, coupled with the CSA package developed in our group.
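A minimal sketch of Eq. 9 and of a bounded evolutionary search over the ±15 % (χ) and ±30 % (η) boxes described above. The optimizer here is a deliberately simple (1+λ) evolution strategy swapped in for illustration; it is not GAUL, whose genetic-algorithm settings are those of Ref. [21]. All helper names and numbers are hypothetical, and the toy objective merely stands in for a real CSA fit, where the fitness would call a CSA solver for every training molecule and the search would be repeated element by element.

```python
import numpy as np


def fitness_s2(q_ref, q_csa):
    """Eq. 9: minus the summed squared charge deviations over all training molecules.

    q_ref, q_csa -- sequences holding one array of atomic charges per molecule A
    """
    if len(q_ref) != len(q_csa):
        raise ValueError("need one CSA charge vector per reference molecule")
    total = 0.0
    for ref, csa in zip(q_ref, q_csa):                 # outer sum over molecules A
        diff = np.asarray(ref, float) - np.asarray(csa, float)
        total += float(np.sum(diff ** 2))              # inner sum over atoms alpha
    return -total                                      # negative sign: larger is better


def perturbation_bounds(chi0, eta0, chi_frac=0.15, eta_frac=0.30):
    """Search box: chi within +/-15 % and eta within +/-30 % of the starting values."""
    x0 = np.concatenate([np.asarray(chi0, float), np.asarray(eta0, float)])
    frac = np.concatenate([np.full(len(chi0), chi_frac), np.full(len(eta0), eta_frac)])
    return x0, x0 - frac * np.abs(x0), x0 + frac * np.abs(x0)


def evolve(fitness, x0, lo, hi, generations=100, offspring=20, sigma=0.1, seed=0):
    """(1+lambda) evolution strategy that maximizes `fitness` inside the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    parent = np.clip(x0, lo, hi)
    parent_fit = fitness(parent)
    for _ in range(generations):
        # Gaussian mutations scaled to the box width, clipped back into the box
        noise = rng.normal(0.0, sigma * (hi - lo), size=(offspring, parent.size))
        children = np.clip(parent + noise, lo, hi)
        fits = np.array([fitness(child) for child in children])
        best = int(np.argmax(fits))
        if fits[best] > parent_fit:                    # plus-selection: keep the better
            parent, parent_fit = children[best], fits[best]
    return parent, parent_fit


# Eq. 9 for two small made-up molecules
q_ref = [[-0.80, 0.40, 0.40], [0.10, -0.10]]
q_csa = [[-0.70, 0.35, 0.35], [0.10, -0.10]]
s2 = fitness_s2(q_ref, q_csa)

# Toy run of the optimizer: a made-up target inside the box stands in for a CSA fit
x0, lo, hi = perturbation_bounds(chi0=[5.0, 7.0], eta0=[10.0, 12.0])
target = np.array([5.3, 6.5, 11.5, 10.0])


def toy_fitness(x):
    return -float(np.sum((x - target) ** 2))


best, best_fit = evolve(toy_fitness, x0, lo, hi)
print(s2, best, best_fit)
```

Because \( S^{2} \le 0 \), maximizing it is equivalent to minimizing the summed squared deviations, which is why the sign convention of Eq. 9 fits a fitness-maximizing optimizer.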

Results

Table 1 collects the values of the fitness function, the correlation coefficients (\( R^{2} \)) and the best linear fits (y = ax + b) between CSA and the respective ab initio charges. Examples of correlation plots between CSA and ab initio charges are presented in Figs. 1, 2, 3. The y-intercepts obtained in the linear fits were negligible, so fits of the form y = ax were adopted. Of the five population analyses employed, three (AIM, MPA and NPA) are already very well reproduced at the trivial-atom resolution. For these analyses, the correlation coefficients are very close to unity (0.987, 0.983 and 0.981, respectively), and the fitness function has low absolute values (6.39, 3.03 and 5.14, respectively). The slopes of the linear fits are close to unity, 0.98 in all three cases. The reproduction of Hirshfeld and Voronoi charges is quite good, but their correlation coefficients (0.879 and 0.907, respectively) and linear-fit slopes (0.88 and 0.91, respectively) are lower than for AIM, MPA and NPA charges. Despite that, the absolute values of the fitness function are considerably smaller (2.53 and 2.39, respectively), which appears to contradict the trend in the correlation coefficients. This discrepancy can be attributed to the charge distributions, since different population analyses yield different ranges of charges. AIM charges cover the range between −1.5 and 2.5e. NPA and MPA charges do not exceed ±1.5 and ±1.0e, respectively. For Hirshfeld and Voronoi population analyses, the absolute values of the charges are small (≤0.5e). Charges with a smaller spread give smaller absolute deviations and hence lower absolute values of the fitness function.
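The quality measures used in Table 1 can be sketched as follows, assuming (as the later discussion of slopes suggests) that x holds the ab initio reference charges and y the CSA charges, and taking \( R^{2} \) as the squared Pearson correlation coefficient; both choices, the helper name and the sample charges are our assumptions. The second part illustrates the spread argument above: shrinking all charges by a common factor k leaves a and \( R^{2} \) unchanged but scales the squared deviations entering Eq. 9 by \( k^{2} \).

```python
import numpy as np


def fit_through_origin(q_ref, q_csa):
    """Slope a of the least-squares line y = a x (no intercept) and squared Pearson R^2.

    x = ab initio reference charges, y = CSA-derived charges (assumed orientation).
    """
    x = np.asarray(q_ref, dtype=float)
    y = np.asarray(q_csa, dtype=float)
    a = float(x @ y / (x @ x))                 # minimizes sum_i (y_i - a x_i)^2
    r2 = float(np.corrcoef(x, y)[0, 1] ** 2)   # squared correlation coefficient
    return a, r2


q_ref = np.array([-0.80, -0.30, 0.10, 0.40, 0.60])
a, r2 = fit_through_origin(q_ref, 0.9 * q_ref)   # CSA charges uniformly 10 % too small

# Same relative errors, but charges compressed to 30 % of their spread
q_noisy = 0.9 * q_ref + np.array([0.02, -0.01, 0.03, -0.02, 0.01])
sq_dev = np.sum((q_ref - q_noisy) ** 2)
sq_dev_small = np.sum((0.3 * q_ref - 0.3 * q_noisy) ** 2)
a_big, r2_big = fit_through_origin(q_ref, q_noisy)
a_small, r2_small = fit_through_origin(0.3 * q_ref, 0.3 * q_noisy)
print(a, r2, sq_dev, sq_dev_small, a_big, a_small, r2_big, r2_small)
```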

Table 1 Parameters characterizing the simulation process and the quality of the obtained parameterization, i.e., fitness function (S²), correlation coefficient (R²) and linear fit (y = ax) for MPA, AIM, NPA, Hirshfeld and Voronoi population analyses
Fig. 1

Correlation diagram between Hirshfeld [B3LYP/6-31G(d) level of theory] and CSA-derived charges for trivial-atom (a) and hybridized-atom (b) resolutions

Fig. 2

Correlation diagram between NPA [B3LYP/6-31G(d) level of theory] and CSA-derived charges for trivial-atom (a) and hybridized-atom (b) resolutions

Fig. 3

Examples of correlation plots obtained for the force-field resolution with AIM (a) and NPA (b) reference charges

It can be seen that introducing hybridization improves the description of the system: all population analyses are better reproduced, the absolute values of \( S^{2} \) are lowered, and the correlation coefficients and linear-fit slopes increase. The improvement is more pronounced for the population analyses that were less well reproduced at the trivial-atom resolution: for Hirshfeld and Voronoi charges, \( R^{2} \) rises by approximately 0.03, whereas for MPA, NPA and AIM charges it rises by less than 0.01. Such results are not surprising. Firstly, a larger number of adjustable parameters generally improves the fit, since the trivial-atom parameterization is a special case of the hybridized-atom one. Secondly, with the introduction of additional atomic types the system is described more accurately. This effect is illustrated in Fig. 1, where correlation plots between CSA and Hirshfeld charges for the trivial-atom (a) and hybridized-atom (b) resolutions are presented. Hirshfeld charges of oxygen atoms form two islands corresponding to sp² and sp³ hybridization. CSA at the trivial-atom resolution does not capture this, and all the oxygen charges it yields are of similar magnitude. This effect also explains the lower linear-fit slopes obtained for Hirshfeld and Voronoi charges at the trivial-atom resolution. Usually, a lower slope would suggest that CSA charges are underestimated compared with the ab initio reference charges. In this case, however, it results from the least-squares fit combined with the fact that the training set contained more sp² than sp³ oxygen atoms. Distinguishing sp² from sp³ oxygens fixes this problem. In fact, the improved correlation for oxygen charges is the most significant factor responsible for the better reproduction of both Hirshfeld and Voronoi analyses. For MPA and NPA analyses, oxygen charges do not form separate islands corresponding to different hybridizations (Fig. 2); instead, they form overlapping domains, which is why they are well reproduced by CSA already at the trivial-atom resolution. A further improvement in the correlation between CSA and ab initio charges is achieved by introducing force-field atom types. Here, the largest improvement is observed for carbon and nitrogen atoms, as a consequence of the very detailed distinction between chemical environments that the AMBER force field makes for these elements.
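The least-squares mechanism invoked above can be illustrated with a toy sketch. The oxygen charges are invented numbers, not data from this study, and representing the trivial-atom CSA result by a single common charge equal to the mean of the reference charges (the least-squares choice for one parameter) is our simplifying assumption.

```python
import numpy as np


def slope_through_origin(x, y):
    """Least-squares slope of y = a x without intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / (x @ x))


# Invented reference oxygen charges forming two islands (not data from this study)
q_sp2 = np.full(30, -0.45)            # more numerous sp2 oxygens
q_sp3 = np.full(10, -0.25)            # fewer sp3 oxygens
q_ref = np.concatenate([q_sp2, q_sp3])

# Trivial-atom resolution: one oxygen type, modeled as a single common charge
q_single = np.full_like(q_ref, q_ref.mean())
# Hybridized-atom resolution: each island reproduced by its own atom type
q_split = q_ref.copy()

a_single = slope_through_origin(q_ref, q_single)   # < 1 whenever the islands differ
a_split = slope_through_origin(q_ref, q_split)     # 1 for a perfect reproduction
print(a_single, a_split)
```

Here \( a = (\sum_i x_i)^{2} / (n \sum_i x_i^{2}) \), which by the Cauchy–Schwarz inequality is below one unless all reference charges are equal, so collapsing distinct islands lowers the slope even without any systematic underestimation.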

The first entry of Table 2 collects the optimal effective electronegativities for the trivial-atom resolution. Different relative electronegativity scales are obtained for different reference charge distributions. For MPA and NPA charges, the effective electronegativities are ordered as follows: \( \chi_{\text{H}} < \chi_{\text{S}} < \chi_{\text{C}} < \chi_{\text{N}} < \chi_{\text{O}} \). Apart from carbon and sulfur, whose Pauling electronegativities are nearly equal (2.55 and 2.58), this sequence follows the Pauling electronegativity scale for isolated atoms. For AIM charges, sulfur is less electronegative than hydrogen, whereas for Hirshfeld and Voronoi charges the electronegativity of sulfur is higher than that of carbon. One should remember that the electronegativity of an atom in a molecule is modified by the electrostatic contribution of its environment, so the isolated, non-interacting atom limit is not the best reference. The deviations between the optimal electronegativities obtained for different reference population analyses are smallest for sulfur and carbon. For oxygen and nitrogen the differences are more pronounced, especially for AIM charges, where very high values of \( \chi_{\text{N}} \) and \( \chi_{\text{O}} \) are observed. These high effective electronegativities of nitrogen and oxygen can be rationalized by analyzing the reference AIM charge distribution. First, the partial charges of N and O atoms have relatively large absolute values compared with the other population analyses considered (see Fig. 3); except for two nitrogen atoms, they all exceed 0.8e in magnitude. Second, the charges of oxygen and nitrogen form an island separated by over 0.4e from the lowest charge obtained for any other element (over 0.6e if the two nitrogen atoms mentioned above are excluded).

Table 2 The optimized electronegativity (\( \chi_{i} \) in eV/e) and hardness (\( \eta_{i} \) in eV/e²) data for trivial-atom resolution

The hardness data reported in Table 2 reveal that the relative hardness scales also differ between reference charge distributions. For MPA charges the scale agrees with the isolated-atom limit, namely \( \eta_{\text{O}} > \eta_{\text{N}} > \eta_{\text{C}} > \eta_{\text{S}} \), with \( \eta_{\text{H}} \) lying between \( \eta_{\text{O}} \) and \( \eta_{\text{N}} \). For Hirshfeld charges, the first three elements share the trend of the other population analyses, but the hardness then drops for O and rises again for S. As for the electronegativities, the influence of the molecular environment can be invoked to explain these differences. Comparing the hardness values obtained for different reference charges shows that the biggest differences occur for nitrogen, oxygen and sulfur atoms, whereas the hardnesses of hydrogen and carbon vary less between population analyses: the standard deviations (σ) of the parameters obtained for these two elements are lower than 3 eV/e², compared with 5–8 eV/e² for N, O and S. The hardnesses govern the resistance to charge flow: the harder the atom, the stronger its resistance to CT. Since the charge distributions for the Voronoi and Hirshfeld analyses are less spread than for the remaining analyses, the hardness parameters for these two analyses are usually higher.

Table 3 collects the effective electronegativities and hardnesses for the hybridized-atom resolution. The number of variational parameters for O, N and C atoms is doubled since two hybridization states (sp² and sp³) were considered. For some population analyses, the hardnesses and electronegativities of the same element lie in disjoint domains; for others, the atomic domains overlap. The overlap of electronegativity and hardness domains is more evident for the AMBER force-field atoms shown in Table 4, especially for hardnesses. This indicates that the effective electronegativity and hardness data depend strongly on the nearest environment. As in the trivial-atom resolution, the electronegativities and hardnesses of oxygen and nitrogen are the most variable among population analyses for the hybridized and force-field atom types. In the hardness domain the most variable element is nitrogen, with σ = 13 and 14 eV/e² for hybridized and force-field atoms, respectively. For electronegativities, oxygen exhibits the biggest differences (σ = 13 eV/e for hybridized atoms and \( \sigma_{\text{av}} \) = 11 eV/e for force-field atoms). Moreover, for all three resolutions, hardnesses are much more variable than electronegativities, both between population analyses and between atom types of the same element. For example, for hydrogen in the force-field resolution, \( \sigma_{\text{av}} \) = 7.5 eV/e² for hardnesses and 1 eV/e for electronegativities. This is not surprising, since electronegativity differences determine the direction of charge flow between atoms in the molecular system, whereas atomic hardnesses determine the amount of charge transferred. Hardness parameters are therefore more sensitive to both the reference charge distribution and the chemical environment of a given atom type.
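The separate roles of the two parameter types can be made explicit with a short derivation of our own (not part of the original study) for a hypothetical neutral two-atom system treated with Eq. 6 (N = 2 and q = 0, so \( q_{2} = -q_{1} \)). The equalization condition of Eq. 6 then gives:

$$ \begin{aligned} \chi_{1} + \eta_{11} q_{1} + \eta_{12} q_{2} &= \chi_{2} + \eta_{21} q_{1} + \eta_{22} q_{2}, \qquad q_{2} = -q_{1}, \\ q_{1} &= \frac{\chi_{2} - \chi_{1}}{\eta_{11} + \eta_{22} - 2\eta_{12}}. \end{aligned} $$

The sign of \( \chi_{2} - \chi_{1} \) fixes the direction of charge flow (the more electronegative atom acquires the negative charge), while the denominator, which is positive for a positive definite hardness matrix, fixes its amount: the harder the atoms, the smaller \( |q_{1}| \).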

Table 3 The optimized electronegativity (\( \chi_{i} \) in eV/e) and hardness (\( \eta_{i} \) in eV/e²) data for hybridized-atom resolution
Table 4 The optimized electronegativity (\( \chi_{i} \) in eV/e) and hardness (\( \eta_{i} \) in eV/e²) data for AMBER99 force-field resolution

It is hard to judge in absolute terms how good the obtained parameterizations are. Therefore, Fig. 4 shows histograms of the absolute deviations of CSA-derived charges from the reference values, \( |q_{i}^{\text{CSA}} - q_{i}^{X}| \) (X = MPA, NPA, AIM, Hirshfeld and Voronoi charges), for all molecules of the training set. Hirshfeld and Voronoi charges have the sharpest distributions, with no deviations beyond 0.15e. The same holds for MPA, AIM and NPA charges; however, for these analyses the first region (0–0.05e) contains fewer atoms than for Hirshfeld and Voronoi, while the other two regions (0.05–0.10e and 0.10–0.15e) are more populated.
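A sketch of the binning behind histograms such as those in Fig. 4. The bin edges (0.05e steps up to 0.15e, plus an overflow bin) are our assumption based on the regions named in the text; the helper name and the sample charges are hypothetical.

```python
import numpy as np


def deviation_histogram(q_ref, q_csa, edges=(0.0, 0.05, 0.10, 0.15)):
    """Count |q_CSA - q_ref| in [0, 0.05), [0.05, 0.10), [0.10, 0.15) and >= 0.15 e."""
    dev = np.abs(np.asarray(q_csa, float) - np.asarray(q_ref, float))
    counts = [int(np.count_nonzero((dev >= lo) & (dev < hi)))
              for lo, hi in zip(edges[:-1], edges[1:])]
    counts.append(int(np.count_nonzero(dev >= edges[-1])))   # overflow bin
    return counts


# Hypothetical reference and CSA charges for five atoms
q_ref = [-0.50, 0.20, 0.30, -0.10, 0.05]
q_csa = [-0.49, 0.24, 0.36, -0.22, 0.25]
print(deviation_histogram(q_ref, q_csa))
```

Pooling the deviations of all atoms from all molecules before binning, as in this sketch, gives one histogram per population analysis.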

Fig. 4 Histograms of the absolute differences between MPA, AIM, NPA, Hirshfeld, and Voronoi [B3LYP/6-31G(d) level of theory] charges and CSA-derived charges for the set of training molecules

The data presented in Tables 1–4 and Fig. 4 correspond to the training set of molecules. To validate the obtained parameters, we applied CSA to a validation set, none of whose molecules was in the training set. The validation set included entirely new classes of molecules, namely mono- and disaccharides, lactams, keto acids, thio acids, thioesters, carbamic acid and its derivatives, and others. The structures of the validation molecules can be found in Ref [17]. The calculations were performed for the force-field resolution, and all investigated population analyses were taken into account. The results are shown in Fig. 5. The CSA-derived charges remain close to the reference values, and the Voronoi distribution is again the sharpest. For the Voronoi, Hirshfeld, and AIM population analyses the deviations of CSA-derived charges do not exceed 0.15e, as observed for the training set; the agreement is slightly worse for the other population schemes.

Fig. 5 Histograms of the absolute differences between MPA, AIM, NPA, Hirshfeld, and Voronoi [B3LYP/6-31G(d) level of theory] charges and CSA-derived charges for the set of validation molecules

Conclusion and future prospects

CSA has been extended to the AMBER ff99 force-field resolution. The effective electronegativity and hardness data were obtained with evolutionary algorithms. Five independent parameter sets, reproducing the MPA, AIM, Hirshfeld, Voronoi, and NPA population analyses, are reported. In addition to the force-field resolution, the intermediate hybridized-atom resolution and the least resolved trivial-atom resolution were considered. The parameterization covers hydrogen, carbon, nitrogen, oxygen, and sulfur atoms. For the hybridized-atom resolution, the sp² and sp³ states of carbon, nitrogen, and oxygen were distinguished. The AMBER force-field resolution distinguishes 38 different chemical environments of H, C, N, O, and S atoms.
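The evolutionary fitting step can be illustrated with a minimal sketch. It is not the algorithm used in this work: it assumes a diagonal-hardness EE model (no interatomic terms), a hypothetical toy set of three atom types (`H`, `C`, `O`) whose "reference" charges are generated from made-up parameters, and a simple elitist (1+λ) evolution strategy that minimizes the mean squared deviation from the reference charges. All names (`ee_charges`, `evolve`, `true_chi`, and so on) are illustrative.

```python
import random

TYPES = ("H", "C", "O")  # hypothetical atom types of the toy training set


def ee_charges(types, chi, eta, total_charge=0.0):
    """Diagonal-hardness EE: chi_t + eta_t * q_i = mu for every atom i, with sum(q) = Q."""
    weights = [1.0 / eta[t] for t in types]
    mu = (total_charge + sum(chi[t] * w for t, w in zip(types, weights))) / sum(weights)
    return [(mu - chi[t]) / eta[t] for t in types]


def unpack(vector):
    """Split a flat parameter vector [chi..., eta...] into two per-type dictionaries."""
    k = len(TYPES)
    return dict(zip(TYPES, vector[:k])), dict(zip(TYPES, vector[k:]))


def loss(vector, molecules, ref_charges):
    """Mean squared deviation (e^2) between model and reference charges over all atoms."""
    chi, eta = unpack(vector)
    sq = [(q - r) ** 2
          for mol, ref in zip(molecules, ref_charges)
          for q, r in zip(ee_charges(mol, chi, eta), ref)]
    return sum(sq) / len(sq)


def evolve(v0, molecules, ref_charges, generations=200, offspring=20, step=0.3, seed=1):
    """Elitist (1+lambda) evolution strategy with fixed-width Gaussian mutations."""
    rng = random.Random(seed)
    k = len(TYPES)
    parent, parent_loss = list(v0), loss(v0, molecules, ref_charges)
    for _ in range(generations):
        children = []
        for _ in range(offspring):
            child = [x + rng.gauss(0.0, step) for x in parent]
            child[k:] = [max(h, 1.0) for h in child[k:]]  # keep hardnesses positive
            children.append((loss(child, molecules, ref_charges), child))
        best_loss, best = min(children, key=lambda pair: pair[0])
        if best_loss < parent_loss:  # the parent survives unless an offspring beats it
            parent, parent_loss = best, best_loss
    return parent, parent_loss


# Toy data: "reference" charges generated from made-up parameters.
molecules = [["C", "H", "H", "H", "H"], ["O", "H", "H"], ["C", "O", "H", "H"]]
true_chi = {"H": 4.5, "C": 5.5, "O": 8.5}     # eV/e, hypothetical
true_eta = {"H": 13.0, "C": 10.0, "O": 12.0}  # eV/e^2, hypothetical
ref = [ee_charges(m, true_chi, true_eta) for m in molecules]

v0 = [5.0, 5.0, 5.0, 10.0, 10.0, 10.0]
start = loss(v0, molecules, ref)
fitted, final = evolve(v0, molecules, ref)
print(f"MSD start = {start:.2e} e^2, after ES = {final:.2e} e^2")
```

In this toy model, shifting every \( \chi \) by the same constant, or scaling all \( \chi \) and \( \eta \) by a common factor, leaves the charges unchanged. Charges alone therefore do not determine the parameters uniquely, which is why the sketch reports the deviation rather than comparing the fitted vector with `true_chi` and `true_eta`.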

Our investigations clearly demonstrate that the effective hardnesses and electronegativities depend on the nearest chemical neighborhood. The Voronoi and Hirshfeld charges were more sensitive to the chemical environment than the remaining population analyses. The sharpest distributions around the reference charges were obtained for Voronoi and Hirshfeld charges, and the broadest for NPA charges. This observation was also confirmed for the validation set. For the MPA parameterization, the CSA-derived charges of the validation molecules were more spread around the reference MPA charges than those of the training set. This discrepancy can be explained by the fact that the validation set contained new classes of molecules, none of which was included in the training set. The better performance of the other population analyses can be attributed to the smaller variability of their charges across the training and validation sets.

In the near future, we plan to couple force-field CSA with molecular dynamics calculations, with the aim of adapting our formalism to derive a polarizable force field. Standard force fields used in molecular modeling describe electrostatic interactions in terms of fixed, atom-centered charges. Real molecules, however, are substantially polarized when placed in a high-dielectric medium, and this polarization strongly affects the geometry and energetics of solute molecules. Fixed-charge force fields include polarization only in an averaged way, by enhancing the atomic charges so as to reproduce the bulk properties of liquid solvents. In available polarizable models [13, 14, 41–44], either the total field is determined self-consistently via an iterative energy minimization procedure, or an extended Lagrangian method is applied to the polarization degrees of freedom; in the latter case a second thermostat is required. The CSA formalism requires no Lagrangian method, and self-consistent iteration can also be avoided. We plan to derive polarizable force fields based on this formalism and the parameterization obtained in this article.
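The point that neither an extended Lagrangian nor an iterative loop is needed can be illustrated with a quadratic, EE-type charge model: because the energy is quadratic in the charges, the self-consistent response to an external potential \( \phi_i \) follows from a single linear solve. This is a hedged sketch, not the authors' implementation: the hardness matrix (diagonal \( \eta_i \) plus made-up interatomic coupling terms), the electronegativities, and the potentials are hypothetical, as are the helper names `solve` and `equilibrium_charges`.

```python
def solve(a, b):
    """Solve the dense linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def equilibrium_charges(chi, hardness, phi, total_charge=0.0):
    """Minimize sum_i (chi_i + phi_i) q_i + 1/2 sum_ij H_ij q_i q_j subject to sum_i q_i = Q.

    Stationarity gives sum_j H_ij q_j - mu = -(chi_i + phi_i) for every atom i,
    i.e. one bordered linear system; mu is the equalized electronegativity.
    """
    n = len(chi)
    a = [list(hardness[i]) + [-1.0] for i in range(n)] + [[1.0] * n + [0.0]]
    b = [-(chi[i] + phi[i]) for i in range(n)] + [total_charge]
    x = solve(a, b)
    return x[:n], x[n]


# Hypothetical three-atom system: chi in eV/e, hardness matrix in eV/e^2
# (diagonal eta_i plus made-up interatomic coupling terms).
chi = [5.0, 7.0, 5.0]
hardness = [
    [10.0, 4.0, 2.0],
    [4.0, 12.0, 4.0],
    [2.0, 4.0, 10.0],
]
q0, mu0 = equilibrium_charges(chi, hardness, phi=[0.0, 0.0, 0.0])
# A made-up external potential (eV/e) that rises from atom 2 to atom 0.
phi = [0.5, 0.0, -0.5]
q1, mu1 = equilibrium_charges(chi, hardness, phi)
dq = [after - before for before, after in zip(q0, q1)]
print("q0 =", [round(q, 4) for q in q0])
print("induced dq =", [round(d, 4) for d in dq])
```

Since the energy is quadratic, the charges returned by one solve already satisfy electronegativity equalization in the applied potential, so no further iteration is required; the induced charges are also linear in \( \phi \).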