Introduction

Electrophilic attack on nucleobases is a common molecular mechanism of mutagenicity or carcinogenicity [1, 2]. It is involved in deleterious properties of many mutagens starting with environmental pollutants [3, 4] to finish with endogenous substances or their metabolites [5, 6]. Adduct formation can lead to perturbations in hydrogen-bonding (HB) pattern, conformational interconversions, formation of interstrand cross-links [79] or abasic sites [10] and in consequence to nucleic acids’ malfunction or damage.

Similar to most biological processes, regioselectivity plays an important role here. It is governed by either kinetic or thermodynamic factors [11]. The first ones are connected with relative heights of activation barriers. In the latter, stability of the products is decisive. In both cases, the observed trends can arise from two sources. One is the inherent reactivity of various sites. It arises from electronic structure of the reagent and is independent of the reaction partner. It is believed that the most nucleophilic atoms of purines are the N7 atom of guanine and N1 atom of adenine (IUPAC atoms numbering; see Figs. 1, 2). In this respect, hard and soft acids and bases (HSAB) theory [12, 13] is often invoked along with the empirical Kornblum’s rule [14] and Klopman’s model of charge- and orbital-controlled reactions [15]. The other sources are connected with various specific interactions between the reagents and/or the environment. These include the following: solvent effects [16, 17], solvent-assisted reaction mechanisms [18, 19], the influence of pH on ionic equilibriums [2024] or specific interactions between reagents, in particular HBs and steric interactions [19, 25].

Fig. 1
figure 1

Optimized gas phase structures of isomers of adenine and its ions with their DFT/B3LYP/6-31+G(3df,2p) (kcal/mol) energies relative to the lowest-energy forms A1, AA1 and AC6

Fig. 2
figure 2

Optimized gas phase structures of isomers of guanine and its ions with their DFT/B3LYP/6-31+G(3df,2p) (kcal/mol) energies relative to the lowest-energy forms G2, GC3, GA6 and GA17

Due to this manifold of factors, computational modeling is a particularly advantageous tool in this field, allowing for separation of different contributions to observed trends. For example, Ford and Scribner [26] have used MNDO calculations in studies of nucleobases’ alkylation by alkylnitrosoureas. They have shown that optimal geometry of the transition state determines the preferences for either oxygen or nitrogen attack by these agents. Freccero et al. [19] have employed density functional theory (DFT) [27] in studies of adenine and guanine alkylation by quinone methide. They have demonstrated that the unusual preference of this agent to alkylate weakly nucleophilic exocyclic amine groups is due to its ability to form HBs with the nucleobases. Also on the basis of DFT calculations, the group of Mavri and Bren has confirmed the SN2 mechanism proposed for reaction of purines with epoxides [28, 29]. Another studies concerning, e.g., ionic equilibriums in aqueous environment [30], optimal structures of the adducts and their biological implications [8, 25] or the mutual influence of base pairing and adduct formation [31, 32] have also been pursued.

All of the above studies are based on considering the energetic effects of the examined reactions. However, DFT has recently witnessed a great progress in its conceptual branch [33]. It tackles the problem of chemical reactivity by employing a perturbative approach, similar to experimental thermodynamics [34]. A lot of properties known from experimental chemistry, such as chemical potential, electronegativity, polarizability or hardness and softness, can be given strict definitions on the ground of conceptual DFT. Other reactivity descriptors, e.g., Fukui function [35] or various electrophilicity/nucleophilicity indices [3638] can also be defined. They have been successfully applied in modeling various reactions [33, 39], and lately also chemical toxicity [40].

In our group, a method called charge sensitivity analysis (CSA) has been developed [41]. It is derived from DFT and was originally formulated to extract chemically relevant information from ab initio calculations in different resolutions. We have recently parameterized its semiempirical variant in force-field atoms resolution [42, 43]. It allows efficient calculating of equilibrium charge distribution and a set of reactivity indices by a single matrix inversion step. In the present study, we demonstrate the usefulness of this approach in description of nucleophilic properties of purines. We combine CSA reactivity descriptors with energetic considerations in various solvents in order to separate the contribution of inherent reactivity and reaction conditions to observed regioselectivity. Another goal of this study is to verify the agreement of the obtained reactivity parameters with experimental results before moving on to large biological systems. Complexity of purines’ electronic structure should provide a sufficiently challenging test for the method.

The paper is organized as follows. In the next section, a short overview of CSA formalism is presented. Next, computational details are given, followed by “Results and discussion” section. The last section contains conclusions and some future prospects.

Theoretical background

Conceptual DFT approach to chemical reactivity resembles that of phenomenological thermodynamics. The behavior of the system is described in terms of its responses (sensitivities) to perturbations in state parameters. In studies of chemical reactivity, these usually are the number of electrons, N, and the external potential due to nuclei, v(r). Perturbation in N corresponds to oxidation/reduction of the molecule. Change in v(r) models the presence of the other reagent. By expanding the system’s energy, E, in a second-order Taylor series with respect to these variables, a number of sensitivities can be defined. They include the global chemical potential of electrons (μ) [44], the global hardness (η) [45], polarization/linear response kernel (β) [46] and Fukui function (f(r)) [47].

Chemical potential μ measures the leaving tendency of electrons in the system. For systems in the state of global equilibrium, it is equalized through space [4850]. Global hardness describes the system’s resistance on charge flow. It depends on its global charge, size and polarizability [51, 52]. It is directly linked with the concept of chemical hardness and softness in HSAB principle [5355]. The inverse of the global hardness is the global softness, S. Polarization/linear response kernel measures the response of electron density in position r to perturbation of external potential in position r’. It can be used to probe various properties of molecular systems, such as electron delocalization, aromaticity/antiaromaticity and other aspects of organic/inorganic/metallic chemistry [5659]. Finally, Fukui function is a local property describing the response of electron density, ρ(r), to oxidation/reduction of the system. It is defined as:

$$f(\varvec{r}) \equiv\frac{\partial}{\partial N}\left( {\frac{{\delta E}}{{\delta v(\varvec{r})}}} \right) = \left( {\frac{\partial \rho (\varvec{r})}{\partial N}} \right)_{v(\varvec{r})} = \left( {\frac{\delta \mu }{{\delta v(\varvec{r})}}} \right)_{N}$$
(1)

It can be regarded as a measure of local reactivity: The sites of maximum f(r) should correspond to maximum sensitivity to electrophilic/nucleophilic attack. f(r) is normalized to unity. It can be shown [35] that a quantity called the local softness, defined as \(s(\varvec{r}) = \left( {\partial \rho (\varvec{r})/\partial \mu } \right)_{{v(\varvec{r})}}\), can be calculated with the use of f(r):

$$s(\varvec{r}) = f(\varvec{r})S$$
(2)

A systematic approach to compute these quantities is offered by CSA. The details of this formalism can be found in the literature [41, 42, 60]. Here, only a short overview is given. In CSA, a molecular system M is divided into either mutually closed or opened subsystems. The number of electrons in each subsystem can be controlled by coupling it to a hypothetical external reservoir of electrons with constant chemical potential. Mutually opened subsystems are coupled to the same reservoirs. Consider, for example, a reaction between a Lewis acid, A, and base, B:

$$A\quad + \quad :B \to A:B$$
(3)

In the early stages of the reaction fragments, A and B can be regarded as mutually closed: M = (A|B). Each of them can in turn be represented by a collection of N A and N B mutually opened atoms, e.g., A = (1 A ⋮2 A ⋮…⋮N A ). For that purpose, fragments’ electron densities, ρ A (r) and ρ B (r), should be replaced with vectors grouping their atomic electron populations, or, more conveniently, partial atomic charges, \(\varvec{q}_{A} = \left( {q_{1}^{A} ,q_{2}^{A} , \ldots ,q_{{N_{A} }}^{A} } \right)\) and \(\varvec{q}_{B} = \left( {q_{1}^{B} ,q_{2}^{B} , \ldots ,q_{{N_{B} }}^{B} } \right)\). Similarly, fragments’ external potentials, v A (r) and v B (r), are replaced with vectors \(\varvec{v}_{A} = \left( {v_{1}^{A} ,v_{2}^{A} , \ldots ,v_{{N_{A} }}^{A} } \right)\) and \(\varvec{v}_{B} = \left( {v_{1}^{B} ,v_{2}^{B} , \ldots ,v_{{N_{B} }}^{B} } \right)\). In such atomic resolution, the above-mentioned sensitivities can be represented as:

$$\mu_{X} = \left( {\frac{\partial E}{{\partial N_{X} }}} \right)_{{N_{Y} ,\varvec{v}_{X} ,\varvec{v}_{Y} }}$$
(4)
$$\eta_{X,Y} = \left( {\frac{{\partial^{2} E}}{{\partial N_{X} \partial N_{Y} }}} \right)_{{v_{X} ,v_{Y} }} ,\quad \quad S_{X,Y} = \frac{1}{{\eta_{X,Y} }}$$
(5)
$$\varvec{f}_{X,Y} = \left( { - \frac{{\partial \varvec{q}_{X} }}{{\partial N_{Y} }}} \right)_{{v_{X} ,v_{Y} }} = \left\{ {f_{i}^{X,Y} = \left( { - \frac{{\partial q_{i}^{X} }}{{\partial N_{Y} }}} \right)_{{v_{X} ,v_{Y} }} } \right\}$$
(6)
$$\varvec{s}_{X,Y} = \left\{ {s_{i}^{X,Y} = f_{i}^{X,Y} S_{X,Y} } \right\}$$
(7)
$$\varvec{\beta}_{X,Y} = \left( { - \frac{{\partial \varvec{q}_{X} }}{{\partial \varvec{v}_{Y} }}} \right)_{{N_{X} ,N_{Y} }} = \left\{ {\beta_{i,j}^{X,Y} = \left( { - \frac{{\partial q_{i}^{X} }}{{\partial v_{j}^{Y} }}} \right)_{{N_{X} ,N_{Y} }} } \right\}$$
(8)

where X and Y denote A or B.

As the reagents approach each other, charge transfer (CT) between them can be accounted for by coupling them to a common electron reservoir. The system then passes from a state of constrained equilibrium (different chemical potentials in A and B) to the state of global equilibrium with μ A  = μ B  = μ.

Diagonal atomic Fukui indices, f X,X i , or alternatively atomic softnesses, s X,X i , can be used to probe inherent reactivities of different sites in the reagents. In the early works of Parr and Yang [47], it was suggested that sites with maximum values of Fukui function/local softness are the most reactive. This approach was later generalized by Li and Evans [61] by considering second-order electronic energy change accompanying reaction (3). According to the early works of Klopman [15] and Pearson [12], reactions between hard reagents are charge controlled (hard reactions) and reactions between soft reagents are orbital/charge transfer controlled (soft reactions). Li and Evans have shown that when CT is small, hard reactions proceed through atoms with minimal local softnesses, while soft reactions proceed through atoms of maximum local softness. These trends can be considered as the local counterpart of HSAB principle. When CT is large it can be shown [62] that regioselectivity is governed by Fukui indices and the character of the reagents. A reaction between neutral molecules proceeds through atoms with maximum Fukui indices. However, for reagents with large, usually opposite, total charges, the sites of minimum Fukui indices are preferred. This can be regarded as an intermediate hard/soft reaction when the CT is significant while the reagents are charged (i.e., hard). As indicated in Eqs. (2) and (7), Fukui indices and atomic softnesses are proportional to each other. Each of them can be used to probe relative atomic reactivities within one molecule. In the present study, atomic softnesses have been chosen because they also allow comparison between different molecules.

Computational details

In order to rule out any system-specific interactions, proton (H+) was chosen as a model electrophilic agent. Geometrical structures of all possible tautomers of adenine and guanine in neutral, protonated and deprotonated states were optimized at DFT/B3LYP/6-31G* level of theory. Next, their energies were recalculated in the 6-31+G(3df,2p) basis. Solvent effects were modeled by performing analogous calculation with PCM model [63, 64] in a series of solvents with dielectric constants (ε) varying from 1.88 (n-hexane) to 78.36 (water). All the calculations were performed with Gaussian09 program [65]. The most stable structures were chosen for CSA analysis. The calculations were performed with a program developed in our group. The details of the formalism are the same as in our previous studies [60, 66]. CSA results depend on the population analysis used to partition electron density into atomic contributions. For this reason, they were performed for six different analyses and their results have been compared. The analyses examined include the following: Mulliken population analysis (MPA) [6769], natural population analysis (NPA) [70], Bader’s atoms in molecules (AIM) [71, 72], Hirshfeld’s stockholder analysis (HSA) [73], CHELPG electrostatic potential fitted charges [74, 75] and Voronoi deformation density (VDD)-based charges [76].

Results and discussion

One can think of two main mechanisms of proton’s attack on purines. In the first (path I), proton attacks an anion and in the other (path II) a neutral molecule. In path II, the attack leads to a cationic transition state, which later undergoes deprotonation and the final neutral product is formed. In path I, the neutral product is formed directly by the attack of H+ on the anion.

For adenine, 10 isomers of the neutral molecule (A1A10), 5 anions (AA1AA5) and 14 cations (AC1AC14) have been considered. For guanine, 20 neutral molecules (G1G20), 15 singly deprotonated anions (GA1GA15) and 20 cations (GC1GC20) have been analyzed. Guanine can also undergo double deprotonation. Experimental pKa value for this process equals to 12.3–12.4 [7783]. Since such drastic conditions are rarely met in biological systems, this reaction path is not considered here. However, doubly deprotonated guanine anion (GA16) is used as a reference system in the discussion of CSA parameters and was included in the calculations. Optimized gas phase structures of all analyzed species, together with their energies, are gathered in Electronic Supplementary Materials available online (Figs. S1–S6). In what follows, isomers with energy within 10 kcal/mol from the most stable form, either in gas phase or in water, are considered. Their gas phase structures are presented in Figs. 1 and 2. The structures optimized in solutions were very similar to the gas phase ones and therefore are not presented.

Adenine

The energy diagram of adenine reactions with H+ is presented in Fig. 3. The upper part of the diagram corresponds to reaction path I, the lower to path II. Above the dashes representing each structure are their energies calculated for two extreme environments: gas phase (red) and water (blue).

Fig. 3
figure 3

Energy diagram for reactions of adenine with H+. The first number (red) above each bar represents the energy in the gas phase, the second (blue) in water (Color figure online)

In the gas phase, neutral adenine can be found predominantly in the A1 form, with proton at the N9 position. Polarization by the solvent (Fig. 4a) does not change the energetic ordering of the lowest-energy isomers. It is manifested mainly by lowering the energy difference between A1 and A2 forms, up to 2.2 kcal/mol in water. Among adenine anions, the amine form AA1 unequivocally dominates in the gas phase as well as in solutions. Therefore, reaction path I leads through electrophilic attack on AA1. Among neutral reaction products, N9-substituted A1 isomer is the most favored. However, in polar solvents, some amounts of N7-substituted species can also be formed.

Fig. 4
figure 4

Influence of solvents with different dielectric constants on the energies of neutral (a) and protonated (b) adenine, neutral guanine (c) isomers and guanine anions (d). Energetic order of the isomers in the gas phase is depicted as horizontal lines in the range of negative ε

In the case of path II, in the gas phase it starts with H+ attacking neutral molecule A1. Two low-energy cations can be produced in this process: AC1, AC2. Boltzmann distribution in 298 K leads to 91 and 9 % of AC1 and AC2, respectively. Deprotonation of the transition products leads primarily to recreation of substrate A1. In polar solvents, the energies of AC3, A2 and A4 forms are lowered (Fig. 4 a, b). It allows A2 and A4 isomers to be created and undergo protonation to produce the lowest-energy cation, AC6. In these conditions, especially in protic solvents, one can assume that all the cations coexist as an equilibrium mixture. The contents of AC6, AC1 and AC2 then equal to 51, 45 and 4 %, respectively. Deprotonation of AC6 leads to recreation of N7-substituted A2 and possibly small amounts of N3-substituted A4. All in all, the most favored outcomes of the reaction are recreation of N9-substituted substrate and formation of the N7-substituted product. The latter is more probable in polar solvents.

In Fig. 5, partial atomic charges (top number) and atomic softnesses (bottom number, in braces) of nitrogen atoms are presented. They correspond to MPA population analysis. Results for other population analyses are presented in Supplementary Materials (Figs S7, S8).

Fig. 5
figure 5

Partial atomic charges (MPA) and atomic softnesses of nitrogen atoms in adenine and its anion

In general, three types of nitrogen atoms can be distinguished in purines. The first one (N I ) involves exocyclic amine nitrogen with a mixed sp 2/sp 3 hybridization. In this sp 3 nitrogen atom, the lone electron pair is withdrawn by the aromatic system what introduces a partial sp 2 character. The second type (N II ) involves the nitrogen from the heterocyclic system with hydrogen attached. This sp 2 nitrogen has its lone pair delocalized in the aromatic system or involved in a bond with significant contribution of a double-bond character. The third group (N III ) contains heterocyclic sp 2 nitrogens without hydrogens. They donate a single electron to the heterocyclic system and therefore retain the lone pair.

In adenine, N I atom is characterized by the largest, negative value of partial charge. This atom is in fact electron deficient. This deficiency is partially reduced by strong polarization of N–C and N–H bonds, which is responsible for large magnitude of its partial charge. The partial charges of N II atoms are less negative. Finally, N III atoms exhibit the least negative charges. Some small deviations from these trends can be observed in other population analyses. Nevertheless, in all population analyses, atomic softnesses exhibit a consistent trend. Electron-deficient character of N I and N II atoms is reflected by their low softness. Atoms belonging to these groups are not susceptible to electrophilic attack. N III atoms possess lone electron pairs; thus, they exhibit the largest values of atomic softness. Atoms belonging to this group are the most susceptible to electrophilic attack.

To elucidate trends in relative reactivities of different sites in adenine, N III atoms need to be compared. In AA1 anion, purine’s characteristic pattern of alternate N/C atoms is not perturbed by the presence of protons on the cyclic nitrogens. Moreover, partial charges and geometrical parameters (Figs. S1, S2 and S5) show that the negative charge is uniformly distributed in the heterocyclic system. Therefore, the trends observed for this species can be regarded as the most intrinsic for adenine. In this structure, nitrogen atoms from 6-membered ring are softer than these from the 5-membered ring. Moreover, s N3  > s N1 and s N9  > s N7 . So the soft character of cyclic nitrogens increases in the series: N7 < N9 < N1 < N3. These trends are preserved in the neutral species.

On path I, reaction of AA1 with H+ proceeds between two oppositely charged species. According to Li–Evans model, it should proceed through atoms with low s i values. These are the N7 and N9 atoms. s N7 is lower than s N9 —so from the purely kinetic point of view N7 substitution is favored. This observation is in accordance with trends in energies of neutral adenine isomers. A1 and A2 forms result from attack on N9 and N7 and have low energies. Lower energy of A1 is due to an unfavorable steric interaction between a proton from NH2 group and H7, and not due to lower reactivity of N7. This is confirmed by significant lowering of the energy difference between A1 and A2 in water. A3 and A4 forms result from attack on soft N3 and N1 atoms. This path is less favored; hence, A3 and A4 have larger energies. The energy of A4 is less decreased in water than that of A2. This confirms that the destabilization of A4 comes mainly from electronic effects. In the case of A3, both destabilizing factors, electronic and steric, cooperate. These results are in agreement with experimental observations, i.e., in basic conditions the reaction with a variety of reagents proceeds on N9 [20, 24, 84, 85].

In reactions on path II, a cation attacks a neutral molecule. Again, the atom with minimum softness, i.e., N7, is the favorable site of attack. Different preferences can be expected according to relative energies of the cations. As stated above, the stability of cations decreases in the series: AC1 > AC2 > AC3, so N1 site is preferred over N3 and N7, with the latter being strongly disfavored. These discrepancies are caused by the large change in external potential introduced by the attacking H+. In adenine cations, electron delocalization determines their relative energies. The lone electron pair of N6 atom is conjugated with the heterocyclic ring, and the whole system is flat. The presence of a hydrogen atom in N1 position allows for advantageous electron delocalization over both heterocyclic rings. This is not possible in N7- and N3-substituted products. Additional destabilization of the former is introduced by the steric interaction between H7 and H6 atoms. N7 substitution is therefore favored in terms of inherent reactivity of isolated substrate, while N1 substitution is favored energetically. Experimental results show that reactivity criteria dominate and that N7-substituted products are observed in neutral pH [3, 86].

Guanine

In the case of guanine, manifold of possible tautomeric forms makes the picture somewhat more complex. The energy diagram of its reactions with H+ (Fig. 6) shows that in the gas phase it exists as a mixture of two main isomers: G1 and G2. Both of them have a proton in N1 position. They differ in the position of the second proton (N9 in G1 and N7 in G2). Small amounts of enolamine form G9 can also be found. In 298 K, Boltzmann distribution leads to 21, 74 and 5 % of isomers G1, G2 and G9, respectively. In solutions, the ordering of G1 and G2 forms interchanges as ε rises (Fig. 4c). At the same time, the energy of G9 isomer rises relative to G1, up to 7 kcal/mol in water. In this solvent, the amounts of guanine isomers equal to 77 and 23 % of G1 and G2, respectively. These results are in qualitative agreement with a more accurate study of Goddard and co-workers who report 32 % of G1 in the gas phase and 88 % in water [30]. As already pointed out by Goddard, these differences in relative energies are due to two factors. Firstly, G1 has a larger dipole moment than G2, and therefore, its solvation energy is greater. Secondly, in G1 and G2 forms unfavorable steric interactions between the protons from NH2 group and N1 position are shielded in polar solvents. This causes their energies to be lowered with respect to G9 isomer.

Fig. 6
figure 6

Energy diagram for reactions of guanine with H+. The first number (red) above each bar represents the energy in the gas phase, the second (blue) in water (Color figure online)

In the gas phase, single deprotonation of isomers G1, G2 and G9 leads to three low-energy anions: GA1, GA4 and GA5. After accounting for the relative contents of neutral species, their amounts equal to 38, 55 and 4 %, respectively. However, similar to the case of adenine cations, the most stable anion, GA6, is not produced by direct deprotonation of any of the low-energy neutral species. This picture changes in solutions (Fig. 4d). As the dielectric constants of the solvent rises, the energies of GA5 and GA6 quickly rise. At the same time, the energies of GA1 and GA3 are lowered with respect to GA4. In water, the amounts of GA1, GA3 and GA4 are equal to 47, 23 and 30 %, respectively. Again, these trends can be rationalized in terms of dipole moment differences and shielding unfavorable steric interactions.

To sum up, in the gas phase, reactions on path I proceed through anions GA1, GA4 and GA5. Attack on GA1 leads to a mixture of products substituted at positions N9 and N7 (G1 and G2). Attack on GA4 produces mainly G2 form. Attack on GA5 leads to enolamine isomer G9. In water, the reaction may proceed through GA1, GA3 or GA4 anions. Again, attack on GA1 leads to a mixture of G1 and G2 products. G1 is also produced by attack on GA3, while G2 is produced by attack on GA4. Therefore, all in all N7- and N9-substituted products are favored.

In the case of guanine cations, one low-energy structure, i.e., GC3, dominates in the gas phase. It is produced directly from protonation of either G1 or G2. Formation of the other low-energy cation, GC8, is unlikely due to high energies of its antecedents, G10 and G6. In solutions, the energy of GC8 quickly increases. At the same time, the energy difference between GC3 and GC1 decreases, up to 2 kcal/mol in water. Deprotonation of GC3 leads back to G1 and G2 neutral systems Therefore, in the gas phase reactions on path II lead to a mixture of N7- and N9-substituted products. This is not changed in solutions since formation of GC1 cation leads primarily to recreation of the substrate, G2.

Figure 7 presents partial charges and atomic softnesses of nitrogen and oxygen atoms in guanine and its ions. Again, the same three types of N atoms can be distinguished. Their main characteristics remain unchanged. Additional type of nitrogen (N IV ), i.e., N2 imine nitrogen, can be found in G9 and GA5 systems. It forms a double bond with C2 and retains the lone pair. It is characterized by low, negative partial charge and large softness. It is susceptible to electrophilic attack. Two types of oxygen atoms can be distinguished: carbonyl (O I ) and hydroxyl (O II ). Carbonyl oxygens exhibit a partial charge comparable to those of N III nitrogens. The softness of O I atoms unequivocally dominates among other atomic softnesses in the system. In analogy to N atoms, magnitude of oxygen’s partial charge increases upon protonation, while its softness drops.

Doubly deprotonated anion, GA16, can serve as a model of guanine’s heterocyclic system, unperturbed by the presence of hydrogens. From data in Fig. 7, it can be seen that in this species negative charge is distributed among all the electronegative atoms. Geometrical parameters (Fig. S4) suggest that in contrast to adenine anion AA1, in GA16 the presence of a carbonyl group disables the heterocyclic rings from attaining a fully aromatic character. Nevertheless, partial charges and atomic softnesses of N III atoms exhibit similar trends to AA1. The only exception is that nitrogens from 5- and 6-membered rings have in pairs similar softness values. Thus, nucleophilic character of N III atoms increases in the series: N7 ≈ N9 < N1 ≈ N3. It can also be seen that the lone pair of N2 atom is no longer withdrawn by the heterocyclic system. It is reflected in the value of its atomic softness, which is in the same range as those of N III atoms. Therefore, sp 3 nitrogens which retain their lone pairs are also susceptible to electrophilic attack. Also electron-rich carbonyl oxygen has a large value of atomic softness.

Fig. 7
figure 7

Partial atomic charges (MPA) and atomic softnesses of nitrogen and oxygen atoms in guanine and its anions

When an N III atom in GA16 is protonated, its atomic softness and partial charge change in the same way as previously described for adenine. Interestingly, protonation of N1 atom causes also a decrease in the softness of N3 atom. On the contrary, in 5-membered ring protonation of one of the nitrogens makes the other softer. Protonation of O6 decreases softnesses of all N III atoms, especially N7. In singly deprotonated anions, protonation of nitrogen atoms preserves the observed trends. Upon protonation of O6 s N1 now decreases the most. These trends rationalize the observed softness values in neutral species. In G1 and G2 systems, the increase in s N7 or s N9 and decrease in s N3 are responsible for the greater values of N III atoms’ softnesses in 5-membered ring than those of nitrogens in 6-membered ring. Similarly, lower value of s N1 than s N3 in G9 can be attributed to the protonation of O6.

As previously stated, sites with minimum softness values are preferred for the attack on both reaction paths. In pH between 9.4 and 12.5, depending on the polarity of the solvent three singly deprotonated anions, GA1, GA3 and GA4, can be present. In all of these structures, softnesses of all N III atoms are close to each other. Some subtle trends that can be observed are not equivocal and depend on the population analysis. Therefore, kinetic control does not favor any of the products definitely. In this case, the observed products should depend on thermodynamic control. Depending on the polarity of the solvent, those can be N7-, N9- or N1-substituted species. Accounting for the fact that both the anions and the products of their protonation exist as mixtures of different isomers, all three N-substituted species can be expected. Therefore, neither reactivity nor energetic parameters do unequivocally favor a specific site of the reaction. When H+ is attacking a neutral guanine molecule (pH < 9), N3 site is the most reactive. Yet, attack on this atom is strongly disfavored energetically. This explains the lack of adducts observed experimentally in these conditions [3].

The results gathered in Supplementary Materials show that with some minor exceptions NPA, HSA and VDD analyses predict similar trends to MPA. AIM analysis on the other hand clearly stands out of the others, giving results which are often in contradiction with all the remaining analyses. For this reason, it should probably be discouraged in similar studies. CHELPG analysis sometimes agrees with the others and sometimes it does not. What is more, it often produces negative values of atomic softnesses, which are somewhat questionable [87]. They are most probably due to the statistical inaccuracy introduced by the charge-fitting procedure. Therefore, its use may also be disfavored. MPA and NPA analyses as well as HSA and VDD give in pairs very similar results. This is not surprising since they are based on similar theoretical backgrounds. The results of MPA and NPA pair are probably closest to conventional chemical intuition regarding magnitude of the atomic charges and softnesses. A better agreement with experimental data has also been observed for these analyses, with MPA being slightly superior. In conclusion, these two schemes can be advised as most reliable in similar studies. The choice of NPA or HSA analyses has in fact already been suggested by others [87].

Conclusions and future prospects

In the presented study, DFT calculations have been employed to model nucleophilic reactivity of purine bases. The energetics of their protonation/deprotonation in different media have been examined with the PCM model. CSA calculations have been used to study charge distribution and relative reactivity of different sites. It is shown that this approach is very successful in predicting regioselectivity of electrophilic attack on these ambidente species. It enables separation of two different contributions, i.e., inherent reactivity and the influence of reaction conditions, to regioselectivity. This in turn allows elucidating and rationalizing clear trends in the otherwise perplexing manifold of experimental data. The results obtained in the present study are in agreement with experimentally observed products of reactions with various electrophiles and in various conditions.

In empirical organic chemistry, Kornblum’s rule is often used to rationalize observed regioselectivity of electrophilic attack on ambidente nucleophiles. It originally stated that hard electrophiles react with the most electronegative sites in the nucleophile, while soft electrophiles tend to attack less electronegative, soft atoms. It was later combined with empirical HSAB principle. Recently, this approach has been criticized by Mayr et al. [88]. They argue that the number of cases where empirical HSAB approach fails is almost equal to that of its successes. In the discussed case of electrophilic attack on purines, Kornblum’s rule also fails. For guanine, it predicts that hard electrophiles should attack the oxygen atom. The present study shows that despite large electronegativity of O6, this atom is actually very soft. Therefore, extremely hard H+ attacks nitrogen atoms. Indeed, O-substituted products are rarely observed experimentally and often in small yields. It seems that their formation is due to some specific interactions between the reagents or other species present in the reaction mixture. Similar situation was observed for alkylation of 2-pyridone on the basis of which Kornblum formulated his rule [89, 90]. The present study demonstrates that, despite the failure of Kornblum’s rule, the concept of HSAB itself does not need to be discarded. The main deficiency of empirical HSAB is the inaccurate definition of hardness/softness. Thanks to development introduced in this field by conceptual DFT; this problem can be solved. This, as well as many other successful theoretical studies [34] show that the concept of HSAB is in fact justified and very useful.

As far as biological consequences are concerned another issue needs to be addressed, i.e., the influence of the chemical neighborhood of the nucleobases on their reactivity. For example, Fishbein and co-workers have demonstrated that the highest nucleophilic activity of N1 atom of adenine is actually due to the structure of the DNA helix and not to inherent properties of this nucleobase [91]. One aspect of this problem is the geometrical availability of particular sites in the DNA helix. Another is the electrostatic environment of individual atoms and its influence of on their reactivity. We have recently demonstrated that atomic Fukui indices as well as other charge sensitivities change when an atom is engaged in a hydrogen bond [60]. Also the bond with the sugar moiety can influence the electronic properties of the heterocyclic ring. Studies in this direction are carried out in our group.

Extremely low computational cost of CSA scheme makes it especially well suited for studies of large systems. It can further be combined with molecular dynamics calculations in order to include many-body effects in the simulations. Similar electronegativity-based approaches are already known in the literature under the name of fluctuating charge models [9296]. However, CSA offers a much broader perspective. As evidenced above, it provides reliable information about the reactivity of particular sites in the system. In this respect, the agreement of CSA reactivity parameters with experimental results, and also with commonly recognized chemical intuition, is especially encouraging. Our previous study [60] also indicates that various charge sensitivities can be used as bond detectors. All these features of CSA can be utilized in construction of a reactive force field. We have undertaken steps in this direction.