Introduction

It is trivial to state that the DNA and RNA biopolymers are of fundamental importance for life. Ever since the discoveries of Pauling [1] and of Crick and Watson [2], the topology of nucleobases covalently linked through ribose (or deoxyribose) and phosphate units has been recognized as the key concept behind the helix. Next, H-bond interactions were recognized as being responsible for a certain stiffness of the helix-like strands. H-bonding is a well-recognized type of interaction [3,4,5,6] and, in principle, does not present any particular problem in computational descriptions of the interactions between pairs and other complexes of nucleic acids [7]. Much more complex is another very important, although rather weak, interaction known as stacking. In general, it takes place between the π-electron structures of two planar or nearly planar molecules that are often parallel to one another and have a rich π-electron population. There are many conformations of stacked pairs of nucleobases [8], and hence any computational approach is a complex problem [9, 10]. There are many detailed works in this field of research (for reviews, see [11,12,13,14]). It should be emphasized that the evaluation of stacking interactions requires accurate quantum-chemical methods, for example, MP2 or CCSD(T) [15, 16]. However, their use is limited to rather small systems (e.g., the benzene dimer) [17]. For larger systems, these calculations are considerably time and resource consuming [16]. For this reason, interactions are often studied at a higher computational level (e.g., MP2) for geometries optimized at the DFT level [18, 19].

In summary, noncovalent interactions govern the structure and conformational dynamics of molecular systems, and hence they are crucial for their chemical properties. Therefore, the ability to understand and predict noncovalent interactions is very important, and computational studies are necessary for these purposes. Due to the size of the studied systems, it is very important to choose the appropriate level of calculation (methods and basis sets). Our research focuses on assessing the impact of substitution on the structure and energy of stacking interactions in adenine dimers. Thus, the aim of this paper is to present the most effective computational approach that can reliably describe the abovementioned interactions. For this purpose, adenine dimers presenting a variety of mutual orientations of the molecules, as well as several method/basis set variants, were selected.

Methodology

Choice of input structures

Adenine may participate in various stacking interactions that differ in the mutual orientation of the molecules, the distance between them, and the tilt and shift of the individual adenine molecules. In order to determine an optimal method and basis set to describe such interactions, a set of the most representative systems needs to be selected. Adenine can participate in amino-imino tautomeric equilibria leading to 12 possible tautomers [20]. A molecule of adenine can adopt four (stable) “amino” tautomers, of which the most stable and the most common in biological systems is the 9H tautomer [21]. Thus, this tautomer is the one most often described. The Hobza group analyzed optimal arrangements of 9H adenine dimers using single-point calculations [22]. From this work, we selected eight systems (A–H), each in the geometry corresponding to the minimum on its potential energy curve, and used them as input structures for systems modeling the most common adenine behavior (Fig. 1; for the input data set, see S.I.).

Fig. 1 Adenine dimers used in the study

Choice of methods and basis sets

Intermolecular stacking interactions, due to their nature, need to be described using dispersion corrections. Additionally, calculation speed and low computational cost play important roles in the selection of the level of theory. Therefore, DFT-D methods were used and, following the suggestions of the Hobza group [11], the following functionals were chosen: B97D [14], B97D3 [23], wB97XD [24], M06-2X [25], and additionally CAM-B3LYP [26]. Since the basis set selection may also play an important role in the resulting geometry and energy, geometry optimizations were carried out in various basis sets: Pople’s [27] 6-311++G(d,p); Dunning’s [28] aug-cc-pvdz, aug-cc-pvtz, and daug-cc-pvdz; and Ahlrichs’ [29, 30] def2tzvpp. With the latter basis set, DFT results can be regarded as close to the complete basis set limit, and CAM-B3LYP/def2tzvpp calculations showed one of the best performances in optimizing molecular geometries [31]. All calculations with full geometry optimizations were performed using Gaussian 09 [32].
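The screening described above can be pictured as a grid of functional/basis set pairs applied to each input dimer. The following is a minimal bookkeeping sketch only: the labels are copied from the text, the helper names `level_of_theory_grid` and `job_list` are hypothetical, and running the full grid is an assumption, since the text does not state that every pairing was completed.

```python
from itertools import product

# Labels as written in the text; they are not guaranteed to match
# the exact keyword spelling expected by a given program.
FUNCTIONALS = ["B97D", "B97D3", "wB97XD", "M06-2X", "CAM-B3LYP"]
BASIS_SETS = ["6-311++G(d,p)", "aug-cc-pvdz", "aug-cc-pvtz",
              "daug-cc-pvdz", "def2tzvpp"]
DIMERS = list("ABCDEFGH")  # the eight input systems


def level_of_theory_grid(functionals, basis_sets):
    """Return every 'functional/basis' label (hypothetical helper)."""
    return [f"{method}/{basis}" for method, basis in product(functionals, basis_sets)]


def job_list(levels, dimers):
    """Pair each level of theory with each dimer (hypothetical helper)."""
    return [(dimer, level) for level in levels for dimer in dimers]


levels = level_of_theory_grid(FUNCTIONALS, BASIS_SETS)
jobs = job_list(levels, DIMERS)
```

Listing the combinations explicitly makes it easy to track which optimizations converged and which were abandoned because of cost.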

Results and discussion

The research focused on selecting the optimal computational level for studying stacking interactions in 9H adenine dimers. To evaluate the final results, energy criteria were taken into consideration. However, due to possible geometry changes during optimization, special attention was also paid to the deviation between the input and output structures. Additionally, since the calculations were devoted to a simple model system that will be modified further in future work, ease of convergence was also an important factor in the final selection of the optimal method/basis set. Moreover, identifying a relatively stable system and the energy of the stacking interactions within it was also of interest.

Assessment of energy values of interactions in output geometries

The interaction energy (Eint) between the two fragments of an A···B system was calculated according to Eq. (1):

$$ E_{\mathrm{int}} = E_{\mathrm{A}\cdots\mathrm{B}}\left(\mathrm{basis}_{\mathrm{A}\cdots\mathrm{B}};\ \mathrm{opt}_{\mathrm{A}\cdots\mathrm{B}}\right) - E_{\mathrm{A}}\left(\mathrm{basis}_{\mathrm{A}\cdots\mathrm{B}};\ \mathrm{opt}_{\mathrm{A}\cdots\mathrm{B}}\right) - E_{\mathrm{B}}\left(\mathrm{basis}_{\mathrm{A}\cdots\mathrm{B}};\ \mathrm{opt}_{\mathrm{A}\cdots\mathrm{B}}\right) $$
(1)

where EA(basisA···B; optA···B) and EB(basisA···B; optA···B) are the energies of molecules A and B, respectively, in the geometries they adopt in the optimized A···B system (i.e., using the coordinates of A and B within the complex) and calculated with the basis set of the whole complex, basisA···B; EA···B(basisA···B; optA···B) is the energy of the optimized A···B complex.

Thus, all of the interaction energies are corrected for the basis set superposition error (BSSE) using the counterpoise technique [33, 34]. The BSSE is given by Eq. (2):

$$ \mathrm{BSSE} = E_{\mathrm{A}}\left(\mathrm{basis}_{\mathrm{A}};\ \mathrm{opt}_{\mathrm{A}\cdots\mathrm{B}}\right) - E_{\mathrm{A}}\left(\mathrm{basis}_{\mathrm{A}\cdots\mathrm{B}};\ \mathrm{opt}_{\mathrm{A}\cdots\mathrm{B}}\right) + E_{\mathrm{B}}\left(\mathrm{basis}_{\mathrm{B}};\ \mathrm{opt}_{\mathrm{A}\cdots\mathrm{B}}\right) - E_{\mathrm{B}}\left(\mathrm{basis}_{\mathrm{A}\cdots\mathrm{B}};\ \mathrm{opt}_{\mathrm{A}\cdots\mathrm{B}}\right) $$
(2)

The total energy of interaction (Etot), also known as the binding energy, is the sum of the interaction energy, Eq. (1), and the deformation energy (Edef). The latter is the energy associated with changing the geometries of A and B from their isolated optimized forms to the geometries they adopt in the complex (A···B), and it is therefore always positive. The deformation energy can be calculated as:

$$ E_{\mathrm{def}} = E_{\mathrm{A}}\left(\mathrm{basis}_{\mathrm{A}};\ \mathrm{opt}_{\mathrm{A}\cdots\mathrm{B}}\right) - E_{\mathrm{A}}\left(\mathrm{basis}_{\mathrm{A}};\ \mathrm{opt}_{\mathrm{A}}\right) + E_{\mathrm{B}}\left(\mathrm{basis}_{\mathrm{B}};\ \mathrm{opt}_{\mathrm{A}\cdots\mathrm{B}}\right) - E_{\mathrm{B}}\left(\mathrm{basis}_{\mathrm{B}};\ \mathrm{opt}_{\mathrm{B}}\right) $$
(3)
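Equations (1) to (3) can be evaluated from seven single-point energies per dimer. The sketch below is illustrative rather than the authors' workflow: the function and argument names are hypothetical, the energies are assumed to be supplied in hartree, and the numbers in the usage lines are invented purely to exercise the arithmetic.

```python
HARTREE_TO_KCAL_MOL = 627.5095  # conversion factor, 1 hartree in kcal/mol


def interaction_energy(e_ab, e_a_dimer_basis, e_b_dimer_basis):
    """Eq. (1): counterpoise-corrected interaction energy.

    All three energies use the dimer basis set and the dimer geometry.
    """
    return e_ab - e_a_dimer_basis - e_b_dimer_basis


def bsse(e_a_own_basis, e_a_dimer_basis, e_b_own_basis, e_b_dimer_basis):
    """Eq. (2): BSSE; every monomer energy is taken at the dimer geometry."""
    return (e_a_own_basis - e_a_dimer_basis) + (e_b_own_basis - e_b_dimer_basis)


def deformation_energy(e_a_in_complex, e_a_relaxed, e_b_in_complex, e_b_relaxed):
    """Eq. (3): cost of distorting relaxed monomers into their dimer geometries.

    All four energies use the monomer's own basis set.
    """
    return (e_a_in_complex - e_a_relaxed) + (e_b_in_complex - e_b_relaxed)


def total_energy(e_int, e_def):
    """Binding energy: sum of interaction and deformation energies."""
    return e_int + e_def


# Invented energies (hartree), for illustration only.
e_int = interaction_energy(-10.0, -4.0, -5.0)
e_bsse = bsse(-3.9, -4.0, -4.8, -5.0)
e_def = deformation_energy(-3.9, -3.95, -4.8, -4.9)
e_tot = total_energy(e_int, e_def)
e_int_kcal = e_int * HARTREE_TO_KCAL_MOL
```

With these definitions, the interaction energy without counterpoise correction equals `e_int - e_bsse`, which is a quick consistency check on the bookkeeping of the monomer energies.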

The obtained values of the interaction energies are presented in Table 1 and Fig. 2, while the BSSE values are also shown in Fig. S1 (Supplementary Information). In addition, the energy values for the obtained optimal geometries of the studied systems and the deformation energies are gathered in Tables S1 and S2, respectively.

Table 1 Estimated energy of the interactions, Eint, between adenine molecules in the analyzed systems; BSSE correction included and its value provided in the brackets; both in kcal mol−1

Fig. 2 Estimated energy of the interactions between adenine molecules in the analyzed systems for selected method/basis set variants (for clarity, data points were connected with solid line)

Although the values of the interaction energy differ depending on the level of theory used, the trends remain similar, except for the results of the CAM-B3LYP/def2tzvpp optimizations (presented below in a separate subsection). Stacking interactions between parallel adenine molecules (system A) appear to be weak, with a mean value of −2.44 ± 1.04 kcal mol−1. Moreover, for this system as well as for systems B and C, the weakest interactions were predicted by the M06-2X functional. Interactions in geometries D, E, and G are described by similar energy values (−8.73 ± 0.64, −8.98 ± 0.50, and −9.03 ± 0.56 kcal mol−1, respectively, excluding the CAM-B3LYP/def2tzvpp results), which is not surprising, because after optimization the mutual orientations of the adenine molecules in those systems become similar (Fig. 3). The strongest stacking interactions were found for system F (Eint from −9.41 to −11.36 kcal mol−1, see Tables 1 and S1), and the largest geometry changes were also found in this case (see below).

Fig. 3 RMS values for the A–H systems including data from selected method/basis set optimizations

For each method/basis set used, the BSSE values are almost constant across the systems. System A is an exception, since the BSSE values obtained for this geometry are lower than those of any other system (Fig. S1 in SI). As expected, the smallest BSSE values (ca. 0.5 kcal mol−1) are found for the largest basis sets, i.e., the triple-ζ type (aug-cc-pvtz and def2tzvpp). Furthermore, in most cases, the BSSE values are greater than the calculated deformation energies (Table S2 in SI).

Assessment of output geometries

Because geometry changes during optimization are important, a quantitative parameter was introduced to the study, namely the RMS (root mean square), which indicates the average distance between corresponding heavy atoms in a system before and after the calculations. The obtained RMS values are listed in Table 2.
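The RMS parameter described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: atoms are assumed to appear in the same order in both structures, hydrogens are skipped as the only non-heavy atoms, and no superposition of the two structures is performed (the text does not say whether one was applied). The function name `heavy_atom_rms` and the tuple layout are hypothetical.

```python
import math


def heavy_atom_rms(before, after):
    """RMS distance between matching heavy atoms of two structures.

    Each structure is a list of (element, x, y, z) tuples in the same
    atom order. Hydrogen atoms are ignored. No alignment is performed.
    """
    if len(before) != len(after):
        raise ValueError("structures must contain the same number of atoms")

    squared_distances = []
    for atom_in, atom_out in zip(before, after):
        if atom_in[0] != atom_out[0]:
            raise ValueError("atom order differs between the structures")
        if atom_in[0].upper() == "H":
            continue  # only heavy atoms enter the RMS
        squared_distances.append(
            sum((p - q) ** 2 for p, q in zip(atom_in[1:], atom_out[1:]))
        )

    if not squared_distances:
        raise ValueError("no heavy atoms found")
    return math.sqrt(sum(squared_distances) / len(squared_distances))


# Toy coordinates (in angstrom), invented for illustration.
input_geometry = [("N", 0.0, 0.0, 0.0), ("C", 1.0, 0.0, 0.0), ("H", 2.0, 0.0, 0.0)]
output_geometry = [("N", 0.0, 0.0, 1.0), ("C", 1.0, 0.0, 1.0), ("H", 5.0, 5.0, 5.0)]
rms = heavy_atom_rms(input_geometry, output_geometry)
```

In the toy example both heavy atoms move by 1 Å along z, so the RMS is 1 Å regardless of where the hydrogen ends up.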

Table 2 RMS values of systems after optimizations in comparison to input geometries

As mentioned above, the input geometries of the studied systems were “artificial,” not optimized ones, and thus they should be treated only as a reference, not as a target configuration.

Systems A, D, E, and G were successfully optimized without significant geometry changes with all methods and basis sets used, apart from CAM-B3LYP/def2tzvpp (Figs. 3 and 4). For the first two, the optimized conformations showed a higher rise [35] than the input geometries. A rise and a discrete shift [35] of the adenine molecule were observed for the E and G systems. In both cases, the changes lead to a final conformation close to that of system D. B and H were found to be the most unstable input configurations, which resulted in convergence difficulties as well as optimal geometries that were inconsistent across the applied levels of theory (Fig. 4). Geometry changes in system F lead consistently to a tilt and a twist of the adenine molecule (depicted in Fig. 4). Additionally, these geometry changes can be connected to the strongest stacking interactions (the highest absolute value of the stacking energy, Eint, see Fig. 2 and Table 1).

Fig. 4 Overlay of dimers before and after optimization for the tested methods/basis sets where stacking interactions were preserved. In all schemes, the input structure is marked in purple. The geometry optimized with CAM-B3LYP/def2tzvpp is marked in yellow

Optimization results with CAM-B3LYP/def2tzvpp

The most extreme geometry changes were observed using the CAM-B3LYP method. Although the input structures consisted of systems exhibiting π···π interactions, in five out of eight cases (systems A, B, C, F, and H) the adenine molecules shifted toward a coplanar arrangement stabilized by hydrogen bonds. Notably, in four of them (structures B, C, F, and H), hydrogen bonds formed spontaneously (Fig. 5, Table 3). In the case of system A, the molecules shifted, destroying the stacking interactions and tending toward H-bond formation, yet the optimal geometry was not reached.

Fig. 5 Output structures optimized at the CAM-B3LYP/def2tzvpp level of theory where H-bond-stabilized motifs were observed instead of expected stacking

Table 3 Hydrogen bonds in output structures from CAM-B3LYP/def2tzvpp optimization

Centrosymmetric, hydrogen bond–stabilized dimers formed in systems C and H are also present in the adenine crystal structure deposited in the crystallographic database CSD [36]. The experimental D···A distances are significantly shorter; however, the overall geometry is reasonably well predicted. The resulting dimers of B and F are not found in any crystal structure of adenine, probably due to their less favorable, asymmetric character, yet their geometry remains realizable. The estimated energies of the hydrogen bonds formed in the B and C systems are close to the average stacking interaction energies obtained with the different methods/basis sets for the E and F systems, respectively. It can be concluded that although the CAM-B3LYP/def2tzvpp level of theory reproduces the H-bond geometry well, the energy of those interactions remains underestimated.

In the case of the D, E, and G systems, stacking interactions are preserved, yet the interaction energies are heavily underestimated. Thus, both the energy and the RMS values for these optimizations visibly deviate from those obtained with the other levels of theory. The CAM-B3LYP/def2tzvpp output geometry is highlighted in yellow in Fig. 4. It can be concluded that the overestimated distance between the adenine molecules resulted in an underestimation of the energy of the stacking interactions.

Conclusions

The aim of the research was to determine which method/basis set would be appropriate for analyzing stacking interactions between adenine molecules, simulating aggregates present in the secondary structure of the DNA and RNA nucleic acids. In the course of the research, system A (presenting a parallel orientation of the adenine monomers) and system D (containing molecules twisted by 180°) appeared to be the easiest to converge and thus the most stable from this point of view. The twisted F geometry exhibited the strongest stacking interactions, with an interaction energy of about −10.7 kcal mol−1, whereas the energies of the A and D systems were determined as ca. −2.4 kcal mol−1 and −8.7 kcal mol−1, respectively.

A disproportionately long wall time was necessary to complete the optimizations with the aug-cc-pvtz and daug-cc-pvdz basis sets, and thus, despite the more accurate energy estimation for some systems, complete data were not obtained in those cases.

Taking into consideration the comparison of the energy values for each system across all applied methods/basis sets, as well as the ease of convergence, three method/basis set combinations were selected as the best for describing stacking interactions, namely wB97XD/6-311++G(d,p), wB97XD/aug-cc-pvdz, and B97D3/aug-cc-pvdz.

Calculations performed with CAM-B3LYP/def2tzvpp in some cases unexpectedly resulted in the formation of hydrogen bonds between the adenine molecules. Although in the literature this method/basis set variant was successfully used to describe the geometry and π-electron delocalization of hetero- and polycyclic molecules [31, 37], it is not appropriate for stacking interactions in adenine dimers.