INTRODUCTION

Derivatives and analogs of nucleic acids (NA) are used to solve the problems of targeted therapy of various diseases [1, 2], biosensor technologies, molecular biology [3], and biotechnology [4]. Although there are numerous NA derivatives with improved properties relative to natural analogs [4], they do not fully meet the requirements of researchers, and their use in practice is often limited by patent law. Thus, the creation of new derivatives with predetermined biological and physicochemical properties remains an urgent task in chemistry and physical chemistry of nucleic acids. One of the most important properties of these analogs is their ability to efficiently interact with complementary NA sequences. To date, there are no methods for evaluating the hybridization properties of NA derivatives before chemical synthesis. For well-studied compounds, several approaches based on the analysis of empirical dependencies have been developed that allow calculation of the melting temperature [13, 14] and the energy of the formation of their complexes with DNA [511] and/or RNA [12], which depend on the length and nucleotide composition of the interacting strands. In the case of new compounds, it is necessary to use approaches that do not rely on experimental information on their hybridization properties. Methods based on molecular mechanics and/or molecular dynamics (MD) are suitable for large compounds consisting of more than 100 atoms.

Over the past decade, there has been great progress in the development of various approaches for calculating the thermodynamic parameters of intermolecular interactions [15]. The most common methods are the MMPBSA (Molecular Mechanics/Poisson Boltzmann Surface Area) [16], MMGBSA (Molecular Mechanics/Generalized Born Surface Area) [17, 18], and methods for analyzing the potential mean force (PMF), for example the weighted histogram analysis method (WHAM) [19] or the method of the variational free energy profile (vFEP) [20]. Over the past decade, several approaches have also been developed which allow one to effectively cover the analyzed phase space for a more productive calculation of the free energy. Among these methods, one can distinguish methods of alchemical transformations [21], e.g., methods of free energy perturbation (FEP) [22] and thermodynamic integration (TI) [23]. Their disadvantages include large expenditures of computing power and calculation time to obtain reliable modeling results. Therefore, the above-mentioned approaches cannot be applied to calculate the energies of a set of complexes of different lengths and GC composition.

Simple and computationally efficient techniques, such as MMGBSA and MMPBSA are very popular for evaluating the energy of complex formation. In these methods, the energies of the entire complex and its components obtained by MD modeling are calculated for each conformation of molecules, and the solvent (which is presented in the simulation explicitly) is replaced by a continuous medium with a given dielectric permittivity. In these approaches, the binding energy (ΔGbind) consists of a change in the total molecular mechanical energy (ΔEMM), the solvation energy (ΔGsolv), and the contribution of conformational entropy (TΔS) [24]:

$$\Delta {{G}_{{{\text{bind}}}}} = {\text{ }}\Delta {{E}_{{{\text{MM}}}}} + {\text{ }}\Delta {{G}_{{{\text{solv}}}}} - T\Delta S.$$
(1)

In turn, the internal energy (ΔEint) (deformation of bonds, angles, and dihedral angles), electrostatic energy (ΔEele), and Van der Waals interactions (ΔEvdW) contribute to the ΔEMM value:

$$\Delta {{E}_{{{\text{MM}}}}} = {\text{ }}\Delta {{E}_{{{\text{int}}}}} + {\text{ }}\Delta {{E}_{{{\text{ele}}}}} + {\text{ }}\Delta {{E}_{{{\text{vdW}}}}}.$$
(2)

The solvation energy is the sum of the contributions from the polar (ΔGPB/GB) and non-polar (ΔGSA) interaction of a studied molecule with solvent:

$$\Delta {{G}_{{{\text{solv}}}}} = {\text{ }}\Delta {{G}_{{{{{\text{PB}}} \mathord{\left/ {\vphantom {{{\text{PB}}} {{\text{GB}}}}} \right. \kern-0em} {{\text{GB}}}}}}} + {\text{ }}\Delta {{G}_{{{\text{SA}}}}}.$$
(3)

The first member of equation (3), ΔGPB/GB, is the electrostatic energy, which can be calculated either by the direct solution of the Poisson‒Boltzmann equation (PB, the MMPBSA method), or using the generalized Born method (GB, the MMGBSA method). The contribution of nonpolar interactions, ΔGSA, is usually represented as a linear function of the area of the molecule available for interaction with the solvent (SASA) [25]:

$$\Delta {{G}_{{{\text{SA}}}}} = \gamma SASA + b,$$
(4)

where γ and b are coefficients of the linear equation.

To calculate the Gibbs free energy, the entropy contribution including the conformation component is usually evaluated by normal mode analysis (NMA) [26] or quasi-harmonic approximation (Q-Harm) [27]. The former method is characterized by high computational complexity and allows analysis of the entropy contribution for only a small number of snapshots of the MD trajectory (up to several thousand), while the latter method is significantly faster and more efficient.

These methods of estimating the energy of intermolecular interactions are widely used when studying the interactions of biopolymers with small molecules, e.g., in molecular biological studies [28] or the development of new drugs [29]. The successful application of these approaches has also been demonstrated for calculation of the energy of complex formation with DNA and RNA of both native [30, 31] and chemically modified oligonucleotides [3236]. However, in the latter case, the research is not systematic; it is rather an illustration of the possibilities of the methods.

The creation of an approach for the predictive calculation of hybridization properties of different classes of NA derivatives could become the basis for the rational design of these compounds and the subsequent directed search for methods of chemical synthesis. At the first stage, it is necessary to verify the applicability of calculations which have been performed for well-studied unmodified NA complexes, to new, including not yet synthesized, NA analogs. We have previously demonstrated the successful application of the MPB(GB)SA approaches for calculating the energy and thermal stability of a set of more than 300 DNA/DNA duplexes [30]. During this study, we analyzed the possibility of calculating the hybridization properties of DNA/RNA and RNA/RNA complexes by the classical MD method.

The goal of the present work was to test the applicability of the MMPBSA and MMGBSA methods for calculating the enthalpy component and the Q-Harm and NMA methods for calculating the entropy component of the Gibbs energy of DNA/RNA and RNA/RNA duplexes. The MD method was used to simulate representative sets of oligonucleotides and their complexes. The resulting trajectories were analyzed using the MMPBSA and MMGBSA approaches, and the entropy was calculated using the NMA and Q-Harm methods. The data were analyzed and compared with the experimental values of the thermodynamic parameters of oligonucleotide complex formation.

EXPERIMENTAL

Database of thermodynamic parameters. The experimental values of the thermodynamic parameters of the formation of the NA complexes are taken from previously published studies. For complementary DNA/RNA duplexes, we used the data of Sugimoto et al. [7]. Duplexes that contained 3'-terminal phosphate moieties were excluded from consideration. Thus, the database contained thermodynamic parameters of complex formation for 65 duplexes with a length from 5 to 12 bp (on average 8 bp) and composition of G/C pairs from 0 to 84% (on average 53%). For complementary RNA/RNA duplexes, we used the data of Xia et al. [37]. Duplexes that contained terminal phosphate groups were also excluded from consideration. Thus, the database contained the thermodynamic parameters of complex formation for 75 duplexes with a length of 4 to 11 bp (on average 7 bp) and composition of G/C pairs from 22 to 100% (on average 58%).

Preparation of molecular systems for modeling. Duplex structures were prepared in the NAB program of the AmberTools14 software package [38] using the Chimera program [39]. The starting structures of RNA/DNA and RNA/RNA duplexes had the A form of a double helix. Single-stranded oligonucleotides were obtained from duplexes by removing one of the strands.

Molecular dynamics simulation. MD modeling was performed using the AMBER 14 software package and parallel computing on CPUs and GPUs with CUDA architecture. The ff99bsc0 [40] and ff99bsc0+chiOL3 [41] force fields were used for DNA and RNA, respectively. MD simulations were performed in an explicit water shell at a temperature of 300 K. The electrostatic interactions were calculated by the Particle Mesh Ewald Method with a lattice step of 1 Å. The pressure in the system was 1 bar (Berendsen barostat); the temperature in the system was maintained by scaling the atom’s speeds with a period of 10 ps. The SHAKE algorithm was used to increase the step of simulation up to 2 fs. We used a nonbonded cut-off of 10 Å. The coordinates of each atom in the system were recorded every 10 ps of simulation. Duplexes and single-stranded oligonucleotides were placed in a cubic cell with a distance of 12 Å from the simulated molecule to the cell boundaries. The amount of water in the simulated cell was in the range of 4000–8000 molecules. Sodium ions were used to neutralize the negative charge of NA in the periodic cell.

The simulation procedure consisted of the following steps:

(1) Creation of a PDB file that contains data on the coordinates of each atom in the structure of a duplex or single-stranded NA.

(2) Creation of a water environment (TIP3P water model, cubic periodic conditions, 12 Å); addition of sodium ions to neutralize the periodic system.

(3) Minimization of the entire system with a fixed NA (10000 minimization steps) (PMEMD.MPI).

(4) Heating of the system with a fixed NA for 2.5 ns with an integration step of 0.5 fs from 0 to 300 K (PMEMD.CUDA).

(5) Equilibrating the density of the system at a constant pressure of 1 bar for 500 ps (SANDER) with a fixed NA.

(6) Equilibrating the system at a constant pressure of 1 bar and a temperature of 300 K for 5 ns (PMEMD. CUDA).

(7) MD simulation for 100 ns in an NPT ensemble (1 bar, 300 K) (PMEMD.CUDA).

(8) Calculation of the energy of complex formation using the MMGBSA and MMPBSA methods; calculation of entropy in Q-Form and NAME (MMPBSA.MPI). Calculations were performed using only the duplex trajectory (single-trajectory approach) and independently obtained trajectories of duplex and single-stranded oligonucleotides (three-trajectory approach).

To calculate by the MMGBSA method, we used the parameters of the generalized Born method proposed by Tsui and Case [17]. The ionic strength of the solution was 100 mM. Each recorded snapshot of the MD trajectory was used in calculation by the MMGBSA, MMPBSA, and Q-Harm methods, and every tenth saved conformation was used in the case of NMA.

RESULTS

We have previously shown that computer calculations for evaluating the hybridization properties of DNA/DNA duplexes provide accuracy comparable to that of experimental methods [30]. Therefore, we used the MD modeling and calculation of the energy of duplex formation concerning DNA/RNA and RNA/RNA complexes. We used a database of thermodynamic parameters of the complex formation including experimental values of the enthalpy, entropy, and Gibbs free energy of the formation of NA duplexes. The characteristics of the data sets are given in the Experimental section.

At the first stage, molecular structures were designed and calculated by the MD method in an explicit water shell for 100 ns for 65 DNA/RNA duplexes [7], 75 RNA/RNA duplexes [37], and their single-stranded oligonucleotide components. In the case of self-complementary oligonucleotides, calculations were performed using two different initial velocities of atoms. The resulting MD trajectories were analyzed.

Analysis of Complex Structure

The modeling of the RNA/RNA duplexes showed that the structure of the A form of the double helix was preserved along the entire trajectory with a length of 100 ns for each of the 75 analyzed complexes. The deviation of the geometry of the helix from the initial unbalanced structure during modeling did not exceed 2.5 Å, the standard deviation (RMSD) between the atoms. The complexes after equilibrating for 10 ns slightly changed their conformation during the next 90 ns of modeling. The average RMSD value along the MD trajectory for the entire set of complexes was 1.56 Å and did not exceed 2.56 Å for each snapshot of the trajectory. Unwinding of the end pairs of nucleotides was observed in 11 of the 75 studied duplexes. In some cases, reversible destruction of the second base pair occurred for duplexes with terminal A/U- or A/T-pairs, while the terminal adenine and the pre-terminal nucleotide of the same strand retained a conformation close to that in the double helix.

When modeling DNA/RNA duplexes, deviation from the starting structure did not exceed 4.1 Å. In 17 of the 65 studied structures, unwinding of the terminal pairs was observed as in the bimolecular RNA/RNA complexes. The average RMSD value along the MD trajectory for the entire set of complexes was 2.26 Å and did not exceed 3.86 Å for each snapshot of the trajectory.

The structures of the DNA/RNA and RNA/RNA duplexes were slightly different. Analysis of RNA/RNA complexes using the X3DNA software [42, 43] revealed that these complexes are in the A form of the double helix. The structure of DNA/RNA duplexes can be characterized as intermediate between the A and B forms. This difference in conformations is known and has been shown by experimental methods, i.e., NMR spectroscopy [44], X-ray diffraction analysis [45], and circular dichroism spectroscopy [44, 46, 47]. We did not consider homonucleotide sequences of the DNA/RNA and RNA/RNA duplexes because of the peculiarities of their structures [48, 49].

The MD analysis of single-stranded oligonucleotides revealed significant conformational rearrangements in both DNA and RNA strands. In most cases, we observed local destruction of single-stranded stacking between adjacent bases and, in some cases, the formation of intramolecular hairpin structures, which can be quite stable over a trajectory length of 100 ns. This leads to significantly higher RMSD values: both averaged values along the trajectory (5.3 Å for the RNA strand and 5.7 Å for the DNA strand, on average for all the studied oligonucleotides) and maximum values (8.0 and 9.5 Å for the RNA and DNA strands, respectively) compared to those for the initial structures. This behavior of single-stranded forms is typical for nucleic acids, which follows from the published experimental data [50, 51] and the results of computer modeling [5254].

Energy of Formation of RNA/RNA Complexes

The energy of the formation of NA complexes was evaluated as the difference in the energies of the double-stranded and single-stranded states. The change in the internal energy of complex formation was calculated by MD modeling using MMPBSA and MMGBSA. The NME and Q-Harm methods were used to calculate the contribution of the configuration entropy. The contribution of single-stranded states was assessed by analyzing the MD trajectories of two separate oligonucleotides (three-trajectory approach, 3tr). Alternatively, individual oligonucleotides were isolated from the MD trajectory of RNA duplexes, and the resulting set of conformations was used for calculations (single-trajectory approach, 1tr). The Gibbs energy of complex formation was calculated using various combinations of the enthalpy and entropy values of complex formation obtained in single- or three-trajectory approaches.

A comparison of the experimental values of the thermodynamic parameters and the values obtained in single- and three-trajectory approaches for RNA/RNA complexes is shown in Fig. 1 and Table 1. The best correlation between the experimental and calculated values of the enthalpy of complex formation was observed when using the MMGBSA method and the three-trajectory approach. The correlation coefficient R2 (where R is the Pearson correlation coefficient) between the experimental and calculated values was 0.72; in this case, the slope of the linear dependence was close to unity (0.95), and the shift coefficient was only 4.6 kcal/mol. When using this dependence to calculate the enthalpy of complex formation, the average absolute value of its prediction error was 8.1 kcal/mol or 16.3%. When using the MMPBSA method, the correlation of the experimental and calculated values was much worse than when using the MMGBSA method, i.e., the average absolute error was much higher in the case of a single-trajectory approach and significantly lower in the case of a three-trajectory approach (Table 1).

Fig. 1.
figure 1

Correlation analysis of the thermodynamic parameter values evaluated experimentally and calculated by the MD method for RNA/RNA duplexes. The enthalpy values (ΔH0) are calculated by the MMGBSA methods using the analysis of one (1tr) and three (3tr) trajectories (a). The Gibbs energy values at 37°C (\(\Delta G_{{37}}^{^\circ }\)) are evaluated experimentally and by the analysis of a single MD trajectory using a combination of the MMGBSA and NMA or Q-Harm methods (b). The order of the equations corresponds to the order of the notation in the legend.

Table 1.   Comparison of the experimental thermodynamic parameter values for RNA/RNA duplexes with those evaluated by analyzing MD trajectoriesa

The introduction of a linear empirical correction to the calculated enthalpy values of complex formation, which is obtained from the correlation of calculated and experimental values (ΔH°(Exp) = aΔH0(MD) + b), decreases the absolute and relative error values to 3.7 kcal/mol and by up to 8.1% when calculated by the MMGBSA (3 tr) method. This decrease is lower than the common experimental accuracy of evaluating the complexation enthalpy (10%) [37].

The value of the entropy of the complex formation was calculated by the Q-Harm and NMA when analyzing one or three independent trajectories. A comparison of the calculated and experimental values shows that the best correlation (R2 = 0.67) is observed when using a quasi-harmonic approximation for the analysis of a single trajectory (Table 1). The average value of the relative error in calculating the entropy, in this case, is 83.4 cal mol‒1 K‒1, or 54%. This is significantly worse in percentage terms than in the case of the optimal method of calculating the enthalpy by MMPGSA using three-trajectory analysis. The three-trajectory analysis gives 1.2–2.5 times lower error values in calculating the entropy of complex formation, whereas the observed correlations are significantly worse compared to the single trajectory analysis. It should be noted that linear dependencies demonstrate the proportionality coefficient between the calculated and experimental entropy values in the range 1.13–2.40 and the free term of the linear dependence in the range ‒34.99 + 16.48 cal mol‒1 K‒1. This means that the configuration entropy is largely correlated with but does not make a determining contribution to the entropy of complex formation. The use of a linear correction to approximate the calculated values of the entropy of complex formation to the experimental data significantly decreases the calculation error to 8.7% when using a quasi-harmonic approximation in a single-trajectory approach (Table 1). The less reliable predictive ability of the entropy calculation methods in the three-trajectory approach is most likely caused by incomplete coverage of the configuration space of single-stranded oligomers at a given length of the trajectory in the classical MD method. In addition, the observed significant difference in the entropy also depends on the ionic strength of the solution, which, when modeled, differs from the standard experimental conditions (1 M NaCl, neutral pH values). In one of the classical concepts, the effect of the concentration of monovalent ions on the thermal stability of intermolecular NA complexes can be represented as an entropy correction, which is proportional to the logarithm of the concentration of cations in solution and the number of negatively charged phosphate residues in the NA strand [37, 55, 56]. This dependence corresponds to the observed trend, i.e., the linear dependence of the experimental and calculated values of the entropy of duplex formation. However, it is not possible to take full account of the contribution of NA duplexes to the thermal stability within the classical MD method.

The Gibbs free energy value is most often used in practice when comparing the hybridization properties of oligonucleotides [55, 56]. This parameter determines the value of the equilibrium constant in the Van’t Hoff equation and, consequently, the ratio of the concentrations of single- and double-stranded states of oligonucleotides in solution at different temperatures. We considered the predictive ability of combinations of entropy and enthalpy calculation methods when evaluating the Gibbs energy of duplex formation in the single- and three-trajectory approaches. The results of correlations of the calculated and experimental values are shown in Fig. 1 and Table 1. The best correlation with R2 = 0.85 in evaluating the enthalpy and entropy of complex formation is observed when using single-trajectory analysis by the NMA and Q-Harm methods, respectively. The linear dependence shows a significant slope (a = 0.29) and a nonzero shift coefficient (b = –4.0 kcal/mol) (Fig. 1b). This leads to large values of the absolute error value. As can be seen from the data for the prediction of the enthalpy and entropy of complex formation (Table 1), both of these components contribute to the non-unity correlation slope of the calculated and experimental values of the Gibbs free energy change and promote intersection with the abscissa axis at a non-zero point. Consideration of the linear correlation significantly reduces the error to 0.68 kcal/mol, or 8.63%, which coincides with the average experimental error for NA duplexes (~7%) [37].

The analysis shows sufficiently close experimental and calculated values (without taking into account linear corrections) of the complex formation energy only for the enthalpy change, which is calculated by the MMGBSA method using the three-trajectory approach. The nearest neighbor model gives comparable error values [37]. We compared the previously published increments of the formation of the dinucleotide steps of the helix with the increments based on the calculation of both the experimental and computer analysis data (Fig. 2).

Fig. 2.
figure 2

Correlation analysis of the thermodynamic parameter values evaluated experimentally and calculated by the MD method for RNA/RNA duplexes. The predictive efficiency of the nearest neighbor method (NN; 5'-NN/NN-3') for the enthalpy of the RNA/RNA duplex formation evaluated experimentally (Exp) and calculated by the MMGBSA method using the three-trajectory analysis (MMGBSA, 3tr) (a). Comparison of the increments of the nearest neighbor model for RNA/RNA complexes, which were published by Xia et al. [37], evaluated experimentally (Exp), and calculated by the (MMGBSA, 3tr) method (b). The order of the equations corresponds to the order of the notation in the legend. Init., the initiation factor of the double helix formation.

The nearest neighbor model equally well describes the experimental data and the results calculated by the MMPBSA method using the three-trajectory approach (Fig. 2a). All increments are significant (p-value < 4 × 10‒6) except for the increment of the initiation factor for experimental data (p-value = 0.14).

The literature values of the formation of dinucleotide pairs are close to those which are based on the experimental and calculated data, and are the same within the limits of errors. The exceptions were increments for GA/UC and GC/CG, which were calculated by the MMPBSA method (Fig. 2b). In this case, the difference from the experimental values for the studied database was 4.3 and 3.8 kcal/mol, respectively. The most significant difference was observed for the initiation factor (Fig. 2b, increment “Init”), the value of which was –9.9 kcal/mol when analyzing the MD trajectories and +5.9 kcal/mol when analyzing the same database of experimental values. This difference is probably caused by the low significance of the increment based on the experimental data. Similarly, the correction associated with the terminal A/U pair is more negative, i.e., +5.4 vs. +9.4 kcal/mol, respectively. This may be caused by an incorrect account of the interaction of the terminal base pairs and an insufficiently long MD trajectory, which makes it impossible to take into account the effect of opening the terminal pairs with high confidence. At the same time, the data set indicates the adequacy of the proposed approach including MD modeling (force field, modeling method) and energy calculation using the MMPBSA method to account for interactions within the duplex structure of RNA.

Energy of Formation of DNA/RNA Complexes

In the case of hybrid DNA/RNA duplexes, we revealed a linear correlation between the calculated and experimental values of the enthalpy of complex formation (Fig. 3, Table 2). The correlation of the changes in the enthalpy was observed only when analyzing one trajectory by the MMGBSA method (R2 = 0.71). The slope of the correlation straight line is close to unity (a = 0.841), and the free term is close to zero (b = –0.982 kcal/mol). This method of calculating the enthalpy gave 23% relative error and 12.0 kcal/mol of absolute error. In all other cases, the correlation coefficient R2 did not exceed 0.64. At the same time, the MMPBSA method in the three-trajectory analysis led to the minimum values of the calculation errors (16% and 11.4 kcal/mol) in the absence of correlation (R2 = 0.13) as in the case of the RNA/RNA complexes (Tables 1 and 2).

Fig. 3.
figure 3

Correlation analysis of the thermodynamic parameter values evaluated experimentally and calculated by the MD method for DNA/RNA duplexes. The enthalpy values (ΔH °) are calculated by the MMGBSA methods using the analysis of one (1tr) and three (3tr) trajectories (a). The Gibbs free energy values at 37°C (\(\Delta G_{{37}}^{^\circ }\)) evaluated experimentally and by the analysis of a single MD trajectory using a combination of the MMGBSA and NMA or Q-Harm methods (b). The order of the equations corresponds to the order of the notation in the legend.

Table 2.   Comparison of the experimental thermodynamic parameter values for DNA/RNA duplexes with those evaluated by analyzing MD trajectoriesa

The introduction of a linear correction reduces the error to 8.2% (5.1 kcal/mol) in the case of the MMGBSA method when analyzing the trajectory of DNA/RNA duplexes only. This error is close to that for the RNA/RNA complexes.

The values of the entropy of the formation of DNA/RNA complexes calculated by the NMA method in the analysis of a single trajectory are best correlated with the experimental values (R2 = 0.74), while the average absolute error is 47.5 cal mol‒1 K‒1, or 29% (Table 2). When using the quasi-harmonic approximation in a single-trajectory analysis, the correlation coefficient was lower, R2 = 0.55, and the average absolute error was 78.2 cal mol‒1 K‒1, or 49%. The proportionality coefficient between the calculated and experimental values of the entropy of the DNA/RNA complexes is in the range of 1.3–1.8, and the free term of the linear dependence is in the range of ‒3.0…+41 cal mol–1 K–1, which indicates significant differences between the experimental values of the entropy of complex formation and calculated values of the configuration entropy. The correlation coefficient and slope are close to those in the case of the RNA/RNA complexes (Tables 1 and 2), which indicates the general nature of the observed patterns for different types of complexes.

The calculation of the entropy of the formation of RNA-containing duplexes in the quasi-harmonic approximation when analyzing a single trajectory gives a good linear correlation with the experimental data. However, the slope of the linear approximation differs significantly from unity, which makes it difficult to use this approach to evaluate the total entropy of complex formation. It is worth noting that the calculation of the entropy in the quasi-harmonic approximation is several orders of magnitude faster than in the analysis of normal mode analysis. This justifies the use of the Q-Harm approximation in evaluating the Gibbs energy for a large set of DNA/RNA and RNA/RNA complexes.

The values of the Gibbs energy of the formation of hybrid complexes calculated by MMGBSA and using the configuration entropy values are highly correlated with the experimental values for DNA/RNA duplexes (Fig. 3). The correlation coefficients, R2, are 0.82 and 0.78 when calculating the entropy by Q-Harm and NMA, respectively. In other cases, there is no significant correlation (R2 < 0.06).

The analysis of the applicability of the nearest neighbor model to describe the values of the enthalpy of complex formation revealed a high predictive efficiency of this model when applied to data obtained experimentally and by the MMPBSA method when analyzing a single trajectory (Fig. 4a). The increments of the nearest neighbor model, which describe the initiation factor of complex formation were not statistically significant, i.e., the p-values were 0.11 and 0.69 for experimental and calculated data, respectively. When analyzing the experimental database, the dinucleotide CU/AG and UU/AA pairs are of low significance. For the enthalpy values calculated by the MMGBSA in the single-trajectory approach, all increments, except for the initiation factor, were statistically significant (p-value < 4 × 10–5).

Fig. 4.
figure 4

Correlation analysis of the thermodynamic parameter values evaluated experimentally and calculated by the MD method for DNA/RNA duplexes. Predictive efficiency of the nearest neighbor method (NN; 5'-NN/NN-3') for the enthalpy of the DNA/RNA duplex formation evaluated experimentally (Exp.) and calculated by the MMGBSA method using the three-trajectory analysis (MMGBSA, 3tr) (a). Comparison of the increments of the nearest neighbor model for DNA/RNA complexes, which were published by Sugimoto et al. [7], evaluated experimentally (Exp), and calculated by the (MMGBSA, 3tr) method (b). The order of the equations corresponds to the order of the notation in the legend. Init, the initiation factor of the double helix formation.

The increment values for experimental data, which we evaluated and published earlier, differed significantly for the initiation factor and dinucleotide pairs that contained cytosine or guanine at the 5' end of the RNA strand (rСХ/dX'G, or rGХ/dX'C, where X = A, C, U, or G, except for the rGU/dAC dinucleotide pair) (Fig. 4b). The evaluated increments of rСХ/dX'G and rGХ/dX'C have amplitudes lower and higher, respectively, compared to the literature data. The UG and UU increments have also a lower contribution to the stability of the double helix. The comparison of the increments of the collected database shows that the values of the increments of rСХ/dX'G, rGG/dCC, and rGU/dAC of the experimental enthalpy values of complex formation are significantly lower compared to those calculated by the MMGBSA method in the single-trajectory approach. This may be caused by the inconsistency of the force fields for DNA and RNA when modeling hybrid complexes, which have been optimized for DNA/DNA and RNA/RNA duplexes. In addition, a relatively small sample and a low variety of lengths and nucleotide compositions of model complexes may lead to sufficiently high error values for individual increments in the nearest-neighbor model (Fig. 4b). Expanding the sample set in the future will allow for more reliable results.

The negative values of the initiation factors and end pairs calculated by the MD method using the nearest neighbor model exceed the corresponding experimental data for the DNA/DNA [30], RNA/RNA, and DNA/RNA complexes. This indicates a systematic difference in the MD modeling or MMGBSA calculations in comparison with experimental data. The NMR method shows that the terminal base pairs are in equilibrium between the open and closed states [57] and the opening of the end pairs is not observed for all complexes. These events are not numerously repeated within the considered trajectory for each modeled complex. Therefore, it makes it impossible to reliably account for contributions from terminal base pairs when calculating thermodynamic parameters. Nevertheless, the MMGBSA method gives the best correlation between the experimental and the calculated values for all three types of complexes.

Our results demonstrate the possibility of calculating the thermodynamic parameters for the formation of NA complexes by the classical MD method. When analyzing sets of the RNA/RNA, DNA/RNA, and DNA/DNA duplexes [30], we have found that the enthalpy of complex formation calculated by the MMGBSA method when analyzing the MD trajectory of only the complex has the best correlation, on average, between the calculated and experimental values for each type of complex (R2 > 0.73). The error in calculating the enthalpy of complex formation (15‒30%) is significantly higher than the typical experimental error (~10%). A high correlation is also observed when calculating the entropy using the quasi-harmonic approximation in the single-trajectory analysis. In this case, however, the slope and the free term of the linear dependence differed significantly from unity and zero, respectively. The analysis of correlations between the experimental and calculated enthalpy values for complex formation of the DNA/DNA [30], RNA/RNA, and DNA/RNA duplexes shows that the slope angles and free terms of all linear equations are close for these three pairs (Fig. 5). We combined all our results to evaluate a general universal dependence, which will allow one to recalculate the values obtained by the MMGBSA method in a single-trajectory approximation. This linear function has the following form: ΔH°(Exp) = 0.727 ΔH°(MMGBSA, 1tr) 5.32.

Fig. 5.
figure 5

Correlation analysis of the enthalpy (a) and Gibbs free energy (b) values evaluated experimentally and calculated by the MMGBSA method using a single trajectory for different types of complexes. Data on DNA/DNA complexes are taken from Lomzov et al. [30]. The order of the equations corresponds to the order of the notation in the legend.

The use of this linear correction makes it possible to significantly reduce the relative error of calculating the enthalpy of complex formation to 9.3% (7.8%), 12.8% (9.9%), and 7.5% (6.3%) for RNA/RNA, DNA/RNA, and DNA/DNA complexes, respectively (the variance for the given values is indicated in parentheses). On average, the calculation error was 8.6% for all types of complex, which is lower than the typical value of the experimental error (10%).

A significantly higher level of correlation is observed between the value of complex formation energy calculated by the MMGBSA method in the single-trajectory approximation and the experimental value of the Gibbs free energy when all types of complex are considered simultaneously (R2 = 0.897). However, a more in-depth analysis revealed that there are significantly different linear dependencies for each of the considered types of complex, and the highest R2 value was found for a sample of 304 DNA/DNA complexes. The differences in the linear dependence coefficients are caused, among other things, by the different contributions of the entropy for different types of complexes, both in terms of the configuration and solvation component, which characterizes the interaction with ions. This hypothesis is confirmed by various linear dependencies for the experimental Gibbs free energy values and those calculated from the enthalpy and entropy values obtained by the MMGBSA (1tr) method and quasi-harmonic approximation, respectively (Tables 1 and 2). Thus, the linear dependences (Fig. 5b) can be used to calculate the Gibbs free energy only for a particular type of complex using the values of complex formation energy calculated by the MMGBSA method in a single-trajectory approximation. Taking into account corrections in the form of linear functions with their coefficients for each type of complex, the error values of these calculations give \(\Delta G_{{37}}^{^\circ }\) 9.9 (6.7), 10.3 (9.0), and 8.0% (6.9%) for RNA/RNA, DNA/RNA, and DNA/DNA, respectively (the variance for the values given is indicated in parentheses).

The use of the MMGBSA-calculated values of the internal energy change and the configuration entropy in the quasi-harmonic approximation to calculate the Gibbs energy of duplex formation gives results that are linearly related to the experimental values with a high correlation coefficient. This dependence is observed for sets of the RNA/RNA, DNA/RNA, and DNA/DNA duplexes of different lengths and GC composition. At the same time, the high variance of the calculation error indicates that it is necessary either to use methods of more complete coverage of the conformational space (sampling methods) or analyze a large number of complexes of different lengths and nucleotide composition, which will make it possible to reliably compare the calculated data for different types of complexes.

The use of computer modeling methods with more complete coverage of the conformational space in the analysis of trajectories makes it possible to evaluate the value of the Gibbs free energy. Previous publications have shown the fundamental possibility of achieving experimental accuracy of ~1 kcal/mol. Using the umbrella sampling method when using the distance between the ends of the RNA hairpin structure as the coordinate and the weighted histogram analysis method (WHAM), L. Smith et al. [58] obtained a high consistency of experimental and calculated values of free energy. The main disadvantage of this approach is the need to obtain an extremely long MD trajectory, ~200 µs. A combination of alchemical transformation methods and the exchange of replicas may be more economical. In this case, it has been found that the replacement of one base by another of the same class (purine or pyrimidine) in an RNA duplex with a size of 6 bp results in accuracy of 0.55 kcal/mol for calculating the energy [22]; at the same time, the duration of the MD trajectory is small, ~1 µs, which is an order of magnitude greater than in the approach we propose.

MD methods are used to analyze the structure and hybridization properties of not only native oligonucleotides but also their analogs and derivatives to establish their possible spatial structure, the effect of modifications on the thermal stability of complementary complexes, and the reasons for the observed physicochemical and molecular-biological effects. The energy of the formation of modified complexes was calculated for locked NA (LNA) [59], peptidyl NA (PNA) [35], 2'-O-Me RNA [34], glycine-morpholino analogs [32], and phosphoryl guanidine oligonucleotides [33]. Researchers used various approaches including the most common MMPB(GB)SA method, which we also implemented. In most cases, correlation has been shown between the calculated energy values and the experimental characteristics of thermal stability, i.e., the enthalpy, the Gibbs free energy of complexation, or the melting point. However, there is no quantitative agreement in most approaches between the calculated and experimental energy values. This is further aggravated by the fact that a small number of complexes of similar length and/or GC composition are usually analyzed.

Taken together, the results indicate that it is impossible to quantitatively compare the Gibbs energy of the formation of different types of complexes, for example, oligonucleotides with a modified chemical structure, using classical MD and MMPB(GB)SA methods for calculating the enthalpy and the Q-Harm or NMA methods for calculating the entropy of complex formation. Our studies show the possibility to quantitatively compare the enthalpy of the formation of the RNA/RNA, DNA/RNA, and DNA/DNA complexes. It will be possible to understand whether this statement is true for NA derivatives after analyzing a set of their complexes of different lengths and GC composition using our proposed approach.

DISCUSSION

In this study, we analyzed the possibility of using the MD method to predict the energy value of the formation of NA duplexes. The MMPBSA and MMGBSA methods were used to calculate the enthalpy of complex formation, and the Q-Harm and NMA methods were used to calculate the entropy of complex formation. The analysis was performed using only the MD trajectory for the NA duplex or three trajectories for double-stranded state and single-stranded oligonucleotides. The value of the Gibbs free energy of duplex formation was evaluated from a combination of the entropy and enthalpy values calculated by different methods using the single- and three-trajectory approaches. In some cases, the resulting data were characterized by high correlation coefficients for experimental and calculated values.

The results lead to the conclusion that quantitative calculation of the enthalpy of complex formation can be carried out for the RNA/RNA, DNA/RNA, and DNA/DNA complexes using the MMGBSA method in a single-trajectory approximation using linear empirical correction. The resulting enthalpy value coincides with the experimental value with an accuracy of 8.6%. The values of the entropy and Gibbs free energy can be compared quantitatively only within one type of complex. In this case, the single-trajectory approach and combination of the MMGBSA and Q‑Harm methods give the highest accuracy, i.e., 8.2 and 9.3% on average, for RNA/RNA and DNA/RNA duplexes, respectively, when taking into account the linear correction for each type of the complex. The use of these two calculation methods is the most computationally efficient and gives an accurate quantitative assessment of the thermodynamic parameters of complex formation.

To extend the proposed approach to NA analogs or derivatives, it is necessary to reliably calculate the structure of oligomers and their complexes with complementary nucleic acids. As soon as these structures or ensembles of structures are found, the method can be used to quickly assess the hybridization ability of NA derivatives. We showed earlier with examples of glycine-morpholino oligomers [32] and phosphoryl guanidine oligonucleotides [33] that the enthalpy values of complex formation calculated by the proposed method were close to the experimental values. However, the reliability of the proposed method should be quantified in additional studies of a set of oligomers of different lengths and GC compositions.

For quantitative calculation of the enthalpy of duplex formation, we recommend the MMGBSA method with an analysis of the trajectory of the duplex structure as an effective and fast approach. In the future, the proposed approaches can also be extended to NA analogs and derivatives, which will provide the opportunity for the rational design of new types of compounds.