Introduction

The DNA molecule is one of the most important biologically active compounds. It encodes the genetic instructions used in the development and functioning of all known living organisms. The information in DNA is stored as a code made up of four nitrogenous bases. The formation of DNA base pairs is crucial to the main role of DNA, which is the storage and replication of genetic information [1]. Therefore, detailed knowledge of the structure and properties of the individual building blocks of DNA is of great importance. One of the DNA bases is cytosine. Its atom numbering is shown in Fig. 1.

Fig. 1 Atom numbering in cytosine

Although cytosine may exist in various tautomeric forms, we will focus on the keto-amino structure (see Fig. 2), which is presumably the most stable one in the gas phase [2].

Fig. 2 Selected resonance forms of cytosine for the most stable keto-amino structure

The relative stability of cytosine tautomers in the gas phase, low-temperature matrices and polar solution is not settled [2]. Several theoretical calculations at the density functional theory (DFT) and second-order Møller–Plesset perturbation theory (MP2) levels, using relatively incomplete basis sets (such as 6-311G(2d,2p) or 6-311++G**) and a simplified treatment of the solvent [2], suggest that the keto-amino form is the most stable in the gas phase and in solution. On the other hand, the recent MP2/6-311++G** work by Alonso and coworkers [3] proposes the trans enol-amino form as the most stable in the gas phase (lower in energy than the corresponding keto-amino form by only about 1.19 kcal/mol). However, the detailed earlier theoretical work by Zeegers-Huyskens et al. [4] clearly indicates the amino-oxo tautomer as the most stable form in the gas phase. In addition, they stressed its predominance (by a factor of ten) in water.

Vibrational (IR/Raman) and NMR spectroscopic techniques, additionally supported by computational methods, have been used as very efficient tools for the characterization of biological molecules [5]. In turn, current theoretical methods support the interpretation of complex NMR, Raman and IR spectra, and combined experimental and computational studies are in routine use [6–8]. Unfortunately, the harmonic approximation used for the prediction of vibrational frequencies neglects the effect of anharmonicity [6, 9]. The simplest remedy to bring theoretical harmonic frequencies (which often overestimate experiment by 5–10 %) close to measured values is to apply a uniform scaling factor [9]. However, the value of a proper scaling factor depends on the method of calculation and the basis set quality. Several optimal scaling factors have been reported [9–12].

A more theoretically sound approach involves the inclusion of an anharmonic potential. Several methods that include anharmonicity are available, for example, second-order vibrational perturbation theory (VPT2) [13, 14], vibrational self-consistent field (VSCF) [15–17] and vibrational configuration interaction (VCI) [18–20]. Unfortunately, these approaches are significantly more computationally demanding and are therefore practically limited to small- and medium-size molecules.

Similarly, accurate modeling of 13C and 1H NMR spectra relies primarily on the selected level of theory and on the completeness and flexibility of the basis set used [21–23]. In addition, the inclusion of zero-point vibrational corrections and solvent effects should further improve the agreement between theory and experiment [24–28].

Cytosine has been the subject of numerous experimental and theoretical studies. The structural parameters of the single molecule were investigated by ab initio methods both in the gas phase [29–32] and in solution [31, 33–35]. Several studies focused on tautomers of cytosine and other nucleobases [36–39]. In addition, hydrated complexes of cytosine were studied theoretically [29, 40]. IR studies of cytosine have been carried out in the gas phase [41, 42], in argon [43] and N2 [44] matrices, in aqueous solutions [45, 46] and in the solid state [47]. Calculated harmonic [31, 48–50] and anharmonic [30] vibrations were also reported. Most vibrational studies were conducted using a simple harmonic model combined with DFT and MP2 calculations. Rasheed et al. [30] reported HF, B3LYP and MP2 calculated anharmonic vibrational spectra of cytosine using the VSCF and CC-VSCF methods. The authors observed good agreement between DFT and MP2 anharmonic wavenumbers and experiment. 1H [51], 13C [51, 52], 15N [52] and 17O [53] NMR chemical shifts of cytosine measured in DMSO solution have been reported. However, we are not aware of any high-level theoretical prediction of the corresponding NMR parameters. The only available report presented B3LYP/6-311G(2d,2p) calculated proton and carbon chemical shifts in the gas phase and solution using TMS as the theoretical reference [2]. Moreover, the authors did not verify the accuracy of their predictions against experiment.

The aim of this work was to assess the impact of the time-consuming anharmonic model on the accuracy of the predicted structural and spectroscopic properties of an isolated cytosine molecule (shown in Fig. 2b as the keto-amino tautomer) in vacuum and DMSO solution using DFT calculations. Cytosine was selected as an example of a real-size biomolecule, containing 10–15 atoms, that is well characterized both experimentally and theoretically. Obviously, a molecule of this size could be a subject of benchmark calculations, for example, at the coupled cluster level with very large basis sets. However, such calculations are extremely expensive, and their cost scales very steeply with the size of the system (as N^7 or N^8, where N is the number of basis functions). An example of such studies on the structure and anharmonic vibrations of uracil was reported [54, 55].

Thus, we want to see whether replacing the standard harmonic model with an anharmonic one improves the prediction of cytosine structure, vibrational parameters and NMR chemical shifts in the gas phase when using the popular and efficient B3LYP density functional [56, 57] and the medium-size Pople-type basis set 6-311++G**. Finally, we want to test whether including solvent effects within the harmonic and anharmonic models improves the accuracy of the selected cytosine structural and spectroscopic parameters. A simple polarizable continuum model (PCM [58, 59]) will be used to account for the DMSO solvent.

Theoretical approach

The unconstrained cytosine geometry optimization, vibrational analysis and NMR calculations were performed using the Gaussian 09 [60] software. The B3LYP hybrid density functional [56, 57] combined with the 6-311++G** basis set was used to fully optimize the equilibrium geometry (R e) of free cytosine in the gas phase and in DMSO solution. To include the impact of solvent on selected properties of dissolved cytosine, self-consistent reaction field (SCRF) calculations within the polarizable continuum model (PCM) [58, 59] were used.

The harmonic and anharmonic vibrational calculations (the latter yielding the rovibrationally averaged R v structure) were carried out in vacuum and in DMSO at the B3LYP/6-311++G** level of theory using the VPT2 method [13, 14]. All vibrational calculations yielded only real (positive) frequencies, confirming that the optimized structures are energy minima.
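The exact input files used in this work are not given. As a rough illustration only, the sketch below assembles Gaussian-style route lines for the four geometry/frequency jobs described above. It assumes the standard Gaussian 09 keywords `Opt`, `Freq`, `Freq=Anharmonic` and `SCRF=(PCM,Solvent=DMSO)`; the helper `route_line` is a hypothetical name introduced here and is not part of any package.

```python
def route_line(functional, basis, job_keywords, solvent=None):
    """Build a Gaussian-style route line (illustrative helper, hypothetical)."""
    parts = [f"# {functional}/{basis}"] + list(job_keywords)
    if solvent is not None:
        # Assumption: PCM is requested through the SCRF keyword.
        parts.append(f"SCRF=(PCM,Solvent={solvent})")
    return " ".join(parts)


# Harmonic and anharmonic (VPT2) jobs, in vacuum and in DMSO.
for solvent in (None, "DMSO"):
    for freq_keyword in ("Freq", "Freq=Anharmonic"):
        print(route_line("B3LYP", "6-311++G**", ["Opt", freq_keyword], solvent))
```

The anharmonic jobs are the expensive ones, so in practice they would typically be started from an already converged harmonic-level geometry.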

In our studies, we chose PCM as a simple model of the solvent impact on the structure and spectroscopic properties of cytosine. This very rough approach works well for solvents of low polarity and nonpolar solute molecules. We are aware of the limitations of the PCM model, but the use of a supermolecule model with explicit DMSO molecules, in particular for the VPT2 calculations, is computationally very expensive.

Finally, the B3LYP/6-311++G** calculated cytosine R e and R v geometries in the gas phase and in DMSO were used for all subsequent predictions of nuclear shieldings using the gauge-independent atomic orbital (GIAO) [61, 62] approach. For the calculation of nuclear magnetic shieldings, we selected two density functionals: B3LYP and BHandH. The latter was selected because our earlier studies indicated its good performance in predicting proton, carbon and fluorine NMR parameters [63]. Since GIAO NMR parameters are very sensitive to the completeness and quality of the basis set used [6, 24, 64], we selected three basis sets. Initially, we used the same basis set as for geometry optimization (Pople-type 6-311++G**). Next, we selected the aug-cc-pVTZ-J basis set, tailored by Sauer et al. [65, 66] for accurate calculations of indirect spin–spin coupling constants. This basis set also enabled prediction of carbon nuclear shieldings close to the complete basis set limit in a set of small molecules [24, 67–70]. It was downloaded from the Environmental Molecular Sciences Laboratory (EMSL) basis set exchange library [71, 72]. Finally, we selected the somewhat smaller and more compact STO-3Gmag basis set, designed by Leszczyński and coworkers [73] for efficient prediction of carbon shieldings in larger molecular systems. This basis set was taken directly from their article [73].

Theoretical carbon and proton chemical shifts were referenced to benzene, calculated at the same level of theory, and the corresponding parameters were calculated as follows:

$$ \delta ({\text{C}}_{i} ) \, = \sigma \left( {\text{benzene}} \right) \, - \sigma \left( {{\text{C}}_{i} } \right) + 128.5 $$
(1)
$$ \delta ({\text{H}}_{i} ) \, = \sigma \left( {\text{benzene}} \right) \, - \sigma \left( {{\text{H}}_{i} } \right) + 7.21 $$
(2)
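As a minimal sketch of Eqs. (1) and (2), the function below converts a calculated isotropic shielding into a chemical shift referenced to benzene. The shielding values in the usage lines are hypothetical placeholders chosen for easy arithmetic, not results from this work.

```python
# Benzene reference chemical shifts (ppm) taken from Eqs. (1) and (2).
BENZENE_SHIFT_PPM = {"C": 128.5, "H": 7.21}


def chemical_shift(sigma_nucleus, sigma_benzene, nucleus):
    """Chemical shift (ppm) from shieldings (ppm), referenced to benzene.

    sigma_nucleus and sigma_benzene must come from the same level of theory.
    nucleus is "C" or "H".
    """
    return sigma_benzene - sigma_nucleus + BENZENE_SHIFT_PPM[nucleus]


# Hypothetical shieldings, for illustration only.
sigma_benzene_c = 50.0
sigma_c = 20.0
print(chemical_shift(sigma_c, sigma_benzene_c, "C"))  # 50.0 - 20.0 + 128.5
```

Because the benzene shielding is calculated at the same level of theory as the cytosine shieldings, systematic errors of the method partly cancel in the difference.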

In addition, we used the magnetic shielding of water [74], calculated at the same level of theory (B3LYP/6-311++G**) in the gas phase and in DMSO, as the reference for 17O chemical shifts. The corresponding 17O shieldings were 296.366 and 328.791 ppm. Taking into account the magnetic shielding of liquid water (−36.1 ppm [75]), which is used as the reference in experimental studies, the theoretical reference values were 260.266 and 292.691 ppm, respectively. Similarly, liquid nitromethane (shielding of −112.56 ppm, or chemical shift of 380.2 ppm relative to neat ammonia [76]) is used as the reference in 15N NMR spectroscopy. Our calculated 15N shieldings for nitromethane in the gas phase and in DMSO were −152.622 and −166.051 ppm, respectively.
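The 17O reference values quoted above follow from adding the −36.1 ppm liquid-water term to each calculated water shielding. The short sketch below only reproduces that arithmetic; the function name is a hypothetical label introduced for illustration.

```python
# Shielding term of liquid water (ppm), as quoted in the text [75].
LIQUID_WATER_TERM_PPM = -36.1


def theoretical_o17_reference(sigma_water_calc_ppm):
    """Theoretical 17O reference shielding (ppm) for liquid water."""
    return sigma_water_calc_ppm + LIQUID_WATER_TERM_PPM


# Calculated B3LYP/6-311++G** water shieldings quoted in the text (ppm).
for label, sigma in (("gas phase", 296.366), ("DMSO", 328.791)):
    print(label, round(theoretical_o17_reference(sigma), 3))
```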

Besides, as suggested by the reviewer, we applied empirically derived linear correlations [77] between theoretical nuclear shieldings (B3LYP/6-311++G** results) and experimental carbon and nitrogen chemical shifts to derive theoretical chemical shifts. This approach does not involve a theoretical reference molecule and takes advantage of “averaging” about 395 and 56 chemical shifts for 13C and 15N, respectively.

The accuracy of theoretical predictions is often expressed by the root-mean-square (RMS) deviation from experimental values. In this work, we applied the following formula for RMS calculation:

$$ {\text{RMS}} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {\left( {x_{i} - x_{{i,\exp }} } \right)^{2} } }}{n}} $$
(3)
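Equation (3) can be sketched in a few lines of Python. This is a minimal illustration under the assumption that calculated and experimental values are supplied as equally long, matching sequences; the numbers in the usage line are hypothetical.

```python
import math


def rms_deviation(calculated, experimental):
    """Root-mean-square deviation of calculated values from experiment, Eq. (3)."""
    if len(calculated) != len(experimental) or len(calculated) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((c - e) ** 2 for c, e in zip(calculated, experimental))
    return math.sqrt(squared / len(calculated))


# Hypothetical values: every calculated number deviates by 2 units.
print(rms_deviation([102.0, 198.0], [100.0, 200.0]))
```

Because the deviations are squared, a single badly predicted value dominates the RMS when n is small, which is relevant for the six NMR shifts considered here.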

In the case of cytosine vibrational modes and selected structural parameters, x_i runs over 33 vibrations and eight bond lengths between non-hydrogen atoms, respectively. However, only four carbon and two proton chemical shifts are available for statistics. So, from a statistical point of view, the NMR data should be discussed in terms of averaged deviations of calculated values from experiment. However, for consistency, we decided to use RMS as a rough measure of prediction quality in the current study.

Results and discussion

Structure in the gas phase and DMSO solution

As mentioned in Introduction, we decided to study the trans keto-amino cytosine tautomer (Fig. 2b). The MP2/6-311++G** results in Ref. [3] indicate the possible existence of three forms, differing by only 1–2.5 kcal/mol, and the remaining two forms are significantly less stable (by 3.67 and 5.36 kcal/mol). Thus, the relatively low level of theory does not warrant conclusive information about the most stable form of cytosine in vacuum.

Table 1 compares the selected equilibrium (R e) and rovibrationally averaged (R v) cytosine interatomic distances, calculated at the B3LYP/6-311++G** level of theory in the gas phase and in DMSO solution, with the available experimental X-ray values [78]. The overall performance of theory is given by RMS deviations from experiment (see Table 1). We are aware that comparing theoretical numbers for the gas phase or solution with experimental data measured in the solid state is somewhat artificial (H-bonding and crystal packing forces are not considered in single-molecule calculations), but unfortunately, no other experimental studies are available in the literature. The agreement between theory and experiment for the selected CC, CN and CO bonds is better visualized by a graphical presentation of bond length deviations from experiment (see Fig. 3).

Table 1 Comparison of selected equilibrium and rovibrationally averaged B3LYP/6-311++G** calculated cytosine bond lengths (in Å) in vacuum and DMSO

Fig. 3 Deviations of selected equilibrium (R e) and rovibrationally averaged (R v) cytosine bonds, calculated at the B3LYP/6-311++G** level of theory, in the gas phase and DMSO solution from the experimental X-ray values

One could expect that DMSO solvent, due to its strong tendency to H-bonding, should produce a geometry that resembles crystalline cytosine more closely than the gas phase geometry does. In particular, in comparison with the gas phase data, the C=O and N–H bonds should be more elongated both in DMSO solution and in the crystalline state. This is in agreement with our results: the RMS values in the gas phase (increasing from 0.020 to 0.030 Å on going from R e to R v structures) are about twice as large as those in DMSO (where RMS rises from 0.010 to 0.015 Å; see Table 1 and Fig. 3).

It is apparent from Fig. 3 that the C5–C6 and C6–N1 bond lengths are predicted very accurately, and the worst results are obtained for the C4–N8 and C2–N1 bonds. It is known that R v structures should generally show more elongated bonds. Interestingly, both in the gas phase and in DMSO solution, the overall R v structures are in worse agreement with experimental data (RMS is higher by about 50 %) than the initial R e structures.

The observed accuracy of predicted bond lengths in DMSO appears to be related to the relatively higher "content" of resonance structures A and C in comparison with the neutral form B (see Fig. 2). The C5–C6 and C6–N1 bonds do not take part in the resonance structures and therefore are predicted very accurately both in the gas phase and in DMSO, using harmonic and anharmonic modeling. In contrast, the C2=O7 bond is very sensitive to the presence of solvent and to anharmonicity corrections. Similarly, the shortening of C2–N1, C2–N3 and C4–N8 is consistent with the effect of resonance.

Harmonic and anharmonic frequencies

In the next step, we discuss the quality of the theoretically predicted cytosine frequencies (Table S1) in comparison with experimental values measured in a low-temperature argon matrix [43]. We also compare our results with recent theoretical data [30], obtained at a significantly lower level of theory (B3LYP/6-31G**). Thus, we compare our B3LYP/6-311++G** calculated cytosine harmonic, scaled (with a single scaling factor of 0.9688) and anharmonic frequencies in vacuum and DMSO solution with recent harmonic and anharmonic VSCF and CC-VSCF results [30] and with experimental results [30, 43]. Instead of discussing all individual modes, we concentrate here on the overall picture only, that is, on RMS deviations between theoretical wavenumbers predicted in the gas phase and the fifteen highest-frequency experimental bands measured in a low-temperature noble gas matrix (i.e., in an experimental setup resembling the gas phase). Going from raw harmonic frequencies in the gas phase to scaled and VPT2 anharmonic frequencies, a consistent improvement is visible in Table 2 (RMS drops from about 86 to 29 and 20 cm−1, respectively). Larger RMS values are obtained for the results of Rasheed and Ahmad [30]: the corresponding RMS values for harmonic and anharmonic VSCF and CC-VSCF frequencies are 96, 53 and 39 cm−1. Moreover, for the diagnostic and typically most intense C=O stretch band in the IR spectrum (see Table 2), we observe a large improvement, with deviations from experiment of 49, −6 and 16 cm−1.

Interestingly, the 33 calculated raw harmonic frequencies for cytosine in DMSO (see Table S1 in the supplementary material) are of nearly identical accuracy to those in the gas phase (RMS of 76 and 75 cm−1) and are improved by a similar amount using a single scaling factor (RMS of 60 and 58 cm−1). However, the inclusion of both anharmonicity and solvent significantly worsens the results (RMS increases from 50 to 90 cm−1).
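The uniform scaling mentioned above amounts to multiplying every harmonic wavenumber by the single factor 0.9688. A minimal sketch follows; the input wavenumbers are hypothetical and serve only to show the operation.

```python
# Single scaling factor quoted in the text for B3LYP/6-311++G**.
SCALING_FACTOR = 0.9688


def scale_frequencies(harmonic_cm1, factor=SCALING_FACTOR):
    """Apply a uniform scaling factor to harmonic wavenumbers (cm^-1)."""
    return [factor * nu for nu in harmonic_cm1]


# Hypothetical harmonic wavenumbers (cm^-1), for illustration only.
harmonic = [1000.0, 2000.0, 3500.0]
for raw, scaled in zip(harmonic, scale_frequencies(harmonic)):
    print(f"{raw:8.1f} -> {scaled:8.1f}")
```

Since the correction is proportional to the wavenumber, it is largest in absolute terms for the high-frequency stretching modes, where anharmonicity is also most pronounced.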

Table 2 B3LYP/6-311++G** calculated harmonic, scaled and anharmonic frequencies (in cm−1) of cytosine in vacuum. Recent theoretical and experimental results are included for comparison

It is apparent from Fig. 4 that the selected high-frequency anharmonic modes of cytosine in the gas phase are closer to experimental data measured in the low-temperature argon matrix than the theoretical results obtained from the DMSO solution. This tendency is particularly pronounced for C=O stretch.

Fig. 4 Deviation of selected harmonic, scaled and anharmonic B3LYP/6-311++G** frequencies of cytosine in the gas phase and DMSO solution from experimental data in argon matrix [43]

1H and 13C NMR results

Finally, we concentrate on the proton and carbon chemical shifts predicted using the B3LYP and BHandH density functionals and on their comparison with the reported experimental values [51, 52] in DMSO-d6. In this case, we look at the impact of basis set quality, inclusion of rovibrational effects and solvent effects. Here we only consider the accuracy of the calculated chemical shifts of four different carbon atoms (C2, C4, C5 and C6) and two protons (C5H and C6H). Obviously, our simplified PCM model cannot account for the specific H-bonding that shifts the –NH and –NH2 signals by 3–5 ppm in experimental spectra recorded in DMSO at room temperature. Thus, we initially exclude from the discussion all nitrogen and oxygen data, as well as the exchangeable protons involved in strong hydrogen bonds.

Figure 5 shows the deviations of B3LYP and BHandH predicted carbon and proton chemical shifts, calculated for the gas phase structures, from experiment performed in solution. The use of R v geometry improves the agreement for B3LYP calculated C2, C4, C5, C5H and C6H chemical shifts and does not influence the accuracy for C6 (although the sign of the deviation is reversed). However, there is no clear trend for BHandH calculated chemical shifts obtained for the R v structure.

Fig. 5 Deviations of B3LYP and BHandH predicted carbon and proton chemical shifts from experiment (DMSO-d6 solution at room temperature). Theoretical values are calculated for R e and R v structures in the gas phase and referenced to benzene

The importance of solvent inclusion for the prediction of cytosine chemical shifts is apparent from Fig. 6. First, the B3LYP calculated C2, C4, C5, C5H and C6H chemical shifts at R e geometry in DMSO are closer to experiment than the corresponding gas phase values (see also Fig. 5). A comparable agreement is observed for C6. However, the respective results predicted with BHandH do not show a uniform improvement upon including the solvent effect. In addition, the use of R v geometry together with the PCM model does not consistently improve the theoretical results in comparison with gas phase calculations. In particular, the combination of the BHandH density functional, solvent impact and rovibrationally averaged geometry lowers the predictive power of theory.

Fig. 6 Deviations of B3LYP and BHandH predicted carbon and proton chemical shifts from experiment (DMSO-d6 solution at room temperature [51, 52]). Theoretical values are calculated for R e and R v structures in DMSO solution and referenced to benzene

To get a general picture of how the different models affect the accuracy of NMR chemical shifts, we gathered the corresponding RMS values in Table 3. These results also show the performance of the three basis sets used in GIAO NMR calculations.

Table 3 RMS deviation (in ppm) of theoretical BHandH and B3LYP carbon and proton chemical shifts of cytosine in vacuum and DMSO from experimental data [51, 52] in DMSO-d6

First, we notice an improvement of the gas phase BHandH and B3LYP results calculated at the R v structure of cytosine in comparison with the corresponding proton and carbon chemical shifts predicted at R e geometry. An opposite situation is observed in DMSO solution. The best agreement for proton chemical shifts is observed for aug-cc-pVTZ-J (in DMSO) calculations at R e geometry (RMS of 0.262 and 0.253 ppm for BHandH and B3LYP). The best results for carbon chemical shifts are predicted when using B3LYP/6-311++G** calculations in DMSO (RMS of 1.530 ppm). A somewhat worse result is obtained from B3LYP/STO-3Gmag calculations of 13C chemical shifts at the R v structure in DMSO (RMS of 1.840 ppm). Thus, improving the basis set quality does not produce better agreement with experiment in the case of proton and carbon chemical shifts of cytosine. Unfortunately, this trend for the BHandH and B3LYP density functionals is opposite to that of coupled cluster calculated GIAO NMR results (in the case of CCSD(T), a continuous improvement of predictive power is observed toward the complete basis set limit [24, 27]). However, when we consider typical chemical shift ranges observed for proton and carbon spectra (10 and 200 ppm), the best RMS values from Table 3 will correspond to 0.21 and 0.55 %, respectively. Thus, the results for cytosine chemical shifts indicate very accurate predictions using DFT calculations.

Another way of looking at the performance of theoretical models is to analyze the quality of the correlation between calculated and experimental chemical shifts and the corresponding parameters of the least-squares fit. For brevity, such correlations, including both proton and carbon results, are given in Figs. S1–S4 in the supplementary material. Here we only briefly mention the general conclusions from these graphs: very good linear correlations, indicating good reproduction of experimental NMR parameters by theory, were obtained (y = ax + b, with slope a close to 1 and r² values close to 1). In addition, the parameter b in all these graphs was close to zero.
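For readers who wish to reproduce this kind of analysis, the sketch below performs an ordinary least-squares fit y = ax + b and returns the coefficient of determination. It is a generic illustration in plain Python, not the fitting procedure used for Figs. S1–S4, and the data in the usage line are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = a*x + b; returns (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx                     # slope
    b = mean_y - a * mean_x           # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


# Hypothetical data lying exactly on the line y = 2x.
print(linear_fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

When proton and carbon shifts are fitted together, the wide carbon range dominates the fit, so a high r² alone says little about the quality of the proton predictions.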

To assess the overall performance of GIAO B3LYP/6-311++G** calculations in the gas phase and in DMSO for theoretically more difficult nuclei, e.g., 15N and 17O, we also gathered in Table 4 the corresponding nuclear shieldings and chemical shifts for these isotopes (all nuclear shieldings were referenced to benzene, nitromethane and water). In addition, we included carbon and nitrogen chemical shifts derived from the empirical linear formulas reported by Blanco and coworkers [77]. This approach does not require a separate calculation for a reference molecule.

Table 4 Comparison of theoretically predicted chemical shifts of cytosine in the gas phase and DMSO with available experimental data in the condensed phases

It is apparent from Table 4 that the direct (and popular) referencing of carbon data, both in the gas phase and in DMSO solution, leads to somewhat better reproduction of experiment (RMS of 4.46 vs. 5.20 ppm in the gas phase and 1.53 vs. 1.90 ppm in DMSO) than the empirical relation [77]. On the other hand, the empirical formula for nitrogen works significantly better (RMS of 19.77 vs. 15.12 ppm in the gas phase and 25.20 vs. 7.81 ppm in DMSO). Thus, with direct referencing, the inclusion of DMSO solvent improves the prediction of carbon chemical shifts but worsens nitrogen chemical shifts with respect to values measured in DMSO. The advantage of including solvent is particularly important in the case of the 17O NMR chemical shift (deviation of 42.58 ppm in the gas phase vs. −13.69 ppm in DMSO).

However, we are aware that the experimental nitrogen and oxygen chemical shifts were recorded under different conditions (solvent, temperature or solid state) and that the presence of hydrogen bonding is not taken into account in our calculations. Thus, the absolute deviation between theory and experiment for nitrogen and oxygen chemical shifts could be significantly larger than for carbon and proton data (15N and 17O signals span a significantly larger range of chemical shifts than 13C or 1H).

Conclusions

The use of the affordable B3LYP/6-311++G** level of theory enabled very fast and reliable prediction of the equilibrium structure of cytosine. The RMS deviations of R e bond lengths between non-hydrogen atoms from experimental values, measured using the X-ray technique, were fairly small (RMS of 0.010–0.020 Å). The VPT2 prediction of the rovibrationally averaged structure in the gas phase and in DMSO solution (within the PCM solvent model) was significantly more expensive computationally. Moreover, the agreement between the R v structure and the X-ray experiment was slightly worse (RMS of 0.015–0.030 Å). However, anharmonic frequencies reproduced the fifteen highest-frequency experimental values, measured in low-temperature argon matrices, significantly better than the raw harmonic data (RMS of 20 vs. 86 cm−1). Obviously, a simple uniform scaling also improved the results significantly (RMS of about 30 cm−1).

Typical BHandH and B3LYP calculations with the popular 6-311++G** basis set for the cytosine R e structure in the gas phase resulted in very inaccurate cytosine proton chemical shifts (RMS of 0.854 and 0.708 ppm). The use of the R v instead of the R e cytosine structure in the gas phase generally improved proton chemical shifts, the only exception being BHandH/aug-cc-pVTZ-J. DFT-predicted proton chemical shifts in DMSO were consistently more accurate when using the R v instead of the R e cytosine structure for the tested 6-311++G**, aug-cc-pVTZ-J and STO-3Gmag basis sets. In addition, the inclusion of solvent using the PCM model improved the predicted cytosine proton shifts calculated for both R e and R v structures. Only in the case of B3LYP/STO-3Gmag calculations at R e geometry did it lead to worse agreement with experiment. The overall best results for cytosine protons were observed for both BHandH and B3LYP density functionals combined with the aug-cc-pVTZ-J basis set when using the R v structure in DMSO (RMS of 0.225 and 0.206 ppm).

The use of R v structure in the gas phase improves the accuracy of carbon chemical shifts. This tendency was particularly pronounced for B3LYP density functional (RMS of 4.172 decreased to 1.655 ppm for B3LYP/6-311++G**). Furthermore, the inclusion of solvent improves carbon chemical shifts calculated for cytosine R e structure. The best agreement with experiment (RMS of 1.530 ppm) was observed for B3LYP/6-311++G** predicted 13C chemical shifts using R e structure of cytosine in DMSO. Moreover, in most cases, the combination of PCM calculations and anharmonic correction yielded worse agreement between predicted carbon chemical shifts and the corresponding experimental values. In case of cytosine carbon chemical shifts, the improvement of basis set quality did not produce better agreement with experiment. Thus, probably due to cancelation of different errors, the use of inexpensive R e structure with PCM solvent model predicted by B3LYP/6-311++G** calculations produced the best carbon shieldings.

The NMR results obtained here strongly suggest caution when combining different correction techniques in order to improve the predictive power of DFT. The semi-empirical nature of the density functionals used seems to be the source of limitations when using DFT as a predictive tool in the calculation of GIAO NMR parameters.