Combination of solid-state NMR, molecular mechanics and DFT calculations for the molecular structure determination of methyl glycoside benzoates

A reliable method for molecular structure determination, excluding single-crystal X-ray diffraction (SCXRD), has been applied to six methyl glycoside tetrabenzoates. The proposed method is based on a global conformational search using molecular mechanics and subsequent DFT calculations guided by a solid-state NMR experiment. The accuracy of the applied method has been verified on three methyl glycoside benzoates for which the SCXRD analysis has been completed. It appeared that the calculated conformations of unprivileged energy could be found in the solid state. Bulky substituents (benzoates) exerted less energetically favored interactions in crystals in contrast to isolated molecules. Therefore, solid-state NMR was revealed to be an indispensable approach for choosing credible conformations from the calculated conformations.


Introduction
It is always worthwhile to compare the molecular structures of similar chemical species. Structural studies are the most basic studies and should precede further physicochemical analysis. The most credible molecular structures are obtained from single-crystal X-ray diffraction (SCXRD). However, this technique requires the formation of appropriate monocrystals, which can be a challenging task even for crystalline substances. Therefore, the accessibility of molecular structures derived from SCXRD is limited.
Chemically similar compounds can surprisingly crystallize in various manners (crystals of different sizes and space groups) or even be amorphous. This can often hinder SCXRD measurements. Nevertheless, a comparison of molecular structures among similar chemical species can provide important insights into their structural features. For example, the anomeric effect was thoroughly examined in a series of methyl glycoside acetates because their molecular structures derived from SCXRD were accessible [1].
The cited [1] reported the fortunate coincidence that the molecular structures of six methyl glycoside acetates, the derivatives of both anomers of glucose, galactose, and mannose, were available. Although there were plans to expand the examination of the anomeric effect to a series of six analogous methyl glycoside esters, namely, benzoates (Chart 1), difficulties in SCXRD refinement were encountered.
As mentioned above, in solid-state studies, the SCXRD method provides the most reliable information on the structure and conformation of the studied compounds. However, when it is not possible to obtain an appropriately sized monocrystal, alternative approaches based on dynamically developing methods of molecular modeling are sought. The obtained computational data could be further verified using solid-state NMR spectroscopy.
For compounds in the solid state, intermolecular interactions play a key role in determining solid-phase conformations. Strong intermolecular interactions, when present, stabilize the crystal and mainly influence the conformation of the molecule. In this case, it is necessary to include the full crystal structure in the calculations. However, in the case of molecules in which the intermolecular interactions are limited to the weak van der Waals interactions, single-molecule calculations may allow determination of the conformation of the compound in the crystal. Derivatives of monosaccharides in which hydroxyl groups were substituted with benzoate moieties were selected to investigate this possibility. On the other hand, the introduction of such substituents eliminates the possibility of forming strong intermolecular interactions, such as hydrogen bonds, and the orientations of such bulky aromatic substituents have a primary influence on the 13 (7) 12.451353 α (°) 9 0 9 0 9 0 9 0 9 0 9 0 β (°) 90 90 90 90 98.159 (2) 97.44615 γ (°) 9 0 9 0 9 0 9 0 9 0 9 0 calculations and verification of the obtained data based on 13 C CP/MAS NMR spectroscopy will allow reconstruction of the crystal conformation of the analyzed compounds.

X-ray diffraction
Although the six abovementioned methyl glycoside benzoates differ only in the configurations on separate carbon atoms, major differences among their ability to crystallize were observed. While Bz-Me-β-Gal, Bz-Me-α-Man, and Bz-Me-β-Man were found to form crystals suitable for SCXRD measurements, crystals of the glucosides (Bz-Me-α-Glc and Bz-Me-β-Glc) were not of a sufficient size for this type of crystallographic study, whereas Bz-Me-α-Gal was found to be amorphous.
Bz-Me-β-Gal and Bz-Me-β-Man were found to crystallize in the orthorhombic crystal system and P2 1 2 1 2 1 space group, while the crystals of Bz-Me-α-Man belonged to the monoclinic crystal system and P2 1 space group (Table S1). Importantly, Table 3 Experimental and calculated (CASTEP-and Gaussian-optimized) 13  in those three cases, only one molecule was found to form a unit cell (Z′ = 1). The three glycosides that could not be measured by SCXRD were subjected to powder X-ray diffraction (PXRD) measurements instead (Figs. S25-S28), which proved the crystallinity of Bz-Me-α-Glc and Bz-Me-β-Glc and the amorphousness of Bz-Me-α-Gal.

Me-β-Man
For half of the studied glycosides, crystal structures were obtained and used to validate the proposed method. The first computational aim of this study was to analyze the differences between the experimental, CASTEP-optimized and Gaussian-optimized conformations of methyl glycoside benzoates. To achieve this, the experimental crystal structures of Bz-Me-β-Gal, Bz-Me-α-Man, and Bz-Me-β-Man were used as the initial geometries for CASTEP geometry optimization, including the optimization of the unit cell dimensions. Simultaneously, single molecules were extracted from the crystal structures and optimized in Gaussian using the same exchange-correlation DFT functional (PBE) and semiempirical dispersion correction (Grimme). The optimized structures were then used for calculations of NMR parameters using GIPAW (for CASTEP calculations) and GIAO (for Gaussian calculations) approaches. Calculations of the NMR shielding constants of the experimental (nonoptimized) crystal structures were not performed since it has been proven multiple times that at least the optimization of hydrogen atoms is mandatory to obtain accurate results, even for structures of very high quality [2,3].
To assess how optimization affects the molecular structure, a comparison of the experimental and optimized unit cell dimensions is presented in Table 1. Furthermore, the calculated RMSD values are provided in Table 2. The calculated and experimental 13 C NMR chemical shifts are compared in Table 3.
The CASTEP-optimized unit cell dimensions were found to be in very good agreement with the corresponding experimental dimensions. The slight decrease in cell lengths observed after geometry optimization can be explained by differences between the experimental temperature (130 K) and the calculation conditions in which thermal motions have been neglected, as the geometry optimization was performed at 0 K.
The calculated RMSD values proved that the differences between the CASTEP-optimized and experimental conformations in all three studied cases were very small ( Fig. 1) and limited mostly to a slight decrease in the C-H bond lengths during the calculations due to differences between the computational (0 K) and experimental (130 K) temperatures, similar to what was observed for the unit cell lengths (Table 1).
Higher RMSD values were obtained between the experimental and Gaussian-optimized conformations. This was Table 4 Differences between experimentally obtained and calculated (Conformers-calculated Gaussian-optimized) 13 C NMR chemical shifts (ppm) for Bz-Me-β-Gal with the coefficient of determination (R 2 ) between experimental and calculated chemical shifts; relative energies (E, kcal/mol) of the optimized conformations calculated using Conformers and Gaussian software   caused mostly by the aperiodicity of the Gaussian calculations, in contrast to the CASTEP calculations, and the resulting inability to include the intermolecular interactions. However, in the case of the Gaussian calculations, the molecules preserved their initial experimental conformations after optimization (Fig. 2). Subsequently, the optimized structures were used for the NMR parameter calculations, and the obtained results are presented in Table 3. It should be noted here that only those carbon atoms that formed the sugar unit and aglycon (methoxy group) were chosen for comparison. This choice was made due to the overlap of the 13 C CP/ MAS NMR signals from aromatic substituents, which made the precise determination of their chemical shift values impossible (Figs. S19-S24).
The calculated and experimental chemical shift values were found to be in very good agreement. Slightly better accuracy was observed for the CASTEP results due to a higher similarity between the experimental and CASTEP-optimized conformations, as proven by the lower values of RMSD (Table 2) and the periodicity of the CASTEP calculations. However, even for the Gaussian calculations, the largest differences between the calculated and experimental NMR chemical shifts did not exceed 3 ppm.
The three refined crystal structures (Bz-Me-β-Gal, Bz-Me-α-Man, and Bz-Me-β-Man) were then used to test the chosen method of conformational search using the Conformers module in Materials Studio 2017, as described in "Conformers calculations."

Bz-Me-β-Gal
In the case of Bz-Me-β-Gal, the experimental solid-state conformation (EGO) was found in the group of ten lowest calculated conformations and ranked as the 4th best with energy higher by approximately 2.5 kcal/mol than the lowest energy conformation (G5) ( Table 4). After optimization at the DFT level, only minor differences in the stability order of the conformations were observed. This proves the reliability of the Conformers calculations at the MM level, at least in terms of energy calculations and their application to identify the lowest energy conformation. However, this does not explain why the experimental solid-state conformation (EGO) was found to be merely at the 4th position in terms of energy. However, after an analysis of the NMR calculations results, EGO was found to be the conformation for which the highest coefficient of determination (R 2 ) was obtained. Furthermore, it was the only conformation for which the highest absolute difference between the calculated and experimental NMR chemical shifts did not exceed 3 ppm. A comparison between the experimental and calculated conformations revealed the reasons for both the lowest energies of G5, G6, and G7 compared with EGO as well as some of the observed differences between the calculated and experimental NMR chemical shifts. For example (Fig. 3), in the G7 conformation, a completely different orientation of the benzoate substituent at C6 than in EGO was observed. In the G7 conformation, the aromatic rings of the benzoates at C6 and C4 were found to participate in π-π stacking, which explained why this conformation was the one with the lowest energy. Such interaction was not observed in the experimental conformation. Furthermore, in G7, the H-C1-O-Me moiety was found to be anti-oriented, while in the EGO conformation, this moiety was in the gauche position. This reorientation resulted in a very large difference between the calculated and experimental chemical shifts for the OMe carbon atom.

Bz-Me-α-Man
In the case of Bz-Me-α-Man, the experimental conformation was ranked as the 2nd best during the conformational search. However, after geometry optimization at the DFT level, conformations G6 and EGO converged into one minimum, which was proven based on the very similar values of the calculated 13 C NMR chemical shifts (Table 3), RMSD (Table 5), and E Gaussian (Table 6). Surprisingly, another very low-energy conformation (G8) was found, for which a very weak correlation between the calculated and experimental 13 C NMR chemical shifts was observed. This was associated with a reorientation of the C6 benzoate residue. While in the experimental conformation, a C-H···π interaction was present between the benzoates at C2 and C6; in G8, this interaction was not observed (Fig. 4). However, in G8, π-π stacking was observed between the benzoates at C4 and C6.

Bz-Me-β-Man
In the case of Bz-Me-β-Man, the experimental conformation was ranked as the best during the conformational search. Furthermore, it was also characterized by the lowest differences and highest R 2 value between the experimentally obtained and calculated 13 C NMR chemical shifts (Table 3), as well as the lowest E G a u s s i a n (Table 7). Again, as in the case of Bz-Me-α-Man, the low-energy conformations G2 and G3, which after Gaussian optimization converged into one minimum, were characterized by the least accurate NMR parameter calculation results. Furthermore, while a C-H···π interaction between the benzoates at C2 and C6 and π-π stacking between the benzoates at C3 and C4 were formed in G2, those interactions were not present in the experimental solid-state conformation (Fig. 5). However, in the refined crystal structure of Bz-Me-β-Man, intermolecular π-π stacking between benzoates at C2 (first molecule) and C3′ (second molecules) was observed, which could not have been formed in the single-molecule calculations performed in Conformers and Gaussian.

Calculations for Bz-Me-α-Glc, Bz-Me-β-Glc, and Bz-Me-α-Gal
In the case of Bz-Me-α-Glc, the obtained results were similar to those of Bz-Me-β-Gal. Two low-energy conformations (G2 and G1) that, after the DFT optimization, converged into one minimum were characterized by a very large difference between the calculated and experimental 13 C NMR shifts for C6 (Table 8). Therefore, it is very likely that this conformation is far from the experimental conformation. However, another low-energy conformation (G4) was characterized by a very large R 2 value, with the absolute differences between the experimental and calculated 13 C NMR chemical shifts not exceeding 2 ppm. Therefore, it is very likely that the G4 conformation is actually present in the crystal structure. Almost the same results obtained for Bz-Me-α-Glc were obtained for its anomer, that is, Bz-Me-β-Glc (Table 9). Again, the two lowest energy conformations identified in the Conformers search (G1 and G2) after Gaussian optimization converged to the same minimum, Table 7 Differences between the experimentally obtained and calculated (Conformers-calculated Gaussian-optimized) 13 C NMR chemical shifts (ppm) for Bz-Me-β-Man with the coefficient of determination (R 2 ) between the experimental and calculated chemical shifts; relative energies (E, kcal/mol) of the optimized conformations calculated using Conformers and Gaussian software  and the 3rd best conformation in terms of both E Conformers and E Gaussian was characterized by the best fitting of the calculated and experimental NMR results. Although a relatively large absolute difference between the experimental and calculated 13 C NMR shifts for C4 has been observed, it should be noted that some of the differences between those shifts for glycosides with solved crystal structures (Bz-Me-β-Gal, Bz-Me-α-Man, and Bz-Me-β-Man) were also in the range of 3 to 4 ppm.
In the case of Bz-Me-α-Gal, it was impossible to compare the calculated and experimental chemical shifts due to the very broad and overlapping signals in its 13 C CP/MAS NMR spectrum (Fig. S21). Therefore, for this glycoside, the calculated chemical shifts, instead of differences between the experimental and calculated shifts, are presented in Table 10. It is worth noting that for this glycoside, the lowest differences between the calculated shifts for the same carbon atoms for different conformations among all the studied compounds were observed. Furthermore, groups of very similar energy yet different conformations have been formed, such as G4, G5, and G7 and G3, G1, and G8. In the first group, it is clear that the optimized conformations are different, which was determined by comparison of the calculated NMR shifts for C4; however, the differences between their E Gaussian values did not exceed 0.6 kcal/mol. In the second group, the differences between the 13 C NMR shifts for C6 prove their versatility, yet again, the differences between the DFT energies calculated for these conformations did not exceed 0.6 kcal/mol. These observations may stem from the amorphousness of Bz-Me-α-Gal, resulting from the multiple conformations of similar energy and the resulting inability to form a stable crystal due to entropic reasons.

Conclusions
In this paper, the considerable potential of a combined conformational search followed by geometry optimization at the DFT level and subsequent verification using solid-state NMR analysis ( 13 C CP/MAS NMR) for molecular structure evaluation has been demonstrated. The method of structure determination applied in this study was first verified on three compounds for which the SCXRD structures were obtained for the first time. Then, the same approach was used to identify conformations for three substituted sugars for which SCXRD analysis was not possible.
While in the case of saccharides with free hydroxyl groups, the solid-state molecular structure is determined by strong intermolecular hydrogen bonds, we have demonstrated that, even for substituted sugars forming weak intermolecular interactions, energetically favored conformations are not always the ones present in the crystal lattice. Fortunately, solid-state NMR chemical shifts of sugar carbon atoms proved to be extremely sensitive to different conformations of bulky and highly anisotropic substituents such as benzoates. Thanks to the screening of the correlations between theoretical and experimental chemical shifts, reliable molecular structures were selected even when SCXRD data were unavailable.
To conclude, in this article, a combined computationalexperimental structural analysis and prediction method was presented. It has been shown that such an approach can Table 8 Differences between the experimentally obtained and calculated (Conformers-calculated and Gaussian-optimized) 13 C NMR chemical shifts (ppm) for Bz-Me-α-Glc with the coefficient of determination (R 2 ) between the experimental and calculated chemical shifts; relative energies (E, kcal/mol) of the optimized conformations calculated using Conformers and Gaussian software   Conformation  G2  G1  G4  G3  G6  G8  G5  G10  G7  G9  C1  -1 Table 9 Differences between the experimentally obtained and calculated (Conformers-calculated and Gaussian-optimized) 13 C NMR chemical shifts (ppm) for Bz-Me-β-Glc with the coefficient of determination (R 2 ) between the experimental and calculated chemical shifts; relative energies (E, kcal/mol) of the optimized conformations calculated using Conformers and Gaussian software  accurately predict the conformation of solid organics, yet there are some limitations. The requirements of this method include the absence of strong intermolecular interactions and the presence of only one molecule in the unit cell (Z′ = 1). In addition, prior registration of the solid-state 13 C NMR spectra of the studied compounds is mandatory for this method to be applied.

Experimental
General procedures

Materials
Bz-Me-α-Glc, Bz-Me-β-Glc, Bz-Me-α-Gal, Bz-Me-β-Gal, and Bz-Me-α-Man were synthesized according to the procedures described in Ref. [4], while Bz-Me-β-Man was obtained according to the synthesis described in Ref. [5]. Basic analytical data for the synthetized compounds are collected in Table 11. The chemical shifts and 1 H coupling constants for 1 H and 13 C NMR spectra in acetone-d 6 solutions are given in Table 12.
Solid-state NMR spectroscopy ( 13 C CP/MAS NMR) Cross-polarization (CP) magic angle spinning (MAS) 13 C NMR spectra were recorded at 100. 16 MHz on a Bruker DSX-400 spectrometer. Powdered samples were spun at 10 kHz in a 4 mm ZrO 2 rotor, with a contact time of 2 ms, a repetition time of 10 s, and a spectral width of 41 kHz for the accumulation of 256 (Bz-Me-α-Glc and Bz-Me-β-Glc), 232 (Bz-Me-α-Gal), or 400 (Bz-Me-β-Gal, Bz-Me-α-Man, and Bz-Me-β-Man) scans. Chemical shifts were calibrated indirectly through the glycine C=O signal recorded at 176.5 ppm relative to TMS.

CASTEP calculations
The density functional theory (DFT) geometry optimization and NMR shielding constant calculations were carried out with the CASTEP program [8] implemented in the Materials Studio 2017 software [9 ] using the plane wave pseudopotential formalism and the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional [10], defined within the generalized gradient approximation (GGA) with the Grimme method for the dispersion correction (DFT-D) [11]. The calculations were performed with ultrasoft pseudopotentials calculated on the fly. Geometry optimization was carried out using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) [12] optimization scheme and the smart method for finite basis set correction. The quality of calculations was set to ultrafine as implemented in the CASTEP standards. The convergence criteria were set at 5 × 10 -6 eV/atom for the energy, 1 × 10 -2 eV/Å for the interatomic forces, 2 × 10 -2 GPa for the stresses, and 5 × 10 -4 Å for the  [4] 158-160 (EtOH) [7] 63-64 (EtOH) [4] 51-52 (EtOH) [4] 136-137 (MeOH) [4] 148-149 (MeOH) [5] [ − 153 (c 1.1, CHCl 3 ) [5] displacements. The fixed basis set quality method for the cell optimization calculations and the 5 × 10 -7 eV/atom tolerance for SCF were used. The kinetic energy cutoff for the plane waves was set to 630 eV. Brillouin zone integration was performed using discrete 3 × 2 × 1 (Bz-Me-β-Gal and Bz-Me-α-Man) and 2 × 3 × 2 (Bz-Me-β-Man) Monkhorst-Pack k-point sampling for a primitive cell. The optimized structures were then used for the NMR parameter calculations using the gauge including projector augmented wave (GIPAW) method of Pickard and Mauri [13]. To facilitate comparison of the theoretical and experimental data, the calculated shielding constants (σ iso ) were converted into chemical shifts (δ t ) using the following equation: δ t = (σ Gly + δ Gly ) − σ iso , where σ Gly and δ Gly stand for the calculated shielding constant and the experimental chemical shift, respectively, of the glycine carbonyl carbon atom (176.5 ppm).

Gaussian calculations
The DFT geometry optimization and NMR shielding constant calculations were carried out with the Gaussian 16 program [14]. The molecules either from the asymmetric units of the Xray structures or resulting from the Conformers search (as described in the "Conformers calculations" section) were taken as the starting geometries to optimize all atom positions at the PBEPBE/6-311G(d,p) with Grimme dispersion correction level of theory. The resulting geometries were used for calculation of the NMR shielding constants using the GIAO approach at the same level of theory. To compare the theoretical and experimental data, the calculated chemical shielding constants (σ iso ) were converted to chemical shifts (δ t ) using the following equation: δ t = σ TMS − σ iso , where σ TMS stands for the shielding constant of the tetramethylsilane (TMS) carbon atom calculated at the same level of theory.

Conformers calculations
A conformational search of the studied molecules was performed using the Conformers module of Materials Studio 2017 by applying the Metropolis Monte Carlo algorithmbased Boltzmann jump search method. A total of 10 3 conformers per molecule were generated by applying 500 perturbations per jump within the torsion angle window adjusted to accept 50% of the generated conformers. Geometry optimization was performed on each conformer at the molecular mechanics level by the provided Smart algorithm (COMPASS force field, ultrafine quality and maximum 5 × 10 4 iterations). Then, the ten lowest energy conformations for each molecule were kept for the Gaussian DFT calculations. a Spectral data originally assigned (in the case of Bz-Me-β-Man for the first time), coherent (except for Bz-Me-β-Gal) with appropriate data from Ref. [4]