Solid-state 13C NMR, X-ray diffraction and structural study of methyl 4-O-methyl β-D-glucopyranosides with all eight possible methyl-substitution patterns

Cellulose model compounds that mimic the building blocks of modified cellulose and cellulose derivatives are widely used in cellulose research to infer the properties of the polymer from the monomer. Based on the well-established model compound methyl 4-O-methyl β-d-glucopyranoside, in which the methyl groups represent the truncated side chains of the cellulose, the corresponding O-methyl-substituted derivatives with all eight different substitution patterns (mono-, di- and trisubstituted at O-2, O-3, O-6) were synthesized. Crystallization of the products in sufficient quality for solid-state structure determination by single-crystal X-ray diffraction succeeded in all cases, and the results are reported. Two of the compounds showed more than one independent molecule per unit cell. Solid-state 13C NMR showed a significant down-field shift (5–10 ppm) of the OMe-substituted carbons relative to the OH-substituted counterparts and generally confirmed the important influence of solid-state packing on the chemical shifts as seen by comparison to the solution NMR data.


Introduction
Methylcellulose (MC) is one of the most important classes of cellulose derivatives, in terms of produced amounts and versatility of uses. Several books and reference works (e.g., Grover 1993; Nasatto et al. 2015) can provide readers with a deeper understanding of the respective chemistry, properties, and applications. MC has practical applications in various industries, such as food, cosmetics, paints, and pharmaceuticals, where it is used as a thickener, stabilizer, and emulsifier (Thirumala et al. 2013; Morozova 2020; Forghani and Devireddy 2018). MC exhibits a rich phase behavior, including isotropic, nematic, cholesteric, and smectic phases, which can be tuned by varying the degree of methylation, the concentration in solution, and the temperature (Forghani and Devireddy 2018; Hynninen and Patrakka 2021). Studies have also investigated the mechanical properties of MC gels and films, as well as the interactions of MC with other molecules such as surfactants and proteins. The insights gained from these studies have contributed to our understanding of the self-assembly and dynamics of complex fluids and biological systems, especially with regard to medical applications (Thirumala et al. 2013; Ahlfeld et al. 2020; Bonetti et al. 2021; Biswas 2016).
The complex supramolecular and hierarchical structure of cellulose continues to present challenges to the understanding of its properties and behavior. Often, model compounds of low molecular weight are used to simplify the chemical or physical problems under study, in particular with regard to analytical issues. These model compounds are most frequently monomers and dimers, sometimes oligomers, of the cellulose backbone that mimic monomeric, sometimes oligomeric, building blocks or sections of cellulose or cellulose derivatives. The substituents or modifications on the model compounds are thereby meant to reflect the substituents or modifications on the polymeric cellulose chain. Our earlier work has also made extensive use of model compounds, particularly in questions of chromophore formation in oxidatively damaged celluloses (Rosenau et al. 2004, 2017a; Henniges et al. 2013; Korntner et al. 2015) or from hexeneuronic acids (Rosenau et al. 2017b), the identification of residual chromophores in different cellulosic matrices and their destruction in bleaching (Rosenau et al. 2007, 2011), model systems for cellulose derivatization (Hettegger et al. 2015; Odabas et al. 2016) or questions of quantification of oxidized groups on the cellulosic polymer (Tot et al. 2008; Röhrling et al. 2002).
Computational studies of molecular shape are dependent on model compounds. For example, the shapes of the polymer can be inferred from the lowest energy shapes of the disaccharide. Besides the obvious role of the direct use of experimental structures in the computation, determination of the effects of substituent groups on the properties of the monosaccharides can inform the necessary assumptions in studies of the polysaccharide, where it is still not reasonable to explicitly account for all possible variations of structure. Consider that the cellobiose molecule has 10 exocyclic substituents. With three orientations for each exocyclic group, there are 3^10 = 59,049 possible geometries for a structure with a single ring shape and a fixed geometry of the linkage between the two glucose rings. Multiplying this by 324 combinations of the linkage torsion angles, for a more complete study of the disaccharide, gives a number that is too large to be reasonable for a quantum mechanics study, so preliminary assumptions can help to reduce the number of variables.
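The counting argument above can be reproduced in a few lines. This is only an illustrative sketch: the function and constant names are hypothetical, and the three numbers (10 exocyclic groups, three orientations each, 324 linkage combinations) are taken directly from the text.

```python
# Illustrative count of starting geometries for cellobiose.
# All names are hypothetical; the numbers come from the text.
ORIENTATIONS_PER_GROUP = 3   # three staggered orientations per exocyclic group
EXOCYCLIC_GROUPS = 10        # exocyclic substituents on cellobiose
LINKAGE_COMBINATIONS = 324   # combinations of the linkage torsion angles


def count_geometries(groups, orientations, linkage_combinations=1):
    """Number of geometries for a single ring shape."""
    return orientations ** groups * linkage_combinations


fixed_linkage = count_geometries(EXOCYCLIC_GROUPS, ORIENTATIONS_PER_GROUP)
full_scan = count_geometries(
    EXOCYCLIC_GROUPS, ORIENTATIONS_PER_GROUP, LINKAGE_COMBINATIONS
)

print(fixed_linkage)  # 59049
print(full_scan)      # 19131876
```

With the linkage scan included, the count rises to roughly 19 million geometries, which illustrates why assumptions drawn from experimental monosaccharide structures are useful.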
Based on methyl 4-O-methyl-β-d-glucopyranoside (1) as the reference compound, model compounds of monomeric MC units, with all eight different patterns of methyl substitution (mono-, di- and tri-substituted at O-2, O-3 and O-6) were synthesized (compounds 2-8, Table 1). The multi-step syntheses made heavy use of protecting group chemistry (Yoneda et al. 2016). Each compound was fully analytically characterized, involving full resonance assignment in the 1H and 13C NMR domains and mass-spectrometric data as well as purity confirmation by elemental analysis. The model compounds represent monomeric β-d-glucopyranoside units in methyl celluloses, with the methyl substituents at OH-1 and OH-4 simulating the truncated side chains of the polymeric counterpart. Previous work has demonstrated that the terminal 4-OMe group is crucial to induce crystallization and H-bond patterns resembling the cellulose allomorphs, and that methyl 4-O-methyl-β-d-glucopyranoside (Tot et al. 2008) as well as oligomeric β-methyl glucosides with a terminal 4-OMe group (Mackie et al. 2002; Ruiz Ruiz et al. 2006; Yoneda et al. 2008) are valid model compounds for cellulose in terms of solution and solid-state structural data. This explains why all model compounds presented in this study are derived from the same fundamental structure with both 1-OMe and 4-OMe substituents (Table 1).
In previous work, compounds 1-8 have been used to study the effects of methyl group substituents at different positions on the hydrolytic stability of glycosidic bonds and methyl substituents in MCs in aqueous solution (Yoneda et al. 2008; Hosoya et al. 2014) and on physicochemical properties as well as NMR shifts (Karrasch et al. 2009a, b). In this work, we report the solid-state structures of the model compounds, with a focus on the 13C NMR and crystal structure analysis data, as a basis for a later, more in-depth analysis.

Materials and methods
The syntheses of the methylcellulose model compounds 1-8 have been reported previously (Yoneda et al. 2016).

Solid-state NMR
Solid-state NMR spectra were recorded on a spectrometer equipped with a 4 mm dual broadband CP-MAS probe. 13C spectra were acquired by using the TOSS (total sideband suppression) sequence at ambient temperature with a spinning rate of 5 kHz, a cross-polarization (CP) contact time of 2 ms, a recycle delay of 2 s, and SPINAL-64 1H decoupling. 2 k data points were sampled with an acquisition time of 43 ms, resulting in a total sweep width of 240 ppm. Chemical shifts were referenced externally against the carbonyl signal of glycine with δ = 176.03 ppm. The acquired FIDs were apodized with an exponential function (lb = 11 Hz) prior to Fourier transformation.
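As a rough illustration of the exponential apodization mentioned above, the sketch below builds the window in its common form w(t) = exp(−π·lb·t). The function name is hypothetical, "2 k" is assumed to mean 2048 points, and the dwell time is taken as the acquisition time divided by the number of points; none of this is meant to reproduce the exact processing software used.

```python
import math


def exponential_window(n_points, acquisition_time_s, lb_hz):
    """Exponential (line-broadening) window w(t) = exp(-pi * lb * t).

    Assumes equally spaced sampling over the acquisition time.
    """
    dwell = acquisition_time_s / n_points
    return [math.exp(-math.pi * lb_hz * k * dwell) for k in range(n_points)]


# Values from the solid-state NMR paragraph (2 k points assumed to be 2048).
window = exponential_window(n_points=2048, acquisition_time_s=0.043, lb_hz=11.0)

# An FID (list of samples) would be multiplied point by point:
# apodized = [s * w for s, w in zip(fid, window)]
print(window[0], window[-1])
```

The window leaves the first point unchanged and damps the tail of the FID, which improves the signal-to-noise ratio at the cost of adding about lb Hz to each Lorentzian line width.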

Solution NMR
All solution NMR spectra were recorded on a Bruker Avance II 400 spectrometer (resonance frequency 400.13 MHz for 1H and 100.61 MHz for 13C) equipped with a 5 mm observe broadband probe head (BBFO) with z-gradients at room temperature with standard Bruker pulse programs. The samples were dissolved in 0.6 mL of CDCl3 (99.8% D) or methanol-D4 (99.8% D). Chemical shifts are given in ppm, referenced to the respective residual solvent signals. 1H NMR data were collected with 32 k complex data points and apodized with a Gaussian window function (lb = −0.3 Hz and gb = 0.3 Hz) prior to Fourier transformation. 13C-jmod spectra with WALTZ16 1H decoupling were acquired using 64 k data points. Signal-to-noise enhancement was achieved by multiplication of the FID with an exponential window function (lb = 1 Hz). All two-dimensional experiments were performed with 1 k × 256 data points, while the number of transients and the sweep widths were optimized individually.

X-ray crystallography
Single crystal X-ray data were collected on a Bruker Kappa APEX-2 CCD diffractometer with a nitrogen gas cryostream cooler (Oxford Cryosystems) and a Bruker AXS Smart APEX CCD diffractometer using graphite-monochromatized Mo-Kα radiation (λ = 0.71073 Å) and 0.5° ϕ- and ω-scan frames usually covering complete Ewald spheres with θmax = 30°, except for compound 6, which was measured on a STOE Stadivari instrument (Eulerian 4-circle diffractometer, frame width 0.36°, 6892 frames, detector distance = 40 mm). Absorption corrections were applied with the program SADABS; the structures were solved by direct methods and refined on F2 (Bruker AXS, 2001: programs SMART, version 5.626; SAINT, version 6.36A; SADABS version 2.05; XPREP, version 6.12; SHELXTL, version 6.10. Bruker AXS Inc., Madison, WI, USA). Non-hydrogen atoms were refined anisotropically. C-bonded H atoms were placed in calculated positions and thereafter refined as riding (CH3 groups refined in orientation using AFIX 137). O-bonded hydrogen atoms were located in Fourier syntheses and were then refined with a restraint that kept the O-H bond distance and the C-O-H angle fixed at 0.84 Å and 109.5°, but permitted rotation about the C-O(H) bond axis (AFIX 147 of program SHELXL) with Uiso(H) = 1.5 Ueq(O). The absolute structures could not be determined from the very weak anomalous dispersion effects and were assigned from the known absolute configuration of the glucose residue. Compounds 1-8 formed colorless or white crystals (for form, appearance and crystallization conditions see Table 3). A suitable crystal was mounted on a glass fiber in each case and examined by X-ray single crystal diffraction at RT. The deposited Cambridge Crystallographic Data Center (CCDC) files (see Table 3) contain the supplementary crystallographic data for compounds 1-8. These data can be obtained free of charge via www.ccdc.cam.ac.uk/data_request/cif, by emailing data_request@ccdc.cam.ac.uk, or by contacting The Cambridge Crystallographic Data Centre, 12, Union Road, Cambridge CB2 1EZ, UK; fax: +44 1223 336033.

Results and discussion
The solid-state NMR spectra and data of compounds 1-8 (Fig. 1 and Table 2) showed some interesting deviations from the solution NMR counterparts. While in the solution NMR every C atom gave an unambiguous signal, in solid-state NMR two compounds showed more resonances than would have been expected from the structural formulae. Thus, based on the number and intensities of the signals, two magnetically inequivalent entities had to be assumed for compound 7 (the 1,3,4,6-methylated derivative), and even three for compound 6 (the 1,2,4,6-methylated derivative). Generally, there was a quite pronounced effect of the solid-state packing on the solid-state chemical shifts. This effect is evidently canceled out in solution, when the solid-state environment is lost and the molecules undergo Brownian motion, which renders the sample isotropic. The differences in chemical shifts for structurally analogous C atoms are therefore much larger in the solid-state NMR spectra than in the solution spectra. For example, in the solid state, the chemical shift values of C-1 ranged between 102.5 and 105.7 ppm (Δδ = 3.2 ppm) and those of C-4 between 77.7 and 82.2 ppm (Δδ = 4.5 ppm), compared to 105.2-105.4 ppm (Δδ = 0.2 ppm) and 78.3-78.6 ppm (Δδ = 0.3 ppm), respectively, in solution. The methoxy group resonances in the solid state covered a rather wide shift range of ~5 ppm, while in solution the shift differences between the compounds were smaller than 0.3 ppm. Methyl substitution caused a significant down-field shift of the 13C resonances, approximately between 9 and 12 ppm for C-2, 8-11 ppm for C-3 and 11 ppm for C-6 (Table 2). A methoxy-substituted C-6 is found at > 70 ppm and is thus shifted close to the region typical for C-2 to C-5 in non-substituted hexopyranoses. A list with all 13C resonances and their full assignments, including those of the methyl substituents, can be found in the Supporting Information.
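The Δδ values quoted above are simply the widths of the observed shift ranges. The short sketch below recomputes them from the range limits given in the text; the dictionary layout and function name are hypothetical and serve only as a bookkeeping example.

```python
# Shift ranges (ppm) quoted in the text for C-1 and C-4 across compounds 1-8.
SHIFT_RANGES_PPM = {
    ("C-1", "solid"): (102.5, 105.7),
    ("C-4", "solid"): (77.7, 82.2),
    ("C-1", "solution"): (105.2, 105.4),
    ("C-4", "solution"): (78.3, 78.6),
}


def shift_spread(low_ppm, high_ppm):
    """Width of a chemical shift range, rounded to 0.1 ppm."""
    return round(high_ppm - low_ppm, 1)


spreads = {key: shift_spread(*limits) for key, limits in SHIFT_RANGES_PPM.items()}

for (carbon, state), spread in spreads.items():
    print(f"{carbon} ({state}): spread = {spread} ppm")
```

For both carbons the spread in the solid state is more than ten times the spread in solution, which is the packing effect discussed in the paragraph.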
Over time, we succeeded in crystallizing all eight compounds to a quality sufficient for structure determination by X-ray diffraction. The resulting geometries are summarized in Table 3, along with selected compound data. Crystal data and structure refinement details, listings of bond lengths and angles as well as packing diagrams for all eight compounds are compiled in the Supplementary Information. As already suspected from the solid-state NMR data, compound 6 contained three independent molecules per unit cell, and compound 7 two independent molecules. This explains and confirms the occurrence of multiple resonances per C atom in the solid-state 13C NMR spectra of these two compounds. Compound 4 crystallized as the hydrate.
Fig. 2b Tube representation of compound 5, the derivative that deviates most from the perfect chair conformation
One point of interest in detailed studies of monosaccharide structure is the ring shape, which is conveniently described in the language of ring puckering. In the Cremer-Pople (1975) puckering space (see Fig. 2), the Θ parameter determines the type of ring shape, with 0° and 180° being the 4C1 and 1C4 chairs and 90° being the boat and skew conformations. The half chairs and envelopes are at about 52° and 128° in Θ. The Φ parameter indicates the extent of the pseudorotation. All compounds were clearly in the 4C1 domain, with the rings being somewhat distorted. Because of the glucopyranose ring oxygen atom (O5), a perfect 4C1 chair would have a Θ value of about 7°. Coincidentally, all these structures have Φ values close to 0°, where O5 would be the atom out of plane if the other five atoms were in a plane. A complete map of the different ring forms with MM3 energy contours is given in Dowd et al. (1994). Note that there are only two unique chair forms because 4C1 is the same as OC3 and 2C5, and 1C4 is the same as 3CO and 5C2. The structure that deviated most from the perfect chair was the one substituted at positions 1, 2, 3, and 4. It is shown in tube representation in Fig. 2b, with O5 being higher above a mean plane than the other atoms. Still, Θ is only 15.5°; a Θ of 26° (half of 52°) would mean a conformation resembling OE more closely than 4C1. The values of the puckering amplitude, Q, are all very similar, with the Q values for the most and least substituted rings being 0.575 and 0.572 Å, respectively. Other than the concentration of the structures close to Φ = 0°, there is nothing particularly unusual about the puckering in compounds 1-8. There is no evidence for any particular influence of methyl substitution on the ring shape.
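For readers who want to reproduce puckering parameters from atomic coordinates, the following is a minimal sketch of the Cremer-Pople definitions for a six-membered ring. It is not the software used for this study. The function names are hypothetical, and the atom order O5, C1, C2, C3, C4, C5 is an assumption (the usual choice for pyranoses, which places 4C1 near Θ = 0°). The helper `ideal_chair` only builds an idealized test ring with equal bond lengths, which is why it gives Θ = 0° rather than the roughly 7° expected for a real pyranose.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def cremer_pople_6(coords):
    """Return (Q, theta_deg, phi_deg) for six ring atoms in ring order."""
    if len(coords) != 6:
        raise ValueError("expected six ring atoms")
    # Shift the origin to the geometric centre of the ring.
    centre = [sum(p[k] for p in coords) / 6.0 for k in range(3)]
    r = [[p[k] - centre[k] for k in range(3)] for p in coords]
    ang = [2.0 * math.pi * j / 6.0 for j in range(6)]
    # Mean-plane normal n = R' x R'' (Cremer and Pople 1975).
    r1 = [sum(r[j][k] * math.sin(ang[j]) for j in range(6)) for k in range(3)]
    r2 = [sum(r[j][k] * math.cos(ang[j]) for j in range(6)) for k in range(3)]
    n = _cross(r1, r2)
    norm = math.sqrt(sum(c * c for c in n))
    n = [c / norm for c in n]
    # Out-of-plane displacements z_j.
    z = [sum(r[j][k] * n[k] for k in range(3)) for j in range(6)]
    a = math.sqrt(2.0 / 6.0) * sum(z[j] * math.cos(2 * ang[j]) for j in range(6))
    b = -math.sqrt(2.0 / 6.0) * sum(z[j] * math.sin(2 * ang[j]) for j in range(6))
    q2 = math.hypot(a, b)
    q3 = sum((-1) ** j * z[j] for j in range(6)) / math.sqrt(6.0)
    total_q = math.hypot(q2, q3)
    theta = math.degrees(math.atan2(q2, q3))   # 0..180 degrees
    phi = math.degrees(math.atan2(b, a)) % 360.0
    return total_q, theta, phi


def ideal_chair(h, radius=1.4):
    """Idealized chair: atoms on a hexagon, alternating +h / -h out of plane."""
    return [(radius * math.cos(2 * math.pi * j / 6),
             radius * math.sin(2 * math.pi * j / 6),
             h * (-1) ** j) for j in range(6)]


q_a, theta_a, _ = cremer_pople_6(ideal_chair(-0.25))
q_b, theta_b, _ = cremer_pople_6(ideal_chair(0.25))
print(q_a, theta_a)
print(q_b, theta_b)
```

The two idealized chairs have the same Q and differ only in Θ (0° versus 180°), mirroring the 4C1 and 1C4 poles of the puckering sphere. For a perfect chair Φ is undefined, so it is not printed here.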
The anomeric effect is the unexpectedly high concentration of α-glucose in solution equilibrium with the β-glucose anomer. In the present studies, mutarotation is blocked by the formation of the methyl glucoside, but there are other manifestations of the stereoelectronic arrangement for the atoms in the sequence involving the ring atoms C5, O5, C1, O1, and the methyl carbon attached to the ring by O1. Table 4 shows that the C1-O1 bond, at about 1.39 Å, is significantly shorter than the other C-O bonds, which are all very similar to each other. Various attempts to find a correlation between these bond lengths and the presence or absence of methyl substituents failed.
The O3-H…O5' hydrogen bond is a frequent finding in molecules related to cellulose, starting with cellobiose. The present work (Table 5) shows, however, that O5 is not a great acceptor of hydrogen bonds in the environments provided by these methylated sugars. Only two of the 11 O5 atoms act as acceptors of methyl group hydrogen atoms, and no other donors were observed. Similarly, seven of the O1 atoms do not accept hydrogen bonds; the remaining four include one that accepts from the water of hydration and three that accept C-H…O hydrogen bonds. On the other hand, nine of the O6 accept, as well as eight  (Yoneda et al. 2008). The orientation of the primary alcohol group is always of interest, with three minima in the calculated energy for complete rotation of the O6 group around the C5-C6 bond. In contrast to extensive surveys of related molecules in the crystallographic database, which show that the preferred O6 orientation is gauche to both O5 and C4 (the gg orientation), here nine of the 11 O6 groups were gauche to O5 and trans to C4 (the gt conformation). Although the O6 in nature's most prevalent carbohydrate compound (native cellulose) has the third conformation, trans to O5 and gauche to C4 (tg), this conformation is found in only a small minority of all carbohydrates. When the original, biosynthesized cellulose structure is disrupted by dissolution or even swelling by NaOH or amines, the O6 re-crystallizes in the gt form.
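The gg, gt and tg labels used above can be assigned from the two torsion angles O5-C5-C6-O6 and C4-C5-C6-O6. The sketch below is one simple way to do this and is not the procedure used for Table 5. The function names are hypothetical, and the 120° cut-off between gauche and trans is an assumption; an eclipsed arrangement would be labeled "g" by this crude rule.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_deg(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees (range -180..180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _dot(b2, _cross(n1, n2)) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def relation(torsion):
    """'t' if the torsion is closer to 180 than to +/-60 degrees, else 'g'."""
    folded = abs((torsion + 180.0) % 360.0 - 180.0)  # fold into 0..180
    return "t" if folded > 120.0 else "g"


def o6_rotamer(o5_c5_c6_o6, c4_c5_c6_o6):
    """First letter: O6 relative to O5; second letter: O6 relative to C4."""
    return relation(o5_c5_c6_o6) + relation(c4_c5_c6_o6)


# Idealized staggered values for the three minima.
print(o6_rotamer(-60.0, 60.0))    # gg
print(o6_rotamer(60.0, 180.0))    # gt
print(o6_rotamer(180.0, -60.0))   # tg
```

In practice the two torsion angles would be computed with `torsion_deg` from the crystallographic coordinates of O5, C5, C6, O6 and C4, C5, C6, O6 after conversion to Cartesian axes.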
Neither of the two gg O6 groups is derivatized. The gg O6 of the 1,2,4 structure participates in an O3-H…O6-H…O4 sequence of hydrogen bonds, and the O6 of the 1,3,4 structure is in a similar sequence involving O2-H…O6-H…O4. All of the O1 atoms are substituted with methyl groups, and they all take a position close to O5 and trans to C2 (mean C7-O1-C1-O5 torsion angle of 49(6)°, where the value in parentheses is the standard deviation).
Returning to the question of secondary alcohol substituent group orientations raised in the introduction, some patterns can be found that add to our knowledge of carbohydrates generally, and cellulose derivatives specifically. Unlike the primary alcohol group (O6 atom) and the methyl group on the anomeric carbon, which take more or less staggered orientations relative to their tetrahedral adjacent atoms, the methyl substituents on O2, O3, and O4 are oriented with the methyl carbon close to eclipsing their respective methine ring hydrogen atoms. Similar findings for galactose pentaacetate were reported earlier, supported by density functional theory calculations (Thibodeaux et al. 2002). Figure 3 shows the distribution of orientations of the secondary methoxy groups as torsions to their associated methine hydrogen atom. Visual examination of the crystal structures of 1-8 showed that the methyl groups were arranged in such a way that they mostly had one of their three hydrogen atoms in a plane that included the methine protons on the ring atoms. This is shown for the first member of the series in Fig. 4, with all structures shown in the Supplementary Information. Participation of the methyl group hydrogen atoms in these planes requires particular values of the torsion angle for the H-C-O-C sequence from the ring to the methyl group and the torsion angle for rotation about the O-Me bond.
Table 5 Hydrogen bonding activity at the various oxygen atoms. "From" means that a given oxygen atom accepts a proton from an adjacent oxygen or carbon; "to" signifies that the oxygen atom is donating a proton to a hydrogen bond

Conclusions
We were able to provide the solid-state 13C NMR data with complete resonance assignment and the crystal structure data of the methylcellulose model compounds 1-8, along with an analysis of their solid-state molecular geometry, the packing data and the hydrogen bond systems.
Fig. 4 The 1,4-dimethyl structure, shown with planes drawn through the methine hydrogen atoms (H2, H4 and H6b on the top side of the ring, and H1, H3, and H5 on the bottom side). These mean planes also include a hydrogen from the methyl groups on the 1 (lower plane) and 4 (upper plane) oxygen atoms. The intersections of the planes with these hydrogen atoms are shown by the approximately half white and half red colorings, with the white portions being above the planes and the red half spheres being below the rose-colored planes