2D Assignment and quantitative analysis of cellulose and oxidized celluloses using solution-state NMR spectroscopy

The limited access to fast and facile general analytical methods for cellulosic and/or biocomposite materials currently stands as one of the main barriers to progress in these disciplines. At present, a diverse set of narrow analytical techniques is typically employed; these are often time-consuming, costly, and/or not necessarily available on a daily basis for practitioners. Herein, we rigorously demonstrate a general quantitative NMR spectroscopic method for structural determination of crystalline cellulose samples. Our method relies on the use of a readily accessible ionic liquid electrolyte, tetrabutylphosphonium acetate ([P4444][OAc]):DMSO-d6, for the direct dissolution of biopolymeric samples. We utilize a series of model compounds and apply now classical (nitroxyl-radical and periodate) oxidation reactions to cellulose samples, to allow for accurate resonance assignment, using 2D NMR. Quantitative heteronuclear single quantum correlation (HSQC) was applied in the analysis of key samples to assess its applicability as a high-resolution technique for following cellulose surface modification. Quantitation using HSQC was possible, but only after applying T2 correction to integral values. The comprehensive signal assignment of the diverse set of cellulosic species in this study constitutes a blueprint for the direct quantitative structural elucidation of crystalline lignocellulosics in general, using readily available solution-state NMR spectroscopy.


Introduction
Surface chemical modification of cellulosic materials is a logical approach to tune the properties and, thus, applicability of these bio-renewable polymers (Klemm et al. 2005). Unlike in small-molecule-based chemical disciplines, there is currently no established general quantitative analytical technique for cellulosics that can accurately assess chemical changes with sufficient resolution. This is in large part due to the poor solubility of cellulosic materials in common molecular solvents, preventing non-destructive solution-state analyses. This limitation has forced researchers to rely on the poorer chemical resolution of solid-state techniques, or on indirect methods, for characterization of samples that contain a significant proportion of crystalline cellulose. Typically, a succession of direct and indirect methods is applied for this task, affording partial insights. However, this process is often lengthy as a whole and its threads are difficult to bring together. Solid-state NMR, in particular, has found utility in the quantification of the different crystalline phases in celluloses (Newman 1999; Kono et al. 2002; Zuckerstätter et al. 2009). High resolution, using ultra-fast magic angle spinning (MAS), and multidimensional experiments are possible for solid-state NMR. However, spectral resolution is rather limited using typical MAS probes, preventing the accurate separation and quantitation of different chemical species. In addition, T1 relaxation times in the solid state are typically very long, requiring labeling strategies to give sufficient signal-to-noise (S/N) for quantitative experiments. Chemical modification of nanocelluloses (Habibi et al. 2010), which by nature involves regioselective surface chemistry, represents a significant challenge due to the infancy of the field and complexity of the materials.
This has been compounded by a flood of conceptual articles that apply chemistries but lack the analytical rigor of traditional chemistry disciplines. For accurate definition of feedstocks and reaction products, a multitude of complementary methods are commonly used (Foster et al. 2018). However, until recently the one method that is irreplaceable in organic chemistry (solution-state NMR) has not been seriously considered.
The proposed solution-state NMR technique is practically applicable to all crystalline celluloses and even whole biomass samples, provided the molecular weights are not high enough to reduce spectral resolution and S/N (due to relaxation effects), as demonstrated by Holding et al. (2016). However, a specific requirement for a useable NMR method is to have accurate assignments for the structural features common to the most studied samples. Another specific requirement is to allow for quantitation of chemical species, which is somewhat limited when using NMR to analyze polymeric samples, even with basic 1D solution-state experiments. This is particularly difficult for the more complicated chemical modifications, or whole biomass samples, which would require resolution in several dimensions.
A. Isogai: Department of Biomaterial Sciences, The University of Tokyo, Tokyo, Japan
Previously, we published the use of a novel ionic liquid electrolyte, tetrabutylphosphonium acetate ([P4444][OAc]):DMSO-d6, for the solution-state NMR analysis of nanocelluloses (King et al. 2018). The choice of the ionic liquid electrolyte was discussed in detail in previous articles (Deb et al. 2016; King et al. 2018). In short, the choice is closely related to the high stability of tetraalkylphosphonium cations, which prevents reaction with solutes and, thus, artifact formation. In addition, the ability to dissolve cellulose efficiently at such low ratios of [P4444][OAc] to DMSO allows for low-viscosity solutions and, thus, higher resolution spectra. Furthermore, the [P4444][OAc] signals do not overlap with the cellulose resonances in the 1H and 13C ppm domains, which makes [P4444][OAc] ideal for this purpose. Direct-dissolution NMR solvents based on the use of 1-ethyl-3-methylimidazolium acetate ([emim][OAc]) (Cheng et al. 2013) or tetrabutylammonium fluoride ([N4444]F) (Heinze et al. 2000; Östlund et al. 2009) are problematic for fine chemical analysis of cellulosics. [emim][OAc] is known to react with cellulose (Liebert and Heinze 2008; Ebner et al. 2008; Clough et al. 2015) and high-purity [N4444]F is very unstable in non-protic solvents (Sun and DiMagno 2005). Unfortunately, both also have signals that overlap with the cellulose backbone resonances. Alternative low-cost and unreactive perdeuterated cellulose solvents have also not yet appeared.
In this work, we provide thorough characterization of a few different cellulose substrates, using the [P4444][OAc]:DMSO-d6 electrolyte, before and after applying common oxidation schemes. Three cellulose substrates were used. The first is low degree of polymerization-cellulose nanocrystals (LDP-CNC), isolated by super-critical water (sc-H2O) extraction of microcrystalline cellulose (MCC) (Buffiere et al. 2016). This was used as it has a quite low molecular weight, offering good spectral resolution for signal assignment. The second was pristine cellulose nanocrystals (CNCs) derived from cotton. This is a representative CNC sample, also with relatively low molecular weight. The third substrate was MCC, a common cellulose model compound. The reaction products include cellulose that has been oxidized using either of two synthetically significant methods: 1) periodate oxidation (Kim et al. 2000; Nypelö et al. 2018) or 2) nitroxyl-radical (e.g., TEMPO) oxidation (Isogai et al. 2011, 2018). The spin-systems are assigned (polymeric and terminal units) using a range of common NMR methods and with the help of the monomeric (glucose, gluconic acid and glucuronic acid) and dimeric (cellobiose and cellobionic acid) model compounds. Standard heteronuclear single-quantum correlation (HSQC) NMR experiments are not quantitative. Therefore, a suitable quantitative HSQC sequence was tested, with and without T2 correction, to demonstrate the accuracy of separation and quantitation of key chemical species, before and after oxidation. The results aim to illustrate the potential of this method, not only for analysis of cellulose and cellulose derivatives, but also as a method to improve quantitation in analysis of lignocellulosics in general.

Materials and methods
Raw materials and preparation of oxidized celluloses
MCC (DP N-GPC 153) was purchased from Sigma-Aldrich. The LDP-CNCs (DP N-GPC 37, 15 wt% dispersed in water) were the precipitated 'residue' from the sc-H2O extraction of microcrystalline cellulose (MCC), as described by Buffiere et al. (2016). They were freeze-dried before use to remove as much free water as reasonably possible. Nitroxyl-radical oxidation of the LDP-CNCs was carried out in the NaClO/NaClO2 system in the presence of 4-acetamido-2,2,6,6-tetramethylpiperidine 1-oxyl (4-AcNH-TEMPO), under acidic conditions (pH 5.8), according to Hirota et al. (2009). This yielded 4-AcNH-TEMPO-oxidized LDP-CNCs (TOx-LDP-CNCs). The prepared sodium polyglucuronic acid salt form of the TOx-LDP-CNCs was acidified to pH 1.0 and separated by centrifugation, with subsequent water washing and freeze-drying. Pinnick oxidation of the reducing ends of LDP-CNCs was carried out under acidic conditions (pH 5.0) in the presence of one weight equivalent of NaClO2, to yield Pinnick-oxidized LDP-CNCs (POx-LDP-CNCs). The prepared salt form was acidified to pH 1.0 and further processed, as described above. Periodate-oxidized CNCs (NaIO4-CNCs) were prepared from pristine CNCs (prepared from cotton by hydrolysis with H2SO4 followed by desulfation with HCl), as described in Nypelö et al. (2018). After oxidation with sodium periodate, a film of the oxidized CNCs was cast in a Petri dish, by initial sonication of the cellulosic dispersion, casting and air-drying. A mixture of gluconic acid and the corresponding lactone was prepared by evaporation of an aqueous gluconic acid solution (49-53 wt%) in a rotary evaporator. Detailed procedures can be found in the Supporting Information. Cellobionic acid was purchased from Aldox, Dept. of Food Science and Technology, BOKU, Vienna, Austria.
All other solvents and chemicals were commercially available from Sigma-Aldrich and VWR, except DMSO-d6 (Eurisotop) and 4-AcNH-TEMPO (TCI Europe), and were used as received, without further purification. More detailed information can be found in the Supporting Information.

NMR experiments
Spectra were recorded using Bruker Avance III or Avance NEO 600 MHz spectrometers. The majority of the experiments were recorded using a SmartProbe™ optimized for X-nucleus detection. For some samples, an inverse triple resonance probe-head (1H/19F, 13C, 31P) or a cryogenically-cooled quadruple resonance (1H, 13C, 31P, 15N) probe-head was used.
The key NMR experiments are as follows:
• Standard 1H and 13C 1D experiments were recorded for all samples. In some cases, instead of simple 1D 13C experiments, 13C (refocused) insensitive nuclei enhanced by polarization transfer (INEPT) experiments were recorded. They provided a > 2× improvement in S/N, at the expense of the loss of quaternary signals.
• Quantitative 13C (inverse-gated 1H-decoupling) was run for the 4-AcNH-TEMPO oxidized LDP-CNC sample, with a repetition delay of 8 s and a 30° pulse (King et al. 2018).
• Diffusion-edited 1D 1H experiments were measured for all polymeric samples using a 1D bipolar-pulse pair stimulated echo (BPPSTE) pulse sequence ('ledbpgp2s1d' in the Bruker TopSpin 4.0 pulse program library).
• Multiplicity-edited HSQC (Willker et al. 1993) experiments ('hsqcedetgp', or 'hsqcedetgpsisp2.2' for increased sensitivity, in the Bruker TopSpin 4.0 pulse program library) were recorded for all samples.
All NMR measurements were conducted at a sample temperature of 65°C. Typically, the time-domain size in the indirect 13C-dimension (f1) was 1024 for HSQC and 512 for HMBC, corresponding to 512 (td1/2) and 512 (td1) actual t1-increments in the real data, for the phase-sensitive HSQC sequences and the magnitude-mode HMBC sequence, respectively. High digital resolution was used as most samples were of quite low molecular weight. Chemical shifts in the 1H and 13C ppm scales were calibrated against the DMSO-d6 signals (2.50 ppm for residual 1H and 39.52 ppm for 13C). All spectra were processed using Bruker TopSpin 4.0.6 (https://bruker.com/) and/or MestReNova 10.0.2 (https://mestrelab.com/) software. Further 1D data processing was completed using Fityk 1.3.1 (Wojdyr 2010) (https://fityk.nieto.pl/). Full NMR experimental details and conditions are given in the Supporting Information.
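The relationship between the f1 time-domain size and the number of actual t1-increments quoted above can be written out as a short helper. This is only an illustrative sketch: the function name is hypothetical, and the halving rule simply restates the td1/2 (phase-sensitive) versus td1 (magnitude-mode) distinction given in the text.

```python
def real_t1_increments(td1: int, phase_sensitive: bool) -> int:
    """Return the number of actual t1 increments for an f1 time-domain size.

    Assumption (restating the text): phase-sensitive experiments store two
    data sets per increment, so the increment count is td1 / 2, whereas a
    magnitude-mode experiment uses one data set per increment.
    """
    if td1 <= 0:
        raise ValueError("td1 must be a positive integer")
    return td1 // 2 if phase_sensitive else td1


# Values quoted in the text for HSQC (phase-sensitive) and HMBC (magnitude mode)
hsqc_increments = real_t1_increments(1024, phase_sensitive=True)
hmbc_increments = real_t1_increments(512, phase_sensitive=False)
print(hsqc_increments, hmbc_increments)
```

Both experiments therefore sample the same number of t1 points, even though their nominal time-domain sizes differ by a factor of two.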

Cellulose model and methodology choice
For NMR analysis, the samples were dissolved in the [P4444][OAc]:DMSO-d6 (1:4 wt%) electrolyte at 5 wt%, at as low a temperature as possible (typically 25-80°C). For the low DP samples this occurred rapidly at RT. This concentration of cellulosic material allowed for detection of the low-intensity signals, such as the chain ends in the polymeric samples. All spectra were collected at the elevated temperature of 65°C, as it offers further improvement in resolution and S/N, due to longer spin-lattice (T1) and spin-spin (T2) relaxation times. It is known that T2 increases with elevated temperature, due to an approximately inverse relationship between T2 and viscosity (Kim 2008). Significant improvements in resolution were also previously observed for the case of MCC dissolved in the homologous methyltrioctylphosphonium acetate ([P8881][OAc]):DMSO-d6 system (Holding et al. 2016).
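As a practical aid, the sample composition described above can be turned into component masses. The sketch below is illustrative only and rests on two assumptions that the text does not spell out: that "5 wt%" refers to cellulose as a fraction of the total solution mass, and that "1:4" is the [P4444][OAc]:DMSO-d6 mass ratio. The function name and the 1000 mg example are hypothetical.

```python
def sample_masses(total_mg, cellulose_wt_frac=0.05, il_parts=1.0, dmso_parts=4.0):
    """Split a target solution mass into cellulose, ionic liquid and DMSO-d6.

    Assumptions: cellulose_wt_frac is relative to the total solution mass,
    and il_parts:dmso_parts is a mass ratio within the electrolyte.
    """
    cellulose = total_mg * cellulose_wt_frac
    electrolyte = total_mg - cellulose
    ionic_liquid = electrolyte * il_parts / (il_parts + dmso_parts)
    dmso = electrolyte - ionic_liquid
    return {"cellulose_mg": cellulose, "ionic_liquid_mg": ionic_liquid, "dmso_d6_mg": dmso}


# Hypothetical 1000 mg sample
masses = sample_masses(1000.0)
for name, value in masses.items():
    print(f"{name}: {value:.1f}")
```

If the 5 wt% figure were instead defined relative to the electrolyte mass alone, the cellulose mass would change slightly, so the definition should be checked against the Supporting Information before use.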
Literature resonance assignments of even the most basic monomeric units of oxidized celluloses are incomplete. Thus, a range of monomers and dimers were studied, specifically in the [P4444][OAc]:DMSO-d6 (1:4 wt%) electrolyte. Cellulose is also complicated by the fact that there are non-reducing end (NRE) and anomeric reducing-end (RE) units that differ in their chemical shifts from the corresponding species in the polymeric units. Separation of these species, using 2D correlation methods, is not guaranteed for high molecular weight samples. Hence, monomeric and dimeric models are described, in addition to the LDP-CNC sample (DP N-GPC 37). This sample is rather unique, as there are not many sources of low DP cellulose accessible in large enough quantities for synthesis and assignment of the products.
The NMR spectra of native celluloses can provide information on the average chain length of the polymer, as the signals of the reducing end and nonreducing end are relatively well separated (King et al. 2018;Heise et al. 2019;Holding et al. 2016). However, the characterization of modified cellulose samples can be complicated, as both the location and substitution pattern may vary along the polymer chain, and the high molecular weight can preclude using more sophisticated NMR techniques. In addition, as the literature data on relevant monomeric units of oxidized cellulose is incomplete, a range of monomeric, dimeric and oligomeric models were chosen or prepared to aid the spectral interpretation of the oxidized samples. These include: glucose, cellobiose, LDP-CNC, glucuronic acid, gluconic acid and cellobionic acid.

Chemical shift assignment of cellulose and modified units
The full assignments for the dimers, polymeric units and terminal units in this study are shown in Fig. 1. These are in the [P4444][OAc]:DMSO-d6 (1:4 wt%) electrolyte at 65°C, referenced against DMSO-d6 (residual 1H at 2.50 ppm and 13C at 39.52 ppm). The following describes how the assignments were made, along with further aspects of the study.
-S11) and glucuronic acid (Fig. S12-S13). Tabulated chemical shift data, along with previous literature assignments, are shown in Tables S1 and S3: glucose (Table S1) (Roslund et al. 2008) and glucuronic acid (Table S3) (Agrawal 1992). Our assignments did not differ significantly from the literature assignments.
Cellobiose
Cellobiose (Fig. 1a, b) consists of α and β anomers, giving 24 identifiable correlations in the HSQC and resonances in the 13C spectrum (Fig. 2a, b). The ratio of anomers is 34:66 (α:β) by integration of the 1H NMR spectrum. HSQC-TOCSY spectra are shown in Fig. S14-S18, in the Supporting Information, allowing for complete assignment. The tabulated chemical shift data are also given in the Supporting Information (Table S4-S5), along with previous literature assignments in D2O (Roslund et al. 2008), which do not show any major deviations from our data. With cellobiose, the spectra become considerably more complex than for the monomers, with many overlapping peaks in the C2-C5 region (65-85 ppm in the 13C domain). The HSQC-TOCSY spectrum with a short mixing time (15 ms, Fig. S14), which provides COSY-like correlations, was most useful for tracing the complete spin-systems for the anomers. In the case of cellobiose, which is not polymeric, we still term the glucopyranose with the hemiacetal anomeric carbon atom the RE and the one with the glycosidic C1 the NRE. HSQC-TOCSY with a long mixing time (120 ms, Fig. S18) was used to easily visualize the corresponding TOCSY correlations for the separate RE and NRE spin-systems. Full assignments are given in Fig. 1a, b and Table S4-S5. At 2048 indirect (f1) increments in the HSQC (Fig. 2a), the resolution starts to approach that of the 13C spectrum.
In the assignment of cross-peaks, we have tried to be consistent with the color labeling of the assigned correlations: (1) NREs are labeled in green; (2) non-modified internal AGU correlations are labeled in black; (3) α and β REs are labeled in red and blue, respectively; (4) oxidized internal AGUs, i.e. anhydroglucopyranosiduronic acid (AGA) units, are labeled in brown; (5) where appropriate, the open (acid) form of the unit is labeled in turquoise and the closed (lactone) form in purple. This applies to all the figures, except Fig. 8. Cross-peak coloration of the HSQC spectra, unless grayscale, depends on the multiplicity: for methine (CH) and methyl (CH3) carbons, cross-peak correlations are shown in green; for methylene (CH2) carbons, cross-peak correlations are shown in blue.
Low-DP cellulose
LDP-CNCs (DP N-GPC 37) consist of chains of β-(1,4)-linked glucopyranose units terminated by RE and NRE groups (Fig. 1c, d). These are true nanocrystals (Fig. S35-S37), formed by partial depolymerization and recrystallisation of MCC using sc-H2O. This 'residue' fraction (Buffiere et al. 2016) is composed of cellulose crystallite fragments consisting of both cellulose I and cellulose II allomorphs. This is illustrated by microscopy combined with wide-angle X-ray scattering (WAXS) analysis of the material (Fig. S37), showing both cellulose I and cellulose II phases. Some of the main distinguishable diffraction planes corresponding to cellulose I Miller indices (French 2014) are clearly visible at 15.6° for (1-10) and (110), and 22.3° for (200). For cellulose II there are also distinguishable peaks at 12.3° for the (1-10) and 20.1° for the (110) Miller indices (French 2014). Gel-permeation chromatography (GPC) showed a higher molecular weight residue, originating from MCC, but the majority of the material was present as a lower molecular weight fraction, with a peak maximum at a DP of ~32 and an overall DP N-GPC of 37 (Fig. S38).
The CH-1 region in the HSQC spectrum (Fig. 3a) was characterized by four clear signals. The signal with the highest intensity was assigned as anhydroglucose unit (AGU)-1, which belongs to the bulk polymeric CH-1 (δH = 4.40 ppm (d); δC = 102.38 ppm), while the remaining signals correspond to NRE-1, RE-α-1 and RE-β-1. This region is characteristic of (hemi)acetals and such close grouping is caused by the rigid conformation adopted by the sugar unit, with the α-anomer showing a characteristic down-field shift (to > 4.5 ppm) in the 1H dimension and an up-field shift in the 13C domain. Detailed assignment of the remaining HSQC correlations was then completed using HSQC-TOCSY (Fig. S20-S22) and HMBC to separate the CH-6 position correlations (Fig. S19). The α:β ratio from the 1H spectrum is 38:62, by deconvolution (Supporting Information Fig. S3), using 'Fityk' (Wojdyr 2010). The same method yielded a DP N-1H of 15. While there is clearly error in this calculation, we favor the lower DP value given by NMR, as overestimation by GPC was also demonstrated in a previous publication (Heise et al. 2019), comparing 'CCOA labelling' (Röhrling et al. 2002) and two separate GPC systems: one calibrated using pullulan standards and one using multi-angle light scattering (MALLS) detection.
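The arithmetic behind an anomer ratio and an end-group DP estimate can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the formula (all position-1 signal divided by the reducing-end signal) is a common end-group analysis assumption, the function names are hypothetical, and the peak areas are invented numbers chosen only so that the output matches the values quoted above.

```python
def anomer_ratio(area_re_alpha, area_re_beta):
    """Return the alpha:beta percentages from deconvoluted RE-1 peak areas."""
    total = area_re_alpha + area_re_beta
    return 100.0 * area_re_alpha / total, 100.0 * area_re_beta / total


def dp_from_end_groups(area_agu1, area_nre1, area_re_alpha, area_re_beta):
    """Estimate number-average DP as (all H-1 area) / (reducing-end H-1 area).

    Assumption: every chain carries exactly one reducing end, so the summed
    alpha and beta RE-1 areas count the chains.
    """
    re_total = area_re_alpha + area_re_beta
    return (area_agu1 + area_nre1 + re_total) / re_total


# Hypothetical deconvoluted areas (arbitrary units), not data from the study
areas = {"agu1": 13.0, "nre1": 1.0, "re_alpha": 0.38, "re_beta": 0.62}
alpha_pct, beta_pct = anomer_ratio(areas["re_alpha"], areas["re_beta"])
dp_n = dp_from_end_groups(areas["agu1"], areas["nre1"], areas["re_alpha"], areas["re_beta"])
print(f"alpha:beta = {alpha_pct:.0f}:{beta_pct:.0f}, DP = {dp_n:.1f}")
```

Because the reducing-end areas sit in the denominator, small errors in their deconvolution propagate strongly into the DP estimate, which is consistent with the caution expressed in the text.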
Throughout each of the experiments, the number of t 1 -increments for the real spectrum (td1/2 for phasesensitive HSQC) can be changed to improve resolution to the required level, to allow for separation of each signal. This is rather straightforward for low molecular weight compounds, such as glucose, cellobiose and to a lesser extent the LDP-CNCs. In this regard, the resolution in the indirectly detected 13 C-dimension in HSQC can start to approach that of the 13 C spectra, providing T 2 values are long enough to benefit from the further sampling. However, as molecular weights increase, the potential gain in resolution can often not be worth the additional collection times, with collection time proportional to the number of f1 increments. In addition, with shorter T 2 values the signals decay quickly and increased sampling will simply result in increased noise, with minimal increase in spectral resolution. Therefore, there is a trade-off between number of scans and number of increments, as molecular weight increases. To assess the resolution gain for a typical cellulose model, MCC (DP N-GPC 153), we measured the full-width at half maximum (FWHM) values (here in ppm units) in the 13 C dimension from HSQCs, gathered for different increment values (Fig. S2). The graph shows an inverse power function relating the resolution to the number of f1 increments. The optimum resolution, with little further trade-off in resolution vs collection time, can be achieved using 1024 increments, for the utilized spectral width of 24,883 Hz in the 13 C-dimension (corresponding to 24.3 Hz/pt digital resolution of the data in the f1-dimension). However, for most of the cases where good enough resolution is required for assignment of main peaks, 256 t 1 -increments is sufficient and 512 still gives a reasonable improvement. 
This can be reduced further by using a smaller spectral width, as 24,883 Hz (165 ppm) is already rather wide, encompassing much more of the 13C ppm domain than is necessary for unmodified cellulose. As the molecular weight of the cellulose sample increases, the improvement in resolution with increasing increments is less apparent, due to restricted motion resulting in faster relaxation. However, 256-512 increments (i.e. 512-1024 time-domain size in f1) are perfectly reasonable values to achieve good S/N in an overnight run for assignment of NRE and RE signals for DP N values of up to ~200. It should also be considered that as the molecular weight increases, the relative abundance of NRE and RE resonances also decreases.
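The digital resolution and spectral window figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative: the function names are hypothetical, and the 13C carrier frequency of 150.9 MHz is an assumed value for a 600 MHz instrument rather than a number taken from the text.

```python
def f1_digital_resolution_hz(sw_hz: float, n_points: int) -> float:
    """Digital resolution in Hz per point: spectral width divided by point count."""
    return sw_hz / n_points


def spectral_width_ppm(sw_hz: float, carrier_mhz: float) -> float:
    """Convert a spectral width in Hz to ppm (1 ppm equals carrier_mhz Hz)."""
    return sw_hz / carrier_mhz


SW_F1_HZ = 24883.0        # 13C spectral width quoted in the text
CARRIER_13C_MHZ = 150.9   # assumed 13C frequency on a 600 MHz spectrometer

resolution_1024 = f1_digital_resolution_hz(SW_F1_HZ, 1024)
width_ppm = spectral_width_ppm(SW_F1_HZ, CARRIER_13C_MHZ)
print(f"{resolution_1024:.1f} Hz/pt over {width_ppm:.0f} ppm")

# Halving the spectral width at a fixed point count halves the Hz/pt value,
# which is the saving referred to when a narrower 13C window is chosen.
resolution_half_width = f1_digital_resolution_hz(SW_F1_HZ / 2, 1024)
```

Under these assumptions the same nominal resolution could be reached with half the points, and therefore roughly half the collection time, if the 13C window were narrowed to the region actually occupied by cellulose.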

Nitroxyl-radical oxidized cellulose
Oxidation of cellulose with nitroxyl-radicals, such as TEMPO or AcNH-TEMPO, supposedly yields selective oxidation of the surface primary 6-hydroxyls (vs secondary 2- or 3-hydroxyls) to carboxylates, but should also be capable of oxidizing the RE (hemi)acetal/aldehyde. Thus, the monomer unit in oxidized cellulose should be glucuronic acid (assuming each monomer is oxidized). If the terminal RE unit is oxidized at the RE-1 position, gluconic acid should be the oxidized unit. Gluconic acid is available commercially as the sodium salt or as a solution in water, where it exists in equilibrium with the cyclic ester (lactone) form, dependent on water content. Indeed, drying mixtures of gluconic acid, even under ambient conditions, will induce lactonization (with loss of H2O) to the δ-gluconolactone (Fig. S9). In this study, we dried a 49-53 wt% solution of gluconic acid in water using a rotary evaporator, at RT. The product was dissolved in the electrolyte and an HSQC spectrum was recorded (Fig. S9a). The HSQC spectrum clearly shows two separate spin-systems, i.e. the open-chain and lactone forms. Both were assigned using 2D HSQC-TOCSY and compared with the HSQC spectrum of pure δ-gluconolactone (Fig. S10). Spectra of glucuronic acid (α- and β-anomers) were also recorded for reference and can be found in the Supporting Information (Fig. S12-S13).
A water slurry of LDP-CNCs (15 wt%) was oxidized under mildly acidic conditions (pH 5.8) with AcNH-TEMPO, in the presence of the NaClO/NaClO2 oxidant system (see Supporting Information). Pinnick oxidation conditions were chosen to ensure complete conversion of aldehyde species to carboxylates. A water-soluble (high degree of oxidation (DO)) sample and a water-insoluble (low DO) sample were recovered and separated by centrifugation. However, as the highly oxidized sodium carboxylate samples do not dissolve in the electrolyte, we were forced to acidify the fractions (Fujisawa et al. 2010) for further NMR analysis. The principal structure of polyglucuronic acid is represented in Fig. 1f, with the AGA unit as the oxidized polymeric unit.
While the insoluble fraction, as expected, consisted of minimally oxidized cellulose, the soluble TOx-LDP-CNC fraction had clearly identifiable correlations in the HSQC that do not correspond to polymeric glucose resonances (Fig. 4). As with previous samples, the resonances for the AGA units were assigned using HSQC-TOCSY (Fig. S23-S26) but also using HMBC (Fig. S27), to further illustrate the linkage of the carboxylate C-6 to the H-5 position. The assignments for the AGA units were fully consistent with the 13C assignments for polyglucuronic acid in D2O (Table S7 of the Supporting Information), from TEMPO oxidation of cellulose (Tahiri and Vignon 2000; Isogai et al. 2011). The RE and NRE peaks corresponding to glucose-terminated chains are also assignable. One might assume that the NRE C6-OHs should be more accessible to oxidation than any other C6-OH. However, NRE signals are clearly present and are more abundant than the RE signals, which require scaling of the spectra close to the background to be visualized (Fig. 4, inset).

Cellobionic acid
Both the common nitroxyl-radical and Pinnick oxidation conditions should lead to oxidation of the reducing ends to carboxylates (Fig. 5). The Pinnick (acidic chlorite) oxidation at the reducing ends of CNCs is commonly used as the first step in reducing-end functionalization, typically via amide formation, leading to nano-structures with self-assembly potential (Villares et al. 2018; Lin et al. 2019). To aid in the assignment of the terminal units in the oxidized products, we obtained a commercial sample of cellobionic acid and analyzed it in the [P4444][OAc]:DMSO-d6 electrolyte (Fig. 5). A doubling of the peaks was observed, consistent with partial conversion to the lactone form. The sample was also observed to be somewhat unstable at 65°C, presumably decomposing by oligomerization. Therefore, the four spin-systems were assigned at 27°C using HSQC-TOCSY (Fig. S29), and the two spin-systems corresponding to the acid form were identified by adding a drop of water to the NMR tube, allowing for almost complete conversion of the lactone form to the acid form (Fig. S31). The final assignments for the mixture of compounds at 65°C showed little deviation from those for the sample at 27°C. Thus, suitable model assignments for the oxidized reducing ends were afforded by the open-chain acid and closed-chain lactone spin-systems (Fig. 6).

Reducing end oxidation to carboxylate
Under nitroxyl-radical oxidation conditions (Hirota et al. 2009), we would have expected the RE-1 position to also be completely oxidized to carboxylates. However, expansion of the acetal region in the HSQC of the TOx-LDP-CNCs, with an increase in the intensity scale, shows the presence of residual anomeric CH-1 resonances (Fig. 4a, inset). Clearly, complete oxidation of the reducing ends has not occurred. In addition to the cellobionic acid model compound, a further oxidation of the LDP-CNCs was performed under Pinnick oxidation conditions to allow for a more complete conversion of the reducing ends to gluconate moieties (or to the corresponding lactone). The HSQC spectrum for the oxidized POx-LDP-CNC product (Fig. 7a), where the spectral scale was increased to emphasize the baseline signals, shows signals corresponding to the open-chain acid spin-system, almost identical to that of cellobionic acid, but signals corresponding to the closed-chain lactone form are absent. Reducing end signals are also present, again indicating incomplete oxidation. Similarly, if the nitroxyl-radical oxidized sample is scaled to a similar level, emphasizing baseline signals, the same peaks corresponding to the open-chain acid form are present (Fig. 7b). HSQC-TOCSY spectra of both these samples (Fig. S29-S30) also allow for tracing of the spin-system, consistent with the cellobionic acid model (Fig. S28). Therefore, there are now unequivocal solution-state NMR assignments for this functional moiety, which can be used for further understanding/optimization of the associated chemistry.
Finally, a typical procedure for periodate oxidation (NaIO4) of cellulose nanocrystals (CNCs) was performed (Nypelö et al. 2018). The resulting NaIO4-CNCs were then dissolved in the electrolyte and analyzed. After analysis (overnight at 65°C) the sample was brown, whereas the other cellulose samples (nitroxyl-radical-oxidized and unoxidized) did not discolor significantly.
This may indicate some kind of degradation or possibly iodine formation, from (per)iodate residues. The HSQC spectrum showed a forest of peaks (Fig. 8), clustered mainly around the CH2-5 region, and from the high resolution of many of the signals in the 1H spectrum (Fig. 8, top 1H trace) it is quite clear that significant depolymerization had occurred. However, when the diffusion-edited 1H spectrum (diffusion-editing filters out the fast-diffusing, low molecular weight species) was collected (Fig. 8, bottom 1H trace), broad signals corresponding to polymeric cellulose resonances were apparent. Based on our previous assignments for glucose (Fig. S5), cellobiose (Fig. 2) and cellulose (Fig. 3), almost all the correlations in the HSQC could be assigned, with only a few signals remaining unassigned. This indicated that a significant proportion of the cellulose was fragmented into glucose, cellobiose and likely other oligomers. Yet, polymeric cellulose also remained. Hosoya et al. (2018) recently demonstrated that oxidation of cellulose at position 6 to carboxylate does not seem to introduce instability to cellulose, based on experimental kinetics and transition-state modelling. However, oxidation at positions 2 & 3 to ketones, and at position 6 to aldehyde, does seem to introduce significant instability to cellulose. It is proposed that under alkaline conditions, β-elimination occurs, leading to fragmentation of the sugar units. As the [P4444][OAc]:DMSO-d6 NMR electrolyte is a rather basic medium, mainly attributable to the acetate anion and the absence of any protic solvating species, it is apparent that the position 2 & 3 aldehydes that are formed during periodate oxidation also introduce significant instability to the cellulose polymer. Therefore, a mechanism can be proposed (Fig. 9) which accounts for the current NMR observations: periodate oxidation proceeds by oxidizing different points along cellulose chains, on the surface of the CNCs.
After dissolution in the basic electrolyte, fragmentation at these oxidation sites occurs, liberating the oligomeric, dimeric and monomeric sugars which linked the oxidation points on the surface chains. These are clearly resolvable using HSQC. Likewise, the untouched polymeric chains at the core of the CNCs are also resolvable and their presence is clearly illustrated through the diffusion-edited 1H spectrum, which filters out all low molecular-weight monomeric, dimeric and oligomeric species. This proposed mechanism is also consistent with previous mechanistic studies demonstrating that periodate oxidation of cellulose proceeds heterogeneously, by formation of oxidized domains on the crystallite surfaces (Kim et al. 2000). The 'unknown' low molecular weight residues that remain unassigned in the HSQC (Fig. 8) may correspond to fragments not attached to the polymeric units, resulting from C2-C3 bond cleavage. Closer examination of the diffusion-edited 1H and HSQC spectra (Fig. S34) reveals some more complexity in the (hemi)acetal region, which may result from acetal formation with these fragments.
Figure caption (fragment): open-chain acid unit (turquoise) and lactone unit (purple). 'A' and 'L' subscripts refer to the 'acid' and 'lactone' forms.
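Why diffusion editing retains the polymeric signals while suppressing the small fragments can be illustrated with the basic Stejskal-Tanner attenuation expression, I/I0 = exp(-D (γ g δ)² (Δ - δ/3)). The sketch below is illustrative only: it uses the simple monopolar form rather than the exact expression for the bipolar-pulse-pair sequence used in the study, and the gradient strength, timings and diffusion coefficients are hypothetical values, not parameters from the experiments.

```python
import math

GAMMA_1H = 2.675e8  # proton gyromagnetic ratio, rad s^-1 T^-1


def stejskal_tanner_attenuation(diff_coeff_m2_s, gradient_t_m, delta_s, big_delta_s):
    """Return I/I0 for a species with the given diffusion coefficient.

    Simple monopolar Stejskal-Tanner form; bipolar-pair sequences add a
    small timing correction that is ignored in this sketch.
    """
    b_value = (GAMMA_1H * gradient_t_m * delta_s) ** 2 * (big_delta_s - delta_s / 3.0)
    return math.exp(-b_value * diff_coeff_m2_s)


# Hypothetical acquisition settings and diffusion coefficients
GRADIENT = 0.5        # T/m
LITTLE_DELTA = 3e-3   # s, gradient pulse length
BIG_DELTA = 0.1       # s, diffusion delay

small_molecule = stejskal_tanner_attenuation(3e-10, GRADIENT, LITTLE_DELTA, BIG_DELTA)
polymer = stejskal_tanner_attenuation(1e-11, GRADIENT, LITTLE_DELTA, BIG_DELTA)
print(f"small molecule retained: {small_molecule:.3f}, polymer retained: {polymer:.3f}")
```

With these assumed numbers the fast-diffusing species is attenuated to about one percent of its original intensity while the slow-diffusing polymer keeps most of its signal, which mirrors the qualitative behavior described for the bottom 1H trace in Fig. 8.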
While it seems that periodate oxidation introduces instability through β-elimination under basic conditions, this method seems to allow for assessment of that stability and may offer a further method for validating the reported increase in stability of further modification schemes, e.g., through borohydride reduction of periodate-oxidized cellulose (Potthast et al. 2009).

Quantitation using HSQC
One drawback of solution-state 13C NMR analysis is the low natural abundance of the 13C nucleus, leading to low sensitivity. Thus, high numbers of repetitions are required in order to obtain decent S/N ratios for adequate quantitation accuracy. This is exacerbated by the requirement for longer relaxation delays. However, there is an increasing trend of deconvolution of 1H spectra of polymers, as S/N is much better than for 13C. Of course, not all 1H resonance signals are easily identified and separated by deconvolution, due to the lower resolution of 1H. Baseline correction can also be problematic and, if one wishes to quantify the RE and NRE signals using this method, the errors very rapidly become large at a DPN of > 100 (Holding et al. 2016). 2D HSQC provides vastly improved resolution of species over 1D experiments and significantly improved S/N over 1D 13C data, as it is a 1H-detected experiment. In terms of analysis of celluloses and oxidized celluloses, separation of the main polymeric-1 cross-peak from the (nitroxyl-radical) oxidized polymeric-1 cross-peak and from the α-RE and β-RE cross-peaks is now very good using the [P4444][OAc]:DMSO-d6 electrolyte. This potentially allows for a rapid and reliable method for data extraction, requiring only phasing, baseline correction and 2D correlation peak integration of the raw data. The NRE signal volume overlaps with the main polymeric-1 correlation, so it is not so clearly separable, except based on the assumption that there is the same amount of NRE species as there are RE species. The geminal-6 signals are also well separated from the rest of the cellulose backbone signals. Separation of other signals is possible but the error starts to increase the closer they are to each other, due to peak volume overlap. 
Samples with wider ranges of functionalities, such as lignocellulosic biomass samples or those that have resonances downfield from the cellulose polymeric-1 acetal correlations, are also easily separated.
The major drawbacks with 2D HSQC and polymeric samples are four-fold: (1) Differences in 1J(13C-1H) values for different 13C-1H pairs cause variations in the intensities of those correlations. Typical HSQC experiments assume an average 1J(13C-1H) value (typically 145 Hz), represented in a specific INEPT polarization transfer delay. (2) Coherence transfer periods, where sufficient time is given for 1H magnetization to evolve, cause intensity variation of the correlation peaks as each resonance has a different T2 value, i.e., mainly during the INEPT delay periods, more or less signal is lost for different resonances prior to acquisition. (3) Correlation-peak distortions arising from evolution of homonuclear J(1H-1H) coupling during the INEPT steps can cause errors in integration. (4) Non-linear excitation bandwidth leads to variation of cross-peak signal intensity, especially across the 13C frequency range at high field strengths. Several quantitative HSQC sequences or processing strategies have been developed that attempt to correct for these issues. Variations in 1J(13C-1H) values have been corrected for by applying INEPT-delay modulation in the first 'quantitative HSQC' (Q-HSQC) experiment (Heikkinen et al. 2003). This corresponds to the application of a spread of INEPT delays covering the typical 1J(13C-1H) coupling value range expected in organic materials. Signal losses due to variations in T2 values as well as in 1J(13C-1H) values have been accounted for in the 'time-zero HSQC' (HSQC0) experiment (Hu et al. 2011), which records a loop of an increasing train of HSQC sequences (HSQCX, X = 1-3), prior to actual acquisition. This has the effect of increasing coupling and relaxation effects for each loop, which can be extrapolated back to 'time-zero', where potentially all effects are removed. 
Obviously, for HSQC0, T2 values must be long enough that the HSQC2 and HSQC3 data sets retain sufficient intensity for reliable extrapolation. The 'quick-quantitative HSQC' (QQ-HSQC) experiment is a rather elegant method for potentially reducing the collection times by a factor of 4 (Peterson and Loening 2007). This encodes the INEPT-modulation into different vertical slices in the sample but only represents a possible doubling of S/N for the same collection times. However, this is only really applicable for small, slow-relaxing molecules, due to relaxation effects. The 'quantitative CPMG-adjusted HSQC' (Q-CAHSQC) experiment (Koskela et al. 2005) applies CPMG-INEPT steps to avoid cross-peak distortions due to J(1H-1H) coupling evolution. The 'quantitative, offset-compensated, CPMG-adjusted HSQC' (Q-OCCAHSQC) experiment (Koskela et al. 2010) applies novel broad-band pulses to reduce 13C offset errors that are most prevalent on ultrahigh-field instruments, e.g. 1000 MHz, over wider frequency ranges (> 150 ppm). This is not really a concern for cellulose samples on a 600 MHz spectrometer, where the 13C domain is rather narrow (< 50 ppm), but may start to become an issue for lignocellulose samples at ultrahigh field. The final sequence of interest is the 'quantitative, equal carbon HSQC' (QEC-HSQC) experiment, where a refocusing period after the first INEPT step is used to discard the excess CH2 and CH3 magnetization. This yields the same signal intensity for each protonated carbon in the sample. This could potentially be of value in the analysis of lignocellulose samples but is not really necessary at this point for systems where the well-separated resonances (and their multiplicities) are already reasonably well assigned. An overview of the different HSQC experiments is given in Table 1. For our purposes, the Q-CAHSQC sequence seems to be most suitable, if we can correct the processed integral data for T2 losses. 
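The first drawback listed above, intensity variation from 1J(13C-1H) mismatch, can be made concrete with a short sketch. It assumes the textbook simplification that, for an isolated CH pair, the cross-peak intensity scales as sin^2(pi x J x delay), with the transfer delay set to 1/(2 x J_set) and relaxation ignored. The coupling values in the loop are illustrative only and are not measured values from this work.

```python
import math

def relative_hsqc_intensity(j_actual_hz, j_set_hz=145.0):
    """Relative cross-peak intensity for an isolated CH pair when the INEPT
    delays are tuned to j_set_hz but the true coupling is j_actual_hz.

    Simplifying assumptions: intensity is proportional to
    sin^2(pi * J * delay) with delay = 1 / (2 * j_set_hz),
    and relaxation during the delays is ignored.
    """
    delay_s = 1.0 / (2.0 * j_set_hz)
    return math.sin(math.pi * j_actual_hz * delay_s) ** 2

# Illustrative coupling constants spanning a plausible range
for j in (125.0, 145.0, 160.0, 170.0):
    print(f"1J = {j:5.1f} Hz -> relative intensity {relative_hsqc_intensity(j):.3f}")
```

Under these assumptions the loss is a few percent across this range, which is the kind of systematic error that INEPT-delay modulation in Q-HSQC-type experiments is designed to average out.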
Issues with 13C offset, variation in 1J(13C-1H) and relaxation can be accounted for mathematically, to some degree, in post-processing (Zhang and Gellerstedt 2007). The most concerning issue for cellulose is the very fast T2 relaxation, which the HSQC0 sequence accounts for but the other approaches do not. If the T2 values are known, it is possible to adjust for signal losses by application of Eq. 1 after integration of the 2D HSQC spectra, where Δ is the delay period in which T2 losses occur, V is the measured correlation peak volume and V0 is the theoretical correlation peak volume, with no losses due to relaxation:

$$V_0 = V \, e^{\Delta / T_2} \qquad (1)$$

In order to apply this correction, T2 values for the resonances of interest must be measured. Zhang and Gellerstedt (2007) have shown that the 2D HSQC-CPMG sequence, when used to determine T2 values of cellulose triacetate, gives inaccurate T2 values. However, this is a chicken-and-egg scenario: losses due to the HSQC portion of the sequence obviously contribute to the inaccuracies, which becomes more of an issue where there is very high molecular weight material in the sample, due to disproportionate loss of signal from those resonances. Therefore, the lower-resolution 1D CPMG is the only real option for determining more representative 'average' T2 values for such samples. Nevertheless, loss of higher molecular weight signal during the INEPT delays is always going to be a problem. In terms of quantitation of chemical species, if the system is not complex, as is typical for cellulose samples, T2 values can be obtained relatively easily. As such, we determined T1 and T2 values for the LDP-CNC, MCC and TOx-LDP-CNC samples (Table 2).
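As an illustration of how an 'average' T2 value might be extracted from 1D CPMG data, the following minimal sketch fits a mono-exponential decay, I(t) = I0 x exp(-t/T2), by linear regression on ln(I). The mono-exponential model, the function name and the synthetic data are assumptions for illustration; polydisperse samples may decay multi-exponentially, and this is not necessarily the fitting procedure used for Table 2.

```python
import math

def fit_t2(echo_times_ms, intensities):
    """Least-squares fit of ln(I) = ln(I0) - t/T2.

    Returns (T2 in ms, I0). Assumes a mono-exponential decay and
    strictly positive intensities.
    """
    n = len(echo_times_ms)
    log_i = [math.log(v) for v in intensities]
    mean_t = sum(echo_times_ms) / n
    mean_y = sum(log_i) / n
    sxx = sum((t - mean_t) ** 2 for t in echo_times_ms)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(echo_times_ms, log_i))
    slope = sxy / sxx                      # equals -1/T2
    intercept = mean_y - slope * mean_t    # equals ln(I0)
    return -1.0 / slope, math.exp(intercept)

# Synthetic, noise-free decay (hypothetical values) used to check the fit
times_ms = [2.0, 4.0, 8.0, 16.0, 32.0]
true_t2_ms, true_i0 = 15.0, 100.0
signal = [true_i0 * math.exp(-t / true_t2_ms) for t in times_ms]

t2_ms, i0 = fit_t2(times_ms, signal)
print(f"fitted T2 = {t2_ms:.2f} ms, I0 = {i0:.1f}")
```

A log-linear fit is chosen here only because it needs no external libraries; a non-linear fit would weight noisy late echoes more appropriately for real data.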
After summing the appropriate delay times leading to T2 losses, a Δ value of 13.9 ms was calculated for the Q-CAHSQC sequence, which is in a very similar range to some of the MCC T2 values. Therefore, considerable signal loss is expected and needs to be corrected for. Equation 1 was applied in the correction of integral values. The main cross-peaks of interest were the polymeric CH-1 (AGU-1), NRE-1, α-RE-1, β-RE-1, oxidized polymeric CH-1 (AGA-1), AGU-gem-6 and oxidized RE-6 position (Ox-RE-gem-6). The MCC sample also clearly contained a little xylan, so the xylan 1 (AXU-1) and geminal-5 (AXU-gem-5) positions were also integrated in the MCC sample. The corrected results are given in Table 3 and were processed further to yield a few additional parameters: (1) the ratio of AGU-1 to AGU-gem-6 (AGU1/6); (2) the DPN from HSQC (DPN-HSQC); (3) the % values of α-RE-1 by HSQC (%α-HSQC) and β-RE-1 by HSQC (%β-HSQC); (4) the % values for AXU based on AXU-1 (%AXU-1) and AXU-gem-5 (%AXU-5); (5) the % values for oxidation of AGU to AGA (%AGA) and RE (%RE-Ox). In addition to the processed HSQC data, we have data from deconvolution of the 1H spectra, for comparison: (1) the DPN from 1H (DPN-1H); (2) the % values of α-RE-1 by 1H (%α-1H) and β-RE-1 by 1H (%β-1H).
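The T2 correction of integral values can be sketched as follows, assuming Eq. 1 takes the usual mono-exponential form, V0 = V x exp(delay/T2), with the 13.9 ms summed delay quoted above. The peak volumes and T2 values in the example are hypothetical placeholders rather than entries from Tables 2 or 3, and any normalization of volumes for proton multiplicity is assumed to have been done beforehand.

```python
import math

DELAY_MS = 13.9  # summed delay with T2 losses, as quoted for the Q-CAHSQC sequence

def t2_corrected_volume(volume, t2_ms, delay_ms=DELAY_MS):
    """Back-correct a measured HSQC peak volume for T2 losses.

    Assumes mono-exponential decay over the delay: V0 = V * exp(delay / T2).
    """
    if t2_ms <= 0:
        raise ValueError("T2 must be positive")
    return volume * math.exp(delay_ms / t2_ms)

# Hypothetical (volume, T2 in ms) pairs, for illustration only
peaks = {"AGU-1": (57.0, 20.0), "AGU-gem-6": (100.0, 35.0)}

corrected = {name: t2_corrected_volume(v, t2) for name, (v, t2) in peaks.items()}
ratio_before = peaks["AGU-1"][0] / peaks["AGU-gem-6"][0]
ratio_after = corrected["AGU-1"] / corrected["AGU-gem-6"]

print(f"AGU-1/AGU-gem-6 before correction: {ratio_before:.2f}")
print(f"AGU-1/AGU-gem-6 after correction:  {ratio_after:.2f}")
```

Because the correction factor grows exponentially as T2 approaches the delay, the resonance with the shorter T2 is scaled up more, which is why the ratio between two positions can shift noticeably after correction.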
[Table 3 footnotes]
g Calculated as '100 × α-RE-1/(α-RE-1 + β-RE-1)' from the HSQC peak volumes
h Calculated as '100 × β-RE-1/(α-RE-1 + β-RE-1)' from the HSQC peak volumes
i Calculated as '100 × α-RE-1/(α-RE-1 + β-RE-1)' from the 1H peak areas
j Calculated as '100 × β-RE-1/(α-RE-1 + β-RE-1)' from the 1H peak areas
k Calculated as '100 × AXU-1/(AXU-1 + AGU-1 + α-RE-1 + β-RE-1 + NRE)' from the HSQC peak volumes
l Calculated as '100 × AXU-gem-5/(AXU-gem-5 + AGU-gem-6)' from the HSQC peak volumes
m Calculated as '100 × AGA-1/(AGA-1 + AGU-1 + α-RE-1 + β-RE-1 + NRE)' from the HSQC peak volumes
n Calculated as '50 × Ox-RE-gem-6/(α-RE-1 + β-RE-1)' from the HSQC peak volumes
o S/N ratio for the AGU-1 peak maximum. Values were determined using the MestReNova 10.0 'SNR Peak Calculator' script

Clearly the T2 correction has a big impact on the integral values, especially for those with short T2 values, i.e. for the bulk polymeric AGU-1 and AGU-gem-6. As these positions are most likely to be used for quantitation, e.g. of DS values, it is clearly critical to do the T2 correction. To compare how effective the quantitation is, the parameter AGU1/6 shows how accurate integration of AGU-1 and AGU-gem-6 is, with the optimum value of 1. For the LDP-CNC sample, the value improves significantly after T2 correction. For the MCC experiments, the lower-resolution but higher S/N experiments (ns = 40, td1 = 128) gave values of 1.00 and 0.96 for the room temperature probe-head and He-cooled cryoprobe-head, respectively. This indicates that making all efforts to maximize S/N is critical for quantitation, even at the expense of resolution. The TOx-LDP-CNC sample also gave a significant improvement in AGU1/6, from 0.57 to 0.95. DPN-HSQC values also changed significantly and the corrected values were more or less consistent with the DPN-1H values. There is a noticeable difference between the DPN-GPC and those obtained from NMR. 
More accurate studies validating the use of NMR against both labelling and GPC studies are needed. Practically, the HSQC method is still limited in what samples can be studied for DPN-HSQC determination, as better S/N will be required with increases in molecular weight. For the same reason that HSQC-CPMG is not suitable for T2 determination for cellulose samples, HSQC on such samples is going to eliminate a significant proportion of the faster-relaxing high molecular weight material, artificially decreasing the DPN-HSQC values somewhat. The ratios of α-RE-1 and β-RE-1 were relatively consistent between the corrected and uncorrected values, for HSQC and 1H deconvolution. However, the lower abundance of RE resonances for MCC clearly causes significant error, although this situation can be improved using a cryoprobe-head and possibly linear prediction. AXU contents for the two MCC experiments were relatively consistent, based on AXU-gem-5 integration. However, the higher-resolution experiment gave more consistent values, based on integration of AXU-1 and AXU-gem-5, as the separation of these peaks from the cellulose resonances was much better in the higher-resolution case. The degrees of oxidation for TOx-LDP-CNC, %AGA and %RE-Ox, were also relatively consistent.
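The derived percentages discussed above follow directly from the expressions given in the Table 3 footnotes. The sketch below encodes those expressions as small functions operating on T2-corrected peak volumes; the function names and the example volumes are hypothetical and serve only to show the arithmetic.

```python
def percent_alpha(a_re1, b_re1):
    """% alpha-RE-1: 100 * a-RE-1 / (a-RE-1 + b-RE-1)."""
    return 100.0 * a_re1 / (a_re1 + b_re1)

def percent_beta(a_re1, b_re1):
    """% beta-RE-1: 100 * b-RE-1 / (a-RE-1 + b-RE-1)."""
    return 100.0 * b_re1 / (a_re1 + b_re1)

def percent_axu_from_c1(axu1, agu1, a_re1, b_re1, nre):
    """% AXU from position 1: 100 * AXU-1 / (sum of all position-1 volumes)."""
    return 100.0 * axu1 / (axu1 + agu1 + a_re1 + b_re1 + nre)

def percent_axu_from_gem(axu_gem5, agu_gem6):
    """% AXU from geminal signals: 100 * AXU-gem-5 / (AXU-gem-5 + AGU-gem-6)."""
    return 100.0 * axu_gem5 / (axu_gem5 + agu_gem6)

def percent_aga(aga1, agu1, a_re1, b_re1, nre):
    """% oxidation of AGU to AGA: 100 * AGA-1 / (sum of all position-1 volumes)."""
    return 100.0 * aga1 / (aga1 + agu1 + a_re1 + b_re1 + nre)

def percent_re_ox(ox_re_gem6, a_re1, b_re1):
    """% oxidized RE: 50 * Ox-RE-gem-6 / (a-RE-1 + b-RE-1)."""
    return 50.0 * ox_re_gem6 / (a_re1 + b_re1)

# Hypothetical corrected volumes, for illustration only
vols = {"AGU-1": 90.0, "AGA-1": 6.0, "a-RE-1": 1.0, "b-RE-1": 3.0, "NRE": 4.0}

print(percent_alpha(vols["a-RE-1"], vols["b-RE-1"]))
print(percent_aga(vols["AGA-1"], vols["AGU-1"], vols["a-RE-1"],
                  vols["b-RE-1"], vols["NRE"]))
```

Keeping each footnote expression as its own function makes it easy to apply the same arithmetic to both uncorrected and T2-corrected volumes and compare the outcome.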
Overall, the CH1 peaks for the low molecular weight LDP-CNC and TOx-LDP-CNC samples are easily separable with 512 f1 increments (td1), or perhaps even fewer (of course depending on field strength), due to their slower T2 relaxation (Fig. 10a, b). With the higher molecular weight MCC, separation of the CH1 resonances is definitely improved with the higher number of increments (Fig. 10c, d). While there is sufficient separation of the RE-1 and AGU-1 signals, so that f1 resolution can be lowered further (to allow for increased collection times), poor S/N is still an issue for the RE-1 signal in both spectra (Fig. 10c, d). This situation is improved somewhat with the use of the cryoprobe-head, where S/N is approximately doubled (Table 3). However, if quantitation of DS values is all that is required, lower resolutions are acceptable to reduce collection times to a few hours. If resolution eventually does become an issue in quantitation of the DS of some substituent, for higher molecular weight samples, then ball milling will likely have to be applied to reduce molecular weights (Ling et al. 2019), preventing disproportionate T2 losses. However, this requires future work with well-defined samples over wide molecular weight ranges, both polydisperse and non-polydisperse.

Further applicability
This method is ideally suited to the analysis of nanocelluloses, due to the relatively low molecular weight that these samples show, in particular CNCs. However, higher molecular weight samples are also possible, which makes this method of significant wider value for following cellulose surface chemistry, where crystallinity is maintained. Indeed, it has been possible to dissolve and collect an HSQC spectrum for even bacterial nanocellulose, in a related solvent system (Holding et al. 2016). The main limitation here is the faster signal relaxation, which would have a significant effect on any HSQC quantitation, relative to the cellulose backbone signals. However, quantitation through 1D spectra, with the aid of signal deconvolution, would not be affected. Thus, a combination of 1D and 2D methods can be applied, optionally using the [P4444]+ signals themselves as internal standard.
Several solvent systems are known for analysis of whole biomass samples (Foston et al. 2016). Mansfield et al. (2012) have recommended the use of routine HSQC experiments for quantifying biopolymer species in whole biomass samples. Their protocol demonstrates the swelling of planetary-milled wood samples in DMSO-d6/pyridine-d5 (4:1) to yield 'gelled' samples, yet with a solvent mixture that is unable to directly dissolve cellulose. This method was said to yield similar quantitation results to those samples which are fully processed into the solution state by peracetylation. However, wood samples are a difficult case, not only due to the insolubility of cellulose in common molecular solvents but also due to their recalcitrant nature in general (Deb et al. 2016; Kyllönen et al. 2013; Kilpeläinen et al. 2007). Thus, there is still some way to go to establish quantitative conditions for whole biomass samples, even with non-derivatizing direct-dissolution cellulose solvents. The cellulose portion of these materials is always the most troublesome, as it is such a rigid polymer, which suffers from fast T2 relaxation. However, if extensive milling is applied to allow for full solubilization and low enough molecular weight (further increasing T2 values), more accurate quantitation may be close. In this context, suitable stable direct-dissolution solvents for solution-state NMR analysis of whole biomass samples have also been lacking. Cheng et al. (2013) … :DMSO-d6 was then prepared and used for the application of a quantitative HSQC experiment on fully dissolved solutions. However, as mentioned previously, [emim][OAc] is known to react with cellulose (Liebert and Heinze 2008; Ebner et al. 2008; Clough et al. 2015). It also has significant signal overlap with the polysaccharide spectral region, and [emim][OAc]-d14 is much too expensive and laborious to prepare for routine analyses. Nevertheless, the [emim][OAc]:DMSO-d6 solutions were shown to be stable over a longer period (2 weeks), whereas the molecular solvent dispersions showed phase-separation. This is a good indication that the current solvent-system may open the window to a much wider range of samples.

[Figure caption fragment: e MCC (ns = 40, td1 = 128) with a room temperature probe-head, and f MCC (ns = 40, td1 = 128) with a cryoprobe-head. F1 is the 13C dimension and F2 is the 1H dimension. No forward linear prediction was used to improve resolution.]

Conclusions
The chemical shifts of polymeric units in cellulose, including NRE and RE units, can be unambiguously assigned using solution-state NMR in a novel ionic liquid electrolyte, [P4444][OAc]:DMSO-d6. The main monomeric units in 4-AcNH-TEMPO oxidized cellulose (polyglucuronic acid) are also assigned, as are the terminal units for the unoxidized and oxidized materials. The latter has led to identification of the terminal open-chain gluconate moiety after both the acidic 4-AcNH-TEMPO protocol used here and Pinnick oxidation conditions. However, in both instances RE groups remained in the oxidized products, indicating a further need for optimization of this reaction for different substrates, or for more defined structural characterization of substrates for surface oxidation sites that may undergo β-elimination, yielding new reducing ends. Periodate oxidation clearly introduces instability into cellulose, as seen when the dry oxidation product was introduced to the basic electrolyte medium. This degradation mechanism is thought to be similar to the β-elimination mechanisms illustrated in previous publications for aqueous alkaline media, but obviously requires further study to elucidate the mechanism of degradation. Further investigations into how to stabilize the periodate oxidation products towards basic degradation, by further chemical modifications, are needed, as periodate oxidation of cellulose is a widely utilized technique. However, NMR analysis in the electrolyte medium seems to be a useful probe into the stability of these compounds, in addition to providing the necessary chemical species resolution that other techniques cannot. Of course, this is also a direct method to follow the progress of oxidation reactions. Nitroxyl-radical-type oxidations (to 6-position carboxylates), under mild acidic conditions, seem to be quite robust in terms of resulting product stability in the electrolyte and under aqueous alkaline conditions. 
Thus, avoiding aldehyde formation under alkaline oxidation conditions is clearly important in improving the quality of the oxidized products, by preventing losses and molecular weight reduction due to fragmentation of surface chains. Q-CAHSQC, with T2 correction, seems to be a suitable experiment and processing combination to yield quantitative data from HSQC, without calibration against internal standards. While this is still not suitable for accurate determination of DPN for higher molecular weight and low polydispersity samples, accurate DS and regioselectivity determination will be possible for certain chemical modifications, even at reducing ends in lower molecular weight samples, such as model CNCs. However, it should be stressed that this solvent system and processing strategy are not only applicable to nanocelluloses but also offer the chance to significantly improve our opportunities for quantitative analysis of whole biomass samples that contain a significant crystalline cellulose phase composition.