Introduction

The structure and chemistry of cellulose have been studied for decades (O’Sullivan 1997; Payen 1838), and interest in this biologically produced polymer has increased recently due to its potential use as a source of renewable fuels (Himmel et al. 2007; Himmel et al. 1999; Lynd et al. 1991) and materials (Moon et al. 2011). Utilization of cellulose from plants is inhibited by the fact that it is difficult to extract and biodegrade because of its own structure and its interactions with other plant cell wall components. Our interest is mainly in the key role cellulose plays in plant cell wall architecture. Before one can create a realistic model of the plant cell wall, the structure of cellulose and its surfaces must be known.

Although the X-ray and neutron diffraction patterns of Iα and Iβ cellulose were collected and interpreted a decade ago (Nishiyama et al. 2002; Nishiyama et al. 2003), the nature of cellulose is such that these methods do not fully define the details of its structure (Atalla 1999). The small cellulose fiber diameter and partially disordered regions lead to uncertainty regarding several questions. For example, does cellulose twist (Fernandes et al. 2011; French and Johnson 2009)? If this twist exists, does it contribute to periodic disruption of order (Moon et al. 2011)? Are the H-bonding distances accurate and how does H-bonding relate to the observed vibrational spectra (Nishiyama et al. 2008)? Lastly, how do the hydroxymethyl groups on each glucose residue unit rotate with respect to the ring atoms and what are the energy barriers to rotation (Gonzalez-Outeiriño et al. 2006)?

In addition to X-ray and neutron diffraction, vibrational (i.e., infrared and Raman) and nuclear magnetic resonance (NMR) spectroscopies have added complementary information that sheds further light on details of cellulose structure (Atalla 1999; Atalla and VanderHart 1984; Blackwell 1977; Earl and VanderHart 1981, 1984; Heiner et al. 1995; Hesse-Ertelt et al. 2008; Koch et al. 2000; Sternberg et al. 2003; Witter et al. 2006). Consequently, the goal of this paper is to use quantum mechanical calculations to determine model structures that match the XRD structures while producing vibrational frequencies consistent with observed vibrational spectra and 13C chemical shifts (δ13C) that match experimental NMR values. Once this agreement between observation and computation is achieved, the computational methods can be used to address problems in cellulose surface chemistry, formation, and amorphization.

Although density functional theory (DFT) has had numerous successes throughout computational chemistry, modeling van der Waals forces has been problematic because standard DFT does not account for these intermolecular electron correlation energies (Grimme 2006). A number of research groups (Grimme 2006; Kim et al. 2012; Zhao et al. 2006) have addressed this issue by adding empirical corrections to account for van der Waals forces based on DFT-calculated electron densities. For this work, the DFT-D2 method of Grimme (2006) has been employed because it has been implemented in the Vienna Ab-initio Simulation Package, VASP 5.2 (Kresse and Furthmüller 1996), and has been shown to accurately reproduce cellulose Iα and Iβ structures (Bućko et al. 2011; Li et al. 2011). In this study, we take the additional steps of calculating vibrational frequencies for comparison against observed vibrational spectra and δ13C NMR chemical shifts for comparison against observed NMR spectra.

In addition to testing and verifying the cellulose structures published by Nishiyama and co-workers based on X-ray and neutron diffraction, our goal was to demonstrate that DFT-D2 methods can reliably reproduce the observed vibrational frequencies of cellulose because this ability will allow us to trust this methodology when simulating less well understood materials such as “amorphous” or disordered cellulose. In combination with hybrid molecular orbital/density functional theory calculations (MO/DFT) to predict NMR chemical shifts, DFT-D2 can be used to interpret the molecular-level structures of cellulose when order is insufficient for XRD. For example, knowledge of cellulose surface structures is critical when attempting to understand cellulose interaction with water (Li et al. 2011; Matthews et al. 2006; Newman and Davidson 2004), hemicelluloses (Hanus and Mazeau 2006; Mazeau and Charlier 2012) or cellulose binding domains (Fernandes et al. 2011; Harris et al. 2012; Tavagnacco et al. 2011; Tormo et al. 1996). With detailed atomistic models benchmarked against a variety of analytical data, the DFT-D2 results can then be used as benchmarks for classical molecular mechanics simulations of cellulose and its interactions with other plant cell wall components.

Methods

Carbohydrate main-chain torsion angle conformations (Φ, Ψ) (i.e., Φ = O5-C1-O-C4; Ψ = C1-O-C4-C5) are defined based on the notation shown in Scheme 1. In carbohydrates, the exocyclic hydroxymethyl torsions can collectively be represented by the orientation of the O5-C5 bond with respect to the C6-O6 bond (χ1 = O5-C5-C6-O6) and the C4-C5 bond with respect to the C6-O6 bond (χ2 = C4-C5-C6-O6). In cellulose, the values of hydroxymethyl torsions (χ) vary depending on residue 1 (χ1) and residue 2 (χ2) in the Iα polymorph/center chain (χ1, χ2) and the origin chain (χ1, χ2) and center chain (χ′1, χ′2) in the Iβ polymorph (Nishiyama et al. 2002, 2003). The χ1 and χ′1 are referred to as trans-gauche (tg; χ1 = 180°), gauche-trans (gt; χ1 = 60°) and gauche-gauche (gg; χ1 = 300°) depending on the positions of the O5-O6 and H5-H6 atoms (Gonzalez-Outeiriño et al. 2006).
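The torsion definitions above can be illustrated with a minimal sketch that computes a signed torsion angle from four atomic positions and labels χ1 as tg, gt or gg. The function names, the 30° tolerance window and the test coordinates are assumptions made for illustration only; they are not taken from the original workflow.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees (cis = 0, trans = 180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    norm_b2 = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / norm_b2
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def classify_chi1(chi1_deg, tolerance=30.0):
    """Label chi1 (O5-C5-C6-O6) using the ideal values quoted in the text.

    The tolerance window is a hypothetical choice, not a value from the paper.
    """
    ideal = {"tg": 180.0, "gt": 60.0, "gg": 300.0}
    for label, target in ideal.items():
        # Smallest angular separation, accounting for 360-degree periodicity.
        diff = abs((chi1_deg - target + 180.0) % 360.0 - 180.0)
        if diff <= tolerance:
            return label
    return None

if __name__ == "__main__":
    # Hypothetical coordinates forming a +90 degree torsion.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    print(classify_chi1(-60.0))
```

In practice the four positions would be the O5, C5, C6 and O6 coordinates of a given residue for χ1, or C4, C5, C6 and O6 for χ2.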

Scheme 1

The Arabic numerals represent the C-atoms in a two-glucose residue unit of a cellulose chain showing the nomenclature used in this paper for the torsion angles in cellulose. Note that the “n” does not imply that cellobiose is the repeating unit of cellulose, but that the structure of the polymer continues in both directions through β(1 → 4) linkages

The initial structures of Iα and Iβ cellulose were created using Materials Studio 5.5 (Accelrys Inc., San Diego, CA, USA) based on CIF files published by Nishiyama et al. (2003) and (2002), respectively. Cartesian coordinate files generated with Materials Studio 5.5 were converted into VASP 5.2 input files via a Perl script written by A.V. Bandura (St. Petersburg State University). Manipulations of χ to three different conformations (tg, gg and gt) and H-bonding networks (NetA and NetB; ‘Net’ here refers to ‘Network’) were performed manually in Materials Studio 5.5 based on two possible H-atom positions centered about O6 suggested by Nishiyama and co-workers (Nishiyama et al. 2002). The Iα and Iβ simulation cell stoichiometries were C24O20H40 and C48O40H80, respectively. The Iα/tg/NetA conformation was also doubled in size along the b-axis (Fig. 1) to test the effect of system size on crystal structure; this model is designated 1 × 2 × 1.

Fig. 1

Calculated minimum energy structures of a Iα and b Iβ cellulose. Blue dashed lines are H-bonds, and the black boxes outline the simulation cells. C gray; O red; H white. (Color figure online)

Finite clusters were extracted from the energy-minimized periodic structures and methyl groups were added to terminate broken C-O bonds to the next glucose residue unit. Atomic coordinates were fixed in the positions determined by the DFT-D2 energy minimizations. The models were 7 and 12 chains of glucose tetramers for Iα and Iβ, respectively (C182O147H322 and C312O252H552). Only the glucose residues in the center of the finite clusters were used to predict 13C NMR chemical shifts (δ13C), because those C-atoms should best reflect those found in the interior of cellulose.
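The stoichiometries quoted for the periodic cells and the finite clusters can be cross-checked with a short sketch. It assumes that each glucose residue contributes C6H10O5, that the periodic cells contain 4 (Iα) and 8 (Iβ) residues, and that each tetramer chain is capped with one CH3 and one OCH3 group. These assumptions are inferred from the reported formulas and are not stated explicitly in the text.

```python
from collections import Counter

# Anhydroglucose residue as it occurs inside a cellulose chain.
GLUCOSE_RESIDUE = Counter({"C": 6, "H": 10, "O": 5})

# Assumed capping: CH3 on one cut end and OCH3 on the other (C2H6O in total).
CAPS = Counter({"C": 2, "H": 6, "O": 1})

def scale(formula, n):
    """Multiply every element count in a formula by n."""
    return Counter({element: n * count for element, count in formula.items()})

def periodic_cell(n_residues):
    """Composition of a periodic cell containing n_residues glucose residues."""
    return scale(GLUCOSE_RESIDUE, n_residues)

def capped_cluster(n_chains, residues_per_chain=4):
    """Composition of a cluster of methyl-capped glucose oligomers."""
    chain = scale(GLUCOSE_RESIDUE, residues_per_chain) + CAPS
    return scale(chain, n_chains)

if __name__ == "__main__":
    print(dict(periodic_cell(4)), dict(periodic_cell(8)))
    print(dict(capped_cluster(7)), dict(capped_cluster(12)))
```

Under these assumptions the 7-chain and 12-chain clusters come out as C182O147H322 and C312O252H552, matching the formulas given above.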

Periodic calculations were performed with the Vienna Ab-initio Simulation Package (VASP) (Kresse and Furthmüller 1996; Kresse et al. 1994; Kresse and Hafner 1993, 1994). Projector-augmented planewave pseudopotentials were used with the PBE gradient-corrected exchange correlation functional for the 3-D periodic DFT calculations. The choice of electron density and atomic structure optimization parameters was based on Li et al. (2011) and Bućko et al. (2011). An energy cut-off of 77,190 kJ mol−1 was used with an electronic energy convergence criterion of 9.65 × 10−6 kJ mol−1. Atomic structures were relaxed until the energy gradient was <1.93 kJ mol−1 Å−1. A 2 × 2 × 2 k-point sampling was used. Atoms were first allowed to relax with the lattice parameters constrained to the experimental values, and then the atoms and lattice parameters were allowed to relax to obtain the structures, energies and spectroscopic properties reported herein. The D2 dispersion-correction parameters were 40 Å for the cutoff distance (Bućko et al. 2011), 0.75 for the scaling factor (s6) and 20 for the exponential coefficient (d) in the damping function (Grimme 2006).
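As a rough illustration of the parameters above, the sketch below converts the quoted kJ mol−1 values to eV (the unit VASP input normally uses) and evaluates a single pairwise term of the Grimme (2006) D2 correction, −s6 C6 / r^6 multiplied by a Fermi-type damping function. The conversion factor of 96.485 kJ mol−1 per eV is standard. The C6 coefficients and the van der Waals radius sum passed in the example are hypothetical placeholders, not Grimme's tabulated values, and the function names are illustrative.

```python
import math

EV_TO_KJ_PER_MOL = 96.485  # 1 eV per particle expressed in kJ/mol

def kj_per_mol_to_ev(value):
    """Convert an energy from kJ/mol to eV per particle."""
    return value / EV_TO_KJ_PER_MOL

def d2_damping(r, r0, d=20.0):
    """Damping function 1 / (1 + exp(-d (r/r0 - 1))) of the D2 scheme."""
    return 1.0 / (1.0 + math.exp(-d * (r / r0 - 1.0)))

def d2_pair_energy(r, c6_i, c6_j, r0, s6=0.75, d=20.0, cutoff=40.0):
    """Dispersion energy of one atom pair; zero beyond the cutoff distance.

    c6_i, c6_j and r0 must be supplied by the caller; the returned energy
    carries the units implied by the C6 coefficients and r.
    """
    if r > cutoff:
        return 0.0
    c6_ij = math.sqrt(c6_i * c6_j)  # geometric-mean combination rule
    return -s6 * c6_ij / r ** 6 * d2_damping(r, r0, d)

if __name__ == "__main__":
    for quantity in (77190.0, 9.65e-6, 1.93):
        print(quantity, "kJ/mol ->", kj_per_mol_to_ev(quantity), "eV")
    # Placeholder C6 and r0 values, for illustration only.
    print(d2_pair_energy(3.5, c6_i=1.0, c6_j=1.0, r0=3.0))
```

The conversions suggest that the cut-off, electronic convergence criterion and gradient threshold correspond to approximately 800 eV, 10−7 eV and 0.02 eV Å−1, although the text itself reports only the kJ mol−1 values.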

Frequency analyses were performed on the energy minimized structures as predicted using VASP. Second derivatives of the potential energy matrix with respect to atomic displacements were calculated using two finite-difference steps (NFREE = 2) and atomic movements of 0.015 Å (POTIM = 0.015). Calculated frequencies were scaled by a factor of 0.97 for comparison with experiment based on comparison of vibrational modes observed in sum frequency generation (SFG) spectra (Lee et al. 2012) and the calculated vibrational modes. Vibrational modes were analyzed using the free version of the program wxDragon 1.8.0 (Eck 2012).
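The finite-difference step described above can be sketched in one dimension. Assuming that NFREE = 2 corresponds to a central difference (one displacement of +POTIM and one of −POTIM per degree of freedom), a force constant is estimated from the two forces, and the resulting harmonic frequency is then scaled by 0.97. The model force function and the function names are hypothetical; the real calculation builds the full Hessian matrix for all atoms.

```python
def force_constant_central(force, x0=0.0, step=0.015):
    """Estimate k = -dF/dx at x0 from forces at x0 + step and x0 - step."""
    return -(force(x0 + step) - force(x0 - step)) / (2.0 * step)

def scale_frequency(freq_cm1, factor=0.97):
    """Apply the empirical scaling factor to a harmonic frequency in cm^-1."""
    return factor * freq_cm1

if __name__ == "__main__":
    k_true = 5.0   # hypothetical harmonic force constant
    g = 40.0       # hypothetical anharmonic coefficient

    def model_force(x):
        # Force from V(x) = k x^2 / 2 + g x^3 / 3
        return -k_true * x - g * x * x

    print(force_constant_central(model_force))
    print(scale_frequency(3504.0))
```

With a central difference the even-order term of the force cancels, so the estimate recovers the harmonic force constant of this model essentially exactly; a one-sided difference would not have this property.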

Finite cluster calculations and shielding tensor calculations on the finite clusters were carried out using Gaussian 09 (Frisch et al. 2009). Gauge-independent atomic orbitals (GIAO) (Wolinski et al. 1990) were employed with the modified Perdew-Wang exchange-correlation functional mPW1PW91 and the 6-31G(d) basis set (Rassolov et al. 2001). Chemical shifts were calculated relative to methanol because this secondary standard produces δ13C in better agreement with experiment (Sarotti and Pellegrinet 2009; Watts et al. 2011) than does a direct comparison of the tensors with the tetramethylsilane (TMS) standard (Cheeseman et al. 1996). This multi-standard reference method also uses an empirical correction (50.4 ppm) for the difference between the δ13C of methanol and that of TMS, which is commonly used as an experimental 13C NMR standard (Sarotti and Pellegrinet 2009).
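The secondary-standard referencing described above amounts to a one-line conversion from isotropic shieldings to chemical shifts: δ = σ(methanol) − σ(nucleus) + 50.4 ppm. The sketch below shows this arithmetic. The shielding values in the example are hypothetical placeholders rather than results from the paper, and both shieldings are assumed to be computed at the same level of theory.

```python
METHANOL_SHIFT_PPM = 50.4  # empirical delta-13C of methanol relative to TMS (from the text)

def delta_13c(sigma_nucleus, sigma_methanol, ref_shift=METHANOL_SHIFT_PPM):
    """Chemical shift (ppm vs. TMS) from isotropic shieldings (ppm).

    sigma_nucleus and sigma_methanol are assumed to come from calculations
    with the same functional and basis set.
    """
    return sigma_methanol - sigma_nucleus + ref_shift

if __name__ == "__main__":
    sigma_methanol = 140.0   # hypothetical methanol carbon shielding
    sigma_c1 = 85.0          # hypothetical shielding of a cellulose C1 atom
    print(delta_13c(sigma_c1, sigma_methanol))
```

Because only the difference between two shieldings enters, systematic errors common to both calculations tend to cancel, which is the usual motivation for choosing a reference chemically similar to the nuclei of interest.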

Benchmark testing of the computational methodology on reproducing observed H-bond energies and O-H stretching frequencies was conducted on the H2O-H2O and CH3OH-CH3OH dimers. These are excellent test systems for H-bonding accuracy because the interactions between the molecules are dominated by a single H-bond and because there are experimental data on the energy of dimerization and vibrational frequency shifts from monomer to dimer. Gaussian 09 (Frisch et al. 2009) calculations were performed with the ωB97X-D (Chai and Head-Gordon 2008) functional and the 6-311G(d,p), 6-311+G(d,p), and 6-311++G(d,p) basis sets. The ωB97X-D exchange-correlation functional has proven reliable for reproducing H-bonding (Cirtog et al. 2011), and the basis sets selected are reasonably robust for these compounds. The diffuse functions (“+” in the basis set designations) were added to evaluate the effect they have on H-bond energies and O-H stretching frequencies because these types of functions are missing from the DFT-D2 calculations. The above two dimers were fully energy minimized and subjected to frequency analyses in Gaussian 09, then these structures were re-minimized and subjected to frequency analyses in VASP 5.2 using the same method described above for the cellulose models.

Results

H2O-H2O and CH3OH-CH3OH benchmark tests

As a test of the accuracy of the DFT-D2 computational methodology employed in the VASP calculations, comparisons of ωB97X-D and DFT-D2 results were made for the H2O-H2O and CH3OH-CH3OH dimers. The water dimerization energies (ΔEdimer) are equal to −32, −26 and −26 kJ mol−1 for the 6-311G(d,p), 6-311+G(d,p) and 6-311++G(d,p) basis sets, respectively. The DFT-D2 ΔEdimer result was −25 kJ mol−1. The calculated ΔEdimer is approximately within the range of reported experimental data of −20.4 to −12.3 kJ mol−1 with reported uncertainties of approximately ± 3.5 kJ mol−1 (Fiadzomor et al. 2008; Rocher-Casterline et al. 2011). Addition of diffuse functions did improve the accuracy of the calculated water dimerization energy, shifting it by 6 kJ mol−1 (from −32 to −26 kJ mol−1) toward the experimental range.

More relevant to the results on cellulose considered in this study are the H-bond distances and O-H stretching frequencies of the H-bonded OH group. The ωB97X-D calculations with the 6-311G(d,p) and 6-311++G(d,p) basis sets resulted in O-O distances of 2.811 and 2.835 Å and unscaled harmonic O-H symmetric stretching frequencies of 3,785 and 3,779 cm−1, respectively. Thus, for H-bond distances and symmetric O-H stretching frequencies, the lack of diffuse functions in the basis set does not make a significant difference in the calculated values. In comparison, the DFT-D2 calculated values for ΔEdimer, O-O H-bond distance and O-H stretching frequency associated with the H-bond in the H2O-H2O dimer are −25 kJ mol−1, 2.856 Å, and 3,814 cm−1, respectively. The experimental O-H frequency is 3,735 cm−1 (Huisken et al. 1996), and the O-O H-bond distance is 2.976 Å (Odutola and Dyke 1980). Therefore, DFT-D2 results agree well with the observed interaction energy, O-O distance and O-H stretching frequency. This is especially true if the DFT-D2 frequency is scaled by the 0.97 factor (3,814 × 0.97 = 3,700 cm−1) derived by correlating DFT-D2 frequencies with SFG frequencies (see “Methods” section).

The CH3OH-CH3OH dimer is a test system that is even more analogous to the H-bonds found in cellulose. Although experimental data on ΔEdimer are not available for the methanol dimer, we suggest that the ωB97X-D/6-311++G(d,p) calculations on the methanol dimer are as accurate as they are for the water dimer. Hence, comparison of DFT-D2 and ωB97X-D/6-311++G(d,p) results can be used to evaluate the accuracy of the DFT-D2 energy results. The DFT-D2 ΔEdimer is −29 kJ mol−1, which compares favorably with −28 kJ mol−1 calculated using ωB97X-D/6-311++G(d,p). Similarly, the O-H-O H-bond distance using DFT-D2 is 1.826 Å whereas the ωB97X-D/6-311++G(d,p) value is 1.875 Å. The experimental value for this bond length is 2.034 Å (Lovas et al. 1995), so the calculated results were within approximately 0.2 Å of the observed H-bond length. The unscaled, calculated O-H stretching frequencies associated with this H-bond are 3,504 and 3,767 cm−1 for DFT-D2 and ωB97X-D/6-311++G(d,p), respectively. In this case, the ωB97X-D/6-311++G(d,p) result overestimates the observed value of 3,527 cm−1 (Han et al. 2011) by 126 cm−1 (3 %) after scaling by 0.97, whereas the DFT-D2 result underestimates it by 128 cm−1 (3 %) after scaling. We conclude that the DFT-D2 methodology used on cellulose in this study is accurate to approximately 10 and 3 % for the O-H distances and O-H stretching frequencies, respectively, based on these comparisons with experimental and ωB97X-D/6-311++G(d,p) results.
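The error estimates for the methanol dimer follow from simple arithmetic on the values quoted above, which the following sketch makes explicit. Only numbers stated in the text are used; the function and variable names are illustrative.

```python
SCALE_FACTOR = 0.97
OBSERVED_OH_STRETCH = 3527.0   # cm^-1, methanol dimer
OBSERVED_HBOND = 2.034         # Angstrom

UNSCALED_FREQUENCIES = {       # cm^-1
    "DFT-D2": 3504.0,
    "wB97X-D/6-311++G(d,p)": 3767.0,
}
HBOND_DISTANCES = {            # Angstrom
    "DFT-D2": 1.826,
    "wB97X-D/6-311++G(d,p)": 1.875,
}

def scaled_error(calculated, observed, scale=SCALE_FACTOR):
    """Signed deviation of the scaled calculated value from observation."""
    return scale * calculated - observed

def relative_error(calculated, observed):
    """Unsigned fractional deviation from the observed value."""
    return abs(calculated - observed) / observed

if __name__ == "__main__":
    for method, freq in UNSCALED_FREQUENCIES.items():
        print(method, "frequency error (cm^-1):",
              round(scaled_error(freq, OBSERVED_OH_STRETCH), 2))
    for method, dist in HBOND_DISTANCES.items():
        print(method, "H-bond distance error (%):",
              round(100.0 * relative_error(dist, OBSERVED_HBOND), 1))
```

The scaled frequency deviations reproduce the quoted values to within about 1 cm−1, and the DFT-D2 H-bond distance error comes out at roughly 10 %, in line with the accuracy estimate given above.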

Structures

Table 1 contains the lattice parameters and glycosidic torsion angles from experiment (Nishiyama et al. 2002, 2003), 2-D periodic DFT calculations (Nishiyama et al. 2008), previous 3-D periodic DFT-D2 calculations (Bućko et al. 2011; Li et al. 2011) and the present study for Iα and Iβ cellulose. Although variations on the order of a few percent are present throughout when comparing calculated results versus experimental data and our calculated results against previous calculations, the model values are reasonably accurate. The tg/NetA results agree best with experimental lattice parameters compared to tg/NetB, gt and gg, for both Iα and Iβ (Table 1). In general, the calculated lattice parameters for the tg/NetA model are less than observed, and some of this discrepancy may be accounted for by the temperature difference between experiment and theory (i.e., observations at 298 K and calculations at 0 K), which causes thermal expansion in the former as compared with the latter. In addition, the tg/NetA models are predicted to have the lowest total electronic energies for both Iα and Iβ (Table 2), consistent with the interpretation of experimental data (Nishiyama et al. 2003). For Iα and Iβ, the energy differences between NetA and NetB conformations are −20 and −24 kJ/glucose residue, respectively.

Table 1 Comparison of Iα and Iβ calculated lattice parameters with the observed structures via X-ray and neutron diffraction of Nishiyama et al. (2002, 2003) (footnote 1) and the low-temperature structure of Nishiyama et al. (2008) (footnote 2); Nishiyama calculated values from Nishiyama et al. (2008); Li values from Li et al. (2011); and Bućko values from Bućko et al. (2011). CHARMM values based on structure of 6 × 6 × 40 glucose residue microfibril MD simulation in water in the tg/NetA conformation
Table 2 Calculated total electronic energies (in kJ mol−1 per glucose unit) including the D2 dispersion correction for van der Waals forces and relative energies (ΔE) compared to the Iβ/tg/NetA model

One of the larger variations from observation is for the b-lattice parameter of Iα, where our closest calculated value is 5 % shorter (Table 1) than that found in Nishiyama et al. (2003). This discrepancy is not surprising because the b-lattice parameter is strongly influenced by van der Waals forces between sheets, which are likely to be the least accurate component of the DFT-D2 methodology.

The structure of Iβ cellulose is predicted more accurately than Iα (Table 1). For the tg/NetA conformation, the largest discrepancy is the a-lattice vector with a 3 % error. The better agreement for Iβ versus Iα is likely due to the monoclinic symmetry of the Iβ unit cell serving to limit the relaxation of the simulation cell. This symmetry effect seems to be significant in spite of the fact that the simulation cell was doubled in size along the c-direction compared to the Iα simulation cell.

Glycosidic torsions (Scheme 1; Φ1, Φ2, Ψ1, Ψ2) are critical parameters in determining whether or not cellulose twists (French and Johnson 2009), and these calculated structural parameters are also compared with experiment and previous calculations in Table 1. The discrepancies of our calculated values with those determined by Nishiyama et al. (2002, 2003) are relatively small. For both Iα and Iβ, the calculated torsion angles differ from experiment by at most 6°. The reported experimental standard deviations for the glycosidic torsion angles are approximately 3° for Iα and up to 20° for Iβ cellulose (Nishiyama et al. 2003). Hence, these model deviations from experiment of ≤6° are reasonably small. Previous calculations resulted in similar differences from experiment of approximately 5 % (Bućko et al. 2011; Li et al. 2011; Nishiyama et al. 2008). Unfortunately, due to the limited size of the simulation cell, the possible twist along the cellulose chains cannot be directly investigated. In addition, the experimental uncertainty and computational error will not allow for evaluation of potential subtle, long-range twisting. Nonetheless, this model of cellulose that strictly excludes twisting will be shown to reproduce experimentally observed spectra accurately. Models including any twist must reproduce experimental observables more accurately in order for twisting to be considered a necessary component of the cellulose structure.

Nishiyama et al. (2002, 2003) determined that the torsions of the hydroxymethyl groups (χ) are in the tg conformation for both Iα and Iβ cellulose based on X-ray diffraction patterns and structural refinements. Consequently, previous studies using DFT-D2 methods to model these structures focused solely on the tg conformations of the hydroxymethyl groups (Bućko et al. 2011; Li et al. 2011). However, based on classical molecular dynamics simulations of cellulose, Matthews et al. (2012) have suggested that Iα undergoes a conformational change from tg to gg, and Iβ switches from tg to gt at temperatures of 500 K. Furthermore, χ of native cellulose surfaces may not be constrained to the values observed in the interior due to the interactions with water and other plant biopolymers (Fernandes et al. 2011; Harris et al. 2012; Newman and Davidson 2004; Viëtor et al. 2002; Štrucova et al. 2004). Hence, we have also investigated the gt and gg conformations in addition to tg of Iα and Iβ cellulose in order to predict energy and structural differences accompanying hydroxymethyl group rotations. Although the calculated lattice parameters, glycosidic torsions and energies of these other conformations do not suggest that they exist in large percentages in the particular samples analyzed (Tables 1, 2), consideration of these other forms is worthwhile.

Table 3 lists the experimental and calculated χ1 and χ2 torsion angles for the hydroxymethyl groups in Iα and Iβ. Our calculated values are similar to the Nishiyama et al. (2003) values for the tg/NetA conformations, especially considering experimental uncertainty. All values deviate from the ideal values of 180° and −60°. The gt and gg torsion angles also are predicted to deviate by only a few degrees from the ideal values of +60/+60° and −60°/−60°, respectively. These changes in the hydroxymethyl group torsion angles induce changes in the a and b lattice parameters of Iα cellulose that are outside of the range of computational accuracy, but the changes in lattice parameters for Iβ cellulose are smaller than for Iα (Table 1). This result suggests that mixtures of tg/gt/gg conformers may be more difficult to detect via X-ray and neutron diffraction in Iβ compared to Iα cellulose. Relative to the tg conformers, the calculated energy changes are +11 and +24 kJ mol−1 for the gt and gg conformers in Iα, respectively, and +19 and +26 kJ/glucose residue for the gt and gg conformers in Iβ, respectively (Table 2). Hence, higher temperatures could induce transitions to other conformations when entropic effects are considered.
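To give a feel for how the conformer energy differences above compare with thermal energy, the sketch below evaluates the Boltzmann factor exp(−ΔE/RT) at two temperatures. This is only a rough illustration: it treats the energy differences as molar quantities per glucose residue, treats each residue as independent, and ignores the entropic and cooperative crystal effects that the text notes would need to be considered. The choice of 298 K and 500 K is an assumption made for the example.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def boltzmann_ratio(delta_e_kj_mol, temperature_k):
    """Population ratio exp(-dE / RT) of a higher-energy state to the reference."""
    return math.exp(-delta_e_kj_mol / (R_KJ_PER_MOL_K * temperature_k))

if __name__ == "__main__":
    # Energy differences relative to tg, as quoted in the text.
    relative_energies = {
        "I-alpha gt": 11.0,
        "I-alpha gg": 24.0,
        "I-beta gt": 19.0,
        "I-beta gg": 26.0,
    }
    for label, delta_e in relative_energies.items():
        for temperature in (298.0, 500.0):
            print(label, temperature, "K:", boltzmann_ratio(delta_e, temperature))
```

The ratios increase substantially with temperature for every conformer, which is qualitatively consistent with the suggestion that heating could make the gt and gg conformations accessible.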

Table 3 Comparison of experimental (Nishiyama et al. 2002, 2003) and calculated hydroxymethyl torsions. CHARMM values based on structure of 6 × 6 × 40 glucose residue microfibril MD simulation in water in the tg/NetA conformation

Hydrogen bonding

Selected H-bond and O-O distances are listed in Table 4. In general, the DFT-D2 values are up to 0.2 Å less than the experimental values, whereas the CHARMM-based values from a 6 × 6 × 40 Iβ microfibril are typically 0.1 Å greater than the experimental values. The calculated distances are comparable to those obtained with DFT-D2 and ωB97X-D/6-311++G(d,p) for the water and methanol dimers discussed above. These discrepancies can be considered small for computational methods, especially considering the uncertainties in the experimental values. However, the DFT-D2 model over-estimation of the H-bond strengths could have an effect on the calculated O-H stretching frequencies discussed below because O-H vibrations are highly sensitive to H-bond strengths. This will be particularly problematic for the Iα O3-H3-O5 and O6-H2-O2 O-H stretches and the Iβ O6-H6-O3 stretch where the frequencies may be 200 cm−1 lower than observation because of the stronger calculated H-bonding (see “Vibrational frequencies” section below). In addition, the DFT-D2 calculations do not predict an O6-H6-O1 H-bond in Iα cellulose observed by Nishiyama et al. (2003) (Table 4).

Table 4 Iα and Iβ H-bonding parameters compared to experimental values of Nishiyama et al. (2002, 2003)

Vibrational frequencies

To test the accuracy of our frequency calculations on crystalline solids, we compared calculated and observed vibrational frequencies of crystalline cellobiose. There is no controversy about the cellobiose crystal structure, so this model provides a firm link between structure and vibrational frequencies. Figure 2a, b illustrates that the correlation between the modeled and observed frequencies is excellent (IR: slope 1.00, intercept −2 cm−1, R2 0.998; Raman: slope 1.00, intercept 2 cm−1, R2 0.998). A perfect 1:1 correlation would result in a slope of 1.0, intercept of 0.0 and R2 value of 1.0. Hence, we conclude that our computational methodology is accurate for carbohydrate vibrational frequencies.
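The correlation statistics quoted above (slope, intercept and R2) come from an ordinary least-squares fit of one set of frequencies against the other. A minimal sketch of that fit is given below. The frequency lists are hypothetical placeholders chosen only to exercise the function, not data from the paper, and which set is placed on which axis is an assumption.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

if __name__ == "__main__":
    # Hypothetical observed frequencies (cm^-1), for illustration only.
    observed = [400.0, 1000.0, 1500.0, 2900.0, 3400.0]
    # A perfect 1:1 correlation gives slope 1, intercept 0 and R^2 of 1.
    print(linear_fit(observed, observed))
    # A constant offset changes only the intercept.
    shifted = [value - 2.0 for value in observed]
    print(linear_fit(observed, shifted))
```

A constant shift between the two data sets shows up purely in the intercept, whereas a frequency-dependent error (for example, an inappropriate scaling factor) would show up in the slope.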

Fig. 2

Correlations of calculated with observed a IR and b Raman frequencies for cellobiose, c, d Iα cellulose and e, f Iβ cellulose. Cellobiose crystal IR and Raman from Xie et al. (2011), Valonia Iα IR frequencies from Blackwell (1977), Valonia Iα and Ramie Iβ Raman frequencies from Wiley and Atalla (1987), and Valonia Iβ (converted from Iα) IR frequencies from Maréchal and Chanzy (2000)

The correlations of calculated vibrational frequencies with observed IR and Raman spectra are excellent for both Iα and Iβ cellulose (Fig. 2c-f). The slopes and R2 values of the tg/NetA conformations deviate from their ideal values by <2 %. The maximum error for the intercept is 20 cm−1 for the Iα Raman spectrum (Fig. 2d; Table 5). Paradoxically, the maximum error is found for the tg/NetA conformation that best matches the crystal structure (Table 1) and 13C NMR chemical shifts (see below). This result makes it difficult to distinguish the correct model of cellulose structure based on IR and Raman spectra because all the models are reasonably consistent with observed frequencies. We think the reason for the discrepancy between observation and the tg/NetA model is the higher-frequency O-H stretches that appear in both the calculated tg/NetB conformation and the observed spectra. Natural cellulose samples are probably a mixture of different conformations (Nishiyama et al. 2008); hence, the observed spectra may pick up frequencies that are not due to the most stable and predominant conformation.

Table 5 Correlation statistics for IR and Raman frequencies of Iα and Iβ cellulose

Vibrational modes visualized are generally consistent with previous spectral assignments of IR and Raman bands, but detailed analysis of vibrational modes is beyond the scope of this paper. Based on these statistics it could be impossible to distinguish among the four conformations (tg/NetA, tg/NetB, gt and gg). Many vibrational modes such as C-C stretches and CH2 angle bends should not be sensitive to the long-range order because short-range covalent forces dominate them. However, the O-H stretching region between 3,000 and 3,500 cm−1 should be affected by longer-range structure because H-bonding distances would be affected by changes in lattice parameters. H-bond distances are known to have a readily observable correlation with O-H stretching frequencies.

Consequently, we performed correlations between calculated and observed O-H stretching modes separately. The correlation statistics for all models were poorer than the overall correlations, as expected, but the gt and gg models resulted in better fits to the Iα IR and Raman frequencies, respectively. The tg/NetB model better matched observed Iβ IR and Raman O-H stretching frequencies.

There could be a number of reasons for these discrepancies. For example, we note that the IR spectrum for Iα cellulose of Blackwell (1977) exhibits a peak near 3,408 cm−1. The model that best produces this higher frequency (i.e., less H-bonded) O-H stretch is the Iα/tg/NetB conformation. We have concluded above based on comparisons of structures to experiment and calculated energetics, that the Iα/tg/NetA conformation should be dominant, but mixtures of networks A and B are possible in cellulose (Nishiyama et al. 2008). The fact that our calculations reproduce other observed frequencies so accurately, while this one vibrational mode can deviate from observation by over 100 cm−1, leads us to conclude that it is not inaccuracy of the model calculations but rather structural heterogeneity that leads to the presence of this 3,408 cm−1 peak in the observed spectrum. A similar discrepancy exists between the calculated and observed IR spectrum of Iβ cellulose; but in this case, the network A conformation is approximately 60 cm−1 under-estimated compared to experiment (Table 4). Another potential reason why the preferred Iα/tg/NetA conformation does not reproduce some of the observed higher frequency O-H stretches such as that at 3,408 cm−1 could be that the observed spectra are detecting O-H groups at the surface of the cellulose. The H-bonding of surface OH groups could be less strong than internal OH groups and lead to higher frequency O-H stretches. Lastly, the scaling factor for the O-H stretches could be different from that obtained based on the CH2 stretches.

A crystallinity index for cellulose I based on the intensities of the 380 and 1,096 cm−1 Raman peaks is available (Agarwal et al. 2010). Since these two vibrational modes may be related to the order/disorder of cellulose I, we examined the model vibrational modes of frequencies in these regions. Calculated frequencies were scaled by 0.97 based on matching observed and calculated sum frequency generation (SFG) modes (Lee et al. 2012), and we found that the observed 380 cm−1 vibration is due to C1-C2-C3, C1-O1-C4 and C1-O5-C5 bending modes in our calculations. The 1,096 cm−1 Raman peak consists of C1-O1, C1-C2 and C4-C5 stretches, C1-O1-C4 angle bends, and C5-C6H2 twists. Note that Raman intensities were not calculated, so we cannot definitively assign these modes to Raman bands nor examine the intensity ratios, but these motions do provide a hypothesis for amorphization of cellulose I. The fact that the internal ring modes of the 380 cm−1 peak lose Raman intensity suggests that the glucose residues may be distorting from the chair conformation. The predicted motions of the 1,096 cm−1 peak involve both the glycosidic and hydroxymethyl torsions, which will affect the order both along and between the glucose residue chains.

13C NMR chemical shifts

As discussed above for the IR and Raman spectra, the tg/NetA conformation results in the best overall correlation of calculated and observed δ13C values for both Iα and Iβ cellulose (Table 6; Figs. 3, 4). In this case, the slopes deviate from 1.0 by <1 % and the intercepts deviate from 0.0 by <1 ppm. The maximum error between computation and observation for individual peaks is no greater than 3 ppm for both Iα and Iβ tg/NetA (Table 6). The correlations of the other conformations are reasonable, but all have greater root-mean-squared error (RMSE) and maximum errors than the tg/NetA conformations. These results represent a significant improvement over previous semi-empirical calculations (Koch et al. 2000) and are of similar quality to the results in Witter et al. (2006) based on empirical correlations of structures versus observed δ13C values.

Table 6 Correlation values for δ13C calculations versus observation (Witter et al. 2006; Sternberg et al. 2003)
Fig. 3

Example extracted cluster used to calculate 13C NMR chemical shifts based on central atoms. Glucose residues in the center of the cluster (circled) were used for δ13C correlations versus experiment. C gray; O red; H white. (Color figure online)

Fig. 4

Correlations of calculated δ13C values with observed chemical shifts for a Iα cellulose from Witter et al. (2006) and b Iβ cellulose from Erata et al. (1997) as cited in Sternberg et al. (2003)

C1 and C4 13C chemical shifts and glycosidic torsions

The calculated δ13C values of the C1 and C4 atoms involved in the glycosidic torsions are in agreement with observed values well within the accuracy of our methodology. This could have three explanations. One, the δ13C signals from these atoms could be relatively insensitive to the errors of up to 6° in Φ and Ψ mentioned above for our model. Two, there could be compensating errors in computing the δ13C values, allowing an inaccurate structure to provide fortuitous agreement with experiment. Three, model Φ and Ψ values could be more accurate than Table 1 would suggest.

The first explanation was tested by varying the Φ and Ψ values of a cellulose dimer model through (−86°, 151°), (−87°, 156°) and (−88°, 161°) with other atomic coordinates held constant. The C1 and C4 δ13C values varied by 2.6 and 1.1 ppm over this range, respectively. In contrast, ranges of approximately 15 and 12 ppm for the C1 and C4 δ13C values have been reported (Suzuki et al. 2009), but the range of Φ and Ψ values was from 0 to 360° in that study. The Suzuki et al. (2009) results within the window of Φ and Ψ values we investigated show small δ13C dependence, so the current results are actually consistent with this previous study. Hence, one can conclude that the C1 and C4 chemical shifts are relatively insensitive to changes in the glycosidic torsion angles within our computational accuracy. The second explanation can be discounted because similar methodologies applied to simple molecules with known structures (e.g., cellobiose) predict 13C chemical shifts to within approximately 2 ppm. The third explanation cannot be tested with our methods because of the insensitivity of the C1 and C4 δ13C values mentioned above. We conclude that the uncertainty in measuring the Φ and Ψ angles, together with the actual variation in these cellulose structural parameters, forces one to accept this level of uncertainty.

C4, C5 and C6 13C chemical shifts and hydroxymethyl torsions

In a similar vein, the C4, C5 and C6 δ13C values should be sensitive to the χ1 and χ2 torsion angles (Suzuki et al. 2009). The fact that the calculated δ13C values for C4, C5 and C6 match experiment closely (Fig. 4) is another indication that the tg conformation predominates and that the model structures predicted are consistent with experiment. For example, the Iα/tg/NetA C5 and C6 chemical shifts differ from experiment by <1 ppm, whereas those for the Iα/tg/NetB, gt, and gg conformations are in error by up to 4.2, 6.2 and 4.8 ppm, respectively. The δ13C4 value changes from 87.1 ppm in the tg conformation to 80.6 ppm in the gg conformation, compared to the 87.9 and 88.7 ppm values of Sternberg et al. (2003). Consequently, our results strongly support the tg conformation.
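The C4 comparison can be made explicit with a short arithmetic sketch. It uses only the four shifts quoted in the paragraph above; the dictionary and function names are hypothetical, and no assignment of the two observed values to a particular allomorph or site is assumed.

```python
# Calculated C4 shifts (ppm) for two hydroxymethyl conformations, as quoted.
C4_CALCULATED_PPM = {"tg": 87.1, "gg": 80.6}

# Observed C4 shifts (ppm) quoted from Sternberg et al. (2003).
C4_OBSERVED_PPM = (87.9, 88.7)


def c4_deviations(calculated, observed):
    """Return calculated-minus-observed differences (ppm, one decimal)
    for each conformation against each observed value."""
    return {
        conformation: [round(calc - obs, 1) for obs in observed]
        for conformation, calc in calculated.items()
    }


deviations = c4_deviations(C4_CALCULATED_PPM, C4_OBSERVED_PPM)
# deviations["tg"] -> [-0.8, -1.6]
# deviations["gg"] -> [-7.3, -8.1]
```

The tg value lies within 2 ppm of both observed shifts, whereas the gg value is more than 7 ppm too low against either one.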

The sensitivity of the C5 and C6 chemical shifts to hydroxymethyl group rotations was also tested by varying χ1 from 160 to 170°. NMR calculations were performed without any further energy relaxation. The 10° difference between the 160 and 170° conformations resulted in changes of 1.2 and 0.4 ppm for the C5 and C6 atoms, respectively. These results are consistent with Suzuki et al. (2009), who predicted that the C5 atom should be the most sensitive to changes in the χ angle in this range. This magnitude of change should be observable in the 13C NMR spectra but is within the uncertainty of our computational methodology, so we cannot use the calculated δ13C values to further refine the expected χ1 and χ2 torsion angles.
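As a rough illustration of why this test cannot refine the torsion angles, the reported changes can be converted into finite-difference sensitivities. The sketch below assumes a linear response of the shift to χ1 across the 160 to 170° window, which the source does not claim, and it takes the approximately 2 ppm accuracy quoted earlier for cellobiose-type benchmarks as a stand-in tolerance. All names are hypothetical.

```python
def shift_sensitivity(shift_change_ppm, angle_change_deg):
    """Finite-difference sensitivity of a chemical shift, in ppm per degree."""
    if angle_change_deg == 0:
        raise ValueError("angle change must be non-zero")
    return shift_change_ppm / angle_change_deg


def angle_to_exceed(tolerance_ppm, sensitivity_ppm_per_deg):
    """Angle change (degrees) whose predicted shift change equals the
    tolerance, assuming a linear response (an assumption, not a result)."""
    return tolerance_ppm / abs(sensitivity_ppm_per_deg)


CHI1_STEP_DEG = 170.0 - 160.0                # chi1 varied from 160 to 170 degrees
SHIFT_CHANGE_PPM = {"C5": 1.2, "C6": 0.4}    # changes reported in the text
ASSUMED_ACCURACY_PPM = 2.0                   # approximate accuracy, used as a stand-in

sensitivity = {
    atom: shift_sensitivity(change, CHI1_STEP_DEG)
    for atom, change in SHIFT_CHANGE_PPM.items()
}
resolvable_step_deg = {
    atom: angle_to_exceed(ASSUMED_ACCURACY_PPM, s)
    for atom, s in sensitivity.items()
}
```

Under these assumptions, C5 responds about three times more strongly than C6, yet even for C5 the χ1 change needed to exceed the assumed tolerance is larger than the 10° step that was tested.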

Discussion

The main purpose of this study was to test the ability of the Nishiyama et al. (2002, 2003) cellulose structures to reproduce the IR, Raman (Fig. 2), and 13C NMR (Fig. 4) spectra of Iα and Iβ cellulose. It is not possible to distinguish the correct structure based on our correlations of calculated and observed vibrational frequencies because many frequencies are similar in each model; furthermore, no model accurately reproduces all O-H stretching frequencies. However, based on the excellent correlations of calculated and observed δ13C values, the Nishiyama structures are confirmed as our results match observation within computational accuracy. The conformations of the hydroxymethyl groups and the H-bonding networks suggested in Nishiyama et al. (2002, 2003) resulted in better fits than the alternative structures in most cases.

Another goal was to evaluate the DFT-D2 methodology for modeling cellulose structure. Although improvements are possible, the structures produce good agreement with experimental observables such as lattice parameters, vibrational spectra and 13C NMR chemical shifts. This benchmarking step is significant as it allows one to trust DFT-D2 results when modeling cellulose behavior for cases that are not as well constrained experimentally (e.g., disordered cellulose and surface interactions). DFT-D2 can then be used to help interpret experimental data and to benchmark classical force fields.

Significant limitations in the present study should be addressed, however. First, due to computational constraints, the size of the simulation cell does not adequately test the possibility of twisting along the length of the cellulose chains. Larger-scale classical MD simulations have shown that twisting is an effect of model dimensions, so larger model structures should be examined in the future. Furthermore, the fully ordered 3-D periodic simulation cells constrain the possible structures obtained in energy minimizations. Models of finite cellulose fibers surrounded by vacuum, water, or other plant cell wall components would be useful. Finite cellulose fibers, such as those used in classical simulations (Matthews et al. 2006, 2012; Zhong et al. 2008), allow one to study the surface relaxation that likely occurs. This surface relaxation may be the cellulose component observed as “amorphous” or disordered cellulose (Harris et al. 2012), so including this possibility is critical. In addition, surface relaxation may induce changes in the prediction of bulk cellulose structure, so comparison with experimental observables could be more realistic in model systems of finite cellulose fibers. Lastly, results in this study were based on energy minimizations, so DFT-D2 MD simulations would be an important additional methodology. The inclusion of a finite temperature in the simulation would allow for a more direct comparison to experimental thermodynamics, structures and spectra.