Introduction

Enzyme reactions at interfaces are common both in Nature and industry1. About half of the enzymes in the living cell work at a membrane surface2 and many technical enzyme applications involve catalysis at the solid-liquid interface3. Examples of the latter include the use of immobilized enzymes in protein arrays or biosensors4, but more commonly, the activity of soluble enzymes on insoluble substrates such as polysaccharides, lipids, precipitated proteins5 or more recently plastic6. Studies of heterogeneous enzyme reactions have shown that substrate specificity7, turnover number8, and enzyme-substrate binding affinity9 can be significantly altered at an interface compared to analogous reactions in the bulk. Nevertheless, the kinetics of interfacial reactions is typically disregarded or fleetingly treated in textbooks10,11,12,13,14, and this state of affairs is quite different from conventional (non-biochemical) catalysis, where homogeneous and heterogeneous reactions are treated in parallel. Although insightful models and concepts of interfacial enzyme kinetics have been suggested15,16,17, no generally applied kinetic approach or rate equation currently exists. Neither is it clear whether progress in this field should be based on adaptation of conventional enzyme kinetic theory, or modifications of concepts and principles taken from inorganic heterogeneous catalysis.

Here we investigated heterogeneous enzyme catalysis using cellulases as a paradigm. These enzymes catalyze the hydrolysis of the β-1,4 glycosidic bond that links glucopyranose units in (insoluble) cellulose and constitute a generic and experimentally convenient example of interfacial enzymes. In addition, cellulases are of direct industrial interest since enzymatic conversion of lignocellulosic biomass into fermentable sugars (known as saccharification) is expected to play a key role in the upcoming biorefineries that produce fuels, chemicals, and materials from sustainable feedstocks18,19,20. We focused on fungal cellulases, which are commonly applied in industrial enzyme cocktails21, and investigated enzymes from Glycoside Hydrolase (GH) families 5, 6, 7, 12, and 4522. Specifically, we produced and biochemically characterized 83 enzymes using insoluble cellulose as substrate. The characterized cellulases included both wild types and variants, and represented a wide range of structural and functional differences (see Fig. 1). Nevertheless, the kinetic data showed a clear common trait, as we found a conspicuous scaling between the apparent Michaelis–Menten (MM) constant (KM) and the maximal turnover (kcat) across the entire group of cellulases. The scaling could be expressed as a so-called linear free energy relationship (LFER), and we used this to discuss functional plasticity and physical constraints for the enzymatic conversion of cellulose. We argue that the LFER for cellulases may facilitate both mechanistic and evolutionary studies, and act as guidance in future attempts to select or design improved technical enzymes. Moreover, the observed LFER is reminiscent of the behavior found for some well-described inorganic heterogeneous catalysts, and this may help to establish better theoretical frameworks for interfacial enzyme reactions.

Fig. 1: Structural representation of the different classes of cellulases characterized in this study.

a Surface representation of the five different glycoside hydrolase (GH) families (exemplified by the PDB IDs: 4C4C53 (GH7), 1QK255 (GH6), 1H8V56 (GH12), 4ENG58 (GH45), 3QR357 (GH5)). b Structure of two GH7 cellulases with different modes of action in complex with cellononaose: a cellobiohydrolase (CBH) with a tunnel-shaped catalytic domain (PDB: 4C4C) and an endoglucanase (EG) with an open catalytic cleft (PDB: 1EG154). c Illustration of a GH7 CBH in complex with a cellulose fiber. The enzyme is modular with a catalytic domain (CD) and a carbohydrate-binding module (CBM) connected by a flexible linker. All structures were visualized using PyMOL71.

Results

Enzyme production

The investigated enzymes were selected from five GH families as illustrated in Fig. 1 and Table 1. These families (GH5, GH6, GH7, GH12, and GH45) cover essentially all major fungal cellulases21 and hence represent a wide range of structures and mechanisms. This included enzymes with or without a carbohydrate-binding module (CBM), enzymes using an inverting or retaining mechanism, enzymes that attack the cellulose chain internally (endoglucanases, EGs) or at a chain end (cellobiohydrolases, CBHs) and enzymes with different degrees of processivity. In addition to the wild types, a library of cellulase variants was made with the intention of changing the enzyme-substrate binding strength. This library included variants with mutations in the CBM, linker, and catalytic domain, as well as variants where the CBM and linker were added, removed or swapped. A full list of the enzymes characterized here can be found in supplementary Table 4 in the supplementary information (SI).

Table 1 Fungal cellulases characterized in this study.

Kinetic analysis

All enzymes were characterized by MM kinetics using microcrystalline cellulose (Avicel PH-101) as substrate. Quasi-steady-state rates (vss) were measured at a constant, low enzyme concentration (E0) and different substrate loads (S0), and analyzed by the MM-equation (Eq. 1) using non-linear regression. The resulting kinetic parameters (KM and kcat) are listed in supplementary Table 4. Previous studies have identified practical procedures for measuring the quasi-steady-state rate for this type of system23 and shown that Eq. 1 is valid and applicable even though the substrate is solid and specified by its mass load (S0) in units of g/L24,25. The derived rates were based on soluble products only and control experiments (supplementary Table 3) showed that this was a good descriptor of the overall activity even for EGs.

$${v}_{{ss}}=\frac{{E}_{0}{k}_{{cat}}{S}_{0}}{{S}_{0}+{K}_{M}}$$
(1)
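The non-linear regression to Eq. 1 can be sketched as follows. This is a minimal illustration and not the procedure used for the reported fits: the data are synthetic and noise-free, the kcat and KM values are hypothetical, and the function names are invented for this example. The sketch exploits that Eq. 1 is linear in kcat for a fixed KM, so only KM requires a one-dimensional search.

```python
import math

def mm_rate(s0, e0, kcat, km):
    """Quasi-steady-state rate according to Eq. 1."""
    return e0 * kcat * s0 / (s0 + km)

def fit_mm(loads, rates, e0, km_lo=1e-3, km_hi=1e4, iters=200):
    """Least-squares fit of Eq. 1.

    For a trial KM the best kcat follows from linear least squares;
    KM itself is located by golden-section search on ln(KM).
    """
    def best_kcat(km):
        x = [e0 * s / (s + km) for s in loads]
        return sum(xi * v for xi, v in zip(x, rates)) / sum(xi * xi for xi in x)

    def sse(log_km):
        km = math.exp(log_km)
        kcat = best_kcat(km)
        return sum((v - mm_rate(s, e0, kcat, km)) ** 2
                   for s, v in zip(loads, rates))

    a, b = math.log(km_lo), math.log(km_hi)
    g = (math.sqrt(5) - 1) / 2  # golden ratio conjugate
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    km = math.exp((a + b) / 2)
    return best_kcat(km), km

# Synthetic example (hypothetical parameters, not measured data)
loads = [1, 2, 5, 10, 25, 50, 100]   # substrate loads S0 in g/L
e0 = 0.1                             # enzyme concentration in µM
true_kcat, true_km = 0.5, 20.0       # 1/s and g/L, invented for illustration
rates = [mm_rate(s, e0, true_kcat, true_km) for s in loads]

kcat_fit, km_fit = fit_mm(loads, rates, e0)
print(f"kcat = {kcat_fit:.3f} 1/s, KM = {km_fit:.2f} g/L")
```

With real, noisy triplicates a dedicated fitting routine that also reports parameter uncertainties would be preferable; the sketch only shows the structure of the problem.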

In Fig. 2, the natural logarithms of the derived kinetic parameters (KM and kcat) are plotted against each other for all investigated enzymes to illustrate the power-law correlation between the kinetic parameters. From the main panel (Fig. 2a), it appeared that most enzymes clustered in a narrow lane around the diagonal. Some enzymes were located below the diagonal, but we did not find any above. To assess whether the experimental points in Fig. 2 correlated with structural or functional properties of the studied enzymes, we highlighted specific sub-groups in the dataset in five separate subplots (Panels b–f, Fig. 2). Linear regression showed that the slope in Fig. 2 was 0.74 ± 0.02. Regression outliers were identified based on studentized residual analysis using a conservative cutoff of ±2.5σ. The outliers were omitted from the regression analysis and are identified by open symbols in Fig. 2. A list of kinetic parameters for all investigated enzymes can be found in supplementary Table 4.
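The regression with outlier exclusion can be sketched as below. The text does not state which variant of studentized residuals was used, so the sketch assumes internally studentized residuals and a single exclusion pass; the data are synthetic (a line with slope 0.74, a small alternating perturbation, and one point displaced below the line), and all names are illustrative.

```python
import math

def ols(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    slope = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    return slope, ym - slope * xm

def studentized_residuals(x, y):
    """Internally studentized residuals of the OLS fit (assumed variant)."""
    n = len(x)
    slope, intercept = ols(x, y)
    xm = sum(x) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    res = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in res) / (n - 2))      # residual std. dev.
    lev = [1 / n + (xi - xm) ** 2 / sxx for xi in x]      # leverages
    return [r / (s * math.sqrt(1 - h)) for r, h in zip(res, lev)]

def fit_without_outliers(x, y, cutoff=2.5):
    """Flag points beyond the cutoff, then refit on the remaining points."""
    t = studentized_residuals(x, y)
    outliers = [i for i, ti in enumerate(t) if abs(ti) > cutoff]
    keep = [i for i in range(len(x)) if i not in outliers]
    slope, intercept = ols([x[i] for i in keep], [y[i] for i in keep])
    return slope, intercept, outliers

# Synthetic ln(KM) / ln(kcat) pairs, not the measured dataset
ln_km = [0.5 * i for i in range(12)]
ln_kcat = [0.74 * x - 3.0 + 0.02 * (-1) ** i for i, x in enumerate(ln_km)]
ln_kcat[8] -= 2.0  # one enzyme placed well below the diagonal

slope, intercept, outliers = fit_without_outliers(ln_km, ln_kcat)
print(f"slope = {slope:.2f}, excluded points: {outliers}")
```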

Fig. 2: Correlation plot of ln(KM) and ln(kcat).

Correlation plot for all investigated enzymes (a). The smaller panels highlight data for different classes of enzymes. These are wild type cellulases (b), variants (c), cellobiohydrolases from GH7 (d), cellobiohydrolases from GH6 (e), and endoglucanases from family GH7, GH12, and GH45 (f). The solid line in all plots derives from linear regression to the experimental data in the main panel (a) excluding the outliers (open symbols) identified as explained in the main text. Bands shown in panel (a) are the 95% confidence band (dark gray) and the 95% prediction band (light gray) of the linear regression. Error bars in (a) represent standard deviations from the MM-fit (Eq. 1) to triplicates.

Computational analysis

The strong correlation between ln(KM) and ln(kcat) shown in Fig. 2 is attractive from a computational point of view. If the apparent MM constant (KM) can be interpreted as a descriptor for the enzyme-substrate binding affinity, it may enable prediction of catalytic rates based solely on computed binding free energies. To test this hypothesis we computed cellulose binding strengths for a subset of nine enzymes from Fig. 2, using molecular dynamics (MD) simulations with umbrella sampling along the binding path (see further details in supplementary Fig. 8). We selected enzymes that spanned a wide range of KM values and represented all structural and functional classes listed in Fig. 1. For modular cellulases, the contribution of the CBM to the binding energy ΔG°B was computed separately. To compare with experiments we used kcat and KM to estimate changes in, respectively, the transition-state free energy (ΔΔG‡) and the standard free energy of ligand binding (ΔΔG°B), following well-established principles26,27,28. Specifically, we used the equations

$$\varDelta \varDelta {G}_{B}^{o}={RT}{\rm{ln}}\left(\frac{{K}_{M}}{{K}_{M,{ref}}}\right)$$
(2)
$$\,\varDelta \varDelta {G}^{\ddagger }=-{RT}{\rm{ln}}\left(\frac{{k}_{{cat}}}{{k}_{{cat},{ref}}}\right)$$
(3)

which introduce a reference enzyme with the kinetic parameters KM,ref and kcat,ref. Hence, the calculated free energies are energy changes relative to the selected reference. This approach alleviates ambiguities regarding standard states (Eq. 2) and pre-exponential factors (Eq. 3). We used the GH6 cellobiohydrolase from Trichoderma reesei (TrCel6A) as our reference enzyme and it follows that this enzyme will have ΔΔG‡ = ΔΔG°B = 0.
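As a minimal sketch of Eqs. 2 and 3, the relative free energies can be computed directly from the kinetic parameters. The temperature is taken as 25 °C, as stated in the Methods; the kinetic parameters in the example are hypothetical and the function names are illustrative.

```python
import math

R = 8.314462618e-3   # gas constant in kJ/(mol K)
T = 298.15           # K, corresponding to 25 °C

def ddg_binding(km, km_ref):
    """Eq. 2: change in standard free energy of binding (kJ/mol)."""
    return R * T * math.log(km / km_ref)

def ddg_activation(kcat, kcat_ref):
    """Eq. 3: change in activation free energy (kJ/mol)."""
    return -R * T * math.log(kcat / kcat_ref)

# Hypothetical values, not taken from supplementary Table 4
km_ref, kcat_ref = 10.0, 1.0    # reference enzyme (g/L, 1/s)
km, kcat = 100.0, 5.0           # enzyme of interest (g/L, 1/s)

print(f"ddG_B  = {ddg_binding(km, km_ref):.2f} kJ/mol")
print(f"ddG_ts = {ddg_activation(kcat, kcat_ref):.2f} kJ/mol")
```

In this convention a positive ΔΔG°B means weaker binding than the reference, and a negative ΔΔG‡ means a lower activation barrier than the reference.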

The validity of Eq. 2 depends on whether KM can be interpreted as a descriptor of the enzyme-substrate affinity. The comparison in Fig. 3a showed that despite the diversity of the analyzed cellulases, computed changes in binding affinity, ΔΔGB,MD, scaled reasonably well with the experimental values, ΔΔGB,exp, derived from Eq. 2. This supports the validity of Eq. 2 for this system and the idea of using computed ligand-binding energies to predict catalytic rates. Figure 3b illustrates the scaling between ΔΔGB,MD and ΔΔG‡exp.

Fig. 3: Correlation of computed and experimental free energies for nine selected cellulases.

a Changes in computed free energies of binding (ΔΔGB,MD) and experimental changes in binding free energy (ΔΔGB,exp). b Correlation of ΔΔGB,MD and experimental changes in activation free energy (ΔΔG‡exp). Experimental free energies were calculated using Eqs. 2 and 3. The kinetic parameters (KM and kcat) of the nine cellulases can be found in supplementary Table 4. The selected cellulases covered a wide range of the kinetic parameters shown in Fig. 2 and encompassed all main structural and functional traits specified in Fig. 1. Standard deviations of the experimental free energies and computed free energies are shown as error bars.

Discussion

In this study, we produced and kinetically characterized 83 enzymes covering essentially all classes of fungal cellulases (Table 1 and Fig. 1). We used the same expression host to ensure that the enzymes were exposed to the same apparatus of post-translational modifications. Moreover, kinetic characterizations were based on the same substrate, experimental conditions, and principles of analysis. This provided a robust basis for comparative analyses of interfacial enzymes in general and cellulases in particular. Indeed, the breadth of the dataset allowed us to identify a striking correlation between ln(KM) and ln(kcat), and in the following we discuss the origin and corollaries of this observation.

Enzyme fitness and physical constraints

Figure 2 may be seen as a fitness landscape for cellulases attacking their native insoluble substrate, and it appears that most enzymes accumulated around the diagonal. The diagonal defines a continuum ranging from enzymes with weak substrate interactions and rapid turnover (high KM and kcat), to enzymes with stronger interactions, but slower turnover (low KM and kcat). The tendency to accumulate along the diagonal was observed for all types of cellulases (refer to Table 1 and Fig. 1), and hence does not seem to rely on specific structural or mechanistic properties. Rather, it appears that the maximal turnover can be expressed solely by one descriptor, namely KM. The area below the diagonal in Fig. 2 represents a region where the enzymes have a low specificity constant (i.e., low kcat/KM), and this seems to signify inefficient catalysis. We found some enzymes in this range, including some wild type enzymes and variants with replacements of key amino acid residues. We suggest that this southeastern region of the fitness landscape represents enzymes that have been catalytically impaired by our engineering, are structurally unstable under the selected conditions, or have a primary substrate preference other than cellulose.

On the other hand, the region above the diagonal in Fig. 2 specifies enzymes that have a high specificity constant on this substrate. This clearly appears functionally advantageous, but we did not find any cellulases in this northwestern region. We suggest that this absence is the result of basic physical restrictions of the cellulolytic process. It follows that the accumulation of data points in a narrow lane in Fig. 2 may be seen as a balance between evolutionary selection, which drives the kinetic parameters toward the northwest, and physical constraints, which prevent this development beyond the boundary defined by the line in Fig. 2.

The engineered variants in Fig. 2c represent a range of replacements and deletions at different positions (see supplementary Table 4), which were designed with the overall purpose of altering ligand-binding strength. In a few cases, the mutations shifted the variants into the southeastern “wasteland” of the fitness landscape, but most remained on the diagonal. The tendency to stay on the line did not reflect that the variants had unaltered kinetic parameters. Rather, changes in KM and kcat tended to compensate. Some examples of this are highlighted in Fig. 4, and it appears that both point mutations and extensive changes in the amino acid sequence readily moved kinetic parameters up or down the diagonal, but rarely sent them off the line. Interestingly, the vast majority of the variants moved up the line compared to their respective wild type, and only in cases where a CBM was added to a CBM-less wild type (Fig. 4a) did the variant move down the line toward lower KM and kcat values. This indicates that the wild-type enzymes have evolved to have high affinity for the substrate rather than high turnover. Nonetheless, the differences in affinities across GH families may be important in Nature where cellulose is degraded by cellulases from multiple GH families.

Fig. 4: Illustration of the effect of the non-catalytic CBM (a) and tryptophan residues in the catalytic domain of cellobiohydrolases from GH6 and GH7 (b).

a Correlation plot of ln(KM) and ln(kcat) for three wild-type CBHs and three variants, where the CBM was either removed (−CBM) from the wild type (TrCel7A → TrCel7ACD, TrCel6A → TrCel6ACD) or added (+CBM) to the wild type (ReCel7A → ReCel7ACBM). b Analogous correlation plot for replacements of conserved tryptophan residues by alanine in the catalytic domain of TrCel7A (TrCel7A → TrCel7AW38A) or TrCel6A (TrCel6A → TrCel6AW269A). The solid line shown in both plots is the same as in Fig. 2. It appears that changes in KM and kcat tend to compensate so that all enzymes remain close to the diagonal. Insets are illustrations to guide the reader about the structural changes in the variants. Error bars represent standard deviations from the MM-fit (Eq. 1) to triplicates.

Origin of physical constraint

Correlations between binding and activation free energies are well-known in both organic and inorganic catalysis29, but have only been sporadically used for (homogeneous) enzyme reactions30,31. An LFER exists if the binding free energy, ΔG°B, scales linearly with the free energy of activation, ΔG‡. This is tantamount to proportionality between the changes in these two free energies, and we may write

$$\Delta \Delta {G}^{\ddagger }=\Phi \Delta \Delta {G}_{B}^{o}$$
(4)

where Φ is a scaling constant that converts changes in the binding free energy (\(\Delta \Delta {G}_{B}^{o}\)) to changes in activation free energy (ΔΔG‡). The correlation shown in Fig. 2 may be interpreted as an LFER if the KM values can be interpreted as dissociation constants for the enzyme-substrate complex. In general, one has to be cautious when using the (apparent) KM value as an affinity descriptor for complex enzyme reactions such as the one studied here. However, such an interpretation of KM has been successfully used earlier26,27,28 and it is also in line with the MD results (Fig. 3a) that showed good correlations between computed ligand-binding energies and experimental binding energies calculated using Eq. 2. The validity of KM as a descriptor of the enzyme-substrate affinity of the investigated enzymes is further discussed in the SI (see supplementary notes 1 and 2).

Using Eqs. 2 and 3 we calculated ΔΔG°B and ΔΔG‡ and found that the two free energies correlated with a slope of Φ = −0.74 ± 0.02 (see supplementary Fig. 9). This is the same slope as found for the line in Fig. 2 but with opposite sign due to the minus in Eq. 3 (i.e., a low activation energy gives a high kcat value). The scaling constant, Φ in Eq. 4, provides some information about the nature of the transition state (TS), and this idea has been used, for example, to elucidate the TS of protein folding32. As proposed by Warshel27, the Φ-value also provides a means to classify effects of mutations on enzyme function. If, for example, both the enzyme-substrate complex and the TS in a variant are stabilized to the same extent (so-called uniform binding, see Figs. 5b–1), Φ would be 0 since the activation energy would remain unchanged (i.e., ΔΔG‡ = 0). Another illustrative case is when changes in interactions only manifest themselves in the TS (so-called TS-stabilization, Figs. 5b–2). This results in Φ → ∞ since the activation energy can be changed independently of the binding energy. Finally, if mutations only act to stabilize the ground state complex (GS stabilization, Figs. 5b–3), ΔΔG‡ will change commensurate with ΔΔG°B, and Φ = −1.
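The sign relation between the slope in Fig. 2 and Φ can be checked numerically. The sketch below uses synthetic enzymes that lie exactly on a line with slope 0.74 in the ln(KM) versus ln(kcat) plot (the intercept and the KM values are invented), converts them with Eqs. 2 and 3, and regresses ΔΔG‡ on ΔΔG°B.

```python
import math

R, T = 8.314462618e-3, 298.15   # kJ/(mol K) and K (25 °C)

def slope(x, y):
    """Least-squares slope of y against x."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    return sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx

# Synthetic enzymes on the scaling line ln(kcat) = 0.74 ln(KM) - 3
km = [math.exp(0.5 * i) for i in range(8)]
kcat = [math.exp(0.74 * math.log(k) - 3.0) for k in km]

# Free energies relative to the first enzyme (arbitrary reference)
km_ref, kcat_ref = km[0], kcat[0]
ddg_b = [R * T * math.log(k / km_ref) for k in km]          # Eq. 2
ddg_ts = [-R * T * math.log(k / kcat_ref) for k in kcat]    # Eq. 3

phi = slope(ddg_b, ddg_ts)
print(f"Phi = {phi:.2f}")
```

The factor RT cancels in the regression, so Φ is simply the negative of the log-log slope regardless of the chosen reference enzyme.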

Fig. 5: Structural and energetic interpretation of a simplified reaction scheme for the enzyme-catalyzed hydrolysis of cellulose.

a Simplified reaction scheme for a cellulase (yellow) hydrolyzing insoluble cellulose (gray). The cartoon provides a structural interpretation of the three steps in the overall reaction: (1) association, (2) hydrolysis, and (3) dissociation. b Schematic energy-diagrams for a wild type (black curve) and three conceptually different variants (red curves). c Expected scaling plots for a group of variants that behave according to the three different energy-diagrams shown in (b). If the energy of the variant differs from the wild type by the same amount in both transition state (TS) and ground state (GS), we have so-called uniform binding and Φ = 0 (panels b1 and c1). The parallel shift in energies for uniform binding implies that the same interactions occur in GS and TS. If, on the other hand, a mutation only lowers the TS energy, known as TS-stabilization, this leads to a vertical line in the scaling plot (panels b2 and c2). Finally, in GS-stabilization (panels b3 and c3), only the GS energy changes, while the TS remains fixed. In this case, Φ = −1, and this is close to the experimental observation (see supplementary Fig. 9).

This interpretation of Φ -values was developed to classify mutants that were closely related in structure, but in the current context it may elucidate differences across cellulases (wild types and mutants) with widely different structures and mechanisms. We found Φ = −0.74 ± 0.02 (see supplementary Fig. 9), and it follows that kinetic differences among the investigated cellulases can be mostly ascribed to differences in the degree of GS stabilization. This has the noticeable consequence that the free energy of the (rate-limiting) TS is quite similar for all tested enzymes, and that the main kinetic diversity lies in different affinities for the substrate. This is illustrated in Figs. 5b–3, which shows that tighter binding to the substrate (red trace) unavoidably leads to a higher activation energy if the TS is (almost) fixed. Experimental studies have suggested that the rate-limiting step for some cellulases is slow dissociation33,34,35,36,37. Since weaker binding is associated with a lower activation barrier for dissociation (Figs. 5b–3), a dissociation limited mechanism would explain the inverse correlation of binding strength and maximal turnover. Based on these considerations it is tempting to suggest that weak ligand-binding is a functional advantage since it invariably increases kcat. However, mutational studies suggest that weak binding is not necessarily advantageous for the efficacy of GHs attacking solid carbohydrates38,39. The characterized variants support this interpretation, since most of the variants moved up the line in Fig. 2 compared to the respective wild type, indicating that the wild types were optimized for high affinity. Strong ligand binding may be needed in order for the enzyme to transfer a cellulose chain from the cellulose surface, where it is strongly bound40,41, to the binding cleft (see cartoon in Fig. 5a). 
Hence, strong ligand binding appears to benefit catalysis by promoting ligand transfer42, but it is inevitably associated with a slow turnover of an off-rate controlled reaction, as illustrated in Figs. 5b–3. We suggest that the LFER between the binding energy and activation energy is a direct consequence of the overall reaction being controlled by the on-off kinetics of the cellulases (see supplementary Fig. 6). The existence of LFERs for enzyme reactions governed by the chemical step remains to be investigated further, but meta-analyses of kinetic databases show little correlation between kcat and KM26,28,43. This is unlike many reactions in both homogeneous and heterogeneous (non-biochemical) catalysis, which may be limited by an LFER even though the reaction is governed by a chemical step44,45. Kinetic parameters for heterogeneous enzyme reactions are scarce. Thus, it is still an open question whether scaling relations are as common in heterogeneous biocatalysis as they are in inorganic heterogeneous catalysis46, but the current study shows that cellulases are severely restricted by an LFER.

Consequences of the scaling relationship

One aspect of the proposed scaling of KM and kcat is that the initial rate, vss (Eq. 1), may be approximated by just one of the kinetic parameters. To illustrate this, we combined Eqs. 2–4 and solved for kcat.

$${k}_{{cat}}=A{K}_{M}^{a}$$
(5)
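The intermediate step can be written out by inserting Eqs. 2 and 3 into Eq. 4 and cancelling RT. This is a sketch of the algebra that uses only the symbols already defined:

$$-{RT}\,{\rm{ln}}\left(\frac{{k}_{{cat}}}{{k}_{{cat},{ref}}}\right)=\Phi \,{RT}\,{\rm{ln}}\left(\frac{{K}_{M}}{{K}_{M,{ref}}}\right)\;\Rightarrow \;\frac{{k}_{{cat}}}{{k}_{{cat},{ref}}}={\left(\frac{{K}_{M}}{{K}_{M,{ref}}}\right)}^{-\Phi }$$

Collecting the reference parameters into a single prefactor then gives the power law of Eq. 5.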

In Eq. 5 a = −Φ and \(A=\frac{{k}_{{cat},{ref}}}{{{K}_{M,{ref}}}^{-\varPhi }\,}\). Inserting Eq. 5 into the MM-equation (Eq. 1) expresses vss as a function of KM

$${v}_{{ss}}=\frac{{E}_{0}A{K}_{M}^{a}{S}_{0}}{{S}_{0}+{K}_{M}}$$
(6)

Equation 6 underscores how ligand affinity is a double-edged sword. As demonstrated in the SI (supplementary note 3), Eq. 6 has a global maximum when KM attains the value

$${K}_{M,{opt}}={S}_{0}\frac{a}{1-a}$$
(7)

This implies that at a fixed load of substrate, S0, a cellulase with low KM (i.e., KM < KM,opt) will become a better catalyst (increase vss) if it is engineered for weaker substrate binding. Conversely, weakly binding enzymes (KM > KM,opt) will gain from tighter binding. In the current case, a = 0.74 and insertion into Eq. 7 shows that KM,opt = 2.8 S0. In other words, the fastest initial rate on the current substrate (Avicel) will be observed for a cellulase that has a KM value that is around threefold higher than the Avicel load. To illustrate this, we plotted vss as a function of KM for all of the investigated enzymes (excluding outliers identified in Fig. 2) at different substrate loads (Fig. 6). The results are in line with a previous observation47 showing so-called volcano plots, where cellulase activity tapers off on each side of the optimal affinity. Such volcano plots mirror the Sabatier principle, which states that the catalytic efficacy is optimal for a catalyst with intermediate binding strength48. Higher affinity leads to a situation where dissociation limits the overall rate, and lower affinity to one where association does. The optimal affinity, KM,opt, depends on the substrate load and this is indicated by the black symbols in Fig. 6, which were calculated using Eq. 7. We emphasize that the appearance of an optimal KM is a direct consequence of the LFER, and that this type of analysis is well-established within (non-biochemical) heterogeneous catalysis46,49.
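The position of the maximum can be checked numerically. Setting the derivative of ln(vss) in Eq. 6 with respect to KM to zero gives a/KM = 1/(S0 + KM), which rearranges to Eq. 7. The sketch below evaluates Eq. 6 on a small grid around KM,opt. The substrate load and the prefactors E0 and A are arbitrary illustration values, and the function names are invented for this example.

```python
def v_ss(km, s0, a, e0=1.0, big_a=1.0):
    """Eq. 6: steady-state rate as a function of KM under the scaling law.

    e0 and big_a (the prefactor A of Eq. 5) only scale the curve and are
    set to 1 here for illustration.
    """
    return e0 * big_a * km ** a * s0 / (s0 + km)

def km_opt(s0, a):
    """Eq. 7: KM value that maximizes Eq. 6 at substrate load s0."""
    return s0 * a / (1 - a)

a = 0.74        # scaling exponent reported in the text
s0 = 10.0       # g/L, hypothetical substrate load

k_star = km_opt(s0, a)
factors = (0.25, 0.5, 0.9, 1.0, 1.1, 2.0, 4.0)
grid = [k_star * f for f in factors]
rates = [v_ss(k, s0, a) for k in grid]

print(f"KM,opt = {k_star:.1f} g/L (= {k_star / s0:.2f} x S0)")
for f, r in zip(factors, rates):
    print(f"KM = {f:4.2f} x KM,opt -> relative rate {r / max(rates):.3f}")
```

The relative rates fall off on both sides of KM,opt, which is the volcano shape described above.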

Fig. 6: Volcano plots for five different substrate loads.

The specific rate at five different substrate loads is plotted as a function of KM for the investigated enzymes (excluding outliers identified in Fig. 2). Points represent experimental data and solid lines are the predicted volcano curves calculated using Eq. 6 and a = 0.74 (there are no free parameters in the determination of these solid curves). Black squares represent KM,opt values calculated using Eq. 7, and these points identify the maxima of the volcano plots at a given load of substrate.

As a final example of an application, we note that the LFER may be useful in computational selection and design of enzymes for technical use. Thus, a link between activity and affinity provides an important simplification as it converts the highly complex problem of in silico assessment of enzyme turnover frequency to the more tractable challenge of calculating binding energy. To illustrate this, we computationally assessed the strength of enzyme-substrate interactions for a subset of nine enzymes spread along the diagonal in Fig. 2. As shown in Fig. 3a, the computed binding energies scaled with the experimental values. These results suggest that the kinetic properties of novel, uncharacterized enzymes may be estimated by combining computed binding data with an experimental LFER based on a limited number of enzymes. Hence, efficient enzymes for a given set of experimental conditions could be identified through in silico screening.
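A minimal sketch of this screening idea, assuming the LFER of Eq. 4 with Φ = −0.74 and a computed (and already rescaled) binding free energy change relative to the reference enzyme. The reference kcat and the ΔΔG values below are hypothetical, and the function name is illustrative.

```python
import math

R, T = 8.314462618e-3, 298.15   # kJ/(mol K) and K (25 °C)

def predict_kcat(ddg_b, kcat_ref, phi=-0.74):
    """Estimate kcat from a binding free energy change (kJ/mol).

    Eq. 4 converts the binding change to an activation change,
    and Eq. 3 is inverted to obtain kcat relative to the reference.
    """
    ddg_ts = phi * ddg_b
    return kcat_ref * math.exp(-ddg_ts / (R * T))

kcat_ref = 2.0   # 1/s, hypothetical reference turnover
for ddg_b in (-5.0, 0.0, 5.0):   # hypothetical computed values
    print(f"ddG_B = {ddg_b:+.1f} kJ/mol -> kcat ~ {predict_kcat(ddg_b, kcat_ref):.2f} 1/s")
```

Such an estimate inherits the scatter around the scaling line, so it is better suited for ranking candidates than for predicting absolute rates.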

In closing, the kinetic characterization of a wide group of fungal cellulases on their native, insoluble substrate revealed an LFER between substrate binding and activation barrier. We propose that this reflects basic physical restrictions of the hydrolytic reaction, which limit the evolutionary selection to a narrow lane around the scaling line, irrespective of the enzymes’ fold, modularity, or catalytic mechanism. The scatter around the proposed scaling line in Fig. 2 corresponds to a factor of about 2 in the value of kcat. Hence, our results suggested that experimental kcat values for enzymes with approximately the same KM varied within this range. This variance encompassed a minor contribution from experimental errors, but it may also reflect kinetic diversity that results from differences in the mechanism and specificity of the tested enzymes. However, when we zoomed out and considered a broad range of KM values, this variance was modest, and the fitness landscape was dominated by a common scaling for all enzymes. Comparisons of wild types and variants revealed that small alterations in sequence (even point mutations) could lead to significant kinetic changes. In most cases, however, the changes involved a stringent movement on the scaling line rather than a shift away from the line, and this further demonstrated a strong coupling between affinity and turnover. We propose that this behavior is linked to the interfacial nature of the reaction. On one hand, strong ligand interactions are required to enable the transfer of a cellulose chain from the cellulose surface to the enzyme complex. On the other, a highly stable enzyme-substrate complex is inescapably associated with slow turnover (Figs. 5b–3). These relationships may help rationalize cellulolytic mechanisms and guide the selection of technical enzymes.
It also appears that LFERs for interfacial enzyme reactions may establish a connection to (inorganic) heterogeneous catalysis, and hence pave the way for the use of practices and principles from this field within enzymology.

Methods

Enzymes and kinetic measurements

Experimental methods used in this work have been described elsewhere (see supplementary Table 4). Briefly, we expressed all enzymes heterologously in Aspergillus oryzae and purified them as described elsewhere50,51. Engineered enzymes containing single or multiple amino acid substitutions, deletions or insertions were made using splicing overlap extension (SOE) PCR or by expression vector50. A full list of primers can be found in supplementary Table 5. For variants with an added CBM, gBlocks Gene Fragments were ordered from Integrated DNA Technologies (IDT) with an overhang of 24 bp for SOE. SDS-PAGE gels (15-well NuPAGE 4–12% BisTris, GE Healthcare) revealed a single band for the purified enzymes, and their concentrations were determined by UV absorbance at 280 nm using a theoretical extinction coefficient calculated from the amino acid sequence52. Michaelis–Menten (MM) curves were obtained as described previously51 using 0.1 µM enzyme and microcrystalline cellulose (Avicel PH-101, Sigma-Aldrich, St. Louis, MO) loads ranging from 1 to 100 g/L. MM curves were fitted to Eq. 1 in Origin Pro v. 7. All experiments were done in triplicate at 25 °C using a standard buffer (50 mM sodium acetate, pH 5.0).
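The concentration determination can be sketched with the Beer–Lambert law. The increments per residue used below (5500, 1490, and 125 M⁻¹ cm⁻¹ for tryptophan, tyrosine, and cystine) are the values commonly used for sequence-based estimates; whether the cited procedure uses exactly these numbers is an assumption, and the residue counts and absorbance are hypothetical.

```python
def extinction_280(n_trp, n_tyr, n_cystine):
    """Theoretical molar extinction coefficient at 280 nm in 1/(M cm).

    Uses commonly applied per-residue increments (assumed here).
    """
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine

def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), in mol/L."""
    return a280 / (epsilon * path_cm)

# Hypothetical enzyme: 9 Trp, 20 Tyr, 10 disulfide bridges
eps = extinction_280(9, 20, 10)
conc = molar_concentration(0.8055, eps)   # hypothetical A280 reading
print(f"epsilon = {eps} 1/(M cm), c = {conc * 1e6:.1f} µM")
```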

Molecular dynamics simulation

For simplicity, the modular structures of the different cellulases were split into two simulations. The CDs were simulated in complex with a cellononaose ligand and the CBMs (if present) were simulated bound to a cellulose crystal.

Simulations of the catalytic domains

If available, the structures were taken from the Protein Data Bank (TrCel7A: 4C4C53, TrCel7B: 1EG154, TrCel6A: 1QK255, TrCel12A: 1H8V56, TrCel5A: 3QR357, HiCel45A: 4ENG58, ReCel7A: 3PL3). The ligand was inserted by alignment if a related structure with a similar large disaccharide was available. Otherwise, docking was performed with AutoDock Vina59. The ten lowest-energy clusters were inspected, and the lowest-energy configuration from the cluster with the shortest distance between the catalytic residues and the glycosidic bond of interest was selected. The CHARMM36 force field was used to describe the system60. All simulations were run in GROMACS 2018.661. Catalytic acids of all CDs were protonated. GROMACS was used to construct a cuboid box with edge lengths of 9.4 × 9.4 × 20 nm and the complexes were positioned at 4.7, 4.7, and 3.3 nm. The complexes were rotated so that the vector connecting the centers of mass of the last and the fourth-last sugar units of the ligand was parallel to the z-axis. The systems were solvated with TIP3P water. To neutralize the net charges of the systems, random water molecules were exchanged with sodium ions. Energy minimization was performed by steepest descent over 10,000 iterations. All subsequent simulations were performed at 300 K. NVT-simulations were performed for 100 ps while keeping the complex restrained. Thereafter, NPT-simulations with restraints on the solutes were performed for 100 ps. For all further simulations, only Cα atoms more than 1.5 nm from the ligand were restrained. A second round of NPT-simulations with the new restraints was performed for 100 ps. RMSD analysis of the protein backbone showed that this time was sufficient to reach an equilibrated state. Thereafter, steered MD simulations were run for 800 ps with a pulling rate of 0.01 nm/ps and a force constant of 1000 kJ/mol/nm². The pull was applied to the first sugar unit of the cellononaose ligand in the z direction.
The resulting trajectories were used to prepare further simulations. Frames were extracted for every 0.5 Å travelled by the ligand, up to a final distance of 1 nm between the CD and the ligand. The extracted frames were used as starting configurations for umbrella sampling simulations along the binding path. Each window was simulated for 620 ps, of which the first 20 ps were disregarded as equilibration. It should be noted that TrCel6A works from the opposite end compared to the other cellulases21. The set-up was adapted accordingly.
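The frame extraction can be sketched as a simple selection over the pulled trajectory. The sketch assumes that a per-frame ligand displacement (in nm, relative to the first frame) has already been obtained from the steered MD run, and it picks the first frame that reaches each 0.05 nm increment up to 1 nm. The function name, the one-frame-per-picosecond output, and the idealized displacement profile are assumptions for illustration.

```python
def select_windows(displacements, spacing=0.05, max_displacement=1.0):
    """Return frame indices to use as umbrella-sampling starting points.

    displacements: ligand displacement per frame (nm), relative to frame 0.
    The first frame that reaches each multiple of `spacing` is selected,
    up to and including `max_displacement`.
    """
    tol = 1e-9                                  # guards against float rounding
    n_windows = int(round(max_displacement / spacing))
    picked = []
    k = 0                                       # index of the next target
    for frame, d in enumerate(displacements):
        if k > n_windows:
            break
        if d >= k * spacing - tol:
            picked.append(frame)
            k = int((d + tol) // spacing) + 1   # skip targets already passed
    return picked

# Idealized pull: 0.01 nm/ps for 800 ps, one frame per ps (assumed output rate).
# In a real run the ligand lags behind the pulling spring and fluctuates.
displacements = [0.01 * t for t in range(801)]
windows = select_windows(displacements)
print(len(windows), "windows, first frames:", windows[:5])
```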

Simulations of the carbohydrate-binding modules

If available, the structures were taken from the Protein Data Bank (CBM1 of TrCel7A: 2CBH, CBM1 of TrCel7B: 4BMF). Otherwise, they were prepared through homology modelling with Modeller62 (CBM1 of TrCel6A, CBM1 of TrCel5A, CBM1 of HiCel45A). A cellulose crystal of the type Iβ with a length of 5, a width of 6, and a depth of 3 unit cells was generated with the Cellulose Builder web server63. The CBMs were placed on the surface according to Beckham et al.64. A cubic box with a minimal distance of 1.0 nm was constructed. The crystal plane was oriented perpendicular to the z-axis. The simulations were performed in a similar fashion to those for the CDs. However, the heavy atoms of the crystals were kept constrained after the energy minimization, and the second NPT-simulation was extended to 1 ns to let the CBM settle on the crystal surface.

Analysis

Analysis of the trajectories was performed with GROMACS. The weighted histogram analysis method (WHAM) was applied to analyze the umbrella sampling simulations along the binding path65. If density gaps occurred, additional windows at those distances were inserted iteratively until no gaps remained. From the resulting potential of mean force (PMF) curves, the energy difference between the minimum and the maximum was taken. The errors were estimated with bootstrapping. The ΔGB,MD values obtained for the CD and CBM parts were added to give values for the full enzyme. The energies were normalized by the values from the reference enzyme TrCel6A. This resulted in ΔΔGB,MD values, which are more readily comparable to the experimental ΔΔGB,exp. Linear regressions of the experimental binding energy and experimental activation energy against the computed binding energy were performed. The former resulted in a linear fit of the form y = 0.16x + 0.2 with a squared Pearson correlation coefficient r² = 0.93, and the latter resulted in y = 0.13x + 0.67 with r² = 0.81. To counteract the known systematic overestimation by the method66,67,68, and of carbohydrate binding in general69,70, a linear transformation was applied to the initially computed binding energies using the parameters from the linear regressions. The final results for the prediction of the binding energies had a root-mean-squared error (RMSE) of 0.86 kJ/mol, and those for the prediction of the activation energy had an RMSE of 1.20 kJ/mol.
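The post-processing of the PMF curves can be sketched in a few lines. The PMF values and the computed energies below are hypothetical; only the regression parameters (0.16 and 0.2) are taken from the text. The sign convention of the binding energy is not specified here, so the sketch simply reports the maximum minus the minimum of the curve.

```python
import math

def binding_energy_from_pmf(pmf):
    """Energy difference between the maximum and minimum of a PMF curve."""
    return max(pmf) - min(pmf)

def rescale(x, slope, intercept):
    """Linear transformation of a computed energy onto the experimental scale."""
    return slope * x + intercept

def rmse(predicted, observed):
    """Root-mean-squared error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

# Hypothetical PMF (kJ/mol) along the unbinding coordinate
pmf = [0.0, -12.0, -40.0, -25.0, -5.0, 0.5]
dg = binding_energy_from_pmf(pmf)

# Hypothetical computed values relative to the reference, then rescaled
# with the binding-energy regression reported in the text (y = 0.16x + 0.2)
ddg_md = [0.0, 25.0, -10.0]
ddg_scaled = [rescale(x, 0.16, 0.2) for x in ddg_md]

# Hypothetical experimental values for the same three enzymes
ddg_exp = [0.0, 4.0, -1.0]
print(f"PMF depth = {dg:.1f} kJ/mol, RMSE = {rmse(ddg_scaled, ddg_exp):.2f} kJ/mol")
```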

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.