Insights into the mechanism(s) of digestion of crystalline cellulose by plant class C GH9 endoglucanases

Kundu, Siddhartha

doi:10.1007/s00894-019-4133-1

Original Paper
Open access
Published: 23 July 2019

Volume 25, article number 240, (2019)

Journal of Molecular Modeling

Siddhartha Kundu ORCID: orcid.org/0000-0003-3962-776X¹

A Correction to this article was published on 27 July 2020

This article has been updated

Abstract

Biofuels such as γ-valerolactone, bioethanol, and biodiesel are derived from potentially fermentable cellulose and vegetable oils. Plant class C GH9 endoglucanases are CBM49-encompassing hydrolases that cleave the β (1 → 4) glycosidic linkage between contiguous D-glucopyranose residues of crystalline cellulose. Here, I analyse 3D homology models of characterised and putative class C enzymes to glean insights into the contributions of the GH9, linker, and CBM49 to the mechanism(s) of crystalline cellulose digestion. Crystalline cellulose may be accommodated in a surface groove that is imperfectly bounded by the GH9_CBM49, GH9_linker, and linker_CBM49 surfaces, and thence digested in a solvent-accessible subsurface cavity. The physical dimensions of the groove, and distortions thereof, are mediated in part by the bulky side chains of its constituent aromatic amino acids, which may also impose a strained geometry on the bound cellulose polymer. These data, together with an almost complete absence of measurable cavities, a poorly conserved, hydrophobic, and heterogeneous amino acid composition, increased atomic motion of the CBM49_linker junction, and docking experiments with ligands of lower degrees of polymerization, suggest a modulatory rather than direct role for CBM49 in catalysis. Crystalline cellulose is the de facto substrate for CBM-containing plant and non-plant GH9 enzymes, a finding supported by exceptional sequence and structural homology. However, despite the implied similarity in general acid-base catalysis of crystalline cellulose, this study also highlights qualitative differences in substrate binding and glycosidic bond cleavage amongst class C members. The results presented may aid the development of novel plant-based GH9 endoglucanases that could extract and utilise potentially fermentable carbohydrates from biomass.

Introduction

The microfibrillar structure of cellulose is constituted and strengthened by islands of hydrogen-bonded inter-glucan chains. These microcrystalline regions (I_α, I_β) render cellulose chemically inert and recalcitrant to most physical stressors, an attribute that is desirable to land plants (xylem, phloem), sporulating bacteria and fungi, and quorum sensing by microbial biofilms [1,2,3,4,5,6,7,8]. Most organisms (bacteria, fungi, protists) possess enzymes (oxidoreductases, EC 1.x.y.z; transferases, EC 2.x.y.z; hydrolases, EC 3.x.y.z) that can cleave cellulose into physiologically relevant oligo- and mono-saccharides (RXNs 1–3) [2, 9,10,11,12,13,14,15,16].

$$ \left\{{C}_2,{C}_{2+}\right\}+\left\{{e}^{-};2{e}^{-}\right\}\rightleftharpoons \left\{{C}_2=O,{C}_{2+}=O\right\} $$

(RXN 1)

$$ {C}_n+{H}_iP{O}_4\leftrightharpoons C- OP{O}_3+{C}_{n-1} $$

(RXN 2)

$$ {C}_n+k{H}_2O\leftrightharpoons {m}_1{C}_{n-i}+k{H}_2O\leftrightharpoons {m}_2{C}_{2-5}+k{H}_2O\leftrightharpoons {m}_n{C}_1 $$

(RXN 3)

$$ {\displaystyle \begin{array}{ccc}{C}_n& := & \mathrm{Glucan}\\ {}C& := & D\left(\alpha \right)-\mathrm{glucopyranose}\ \mathrm{phosphate}\\ {}i& \in & \left\{1,2,3\right\}\\ {}{C}_2& := & \mathrm{Cellulose}\ \mathrm{with}\ \mathrm{degree}\ \mathrm{of}\ \mathrm{polymerization}\ \left( DP=2\right)\\ {}{C}_{2+}& := & \mathrm{Cellulose}\ \mathrm{with}\ \mathrm{degree}\ \mathrm{of}\ \mathrm{polymerization}\ \left( DP>2\right)\\ {}{C}_2=O& := & \mathrm{Lactone}\ \mathrm{form}\ \mathrm{of}\ {C}_2\\ {}{C}_{2+}=O& := & \mathrm{Lactone}\ \mathrm{form}\ \mathrm{of}\ {C}_{2+}\\ {}{C}_{n-i}& := & \mathrm{Shorter}\ \mathrm{chain}\ \mathrm{glucans}\\ {}{C}_{2-5}& := & \mathrm{Oligosaccharides}\ \left( DP\in \left\{2,3,4,5\right\}\right)\ \mathrm{of}\ \beta (D)-\mathrm{glucopyranose}\\ {}{C}_1& := & \mathrm{Monosaccharide}\ \mathrm{of}\ \beta (D)-\mathrm{glucopyranose}\\ {}{m}_j& := & \mathrm{Stoichiometry}\ \mathrm{of}\ \mathrm{short}\ \mathrm{chain}\ \mathrm{glucans}\ \left({m}_1<{m}_2<{m}_3<\dots <{m}_n\right)\end{array}} $$

Glycoside hydrolase 9 (GH9) endoglucanases (EC 3.2.1.4) hydrolytically cleave the β (1 → 4)-glycosidic linkage between contiguous (D)-glucopyranose residues, and accomplish this with the aid of one or more carbohydrate binding modules (CBMs). Detailed phylogenetic analysis and molecular dating have shown that GH9 (≅480 AA) is very well conserved amongst taxa and has been so since ≈3000 Mya [8, 17]. The presence of active site residues in GH9 further implies that catalysis of crystalline cellulose proceeds by a relatively unchanged, generic acid-base mechanism that may deploy aspartic (D) and/or glutamic (E) acids as alternating proton donors/acceptors. The arrangement of these, i.e. {EE, DD, DE, ED}, may then dictate the position of the −OH at the hemiacetal/acetal carbon (anomeric carbon; {C1, C2}) of the oligosaccharide products, thereby retaining or inverting the configuration of the parent compound [18].

Carbohydrate-binding modules (CBMs) or carbohydrate-binding domains (CBDs) form distinct subsequences in eukaryotes (plants, CBM49; yeast, CBM54), protists (Dictyostelium discoideum, CBM8), fungi (CBM1), and bacteria (CBMs 2-4) [8, 17, 18]. Most CBMs are separated from the GH domain(s) by linkers (<100 AA), and vary in length (≈40–200 AA), number, position (N-, C-termini, central), substrate affinity, and contribution to catalysis [8, 17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41]. For example, GH9 endoglucanases from vascular land plants possess a unique subpopulation of CBM49-encompassing crystalline cellulose-digesting enzymes (class C) in addition to the amorphous cellulose-cleaving subsets (classes A and B) [17, 18, 42,43,44]. The presence of one or more CBMs may also extend the range of substrates of GH9 enzymes to include complex heteropolymeric moieties (chitin, CBM5, 12, 14, 18, 33; polygalacturonic acid, CBM32; lipopolysaccharide/lipoteichoic acid, CBM39) [8, 17, 19, 35,36,37,38,39,40,41]. The precise mechanism(s) by which CBM-mediated catalysis proceeds remain(s) debated, with several plausible explanations for the observed kinetic data [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]. Most CBMs possess non-contiguous aromatic amino acids (tryptophan/phenylalanine/tyrosine) interspersed with amino acids with shorter side chains. These could result in concomitant and non-uniform interactions with the glycosidic linkage(s) and consecutive cycles of stretching and relaxation. This mechanism favours the introduction of strain, with consequent weakening of the glycosidic linkage [33, 34, 45,46,47]. Alternatively, there are reports that polar amino acids (serine/threonine/cysteine) could form complexes with calcium (CBM35, 36, 60) which, even in the absence of an overt CBM, may mediate cleavage [48,49,50].

Extant structures of non-plant GH9 enzymes suggest that crystalline cellulose may be digested in subtle, fully enclosed tunnels (processive), or in larger, open, solvent-accessible grooves/clefts (non-processive), although a mixed mode is likely to prevail in most enzymes [51,52,53,54,55,56,57,58,59,60]. The binding site(s) are labelled as plus (substrate, entrance) and minus (product, exit) sites, with hydrolytic cleavage occurring between the +1 and −1 sites [51,52,53,54,55,56,57]. The length of the tunnel itself (≈50 Ang) is consistent amongst GH9 enzymes, and it consists of about ten subsites (−7 to +2) where amino acids make contact with the glucan chain [51,52,53,54,55,56,57]. Further insights into the mechanistic contributions of GH9, linker, and/or the CBMs may be gleaned from the X-ray structures of enzymes in complex with simple (DP < 9; DP = {2, 3, 5}) or complex (DP = 10; −SH) oligosaccharides [58,59,60]. For example, GH9 and CBM3 are distinct spatial entities (Cel9G, Clostridium cellulolyticum; CelE4, Thermomonospora fusca) with an interaction surface that comprises a network of hydrogen-bonded residues [59, 60]. However, in the absence of an active enzyme substrate (ES) complex (DP ≥ 6), the manner in which polymeric crystalline cellulose is processed by GH9 enzymes is not known [59]. Interestingly, the authors of these studies also report an inter-dependence, or quasi-allostericity, of the GH9 and CBMs in binding crystalline cellulose, a substrate-binding groove lined with polar and aromatic amino acid residues, and the possibility of a polyfunctional CelE4 with exo- and endo-glucanase activities [59, 60]. Crystalline cellulose is the cognate substrate for GH9 endoglucanases in non-plant taxa such as bacteria, archaea, fungi, protists, and arthropods, and these enzymes may predate plant GH9 enzymes by several millions of years [8].
This, when combined with the similarity between the GH9 domains, suggests that the active site architecture of plant class C enzymes and the subsequent reaction chemistry may be similar [8, 51, 52]. Whilst the data cited above offer insights into the origin and evolution of plant class C enzymes, mechanistic details are fundamental to comprehending the precise manner in which catalysis of crystalline cellulose may proceed. Here, I analyse homology models of putative and characterised plant class C sequences, i.e. those with a single well-defined CBM49 subsequence, to classify them and to infer the contribution(s) of the GH9, CBM49, and linker to the catalysis of crystalline cellulose.

Methods

Model generation, geometry optimization, equilibration, and MD of class C enzymes

A generic protocol to assess the contribution(s) of GH9, linker, and CBM49 has been outlined (Fig. 1). Laboratory-characterised full length (FL) and truncated (T) class C sequences (x) from Oryza sativa (Q5NAT0), Gossypium hirsutum (Q8LJP6), Nicotiana tabacum (Q93WY9), Solanum lycopersicum (Q9ZSP9), i.e. x(FL) = x_FL = GH9 ∪ L ∪ CBM49; x(T) = x_T = GH9 ∪ L; x ∈ {Q5NAT0, Q8LJP6, Q93WY9, Q9ZSP9}, along with full-length putative class C sequences (n = 92) identified in previous work were submitted to Phyre2 (www.sbg.bio.ic.ac.uk/phyre2) [8, 18, 61]. The templates were graded in terms of the root mean squared deviation (rmsd) of their Cα-backbones from the predicted model, presence of an extant homologous structure (confidence), proportion of the sequence modelled (coverage), and sequence identity.

The LEaP module of AMBERTOOLS v17.0 was used to explicitly add water molecules (TIP3P) to the 3D models of characterised class C enzymes (n = 4; x_FL, x_T) and to render the modelled structures electrically neutral ({Na⁺, Cl⁻} ≥ 1) (Fig. 1) [62]. The models were optimised by minimizing their computed energies in a bi-phasic (n_min1 = n_min2 = 5000) implementation of the steepest descent algorithm, with (100 kcal mol⁻¹ Ang⁻²) and without positional restraints on the amino acids (Fig. 1, Table 1). The minimised models $ \left({x}_{FL_{\mathrm{min}}},{x}_{T_{\mathrm{min}}}\right) $ were utilised for comparative analyses to ascertain the significance and relevance of CBM49 to the structural integrity of the protein. Full length minimised structures were perturbed (Temp: 0.0 K → 300.0 K; constant volume; 20 ps) with weaker (10.0 kcal mol⁻¹ Ang⁻²) positional restraints on the amino acids. This was followed by an unrestrained equilibration (Temp = 300.0 K; constant pressure; 100 ps) and a production-grade MD run (40.1 ns) with NAMD v2.13 (nanoscale molecular dynamics) and VMD v1.9.3 (visual molecular dynamics; configuration files) (Fig. 1, Table 1) [63, 64]. These models, i.e. $ {x}_{FL_{40.1 ns}};x\in \left\{Q5 NAT0,Q8 LJP6,Q93 WY9,Q9 ZSP9\right\} $, were used to infer active site architecture, to perform docking experiments, and to identify structural homologues of selected characterised class C enzymes (Fig. 1, Table 1).

Table 1 Parameters for minimizing, equilibrating, and simulating 3D structures of characterised class C GH9 endoglucanases

Invariant core analysis of characterised and putative class C enzymes

The invariant core is a measure of structural variation inferred from the xyz coordinates of aligned atoms of amino acids at specific site(s). It was utilised to assess the conservation of GH9, linker, and CBM49. This was accomplished by generating multiple sequence alignments (MSA) with a standalone version of multiple sequence comparison by log-expectation (MUSCLE; http://drive5.com/muscle), in association with the R-package Bio3D (http://thegrantlab.org/bio3d) and scripts developed in house (Fig. 1) [65,66,67]. The invariant core was then computed iteratively and is defined as the set of aligned positions with the least structural variation, i.e. with ellipsoid volumes V < 1.0 Ang³, as distinct from the more variable positions (V ≥ 1.00 Ang³). Here, each ellipsoid is defined by the eigenvalues (variances) of the atomic xyz coordinates along its three principal axes at every aligned position of the combined, ungapped MSA, and its volume represents the structural variation at that position [67,68,69,70]. Although alanine is not the most hydrophobic amino acid (kdH_Ala < kdH_Met < kdH_Cys < kdH_Phe < kdH_Leu < kdH_Val < kdH_Ile; kdH ≔ Kyte–Doolittle hydrophobicity index), its non-bulky and unbranched side chain renders it an excellent index of invariance of a given structure. Truncating the proteins might be expected to alter dramatically the behaviour of the GH9 domain in the 3D models. A corrected subset (O. sativa, #AA = 456; N. tabacum, #AA = 466; G. hirsutum, #AA = 464; S. lycopersicum, #AA = 476) was therefore used, comprising the residues of the full length proteins that matched those of the truncated forms $ \left(x\left({cFL}_{\mathrm{min}}\right)={x}_{cFL_{\mathrm{min}}}\right) $, i.e.

$$ {x}_{cF{L}_{\mathrm{min}}}={x}_{F{L}_{\mathrm{min}}}-\left({x}_{F{L}_{\mathrm{min}}}-{x}_{T_{\mathrm{min}}}\right) $$

(1)

for comparative analyses $ \left({x}_{cFL_{\mathrm{min}}}\ vs\ {x}_{T_{\mathrm{min}}}\right) $ where x ∈ {Q5NAT0, Q8LJP6, Q93WY9, Q9ZSP9}. Since the number of characterised class C enzymes was small (n = 4), a larger MSA was generated that included 3D models of putative class C enzymes (n = 92). The eigenvalues of the lowest invariant core (0 < V ≤ 1.0 Ang³) were then investigated with principal component analysis (PCA), which in turn was used to cluster and identify structural homologues of characterised class C enzymes. The aligned models were thence utilised to infer plausible active-site architecture(s) of plant class C enzymes.
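The iterative core search described above can be illustrated with a short Python/NumPy sketch. It is a simplified stand-in, not Bio3D's `core.find()` and not the in-house scripts used here. Three elements are assumptions made for illustration: the per-position volume (4/3)π√(λ₁λ₂λ₃), computed from the eigenvalues of each position's xyz covariance; the stopping rule (every remaining position has V ≤ 1.0 Ang³); and the synthetic coordinates.

```python
import numpy as np

def superpose(coords, core):
    """Kabsch-fit every structure in coords (S, N, 3) onto structure 0,
    using only the positions listed in `core`; all N positions are moved."""
    ref = coords[0, core]
    ref_c = ref.mean(axis=0)
    fitted = np.empty_like(coords)
    for s, xyz in enumerate(coords):
        sub = xyz[core]
        sub_c = sub.mean(axis=0)
        h = (sub - sub_c).T @ (ref - ref_c)          # 3x3 cross-covariance
        u, _, vt = np.linalg.svd(h)
        d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against reflections
        rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
        fitted[s] = (xyz - sub_c) @ rot.T + ref_c
    return fitted

def position_volumes(coords):
    """Assumed ellipsoid volume per aligned position: (4/3)*pi*sqrt(l1*l2*l3),
    with l1..l3 the eigenvalues of that position's 3x3 xyz covariance."""
    vols = np.empty(coords.shape[1])
    for pos in range(coords.shape[1]):
        lam = np.linalg.eigvalsh(np.cov(coords[:, pos, :].T))
        vols[pos] = 4.0 / 3.0 * np.pi * np.sqrt(np.clip(lam, 0.0, None).prod())
    return vols

def find_core(coords, cutoff=1.0):
    """Drop the most variable position, re-fit, and repeat until every
    remaining position has a volume <= cutoff (Ang^3)."""
    core = list(range(coords.shape[1]))
    while len(core) > 3:
        vols = position_volumes(superpose(coords, core)[:, core, :])
        if vols.max() <= cutoff:
            break
        core.pop(int(vols.argmax()))
    return core

# Hypothetical data: 8 aligned models x 20 C-alpha positions; positions 15-19
# are made highly variable, the others jitter by only ~0.01 Ang.
rng = np.random.default_rng(0)
base = rng.uniform(0.0, 20.0, size=(20, 3))
models = base + rng.normal(0.0, 0.01, size=(8, 20, 3))
models[:, 15:, :] += rng.normal(0.0, 3.0, size=(8, 5, 3))
core = find_core(models)
print(core)
```

Re-superposing on the shrinking core after each removal keeps highly variable regions from biasing the fit of the conserved positions.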

Structural analysis of 3D models of plant class C GH9 enzymes

Low frequency (ω), non-trivial normal modes (NM) (ω(NM) > 0, NM > 6; ω ∈ ℝ, NM ∈ ℕ) were computed for the superposed 3D models as well as for the individual minimised proteins $ \left( NM\left({x}_{FL_{\mathrm{min}}}\right)={NM}_{x_{FL_{\mathrm{min}}}}, NM\left({x}_{T_{\mathrm{min}}}\right)={NM}_{x_{T_{\mathrm{min}}}}, NM\left({x}_{cFL_{\mathrm{min}}}\right)={NM}_{x_{c{FL}_{\mathrm{min}}}}\right) $ and the 40.1 ns MD trajectories $ \left( NM\left({x}_{FL_{40.1 ns}}\right)={NM}_{x_{FL_{40.1 ns}}}\right) $ [67, 71, 72]. Each normal mode investigated is an eigenvector computed from the combined oscillatory motion of the Cα atoms under a generic force field, and possesses a characteristic eigenvalue (Fig. 1). As discussed above, the corrected subset (x_cFL) of each protein was used for comparative analyses $ \left({x}_{cFL_{\mathrm{min}}}\ vs\ {x}_{T_{\mathrm{min}}}\right) $ where x ∈ {Q5NAT0, Q8LJP6, Q93WY9, Q9ZSP9}. A modified rmsf-based score $ \left(\mathrm{rmsf}\left({x}_{cFL_{\mathrm{min}}}\right)={\mathrm{rmsf}}_{x_{c{FL}_{\mathrm{min}}}},\mathrm{rmsf}\left({x}_{T_{\mathrm{min}}}\right)={\mathrm{rmsf}}_{x_{T_{\mathrm{min}}}}\right) $ was formulated as under:

$$ \Delta {\mathrm{rmsf}}_{x_{c{FL}_{\mathrm{min}}}}=\max \left({\mathrm{rmsf}}_{x_{c{FL}_{\mathrm{min}}}}\right)-\min \left({\mathrm{rmsf}}_{x_{c{FL}_{\mathrm{min}}}}\right) $$

(2)

$$ \Delta {\mathrm{rmsf}}_{x_{T_{\mathrm{min}}}}=\max \left({\mathrm{rmsf}}_{x_{T_{\mathrm{min}}}}\right)-\min \left({\mathrm{rmsf}}_{x_{T_{\mathrm{min}}}}\right) $$

(3)
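A minimal sketch follows to illustrate two points: why only modes NM > 6 are non-trivial, and how a Δrmsf score (Eqs 2 and 3) and σ_rmsf can be obtained from the normal modes of a single minimised structure. It swaps in a plain anisotropic network model (ANM) with a uniform spring constant and a 15 Ang cutoff, both of which are assumptions. Bio3D's `nma()` uses its own C-alpha force field by default, so absolute values will differ. The coordinates are a hypothetical random Cα cloud, and fluctuations are in arbitrary units.

```python
import numpy as np

def anm_hessian(ca, cutoff=15.0, gamma=1.0):
    """3N x 3N ANM Hessian for C-alpha coordinates `ca` (N, 3): a spring of
    constant gamma joins every pair of atoms closer than `cutoff` (Ang)."""
    n = len(ca)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = ca[j] - ca[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2        # off-diagonal super-element
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            hess[3*i:3*i+3, 3*i:3*i+3] -= block         # diagonal = -sum of off-diagonals
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    return hess

def mode_fluctuations(evals, evecs, trivial=6):
    """Per-atom rms fluctuation (arbitrary units) from the non-trivial modes:
    sqrt(sum_k |v_k,i|^2 / lambda_k) over modes k > 6."""
    v, lam = evecs[:, trivial:], evals[trivial:]
    per_coord = (v ** 2 / lam).sum(axis=1)               # 3N values
    return np.sqrt(per_coord.reshape(-1, 3).sum(axis=1))

rng = np.random.default_rng(1)
ca = rng.uniform(-10.0, 10.0, size=(50, 3))              # hypothetical C-alpha cloud (Ang)
evals, evecs = np.linalg.eigh(anm_hessian(ca))           # ascending eigenvalues
rmsf = mode_fluctuations(evals, evecs)
delta_rmsf = rmsf.max() - rmsf.min()                     # as in Eqs (2)/(3)
sigma_rmsf = rmsf.std()
print(evals[:8].round(6), round(delta_rmsf, 3), round(sigma_rmsf, 3))
```

The first six eigenvalues are numerically zero because they correspond to rigid-body translations and rotations. Fluctuations are therefore summed over the remaining modes only, each weighted by 1/λ; mode frequencies scale as ω ∝ √λ.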

The Δrmsf scores (Eqs 2 and 3), in tandem with the standard deviation $ \left({\sigma}_{\mathrm{rmsf}}\left({x}_{cFL_{\mathrm{min}}},{x}_{T_{\mathrm{min}}}\right)\right) $, were used to assess and compare the influence of atomic motion on the structural organization of characterised class C proteins. Correlated displacements of residues in each full length protein after the MD run $ \left({x}_{FL_{40.1 ns}}\right) $ were also examined with the dynamic cross-correlation map (DCCM). The DCCM is the normalised covariance matrix of the positional fluctuations $ \left(\mathrm{rmsf}\left({x}_{FL_{40.1 ns}}\right)={\mathrm{rmsf}}_{x_{FL_{40.1 ns}}}\right) $ of every Cα atom of each class C protein $ \left(\operatorname{cov}\left({x}_{F{L}_{40.1 ns}},{x}_{F{L}_{40.1 ns}}\right)\right)\forall $x ∈ {Q5NAT0, Q8LJP6, Q93WY9, Q9ZSP9} (Fig. 1). These investigations were complemented by computing the surfaces, cavities, and grooves present in the GH9, linker, and CBM49 regions, or at their interfaces, using the SPDBV (Swiss-PdbViewer) suite of programs (https://spdbv.vital-it.ch) (Fig. 1) [73]. A cylinder of minimum area and volume was used to model, and thence approximate, the dimensions (radius ≔ r, height ≔ h, length ≔ l; r, h, l ∈ ℝ₊) of the predicted substrate binding and cleaving groove(s) necessary to accommodate and digest crystalline cellulose. The formulae derived for this purpose are as under:

$$ {A}_o\cong \varnothing +{A}_c=(2)\left(\pi \right)(r)\left(r+h\right) $$

(4)

$$ {V}_o\cong \beta +{V}_c=\left(\pi \right)\left({r}^2\right)(h) $$

(5)

Differentiating with respect to h and solving for r and h yields the formulae

$$ r=\sqrt{A_o/\left(4\pi \right)} $$

(6)

$$ h=(4)\left({V}_o\right)/{A}_o $$

(7)

$$ l={A}_o/r $$

(8)

$$ {\displaystyle \begin{array}{ccc}{A}_o& := & \mathrm{Computed}\ \mathrm{area}\ \mathrm{of}\ \mathrm{wide}\ \mathrm{groove}\ \left({Ang}^2\right)\\ {}{V}_o& := & \mathrm{Computed}\ \mathrm{volume}\ \mathrm{of}\ \mathrm{wide}\ \mathrm{groove}\ \left({Ang}^3\right)\\ {}r& := & \mathrm{Radius}\ \mathrm{of}\ \mathrm{approximating}\ \mathrm{cylinder}\\ {}h& := & \mathrm{Height}\ \mathrm{of}\ \mathrm{approximating}\ \mathrm{cylinder}\\ {}l& := & \mathrm{Length}\ \mathrm{of}\ \mathrm{groove}\ (Ang)\\ {}{A}_c& := & \mathrm{Computed}\ \mathrm{area}\ \mathrm{of}\ \mathrm{approximating}\ \mathrm{cylinder}\ \left({Ang}^2\right)\\ {}{V}_c& := & \mathrm{Computed}\ \mathrm{volume}\ \mathrm{of}\ \mathrm{approximating}\ \mathrm{cylinder}\ \left({Ang}^3\right)\\ {}\varnothing & := & \mathrm{Constant}\ \mathrm{of}\ \mathrm{approximation}\ (Area)\\ {}\beta & := & \mathrm{Constant}\ \mathrm{of}\ \mathrm{approximation}\ (Volume)\end{array}} $$

The difference data, i.e. ∅ = |A_o − A_c| and β = |V_o − V_c|, were then used to quantify and characterise this approximation.
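Eqs (6)–(8) and the approximation constants ∅ and β can be evaluated directly. The Python sketch below does so for a hypothetical groove (A_o = 400 Ang², V_o = 1000 Ang³); these are illustrative inputs, not values from Table 5.

```python
import math

def groove_cylinder(area, volume):
    """Approximating-cylinder dimensions from a computed groove area (Ang^2)
    and volume (Ang^3), using Eqs (4)-(8) exactly as written in the text."""
    r = math.sqrt(area / (4.0 * math.pi))       # Eq (6)
    h = 4.0 * volume / area                     # Eq (7)
    l = area / r                                # Eq (8)
    a_c = 2.0 * math.pi * r * (r + h)           # Eq (4), cylinder area
    v_c = math.pi * r ** 2 * h                  # Eq (5), cylinder volume
    return {"r": r, "h": h, "l": l,
            "phi": abs(area - a_c),             # constant of approximation (area)
            "beta": abs(volume - v_c)}          # constant of approximation (volume)

dims = groove_cylinder(area=400.0, volume=1000.0)   # hypothetical groove
print({k: round(v, 3) for k, v in dims.items()})
```

With Eqs (6) and (7) as written, πr² = A_o/4 and hence V_c = V_o exactly. β therefore vanishes up to floating-point error, and the quality of the approximation is carried by ∅ alone.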

Ligand preparation and utilization

The degree of polymerization (DP) was utilised to shortlist candidate cellulose oligomers (2 ≤ DP ≤ 8) and their stereoisomers from the ZINC12 and PubChem databases (http://www.ncbi.nlm.nih.gov/pubchem; http://zinc.docking.org) [74, 75]. Briefly, three ligands for each of 2 ≤ DP ≤ 4 and one for each of 5 ≤ DP ≤ 8 were utilised (n = 3 × 3 + 4 = 13) for this analysis (Fig. 1, Table 2). The ligands were downloaded in the isomeric SMILES format and built with a local installation of ChemSketch. An initial geometry optimization was performed with ChemSketch itself, followed by a further 500–2000 cycles of optimization with the steepest descent and Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithms [76]. These were implemented with a local installation of ArgusLab using the universal force field (UFF) parameters of its molecular mechanics component (http://www.arguslab.com/arguslab.com) [77]. Additional relevant parameters for this step were the cutoff for non-bonded interactions (8.0 Ang) and data updates after every 20 steps. The optimization converged for all the ligands tested, with a net energy of < −8 kcal mol⁻¹. The xyz coordinates, along with other relevant information, were encoded as a pdb file and uploaded to the DockingServer (https://www.dockingserver.com/web) [78]. Finally, the geometries of all the uploaded ligands (n = 13) were optimised with the Merck molecular force field (MMFF94), with partial charges added by the semi-empirical PM6 method, all rotatable bonds delineated, and non-polar hydrogen atoms merged [79, 80].
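Because the ligands were shortlisted by DP, a quick consistency check on downloaded structures is their molecular formula. A linear β (1 → 4) glucan of DP n consists of n glucose units (C6H12O6) joined by n − 1 glycosidic bonds, each of which releases one water, giving C(6n)H(10n+2)O(5n+1). The sketch below tabulates this for 2 ≤ DP ≤ 8 using standard average atomic masses; it is an illustrative check, not part of the original workflow.

```python
# Standard average atomic masses (g/mol)
MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

def cellodextrin_formula(dp):
    """Molecular formula of a linear beta-(1->4) glucan of degree of
    polymerization dp: dp * C6H12O6 minus (dp - 1) * H2O."""
    return {"C": 6 * dp, "H": 10 * dp + 2, "O": 5 * dp + 1}

def molar_mass(formula):
    """Molar mass (g/mol) of an element -> count mapping."""
    return sum(MASS[element] * count for element, count in formula.items())

for dp in range(2, 9):
    f = cellodextrin_formula(dp)
    print(dp, f"C{f['C']}H{f['H']}O{f['O']}", round(molar_mass(f), 2))
```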

Table 2 Ligands utilised in docking experiments

Docking experiments of characterised plant class C GH9 endoglucanases

3D models of characterised plant class C GH9 endoglucanases $ \left({x}_{FL_{40.1 ns}};x\in \left\{Q5 NAT0,Q8 LJP6,Q93 WY9,Q9 ZSP9\right\}\right) $ were uploaded to the DockingServer (https://www.dockingserver.com/web) [78]. The server, with the aid of AutoDock, added the necessary hydrogens and atomic charges and utilised a grid of 100 × 100 × 100 points with a spacing of 0.375 Ang [81]. For all the proteins, the grid was positioned to include the previously delineated interaction surfaces of GH9, linker, and CBM49. Non-covalent interactions (van der Waals, electrostatics) were computed using the AutoDock parameter set. Docking was performed using the Lamarckian genetic algorithm and a local search method, after the initial position, orientation, and torsion angles of the ligand molecules were set randomly [81, 82]. Data for a single experiment were derived from 100 independent runs (∆translation = 0.2 Ang; ∆torsion = ∆quaternion = 5°). Each run was set to terminate after a preset maximum number of energy evaluations (E_evals = 2500000; population = 150). The contribution of the interacting residues to the catalysis of crystalline cellulose was inferred from the free energy $ \left(x\left({\Delta G}_y\right)={x}_{{\Delta G}_y}\right) $ and the inhibition constant $ \left(x\left({Ki}_y\right)={x}_{Ki_y}\right) $ ∀x ∈ {Q5NAT0, Q8LJP6, Q93WY9, Q9ZSP9}; y ∈ {C21, C22, C23, C31, C32, C33, C41, C5, C6, C7, C8}.
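AutoDock reports an estimated inhibition constant by converting the predicted binding free energy through Ki = exp(ΔG/RT). The sketch below reproduces that conversion, assuming T = 298.15 K. The ΔG values are hypothetical and are not taken from Table 7.

```python
import math

R_KCAL = 1.98719e-3   # gas constant, kcal mol^-1 K^-1
T = 298.15            # assumed temperature, K

def ki_from_dg(dg_kcal):
    """Estimated inhibition constant (mol/L) from a binding free energy
    in kcal/mol, via Ki = exp(dG / RT)."""
    return math.exp(dg_kcal / (R_KCAL * T))

for dg in (-3.0, -5.0, -7.0):   # hypothetical docking scores
    print(dg, f"{ki_from_dg(dg):.3e} M")
```

At this temperature, each ≈1.364 kcal mol⁻¹ decrease in ΔG (RT ln 10) lowers Ki roughly ten-fold.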

Results

Data organization and arrangement

The pipeline and the data generated at each of its steps are organised as follows:

Step 0: Parameters were defined for protocols to minimise, equilibrate, and preliminarily characterise 3D models of plant class C GH9 endoglucanases and ligands of cellulose (Fig. 1, Tables 1 and 2)
Step 1: The 3D fold of sequences of characterised (full length, truncated) and putative plant class C GH9 endoglucanases was determined (Figs. 1 and 2, Table 3; Supplementary Text 1).
Step 2: The 3D models of characterised class C enzymes were minimised and used to assess contributions of the linker and CBM49 to the structural integrity of the protein (potential energy calculations, rms deviation, normal mode analysis, root mean square fluctuations) (Fig. 3, Table 4; Supplementary Texts 2–5).
Step 3: The minimised full length 3D models of characterised class C enzymes were perturbed, equilibrated (300K; 120ps), and simulated with a molecular dynamics run (300K; 40.1 ns) (Fig. 4, Supplementary Text 6).
Step 4: The MD simulated characterised class C plant GH9 endoglucanases were analysed (invariant core analysis, surface contact analysis, cavity and groove delineation, normal mode analysis, docking) to garner insights into the architecture and composition of putative active sites (Figs. 5, 6, and 7, Tables 5, 6, 7, and 8; Supplementary Texts 7–9).
Step 5: Structural homologues of selected characterised and putative class C enzymes were identified with a PCA-based clustering schema and analysed to derive insights into the mechanism(s) of digesting crystalline cellulose by plant class C GH9 endoglucanases (Figs. 8 and 9, Table 9; Supplementary Text 10).

Table 3 Fold identification by homology modelling of plant GH9 endoglucanases

Table 4 Frequencies of non-trivial low frequency modes of 3D models of characterised and minimised class C enzymes

Table 5 Dimensions of putative crystalline cellulose binding cleft of full length characterised class C enzymes after 40.1 ns MD-run

Table 6 Computed data for cleaned and prepared ligands

Table 7 Docking calculations to assess contribution of ligand interacting amino acids in full length class C enzyme after 40.1 ns MD-run

Table 8 Distribution and composition of amino acids that may interact with cellulose-based ligands (2 ≤ DP ≤ 8)

Table 9 Major groove dimensions of putative plant class C enzymes (n = 39)

Homology modelling and assessment of characterised class C GH9 endoglucanases

An intersequence pairwise alignment suggests that, despite a high degree of identity (≈75–83%) between the class C enzymes of S. lycopersicum, G. hirsutum, and N. tabacum, the preferred template for G. hirsutum was from T. fusca (PDBID: 1JS4). Conversely, O. sativa, whose sequence identity was marginally lower (≈62%), shared the same top-ranked template, i.e. C. cellulolyticum (PDBID: 1GA2), with S. lycopersicum and N. tabacum (Table 3; Supplementary Text 1). However, the average sequence identity with the templates (≈32–40%) was similar for all class C enzymes investigated (Table 3). Superposing the ungapped MSA of the truncated (x_T) class C proteins additionally excluded the linker from the MSA, i.e. CBM49 ≡ CBM49 ∪ L, so that x_T = x_FL − CBM49 = x_FL − (CBM49 ∪ L); x ∈ {Q5NAT0, Q8LJP6, Q93WY9, Q9ZSP9} (Fig. 2). The results (rmsd(template, x) < 2 Ang) suggest that the catalytic machinery for digesting crystalline cellulose may be conserved in plants and in non-plant taxa, most notably bacteria (Table 3; Supplementary Text 1) [8, 17, 20,21,22,23,24,25,26,27,28,29,30,31,32,33,34, 59, 60]. The models also indicate that, in addition to GH9, CBM49 and the linker (coverage = 89–94%) may partake in digesting crystalline cellulose (Table 3) [8, 17, 59, 60].
Since solvent was added explicitly, energy minimization (E_min) was carried out exclusively with the steepest descent algorithm (ncyc > maxcyc) for the full length $ \left({E}_{\mathrm{min}}\left(Q5 NAT{0}_{FL_{\mathrm{min}}}\right)\cong -4.36\ast {10}^5\ \mathrm{kcal}\ {\mathrm{mol}}^{-1};{E}_{\mathrm{min}}\left(Q93 WY{9}_{FL_{\mathrm{min}}}\right)\cong -4.34\ast {10}^5\ \mathrm{kcal}\ {\mathrm{mol}}^{-1};{E}_{\mathrm{min}}\left(Q8 LJP{6}_{FL_{\mathrm{min}}}\right)\cong -4.31\ast {10}^5\ \mathrm{kcal}\ {\mathrm{mol}}^{-1};{E}_{\mathrm{min}}\left(Q9 ZSP{9}_{FL_{\mathrm{min}}}\right)\cong -4.29\ast {10}^5\mathrm{kcal}\ {\mathrm{mol}}^{-1}\right) $ and truncated $ \left({E}_{\mathrm{min}}\left(Q5 NAT{0}_{T_{\mathrm{min}}}\right)\cong -2.58\ast {10}^5\ \mathrm{kcal}\ {\mathrm{mol}}^{-1};{E}_{\mathrm{min}}\left(Q93 WY{9}_{T_{\mathrm{min}}}\right)\cong -3.61\ast {10}^5\ \mathrm{kcal}\ {\mathrm{mol}}^{-1};{E}_{\mathrm{min}}\left(Q8 LJP{6}_{T_{\mathrm{min}}}\right)\cong -3.49\ast {10}^5\ \mathrm{kcal}\ {\mathrm{mol}}^{-1};{E}_{\mathrm{min}}\left(Q9 ZSP{9}_{T_{\mathrm{min}}}\right)\cong -3.70\ast {10}^5\ \mathrm{kcal}\ {\mathrm{mol}}^{-1}\right) $ models (Fig. 3, Tables 1 and 3; Supplementary Text 3). Interestingly, the energy ranks $ \left(\mathrm{Rank}\left({E}_{\mathrm{min}}\left({x}_{FL_{\mathrm{min}}}\right)\right)=\mathrm{Rank}\left({E}_{\mathrm{min}}\left({x}_{T_{\mathrm{min}}}\right)\right)=\left\{2,3\right\};x=\left\{Q93 WY9,Q8 LJP6\right\}\right) $ were consistent for N. tabacum and G. hirsutum, whereas they were completely reversed for O. sativa and S. lycopersicum $ \left(\mathrm{Rank}\ \left({E}_{\mathrm{min}}\left(Q5 NAT{0}_{FL_{\mathrm{min}}},Q9 ZSP{9}_{T_{\mathrm{min}}}\right)\right)\propto 1/\mathrm{Rank}\left({E}_{\mathrm{min}}\left(Q5 NAT{0}_{T_{\mathrm{min}}},Q9 ZSP{9}_{FL_{\mathrm{min}}}\right)\right)\right) $ (Fig. 3; Supplementary Text 3).
These data suggest that full length class C enzymes may adopt a stable conformation earlier than their truncated counterparts. Interestingly, the rms deviations of the minimised full length class C enzymes were higher than those of the truncated forms for O. sativa $ \left(\mathrm{rmsd}\left(Q5 NAT{0}_{FL_{\mathrm{min}}}\right)/\mathrm{rmsd}\left(Q5 NAT{0}_{T_{\mathrm{min}}}\right)\cong 2.31\right) $, G. hirsutum $ \left(\mathrm{rmsd}\left(Q8 LJP{6}_{FL_{\mathrm{min}}}\right)/\mathrm{rmsd}\left(Q8 LJP{6}_{T_{\mathrm{min}}}\right)\cong 1.05\right) $, and S. lycopersicum $ \left(\mathrm{rmsd}\left(Q9 ZSP{9}_{FL_{\mathrm{min}}}\right)/\mathrm{rmsd}\left(Q9 ZSP{9}_{T_{\mathrm{min}}}\right)\cong 2.84\right) $, whereas the reverse was observed for N. tabacum $ \left(\mathrm{rmsd}\left(Q93 WY{9}_{FL_{\mathrm{min}}}\right)/\mathrm{rmsd}\left(Q93 WY{9}_{T_{\mathrm{min}}}\right)\cong 0.89\right) $ (Fig. 3; Supplementary Text 3).
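The rank comparison of the minimised energies above can be reproduced directly from the reported E_min values (rank 1 = most negative). A minimal Python sketch:

```python
# E_min values (kcal/mol) as reported for the minimised models
E_FL = {"Q5NAT0": -4.36e5, "Q93WY9": -4.34e5, "Q8LJP6": -4.31e5, "Q9ZSP9": -4.29e5}
E_T = {"Q5NAT0": -2.58e5, "Q93WY9": -3.61e5, "Q8LJP6": -3.49e5, "Q9ZSP9": -3.70e5}

def rank_lowest_first(energies):
    """Map accession -> rank, where rank 1 is the lowest (most negative) energy."""
    ordered = sorted(energies, key=energies.get)
    return {acc: i + 1 for i, acc in enumerate(ordered)}

rank_fl, rank_t = rank_lowest_first(E_FL), rank_lowest_first(E_T)
for acc in E_FL:
    print(acc, rank_fl[acc], rank_t[acc])
```

This gives ranks (FL, T) of (1, 4) for Q5NAT0, (2, 2) for Q93WY9, (3, 3) for Q8LJP6, and (4, 1) for Q9ZSP9. These are the consistent and reversed orderings described in the text.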

Assessing the contribution of CBM49 to the structural integrity of class C enzymes

The core data for the 3D models of all full length characterised plant class C enzymes suggest that, while GH9 is well conserved (#Cα_{0.0 < V ≤ 100.0}(GH9) > 0), CBM49 is not (#Cα_{0.0 < V ≤ 100.0}(CBM49) = 0). The N- and C-terminal regions of the linker do, however, exhibit partial conservation (#Cα_{8.0 < V ≤ 100.0}(Linker) = {1, 3}), a trend that is unlikely to be sustained for larger datasets (Supplementary Text 2). Low frequency non-trivial modes, i.e. $ {NM}_{{x_{FL}}_{\mathrm{min}}}={NM}_{{x_T}_{\mathrm{min}}}=7-18 $, were also assessed to garner additional information about the possible role(s) of CBM49 and the linker in influencing the structure of GH9 (Table 4; Supplementary Texts 4 and 5). With the exception of O. sativa, the frequencies of these modes for the full length $ \left(\Delta \omega \left({NM}_{{x_{FL}}_{\mathrm{min}}}\right)\right) $ class C members (G. hirsutum, N. tabacum, S. lycopersicum) were ≈2–3-fold higher than those of their truncated forms, i.e. $ \Delta \omega \left({NM}_{{x_{FL}}_{\mathrm{min}}}\right)>\Delta \omega \left({NM}_{{x_T}_{\mathrm{min}}}\right) $ (Table 4; Supplementary Texts 4 and 5). Conversely, the frequencies of the truncated models $ \left(\Delta \omega \left({NM}_{{x_T}_{\mathrm{min}}}\right)\right) $ were in general ≈2–5-fold higher for all modes examined in O. sativa, and for the higher frequency modes in S. lycopersicum, i.e. $ \Delta \omega \left({NM}_{{x_{FL}}_{\mathrm{min}}}\right)<\Delta \omega \left({NM}_{{x_T}_{\mathrm{min}}}\right) $ (Table 4; Supplementary Texts 4 and 5). The atomic fluctuation data are summarised below for three of the enzymes:

- G. hirsutum: $ \left(\Delta {\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\cong 1.74,\sigma \left({\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\right)\cong 0.21;\Delta {\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\cong 758.00,\sigma \left({\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\right)\cong 41.12\right) $
- N. tabacum: $ \left(\Delta {\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\cong 3.28,\sigma \left({\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\right)\cong 0.29;\Delta {\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\cong 673.61,\sigma \left({\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\right)\cong 37.56\right) $
- S. lycopersicum: $ \left(\Delta {\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\cong 2.1,\sigma \left({\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\right)\cong 0.22;\Delta {\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\cong 46.29,\sigma \left({\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\right)\cong 4.30\right) $

For these three enzymes, the truncated models exhibited much greater variance than the full length proteins, i.e. $ \sigma \left({\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\right)>>\sigma \left({\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\right) $ (Fig. 3; Supplementary Texts 4 and 5). Interestingly, the corresponding data for O. sativa differed only marginally $ \left(\Delta {\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\cong 2.49,\sigma \left({\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\right)\cong 0.26;\Delta {\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\cong 2.77,\sigma \left({\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\right)\cong 0.22\right) $ (Fig. 3; Supplementary Texts 4 and 5). The baseline rmsf values were remarkably consistent for all the proteins examined $ \left(\min \left({\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\right)\cong \min \left({\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\right)\cong 0.07\right) $, although for O. sativa there was a tangible difference, i.e. $ \min \left({\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\right)\cong 0.07,\min \left({\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\right)\cong 0.05 $ (Table 4; Supplementary Texts 4 and 5). A position-specific analysis of these data shows that this heightened oscillatory motion involves residues of the linker and CBM49 (Fig. 3, Table 4; Supplementary Texts 4 and 5).
These data when combined suggests that CBM49 and the linker, despite being poorly conserved even amongst class C members, may deploy corrective hypermobility to rapidly restore equilibrial status secondary to perturbation events such as that observed for substrate binding and subsequent catalysis by enzymes.
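The rmsf statistics quoted above (range, baseline minimum, and standard deviation of per-residue fluctuations) can be illustrated with a minimal NumPy sketch. It assumes a trajectory array of shape (frames, atoms, 3) that has already been superposed onto a reference; the function names are hypothetical, and reading Δrmsf as the range (max − min) of the profile is an assumption rather than the author's stated definition. The study derived its fluctuations from normal mode analysis, which this sketch does not reproduce.

```python
import numpy as np


def rmsf(traj: np.ndarray) -> np.ndarray:
    """Per-atom root-mean-square fluctuation about the mean structure.

    traj has shape (n_frames, n_atoms, 3) and is assumed to be already
    superposed; no rotational/translational fitting is done here.
    """
    mean_pos = traj.mean(axis=0)        # average structure, (n_atoms, 3)
    disp = traj - mean_pos              # per-frame displacements
    sq = (disp ** 2).sum(axis=2)        # squared distances, (n_frames, n_atoms)
    return np.sqrt(sq.mean(axis=0))     # time average, then square root


def summarise(profile: np.ndarray) -> dict:
    """Range (assumed meaning of Δrmsf), baseline minimum and SD of a profile."""
    return {
        "range": float(profile.max() - profile.min()),
        "min": float(profile.min()),
        "sd": float(profile.std()),
    }


# Toy example: atom 0 is static, atom 1 oscillates ±1 Å along x.
traj = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
])
profile = rmsf(traj)
print(profile)             # [0. 1.]
print(summarise(profile))  # {'range': 1.0, 'min': 0.0, 'sd': 0.5}
```

Comparing the `sd` entry for a full length model with that of its truncated counterpart mirrors the $ \sigma \left({\mathrm{rmsf}}_{{x_T}_{\mathrm{min}}}\right)\gg \sigma \left({\mathrm{rmsf}}_{{x_{cFL}}_{\mathrm{min}}}\right) $ comparison made above.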

Delineating the active site architecture of characterised plant class C enzymes

A multi-modal approach (surface contact analysis, docking, and cavity and groove delineation) was adopted to identify the residues relevant to crystalline cellulose digestion by plant class C enzymes.

Analysing the DCCM to assess and characterise intra-protein residue interactions

The NMA and DCCM data of mature-folded (40.1 ns) class C enzymes suggest that several residues within the non-contiguous segments between GH9, the linker, and CBM49 exhibit positively correlated atomic displacements (r ≅ 1.00) (Fig. 4, Supplementary Texts 6–9). These data imply that plant class C enzymes, like their bacterial counterparts, may also possess well-defined interaction surface(s) $ IS=\left\{{IS}_x^{GC},{IS}_x^{CL},{IS}_x^{GL}\right\} $ between GH9, the linker, and CBM49 (Fig. 5) [59, 60]. The surface area of the interacting residues was variable and ranged from 375–517 Ang² for CBM49_linker ($ \equiv {IS}_x^{CL} $), 283–481 Ang² for GH9_linker ($ \equiv {IS}_x^{GL} $), and 96–208 Ang² for GH9_CBM49 ($ \equiv {IS}_x^{GC} $), where x ∈ {Q5NAT0, Q8LJP6, Q93WY9, Q9ZSP9}. The surfaces themselves may be further decomposed into non-contiguous subsegments, i.e. G = G1 ∪ G2 ∪ G3 and C = C1 ∪ C2 ∪ C3. Thus, $ {IS}_x^{GC}=G2\cup G3\cup C2\cup C3 $, $ {IS}_x^{GL}=G1\cup G2\cup G3\cup L $, and $ {IS}_x^{CL}=C1\cup C2\cup C3\cup L $ (Fig. 5). In general, the contact surface between GH9 and CBM49 was the smallest and that between CBM49 and the linker the largest, i.e. $ {IS}_x^{GC}<{IS}_x^{GL}<{IS}_x^{CL} $ for $ x\in \left\{\mathrm{Q5NAT0},\mathrm{Q8LJP6},\mathrm{Q93WY9},\mathrm{Q9ZSP9}\right\} $. The only exception was the class C enzyme from O. sativa ($ {IS}_{\mathrm{Q5NAT0}}^{GL}>{IS}_{\mathrm{Q5NAT0}}^{CL} $), which can be explained by a large interaction surface spanning GH9, CBM49, and the linker ($ {IS}_{\mathrm{Q5NAT0}}^{GLC}=G1\cup G2\cup G3\cup C1\cup L $), i.e. $ {IS}_{\mathrm{Q5NAT0}}^{GL}\equiv {IS}_{\mathrm{Q5NAT0}}^{GLC} $. The interactions between the residues comprising these protein-protein interaction surfaces ($ {AA}_x^{IS};x\in \left\{\mathrm{Q5NAT0},\mathrm{Q8LJP6},\mathrm{Q93WY9},\mathrm{Q9ZSP9}\right\} $) were non-covalent (hydrophobic, hydrogen bonding, van der Waals) for N. tabacum, G. hirsutum, and S. lycopersicum. Here, too, the contact surface of the class C enzyme from O. sativa was exceptional and included the possibility of a covalent, oxygen-sensitive (−SS−) linkage between C124 and M5/M132 (Fig. 5).
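The dynamic cross-correlation matrix (DCCM) referred to above normalises the covariance of atomic displacements, $ {C}_{ij}=\left\langle \Delta {r}_i\cdot \Delta {r}_j\right\rangle /\sqrt{\left\langle \Delta {r}_i^2\right\rangle \left\langle \Delta {r}_j^2\right\rangle } $, so that r ≅ 1 marks atoms moving in phase and r ≅ −1 atoms moving in opposition. The sketch below computes it from a superposed set of coordinate snapshots; the study derived its DCCM from NMA, which is not reproduced here. The segment index lists and the 0.9 cut-off used to pick cross-segment pairs are hypothetical choices for illustration.

```python
import numpy as np


def dccm(traj: np.ndarray) -> np.ndarray:
    """Dynamic cross-correlation matrix from a (n_frames, n_atoms, 3) trajectory.

    Atoms with zero fluctuation would cause a division by zero, so the
    trajectory is assumed to contain only mobile atoms.
    """
    disp = traj - traj.mean(axis=0)                              # Δr_i per frame
    cov = np.einsum("fik,fjk->ij", disp, disp) / traj.shape[0]   # <Δr_i · Δr_j>
    norm = np.sqrt(np.diag(cov))                                 # sqrt(<Δr_i²>)
    return cov / np.outer(norm, norm)


def correlated_pairs(c: np.ndarray, seg_a, seg_b, threshold: float = 0.9):
    """Pairs (i in seg_a, j in seg_b) whose correlation is at least threshold."""
    return [(i, j, float(c[i, j])) for i in seg_a for j in seg_b if c[i, j] >= threshold]


# Toy trajectory: atoms 0 and 1 move together, atom 2 moves in opposition.
traj = np.array([
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    [[-1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
c = dccm(traj)
# Atom 0 stands in for a hypothetical "GH9" segment, atoms 1-2 for a "linker" segment.
print(correlated_pairs(c, seg_a=[0], seg_b=[1, 2]))  # [(0, 1, 1.0)]
```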

Docking data suggests qualitative differences between individual class C enzymes

The binding energy was generally lower for the higher molecular weight ligands ($ {\mathrm{x}}_{\Delta {G}_{C8}}<{\mathrm{x}}_{\Delta {G}_{C5}}<{\mathrm{x}}_{\Delta {G}_{C6}}\le {\mathrm{x}}_{\Delta {G}_{C4}}\le {\mathrm{x}}_{\Delta {G}_{C3}}<{\mathrm{x}}_{\Delta {G}_{C2}}<{\mathrm{x}}_{\Delta {G}_{C7}} $), with C8 possessing the lowest ($ {\mathrm{x}}_{\Delta {G}_{C8}}\cong -7.44\ \mathrm{kcal}\ {\mathrm{mol}}^{-1}=\min \left({\mathrm{x}}_{\Delta {G}_y}\right) $) and, interestingly, C7 the highest free energy of binding ($ {\mathrm{x}}_{\Delta {G}_{C7}}\cong -2.67\ \mathrm{kcal}\ {\mathrm{mol}}^{-1}=\max \left({\mathrm{x}}_{\Delta {G}_y}\right) $) for all the class C enzymes investigated. These data were also supported by the corresponding Ki values, i.e. $ {\mathrm{x}}_{Ki_{C8}}\cong 3.56\ \upmu \mathrm{M}=\min \left({\mathrm{x}}_{Ki_y}\right) $ and $ {\mathrm{x}}_{Ki_{C7}}\cong 7.58\ \mathrm{mM}=\max \left({\mathrm{x}}_{Ki_y}\right) $, where x ∈ {Q5NAT0, Q8LJP6, Q93WY9, Q9ZSP9} and y ∈ {C21, C22, C23, C31, C32, C33, C41, C5, C6, C7, C8} (Figs. 5 and 6, Tables 5 and 6). The distribution of specific amino acids identified by docking ($ {AA}_x^{\mathrm{Dock}}\subset {AA}_x^{IS};x\in \left\{\mathrm{Q5NAT0},\mathrm{Q8LJP6},\mathrm{Q93WY9},\mathrm{Q9ZSP9}\right\} $) suggests a preponderance of residues with small hydrophobic, aromatic, and basic side chains, along with serine and threonine. Notably, the catalytic amino acids aspartic acid (D) and glutamic acid (E) were almost completely absent from the docking-derived residue sets (the only exceptions being D465 in O. sativa and D139 and D451 in S. lycopersicum), as were cysteine (C) and histidine (H), other amino acids with a known proclivity to partake in catalysis (Table 7).
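The ΔG and Ki values above are two views of the same docking estimate. AutoDock 4, for example, reports an estimated inhibition constant through $ {K}_i=\exp \left(\Delta G/RT\right) $. The sketch below assumes that relation with T = 298.15 K and R in kcal mol⁻¹ K⁻¹; whether these exact settings were used in this study is an assumption. The figures quoted in the text may be summary values across enzymes, so converting one into the other need not reproduce the tabulated Ki values exactly.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant, kcal mol^-1 K^-1
T_K = 298.15                  # assumed temperature, K


def ki_from_dg(dg_kcal: float, temp: float = T_K) -> float:
    """Estimated inhibition constant (M) from a binding free energy (kcal/mol)."""
    return math.exp(dg_kcal / (R_KCAL * temp))


def dg_from_ki(ki_molar: float, temp: float = T_K) -> float:
    """Inverse relation: binding free energy (kcal/mol) from Ki (M)."""
    return R_KCAL * temp * math.log(ki_molar)


# Lowest ΔG quoted in the text (ligand C8)
print(f"{ki_from_dg(-7.44) * 1e6:.2f} uM")  # 3.52 uM
```

Because the relation is exponential, a ligand with a more negative ΔG always maps to a smaller Ki, which is why the ordering of ΔG and Ki across C8 and C7 agrees.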

Delineating the cavities and grooves for crystalline cellulose catalysis and modification by plant class C GH9 endoglucanases

Since solvent accessibility is a prerequisite for hydrolytic cleavage of the glycosidic linkage by GH9 endoglucanases, the amino acids previously identified by docking were examined for their presence in the cavities and grooves of the 3D models of the full length characterised class C enzymes. Their distribution for O. sativa (GH9 = 27, L = 0, CBM49 = 1, LC = 4, GC = 1), G. hirsutum (GH9 = 21, L = 0, CBM49 = 0, LC = 4, GC = 1), N. tabacum (GH9 = 20, L = 0, CBM49 = 0, LC = 4, GC = 0), and S. lycopersicum (GH9 = 23, L = 1, CBM49 = 2, LC = 1, GC = 0) suggests that CBM49/linker may modulate catalysis by modifying the substrate rather than participating directly (Figs. 5 and 6, Tables 5, 6, and 7). The amino acids that comprise these cavities and grooves were enumerated ($ {AA}_x^{CvG};x\in \left\{\mathrm{Q5NAT0},\mathrm{Q8LJP6},\mathrm{Q93WY9},\mathrm{Q9ZSP9}\right\} $) and analysed (Table 7). The combined residue set, $ {AA}_x=\left({AA}_x^{\mathrm{Dock}}\cap {AA}_x^{CvG}\right)\subset {AA}_x^{IS};x\in \left\{\mathit{O.\,sativa},\mathit{G.\,hirsutum},\mathit{N.\,tabacum},\mathit{S.\,lycopersicum}\right\} $, was used to compute the dimensions (length ≅ 100–130 Ang, radius ≅ 8.0–10.4 Ang, height ≅ 2.2–2.8 Ang) of a probable architecture for the active site(s) of plant class C GH9 endoglucanases (Fig. 7, Tables 7 and 8). Whilst the volume of the approximating cylinder matched the observed volume for all class C enzymes ($ \left|{V}_o-{V}_c\right|=\beta \cong 0 $), the differences in surface area (∅ ≅ 250–510 Ang², mean ≅ 679 ± 137.66) could imply intrinsic heterogeneity in the side chains of the amino acids that line these grooves (Fig. 7, Tables 7 and 8).
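The cylinder approximation above can be checked with elementary geometry. The sketch assumes a right circular cylinder of radius r and height h, with volume $ {V}_c=\pi {r}^2h $ and total surface area $ 2\pi r\left(r+h\right) $, and computes the mismatch $ \beta =\left|{V}_o-{V}_c\right| $ against an observed volume $ {V}_o $. How the reported length, radius, and height map onto the cylinder used in the study is not stated in this section, so the parameter choice and the example numbers are illustrative assumptions only.

```python
import math


def cylinder_volume(r: float, h: float) -> float:
    """Volume of a right circular cylinder (Å^3 if r and h are in Å)."""
    return math.pi * r ** 2 * h


def cylinder_surface(r: float, h: float) -> float:
    """Total surface area: two end caps plus the lateral wall (Å^2)."""
    return 2.0 * math.pi * r * (r + h)


def beta(v_observed: float, r: float, h: float) -> float:
    """Absolute mismatch |V_o - V_c| between observed and approximating volumes."""
    return abs(v_observed - cylinder_volume(r, h))


# Hypothetical groove at the lower ends of the reported ranges: r = 8.0 Å, h = 2.2 Å
r, h = 8.0, 2.2
print(round(cylinder_volume(r, h), 1))   # 442.3
print(round(cylinder_surface(r, h), 1))  # 512.7
```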

Principal component-based clustering to identify potential class C homologues

The variance of the xyz coordinates at each ungapped aligned position (n = 363) was computed and summarised as eigenvalues (n = 1089 = 3 × 363). A scatter plot of the principal components (PC1 ≈ 73% ≡ x axis; PC3 ≈ 5% ≡ z axis) partitioned the class C enzymes (n = 96) into four distinct groups, i.e. quadrants (x, z) ∈ {(−, −), (−, +), (+, −), (+, +)}. Since most of the characterised members (n = 3; O. sativa, S. lycopersicum, G. hirsutum) belonged to a single cluster, these and the associated putative class C members (n = 39; Arabidopsis spp., B. stricta, B. distachyon, B. rapa, C. rubella, C. sinensis, E. grandis, E. salsugineum, G. max, G. raimondii, L. usitatissimum, M. domestica, M. truncatula, M. guttatus, P. virgatum, P. trichocarpa, P. persica, S. purpurea, S. moellendorffii, S. lycopersicum, S. tuberosum, Z. mays; sequence identity ≈ 3–49%) could be used to draw meaningful inferences about the generic active site and the mechanism(s) deployed by plant class C enzymes to digest crystalline cellulose (Fig. 8; Supplementary Table 1 and Supplementary Text 10). Interestingly, the members (n = 22) of quadrant (+, −) included the bryophyte P. patens spp. and O. sativa spp., whereas the sequences (n = 39) in quadrant (−, −) included the tracheophyte S. moellendorffii spp. (Fig. 8; Supplementary Table 1 and Supplementary Text 10). The presence of these ancestral class C members, i.e. tracheophytes, further strengthens the rationale for selecting this group, since it represents organisms that may have evolved over 400 million years ago; any mechanism postulated for the digestion of crystalline cellulose would therefore likely have remained unchanged over that period [8]. Quadrants (−, +) (n = 19), which included the characterised class C enzyme from N. tabacum, and (+, +) (n = 13) had a distribution of plant members similar to that of group 1 (−, −) (Fig. 8; Supplementary Table 1 and Supplementary Text 10).
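The principal component step described above can be sketched as follows. The sketch assumes a NumPy array of superposed Cα coordinates at the ungapped aligned positions, flattens each structure into a row of 3 × 363 = 1089 values, and obtains the principal components from a singular value decomposition of the centred matrix. The function name and the sign-based quadrant labelling are hypothetical illustrations of the approach, not the author's pipeline. The full 1089 × 1089 covariance matrix has 1089 eigenvalues, but with n structures at most n − 1 of them are non-zero, so the SVD returns only min(n, 1089) values.

```python
import numpy as np


def pca_quadrants(coords: np.ndarray, pc_x: int = 0, pc_z: int = 2):
    """PCA of superposed coordinates plus sign-based quadrant labels.

    coords: (n_structures, n_positions, 3) Cα coordinates at ungapped
    aligned positions, assumed already superposed (requires n_structures >= 3).
    """
    n = coords.shape[0]
    x = coords.reshape(n, -1)                  # one row per structure
    xc = x - x.mean(axis=0)                    # centre each coordinate column
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    eigvals = s ** 2 / (n - 1)                 # covariance-matrix eigenvalues
    explained = eigvals / eigvals.sum()        # fraction of variance per PC
    scores = u * s                             # projections of structures onto PCs
    # PC signs from an SVD are arbitrary, so quadrant names are only
    # meaningful relative to one another within a single analysis.
    labels = [
        f"({'+' if a >= 0 else '-'}, {'+' if b >= 0 else '-'})"
        for a, b in scores[:, [pc_x, pc_z]]
    ]
    return explained, scores, labels


rng = np.random.default_rng(0)
demo = rng.normal(size=(12, 363, 3))           # 12 random "structures"
explained, scores, labels = pca_quadrants(demo)
print(explained[:3].round(3), labels[:4])
```

Plotting PC1 against PC3 (indices 0 and 2) corresponds to the x and z axes used in the text.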

Discussion

Contribution of the GH9, linker, and CBM49 to the architecture of the active site of plant class C enzymes

Plant class C enzymes share considerable structural homology with Gram-positive and Gram-negative bacterial GH9 members (Tables 1, 2, and 3; Supplementary Texts 1 and 2). Although these results for GH9 are not entirely unexpected, data from this study also support the involvement of the linker and CBM49 in the catalysis of crystalline cellulose by plant class C enzymes (Table 3; Supplementary Texts 1 and 2) [8, 17, 20–34, 59, 60]. The inclusion of the N- and C-terminal regions of the linker only at higher volumes (V ∈ (8.0, 100.0]), and the complete exclusion of CBM49 even amongst this small subset of class C enzymes, suggest poor conservation of these segments (Figs. 5 and 8a; Supplementary Text 2) [8, 81]. These data raise the possibility that the linker and CBM49 may play an indirect or modulatory role in glycosidic bond cleavage, partaking in substrate selection/modification rather than in direct catalysis (Figs. 5 and 8a; Supplementary Text 2).

The digestion of crystalline cellulose in non-plant taxa may occur in a continuous groove that spans the GH9, the linker, and the associated CBMs [51–60]. Plant class C enzymes may do likewise in a surface groove that is initially bounded by the GH9_linker surface ($ {IS}_x^{GL} $) at the posterior basolateral surface and continues laterally, bounded in turn by the GH9_linker ($ {IS}_x^{GL} $), CBM49_linker ($ {IS}_x^{CL} $), and GH9_CBM49 ($ {IS}_x^{GC} $) surfaces, where x ∈ {Q5NAT0, Q8LJP6, Q93WY9, Q9ZSP9}, before terminating anteriorly in a solvent accessible cavity that might constitute the principal active site (Fig. 5). Although the IS-bounded grooves appear physically discontinuous at the surface, a thorough analysis suggests the presence of several subsurface cavities that could maintain continuity (Figs. 5, 6, and 7). Further, the almost complete absence of measurable cavities in CBM49/linker could ensure that the substrate-facing surface along which crystalline cellulose traverses is chemically inert. This model precludes the existence of disparate active sites whilst concomitantly positing a preparatory/modulatory role for CBM49/linker, which may then be followed by hydrolytic cleavage of the glycosidic bond at the active site (Fig. 7). The rmsf and DCCM data, in concert with the invariant core volumes, further suggest that the interaction surface bounding the linker and CBM49 ($ {IS}_x^{CL};x\in \left\{\mathrm{Q5NAT0},\mathrm{Q8LJP6},\mathrm{Q93WY9},\mathrm{Q9ZSP9}\right\} $) may exhibit heightened low frequency motion, a factor that could confer upon class C enzymes the propensity to accommodate varying lengths of crystalline cellulose (47, Tables 6, 7, and 8; Supplementary Texts 2 and 6–9).

Molecular dissection of a putative active site of class C enzymes

Any plausible model of the active site architecture of plant class C enzymes must both accommodate and explain extant empirical data. Minimised 3D models of the full length characterised members from O. sativa, G. hirsutum, N. tabacum, and S. lycopersicum were simulated in vacuo for 40.1 ns and then examined for amino acids that may contribute to substrate binding and/or catalysis. The combined list of functionally relevant amino acids, i.e. $ {AA}_x=\left({AA}_x^{\mathrm{Dock}}\cap {AA}_x^{CvG}\right)\subset {AA}_x^{IS};x\in \left\{\mathit{O.\,sativa},\mathit{G.\,hirsutum},\mathit{N.\,tabacum},\mathit{S.\,lycopersicum}\right\} $, was enumerated and used for these analyses (Table 7). The paucity or absence of residues that support generic acid-base mediated cleavage of the β (1 → 4) glycosidic bond of crystalline cellulose, as well as of known active site amino acids ($ \left\{E,C,H\right\}\cap {AA}_x^{\mathrm{Dock}}=\varnothing $), despite their being present contiguously with those that are ($ \left\{D,E,C,H,P,R,K,N,Q,L,I,V,A,M,W,F,Y,G,S,T\right\}\subseteq {AA}_x^{IS}\cup {AA}_x^{CvG} $), suggests that catalysis might occur in a superficial cavity just below the surface of the protein (Fig. 5, Table 7). However, the preponderance of energetically favourable aromatic amino acids along the interaction surfaces and grooves (AAA = {W, F, Y} ≈ 15–41%; {W, F, Y} ⊆ AA_x), taken together with previously conducted mutagenesis experiments on CBMs, suggests that cellulose may physically interact with these residues on the surface prior to entering the cavity for catalysis (Figs. 5, 6, and 7, Tables 6, 7, and 8).
The formation of this purported groove may be supported by the consistent presence of proline (P ≈ 3.7–25%) and by stabilizing electrostatic and polar interactions involving arginine (R), lysine (K), asparagine (N), glutamine (Q), serine (S), and threonine (T) ([RKNQ] ≈ 12–25%; [ST] ≈ 10–18.5%), while the groove itself may remain chemically inert along its length, being lined by several amino acids with short hydrophobic side chains, i.e. leucine (L), isoleucine (I), valine (V), methionine (M), and, exceptionally, alanine (A) (HSC ≡ [LIVAM] ≈ 25–37%) (Table 7).
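The residue bookkeeping behind these percentages can be sketched in Python: form $ {AA}_x $ as the intersection of the docking-derived and cavity/groove-derived residue sets, check it against the interaction-surface set, and report the share of each side-chain class used above. Residue labels such as 'W12' (one-letter code followed by position), the example residues, and the group dictionary are hypothetical conventions for illustration; the percentages in the text come from the author's data (Table 7), not from this code.

```python
GROUPS = {
    "aromatic [WFY]": set("WFY"),
    "proline [P]": {"P"},
    "charged/polar [RKNQ]": set("RKNQ"),
    "hydroxyl [ST]": set("ST"),
    "short hydrophobic [LIVAM]": set("LIVAM"),
}


def functional_residues(dock, cav_groove, interface):
    """AA_x = AA_x^Dock ∩ AA_x^CvG, plus any members not found in AA_x^IS."""
    aa = set(dock) & set(cav_groove)
    return aa, aa - set(interface)


def composition(residues) -> dict:
    """Percentage of residues (labels like 'W12') falling in each group."""
    residues = list(residues)
    if not residues:
        return {name: 0.0 for name in GROUPS}
    n = len(residues)
    return {
        name: 100.0 * sum(r[0] in members for r in residues) / n
        for name, members in GROUPS.items()
    }


# Hypothetical residue labels, for illustration only.
dock = {"W12", "F40", "S77", "L90", "P101"}
cvg = {"W12", "F40", "S77", "L90", "K150"}
interface = {"W12", "F40", "S77", "L90", "P101", "K150"}
aa, outside = functional_residues(dock, cvg, interface)
print(sorted(aa), outside)  # ['F40', 'L90', 'S77', 'W12'] set()
print(composition(aa))
```

An empty `outside` set confirms the subset relation $ {AA}_x\subset {AA}_x^{IS} $ stated in the text for the example data.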

Mechanistic insights into crystalline cellulose digestion by plant class C enzymes

The aforementioned discussion notwithstanding, the small sample size could preclude meaningful inference about the mechanism(s) of crystalline cellulose digestion by plant class C GH9 endoglucanases. This limitation was offset by examining 3D models of putative structural homologues of selected class C members (n = 39) (Fig. 8a; Supplementary Table 1 and Supplementary Texts 2 and 10). Data from these suggest that the largest uninterrupted grooves spanning GH9 (l_GH9 ≅ 101–194 Ang) and CBM49 (l_CBM49 ≅ 71–183 Ang) are disjoint and distinct, the only exceptions being the sequences from L. usitatissimum spp. and B. distachyon spp. (Fig. 9, Table 9). Further support for the mechanism(s) proposed for the digestion of crystalline cellulose by plant class C enzymes may be gleaned by examining the 3D models for IS-bounded surface grooves ($ {IS}_x^{GL},{IS}_x^{CL},{IS}_x^{GC} $) in A. coerulea, C. sinensis, C. rubella, S. purpurea, P. persica spp., M. domestica, P. virgatum spp., A. lyrata, B. rapa spp., Z. mays spp., and B. distachyon spp. (Fig. 9, Table 9). Interestingly, the groove located at the interaction surface and bounded concomitantly by GH9, the linker, and CBM49 ($ {IS}_x^{GLC} $), as in L. usitatissimum spp., P. trichocarpa, M. truncatula spp., G. max, and B. distachyon spp., may exert a significant influence on crystalline cellulose in comparison with the distally located and smaller CBM49-bounded grooves (l_CBM49 ≤ 100 Ang) (Fig. 9). These data further complement the hypothesis that plant class C GH9 endoglucanases may possess a dual (processive, non-processive) mode of action, wherein crystalline cellulose is initially acted upon, and thereby modified, by the indenting side chains of aromatic amino acids in an inert and stable quasi-continuous surface groove at the interface(s) of GH9, the linker, and CBM49. Once modified (by induced strain on the glycosidic linkage), crystalline cellulose is driven towards a solvent accessible subsurface cavity.
Here, the conserved GH9 catalytic residues aspartic acid (D) and/or glutamic acid (E) cleave the β (1 → 4) linkage between glucopyranose units by an acid-base catalytic mechanism. The resulting fragments may then be acted upon by exoglucanases to release oligosaccharides (C2–C4). This mechanism is consistent with extant kinetic data such as CBM-mediated modulatory catalysis, offers a molecular explanation for the substrate promiscuity observed for this group of enzymes, and conforms to available structural data from non-plant taxa (Figs. 4, 5, 6, 7, 8, and 9, Tables 4, 5, 6, 7, 8, and 9; Supplementary Table 1 and Supplementary Texts 2–10) [51–60].

Evolutionary significance for CBM49-mediated digestion of crystalline cellulose

The ability of plant class C members to cleave crystalline cellulose depends on the presence of CBM49 and may have evolved directly from non-plant taxa (≈500 Mya) [8, 17, 20–34, 59, 60]. An additional premise explored previously was that plant class C enzymes may not only predate classes A and B but could potentially have diverged into them after CBM49 was excised during mRNA processing [8, 18, 46, 83–85]. A mechanistic understanding of these processes is clearly desirable, since much of the aforementioned data concern kinetic parameters, mRNA expression levels, and sequence information. The present study highlights variations in the CBM49/linker even amongst class C enzymes; provides insights into the architecture, position, plasticity, and composition of the IS-enclosed surface grooves; delineates the position and composition of a contiguous subsurface cavity for catalytic cleavage of the glycosidic linkage; enumerates functionally relevant amino acids that participate in substrate selection/modification; and offers a mechanistic explanation of CBM49-mediated reaction chemistry (Figs. 1, 2, 3, 4, 5, 6, 7, 8, and 9; Tables 1, 2, 3, 4, 5, 6, 7, 8, and 9; Supplementary Table 1 and Supplementary Texts 1–10). Additionally, a substantial body of literature indicates that hyperflexible regions may be intrinsically disordered and therefore have short half-lives ($ {t}_{1/2} $) [63, 86, 87]. This would imply that proteins with the CBM49_linker may be at an evolutionary disadvantage compared with those without. Alternatively, these regions might be encoded by nucleotide sequences with a tendency to form higher-order mRNA substructures such as stem loops, bulges, and bends. These in turn could delay or irreversibly interrupt the ribosomal apparatus, prevent effective translation of the mRNA, and thereby contribute to decreased expression of class C enzymes.
Since CBM49 is central to the ability of plant class C enzymes to digest crystalline cellulose, it would follow that its loss could lead to a decrease in class C enzymes or, conversely, an increase in classes A and B [8].

Conclusions

A detailed biophysical analysis of homology models of characterised and putative class C endoglucanases was carried out to assess the contribution(s) of the GH9, linker, and CBM49 to the catalysis/modification of crystalline cellulose. The work presented in this manuscript corroborates the notion that the linker and CBM49 may complement generic acid-base catalysis by the aspartic/glutamic acid residues of GH9, and may do so in a multitude of ways. These include influencing the structural organization of the protein, participating in critical intra-protein interactions, facilitating the formation of inert and structurally plastic surface grooves, and rendering crystalline cellulose amenable to hydrolytic cleavage. Despite being entirely computational, the findings presented here offer insights not just into the active site geometry of plant class C GH9 endoglucanases, but also into their evolutionary divergence. Whilst most of these findings await experimental validation, the analyses conducted suggest that plant-based conversion of biomass is feasible and may constitute a viable alternative to bacterial-, fungal-, and algal-based protocols.

Change history

27 July 2020
The article Insights into the mechanism(s) of digestion of crystalline cellulose by plant class C GH9 endoglucanases, written by Siddhartha Kundu, was originally published Online First without Open Access.

Abbreviations

AA: Amino acids
CBM: Carbohydrate binding module
DP: Degree of polymerization
EC: Enzyme commission
FL: Full length
GH: Glycoside hydrolase
IS: Interaction surface
L: Linker
NMA: Normal mode analysis
T: Truncated

References

Klemm D, Heublein B, Fink HP, Bohn A (2005) Cellulose: fascinating biopolymer and sustainable raw material. Angew Chem Int Ed Engl 44(22):3358–3393
Augimeri RV, Varley AJ, Strap JL (2015) Establishing a role for bacterial cellulose in environmental interactions: lessons learned from diverse biofilm-producing proteobacteria. Front Microbiol 6:1282
Reardon-Robinson ME, Wu C, Mishra A, Chang C, Bier N, Das A, Ton-That H (2014) Pilus hijacking by a bacterial coaggregation factor critical for oral biofilm development. Proc Natl Acad Sci U S A 111(10):3835–3840
Updegraff DM (1969) Semimicro determination of cellulose in biological materials. Anal Biochem 32(3):420–424
Yoshida Y, Palmer RJ, Yang J, Kolenbrander PE, Cisar JO (2006) Streptococcal receptor polysaccharides: recognition molecules for oral biofilm formation. BMC Oral Health 6(Suppl 1):S12
Agarwal V, Dauenhauer PJ, Huber GW, Auerbach SM (2012) Ab initio dynamics of cellulose pyrolysis: nascent decomposition pathways at 327 and 600 degrees C. J Am Chem Soc 134(36):14958–14972
Paulsen AD, Hough BR, Williams CL, Teixeira AR, Schwartz DT, Pfaendtner J, Dauenhauer PJ (2014) Fast pyrolysis of wood for biofuels: spatiotemporally resolved diffuse reflectance in situ spectroscopy of particles. ChemSusChem
Kundu S, Sharma R (2018) Origin, evolution, and divergence of plant class C GH9 endoglucanases. BMC Evol Biol 18:79
del Campillo E, Gaddam S, Mettle-Amuah D, Heneks J (2012) A tale of two tissues: AtGH9C1 is an endo-beta-1,4-glucanase involved in root hair and endosperm development in Arabidopsis. PLoS One 7(11):e49363
Kundu S (2015) Co-operative intermolecular kinetics of 2-oxoglutarate dependent dioxygenases may be essential for system-level regulation of plant cell physiology. Front Plant Sci 6:489
Tan TC, Kracher D, Gandini R, Sygmund C, Kittl R, Haltrich D, Hallberg BM, Ludwig R, Divne C (2015) Structural basis for cellobiose dehydrogenase action during oxidative cellulose degradation. Nat Commun 6:7542
Westermark U, Eriksson K-E, Daasvatn K, Liaaen-Jensen S, Enzell CR, Mannervik B (1974) Cellobiose:quinone oxidoreductase, a new wood-degrading enzyme from white-rot fungi. Acta Chem Scand 28b:209–214
Schimz KL, Broll B, John B (1983) Cellobiose phosphorylase (EC 2.4.1.20) of cellulomonas: occurrence, induction, and its role in cellobiose metabolism. Arch Microbiol 135(4):241–249
Sheth K, Alexander JK (1969) Purification and properties of beta-1,4-oligoglucan:orthophosphate glucosyltransferase from Clostridium thermocellum. J Biol Chem 244(2):457–464
Ye X, Zhu Z, Zhang C, Zhang YH (2011) Fusion of a family 9 cellulose-binding module improves catalytic potential of Clostridium thermocellum cellodextrin phosphorylase on insoluble cellulose. Appl Microbiol Biotechnol 92(3):551–560
Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42(Database issue):D490–D495
Davison A, Blaxter M (2005) Ancient origin of glycosyl hydrolase family 9 cellulase genes. Mol Biol Evol 22:1273–1284
Kundu S, Sharma R (2016) In silico identification and taxonomic distribution of plant class C GH9 endoglucanases. Front Plant Sci 7:1185
Ficko-Blean E, Boraston AB (2006) The interaction of a carbohydrate-binding module from a Clostridium perfringens N-acetyl-beta-hexosaminidase with its carbohydrate receptor. J Biol Chem 281(49):37748–37757
Duan CJ, Feng YL, Cao QL, Huang MY, Feng JX (2016) Identification of a novel family of carbohydrate-binding modules with broad ligand specificity. Sci Rep 6:19392
Prates ET, Stankovic I, Silveira RL, Liberato MV, Henrique-Silva F, Pereira Jr N, Polikarpov I, Skaf MS (2013) X-ray structure and molecular dynamics simulations of endoglucanase 3 from Trichoderma harzianum: structural organization and substrate recognition by endoglucanases that lack cellulose binding module. PLoS One 8(3):e59069
Boraston AB, Nurizzo D, Notenboom V, Ducros V, Rose DR, Kilburn DG, Davies GJ (2002) Differential oligosaccharide recognition by evolutionarily-related beta-1,4 and beta-1,3 glucan-binding modules. J Mol Biol 319(5):1143–1156
Charnock SJ, Bolam DN, Nurizzo D, Szabo L, McKie VA, Gilbert HJ, Davies GJ (2002) Promiscuity in ligand-binding: the three-dimensional structure of a Piromyces carbohydrate-binding module, CBM29-2, in complex with cello- and mannohexaose. Proc Natl Acad Sci U S A 99(22):14077–14082
Crennell SJ, Cook D, Minns A, Svergun D, Andersen RL, Nordberg Karlsson E (2006) Dimerisation and an increase in active site aromatic groups as adaptations to high temperatures: X-ray solution scattering and substrate-bound crystal structures of Rhodothermus marinus endoglucanase Cel12A. J Mol Biol 356(1):57–71
Kim SJ, Kim SH, Shin SK, Hyeon JE, Han SO (2016) Mutation of a conserved tryptophan residue in the CBM3c of a GH9 endoglucanase inhibits activity. Int J Biol Macromol 92:159–166
Mattinen ML, Kontteli M, Kerovuo J, Linder M, Annila A, Lindeberg G, Reinikainen T, Drakenberg T (1997) Three-dimensional structures of three engineered cellulose-binding domains of cellobiohydrolase I from Trichoderma reesei. Protein Sci 6(2):294–303
Morrill J, Kulcinskaja E, Sulewska AM, Lahtinen S, Stalbrand H, Svensson B, Abou Hachem M (2015) The GH5 1,4-beta-mannanase from Bifidobacterium animalis subsp. lactis Bl-04 possesses a low-affinity mannan-binding module and highlights the diversity of mannanolytic enzymes. BMC Biochem 16:26
Nishijima H, Nozaki K, Mizuno M, Arai T, Amano Y (2015) Extra tyrosine in the carbohydrate-binding module of Irpex lacteus Xyn10B enhances its cellulose-binding ability. Biosci Biotechnol Biochem 79(5):738–746
Parsiegla G, Reverbel-Leroy C, Tardif C, Belaich JP, Driguez H, Haser R (2000) Crystal structures of the cellulase Cel48F in complex with inhibitors and substrates give insights into its processive action. Biochemistry 39(37):11238–11246
Simpson HD, Barras F (1999) Functional analysis of the carbohydrate-binding domains of Erwinia chrysanthemi Cel5 (endoglucanase Z) and an Escherichia coli putative chitinase. J Bacteriol 181(15):4611–4616
Simpson PJ, Xie H, Bolam DN, Gilbert HJ, Williamson MP (2000) The structural basis for the ligand specificity of family 2 carbohydrate-binding modules. J Biol Chem 275(52):41137–41142
Strobel KL, Pfeiffer KA, Blanch HW, Clark DS (2015) Structural insights into the affinity of Cel7A carbohydrate-binding module for lignin. J Biol Chem 290(37):22818–22826
Taylor CB, Talib MF, McCabe C, Bu L, Adney WS, Himmel ME, Crowley MF, Beckham GT (2012) Computational investigation of glycosylation effects on a family 1 carbohydrate-binding module. J Biol Chem 287(5):3147–3155
Yaniv O, Petkun S, Shimon LJ, Bayer EA, Lamed R, Frolow F (2012) A single mutation reforms the binding activity of an adhesion-deficient family 3 carbohydrate-binding module. Acta Crystallogr D Biol Crystallogr 68(Pt 7):819–828
Abbott DW, Hrynuik S, Boraston AB (2007) Identification and characterization of a novel periplasmic polygalacturonic acid binding protein from Yersinia enterolitica. J Mol Biol 367(4):1023–1033
Abramyan J, Stajich JE (2012) Species-specific chitin-binding module 18 expansion in the amphibian pathogen Batrachochytrium dendrobatidis. MBio 3(3):e00150–e00112
Bachman ES, McClay DR (1996) Molecular cloning of the first metazoan beta-1,3 glucanase from eggs of the sea urchin Strongylocentrotus purpuratus. Proc Natl Acad Sci U S A 93(13):6808–6813
Janecek S, Svensson B, MacGregor EA (2011) Structural and evolutionary aspects of two families of non-catalytic domains present in starch and glycogen binding proteins from microbes, plants and animals. Enzym Microb Technol 49(5):429–440
Li S, Yang X, Bao M, Wu Y, Yu W, Han F (2015) Family 13 carbohydrate-binding module of alginate lyase from Agarivorans sp. L11 enhances its catalytic efficiency and thermostability, and alters its substrate preference and product distribution. FEMS Microbiol Lett 362(10)
Newstead SL, Watson JN, Bennet AJ, Taylor G (2005) Galactose recognition by the carbohydrate-binding module of a bacterial sialidase. Acta Crystallogr D Biol Crystallogr 61(Pt 11):1483–1491
Palomo M, Kralj S, van der Maarel MJ, Dijkhuizen L (2009) The unique branching patterns of Deinococcus glycogen branching enzymes are determined by their N-terminal domains. Appl Environ Microbiol 75(5):1355–1362
Libertini E, Li Y, McQueen-Mason SJ (2004) Phylogenetic analysis of the plant endo-beta-1,4-glucanase gene family. J Mol Evol 58(5):506–515
Molhoj M, Pagant S, Hofte H (2002) Towards understanding the role of membrane-bound endo-beta-1,4-glucanases in cellulose biosynthesis. Plant Cell Physiol 43(12):1399–1406
Urbanowicz BR, Bennett AB, Del Campillo E, Catala C, Hayashi T, Henrissat B, Hofte H, McQueen-Mason SJ, Patterson SE, Shoseyov O et al (2007) Structural organization and a standardized nomenclature for plant endo-1,4-beta-glucanases (cellulases) of glycosyl hydrolase family 9. Plant Physiol 144(4):1693–1696
Flint J, Bolam DN, Nurizzo D, Taylor EJ, Williamson MP, Walters C, Davies GJ, Gilbert HJ (2005) Probing the mechanism of ligand recognition in family 29 carbohydrate-binding modules. J Biol Chem 280(25):23718–23726
Montanier C, Flint JE, Bolam DN, Xie H, Liu Z, Rogowski A, Weiner DP, Ratnaparkhe S, Nurizzo D, Roberts SM et al (2010) Circular permutation provides an evolutionary link between two families of calcium-dependent carbohydrate binding modules. J Biol Chem 285(41):31742–31754
Roske Y, Sunna A, Pfeil W, Heinemann U (2004) High-resolution crystal structures of Caldicellulosiruptor strain Rt8B.4 carbohydrate-binding module CBM27-1 and its complex with mannohexaose. J Mol Biol 340(3):543–554
Zhang C, Zhang W, Lu X (2015) Expression and characteristics of a Ca²⁺-dependent endoglucanase from Cytophaga hutchinsonii. Appl Microbiol Biotechnol 99(22):9617–9623
Tunnicliffe RB, Bolam DN, Pell G, Gilbert HJ, Williamson MP (2005) Structure of a mannan-specific family 35 carbohydrate-binding module: evidence for significant conformational changes upon ligand binding. J Mol Biol 347:287–296
Uni F, Lee S, Yatsunami R, Fukui T, Nakamura S (2009) Role of exposed aromatic residues in substrate-binding of CBM family 5 chitin-binding domain of alkaline chitinase. Nucleic Acids Symp Ser (Oxf) 53:311–312
Divne C, Ståhlberg J, Reinikainen T, Ruohonen L, Pettersson G, Knowles JKC, Teeri TT, Jones TA (1994) The three-dimensional crystal structure of the catalytic core of cellobiohydrolase I from Trichoderma reesei. Science 265:524–528
Divne C, Ståhlberg J, Teeri TT, Jones TA (1998) High resolution crystal structures reveal how a cellulose chain is bound in the 50 Å long tunnel of cellobiohydrolase I from Trichoderma reesei. J Mol Biol 275:309–325
Kleywegt GJ, Zou JY, Divne C, Davies GJ, Sinning I, Ståhlberg J, Reinikainen T, Srisodsuk M, Teeri TT, Jones TA (1997) The crystal structure of the catalytic core domain of endoglucanase I from Trichoderma reesei at 3.6 Å resolution, and a comparison with related enzymes. J Mol Biol 272:383–397
Mackenzie LF, Sulzenbacher G, Divne C, Jones TA, Woldike HF, Schulein M, Withers SG, Davies GJ (1998) Crystal structure of the family 7 endoglucanase I (Cel7B) from Humicola insolens at 2.2 Å resolution and identification of the catalytic nucleophile by trapping of the covalent glycosyl-enzyme intermediate. Biochem J 335:409–416
Ståhlberg J, Johansson G, Pettersson G (1991) A new model for enzymatic hydrolysis of cellulose based on the two-domain structure of cellobiohydrolase I. Bio/Technology 9:286–290
Payne CM, Baban J, Horn SJ, Backe PH, Arvai AS, Dalhus B, Bjørås M, Eijsink VGH, Sørlie M, Beckham GT et al (2012) Hallmarks of processivity in glycoside hydrolases from crystallographic and computational studies of the Serratia marcescens Chitinases. J Biol Chem 287:36322–36330
Taylor CB, Payne CM, Himmel ME, Crowley MF, McCabe C, Beckham GT (2013) Binding site dynamics and aromatic-carbohydrate interactions in processive and non-processive family 7 glycoside hydrolases. J Phys Chem B 117:4924–4933
Payne CM, Resch MG, Chen L, Crowley MF, Himmel ME, Taylor 2nd LE, Sandgren M, Ståhlberg J, Stals I, Tan Z, Beckham GT (2013) Glycosylated linkers in multimodular lignocellulose-degrading enzymes dynamically bind to cellulose. Proc Natl Acad Sci U S A 110:14646–14651
Mandelman D, Belaich A, Belaich JP, Aghajari N, Driguez H, Haser R (2003) X-ray crystal structure of the multidomain endoglucanase Cel9G from Clostridium cellulolyticum complexed with natural and synthetic cello-oligosaccharides. J Bacteriol 185(14):4127–4135
Sakon J, Irwin D, Wilson DB, Karplus PA (1997) Structure and mechanism of endo/exocellulase E4 from Thermomonospora fusca. Nat Struct Biol 4(10):810–818
Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJ (2015) The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc 10:845–858
Case DA, Cerutti DS, Cheatham III TE, Darden TA, Duke RE, Giese TJ, Gohlke H, Goetz AW, Greene D, Homeyer N, Izadi S, Kovalenko A, Lee TS, LeGrand S, Li P, Lin C, Liu J, Luchko T, Luo R, Mermelstein D, Merz KM, Monard G, Nguyen H, Omelyan I, Onufriev A, Pan F, Qi R, Roe DR, Roitberg A, Sagui C, Simmerling CL, Botello-Smith WM, Swails J, Walker RC, Wang J, Wolf RM, Wu X, Xiao L, York DM, Kollman PA (2017) AMBER 2017. University of California, San Francisco
Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kale L, Schulten K (2005) Scalable molecular dynamics with NAMD. J Comput Chem 26(16):1781–1802
Humphrey W, Dalke A, Schulten K (1996) VMD: visual molecular dynamics. J Mol Graph 14(1):33–38, 27–28
Edgar RC (2004a) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113
Edgar RC (2004b) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797
Grant BJ, Rodrigues AP, ElSawy KM, McCammon JA, Caves LS (2006) Bio3d: an R package for the comparative analysis of protein structures. Bioinformatics 22:2695–2696
Altman RB, Gerstein M (1994) Finding an average core structure: application to the globins. Proc Int Conf Intell Syst Mol Biol 2:19–27
Gerstein M, Altman RB (1995) Average core structures and variability measures for protein families: application to the immunoglobulins. J Mol Biol 251(1):161–175
Gerstein M, Chothia C (1991) Analysis of protein loop closure. Two types of hinges produce one motion in lactate dehydrogenase. J Mol Biol 220(1):133–149
Durand P, Trinquier G, Sanejouand Y-H (1994) A new approach for determining low-frequency normal modes in macromolecules. Biopolymers 34(6):759–771
Hinsen K, Petrescu A-J, Dellerue S, Bellissent-Funel M-C, Kneller GR (2000) Harmonicity in slow protein dynamics. Chem Phys 261(1–2):25–37
Kaplan W, Littlejohn TG (2001) Swiss-PDB viewer (deep view). Brief Bioinform 2:195–197
Irwin JJ, Shoichet BK (2005) ZINC–a free database of commercially available compounds for virtual screening. J Chem Inf Model 45:177–182
Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG (2012) ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 52:1757–1768
Das B, Meirovitch H, Navon IM (2003) Performance of hybrid methods for large-scale unconstrained optimization as applied to models of proteins. J Comput Chem 24(10):1222–1231
Rappe AK, Casewit CJ, Colwell KS, Goddard WA, Skiff WM (1992) UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J Am Chem Soc 114(25):10024–10035
Bikadi Z, Hazai E (2009) Application of the PM6 semi-empirical method to modeling proteins enhances docking accuracy of AutoDock. J Cheminform 1:15
Stewart JJ (2009) Application of the PM6 method to modeling proteins. J Mol Model 15:765–805
Halgren TA (1996) Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J Comput Chem 17:490–519
Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, Olson AJ (1998) Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 19(14):1639–1662
Solis FJ, Wets RJB (1981) Minimization by random search techniques. Math Oper Res 6(1):19–30
Urbanowicz BR, Catala C, Irwin D, Wilson DB, Ripoll DR, Rose JK (2007) A tomato endo-beta-1,4-glucanase, SlCel9C1, represents a distinct subclass with a new family of carbohydrate binding modules (CBM49). J Biol Chem 282(16):12066–12074
Buchanan M, Burton RA, Dhugga KS, Rafalski AJ, Tingey SV, Shirley NJ, Fincher GB (2012) Endo-(1,4)-beta-glucanase gene families in the grasses: temporal and spatial co-transcription of orthologous genes. BMC Plant Biol 12:235
Xie G, Yang B, Xu Z, Li F, Guo K, Zhang M, Wang L, Zou W, Wang Y, Peng L (2013) Global identification of multiple OsGH9 family members and their involvement in cellulose crystallinity modification in rice. PLoS One 8:e50171
van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones DT et al (2014) Classification of intrinsically disordered regions and proteins. Chem Rev 114(13):6589–6631
van der Lee R, Lang B, Kruse K, Gsponer J, Sanchez de Groot N, Huynen MA, Matouschek A, Fuxreiter M, Babu MM (2014) Intrinsically disordered segments affect protein half-life in the cell and during evolution. Cell Rep 8(6):1832–1844

Acknowledgments

SK wishes to formally thank Dr. Rita Sharma for her suggestions and unflinching moral support.

Author information

Authors and Affiliations

Department of Biochemistry, Army College of Medical Sciences, Brar Square, Delhi Cantt., New Delhi, 110010, India
Siddhartha Kundu

Contributions

SK collated the data, conducted the analysis, developed the scoring indices, wrote all the necessary code, and wrote the manuscript.

Corresponding author

Correspondence to Siddhartha Kundu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised due to a retrospective Open Access order.

Electronic supplementary material

ESM 1

(PDF 20 kb)

ESM 2

(PDF 1857 kb)

ESM 3

(PDF 2382 kb)

ESM 4

(PDF 276 kb)

ESM 5

(PDF 208 kb)

ESM 6

(PDF 215 kb)

ESM 7

(PDF 282 kb)

ESM 8

(PDF 2217 kb)

ESM 9

(PDF 123 kb)

ESM 10

(PDF 155 kb)

ESM 11

(PDF 129 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kundu, S. Insights into the mechanism(s) of digestion of crystalline cellulose by plant class C GH9 endoglucanases. J Mol Model 25, 240 (2019). https://doi.org/10.1007/s00894-019-4133-1

Received: 25 October 2018
Accepted: 11 July 2019
Published: 23 July 2019
DOI: https://doi.org/10.1007/s00894-019-4133-1