QM/MM study of the binding of H2 to MoCu CO dehydrogenase: development and applications of improved H2 van der Waals parameters

The MoCu CO dehydrogenase enzyme not only transforms CO into CO2 but it can also oxidise H2. Even if its hydrogenase activity has been known for decades, a debate is ongoing on the most plausible mode for the binding of H2 to the enzyme active site and the hydrogen oxidation mechanism. In the present work, we provide a new perspective on the MoCu-CODH hydrogenase activity by improving the in silico description of the enzyme. Energy refinement—by means of the BigQM approach—was performed on the intermediates involved in the dihydrogen oxidation catalysis reported in our previously published work (Rovaletti, et al. “Theoretical Insights into the Aerobic Hydrogenase Activity of Molybdenum–Copper CO Dehydrogenase.” Inorganics 7 (2019) 135). A suboptimal description of the H2–HN(backbone) interaction was observed when the van der Waals parameters described in previous literature for H2 were employed. Therefore, a new set of van der Waals parameters is developed here in order to better describe the hydrogen–backbone interaction. They give rise to improved binding modes of H2 in the active site of MoCu CO dehydrogenase. Implications of the resulting outcomes for a better understanding of hydrogen oxidation catalysis mechanisms are proposed and discussed.


Introduction
Hydrogenases represent a large enzyme family that catalyses the conversion of H 2 to protons and electrons, as well as the reverse reaction [1]. They are all metalloenzymes and are subdivided into three main classes depending on the type of metal ions that compose their active site: [NiFe], [FeFe] and [Fe] hydrogenases. Still, also other metalloenzymes have the ability to oxidise H 2 to protons and electrons, such as the MoCu CO dehydrogenase (MoCu-CODH, featuring a MoCu binuclear active site [2]) [3,4]. In the latter case, the reducing equivalents of the oxidation reaction pass from the active site bimetallic centre via two [2Fe-2S] clusters to a FAD cofactor and they are finally transferred to the quinone pool of the electron transport chain [5,6].
The accuracy of calculations at the molecular mechanics (MM) level critically depends on the accuracy of the employed force field (ff). Suboptimal ff parameters may give rise to incorrect conformations or misleading energies. This problem may also arise at the interface between the QM and MM subsystems in hybrid QM/MM calculations.
In the study of enzymes expressing hydrogenase activity, the relevance of employing accurate parameters for the H 2 -protein interaction is evident. In fact, gas diffusion towards or from the active site may correspond to a complex dynamic phenomenon that does not simply involve predefined channels but may be based on networks of dynamically and temporary formed pathways [18].
In a previous study, we proposed a mechanistic picture for the MoCu-CODH hydrogenase activity on the basis of QM/MM results [13]. As far as computed energies are concerned, it has been pointed out that both QM-cluster and QM/MM results may depend critically on the size of the QM system [19][20][21][22][23][24][25]. To tackle this issue, one of us has developed the BigQM approach within which stable energies are normally obtained using a large quantum model (∼ 1000 atoms). The latter is composed by all groups within 4.5-6Å from the QM system of the QM/MM calculation, all buried charged groups in the protein and by moving truncated groups three residues away from the active site [23]. A previous QM/MM study of the first steps of the CO oxidation reaction in MoCu-CODH [26] indicated that BigQM-based refinement of computed energies is a valuable approach also in the theoretical investigation of the latter enzyme.
In the present study, we have refined the QM/MM energies associated with the hydrogenase activity of MoCu-CODH. As described in the following, we found unexpected large differences in relative energies between the QM/MM and the BigQM results. We show that these large deviations are caused by poor van der Waals (VdW) parameters used for H 2 which lead to much too short interactions with nearby backbone HN groups. Therefore, we have developed improved H 2 VdW parameters, which correct the dubious interactions. A comparison with the results obtained with parameters previously described in literature shows that our developments represent a key step to obtain reliable energies and structures in the modelling of H 2 -protein interactions.

The MoCu-CODH protein
The setup of the protein was the same as in our previous QM/MM studies of MoCu-CODH [13]. All calculations were based on the crystal structure of CODH in its oxidised form (PDB ID: 1N5W) [2]. Only the active sitecontaining large subunit (L) of the MoCu-CODH dimer of LMS heterotrimers was considered. The setup of the protein involved a detailed analysis of all the protonable residues, based on calculations with PROPKA [27], studies of the hydrogen-bond pattern, the solvent accessibility and the possible formation of ionic pairs. Based on this, all Arg, Lys, Asp and Glu residues were considered in their charged form, with exception of Glu29 and Glu488 that were protonated on OE2 and Asp684 that was protonated on OD1. Cysteine ligands coordinating to metals were deprotonated. Histidine residues were assumed to be doubly protonated with exception of His177, 178, 210, 213, 243, 700, 753 and 788 that were protonated only on the NE2 atom and of His61, 339, 766 and 793 that were protonated on the ND1 atom. The protein was solvated with water molecules, forming a sphere with a 60-Å radius around the geometric centre. A 1-ns simulated-annealing moleculardynamics simulation, followed by a minimisation, was run to optimise added protons and water molecules.

QM/MM calculations
The QM/MM calculations were performed using the ComQum software [28,29]. The protein and the solvent were split into two subsystems: system 1 is the QM system (see Supporting Information) while system 2 contains the remaining part of the protein and the water molecules. During the QM/MM geometry minimisation, system 1 is described by a wavefunction and is relaxed by QM methods, whereas system 2 is represented by an array of partial point charges (electrostatic embedding) and is kept fixed at the crystallographic coordinates. Covalent bonds between the QM and MM systems were truncated using the hydrogen link-atom approach [30]. The QM system is capped with hydrogen atoms (hydrogen link atoms, HL), the position of which are linearly related to the corresponding carbon atoms (carbon link atoms, CL) in the full system. The latter are not included in the point-charge model. The total QM/MM energy in ComQum is calculated as [28,29] where E HL QM1+ptch2 is the QM energy of system 1 truncated by HL atoms and embedded in the set of point charges modelling system 2. E HL MM1,q 1 =0 is the MM energy of system 1, still truncated by HL atoms, without any electrostatic interactions. E CL MM12,q 1 =0 is the classical energy of the whole system, with CL atoms and with the charges in system 1 set to zero, in order to avoid double counting of the electrostatic interactions. Thus, ComQum employs a subtractive scheme with electrostatic embedding and van der Waals link-atom corrections [31].
The QM calculations were carried out at the BP86-D3(BJ)/def2-TZVP [32][33][34][35] level of theory using TURBO-MOLE 7.2 software [36]. The resolution-of-identity technique was used to accelerate the calculations [37]. The MM calculations were carried out by means of the Amber software [38], using the Amber FF14SB force field for the protein [39] and the general Amber force field [40] with restrained electrostatic potential (RESP) charges [41] for the molybdopterin-cytosine dinucleotide (MCD) cofactor. Parameters used for H 2 are discussed below. The two Fe 2 S 2 clusters-fixed during the optimisation processwere described with RESP charges and a non-bonded model [42]. Single-point calculations at the B3LYP-D3(BJ)/def2-TZVPD [35,43,44] level were also run on the optimised geometries.

BigQM calculations
The BigQM technique was applied with the aim of obtaining more accurate QM/MM energies [23,24] based on the knowledge that QM/MM calculations converge faster than QM-only ones, but models of about 1000 QM atoms are needed to obtain convergence of the energies [19,21,22,25]. The minimal QM system (system 1) was extended with all chemical groups with at least one atom within 6.0Å of it and junctions were moved three amino-acids away from each residue in the minimal QM system. In addition, all buried charges inside the protein were included, with exception of the two iron-sulfur clusters (see Supporting Information). The resulting BigQM model consisted of 983 (most calculations), 991 (when Cys388 is protonated) or 1019 atoms (when a water molecule is present in the active site). The BigQM calculations were performed on coordinates from the QM/MM optimisations, with a surrounding point-charge model, at the BP86-D3(BJ)/def2-SV(P) level. The multipole accelerated resolution-ofidentity J approach (marij keyword) was employed to accelerate the calculations [45]. The resulting energies were corrected with a QM/MM MM term for the BigQM region: Finally, the energies were also corrected by taking into consideration the B3LYP-D3(BJ)/def2-TZVPD functional and basis set effects, using calculations with the standard QM/MM QM system with a point-charge model of the surroundings:

The H 2 -CH 3 CONHCH 3 model
A small [H 2 + CH 3 CONHCH 3 ] model (see Fig. 1) was used to determine the potential-energy surface for the H 2 -HN interaction with QM methods and to subsequently determine van der Waals parameters to reproduce it. The calculations were performed at the B3LYP-D3(BJ)/def2-TZVPD level using TURBOMOLE 7.2 software. The resolution-of-identity technique was employed to accelerate the calculations. The potential-energy surface for the H 2 -HN distance was scanned from a distance of 3.50 to 1.00Å with steps of 0.25Å, through optimisation of all degrees of freedom with exception of the analysed H-H distance that was kept fixed at the predetermined distance. The Amber software was used to carry out all MM calculations in conjunction with Amber FF14SB force field for the CH 3 CONHCH 3 protein backbone model. Restrained electrostatic potential (RESP) charges were obtained separately for H 2 and CH 3 CONHCH 3 running a single point energy calculation at HF/6-31G* [46][47][48] on the QM-optimised structure, followed by antechamber charge generation. Naturally, the charges on the two H atoms in H 2 were zero. The H-H optimum bond length was set to 0.7443Å with a force constant of 416.038 kcal/mol/Å 2 . Improved non-bonded parameters were obtained for the H 2 molecule, as described in Section Improving the H 2 -backbone non-bonded MM parameters.

Evidence of inaccuracies in the QM/MM description of the H 2 -HN(backbone) interaction
In a previous study [13], we employed the hybrid QM/MM technique to investigate plausible mechanisms for the oxidation of H 2 by MoCu-CODH. Binding of H 2 to the active site was explored considering the enzyme resting state (1H), the presence of a water molecule in proximity to the copper ion (2H) and the hypothetical protonation of the Cu-bound cysteine residue (3H). Moreover, a variant of (1H) was considered in which Cu and the Mo-O eq atom act as a frustrated Lewis pair (1H-FLP). For each of these variants, we studied possible H 2 binding modes to the Cu centre, comparing the energy of the coordinated structures (hereafter indicated with an R tag) to the corresponding structure with the hydrogen molecule present in the active  Table 1.
Comparing the energy profiles obtained by the QM/MM and BigQM methods, large deviations are observed for the binding energies of H 2 to Cu (ΔΔE NoB/R BigQM/QMMM ) for all of the tested active-site states (1H, 2H and 3H). The differences in relative energies were found to be 27, 31, 18 and 10 kcal/mol for 1H, 2H, 3H and 1H-FLP, respectively. Moreover, the 1H species gave very large differences for all studied states, up 41 kcal/mol for the 1HR/1HP pair. ΔΔE 3HP2/3HR BigQM/QMMM is also rather large (11 kcal/mol), whereas all other intermediates exhibit a difference in relative energy that is less than 5 kcal/mol, i.e. similar to what is observed for other systems [26].
To understand the origin of these large differences between the QM/MM and BigQM relative energies, we focused on the 1HR → 1HP step and calculated singlepoint BigQM energies at the BP86-D3(BJ)/def2-SV(P) level (i.e. without the additional corrections in Eq. 2) for QM systems of increasing size. Thus, the original BigQM system-composed of 983 atoms (see Section Methods)was gradually reduced until the QM system of the QM/MM calculations was reached (46 atoms). The shrinking of the QM model was obtained by modulating three parameters for setting up the BigQM system (see Section Methods), i.e. the cutoff radius, the number of residues the junctions are moved away and whether charged buried residues are included. To reach the smallest models, the MCD cofactor was also truncated before or after the phosphate groups. The resulting energy differences between the 1HP and 1HR intermediates are reported in Table 2.
It can be seen that BigQM energies change by less than 2 kcal/mol if the buried charged residues are included or excluded. Somewhat larger variations are observed for the various cutoff radii (43-51 kcal/mol), but the difference between the two largest radii is less than 2 kcal/mol. Truncating the MCD cofactor also has a minimal effect (less than 1 kcal/mol). Instead, the large difference comes from the number of residues the junctions are moved away. In The models are obtained with varying cutoff radii (r inÅ), varying number of residues the junctions are moved away (n aa ) and whether buried charged residues are included (ch aa). The resulting QM systems are described by the total number of atoms (n atoms ), the total charge (charge) and the relative energy of 1HP with respect to 1HR Such short H 2 -backbone distances indicates that the van der Waals repulsion between the peptide NH group and the hydrogen molecule is underestimated at the MM level (in these QM/MM calculations, H 2 is in the QM system but Ser389 is in the MM system, so that their interactions are determined by the MM parameters, whereas in the BigQM calculation, both groups are in the QM system). This was examined in the following sections, by using a small [H 2 + CH 3 NHCOCH 3 ] model.

Improving the H 2 -backbone non-bonded MM parameters
To confirm that the problem lies in the VdW parameters of H 2 , we optimised the structure of the [H 2 + CH 3 NHCOCH 3 ] model as described in Section Methods. The optimised geometry, reported in Fig. 4, shows H 2 -HN distances of 2.52 and 2.73Å , i.e. much longer than the corresponding ones in the QM/MM structures of MoCu-CODH (see above). On the other hand, a full optimisation of the  Starting from the QM structure, we performed a series of constrained optimisations in order to scan the potentialenergy surface (PES) for the H 2 -HN interaction. The resulting QM PES is shown in Fig. 5a (black thick line).
The non-bonded interaction between two atoms at a distance of r is described in the Amber MM force-field as a sum of an attractive r −6 (dispersion) and a repulsive r −12 (exchange repulsion) term, according to the Lennard-Jones equation: where r * represents the distance at the energy minimum and defines the well depth. The original VdW parameters for H in H 2 were taken from the Amber FF14SB atom type H (H bound to N; the same parameters are used for H bound to S), with r * = 0.60Å and = 0.0157 kcal/mol. They clearly give too short equilibrium distances and a not enough repulsive PES at short distances as is also shown in Fig. 5a (red line marked H; the MM PESs were calculated by single-point calculations to avoid any distortion of the structure or the influence from other MM parameters).
To obtain a better description of the PES, we tested different sets of VdW parameters (r * = 1.50, 1.60, 1.70 and 1.85Å, as well as = 0.01, 0.0157, 0.05, 0.055, 0.06, 1, 10 kcal/mol). The results are also included in Fig. 5a (lines marked with the values of r * / ) and they are described in Table 3. It can be seen that the optimum H(H 2 )-HN distance depends mainly on r * , whereas is less important. It is also clear that it is not possible to reproduce both the long-and short-distance part of the PES (the MM potential is more repulsive than the QM one). We decided that it was most important to describe the minimum and longer distances, where the potential energy is close to the minimum energy and therefore often encountered in simulations. Figure 5b concentrates on that region. Employing the mean absolute deviation (MAD) in the interval d = 2.02-3.50Å as the quality measure, we obtained optimum VdW parameters for r * = 1.7Å and = 0.055 kcal/mol. With these, the MM curve reproduces the QM relative energies with a MAD of 0.06 kcal/mol (0.01 kcal/mol for distances 2.27-3.50Å). At shorter distances, the curve is much too steep but this is acceptable because this part of the PES will rarely be visited in real simulations. A comparison of the optimum structure obtained with QM and the new MM parameters is shown in Fig. 4. It does not fully reproduce the distances observed in the QM structure because H 2 moves to a more symmetric position in the MM structure and the methyl groups show different rotations.
The Amber FF14SB contains 13 sets of VdW parameters for different H atom types. Two of them (for water and OH groups) have zeroed parameters (the interaction is determined instead by VdW interactions by the heavy atom to which the H atom is bound), whereas the other have similar = 0.0157 or 0.0150 kcal/mol, but a varying r * = 0.6 − 1.487Å (low values for H bound to polar atoms and high values for H bound to C). Our suggested parameters are both significantly larger than the AMBER VdW parameters. Figure 5c shows that all AMBER parameters give too short H 2 -HN distances (but the H atom type is by far the worst).
In MD simulations of the diffusion of H 2 in hydrogenases [14][15][16][17], two different sets of VdW parameters have been used. One was suggested by Schulten and coworkers [15] and employs r * = 1.7682Å and = 0.1521 kcal/mol. From Fig. 5c and Table 3, it can be seen that it gives a slightly too long H 2 -HN optimum distance and a too repulsive potential at longer distances. The other is a more Fig. 5 PES for the [H 2 + CH 3 NHCOCH 3 ] model obtained with QM and different MM force fields (those starting with H are the various AMBER atom types). The pairs of numbers in the legends indicate the r * and parameters in our calibration. "Schulten" and "Hunter" indicate the force fields suggested in refs. [15] and [49], respectively. "best" is our suggested force field sophisticated model by Hunter et al. [49]. In variance to all the other models, it employs an extra point at the H-H bond centre with a charge of -0.95 e and charges of 0.475 e on the H atoms (in order to give a proper quadrupole moment).
Only the bond centre has non-zero VdW parameters, r * = 1.501Å and = 0.4769 kcal/mol. In Fig. 5c and Table 3, it can be seen that it gives a too short H 2 -HN optimum distance and a too repulsive potential at longer distances. Thus, we can conclude that the H 2 interaction with the protein backbone NH group is sensitive to the VdW parameters and that only our suggested parameters give an accurate description of this interaction.

Test case: H 2 interaction with the MoCu-CODH active site
In order to validate the new set of VdW parameters, we run new QM/MM optimisations of the 1HR intermediate. We tested two approaches. In the first (1HR-A), we used the QM system described in Section Methods but employed our new VdW parameters for H 2 (r * = 1.7Å and = 0.055 kcal/mol). In the second approach (1HR-B), the backbone neighbouring Cys388 was included in the QM system (see Figure S1), so that the H 2 -HN interaction was treated at a QM level. Both optimisations led to a complete detachment of H 2 from the copper ion (see Fig. 6), with a Cu-H distance of 3.28, 3.88Å and 2.97, 3.33Å for 1HR-A and 1HR-B. The main difference between the two structures is that Glu763 coordinates to Mo in 1HR-A with a Mo-O distance of 2.24Å, whereas the Mo-O distance is 3.23Å in the 1HR-B structure, because it instead forms a hydrogen bond to the backbone HN group of Ser389, which is also introduced in the QM region. This leads to quite extensive changes in the local structure around the Mo ion. The new QM/MM optimisations show that the VdW parameters of H 2 have a significant influence on the structure and that the 1HR minimum previously obtained by means of the old VdW parameters for the H 2 ( Fig. 2) was an artefact caused by a bad description of the hydrogen-backbone interaction. Moreover, it shows that it is crucial to include the backbone around Cys388 in the QM system, with the two NH groups pointing into in the active site, providing potential hydrogen bonds to both Glu763 or various reaction intermediates. Apparently, the MM parameters are not accurate enough to model the subtle competition between Mo and the HN group of Ser389 for the carboxylate group of Glu763.

Conclusions
In this paper, we have deepened our insights about the energetics of the putative initial steps for dihydrogen oxidation by MoCu CO dehydrogenase, keeping as a reference the QM/MM results from our previous investigation of  1HR-B (b). All distances inÅ. The colour code is the same as in Fig. 2 the hydrogenase activity [13]. The energies were refined by using the BigQM approach on the QM/MM optimised geometries, analogously to our previous work on the COoxidation catalysis by this enzyme [26]. Unexpectedly, we observed large deviations in the relative energies obtained by the two approaches. Therefore, a detailed investigation was carried out to identify the cause of the deviation. We found that the reason was a suboptimal description of the H 2 -backbone interaction, caused by poor van der Waals parameters for H 2 . Using the simple H 2 + CH 3 NHCOCH 3 model, we show that none of the previously suggested VdW parameters for H 2 is able to accurately reproduce the H 2 -HN(backbone) interaction, which has key relevance for a correct description of the binding of H 2 in the active site pocket. Therefore, we have developed a new set of VdW parameters, with the aim of providing an accurate description of this interaction.
Based on the energy refinement work here presented, it is possible to exclude the previously proposed occurrence [4] of side-on H 2 binding at the copper centre when the enzyme is in its resting state (1HR). Other routes for the initial interaction between substrate and enzyme active site need to be considered to account for the hydrogenase activity of Mo/Cu CO dehydrogenases, in line with previous theoretical investigations [13,50].
From a methodological point of view, the BigQM approach is confirmed to be a highly valuable approach for the exploration of the complex potential energy surface associated with metalloenzymes activity. However, it is a single-point post-processing of the QM/MM energies. Therefore, it can identify problems in the QM/MM structures, but it cannot correct them. If the MM potential is poor, it needs to be improved or the QM system needs to be enlarged. Then, new QM/MM and BigQM calculations are needed to get reliable results. Naturally, accurate MM parameters are even more important for MD simulations. Therefore, our new H 2 parameters work will most likely be relevant also for future efforts towards the theoretical description of the interaction between dihydrogen and metalloenzymes for which H 2 is a reagent, a product or both. The availability of improved sets of H 2 interaction parameters may be relevant also for the role that computational chemistry will play in the context of the design of artificial hydrogenase enzymes, a challenging new frontier in biomimetics [51].
Author contributions All authors designed the project; AR and UR did the calculations; all authors contributed to the writing of the manuscript.
Funding Open Access funding provided by Lund University. This work was supported by grants from the Swedish research council (project 2018-05003) to UR. The computations were performed on computer sources provided by the Swedish National Infrastructure for Computing (SNIC) at Lunarc at Lund University, partially funded by the Swedish Research Council (grant 2018-05973).
Data availability All raw data are available from the authors upon request.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creativecommonshorg/licenses/by/4.0/.