On the Challenge of Obtaining an Accurate Solvation Energy Estimate in Simulations of Electrocatalysis

The effect of solvation on the free energy of reaction intermediates adsorbed on electrocatalyst surfaces can significantly change the thermochemical overpotential, but accurate calculations of this are challenging. Here, we present computational estimates of the solvation energy for reaction intermediates in oxygen reduction reaction (ORR) on a B-doped graphene (BG) model system where the overpotential is found to reduce by up to 0.6 V due to solvation. BG is experimentally reported to be an active ORR catalyst but recent computational estimates using state-of-the-art hybrid density functionals in the absence of solvation effects have indicated low activity. To test whether the inclusion of explicit solvation can bring the calculated activity estimates closer to the experimental reports, up to 4 layers of water molecules are included in the simulations reported here. The calculations are based on classical molecular dynamics and local minimization of energy using atomic forces evaluated from electron density functional theory. Data sets are obtained from regular and coarse-grained dynamics, as well as local minimization of structures resampled from dynamics simulations. The results differ greatly depending on the method used and the solvation energy estimates are deemed untrustworthy. It is concluded that a significantly larger number of water molecules is required to obtain converged results for the solvation energy. As the present system includes up to 139 atoms, it already strains the limits of computational feasibility, so this points to the need for a hybrid simulation approach where efficient simulations of much larger number of solvent molecules is carried out using a lower level of theory while retaining the higher level of theory for the reacting molecules as well as their near neighbors and the catalyst. The results reported here provide a word of caution to the computational catalysis community: activity predictions can be inaccurate if too few solvent molecules are included in the calculations.


Abstract
The effect of solvent on the free energy of reaction intermediates adsorbed on electrocatalyst surfaces can significantly change the thermochemical overpotential, but accurate calculations of this are challenging. Here, we present computational estimates of the solvation energy for reaction intermediates in oxygen reduction reaction (ORR) on a B-doped graphene (BG) model system where the overpotential is found to reduce by up to 0.6 V due to solvation. BG is experimentally reported to be an active ORR catalyst but recent computational estimates using state-of-the-art hybrid density functionals in the absence of solvation effects have indicated low activity. To test whether the inclusion of explicit solvation can bring the calculated activity estimates closer to the experimental reports, up to 4 layers of water molecules are included in the simulations reported here. The calculations are based on classical molecular dynamics and local minimization of energy using atomic forces evaluated from electron density functional theory. Data sets are obtained from regular and coarse-grained dynamics, as well as local minimization of structures resampled from dynamics simulations. The results differ greatly depending on the method used and the solvation energy estimates and are deemed untrustworthy. It is concluded that a significantly larger number of water molecules is required to obtain converged results for the solvation energy. As the present system includes up to 139 atoms, it already strains the limits of computational feasibility, so this points to the need for a hybrid simulation approach where efficient simulations of much larger number of solvent molecules is carried out using a lower level of theory while retaining the higher level of theory for the reacting molecules as well as their near neighbors and the catalyst. The results reported here provide a word of caution to the computational catalysis community: activity predictions can be inaccurate if too few solvent molecules are included in the calculations.

Introduction
The replacement of costly and rare precious metals with cheaper and more abundant elements in catalysts, for example in the oxygen reduction reaction (ORR) in fuel cells, is an important milestone towards sustainable energy production. To this end, heteroatom-doped graphenes have been explored extensively [1][2][3] [3] Computational predictions of the ORR activity of BG have overall been promising. The free energy approach using the computational hydrogen electrode (CHE) [13] is typically used to evaluate the ORR activity of computational models. Since the estimate of an overpotential obtained by this approach only reflects thermodynamic free energy of intermediates as well as initial and final states, it will be referred to as the thermochemical overpotential, η TCM , in the following.
Jiao and co-workers predict a η TCM range of 0.4-0.6 V for both BG and NG based on calculations using the B3LYP functional and molecular flake model systems, in good agreement with their experimental measurements. [12] A similar value, 0.38 V, is reported by Wang et al. for a BG nanoribbon [14] using the PBE functional and DFT-D3 [15,16] dispersion correction. The most optimistic prediction is reported by Fazio and co-workers with a η TCM of 0.29 V in a B3LYP-based study of a BG flake model system. [17] For reference, the measured overpotential of a typical Pt/C electrocatalyst is 0.3-0.4 V. [18] The experimental overpotential, however, depends on many other factors besides adsorption strength of the ORR intermediates, hence η TCM values are only a rough and purely thermodynamic estimate of the actual overpotential.
The exact mechanism of the ORR on BG is a matter of ongoing investigation. Fazio and co-workers established that the associative 4 epathway should be dominant for BG from a theoretical perspective. [ Jiao et al. [12] finds that a top adsorption geometry should be favored for the critical *O intermediate on BG while other studies [14,17,19] typically find a B-C bridge site to be favored for *O adsorption. It can be summarized that the active site debate for the ORR mechanism on BG is not settled yet.
Furthermore, the stabilization of the ORR intermediates on BG by water molecules, which has been found to be critical to correct energetic description of the ORR on NG, [20][21][22][23]  In the study by Jiao et al. [12] solvation effects are estimated using implicit [24] solvation models. However, implicit solvation models have in some cases been shown to fail at reproducing experimental solvation energy measurements or solvation energy results from simulations using many explicit solvent molecules. [25][26][27][28] We recently presented results for the ORR on NG where it was shown that high-level DFT calculations based on hybrid functionals yield a η TCM estimate close to 1.0 V vs. SHE, [29] indicating catalytic inactivity.

Calculation of the confidence interval for average ensemble properties
The confidence interval (CI) is a useful statistical measure for the error bar of an average result sampled from a normal distribution of values. It is therefore also useful to estimate the error bar of ensemble averages sampled through molecular dynamics integration; see Grossfield et al. [31] for more details. The CI defines an interval in which the true ensemble average lies with a certain probability. Here, a 95 % probability threshold is used to define the error bars, i.e., the 95 % CI.
The two-sided CI < x > of a variable x is defined as wherex is the ensemble average and U is the expanded uncertainty. The expanded uncertainty is defined as where k is the coverage factor and s(x) is the experimental standard deviation of the mean. s(x) is defined as where s(x) is the experimental standard deviation with the sample values x j , the arithmetric mean of the ensemble propertyx, and the number of independent samples n.  Figure 1 shows a representative illustration of the BG sheet model with an *O adatom in contact with 4 layers of water molecules; illustrations of sheet models with *OH and *OOH admolecules as well as models in contact with 1-3 layers of water are shown in figures S1 and S2, respectively. In agreement with studies by Fazio et al., [17] Ferrighi et al., [19] and Wang et al. [14] but in disagreement with the study by Jiao et al., [12] we find adsorption of the *O intermediate on the C-B bridge position to be energetically most favorable. The *OH and *OOH adspecies are found to adsorb most favorably on the B top position, which is in agreement with all previously mentioned studies.
The 32-atomic BG model system is converged with respect to the adsorption energy of the ORR intermediates, *O, *OH, and *OOH, see figure S4. This model therefore allows for the study of the adsorption energy -and the influence of solvation thereon -for a dilute system where the electronic effects of both the dopant atom and the adspecies are isolated and crowding effects can be ruled out.

Simulation parameters
The obtained data sets, including input files with simulation parameters, are distributed alongside this article and are available under DOI:10.5281/zenodo.7684918.

Choice of DFT code and functional
All simulations were performed with the VASP software version 6.2.0. [32][33][34][35] All calculations used the RPBE density functional [36] with DFT-D3 dispersion correction. [15,16] The RPBE-D3 method has been shown to yield water configurations in good agreement with experiments and higher-level methods at comparatively low computational cost. [37] Previous work on NG showed that adsorption energy values for the ORR intermediates can be wrong by up to 0.4 eV compared to the best estimate provided by the HSE06 hybrid functional, which was found to give the lowest error of 5 % compared to a diffusion Monte Carlo benchmark calculation. [29] Similar results were obtained for BG, [30] see table S1, where η TCM with the HSE06 functional was ca. 1.0 V vs. SHE and GGA functionals underestimated this best-estimate value by up to 0.6 V. Figure S3 shows the free energy trends for the ORR on BG obtained with various density functionals. However, our previous work also showed that ∆∆E solv does not share the same strong dependency on the functional. [29] This realization enables the present study since FPMD simulations as long as required for this work are currently not computationally feasible using hybrid functionals.

Static DFT calculations
Static calculations constitute single-point electronic energy calculations as well as minimization of the total energy with respect to the atomic coordinates.
Wave functions were self-consistently optimized until the energy in subsequent iterations changed by less than 10 −6 eV. The wave function was sampled using Monkhorst-Pack k point grids. [38] A k point density larger than 2×2×1 was found to give converged results for ∆∆E solv , see figure S5. Due to the wide variety of structures calculated in this work, refer to the data set distributed alongside this article to see the chosen k point density for each subset of calculations.
Simulations were carried out using a plane wave basis set with an energy cutoff of 600 eV to represent valence electrons and the projector-augmented wave (PAW) method [39,40] was used to account for the effect of inner electrons. See figure S6 for a convergence study for the PAW energy cutoff.
Gaussian-type finite temperature smearing was used to speed up convergence.
The smearing width is chosen so that the electronic entropy was smaller than 1 meV in all cases. Real-space evaluation of the projection operators was used to speed up calculations of larger systems, using a precision of 10 −3 eV atom −1 .
The L-BFGS limited-memory Broyden optimizer from the VASP Transition State Tools (VTST) software package was used to minimize the forces with respect to the atomic coordinates. The periodic images are separated by 14 Å of vacuum and a dipole correction is applied perpendicular to the slab.

Classical molecular dynamics simulations
Classical molecular dynamics (MD) simulations were carried out in an NVT ensemble at 300 K using the Langevin dynamics [41] implemented in VASP. The simulations used similar parameters to those outlined in section 3.3.1 but used a lower PAW energy cutoff of 400 eV and a 3×3×1 Monkhorst-Pack k point grid for computational efficiency. A Langevin friction parameter of γ = 4.91 was used throughout all simulations.
Dynamics were calculated initially until the total energy and temperature were converged. This equilibration period is not considered in the evaluation and was optimized on a case-by-case basis. After equilibration had been achieved, the actual sampling was performed over a period of time. In all simulations the geometry of the graphene sheet and the adspecies were constrained to the geometry obtained from a one-shot geometry optimization of the system in contact with n = 1 − 4 water layers, respectively. Only the water molecules were allowed to move during simulations. The Etot vs. To obtain ∆∆E solv , configurations were sampled every 1 ps, yielding 10 samples for the flexible MD data set and 100 samples for the constrained MD data set. This choice of sampling frequency is informed by the correlation time of water. The correlation time is the time it takes for complete re-orientation of the water arrangement, thus yielding a new, independent sample configuration that is statistically significant. It was found to be ca. 1.7 ps for water at room temperature using nuclear magnetic resonance spectroscopy. [43] The chosen sampling rate of 1 ps is smaller than this value as a result of the significant computational effort of performing long dynamics simulations. To minimize the risk of oversampling, Langevin dynamics was chosen to describe coupling to a heat bath. Langevin dynamics introduces a stochastic component to the propagation which can help to diversify configurations more quickly compared to fully deterministic dynamics.

One-shot minimization of atomic coordinates
The first data set is generated by bringing the BG model system with *O, *OH, and *OOH adspecies into contact with 4-32 molecules of water and minimizing the resulting configurations with respect to the atomic forces. This data set will be referred to as the one-shot minimization data set going forward. The chosen water configurations are modeled after those used by Reda et al. to calculate the solvation stabilization energy for the ORR intermediates on NG sheet model systems. [23] Configurations were created so that water molecules are only on one side of the BG sheet model or on both sides, denoted with the † and ‡ symbols, respectively, in table 2 and figure 2.
The ∆∆E solv results obtained from the one-shot minimization data set give rise to several trends. First, when water molecules are placed only on one side of the model, ∆∆E solv for the *O intermediate does not appear to be converged within the tested series of models as ∆∆E solv still increases from

NVT simulations
In order to probe if insufficient sampling of the configurational space is responsible for the inconsistent results of the one-shot minimization data set,  This result potentially indicates that the bond length constraint affects the coordination fine structure around the adspecies and thus may help to explain the differences between the flexible MD and constrained MD data sets. However, more detailed investigation is required to validate the importance of this observed difference.
It can be summarized that coarse-grained MD simulations yielded a data set that is significantly different from the more similar-to-each-other flexible MD and one-shot minimization data sets but did not yield more consistent

Re-sampling and energy minimization
The flexible MD and constrained MD data sets did not yield converged ∆∆E solv results. There are, however, two technical limitations which may reduce the significance of these data sets: 1. For these data sets, ∆∆E solv is calculated by using the average total energy from an NVT ensemble (T = 300 K) for the energy terms labeled "with solvent" in equation (4). The energy terms labeled "without solvent" are obtained from energy minimization calculations of the systems without solvent which are technically at 0 K temperature. While the BG sheet model and adspecies were kept frozen in the atomic configuration from a 0 K energy minimization during the MD and only water molecules were allowed to move, it cannot be fully excluded that results are biased due to a mismatch between the averaged finite-temperature MD values on one side and the locally optimized, 0 K values on the other side of the equation.
2. As outlined in section 3, the MD simulations -as well as the corresponding reference simulations of the systems "without solvent" needed for equation (4) -used a reduced PAW energy cutoff value of 400 eV to enable longer simulation times. This value is technically not converged for adsorption energy calculations, see figure S5.
In order to address both of these limitations, a fourth data set is produced. To this end, 20 structures are randomly sampled from each flexible MD trajectory and subsequently energy-minimized using the settings presented in section 3, i.e., with a larger PAW energy cutoff of 600 eV. This way, the diversity of the MD-generated configurations is maintained but all values entering equation (4) are obtained from energy-minimized atomic configurations using safe accuracy settings. This data set will be referred to as the resampled data set going forward. Figure 4 visualizes the ∆∆E solv results obtained from this data set.
The resampled data set shares similarities with the flexible MD and one-   First, convergence of ∆∆E solv , i.e. changes of < 0.05 eV between subsequent data points, is not observed in any case. It is impossible at this point to give a confident estimate of ∆∆E solv for the tested adspecies on the BG sheet model. This result indicates that more than 32 molecules (4 layers) of water are likely necessary to obtain converged results.
Converging the ∆∆E solv value to changes within chemical accuracy is of crucial importance. For example, consider the potential-dependent free energy trends for the ORR on the BG model presented in figure S3. These trends were obtained according to the free energy approach using the computational hydrogen electrode. [13] Using the most reliable functional for adsorption energy calculations on this material class according to benchmarks, [29,44,45] the HSE06 hybrid functional, the potential-determining step is the formation As an intermediary conclusion, the most likely explanation for the nonconversion of the ∆∆E solv results in general, as well as for the non-systematic differences between data sets more specifically, is that significantly more water molecules need to be included in simulations. It is unclear at this point how many water molecules would be required to achieve convergence. Sakong et al.
found that 6 layers of water are needed to obtain bulk water behavior and converged work function estimates in the case of FPMD simulations of a Pt (111) surface in contact with water. [46] However, Pt(111) is a strongly-coordinating surface compared to the hydrophobic BG sheet model in the present study.
Furthermore, the group tested for convergence of the work function and not for ∆∆E solv of reaction intermediates. Hence, it is unlikely that the number of 6 necessary water layers will also be the correct number of layers to include for the present system.
For these reasons, it is currently not possible to foresee the ultimately required number of water molecules required to obtain converged ∆∆E solv results for this system. Attempting to find this number systematically by dynamics simulations with DFT atomic forces quickly becomes computationally unfeasible; simulations for the models in contact with 32 water molecules in this work already required several weeks of computational time. Even if these considerable time and energy resources would be spent to identify this number for the present problem, such a study would have to be repeated for every new material under investigation. Even though the influence of solvation has been shown to significantly affect free energy trends, the authors are therefore convinced that such simulations cannot (yet) be performed routinely.
We have thus come to the decision to publish the present results as-is and to not continue running simulations with model systems that include more and more water molecules at ever increasing computational cost. Instead, we are currently focusing research efforts into development of a 2D periodic polarizable-embedding QMMM method that will allow for simulations with thousands of water molecules while retaining electronic structure level accuracy for the surface model and the closest few layers of water molecules. This method will use the Single Center Multipole Expansion (SCME) ansatz to describe polarization of water molecules which is crucial to accurately describe interface processes such as charge transfer. [47,48] Because the boundary plane between the QM and MM regions has exclusively water molecules on both sides, and because it is not necessary to describe diffusion to or from the surface to obtain ∆∆E solv results, an efficient restrictive boundary method can be used. The SAFIRES method recently developed in our groups was build to support 2D periodic boundary conditions. [49] A publication on the technical implementation of the 2D periodic polarizable-embedding QMMM ansatz for the open-source GPAW and ASE programs is currently in preparation in our groups. The goal is to use this method to revisit the BG model system in the present work.

Analysis of potential error sources
To conclude the discussion of the data sets presented in this work, the following sections will rule out various potential error sources that readers familiar with dynamics simulations and the pitfalls of solvation energy calculations may be concerned about.

Influence of the sampling frequency on the results
Configurations were sampled from the dynamics simulations at an interval of 1 ps. It is important to ask how the ∆∆E solv results are affected by changes of the sampling frequency. Figure S8 compares ∆∆E solv results from the flexible MD and constrained MD data sets analyzed every 2 ps, 1 ps, 100 fs, and 10 fs.
The ∆∆E solv results appear to be robust against the choice of sampling frequency. The only significant differences are observed when between sampling the flexible MD data set every 2 ps (5 total samples) or every 1 ps (10 total samples) and faster. This difference can be attributed to the poor statistics in the case of the 2 ps sampling frequency.
The size of the error bars is affected significantly by the sampling frequency because the square root of the number of samples, √ n, enters the divisor of equation (7). This test therefore highlights the importance of choosing a reasonable sampling frequency based on the physical properties of the system to obtain a meaningful error bar. It is easy to get lured into a false sense of security by oversampling the results to obtain small error bars.

Spurious dipole and quadrupole corrections
Total energy calculations were performed using dipole and quadrupole correction perpendicular to the surface to avoid interactions between periodic repetitions of the simulation box. It is known that first-row semiconductors with defects, of which BG is an example, can lead to large dipole and quadrupole moments, thus making the correction necessary. However, our simulations showed that the correction can sometimes give erroneously large corrections of several eV for unknown reasons. After re-optimizing the wave function in a single-point calculation, the correction is then found to be of a reasonable magnitude again, usually on the order of some meV.
Because it is impossible to perform this manual correction for all calculations in this work, the consistency of the results is representatively examined by analyzing the average dipole and quadrupole correction energy (and uncertainty thereof) of the resampled data set. Figure S9 shows the results of this analysis. The average correction energy is <= 0.02 eV in all cases, which better than chemical accuracy. Error bars are found to be as large as 0.01 eV in some cases and close to 0.02 eV in one extreme case (BG-OOH in contact with 24 water molecules), indicating that the dipole and quadrupole energy correction is indeed volatile (in relation to the absolute values) and dependent on the exact geometry of the system. However, due to the small overall magnitude of the correction, it can be concluded that this correction should not significantly influence the calculation results.   shows that the results do not become more consistent when the dispersion energy contribution is removed. Hence, it can be concluded that any volatility of the dispersion correction results is also not the cause for but most likely the result of the erratic nature of the entire data set.

Spurious dispersion correction
One caveat in this analysis and discussion, however, is that this a posteriori removal of the final dispersion correction energy does not remove the entire influence of dispersion correction on the data set. Both the MD simulations and the local minimization of the structures in the resampled data used dispersion correction throughout, hence the final structures (re-)analyzed here are generated on the RPBE-D3 potential surface. Despite this caveat, it is still unlikely that dispersion is the driving factor behind the erratic results since in particular the RPBE-D3 functional combination has been shown in the past to produce water structure that is in good agreement with experiments. [37]

Influence of minimizing the reference systems
This concern is related to the discussion about inconsistent cell size in section 5.2.4. As pointed out there, the reference systems were obtained from the solvated parent systems by removal of the water molecules and subsequent energy-minimizaion of the resulting atomic configurations. This approach was chosen to account for the possibility that the most stable atomic arrangement of the BG-adspecies system may change once water molecules are removed.
However, this approach creates a potential inconsistency: by optimizing the atomic configuration of the reference systems, the ∆∆E solv values obtained from equation (4) do not only contain the interaction of the BG-adspecies system with the water molecules but also the reorganization energy of the systems when going from a system in vacuum to a solvated system. Overall, however, the differences appear to be systematic across the board and do not change the trends. Therefore, this factor is also not responsible for the erratic, non-converging behavior of ∆∆E solv with increasing number of water molecules.

Influence of using the lowest-energy structures to
obtain the solvation stabilization energy Hence, this approach is applied to the present data set. The flexible MD data set was re-analyzed to find the structure with the lowest total energy for each combination of adspecies and number of water molecules. The obtained images were then energy-minimized using the tight accuracy settings outlined in section 3. Figure S12 shows the results of this approach. It can be concluded that this approach not only did not resolve the erratic results but can further distort the results because the close-to-ideal local configurations optimized in this case likely do not represent the average configurations of water molecules around the adspecies in real, finite-temperature systems.

Influence of constraining the geometry of the BG sheet
100 ps of classical dynamics without bond length constraints on the water molecules and no geometry constraint on the BG sheet and adspecies were accidentally performed for the BG-OOH system in contact with 1 layer of water.
This mistake, however, can be used to probe the influence of the geometry constraint on the BG-OOH system. Figure S13 compares the total energy and temperature trends over the course of the simulation time for the simulations with and without geometry constraint on the BG-OOH system. Most notably, the total energy fluctuations are significantly increased in the case of the model without constraint.
The increased amplitude of fluctuations translate to a larger error bar. Hence, without the geometry constraint on the BG-OOH backbone, more sampling statistics is required to reduce the uncertainty to an appropriate level. In the interest of computational feasibility, the geometry constraint therefore turns out to be an almost necessary prerequisite.
Finally, figure S14 compares the g(z) of the systems where the BG sheet was constrained against that of the non-constrained system. No significant differences were observed. This result indicates that constraining the BG sheet does not significantly affect the interactions between the surface and the first water layer from a structural point of view.

Embedded solvation approach
The embedded solvation approach, where a small cluster of explicit solvent molecules is used in combination with an implicit continuum description of the solvent bulk, has recently been employed to good effect. [50,51] In the beginning of this study, the one-shot minimization data set was, in fact, computed using the embedded approach and similarly erratic results were obtained. The implicit solvent model was then discarded for the remainder of this study to reduce the number of potential error sources. A detailed discussion of the simulation parameters and potential error sources is provided to rule out that technical errors lead to these erratic results.
We conclude that 32 water molecules, which is the equivalent of 4 layers of water in this model system, are not sufficient to describe solvation of the adspecies within chemical accuracy. Chemical accuracy, i.e. convergence of ∆∆E solv to changes of < 0.05 eV when adding more and more water molecules, is essential since any reduction of the free energy of the potential-determining intermediate will lead to a proportional reduction of the thermochemical overpotential as well.
These results emphasize that new simulation methods are required to be able to calculate large enough systems to obtain converged ∆∆E solv results since molecular dynamics simulations with DFT forces quickly become computationally unfeasible when adding more and more water molecules. Our groups are therefore focused on implementing a 2D periodic hybrid method (often referred to as QM/MM) for the open-source ASE and GPAW software packages which will enable calculations with thousands of water molecules.
Another promising approach to tackle this problem is the recently developed on-the-fly machine learning force field training method.
[52] This approach could be used to train a machine learning force field on a small system and then upscale the system to contain many water molecules while retaining close-to-DFT accuracy.
Finally, we believe in the importance of presenting these negative results to the catalysis community as a word of caution. It is easy to underestimate the number of explicit water molecules required to obtain sufficiently accurate solvation energy results.  Figure S1 illustrates the 32-atomic BG sheet model with the *O, *OH, and *OOH adspecies used throughout this study. Figure S2 illustrates the BG sheet model with an *O adatom in contact with 1-4 layers of water molecules.

Supplementary information. A Supplementary
Interactive visualizations of the model systems can be found online at https:// bjk24.gitlab.io/bg-solvation/docs/visualization.html. Using η TCM as a descriptor for ORR activity, this test is performed to investigate how strongly the thermochemical results depend on the chosen density reproduce a Diffusion Monte Carlo benchmark value most accurately out of all tested functionals. [29] Thus, the HSE06 result is used as a reference value in the following. Figure S3 shows free energy changes of the ORR intermediates on the BG model at 0 V vs. SHE and at the extrapolated onset potential for each functional. To further illustrate this conclusion, thermochemical overpotentials η TCM are calculated from the extrapolated onset potentials U onset as η TCM results are summarized in Table S1. The hybrid functionals  The test below calculates energy differences according to where E ads tot is the total energy of a system with an adspecies (*O, *OH, or *OOH) and E clean tot is the total energy of the BG sheet without any adspecies.
To test for supercell size convergence, it is not necessary to calculate an actual adsorption energy by taking into account the total energy of the adspecies calculated from molecules such as O 2 , H 2 , and H 2 O because their total energy will change only in small ways as the size of the supercell increases. There is always a slight change because plane waves always fill the entire box, also for molecules, which makes the total energy dependent on the box size. However, this effect is insignificant compared to the influence of decreasing dopant and adatom concentration as a function of the supercell size. Figure S4 shows the results of this test.
Since these convergence tests were performed at a much earlier date  1. This data set uses a PAW energy cutoff of ENCUT = 500. As can be seen further below in section 7.3.3, the choice of ENCUT = 600, which was used in the main article for geometry optimization calculations, is on the paranoid end of safe and there is no reason to assume that using a cutoff of 500 eV in this instance had an adverse effect on the results.
2. Note that the k grid was individually optimized for each system.
The k grid optimization runs are not shown for the sake of brevity.
The entire calculated data set is provided in a archive found under DOI:10.5281/zenodo.7684918; consult the KPOINTS files in the respective subfolders for the converged k grid settings. 3. We expect that the trends obtained from these functionals (PBE and HSE06, the latter of which constitutes a PBE functional with 25% exact exchange and a screening parameter) are fully transferable to the RPBE functional with DFT-D3 dispersion correction that was used for production calculations later on. RPBE and PBE belong to the same family of GGA-rung functionals, more specifically RPBE is a re-parameterized PBE functional optimized towards surface (adsorption) calculations. The differences between PBE and RPBE are minor, unlike for example the differences between PBE and functionals like BEEF or SCAN which use fundamentally different potential terms and would therefore require careful re-investigation. This convergence test shows that the energy differences are well converged from the beginning. This result is in contrast to what was observed for NG where the the size 16 and 24 data points did not show converged results yet. [29] However, to stay consistent with the previous work on NG, we chose to use the 32 atomic model system going forward.
There is also another reason for using the slightly larger system: there is the possibility that if the system size is chosen too small, the water atoms in the individual layers become too crowded and do not have the necessary space to relax and accommodate the surface and adspecies properly. To investigate crowding effects in the lateral directions properly, the MD simulations presented later on in section Results should be repeated for a set of surface models with increasing size in x and y direction; however, such a test was not computationally feasible at the time of this study.

k grid convergence of the solvation stabilization energy
Convergence of ∆∆E solv with respect to the k grid is tested. Our hypothesis was that the k grid density of ∆E ads and ∆∆E solv should be different because the interaction is fundamentally different (covalent interaction of adatom with periodic surface vs. non-convalent long-range interaction of solvent molecules with -mostly -the adspecies and only very lightly with the hydrophobic graphene surface). This hypothesis was strengthened by the observation that ∆E ads and ∆∆E solv do not share the same dependence on the density functional. [29] In this case ∆∆E O * solv for the BG system with an *O adatom is calculated as  Figure S5 shows the results of this test.
From this test, ∆∆E solv results are converged using a 2x2x1 k grid.
Arguably, the results can be regarded converged already at a 1x1x1 grid since the energy difference between the smallest and the next k grid is only ca. 0.012 eV. For the MD simulations, we erred on the side of caution and used a 3x3x1

Convergence of the plane-augmented wave energy cutoff
The same approach as above for the supercell size was used to establish the relationship between ∆E and the plane-augmented wave energy cutoff. Only the *O adatom is tested in this case since behavior appears to be similar for all intermediates (see for example figure S4) and because the *O intermediate in particular was found to be the most notorious in benchmark calculations with NG. [29] The 32-atomic BG sheet model was used. Aside from using the PBE functional and a changing ENCUT parameter, the other simulation parameters were consistent with those summarized in the Computational Details section in the main article. Figure S6 shows the results of this test.
Results show that a PAW cutoff energy of 500 eV is sufficient to obtain     This test is performed to check if potentially erratic dipole and quadrupole correction values, which were empirically observed in this work to sometimes occur for no apparent reason, are causing the ∆∆E solv results to be erratic. Figure S9 visualizes the dipole and quadrupole energy correction results for the resampled data set.   Figure S10b reproduces the ∆∆E solv results for the resampled data set shown in Figure 4 in the main manuscript but with the dispersion energy contribution removed from the total energy values before calculating ∆∆E solv . Results indicate that while the dispersion correction is significant in terms of absolute values, removing the E disp does not stabilize the ∆∆E solv trends either (figure S10b). There is, however, an important caveat to this test: dispersion correction was included during the MD simulations from which the resampled data set was generated and also during minimization calculations of the structures in the resampled data set. Hence, this a posteriori removal of the dispersion contribution can only be a rough indicator of its influence.
The data sets would need to be reproduced completely without dispersion correction to conclusively rule out this parameter. Table S2 summarizes the total energy values of the reference systems without water molecules from the resampled data set used to calculate ∆∆E solv . The reference systems were generated from the parent systems that include water molecules by removing the latter. This step was necessary because the simulation box dimensions are different depending on how many water molecules are included. This table illustrates the differences introduced into the total energy by variable cell dimensions. The differences are < 0.01 eV in all cases and, even if the cell dimension had not been corrected for in this way, are unlikely to distort the simulation results in a meaningful way.

Influence of energy-minimizing the non-solvated reference systems
Another potential source of inconsistency is the way that total energy values for the non-solvated reference systems are obtained. In the main article, the configurations of the non-solvated systems were obtained by removing the water molecules from the solvated systems and minimizing the resulting atomic configurations. However, with this approach, ∆∆E solv does not only contain the interaction between the surface-adspecies system with the water molecules but also the rearrangement energy from the relaxation of the reference system. Figure S11 explores    in figure S12 loosely resemble the trends for the resampled data set, ∆∆E solv for the *O adspecies is significantly more negative than with any other analysis strategy. It can be concluded that this approach not only did not resolve the erratic trends but likely further distorted the results because the close-toideal local configurations optimized in this case likely do not represent the average configurations of water molecules around the adspecies in real, finitetemperature systems.

Influence of freezing the BG sheet
The flexible MD simulation of BG-OOH in contact with 8 (flexible) water molecules was accidentally performed with a non-constrained BG sheet. While the data shown in the main article was obtained with the correctly constrained model, this mistake allows to probe the influence of this geometry constraint.  This result may indicate that significantly more sampling would be required in the case of the non-constrained BG sheet to obtain a good estimate of ∆∆E solv with a small-enough error bar. Figure S14 does not show any significant differences between the two systems, indicating that freezing the BG sheet and adspecies does not significantly change the interaction with the first layer of solvent molecules.