Machine learning potentials for metal-organic frameworks using an incremental learning approach

Vandenhaute, Sander; Cools-Ceuppens, Maarten; DeKeyser, Simon; Verstraelen, Toon; Van Speybroeck, Veronique

doi:10.1038/s41524-023-00969-x

Published: 06 February 2023

Volume 9, article number 19, (2023)

npj Computational Materials

Abstract

Computational modeling of physical processes in metal-organic frameworks (MOFs) is highly challenging due to the presence of spatial heterogeneities and complex operating conditions which affect their behavior. Density functional theory (DFT) may describe interatomic interactions at the quantum mechanical level, but is computationally too expensive for systems beyond the nanometer and picosecond range. Herein, we propose an incremental learning scheme to construct accurate and data-efficient machine learning potentials for MOFs. The scheme builds on the power of equivariant neural network potentials in combination with parallelized enhanced sampling and on-the-fly training to simultaneously explore and learn the phase space in an iterative manner. With only a few hundred single-point DFT evaluations per material, accurate and transferable potentials are obtained, even for flexible frameworks with multiple structurally different phases. The incremental learning scheme is universally applicable and may pave the way to model framework materials in larger spatiotemporal windows with higher accuracy.

Introduction

Metal-organic frameworks (MOFs) have become one of the most intriguing classes of materials in the last decades as they may exhibit anomalous responses upon exposure to external stimuli such as temperature, pressure, or guest sorption; notable examples are negative thermal expansion, negative gas adsorption, and large-amplitude structural transitions. Apart from being academic peculiarities, MOFs have now also found their way into technologically important applications such as gas storage, sensing, separation, or catalysis^{1,2,3,4,5,6,7,8,9}. Recently, it became clear that the dynamic response of MOFs is largely affected by spatial heterogeneities at various length scales, varying from the subnanometer scale for isolated defects to the mesoscale when correlated nanoregions are present and to the microscale when accounting for the finite crystal size.^{10,11,12,13,14} This led to the term spatiotemporal processes within MOFs, which refers to the entanglement between the dynamic response of the material and the existing spatial disorder.^15,16 Modeling spatiotemporal processes in realistic MOFs at length and time scales comparable to experimental conditions is a very ambitious goal, and requires a range of innovative methods to translate atomic-scale information into macroscopic insight. One key ingredient is the method used to describe the interatomic forces. Current modeling efforts on MOFs either rely on density functional theory (DFT), which may obtain an accurate description of framework-guest interactions when complemented with proper dispersion models to describe longer-ranged interactions, or on classical force fields, which are simple analytical functions that ignore the quantum mechanical description of the electrons. Classical force fields have become tractable within the MOF field as they are now starting to bridge the length scale gap towards experimentally observed MOF crystallites^17,18.
However, they are less accurate than DFT-based methods and exhibit only limited transferability; force fields derived under certain thermodynamic conditions are not necessarily applicable to other operating conditions. In addition, they are generally unable to describe bond formation or breakage. For these reasons, DFT-based methods are in principle preferred but their applicability is limited to nanometer-sized structural models and time scales up to about 100 ps, even on state-of-the-art computing infrastructure. Ideally, it would be possible to combine the best of both worlds, whereby interatomic forces are evaluated at an accuracy comparable to DFT but with a computational efficiency similar to classical force fields. Machine learning potentials (MLPs) may offer such a hybrid alternative; they may learn the potential energy surface (PES) at a given level of quantum chemical accuracy, and can then be used to accelerate subsequent energy evaluations by multiple orders of magnitude^19,20. MLPs have been successfully developed for a variety of different materials and molecules. Broadly speaking, these models may be categorized into either kernel regression methods, which determine the interaction energy by comparing a given configuration to a set of reference configurations^21,22, or neural network potentials, which directly determine a high-dimensional representation of the PES based on thousands or even millions of parameters²³.

Since 2019, a number of studies have emerged in which neural network potentials have been employed to study e.g., mechanical or diffusion properties for framework materials^24,25,26. Although they demonstrate the potential of MLPs in the field of modeling MOFs, they required a large set of expensive DFT molecular dynamics (MD) trajectories in order to generate the necessary amount of training data (over 10,000 snapshots in all three cases). Such approaches become computationally intractable for systems where large portions of phase space need to be sampled. This is the case for more complex frameworks with large unit cells such as HKUST-1(Cu) or MOF-808(Zr), but also flexible frameworks with multiple stable phases pose a real challenge in terms of generating training data in a computationally efficient manner (e.g., MIL-53(Al) in Fig. 1). For these frameworks, all relevant regions in phase space need to be properly sampled, including not only the stable phases of the material within the thermodynamic conditions of interest but also activated regions and all points along important transition paths. This is very difficult to achieve using equilibrium first principles MD because simulation times are limited when using DFT to evaluate the interatomic forces at each timestep. In addition, configurations obtained from equilibrium MD typically sample regions close to free energy minima and fail to capture important transition states in between phases, even though these are essential to include in the training data.

**Fig. 1: Overview of the atomic structures of the frameworks.**

In this work, we present an incremental learning scheme for the construction of accurate MLPs with thermodynamic transferability without requiring large amounts of DFT input data. We employ NequIP²⁷, a newly proposed message passing neural network (MPNN) that exhibits extraordinary data efficiency due to its use of equivariant feature representations for the atomic environments (see below)²⁸. The key observation in our learning scheme is that a proper sampling of the phase space can be achieved without performing extensive quantum mechanical-based molecular dynamics simulations. Instead, inspired by other active learning approaches^29,30,31, we propose to sample the phase space using the MLP itself, in combination with on-the-fly training whenever it encounters unknown regions. In this sense, systematically improved MLPs are generated. Key to our approach is the use of metadynamics (MTD) in order to enforce the sampling towards unexplored regions. This allows us to ensure that both in- and out-of-equilibrium configurations are automatically included during training, even when large free energy barriers are present. Importantly, we show that our incremental learning algorithm is highly efficient and does not require an uncertainty metric to determine which samples to include in the training data, as is often the case for other active learning approaches proposed elsewhere^30,32. We demonstrate the accuracy and transferability of models constructed with the incremental learning scheme based on two representative frameworks (Fig. 1): the mechanically rigid UiO-66(Zr) framework, which consists of twelve-fold coordinated inorganic bricks and organic benzene-1,4-dicarboxylate (BDC) ligands³³, and the flexible MIL-53(Al) framework, which consists of one-dimensional (periodic) aluminum hydroxide chains connected with the same organic ligands³⁴.
Incremental learning addresses the main disadvantages of traditional data generation with an iterative learning approach, in which efficient and parallel enhanced sampling molecular dynamics simulations are combined with on-the-fly learning. Based on a single atomic structure and a definition of one or more collective variables, the algorithm proceeds by simultaneously exploring and learning the phase space of the material; the approach is schematically shown in Fig. 2. The algorithm starts by constructing a first generation MLP, which is trained on a small set of initial configurations that are obtained by applying random perturbations to the particle positions and strain components of the initial configuration. This first generation MLP is then used in a short (~1 ps) multiple walker metadynamics (MTD) simulation in order to explore the phase space around the initial configuration. The final configuration of each of the walkers is extracted and subjected to a new DFT calculation to obtain the energy and forces. These are added to the training and validation sets, after which a next generation MLP is obtained by training the model for a short amount of time. Even though the sampling was performed with an initially inaccurate MLP, it still suffices to explore a meaningful region of phase space and generate almost decorrelated samples. The sampling time per iteration may be kept relatively short because its only purpose is to gradually explore larger portions of phase space. After training, the model is considered to have learned an incremental region in phase space, and may then be passed on to the next iteration in which it continues the MTD sampling. The state of the bias potential is retained from the previous iteration so as to ensure that the walkers explore a slightly different region of phase space in each iteration.
By continuously alternating the enhanced sampling and training steps, the entire phase space of the collective variable is explored and learned on-the-fly without the need to perform expensive DFT-based molecular dynamics simulations. Importantly, our approach ensures that atomic configurations for which a QM evaluation is performed are always separated by a ~1 ps MTD trajectory, which implies that they are at most only weakly correlated. As such, we are guaranteed that there is little redundancy between the QM-evaluated configurations in the training data even though we did not rely on any specialized uncertainty measures as found in other active learning schemes^30,32.
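The alternation between sampling, labeling, and training described above can be summarized in a short structural sketch. This is not the authors' implementation: every function below (`perturb`, `dft_single_point`, `train`, `run_walker`) is a hypothetical placeholder operating on a toy "structure" (a list of floats), and the assumption that each walker restarts from its own last configuration is ours. The default of 50 walkers and 50 initial perturbed structures follows the values quoted in the Results and Methods sections.

```python
import random

def perturb(structure, amplitude, rng):
    """Return a copy of a toy structure with uniform random noise added."""
    return [x + rng.uniform(-amplitude, amplitude) for x in structure]

def dft_single_point(structure):
    """Hypothetical stand-in for a DFT evaluation (toy harmonic energy)."""
    return sum(x * x for x in structure)

def train(model, dataset):
    """Hypothetical stand-in for MLP training; only records bookkeeping."""
    return {"generation": model["generation"] + 1, "n_data": len(dataset)}

def run_walker(model, start, bias_state, rng):
    """Hypothetical stand-in for a short biased MD run with the current MLP.

    The bias state is mutated in place so that it persists across iterations.
    """
    bias_state["iterations"] += 1
    return perturb(start, 0.1, rng)

def incremental_learning(initial, n_iterations, n_walkers=50, n_initial=50, seed=0):
    rng = random.Random(seed)
    # Generation 0: label random perturbations of the single input structure.
    dataset = []
    for _ in range(n_initial):
        s = perturb(initial, 0.08, rng)
        dataset.append((s, dft_single_point(s)))
    model = train({"generation": -1, "n_data": 0}, dataset)
    walkers = [list(initial) for _ in range(n_walkers)]
    biases = [{"iterations": 0} for _ in range(n_walkers)]
    for _ in range(n_iterations):
        # Explore with the current model, then label only the final frames.
        walkers = [run_walker(model, w, b, rng) for w, b in zip(walkers, biases)]
        dataset.extend((w, dft_single_point(w)) for w in walkers)
        model = train(model, dataset)
    return model, dataset, biases
```

In this sketch the number of labeled configurations grows by exactly one per walker per iteration, which is what keeps the DFT budget small and predictable.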

**Fig. 2: Overview of the incremental learning approach as a combination of enhanced sampling and on-the-fly training.**

The incremental learning approach as proposed here is most efficient when combined with MLP models which are easy to train and which are able to learn a large number of atomic environments and chemical species. While existing MLPs for MOFs used strictly invariant architectures^24,25,26, we employ the neural equivariant interatomic potential (NequIP)²⁷ because it achieves a data efficiency similar to kernel regression methods while maintaining a cost of evaluation that is independent of the size of the training set. Like many other message passing neural networks (MPNNs)^35,36, NequIP computes the total potential energy of a given configuration as a sum of individual atomic contributions, whereby each contribution is the result of a series of convolutions taken over neighboring atomic environments as to represent the physical interactions between nearby atoms. Equivariant MPNNs differ from more established MPNNs such as SchNet³⁷ or DeePMD³⁸ because they complement rotationally invariant, scalar-like features with equivariant features in the representation of atomic environments³⁹ (Fig. 3a). Inspired by tensor field networks²⁸, NequIP assigns each of the features to a specific irreducible representation of the rotational group SO(3). Each irreducible representation is characterized by a rotation order $\ell \in {\mathbb{N}}$, which defines how features transform when the cartesian coordinates at the input are rotated. The first two rotation orders (ℓ = 0 and ℓ = 1) correspond to scalar and vector features respectively, and are illustrated in Fig. 3a. Effectively, features for which ℓ > 0 are better suited to represent directional information of the atomic environment throughout the layers of the network, and are the primary reason for the exceptional data efficiency of equivariant MPNNs as compared to invariant MPNNs and even kernel regression methods²⁷. 
To demonstrate the importance of equivariant features, we performed separate incremental learning runs for the invariant (${\ell }_{\max }=0$) and equivariant (${\ell }_{\max }=1$) networks, and monitored the mean absolute error (MAE) on the force predictions for an independent test set as a function of the number of iterations (Fig. 3). In addition, we explicitly show the efficiency of our learning approach by indicating the corresponding total number of QM evaluations required to achieve a given error. As expected, the test error decreases monotonically as a function of the number of iterations as training and validation sets contain an increasingly larger number of atomic environments which the models may learn from. While both networks had roughly the same number of weights (about 200,000), the equivariant model clearly required far fewer QM evaluations to achieve a given test error as compared to the invariant model.
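The distinction between ℓ = 0 and ℓ = 1 features can be illustrated with a toy example. The sketch below does not reproduce NequIP's feature construction; `scalar_feature` and `vector_feature` are hypothetical hand-written descriptors chosen only to show the transformation behavior: a scalar feature is unchanged when all coordinates are rotated, whereas a vector feature rotates along with the input.

```python
import math

def rotate_z(v, angle):
    """Rotate a 3D vector about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def scalar_feature(center, neighbors):
    """Toy l = 0 feature: sum of Gaussian weights of neighbor distances."""
    return sum(math.exp(-math.dist(center, n) ** 2) for n in neighbors)

def vector_feature(center, neighbors):
    """Toy l = 1 feature: Gaussian-weighted sum of relative position vectors."""
    out = [0.0, 0.0, 0.0]
    for n in neighbors:
        w = math.exp(-math.dist(center, n) ** 2)
        for k in range(3):
            out[k] += w * (n[k] - center[k])
    return tuple(out)

# Hypothetical atomic environment and rotation angle.
center = (0.1, 0.2, 0.3)
neighbors = [(1.0, 0.0, 0.0), (0.0, 1.5, 0.2), (-0.7, 0.4, 1.1)]
angle = 0.7
rot_center = rotate_z(center, angle)
rot_neighbors = [rotate_z(n, angle) for n in neighbors]

# Invariance: the scalar feature does not change under rotation.
s_before = scalar_feature(center, neighbors)
s_after = scalar_feature(rot_center, rot_neighbors)

# Equivariance: the vector feature of the rotated input equals
# the rotated vector feature of the original input.
v_after = vector_feature(rot_center, rot_neighbors)
v_expected = rotate_z(vector_feature(center, neighbors), angle)
```

The vector feature retains the direction in which neighbors sit relative to the central atom, which is the directional information that the scalar feature discards.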

**Fig. 3: Comparison between invariant and equivariant feature representations.**

Results

Incremental learning scheme for UiO-66(Zr) and MIL-53(Al)

We demonstrate the efficiency of incremental learning based on case studies for both UiO-66(Zr) as well as MIL-53(Al) (Fig. 1). Final models are validated in terms of their test error accuracy, transferability towards out-of-equilibrium configurations, and their ability to predict structural and mechanical properties in agreement with the DFT reference. Since the learning algorithm takes care of both data generation and training, the only required user input is an initial atomic structure of the system as well as a collective variable along which the bias should be applied. For both frameworks, we chose the unit cell volume as collective variable and employ NequIP models with ${\ell }_{\max }=1$ as MLP. The QM calculations are performed using DFT at the PBE-D3(BJ) level of theory. In each iteration, 50 MTD walkers are used to explore the phase space and generate 50 configurations that are added to training and validation sets. The learning algorithm started from the DFT optimized geometry. Chemical bond integrity was preserved throughout all iterations and the constructed MLPs are therefore only intended to be employed in nonreactive dynamics. Although outside the scope of this work, the same approach may be used to construct reactive MLPs; the MTD bias will continue to increase in each iteration until it exceeds the barrier needed to break the coordination bonds between the metal and the ligand, at which point the enhanced sampling will start generating defective framework configurations. A full overview of all computational details may be found in the “Methods” section and in the Supplementary Information.

For both systems, we constructed a test set of reference data that is obtained based on a large number of first principles MD trajectories at 600 K (see Supplementary Note 1). The test set was specifically generated as to include strongly out-of-equilibrium unit cell volumes, as these become relevant for the behavior of the framework at high temperatures and/or pressures. The energy and force MAEs are shown in the top half of Fig. 4 as a function of the MTD collective variable (the unit cell volume), and remain below 1 meV per atom and 30 meV Å⁻¹ across the entire volume range. Such values are in fact lower than the intrinsic uncertainty on these quantities due to e.g., the choice of functional or even the basis set dependence⁴⁰, and further training with more data would at that point no longer yield significant improvements in predictive power. Note that even the volume region between the two stable phases for MIL-53(Al) is well predicted, indicating that the model has successfully discovered and learned the phase transition by itself even though it only initially received the closed pore (cp) configuration as input. This is the case because the MTD bias ensures that the sampling in each iteration is gradually expanded towards unit cell volumes that are not yet explored.
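For reference, the two error metrics quoted above can be computed as in the following sketch. The exact averaging convention is an assumption on our part: energies are compared per atom and forces per Cartesian component, averaged over all atoms and configurations. The function names and the toy numbers are hypothetical.

```python
def energy_mae_per_atom(e_pred, e_ref, n_atoms):
    """Mean absolute energy error per atom over a set of configurations."""
    errors = [abs(p - r) / n for p, r, n in zip(e_pred, e_ref, n_atoms)]
    return sum(errors) / len(errors)

def force_mae(f_pred, f_ref):
    """Mean absolute error over all Cartesian force components.

    f_pred and f_ref are lists of configurations; each configuration is a
    list of (fx, fy, fz) tuples, one per atom.
    """
    total, count = 0.0, 0
    for conf_pred, conf_ref in zip(f_pred, f_ref):
        for atom_pred, atom_ref in zip(conf_pred, conf_ref):
            for a, b in zip(atom_pred, atom_ref):
                total += abs(a - b)
                count += 1
    return total / count

# Toy data in arbitrary but consistent units.
e_mae = energy_mae_per_atom([10.5, 20.0], [10.0, 21.0], [2, 4])
f_mae = force_mae(
    [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]],
    [[(0.5, 0.0, 0.0), (0.0, 1.0, 0.5)]],
)
```

Evaluating these metrics in separate volume bins, rather than over the whole test set at once, gives the volume-resolved view that the text describes for Fig. 4.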

**Fig. 4: Validation of the obtained potentials for UiO-66(Zr) and MIL-53(Al).**

Once the training data covers all unit cell volumes of interest, no additional sampling is necessary and the algorithm may be terminated. For UiO-66(Zr), this was the case after only 11 iterations or 600 QM evaluations, whereas MIL-53(Al) required 19 iterations or 1000 QM evaluations because the MTD simulations take some time to overcome the free energy barrier with Gaussian hills. Nevertheless, it is clear that incremental learning generates trained MLPs at only a fraction of the computational cost when compared to passive learning approaches even though the accuracy is maintained across thermodynamic conditions (see Supplementary Note 4). In fact, the total computational cost of training data generation is now equivalent to or even less than a routine geometry optimization or a very short first principles MD trajectory (excluding the GPU resources required during training).

Mechanical stability and phase transitions

Interestingly, while the network parameters were optimized using a weighted loss function that takes both energies and forces into account, the NequIP models are also capable of providing accurate predictions for the virial stress in the system (see Supplementary Note 4). This is a necessary requirement in order to generate representative (N, P, T) dynamics, and it already suggests that the overall mechanical behavior of the frameworks will be well reproduced by the MLPs. To further investigate this, we compared the final models with the DFT reference in terms of structural and mechanical properties at 0 K. Optimized geometries for UiO-66(Zr) and both phases of MIL-53(Al) are in good agreement with the DFT reference, with RMSD values below 0.05 Å on both atomic positions and box vector components. For each of the optimized geometries, we evaluate the mechanical behavior in terms of the stiffness tensor ${\mathsf{C}}\in {{\mathbb{R}}}^{6\times 6}$ (in Voigt notation), which determines how the stress ${{{\boldsymbol{\sigma }}}}\in {{\mathbb{R}}}^{6}$ within the material changes due to an applied strain ${{{\boldsymbol{\epsilon }}}}\in {{\mathbb{R}}}^{6}$:

$$\boldsymbol{\sigma} = \mathsf{C}\,\boldsymbol{\epsilon} \qquad (1)$$

The components of C are computed using either a finite-difference approach (used for the DFT reference), or with an exact second-order Hessian matrix (used for the MLPs); both methods are outlined in Supplementary Note 3.
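As an illustration of the finite-difference route, the sketch below recovers a stiffness tensor from a stress function by central differences, $C_{ij} = \partial \sigma_i / \partial \epsilon_j$ in Voigt notation. It is a generic textbook scheme and not necessarily the procedure of Supplementary Note 3. The linear-elastic `stress` model, the `cubic_stiffness` helper, and the elastic constants (60, 30, and 20 GPa) are hypothetical and serve only to check that the procedure returns the tensor it started from.

```python
def cubic_stiffness(c11, c12, c44):
    """Build a 6x6 Voigt stiffness matrix with cubic symmetry."""
    C = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            C[i][j] = c11 if i == j else c12
        C[i + 3][i + 3] = c44
    return C

def stress(strain, C):
    """Toy linear-elastic model: sigma = C * epsilon (Voigt notation)."""
    return [sum(C[i][j] * strain[j] for j in range(6)) for i in range(6)]

def stiffness_fd(stress_fn, delta=1e-3):
    """Estimate C_ij = d sigma_i / d epsilon_j by central differences."""
    C = [[0.0] * 6 for _ in range(6)]
    for j in range(6):
        plus = [0.0] * 6
        minus = [0.0] * 6
        plus[j] = delta
        minus[j] = -delta
        s_plus = stress_fn(plus)
        s_minus = stress_fn(minus)
        for i in range(6):
            C[i][j] = (s_plus[i] - s_minus[i]) / (2.0 * delta)
    return C

# Hypothetical elastic constants in GPa.
C_ref = cubic_stiffness(60.0, 30.0, 20.0)
C_fd = stiffness_fd(lambda eps: stress(eps, C_ref))
```

In a real calculation `stress_fn` would apply the strain to the optimized cell and return the stress from DFT or from the MLP, so the step size `delta` has to balance numerical noise against anharmonic contributions.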

The bottom part of Fig. 4 shows the stiffness tensors for all three optimized structures; each of the squares is color-coded based on the relative deviation between the MLP prediction and the DFT reference, whereas the value of the latter is indicated in black text inside each square. None of the predicted stiffness constants differed by more than 7 GPa from the DFT reference values, demonstrating that our models are capable of reproducing the mechanical properties of the frameworks, as is to be expected based on the force and stress error assessments. To further demonstrate their thermodynamic transferability, we explicitly performed (N, P, T) MD simulations at a large range of pressures and validated the obtained trajectories with the underlying DFT reference; the results are presented in Supplementary Note 4. The constructed MLPs exhibit essentially the same accuracy as the underlying level of theory while being at least three orders of magnitude faster to evaluate. To further demonstrate this enormous gain in computational efficiency, we will investigate the threshold pressure for the large pore (lp) to closed pore (cp) transition in MIL-53(Al) at 300 K, which was previously estimated at 13–18 MPa based on mercury porosimetry experiments⁴¹. Computational prediction of the transition pressure is difficult because ab initio (N, P, T) MD simulations of MIL-53(Al) typically exhibit large volume fluctuations due to the small unit cell size (with ΔV/V on the order of 10%), and these may act as premature triggers for the phase transition⁴². Fig. 5 visualizes the transition dynamics for a typical 1 × 2 × 1 cell used in ab initio simulations. The framework exhibits lp-to-cp transitions for any nonnegative pressure, suggesting that the lp phase is unstable at 300 K in spite of a clear lp minimum on the Helmholtz free energy surface of the material. 
In addition, the absence of any correlation between the timing of the transitions and the magnitude of the applied pressure further suggests that transitions at this scale are driven entirely by unit cell volume fluctuations⁴². Fortunately, the extraordinary computational efficiency of MLPs allows us to consider much larger supercells of the same framework, such as the 9 × 2 × 9 cell shown in Fig. 5. Because the relative fluctuations of ensemble-averaged observables decrease with the inverse square root of the number of particles, and the 9 × 2 × 9 cell contains 81 times as many atoms as the 1 × 2 × 1 cell, the relative fluctuations in unit cell volume at this scale are roughly an order of magnitude smaller and therefore no longer able to trigger premature transitions of the framework. The obtained transition pressure of 18–20 MPa as shown in Fig. 5 is further confirmed by a full evaluation of the pressure-versus-volume equation of state at 300 K as presented in Supplementary Note 5, and is in good agreement with the experimental result. Previous computational estimates existed but were obtained using a classical force field that was parameterized based on DFT input data from the optimized lp geometry only (i.e., without taking into account either the cp phase or the transition region), which resulted in a large disagreement with experiment⁴³.
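The size argument can be made concrete with a short calculation. The sketch assumes ideal $1/\sqrt{N}$ scaling of relative fluctuations and takes the quoted ΔV/V of about 10% as representative of the 1 × 2 × 1 cell; both are simplifying assumptions, and the helper names are hypothetical.

```python
import math

def n_cells(nx, ny, nz):
    """Number of unit cells in an nx x ny x nz supercell."""
    return nx * ny * nz

def fluctuation_ratio(small, large):
    """Ratio of relative fluctuations (large cell / small cell), assuming 1/sqrt(N)."""
    return math.sqrt(n_cells(*small) / n_cells(*large))

size_factor = n_cells(9, 2, 9) / n_cells(1, 2, 1)   # how many times more atoms
ratio = fluctuation_ratio((1, 2, 1), (9, 2, 9))      # reduction in relative fluctuations
rel_small = 0.10                                     # assumed dV/V for the 1 x 2 x 1 cell
rel_large = rel_small * ratio                        # estimated dV/V for the 9 x 2 x 9 cell
```

Under these assumptions the 9 × 2 × 9 cell has 81 times as many atoms, so its relative volume fluctuations are 9 times smaller, about 1% instead of about 10%.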

**Fig. 5: Estimating the transition pressure via large-scale dynamics.**

Towards universal interaction potentials

Overall, our results demonstrate that the physical interactions in a given framework may be learned by equivariant MPNNs based on only a minimal number of QM evaluations. Naturally, we can exploit this efficiency even further and examine whether we can construct a single model for the prediction of physical interactions in several different frameworks. This kind of transferability has already been shown for other systems such as small organic molecules, and can also be anticipated for MOFs because of their modular building block structure. As a proof of principle, we considered a set of 10 well-known aluminum- and zirconium-based frameworks that are similar to but different from MIL-53(Al) and UiO-66(Zr) in either the topology of the framework or the organic ligand. A more detailed description of the frameworks under consideration in terms of their building blocks is given in Supplementary Note 5. We used incremental learning to explore and learn the phase space of this set of 10 materials, which resulted in a training set of about 3100 configurations. We evaluated the test error performance for the frameworks included during training as well as the UiO-66(Zr) and MIL-53(Al) frameworks that were left out; the results are shown in Table 1. Even though it was trained on a dataset with a significantly larger variety in atomic environments, the model still achieves relatively low force and stress MAEs, even for the frameworks not included during training. In Supplementary Note 6, the performance of the model is further investigated and compared with UFF4MOF⁴⁴, an established universal interaction potential for MOFs. While the model is slightly less accurate when compared to the results in Fig. 4, it still outperforms UFF4MOF by a large margin.
In addition, it should be emphasized that the entire training procedure only required about 3100 QM evaluations in total or about 310 QM evaluations per material, which further demonstrates the efficacy of incremental learning in combination with equivariant MPNNs.

Table 1 Test error performance of an MLP trained on a set of 10 aluminum- or zirconium-based frameworks (not including UiO-66(Zr) and MIL-53(Al)) at 600 K.

Discussion

In this work, we propose an efficient approach for the construction of accurate and transferable MLPs for framework materials. Even for systems with multiple phases, we show that about 1000 QM evaluations are sufficient to construct accurate equivariant MPNNs. This increased computational efficiency is important for future research, as it is now possible to employ more advanced QM methods (e.g., hybrid functionals or beyond) during MLP training and in this way allow for a more accurate description of dynamic phenomena in these materials. In addition, the ability to construct a single potential for the description of multiple frameworks is highly promising, especially because we observed that the number of QM evaluations per material actually decreases with increasing variety in the training set (from about 1000 to only about 300). Nevertheless, further research in this area is still necessary. For example, it is still unclear how a maximally diverse training set of different frameworks should be assembled, i.e., which distribution of building blocks, topology, and/or pore sizes are necessary to guarantee transferability towards as many materials as possible. In addition, it remains to be seen whether equivariant MPNNs like NequIP will maintain their accuracy across the MOF design space, including e.g., large mesoporous systems. In those cases, message passing architectures may have to be complemented with more recent models that are targeted towards a more accurate description of long-range interactions^45,46,47. Finally, future work should extend the applicability of MLPs towards guest-loaded and even disordered frameworks in order to fully unleash their potential in the design of next-generation MOF technologies.

Methods

Density functional theory calculations

QM energy evaluations were performed using the CP2K simulation package⁴⁸, version 8.2. We employ the PBE⁴⁹ functional with Grimme D3 dispersion corrections⁵⁰ and a hybrid basis set including both TZVP Gaussian basis functions and plane waves⁵¹; GTH pseudopotentials were used to smooth the electron density near the nuclei. The plane wave cutoff energy was set to 1000 Ry for all materials to guarantee that force and stress calculations were fully converged. Additional computational details regarding DFT calculations are available in Supplementary Note 1.

Molecular dynamics

MLP sampling was performed using YAFF⁵² at conditions of constant temperature and constant pressure, with a timestep of 0.5 fs. The temperature was controlled using a Langevin thermostat⁵³ with a time constant of 100 fs, and the pressure was controlled using a Langevin barostat with a time constant of 1 ps⁵⁴. PLUMED version 2.7.2 was used as a plugin to add the metadynamics bias in the simulation⁵⁵.

Machine learning potentials

All the MLP models in this work were constructed using NequIP version 0.5.4. All models employed a cutoff radius of 5 Å for the atomic environments and used four interaction layers, as this was found to strike the optimal balance between accuracy and computational efficiency. Because neighboring atoms exchange information about their atomic environments in each layer, the effective interaction radius of a single atom is at most 4 × 5 Å = 20 Å because that is the largest distance over which information on a given atomic environment can travel. While this was found sufficient for all frameworks considered in this work, an accurate description of frameworks having larger pores may require dedicated long-range interactions. Feature representations were restricted to maximum rotation orders of either ${\ell }_{\max }=1$ or ${\ell }_{\max }=0$, and the sizes of the network layers were chosen such that the total number of network parameters was around 200,000. The loss function consists of a weighted average of potential energy and force errors, and was optimized using Adam⁵⁶ with a learning rate of 0.005. Additional parameters are presented in Supplementary Note 2.
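A weighted energy-and-force loss of the kind mentioned above can be sketched as follows. The text does not specify the error norm or the weights, so the squared errors, the per-atom normalization, and the default weights `w_energy` and `w_force` below are all assumptions made for illustration; the actual settings are in Supplementary Note 2.

```python
def weighted_loss(e_pred, e_ref, f_pred, f_ref, w_energy=1.0, w_force=1.0):
    """Weighted average of a per-atom energy error and a force error.

    e_pred, e_ref: total energies of one configuration.
    f_pred, f_ref: lists of (fx, fy, fz) tuples, one per atom.
    Squared errors and the default weights are illustrative assumptions.
    """
    n_atoms = len(f_ref)
    energy_term = ((e_pred - e_ref) / n_atoms) ** 2
    force_term = sum(
        (a - b) ** 2
        for atom_pred, atom_ref in zip(f_pred, f_ref)
        for a, b in zip(atom_pred, atom_ref)
    ) / (3 * n_atoms)
    return (w_energy * energy_term + w_force * force_term) / (w_energy + w_force)

# Toy two-atom configuration.
f_ref = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
f_pred = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
loss_equal = weighted_loss(1.0, 0.0, f_pred, f_ref)
loss_energy_only = weighted_loss(1.0, 0.0, f_pred, f_ref, w_force=0.0)
```

Because each configuration contributes a single energy but three force components per atom, the relative weights control how strongly the far more numerous force labels dominate training.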

Workflow management and sampling parameters

All of the computational steps in the learning algorithm are managed using the Snakemake workflow management system, version 7.8⁵⁷. QM evaluations and metadynamics simulations are performed in a massively parallel manner on CPU nodes, whereas MLP training is performed on a single GPU (Nvidia A100 40GB). In this way, each iteration in the algorithm takes about one to two hours in real time. For models that were trained on a single material, we used 50 parallel MTD walkers to explore the phase space, 45 of which were used to construct the training set and the remaining 5 for the validation set. In contrast to traditional multiple walker MTD, each walker maintains its own bias potential so as to encourage the walkers to sample different regions in phase space. Hills were added at a pace of 50 fs, with a width σ = 100 Å³ and a height of 5 kJ/mol. Models were initialized by training on a small set of 50 structures obtained by applying uniform perturbations in the atomic coordinates and strain components (with respective amplitudes of 0.08 Å and 0.01) starting from the initial structure.
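The hill parameters above translate into a simple expression for the bias felt by a single walker. The sketch assumes standard metadynamics with fixed-height Gaussian hills on the unit cell volume; whether a tempered variant was used is not stated in the text. It is a plain-Python illustration, not PLUMED input, and the function names and hill centers are hypothetical. The 0.5 fs timestep and the ~1 ps run length per iteration are taken from elsewhere in the article.

```python
import math

HILL_HEIGHT = 5.0    # kJ/mol
HILL_WIDTH = 100.0   # Å^3, Gaussian sigma on the unit cell volume
PACE_FS = 50.0       # time between hill depositions
TIMESTEP_FS = 0.5    # MD timestep

def bias(volume, centers, height=HILL_HEIGHT, sigma=HILL_WIDTH):
    """Sum of fixed-height Gaussian hills deposited at the given volumes."""
    return sum(
        height * math.exp(-((volume - c) ** 2) / (2.0 * sigma ** 2))
        for c in centers
    )

def hills_per_run(run_length_fs, pace_fs=PACE_FS):
    """Number of hills deposited during one biased run."""
    return int(run_length_fs // pace_fs)

steps_per_hill = int(PACE_FS / TIMESTEP_FS)      # MD steps between depositions
hills_per_iteration = hills_per_run(1000.0)      # one ~1 ps run per iteration
bias_at_center = bias(1500.0, [1500.0])          # hypothetical hill at 1500 Å^3
bias_one_sigma = bias(1600.0, [1500.0])          # one sigma away from the center
```

With these values each walker deposits about 20 hills per iteration, and since every walker keeps its own list of hill centers, the accumulated bias differs from walker to walker.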

Additional validation

To verify the validity of the transition dynamics as generated by the MLP, we validated the 0 MPa phase transition simulation on the 1 × 2 × 1 unit cell of MIL-53(Al) from Fig. 5 by post hoc calculation of the DFT energy and forces at regular intervals over the entire transition. In this way, we obtained an average force MAE of only 6.7 meV Å⁻¹ and an average energy MAE of 0.1 meV per atom, which is exceptionally low. In addition to the 9 × 2 × 9 configuration presented in Fig. 5, transition pressures were also determined based on alternative supercell configurations (3 × 2 × 3 and 6 × 2 × 6), all of which yielded the same 18–20 MPa estimate.

Data availability

All datasets that were used and/or generated in this work are publicly available via Zenodo^58,59.

Code availability

An automated implementation of the entire algorithm using Snakemake is available online⁵⁹, together with the necessary input files for CP2K and NequIP as well as a variety of scripts to perform molecular dynamics simulations, geometry optimizations, and extended Hessian calculations. In addition, we provide a highly modular and scalable implementation of the incremental learning approach in psiflow, a Python library available at github.com/svandenhaute/psiflow.

References

1. Burtch, N. C., Jasuja, H. & Walton, K. S. Water stability and adsorption in metal-organic frameworks. Chem. Rev. 114, 10575 (2014).
2. Redfern, L. R. & Farha, O. K. Mechanical properties of metal-organic frameworks. Chem. Sci. 10, 10666 (2019).
3. Horcajada, P. et al. Metal–organic frameworks in biomedicine. Chem. Rev. 112, 1232 (2012).
4. Rogge, S. M. J. et al. Metal–organic and covalent organic frameworks as single-site catalysts. Chem. Soc. Rev. 46, 3134 (2017).
5. Lee, J. et al. Metal–organic framework materials as catalysts. Chem. Soc. Rev. 38, 1450 (2009).
6. Freund, R. et al. The current status of MOF and COF applications. Angew. Chem. Int. Ed. 60, 23975 (2021).
7. Ma, N. & Horike, S. Metal–organic network-forming glasses. Chem. Rev. 122, 4163 (2022).
8. Lin, J.-B. et al. A scalable metal-organic framework as a durable physisorbent for carbon dioxide capture. Science 374, 1464 (2021).
9. Hanikel, N. et al. Evolution of water structures in metal-organic frameworks for improved atmospheric water harvesting. Science 374, 454 (2021).
10. Furukawa, H., Müller, U. & Yaghi, O. M. “Heterogeneity within order” in metal-organic frameworks. Angew. Chem. Int. Ed. 54, 3417 (2015).
11. Cheetham, A. K., Bennett, T. D., Coudert, F.-X. & Goodwin, A. L. Defects and disorder in metal organic frameworks. Dalton Trans. 45, 4113 (2016).
12. Krause, S. et al. The effect of crystallite size on pressure amplification in switchable porous solids. Nat. Commun. 9, 1573 (2018).
13. Ehrling, S., Miura, H., Senkovska, I. & Kaskel, S. From macro- to nanoscale: finite size effects on metal–organic framework switchability. Trends Chem. 3, 291 (2021).
14. Ehrling, S. et al. Crystal size versus paddle wheel deformability: selective gated adsorption transitions of the switchable metal–organic frameworks DUT-8(Co) and DUT-8(Ni). J. Mater. Chem. A 7, 21459 (2019).
15. Van Speybroeck, V., Vandenhaute, S., Hoffman, A. E. & Rogge, S. M. Towards modeling spatiotemporal processes in metal-organic frameworks. Trends Chem. 3, 605–619 (2021).
16. Evans, J. D., Bon, V., Senkovska, I., Lee, H.-C. & Kaskel, S. Four-dimensional metal-organic frameworks. Nat. Commun. 11, 2690 (2020).
17. Vandenhaute, S., Rogge, S. M. J. & Van Speybroeck, V. Large-scale molecular dynamics simulations reveal new insights into the phase transition mechanisms in MIL-53(Al). Front. Chem. 9, 2296 (2021).
18. Keupp, J. & Schmid, R. Molecular dynamics simulations of the “breathing” phase transformation of MOF nanocrystallites. Adv. Theory Simul. 2, 1900117 (2019).
19. Friederich, P., Häse, F., Proppe, J. & Aspuru-Guzik, A. Machine-learned potentials for next-generation matter simulations. Nat. Mater. 20, 750 (2021).
20. Unke, O. T. et al. Machine learning force fields. Chem. Rev. 121, 10142 (2021).
21. Christensen, A. S., Bratholm, L. A., Faber, F. A. & von Lilienfeld, O. A. FCHL revisited: faster and more accurate quantum machine learning. J. Chem. Phys. 152, 044107 (2020).
22. Deringer, V. L. et al. Gaussian process regression for materials and molecules. Chem. Rev. 121, 10073 (2021).
23. Behler, J. Four generations of high-dimensional neural network potentials. Chem. Rev. 121, 10037 (2021).
24. Eckhoff, M. & Behler, J. From molecular fragments to the bulk: development of a neural network potential for MOF-5. J. Chem. Theory Comput. 15, 3793 (2019).
25. Achar, S. K., Wardzala, J. J., Bernasconi, L., Zhang, L. & Johnson, J. K. Combined deep learning and classical potential approach for modeling diffusion in UiO-66. J. Chem. Theory Comput. 18, 3593 (2022).
26. Yu, Y., Zhang, W. & Mei, D. Artificial neural network potential for encapsulated platinum clusters in MOF-808. J. Phys. Chem. C 126, 1204 (2022).
27. Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
28. Thomas, N. et al. Tensor field networks: rotation- and translation-equivariant neural networks for 3D point clouds. Preprint at https://arxiv.org/abs/1802.08219 (2018).
29. Sivaraman, G. et al. Machine-learned interatomic potentials by active learning: amorphous and liquid hafnium dioxide. npj Comput. Mater. 6, 104 (2020).
30. Schran, C. et al. Machine learning potentials for complex aqueous systems made simple. Proc. Natl Acad. Sci. USA 118, e2110077118 (2021).
31. Wang, W., Yang, T., Harris, W. H. & Gómez-Bombarelli, R. Active learning and neural network potentials accelerate molecular screening of ether-based solvate ionic liquids. Chem. Commun. 56, 8920 (2020).
32. Vandermause, J. et al. On-the-fly active learning of interpretable Bayesian force fields for atomistic rare events. npj Comput. Mater. 6, 20 (2020).
33. Cavka, J. H. et al. A new zirconium inorganic building brick forming metal organic frameworks with exceptional stability. J. Am. Chem. Soc. 130, 13850 (2008).
34. Loiseau, T. et al. A rationale for the large breathing of the porous aluminum terephthalate (MIL-53) upon hydration. Chem. Eur. J. 10, 1373 (2004).
35. Schütt, K. T., Unke, O. T. & Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. Preprint at https://arxiv.org/abs/2102.03150 (2021).
36. Batatia, I., Kovács, D. P., Simm, G. N. C., Ortner, C. & Csányi, G. MACE: higher order equivariant message passing neural networks for fast and accurate force fields. Preprint at https://arxiv.org/abs/2206.07697 (2022).
37. Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet – a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
38. Wang, H., Zhang, L., Han, J. & E, W. DeePMD-kit: a deep learning package for many-body potential energy representation and molecular dynamics. Comput. Phys. Commun. 228, 178 (2018).
39. Musil, F. et al. Physics-inspired structural representations for molecules and materials. Chem. Rev. 121, 9759 (2021).
40. Nazarian, D., Ganesh, P. & Sholl, D. S. Benchmarking density functional theory predictions of framework structures and properties in a chemically diverse test set of metal-organic frameworks. J. Mater. Chem. A 3, 22432 (2015).
41. Yot, P. G. et al. Metal-organic frameworks as potential shock absorbers: the case of the highly flexible MIL-53(Al). Chem. Commun. 50, 9462 (2014).
42. Rogge, S. M. J. et al. A comparison of barostats for the mechanical characterization of metal–organic frameworks. J. Chem. Theory Comput. 11, 5583 (2015).
43. Rogge, S. M. J., Waroquier, M. & Van Speybroeck, V. Unraveling the thermodynamic criteria for size-dependent spontaneous phase separation in soft porous crystals. Nat. Commun. 10, 4842 (2019).
44. Coupry, D. E., Addicoat, M. A. & Heine, T. Extension of the universal force field for metal–organic frameworks. J. Chem. Theory Comput. 12, 5215 (2016).
45. Staacke, C. G. et al. On the role of long-range electrostatics in machine-learned interatomic potentials for complex battery materials. ACS Appl. Energy Mater. 4, 12562 (2021).
46. Grisafi, A. & Ceriotti, M. Incorporating long-range physics in atomic-scale machine learning. J. Chem. Phys. 151, 204105 (2019).
47. Lewis, A. M., Grisafi, A., Ceriotti, M. & Rossi, M. Learning electron densities in the condensed phase. J. Chem. Theory Comput. 17, 7203 (2021).
48. VandeVondele, J. et al. Quickstep: fast and accurate density functional calculations using a mixed Gaussian and plane waves approach. Comput. Phys. Commun. 167, 103 (2005).
49. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865 (1996).
50. Grimme, S., Antony, J., Ehrlich, S. & Krieg, H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. J. Chem. Phys. 132, 154104 (2010).
51. Lippert, B. G., Hutter, J. & Parrinello, M. A hybrid Gaussian and plane wave density functional scheme. Mol. Phys. 92, 477 (1997).
52. Verstraelen, T., Vanduyfhuys, L., Vandenbrande, S. & Rogge, S. Yaff, yet another force field. http://molmod.ugent.be/software/.
53. Bussi, G. & Parrinello, M. Accurate sampling using Langevin dynamics. Phys. Rev. E 75, 056707 (2007).
54. Feller, S. E., Zhang, Y., Pastor, R. W. & Brooks, B. R. Constant pressure molecular dynamics simulation: the Langevin piston method. J. Chem. Phys. 103, 4613 (1995).
55. Tribello, G. A., Bonomi, M., Branduardi, D., Camilloni, C. & Bussi, G. PLUMED 2: new feathers for an old bird. Comput. Phys. Commun. 185, 604 (2014).
56. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017).
57. Mölder, F. et al. Sustainable data analysis with Snakemake. F1000Research 10, 33 (2021).
58. Vandenhaute, S., Cools-Ceuppens, M., Verstraelen, T. & Van Speybroeck, V. Machine learning potentials for metal-organic frameworks with thermodynamic transferability: training data. https://doi.org/10.5281/zenodo.6359970 (2022).
59. Vandenhaute, S., Cools-Ceuppens, M., DeKeyser, S., Verstraelen, T. & Van Speybroeck, V. Machine learning potentials for metal-organic frameworks using an incremental learning approach: workflow and data. https://doi.org/10.5281/zenodo.7539133 (2023).

Acknowledgements

V.V.S. and T.V. acknowledge funding from the Research Board of Ghent University (BOF). S.V. and M.C.C. wish to thank the Research Foundation – Flanders (FWO) for doctoral fellowships (grant nos. 11H6821N and 11D0420N respectively). The resources and services used in this work were provided by VSC (Flemish Supercomputer Center), funded by the Research Foundation – Flanders (FWO) and the Flemish Government.

Author information

Authors and Affiliations

Center for Molecular Modeling, Ghent University, Technologiepark 46, 9052, Zwijnaarde, Belgium
Sander Vandenhaute, Maarten Cools-Ceuppens, Simon DeKeyser, Toon Verstraelen & Veronique Van Speybroeck

Contributions

S.V., M.C.C., S.D., T.V., and V.V.S. initiated the discussion. S.V. implemented the algorithm, trained the models, and performed all simulations included in this work. S.V., M.C.C., T.V., and V.V.S. discussed the results and designed and wrote the paper.

Corresponding author

Correspondence to Veronique Van Speybroeck.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supporting information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Vandenhaute, S., Cools-Ceuppens, M., DeKeyser, S. et al. Machine learning potentials for metal-organic frameworks using an incremental learning approach. npj Comput Mater 9, 19 (2023). https://doi.org/10.1038/s41524-023-00969-x

Received: 17 March 2022
Accepted: 25 January 2023
Published: 06 February 2023
DOI: https://doi.org/10.1038/s41524-023-00969-x
Springer Nature Limited

This article is cited by

Machine learned force-fields for an Ab-initio quality description of metal-organic frameworks
- Sandro Wieser
- Egbert Zojer
npj Computational Materials (2024)
Progress toward the computational discovery of new metal–organic framework adsorbents for energy applications
- Peyman Z. Moghadam
- Yongchul G. Chung
- Randall Q. Snurr
Nature Energy (2024)
Uncertainty-biased molecular dynamics for learning uniformly accurate interatomic potentials
- Viktor Zaverkin
- David Holzmüller
- Johannes Kästner
npj Computational Materials (2024)
Analysis of metal–organic framework-based photosynthetic CO2 reduction
- P. M. Stanley
- V. Ramm
- J. Warnan
Nature Synthesis (2024)
Gas adsorption and framework flexibility of CALF-20 explored via experiments and simulations
- Rama Oktavian
- Ruben Goeminne
- Peyman Z. Moghadam
Nature Communications (2024)
