Introduction

Metal-organic frameworks (MOFs) have become one of the most intriguing classes of materials in recent decades, as they may exhibit anomalous responses upon exposure to external stimuli such as temperature, pressure, or guest sorption; notable examples are negative thermal expansion, negative gas adsorption, and large-amplitude structural transitions. Apart from being academic peculiarities, MOFs have now also found their way into technologically important applications such as gas storage, sensing, separation, or catalysis.1,2,3,4,5,6,7,8,9 Recently, it became clear that the dynamic response of MOFs is largely affected by spatial heterogeneities at various length scales, ranging from the subnanometer scale for isolated defects to the mesoscale when correlated nanoregions are present and to the microscale when accounting for the finite crystal size.10,11,12,13,14 This led to the terminology of spatiotemporal processes within MOFs, referring to the entanglement between the dynamic response of the material and the existing spatial disorder.15,16 Modeling spatiotemporal processes in realistic MOFs at length and time scales comparable to experimental conditions is a very ambitious goal, and requires a range of innovative methods to translate atomic-scale information into macroscopic insight. One key ingredient is the method used to describe the interatomic forces. Current modeling efforts on MOFs either rely on density functional theory (DFT), which may obtain an accurate description of framework-guest interactions when complemented with proper dispersion models to describe longer-ranged interactions, or on classical force fields, which are simple analytical functions that ignore the quantum mechanical description of the electrons. Classical force fields have become attractive within the MOF field as they are now starting to bridge the length scale gap towards experimentally observed MOF crystallites17,18.
However, they are less accurate than DFT-based methods and exhibit only limited transferability; force fields derived under certain thermodynamic conditions are not necessarily applicable to other operating conditions. In addition, they are generally unable to describe bond formation or breakage. For these reasons, DFT-based methods are in principle preferred but their applicability is limited to nanometer-sized structural models and time scales up to about 100 ps, even on state-of-the-art computing infrastructure. Ideally, it would be possible to combine the best of both worlds, whereby interatomic forces are evaluated at an accuracy comparable to DFT but with a computational efficiency similar to classical force fields. Machine learning potentials (MLPs) may offer such a hybrid alternative; they may learn the potential energy surface (PES) at a given level of quantum chemical accuracy, and can then be used to accelerate subsequent energy evaluations by multiple orders of magnitude19,20. MLPs have been successfully developed for a variety of different materials and molecules. Broadly speaking, these models may be categorized into either kernel regression methods, which determine the interaction energy by comparing a given configuration to a set of reference configurations21,22, or neural network potentials, which directly determine a high-dimensional representation of the PES based on thousands or even millions of parameters23.

Since 2019, a number of studies have emerged in which neural network potentials have been employed to study e.g., mechanical or diffusion properties for framework materials24,25,26. Although they demonstrate the potential of MLPs in the field of modeling MOFs, they required a large set of expensive DFT molecular dynamics (MD) trajectories in order to generate the necessary amount of training data (over 10,000 snapshots in all three cases). Such approaches become computationally intractable for systems where large portions of phase space need to be sampled. This is the case for more complex frameworks with large unit cells such as HKUST-1(Cu) or MOF-808(Zr), but also flexible frameworks with multiple stable phases pose a real challenge in terms of generating training data in a computationally efficient manner (e.g., MIL-53(Al) in Fig. 1). For these frameworks, all relevant regions in phase space need to be properly sampled, including not only the stable phases of the material within the thermodynamic conditions of interest but also activated regions and all points along important transition paths. This is very difficult to achieve using equilibrium first principles MD because simulation times are limited when using DFT to evaluate the interatomic forces at each timestep. In addition, configurations obtained from equilibrium MD typically sample regions close to free energy minima and fail to capture important transition states in between phases, even though these are essential to include in the training data.

Fig. 1: Overview of the atomic structures of the frameworks.

UiO-66(Zr) is a mechanically rigid framework with a single stable phase; MIL-53(Al) is a flexible material.

In this work, we present an incremental learning scheme for the construction of accurate MLPs with thermodynamic transferability without requiring large amounts of DFT input data. We employ NequIP27, a newly proposed message passing neural network (MPNN) that exhibits extraordinary data efficiency due to its use of equivariant feature representations for the atomic environments (see below)28. The key observation in our learning scheme is that a proper sampling of the phase space can be achieved without performing extensive quantum mechanical-based molecular dynamics simulations. Instead, inspired by other active learning approaches29,30,31, we propose to sample the phase space using the MLP itself, in combination with on-the-fly training whenever it encounters unknown regions. In this sense, systematically improved MLPs are generated. Key to our approach is the use of metadynamics (MTD) in order to enforce the sampling towards unexplored regions. This allows us to ensure that both in- and out-of-equilibrium configurations are automatically included during training, even when large free energy barriers are present. Importantly, we show that our incremental learning algorithm is highly efficient and does not require a kind of uncertainty metric that determines which samples to include in the training data, as is often the case for other active learning approaches proposed elsewhere30,32. We demonstrate the accuracy and transferability of models constructed with the incremental learning scheme based on two representative frameworks (Fig. 1): the mechanically rigid UiO-66(Zr) framework, which consists of twelve-fold coordinated inorganic bricks and organic benzene-1,4-dicarboxylate (BDC) ligands33, and the flexible MIL-53(Al) framework, which consists of one-dimensional (periodic) aluminum hydroxide chains connected with the same organic ligands34. 
Incremental learning addresses the main disadvantages of traditional data generation with an iterative learning approach, in which efficient and parallel enhanced sampling molecular dynamics simulations are combined with on-the-fly learning. Based on a single atomic structure and a definition of one or more collective variables, the algorithm proceeds with simultaneously exploring and learning the phase space of the material; the approach is schematically shown in Fig. 2. The algorithm starts by constructing a first generation MLP, which is trained based on a small set of initial configurations that are obtained by applying random perturbations to the particle positions and strain components of the initial configuration. This first generation MLP is then used in a short (~1 ps) multiple walker metadynamics (MTD) simulation in order to explore the phase space around the initial configuration. The final configuration of each of the walkers is extracted and subjected to a new DFT calculation to obtain the energy and forces. The latter are added to the training and validation sets after which a next generation MLP is obtained by training the model for a short amount of time. Even though the sampling was performed with an initially inaccurate MLP, it still suffices to explore a meaningful region of phase space and generate almost decorrelated samples. The sampling time per iteration may be kept relatively short as it serves the purpose to gradually explore larger portions of phase space. After training, the model is considered to have learned an incremental region in phase space, and may then be passed on to the next iteration in which it continues the MTD sampling. The state of the bias potential is retained from the previous iteration as to ensure that the walkers explore a slightly different region of phase space in each iteration. 
By continuously alternating the enhanced sampling and training steps, the entire phase space of the collective variable is explored and learned on-the-fly without the need to perform expensive DFT-based molecular dynamics simulations. Importantly, our approach ensures that atomic configurations for which a QM evaluation is performed are always separated by a ~1 ps MTD trajectory, which implies that they are at most only weakly correlated. As such, we are guaranteed that there is little redundancy between the QM-evaluated configurations in the training data even though we did not rely on any specialized uncertainty measures as found in other active learning schemes30,32.
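The alternation of enhanced sampling and retraining described above can be sketched with a toy one-dimensional example. This is not the authors' implementation: the analytical `reference_energy` stands in for a DFT single point, a polynomial fit stands in for NequIP training, and overdamped Langevin dynamics with Gaussian hills stands in for multiple walker MTD. All function names and numerical parameters are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def reference_energy(s):
    """Stand-in for a DFT single point: a 1D double well along the collective variable s."""
    return (s**2 - 1.0) ** 2 + 0.2 * np.sin(3.0 * s)

def fit_model(s_train, e_train, degree=6):
    """Stand-in for MLP training: least-squares polynomial fit to all data so far."""
    deg = min(degree, len(s_train) - 1)
    return np.poly1d(np.polyfit(s_train, e_train, deg))

def bias_force(s, centers, height=0.2, width=0.2):
    """Force (minus the derivative) of a sum of Gaussian hills located at `centers`."""
    if not centers:
        return 0.0
    c = np.asarray(centers)
    g = height * np.exp(-((s - c) ** 2) / (2.0 * width**2))
    return float(np.sum(g * (s - c) / width**2))

def run_walker(model, s0, centers, n_steps=200, dt=1e-3, kT=0.05, pace=20):
    """Short biased trajectory on the current model; a hill is deposited every `pace` steps."""
    dmodel = model.deriv()
    s = float(s0)
    for step in range(n_steps):
        force = -dmodel(s) + bias_force(s, centers)
        s += force * dt + np.sqrt(2.0 * kT * dt) * rng.normal()
        s = float(np.clip(s, -2.0, 2.0))  # keep the toy walker in a bounded window
        if (step + 1) % pace == 0:
            centers.append(s)  # the bias state persists across iterations
    return s

def incremental_learning(s_init=-1.0, n_init=8, n_walkers=4, n_iterations=10):
    # First generation: random perturbations of the initial configuration.
    s_train = list(s_init + rng.uniform(-0.1, 0.1, size=n_init))
    e_train = [reference_energy(s) for s in s_train]
    model = fit_model(s_train, e_train)
    walkers = [s_init] * n_walkers
    biases = [[] for _ in range(n_walkers)]  # one bias potential per walker
    for _ in range(n_iterations):
        for i in range(n_walkers):
            walkers[i] = run_walker(model, walkers[i], biases[i])
            # Only the final configuration of each walker is evaluated at the reference level.
            s_train.append(walkers[i])
            e_train.append(reference_energy(walkers[i]))
        model = fit_model(s_train, e_train)  # next-generation model
    return model, np.array(s_train), np.array(e_train)

model, s_train, e_train = incremental_learning()
print(len(s_train), "reference evaluations; sampled range:", s_train.min(), s_train.max())
```

In this sketch the number of reference evaluations grows by exactly one per walker per iteration, which mirrors how the cost of the scheme is controlled by the number of walkers and iterations rather than by the length of the sampling trajectories.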

Fig. 2: Overview of the incremental learning approach as a combination of enhanced sampling and on-the-fly training.

a Each iteration employs the current MLP in multiple walker metadynamics to explore increasingly larger regions in phase space, after which the obtained samples are evaluated at the QM level and the MLP is retrained. b Metastable regions and transition paths are automatically explored and included in the training data.

The incremental learning approach as proposed here is most efficient when combined with MLP models which are easy to train and which are able to learn a large number of atomic environments and chemical species. While existing MLPs for MOFs used strictly invariant architectures24,25,26, we employ the neural equivariant interatomic potential (NequIP)27 because it achieves a data efficiency similar to kernel regression methods while maintaining a cost of evaluation that is independent of the size of the training set. Like many other message passing neural networks (MPNNs)35,36, NequIP computes the total potential energy of a given configuration as a sum of individual atomic contributions, whereby each contribution is the result of a series of convolutions taken over neighboring atomic environments so as to represent the physical interactions between nearby atoms. Equivariant MPNNs differ from more established MPNNs such as SchNet37 or DeePMD38 because they complement rotationally invariant, scalar-like features with equivariant features in the representation of atomic environments39 (Fig. 3a). Inspired by tensor field networks28, NequIP assigns each of the features to a specific irreducible representation of the rotation group SO(3). Each irreducible representation is characterized by a rotation order \(\ell \in {\mathbb{N}}\), which defines how features transform when the cartesian coordinates at the input are rotated. The first two rotation orders (\(\ell = 0\) and \(\ell = 1\)) correspond to scalar and vector features respectively, and are illustrated in Fig. 3a. Effectively, features for which \(\ell > 0\) are better suited to represent directional information of the atomic environment throughout the layers of the network, and are the primary reason for the exceptional data efficiency of equivariant MPNNs as compared to invariant MPNNs and even kernel regression methods27.
To demonstrate the importance of equivariant features, we performed separate incremental learning runs for the invariant (\({\ell }_{\max }=0\)) and equivariant (\({\ell }_{\max }=1\)) networks, and monitored the mean absolute error (MAE) on the force predictions for an independent test set as a function of the number of iterations (Fig. 3b). In addition, we explicitly show the efficiency of our learning approach by indicating the corresponding total number of QM evaluations required to achieve a given error. As expected, the test error decreases monotonically as a function of the number of iterations as training and validation sets contain an increasingly larger number of atomic environments which the models may learn from. While the number of weights in both networks was roughly equal (about 200,000), the equivariant model clearly required far fewer QM evaluations to achieve a given test error as compared to the invariant model.
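The transformation behavior of the two lowest rotation orders can be checked numerically. The sketch below uses two handcrafted toy features of an atomic environment (a sum of radial Gaussians and a radially weighted sum of displacement vectors); these are illustrative assumptions and not the learned features or layers of NequIP. It verifies that the scalar feature is unchanged by a rotation of the input, whereas the vector feature rotates together with it.

```python
import numpy as np

def random_rotation(rng):
    """Random proper rotation matrix (det = +1) obtained from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # make the decomposition unique
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]           # flip one axis to remove a reflection
    return q

def scalar_feature(neighbors, width=1.0):
    """Rotation order 0: depends only on neighbor distances."""
    d = np.linalg.norm(neighbors, axis=1)
    return np.sum(np.exp(-((d / width) ** 2)))

def vector_feature(neighbors, width=1.0):
    """Rotation order 1: radially weighted sum of neighbor displacement vectors."""
    d = np.linalg.norm(neighbors, axis=1)
    weights = np.exp(-((d / width) ** 2))
    return np.sum(weights[:, None] * neighbors, axis=0)

rng = np.random.default_rng(0)
neighbors = rng.normal(size=(5, 3))   # hypothetical displacement vectors to 5 neighbors
R = random_rotation(rng)
rotated = neighbors @ R.T             # rotate every displacement vector by R

# Invariance: the scalar feature does not change under rotation.
assert np.isclose(scalar_feature(rotated), scalar_feature(neighbors))
# Equivariance: the vector feature of the rotated input equals the rotated feature.
assert np.allclose(vector_feature(rotated), R @ vector_feature(neighbors))
```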

Fig. 3: Comparison between invariant and equivariant feature representations.

a Features in equivariant neural networks are associated with a specific irreducible representation of SO(3), as characterized by the rotation order \(\ell\). For \(\ell = 0\), features do not transform upon rotation of the system, and may be considered as scalars. For \(\ell = 1\), features transform according to vectors in \({{\mathbb{R}}}^{3}\). b Evolution of the test error as a function of the number of iterations in the incremental learning algorithm as well as the total number of QM evaluations that were performed for both the invariant \({\ell }_{\max }=0\) model (blue) and equivariant \({\ell }_{\max }=1\) model (red). The test system was UiO-66(Zr). Computational details regarding network hyperparameters, MTD parameters, and test set generation can be found in the Methods section as well as the Supplementary Information.

Results

Incremental learning scheme for UiO-66(Zr) and MIL-53(Al)

We demonstrate the efficiency of incremental learning based on case studies for both UiO-66(Zr) as well as MIL-53(Al) (Fig. 1). Final models are validated in terms of their test error accuracy, transferability towards out-of-equilibrium configurations, and their ability to predict structural and mechanical properties in agreement with the DFT reference. Since the learning algorithm takes care of both data generation and training, the only required user input is an initial atomic structure of the system as well as a collective variable along which the bias should be applied. For both frameworks, we chose the unit cell volume as collective variable and employ NequIP models with \({\ell }_{\max }=1\) as MLP. The QM calculations are performed using DFT at the PBE-D3(BJ) level of theory. In each iteration, 50 MTD walkers are used to explore the phase space and generate 50 configurations that are added to training and validation sets. The learning algorithm started from the DFT optimized geometry. Chemical bond integrity was preserved throughout all iterations and the constructed MLPs are therefore only intended to be employed in nonreactive dynamics. Although outside the scope of this work, the same approach may be used to construct reactive MLPs; the MTD bias will continue to increase in each iteration until it exceeds the barrier needed to break the coordination bonds between the metal and the ligand, at which point the enhanced sampling will start generating defective framework configurations. A full overview of all computational details may be found in the “Methods” section and in the Supplementary Information.

For both systems, we constructed a test set of reference data that is obtained based on a large number of first principles MD trajectories at 600 K (see Supplementary Note 1). The test set was specifically generated as to include strongly out-of-equilibrium unit cell volumes, as these become relevant for the behavior of the framework at high temperatures and/or pressures. The energy and force MAEs are shown in the top half of Fig. 4 as a function of the MTD collective variable (the unit cell volume), and remain below 1 meV per atom and 30 meV Å−1 across the entire volume range. Such values are in fact lower than the intrinsic uncertainty on these quantities due to e.g., the choice of functional or even the basis set dependence40, and further training with more data would at that point no longer yield significant improvements in predictive power. Note that even the volume region between the two stable phases for MIL-53(Al) is well predicted, indicating that the model has successfully discovered and learned the phase transition by itself even though it only initially received the closed pore (cp) configuration as input. This is the case because the MTD bias ensures that the sampling in each iteration is gradually expanded towards unit cell volumes that are not yet explored.

Fig. 4: Validation of the obtained potentials for UiO-66(Zr) and MIL-53(Al).

a, b Energy and force MAEs on independent test sets and a smoothed histogram of the training data sampled using incremental learning; the unit cell volume of the initial configuration is indicated with a black dot. Additional computational details are provided in the Methods section. c Stiffness matrix for UiO-66(Zr) and both phases of MIL-53(Al) as calculated with the MLPs. Each square is color-coded based on the difference between the MLP prediction and the DFT reference values; the latter are explicitly shown inside each square. Black dots indicate stiffness constants which are zero due to symmetry considerations; see Supplementary Note 3 for more details.

Once the training data covers all unit cell volumes of interest, no additional sampling is necessary and the algorithm may be terminated. For UiO-66(Zr), this was the case after only 11 iterations or 600 QM evaluations, whereas MIL-53(Al) required 19 iterations or 1000 QM evaluations because the MTD simulations take some time to overcome the free energy barrier with Gaussian hills. Nevertheless, it is clear that incremental learning generates trained MLPs at only a fraction of the computational cost when compared to passive learning approaches even though the accuracy is maintained across thermodynamic conditions (see Supplementary Note 4). In fact, the total computational cost of training data generation is now equivalent to or even less than a routine geometry optimization or a very short first principles MD trajectory (excluding the GPU resources required during training).
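The reported totals are consistent with a simple count, assuming that the 50 randomly perturbed structures used to initialize the model (see Methods) are included in the total and that every iteration contributes one QM evaluation per walker. This decomposition is an inference from the numbers in the text, and the function name is hypothetical.

```python
def total_qm_evaluations(n_iterations, n_walkers=50, n_initial=50):
    """Initial perturbed structures plus one QM evaluation per walker per iteration."""
    return n_initial + n_iterations * n_walkers

print(total_qm_evaluations(11))  # UiO-66(Zr): 11 iterations
print(total_qm_evaluations(19))  # MIL-53(Al): 19 iterations
```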

Mechanical stability and phase transitions

Interestingly, while the network parameters were optimized using a weighted loss function that takes both energies and forces into account, the NequIP models are also capable of providing accurate predictions for the virial stress in the system (see Supplementary Note 4). This is a necessary requirement in order to generate representative (N, P, T) dynamics, and it already suggests that the overall mechanical behavior of the frameworks will be well reproduced by the MLPs. To further investigate this, we compared the final models with the DFT reference in terms of structural and mechanical properties at 0 K. Optimized geometries for UiO-66(Zr) and both phases of MIL-53(Al) are in good agreement with the DFT reference, with RMSD values below 0.05 Å on both atomic positions and box vector components. For each of the optimized geometries, we evaluate the mechanical behavior in terms of the stiffness tensor \({\mathsf{C}}\in {{\mathbb{R}}}^{6\times 6}\) (in Voigt notation), which determines how the stress \({{{\boldsymbol{\sigma }}}}\in {{\mathbb{R}}}^{6}\) within the material changes due to an applied strain \({{{\boldsymbol{\epsilon }}}}\in {{\mathbb{R}}}^{6}\):

$${{{\boldsymbol{\sigma }}}}={\mathsf{C}}{{{\boldsymbol{\epsilon }}}}\tag{1}$$

The components of C are computed using either a finite-difference approach (used for the DFT reference), or with an exact second-order hessian matrix (used for the MLPs); both methods are outlined in Supplementary Note 3.
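As an illustration of the finite-difference route, the sketch below recovers a 6 × 6 stiffness matrix in Voigt notation from a stress function by applying small positive and negative strains along each component. The stress function, the cubic-symmetry stiffness constants, and the strain step are hypothetical placeholders and not values from this work; in practice the stress would come from a DFT or MLP calculation on the strained cell, and the exact procedure used here is the one in Supplementary Note 3.

```python
import numpy as np

def cubic_stiffness(c11, c12, c44):
    """Stiffness matrix of a cubic crystal in Voigt notation (values in GPa)."""
    C = np.zeros((6, 6))
    C[:3, :3] = c12
    C[np.arange(3), np.arange(3)] = c11
    C[np.arange(3, 6), np.arange(3, 6)] = c44
    return C

def stiffness_from_stress(stress_fn, delta=1e-3):
    """Central finite differences: column j of C is d(sigma)/d(epsilon_j)."""
    C = np.zeros((6, 6))
    for j in range(6):
        eps = np.zeros(6)
        eps[j] = delta
        C[:, j] = (stress_fn(eps) - stress_fn(-eps)) / (2.0 * delta)
    return 0.5 * (C + C.T)  # enforce the symmetry expected for an elastic solid

# Hypothetical reference material with a weak anharmonic term added to the linear response.
C_ref = cubic_stiffness(60.0, 30.0, 20.0)

def toy_stress(eps):
    return C_ref @ eps + 5.0 * eps**3

C_fd = stiffness_from_stress(toy_stress)
print(np.max(np.abs(C_fd - C_ref)))  # small residual caused by the anharmonic term
```

The central difference cancels even-order terms in the strain, so the residual scales with the square of the strain step; this is why a symmetric two-sided stencil is usually preferred over a one-sided one.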

The bottom part of Fig. 4 shows the stiffness tensors for all three optimized structures; each of the squares is color-coded based on the relative deviation between the MLP prediction and the DFT reference, whereas the value of the latter is indicated in black text inside each square. None of the predicted stiffness constants differed by more than 7 GPa from the DFT reference values, demonstrating that our models are capable of reproducing the mechanical properties of the frameworks, as is to be expected based on the force and stress error assessments. To further demonstrate their thermodynamic transferability, we explicitly performed (N, P, T) MD simulations at a large range of pressures and validated the obtained trajectories with the underlying DFT reference; the results are presented in Supplementary Note 4. The constructed MLPs exhibit essentially the same accuracy as the underlying level of theory while being at least three orders of magnitude faster to evaluate. To further demonstrate this enormous gain in computational efficiency, we will investigate the threshold pressure for the large pore (lp) to closed pore (cp) transition in MIL-53(Al) at 300 K, which was previously estimated at 13–18 MPa based on mercury porosimetry experiments41. Computational prediction of the transition pressure is difficult because ab initio (N, P, T) MD simulations of MIL-53(Al) typically exhibit large volume fluctuations due to the small unit cell size (with ΔV/V on the order of 10%), and these may act as premature triggers for the phase transition42. Fig. 5 visualizes the transition dynamics for a typical 1 × 2 × 1 cell used in ab initio simulations. The framework exhibits lp-to-cp transitions for any nonnegative pressure, suggesting that the lp phase is unstable at 300 K in spite of a clear lp minimum on the Helmholtz free energy surface of the material. 
In addition, the absence of any correlation between the timing of the transitions and the magnitude of the applied pressure further suggests that transitions at this scale are driven entirely by unit cell volume fluctuations42. Fortunately, the extraordinary computational efficiency of MLPs allows us to consider much larger supercells of the same framework, such as the 9 × 2 × 9 cell shown in Fig. 5. Because the relative ensemble standard deviation of extensive observables such as the volume decreases with the inverse square root of the number of particles, and the 9 × 2 × 9 cell contains 81 times as many atoms as the 1 × 2 × 1 cell, the relative fluctuations in unit cell volume at this scale are roughly an order of magnitude smaller and therefore no longer able to trigger premature transitions of the framework. The obtained transition pressure of 18–20 MPa as shown in Fig. 5 is further confirmed by a full evaluation of the pressure-versus-volume equation of state at 300 K as presented in Supplementary Note 5, and is in good agreement with the experimental result. Previous computational estimates existed but were obtained using a classical force field that was parameterized based on DFT input data from the optimized lp geometry only (i.e., without taking into account either the cp phase or the transition region), which resulted in a large disagreement with experiment43.
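The size dependence of the volume fluctuations can be estimated with a short calculation. The sketch below assumes ideal inverse-square-root scaling of the relative fluctuation with the number of unit cells and takes the approximately 10% value quoted for the small cell as its starting point; the function name is hypothetical and the result is an order-of-magnitude estimate, not a simulation result.

```python
import math

def scaled_relative_fluctuation(reference_fluctuation, reference_cell, new_cell):
    """Rescale a relative fluctuation assuming it varies as 1/sqrt(number of unit cells)."""
    n_ref = math.prod(reference_cell)
    n_new = math.prod(new_cell)
    return reference_fluctuation * math.sqrt(n_ref / n_new)

small_cell = (1, 2, 1)
large_cell = (9, 2, 9)
estimate = scaled_relative_fluctuation(0.10, small_cell, large_cell)
print(f"size ratio: {math.prod(large_cell) // math.prod(small_cell)}")
print(f"estimated relative volume fluctuation: {estimate:.3%}")
```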

Fig. 5: Estimating the transition pressure via large-scale dynamics.

We compare estimates using a regular 1 × 2 × 1 unit cell versus a large-scale 9 × 2 × 9 unit cell, as obtained based on (N, P, T) dynamics. All unit cell volumes were normalized with respect to the lp minimum of the Helmholtz free energy surface at 300 K, which occurs at 2882 Å3. Nevertheless, the lp phase is not stable for the 1 × 2 × 1 cell due to the larger volume fluctuations. In contrast, the 9 × 2 × 9 cell allows us to determine a transition pressure in the range of 18–20 MPa, which is in good agreement with experiment41.

Towards universal interaction potentials

Overall, our results demonstrate that the physical interactions in a given framework may be learned by equivariant MPNNs based on only a minimal number of QM evaluations. Naturally, we can exploit this efficiency even further and examine whether we can construct a single model for the prediction of physical interactions in several different frameworks. This kind of transferability has already been shown for other systems such as small organic molecules, and can also be anticipated for MOFs because of their modular building block structure. As a proof of principle, we considered a set of 10 well-known aluminum- and zirconium-based frameworks that are related to MIL-53(Al) and UiO-66(Zr) but differ from them in either the topology of the framework or the organic ligand. A more detailed description of the frameworks under consideration in terms of their building blocks is given in Supplementary Note 5. We used incremental learning to explore and learn the phase space of this set of 10 materials, which resulted in a training set of about 3100 configurations. We evaluated the test error performance for the frameworks included during training as well as the UiO-66(Zr) and MIL-53(Al) frameworks that were left out; the results are shown in Table 1. Even though it was trained on a dataset with a significantly larger variety in atomic environments, the model still achieves relatively low force and stress MAEs, even for the frameworks not included during training. In Supplementary Note 6, the performance of the model is further investigated and compared with UFF4MOF44, an established universal interaction potential for MOFs. While the model is slightly less accurate when compared to the results in Fig. 4, it still outperforms UFF4MOF by a large margin.
In addition, it should be emphasized that the entire training procedure only required about 3100 QM evaluations in total or about 310 QM evaluations per material, which further demonstrates the efficacy of incremental learning in combination with equivariant MPNNs.

Table 1 Test error performance of an MLP trained on a set of 10 aluminum- or zirconium-based frameworks (not including UiO-66(Zr) and MIL-53(Al)) at 600 K.

Discussion

In this work, we propose an efficient approach for the construction of accurate and transferable MLPs for framework materials. Even for systems with multiple phases, we show that about 1000 QM evaluations are sufficient to construct accurate equivariant MPNNs. This increased computational efficiency is important for future research, as it is now possible to employ more advanced QM methods (e.g., hybrid functionals or beyond) during MLP training and in this way allow for a more accurate description of dynamic phenomena in these materials. In addition, the ability to construct a single potential for the description of multiple frameworks is highly promising, especially because we observed that the number of QM evaluations per material actually decreases with increasing variety in the training set (from about 1000 to only about 300). Nevertheless, further research in this area is still necessary. For example, it is still unclear how a maximally diverse training set of different frameworks should be assembled, i.e., which distribution of building blocks, topology, and/or pore sizes are necessary to guarantee transferability towards as many materials as possible. In addition, it remains to be seen whether equivariant MPNNs like NequIP will maintain their accuracy across the MOF design space, including e.g., large mesoporous systems. In those cases, message passing architectures may have to be complemented with more recent models that are targeted towards a more accurate description of long-range interactions45,46,47. Finally, future work should extend the applicability of MLPs towards guest-loaded and even disordered frameworks in order to fully unleash their potential in the design of next-generation MOF technologies.

Methods

Density functional theory calculations

QM energy evaluations were performed using the CP2K simulation package48, version 8.2. We employ the PBE49 functional with Grimme D3 dispersion corrections50 and a hybrid basis set including both TZVP Gaussian basis functions and plane waves51; GTH pseudopotentials were used to smoothen the electron density near the nuclei. The plane wave cutoff energy was set to 1000 Ry for all materials as to guarantee that force and stress calculations were fully converged. Additional computational details regarding DFT calculations are available in Supplementary Note 1.

Molecular dynamics

MLP sampling was performed using YAFF52 at conditions of constant temperature and constant pressure, with a timestep of 0.5 fs. The temperature was controlled using a Langevin thermostat53 with a time constant of 100 fs, and the pressure was controlled using a Langevin barostat with a time constant of 1 ps54. PLUMED version 2.7.2 was used as a plugin to add the metadynamics bias in the simulation55.

Machine learning potentials

All the MLP models in this work were constructed using NequIP version 0.5.4. All models employed a cutoff radius of 5 Å for the atomic environments and used four interaction layers, as this was found to strike the optimal balance between accuracy and computational efficiency. Because neighboring atoms exchange information about their atomic environments in each layer, the effective interaction radius of a single atom is at most 4 × 5 Å = 20 Å, which is the largest distance over which information on a given atomic environment can travel. While this was found sufficient for all frameworks considered in this work, an accurate description of frameworks having larger pores may require dedicated long-range interactions. Feature representations were restricted to maximum rotation orders of either \({\ell }_{\max }=1\) or \({\ell }_{\max }=0\), and the sizes of the network layers were chosen such that the total number of network parameters was around 200,000. The loss function consists of a weighted average of potential energy and force errors, and was optimized using Adam56 with a learning rate of 0.005. Additional parameters are presented in Supplementary Note 2.
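Two of the quantities above lend themselves to a compact illustration: the upper bound on the effective interaction radius and a weighted average of energy and force errors. The sketch below is a generic mean-squared formulation and not NequIP's implementation; the weights, the per-atom normalization of the energy, and the function names are assumptions (the actual settings are those listed in Supplementary Note 2).

```python
import numpy as np

def effective_radius(cutoff, n_layers):
    """Upper bound on the distance over which information can propagate."""
    return cutoff * n_layers

def weighted_loss(e_pred, e_ref, f_pred, f_ref, n_atoms, w_energy=1.0, w_force=1.0):
    """Weighted average of per-atom energy and force mean-squared errors."""
    energy_term = np.mean(((np.asarray(e_pred) - np.asarray(e_ref)) / n_atoms) ** 2)
    force_term = np.mean((np.asarray(f_pred) - np.asarray(f_ref)) ** 2)
    return (w_energy * energy_term + w_force * force_term) / (w_energy + w_force)

print(effective_radius(5.0, 4))  # cutoff in angstrom, four interaction layers

# Hypothetical single configuration with 10 atoms: energy off by 1.0, forces exact.
forces = np.zeros((10, 3))
example_loss = weighted_loss([1.0], [0.0], forces, forces, n_atoms=10)
print(example_loss)
```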

Workflow management and sampling parameters

All of the computational steps in the learning algorithm are managed using the Snakemake workflow management system57, version 7.8. QM evaluations and metadynamics simulations are performed in a massively parallel manner on CPU nodes, whereas MLP training is performed on a single GPU (Nvidia A100 40GB). In this way, each iteration in the algorithm takes about one to two hours in real time. For models that were trained on a single material, we used 50 parallel MTD walkers to explore the phase space, 45 of which were used to construct the training set and the remaining 5 for the validation set. In contrast to traditional multiple walker MTD, each walker maintains its own bias potential so as to encourage the walkers to sample different regions in phase space. Hills were added at a pace of 50 fs, with a width σ = 100 Å3 and a height of 5 kJ/mol. Models were initialized by training on a small set of 50 structures obtained by applying uniform perturbations in the atomic coordinates and strain components (with respective amplitudes of 0.08 Å and 0.01) starting from the initial structure.
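To give a feeling for the magnitude of the bias implied by these hill parameters, the sketch below evaluates a metadynamics bias as a sum of Gaussian hills along the unit cell volume. It assumes the standard Gaussian form with σ as the standard deviation and an iteration length of about 1 ps; the function name and the example hill positions are hypothetical, and the code does not reproduce the PLUMED input used in this work.

```python
import numpy as np

def bias_potential(volume, hill_centers, height=5.0, width=100.0):
    """Sum of Gaussian hills (kJ/mol) deposited at `hill_centers` (cubic angstrom)."""
    centers = np.asarray(hill_centers, dtype=float)
    return float(np.sum(height * np.exp(-((volume - centers) ** 2) / (2.0 * width**2))))

# With one hill every 50 fs, a trajectory of about 1 ps deposits 20 hills per walker.
hills_per_iteration = 1000 // 50

# Upper bound on the bias added in one iteration: all hills stacked at the same volume.
v0 = 2882.0  # example volume in cubic angstrom
max_bias = bias_potential(v0, [v0] * hills_per_iteration)
print(hills_per_iteration, max_bias)

# A single hill evaluated one width away from its center.
print(bias_potential(v0 + 100.0, [v0]))
```

Because each walker keeps its own list of hills across iterations, the accumulated bias keeps growing in regions the walker has already visited, which is what gradually pushes the sampling towards unexplored volumes.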

Additional validation

To verify the validity of the transition dynamics as generated by the MLP, we validated the 0 MPa phase transition simulation on the 1 × 2 × 1 unit cell of MIL-53(Al) from Fig. 5 by post hoc calculation of the DFT energy and forces at regular intervals over the entire transition. In this way, we obtained an average force MAE of only 6.7 meV Å−1 and an average energy MAE of 0.1 meV per atom, which is exceptionally low. In addition to the 9 × 2 × 9 configuration presented in Fig. 5, transition pressures were also determined based on alternative supercell configurations (3 × 2 × 3 and 6 × 2 × 6), all of which yielded the same 18–20 MPa estimate.