Equivariant Neural Network Force Fields for Magnetic Materials

Neural network force fields have significantly advanced ab initio atomistic simulations across diverse fields. However, their application in the realm of magnetic materials is still in its early stage due to challenges posed by the subtle magnetic energy landscape and the difficulty of obtaining training data. Here we introduce a data-efficient neural network architecture to represent density functional theory total energy, atomic forces, and magnetic forces as functions of atomic and magnetic structures. Our approach incorporates the principle of equivariance under the three-dimensional Euclidean group into the neural network model. Through systematic experiments on various systems, including monolayer magnets, curved nanotube magnets, and moir\'e-twisted bilayer magnets of $\text{CrI}_{3}$, we showcase the method's high efficiency and accuracy, as well as exceptional generalization ability. The work creates opportunities for exploring magnetic phenomena in large-scale materials systems.


I. INTRODUCTION
Ab initio calculations employing density functional theory (DFT) have become an indispensable tool in materials discovery, but their practical applications are limited to small systems due to the high computational cost. Deep learning has been proposed as a viable solution to the trade-off between efficiency and accuracy. Over the past decade, deep-learning ab initio methods have revolutionized electronic and atomistic modeling [1][2][3][4][5][6][7][8][9][10][11][12][13]. For electronic modeling, a series of deep neural networks have been developed to learn the relationship between DFT Hamiltonians and material structures [8,9,13]. By satisfying the principle of equivariance, the neural-network approach has demonstrated exceptional accuracy in example studies of various nonmagnetic systems [8][9][10][13]. Remarkably, an extended deep-learning method has been developed to learn the mapping from atomic and magnetic structures to DFT Hamiltonians, which preserves a generalized principle of equivariance for magnetic materials [11]. For atomistic modeling, neural network force fields (NNFFs) have been devised for non-magnetic materials and widely applied in molecular dynamics and Monte Carlo simulations [14][15][16][17][18][19][20][21][22][23][24][25]. The corresponding research on magnetic materials is of equal importance; however, it remains largely unexplored.
The development of NNFFs for magnetic materials faces a few challenges. First, magnetic NNFFs double the degrees of freedom (6N) of conventional NNFFs (3N), which demands a substantial amount of extra training data. At the same time, training data for magnetic materials are significantly more costly than conventional DFT datasets because of the additional computational workload of constraining magnetic configurations, exacerbating the data scarcity. Yang et al. [26] recently proposed a neural network method built on an existing descriptor-based model, but it suffers from data inefficiency because it includes only invariant features, as explicitly illustrated in previous works [22,24]. Alternatively, incorporating prior knowledge of symmetry into the neural network design can alleviate this problem. Equivariant neural networks (ENNs) [27][28][29] are designed to satisfy equivariance requirements by ensuring that the input, output, and all internal features transform equivalently under the symmetry group. Therefore, ENNs can be applied in data-limited scenarios without necessitating data augmentation, rendering them more viable for the task of modeling magnetic material energies.
Second, the derivatives of the total energy with respect to the orientations of the magnetic moments (called magnetic forces in this work) play a role analogous to atomic forces in conventional NNFFs and are indispensable to the atomistic modeling of magnetic materials. For magnetic NNFFs, the absence of magnetic forces seriously increases the demand for additional data to fit the energy profile, yet magnetic forces are ignored in previous studies [30][31][32][33][34][35][36]. Furthermore, such models cannot accurately explore magnetic effects involving higher-order derivatives with respect to magnetic moments, such as low-energy elementary excitations [37].
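To make the role of magnetic forces concrete, here is a minimal sketch (not the paper's model) that computes them for a toy classical Heisenberg ring and checks the analytic gradient against the finite-difference derivative that a constrained-DFT workflow would effectively provide; the ring geometry, coupling J, and step size are illustrative assumptions.

```python
import numpy as np

def heisenberg_energy(m, J=1.0):
    """Classical Heisenberg energy of a 1D ring: E = -J * sum_i m_i . m_{i+1}."""
    return -J * np.sum(m * np.roll(m, -1, axis=0))

def magnetic_force(m, J=1.0):
    """Analytic magnetic force F_i = -dE/dm_i = J * (m_{i-1} + m_{i+1})."""
    return J * (np.roll(m, 1, axis=0) + np.roll(m, -1, axis=0))

def magnetic_force_fd(m, J=1.0, h=1e-6):
    """Central finite-difference gradient, as a DFT-style cross-check."""
    F = np.zeros_like(m)
    for i in range(m.shape[0]):
        for a in range(3):
            mp, mm = m.copy(), m.copy()
            mp[i, a] += h
            mm[i, a] -= h
            F[i, a] = -(heisenberg_energy(mp, J) - heisenberg_energy(mm, J)) / (2 * h)
    return F

rng = np.random.default_rng(0)
m = rng.normal(size=(6, 3))
m /= np.linalg.norm(m, axis=1, keepdims=True)  # unit magnetic moments
assert np.allclose(magnetic_force(m), magnetic_force_fd(m), atol=1e-5)
```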
To address the above challenges, we propose MagNet, an equivariant deep-learning framework to represent the DFT total energy E({R}, {M}) and its derivative forces as functions of atomic structures {R} and complex non-collinear local atomic magnetic moments {M}.
As a critical innovation, we design an ENN architecture that naturally integrates both atomic and magnetic degrees of freedom and incorporates a direct mapping of magnetic forces, enabling efficient and accurate learning of magnetic materials. The method is systematically tested, showing high reliability in calculating the magnon dispersion and good generalization capability in example studies of magnetic CrI3 nanotubes. Finally, we apply our method to study the spin dynamics of moiré-twisted bilayer CrI3. Benefiting from its high efficiency and accuracy, MagNet promises further applications in magnetic materials computation at large length/time scales.

II. METHOD

Deep learning methods have enabled efficient material simulations with DFT accuracy. A significant generalization of these methods is required for studying magnetic materials. For nonmagnetic systems, the total energy E as a function of atomic structure {R} is calculated by self-consistent field (SCF) iterations in DFT, and the function E({R}) is the learning target of NNFFs. In contrast, for magnetic systems, the total energy depends not only on the atomic structure {R} but also on the magnetic structure {M}. To compute the total energy for varying {M}, one needs to apply constrained DFT, which uses a Lagrangian approach to constrain a specific magnetic configuration and introduces constraining fields as an additional potential in the Kohn-Sham equation [38]; this significantly increases the computational workload and is much more time-consuming.

The function of MagNet is illustrated in Fig. 1. First, magnetic materials with different atomic and magnetic configurations are calculated by constrained DFT to prepare the training datasets. The training datasets are then fed into MagNet, which predicts physical properties, including the DFT total energy, atomic forces, and magnetic forces, for atomic and magnetic structures unseen in the training datasets. By substituting neural networks for the costly SCF calculation, the method significantly lowers the computational overhead and enables an efficient and accurate mapping between properties and structures of magnetic materials. The critical point here is empowering neural networks by leveraging a priori knowledge, which will be discussed subsequently.
Noticeably, for most magnetic materials, varying {R} alters the strength of interatomic bonding energies in the total energy, whereas varying {M} mainly modifies the relatively weak and localized magnetic exchange interactions, leading to minor changes in the total energy. Consequently, the effects of alterations in {M} on the total energy are expected to be weaker in magnitude and shorter in length scale. These subtle interactions require an appropriate neural network design, distinct from the description of changes induced by {R}.
Equivariance is another essential point in network design. For atomistic systems, the physical properties of materials are equivariant under rotation, inversion, and translation, which together comprise the three-dimensional Euclidean group E(3). Scalar quantities like the total energy are invariant under these symmetry operations, whereas vector quantities such as atomic forces and magnetic forces are equivariant and change when the atomic geometry is transformed. It is thus natural to incorporate equivariance into the design of neural networks. Given information about one structure, the target property of all symmetry-related structures can be obtained from the neural network via equivariant transformations, which enables a more efficient mapping in data-limited cases.
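The invariant/equivariant distinction can be illustrated with the two simplest combinations of l = 1 features: the dot product (an invariant scalar) and the cross product (an equivariant vector). A small NumPy check, illustrative only and not MagNet code, confirms both behaviors under a proper rotation:

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula for a proper rotation in SO(3)."""
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

rng = np.random.default_rng(1)
x, y = rng.normal(size=3), rng.normal(size=3)
R = rotation_matrix([1.0, 2.0, 3.0], 0.7)

# scalar feature (like the total energy): invariant under rotation
assert np.isclose(np.dot(R @ x, R @ y), np.dot(x, y))

# vector feature (like atomic/magnetic forces): rotates with the geometry
assert np.allclose(np.cross(R @ x, R @ y), R @ np.cross(x, y))
```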
Here we present the ENN architecture of MagNet. The equivariant building blocks of the neural network model are implemented following the scheme proposed by DeepH-E3 [9]. Formally, a function f relating the input vector space X and the output vector space Y is regarded as equivariant provided that for any input x ∈ X, output y ∈ Y, and any group element g within a transformation group G, the following condition is satisfied:

$$f\big(D_X(g)\,x\big) = D_Y(g)\,f(x), \qquad (1)$$

where $D_X(g)$ and $D_Y(g)$ denote the transformation matrices in X and Y, parameterized by g. In MagNet, translation symmetry is guaranteed by operating on relative positions of atoms. For rotation, features $v^l_m$ carry the irreducible representations of the SO(3) group of dimension 2l + 1, where l represents the angular momentum quantum number and m denotes the magnetic quantum number varying between −l and l. A key operation for interacting features of different l is the tensor product, denoted ⊗, which uses Clebsch-Gordan coefficients $C^{l_3,m_3}_{l_1,m_1,l_2,m_2}$ to combine features $x^{l_1}$ and $y^{l_2}$ and produce the output feature $z^{l_3}$:

$$z^{l_3}_{m_3} = \big(x^{l_1} \otimes y^{l_2}\big)^{l_3}_{m_3} = \sum_{m_1, m_2} C^{l_3,m_3}_{l_1,m_1,l_2,m_2}\, x^{l_1}_{m_1}\, y^{l_2}_{m_2}. \qquad (2)$$

Since the features are equivariant under rotation, the physical quantities represented by the features also transform equivariantly. For spatial inversion, it is necessary to introduce an additional parity index into the features, labeling the behavior under spatial inversion as either even (p = 1) or odd (p = −1). Parity equivariance is ensured by permitting contributions to an output feature with parity $p_3$ from two features possessing parities $p_1$ and $p_2$ in the tensor product only if the selection rule $p_3 = p_1 p_2$ is satisfied. In the context of the building blocks illustrated in Fig. 
2, the operation 'E3Linear' is formulated as

$$z^{l}_{c,m} = \sum_{c'} W^{l}_{cc'}\, v^{l}_{c',m} + b^{l}_{c}, \qquad (3)$$

where c and c′ denote the channel indices, and $W^l_{cc'}$ and $b^l_c$ are the learnable weights and biases, respectively. It is essential to note that the biases $b^l_c$ are nonzero only for equivariant features v with l = 0, ensuring the preservation of the equivariance requirements. 'Activation' introduces a non-linear activation function, or a scalar gate, on the features depending on the index l. For features with l > 0, a linear scaling is employed; for features with l = 0, a non-linear SiLU function is employed. The normalization of the features while preserving equivariance is achieved with the E3Layernorm proposed in Ref. [9]:

$$\tilde{v}^{\,l}_{c,m} = g^{l}_{c}\, \frac{v^{l}_{c,m} - \mu^{l}_{m}}{\sigma^{l}_{m} + \epsilon} + h^{l}_{c}, \qquad (4)$$

where $\mu^l_m$ and $\sigma^l_m$ are the mean and the standard deviation of the features, respectively, $g^l_c$ and $h^l_c$ are learnable parameters, and ϵ is a small constant introduced to enhance numerical stability. The term $h^l_c$ is subject to the same equivariance requirements as $b^l_c$ in Eq. (3). The network architecture of MagNet, as shown in Fig. 
2, is built on an embedding block, followed by a series of atomic interaction blocks and magnetic interaction blocks, and outputs final vertex features after an E3Linear layer. The embedding block initializes equivariant features from the information of the magnetic material, including the interatomic distance vector $r_{ij}$, the magnetic moment vector $m_i$, and the atom species $Z_i$. For a non-magnetic atom i, the magnetic moment $m_i$ is set to zero. Radial functions expand interatomic distances and magnetic moment lengths in a Gaussian basis [14]. The directions of $r_{ij}$ and $m_i$ are encoded by the real spherical harmonics $Y^l_m$ with indices l and m. Atomic interaction blocks encode interactions between neighboring atoms, where different features are mixed and contracted through the tensor product. Gaussian functions and a polynomial envelope function [16] are implemented in multi-layer perceptrons (MLPs) as the radial weights for the coupled tensor-product interactions. The vertex features, now carrying magnetic moment information, then interact with other atomic features in a subsequent series of magnetic interaction blocks. Since the influence of the magnetic moments is relatively more localized, we set a smaller number of layers for magnetic interaction blocks than for atomic interaction blocks. Final vertex features are obtained as the output of the E3Linear layer. The total energy is derived from the sum of the final vertex features with a rescaling, as shown in Eq. (5). Atomic forces and magnetic forces are subsequently determined as the negative gradients of the predicted total energy with respect to atomic positions and magnetic moments.
$$\hat{E} = \sigma_0 \sum_{i=1}^{N} \varepsilon_i + N \mu_0, \qquad (5)$$

$$F_{i,\alpha} = -\frac{\partial \hat{E}}{\partial R_{i,\alpha}}, \qquad F^{\mathrm{mag}}_{i,\alpha} = -\frac{\partial \hat{E}}{\partial M_{i,\alpha}}, \qquad (6)$$

where $\varepsilon_i$ is the final scalar vertex feature of atom i, and $\sigma_0$ and $\mu_0$ are the standard deviation and the mean of the per-atom energy over the training set, respectively. N is the number of atoms, i is the atom index, and α is the coordinate index. MagNet is trained using a loss function based on a weighted sum of mean-squared-error terms for the total energy, atomic forces, and magnetic forces:

$$\mathcal{L} = \lambda_E \big(E - \hat{E}\big)^2 + \frac{\lambda_F}{3N} \sum_{i=1}^{N} \sum_{\alpha} \big(F_{i,\alpha} - \hat{F}_{i,\alpha}\big)^2 + \frac{\lambda_{F_{\mathrm{mag}}}}{3 N_{\mathrm{mag}}} \sum_{i=1}^{N_{\mathrm{mag}}} \sum_{\beta} \big(F^{\mathrm{mag}}_{i,\beta} - \hat{F}^{\mathrm{mag}}_{i,\beta}\big)^2, \qquad (7)$$

where $\lambda_E$, $\lambda_F$, and $\lambda_{F_{\mathrm{mag}}}$ denote the weights of the total energy, atomic forces, and magnetic forces, respectively; N and $N_{\mathrm{mag}}$ are the numbers of atoms and magnetic atoms, respectively; and α, β are coordinate indices.
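A literal transcription of such a weighted loss might look like the sketch below; the per-atom normalization of the force terms and the default weight values are assumptions of this illustration, not a verbatim copy of MagNet's implementation.

```python
import numpy as np

def magnet_loss(E_hat, E, F_hat, F, Fm_hat, Fm,
                lam_E=10.0, lam_F=1e5, lam_Fmag=1e6):
    """Weighted sum of squared-error terms for the total energy, atomic
    forces (N x 3), and magnetic forces (N_mag x 3)."""
    N, N_mag = F.shape[0], Fm.shape[0]
    loss = lam_E * (E_hat - E) ** 2
    loss += lam_F / (3 * N) * np.sum((F_hat - F) ** 2)
    loss += lam_Fmag / (3 * N_mag) * np.sum((Fm_hat - Fm) ** 2)
    return loss

# perfect predictions give exactly zero loss
E_ref = -12.3
F_ref, Fm_ref = np.zeros((4, 3)), np.zeros((2, 3))
assert magnet_loss(E_ref, E_ref, F_ref, F_ref, Fm_ref, Fm_ref) == 0.0
```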

III. RESULTS AND DISCUSSION
The capability of MagNet is tested by a series of example studies on the magnetic material CrI3. Our results demonstrate that MagNet reproduces DFT results well. Remarkably, once trained on DFT data for small structures with random magnetic orientations, MagNet accurately predicts properties of magnetic configurations unseen in the training datasets, especially large-scale magnetic structures. To generate the datasets and benchmark results, we calculated the DFT total energy, atomic forces, and magnetic forces for given magnetic configurations using constrained DFT as implemented in the DeltaSpin package [39], where the Kohn-Sham eigenstates and the constraining fields are updated alternately to reach the target magnetic moments. Atomic and magnetic forces are obtained via the Hellmann-Feynman theorem [40].
Magnons, the elementary excitations of spin waves in magnetic materials, are regarded as prospective information carriers, facilitating the realization of diverse spin-wave-based logic gates [41][42][43] for potential computing applications [44]. As an example study, we predicted the magnon dispersion of a magnetic material through MagNet using neural-network automatic differentiation. We prepared DFT datasets by calculating supercells of monolayer CrI3 with the equilibrium lattice structure and randomly perturbed magnetic moment orientations, up to 10° away from the ground-state ferromagnetic configuration [Fig. 3(a)]. The neural network model of MagNet was trained on the DFT data and then used to predict the magnon dispersion. To verify the results of neural-network automatic differentiation, the finite-difference method by DFT was performed to compute the derivative of the magnetic forces, $\partial F^{\mathrm{mag}} / \partial \theta \approx \big[F^{\mathrm{mag}}(\theta + \Delta) - F^{\mathrm{mag}}(\theta - \Delta)\big] / (2\Delta)$, where the step size Δ refers to the change of the magnetic-moment orientation and Δ = 5° was chosen. The calculation results of DFT and MagNet are shown and compared in Fig. 3(b). MagNet achieves a mean-absolute error (MAE) of 1.67 × 10⁻² meV/µB for magnetic forces on the validation dataset, and the predicted magnon dispersion agrees well with the DFT reference, indicating the good reliability and high accuracy of MagNet.
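The finite-difference check above can be mimicked on a toy two-spin model, where the energy as a function of one spin's tilt angle is E(θ) = −J cos θ, so the angular magnetic force is −J sin θ and its exact derivative is −J cos θ; the model, coupling J, and evaluation angle are illustrative assumptions, while the 5° step matches the text.

```python
import numpy as np

J = 1.0

def torque(theta):
    """Angular magnetic force of the toy model E(theta) = -J cos(theta):
    F_theta = -dE/dtheta = -J sin(theta)."""
    return -J * np.sin(theta)

delta = np.deg2rad(5.0)       # the 5-degree step used in the text
theta0 = np.deg2rad(20.0)     # arbitrary evaluation point
fd = (torque(theta0 + delta) - torque(theta0 - delta)) / (2 * delta)
exact = -J * np.cos(theta0)   # analytic second derivative of the energy
assert np.isclose(fd, exact, atol=2e-3)
```

The central difference recovers the second derivative of the energy to within sin(Δ)/Δ ≈ 0.9987, i.e. an error of roughly 0.1% at this step size.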
Strain gradients can significantly affect magnetism in curved structures [45,46]. CrI3 nanotubes have attracted considerable interest for the study of curved magnetism [47], but first-principles calculations are limited by the large structures and diverse magnetic configurations. Modeling E({R}, {M}) is a challenging task for neural networks when both {R} and {M} vary simultaneously. We prepared DFT datasets by calculating flat sheets of monolayer CrI3 featuring randomly perturbed atomic and magnetic configurations, and applied the trained neural-network model of MagNet to investigate CrI3 nanotubes [Fig. 3(c)]. The energies of two possible magnetic configurations are considered: one is a non-collinear magnet, with the magnetic moments aligned along the radial direction, and the other is a ferromagnet, as displayed in the insets of Fig. 3(d). The (10,10), (12,12), (14,14), and (16,16) nanotubes of CrI3 are used to investigate size effects. The MAE of the total energy predicted by MagNet reaches as low as 0.129 meV/atom. The energy differences between the two magnetic configurations as a function of nanotube curvature are predicted by DFT and MagNet. As shown in Fig. 3(d), the crossover from ferromagnet to non-collinear magnet with increasing nanotube radius is well captured by MagNet, as checked against the DFT benchmark data. The good generalization ability of MagNet is thus demonstrated.
Finally, we tried a more challenging study on twisted bilayer CrI3, which has been reported to exhibit abundant non-collinear magnetic textures both theoretically and experimentally [48][49][50][51]. The Landau-Lifshitz-Gilbert equation [52] was applied to update the magnetic moment configurations according to the predicted magnetic forces:

$$\frac{\mathrm{d} \mathbf{M}_i}{\mathrm{d} t} = -\gamma\, \mathbf{M}_i \times \mathbf{B}_i + \frac{\alpha}{|\mathbf{M}_i|}\, \mathbf{M}_i \times \frac{\mathrm{d} \mathbf{M}_i}{\mathrm{d} t}, \qquad (9)$$

where $\mathbf{B}_i = -\partial E / \partial \mathbf{M}_i$ is the effective field given by the predicted magnetic forces, γ is the electron gyromagnetic ratio, and α is a phenomenological damping parameter. More specifically, the new magnetic moment orientations can be efficiently updated with the dissipative term proposed in Ref. [53]:

$$\mathbf{M}_i \leftarrow \mathbf{M}_i - \lambda\, \hat{\mathbf{M}}_i \times \big( \hat{\mathbf{M}}_i \times \mathbf{B}_i \big), \qquad (10)$$

where λ represents the step size. As shown in Fig. 4(a), the non-twisted bilayer CrI3 datasets were used to train MagNet. Our simulations of twisted bilayer CrI3 were carried out on a supercell comprising 4,326 atoms with a twist angle of θ = 63.48°, which was predicted to host non-collinear magnetic configurations [48,49]. Using the skyrmion state [Fig. 4(b)]
predicted by Ref. [48] as the initial magnetic configuration, we performed the spin dynamics simulation according to Eq. (10) with the magnetic forces predicted by MagNet. Converging within a few hundred steps, the skyrmion state transits to a more stable magnetic configuration, in which the out-of-plane components are positive and the in-plane components are in opposite directions between the top and bottom layers [Fig. 4(b)]. Furthermore, we applied the extended deep-learning DFT Hamiltonian method (named xDeepH) [11] to predict the electronic structure of the relaxed magnetic configuration. As shown in Fig. 4(c), the valence bands near the Fermi level become flatter after performing the spin dynamics. The isolated flat bands could be useful for exploring correlated electronic and magnetic physics. This work demonstrates that the magnetic and electronic structures of magnetic superstructures can be predicted by deep learning methods.
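A dissipative update of the Eq. (10) kind can be sketched as a simple relaxation loop; the toy effective field, the step size, and the single-moment convergence test below are illustrative assumptions rather than the paper's simulation setup.

```python
import numpy as np

def relax_spins(m, field_fn, lam=0.1, steps=500):
    """Dissipative relaxation: m <- normalize(m - lam * m x (m x B)).
    The double cross product removes the component of m transverse to the
    effective field B = -dE/dm, driving each moment toward B."""
    for _ in range(steps):
        B = field_fn(m)
        m = m - lam * np.cross(m, np.cross(m, B))
        m /= np.linalg.norm(m, axis=1, keepdims=True)
    return m

# toy check: a single moment in a fixed field relaxes onto the field direction
B0 = np.array([0.0, 0.0, 1.0])
m0 = np.array([[1.0, 0.0, 0.1]])
m0 /= np.linalg.norm(m0)
m_rel = relax_spins(m0, lambda m: np.tile(B0, (m.shape[0], 1)))
assert np.allclose(m_rel[0], B0, atol=1e-6)
```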

IV. CONCLUSIONS
In brief summary, we proposed MagNet, a general neural-network framework to represent the DFT total energy and the atomic and magnetic forces as functions of atomic and magnetic structures. MagNet incorporates the E(3) group symmetry, which significantly reduces the training complexity and the amount of training data required. The high accuracy and exceptional generalization ability of the method are demonstrated by investigating various kinds of magnets formed by CrI3. This approach creates opportunities for exploring novel magnetism and spin dynamics in magnetic structures at large length/time scales.

V. APPENDIX

A. Neural-network methods
The MagNet model is implemented with the PyTorch, PyTorch Geometric, and e3nn [54] libraries. The initial vertex features are set to 32 × 0e. For the atomic interaction blocks, equivariant vertex features are set to 32 × 0e + 16 × 1o, where 32 × 0e denotes 32 equivariant vectors carrying an l = 0 representation with even parity, and 16 × 1o denotes 16 equivariant vectors carrying an l = 1 representation with odd parity. For the magnetic interaction blocks, equivariant vertex features are likewise set to 32 × 0e + 16 × 1o. The numbers of atomic interaction blocks N1 and magnetic interaction blocks N2 are set to 4 and 2, respectively. Spherical harmonics have a maximal angular momentum of l = 2. The learning rate is initially set to 0.001 and decreases by a factor of 0.5 when the loss does not decrease for 20 epochs. The radius cutoff for all three datasets is set to 6 Å. All network results are reported on validation sets. For the study of the magnon dispersion of CrI3, we use λ_E = 10, λ_F = 10⁵, and λ_Fmag = 10⁸, obtaining mean-absolute errors (MAEs) of 5.57 × 10⁻⁷ eV/atom, 1.26 × 10⁻⁴ eV/Å, and 1.67 × 10⁻⁵ eV/µB for the DFT total energy, atomic forces, and magnetic forces, respectively. For the study of CrI3 nanotubes, we use λ_E = 10, λ_F = 10⁵, and λ_Fmag = 10⁶, obtaining MAEs of 1.29 × 10⁻⁴ eV/atom, 3.77 × 10⁻³ eV/Å, and 1.30 × 10⁻³ eV/µB for the total energy, atomic forces, and magnetic forces, respectively. For the study of bilayer CrI3, we use λ_E = 10, λ_F = 10⁵, and λ_Fmag = 10⁶, obtaining MAEs of 1.32 × 10⁻³ eV/atom, 9.35 × 10⁻³ eV/Å, and 3.99 × 10⁻³ eV/µB for the DFT total energy, atomic forces, and magnetic forces, respectively. Neural-network training is performed on an NVIDIA RTX 3090 GPU with a batch size of 1. For spin dynamics simulations, the step size λ is set to 20.
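The learning-rate schedule described above (halve the rate when the loss stagnates for 20 epochs) can be sketched in pure Python; note that PyTorch's actual `ReduceLROnPlateau` scheduler has extra thresholds and cooldown options not modeled in this simplification.

```python
def reduce_lr_on_plateau(losses, lr0=0.001, factor=0.5, patience=20):
    """Halve the learning rate whenever the loss has not improved on its
    best value for more than `patience` consecutive epochs."""
    lr, best, wait = lr0, float("inf"), 0
    history = []
    for loss in losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait > patience:
                lr *= factor
                wait = 0
        history.append(lr)
    return history

# one improvement followed by 24 stagnant epochs triggers a single halving
hist = reduce_lr_on_plateau([1.0] + [0.9] * 25)
assert abs(hist[-1] - 0.0005) < 1e-12
```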

B. DFT calculation methods
DFT calculations are performed with the Vienna ab initio simulation package [55] using the Perdew-Burke-Ernzerhof exchange-correlation functional. Constrained DFT calculations as implemented by the DeltaSpin method [39] are applied to study systems with specified magnetic configurations. In all calculations, we constrain both the orientation and the magnitude of the magnetic moments. Spin-orbit coupling is included in the DFT calculations. The DFT + U method with a Hubbard correction of U = 3.0 eV is applied to describe the 3d orbitals of Cr.
The dataset of 3 × 3 supercells of monolayer CrI3 contains 200 configurations with random perturbations (up to 10°) away from the ground-state ferromagnetic configuration in the equilibrium atomic structure. The energy cutoff of the plane-wave basis is set to 400 eV. A Monkhorst-Pack k-mesh of 3 × 3 × 1 is used.
For the dataset of 2 × 2 supercells of monolayer CrI3, 80 different atomic structures are prepared by introducing random atomic displacements (up to 0.1 Å) to each atom about its equilibrium position. For each atomic structure, 5 random magnetic configurations are generated by arbitrarily arranging the orientation of the constrained magnetic moment on each magnetic Cr atom, giving 400 structures in total. The energy cutoff of the plane-wave basis is set to 400 eV. A Monkhorst-Pack k-mesh of 4 × 4 × 1 is used. For CrI3 nanotubes, a Monkhorst-Pack k-mesh of 1 × 1 × 5 is used.
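One simple way to generate such randomly perturbed moment orientations (the paper does not specify its exact sampling scheme, so this is an illustrative choice) is to rotate each ground-state moment by a random angle within the cone, about a random axis perpendicular to it:

```python
import numpy as np

def perturb_moment(m_hat, max_deg, rng):
    """Tilt a unit moment by a uniform random angle in [0, max_deg) about a
    random axis perpendicular to it (one way to sample inside a cone)."""
    theta = np.deg2rad(max_deg) * rng.random()
    v = rng.normal(size=3)
    axis = v - np.dot(v, m_hat) * m_hat   # project out the parallel part
    axis /= np.linalg.norm(axis)
    # Rodrigues rotation; the (1 - cos) term vanishes since axis . m_hat = 0
    return m_hat * np.cos(theta) + np.cross(axis, m_hat) * np.sin(theta)

rng = np.random.default_rng(3)
z = np.array([0.0, 0.0, 1.0])
for _ in range(100):
    m = perturb_moment(z, 10.0, rng)
    angle = np.degrees(np.arccos(np.clip(np.dot(m, z), -1, 1)))
    assert angle <= 10.0 + 1e-9           # stays inside the 10-degree cone
    assert np.isclose(np.linalg.norm(m), 1.0)
```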
For the dataset of supercells of bilayer CrI3, the second layer is shifted with respect to the first layer. The shift is sampled on an 8 × 8 grid of the supercell, yielding 64 atomic configurations. In addition, random displacements (up to 0.1 Å) are introduced to each atom about its equilibrium position. For each atomic configuration, 5 random magnetic configurations, as well as ferromagnetic and antiferromagnetic configurations with a random perturbation (up to 30°) away from the z-axis, are generated, giving 448 structures. The energy cutoff of the plane-wave basis is set to 300 eV. A Monkhorst-Pack k-mesh of 2 × 2 × 1 is used.
All datasets are randomly divided into training and validation sets with a ratio of 8 : 2.
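The random split can be written in a couple of lines; the fixed seed and dataset size here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 400                       # e.g. the monolayer dataset size
idx = rng.permutation(n_samples)      # shuffle before splitting
n_train = int(0.8 * n_samples)
train_idx, val_idx = idx[:n_train], idx[n_train:]
assert len(train_idx) == 320 and len(val_idx) == 80
```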

C. Magnon dispersion computation
The energy of a magnetic material near a minimum can be described by a quadratic form:

$$E = E_0 + \sum_{ab} \mathbf{S}_a^{\top} J_{ab}\, \mathbf{S}_b, \qquad (11)$$

where $\mathbf{S}_a$ and $\mathbf{S}_b$ denote spin vectors in global coordinates, a and b denote the indices of the spin sites, and $J_{ab}$ denotes the exchange coupling matrix. We choose a local coordinate system on each magnetic atom, with the z axis parallel to the ground-state magnetic orientation. The transformation between local and global coordinates is a three-by-three orthogonal matrix $R_a$:

$$\mathbf{S}_a = R_a\, \mathbf{S}'_a, \qquad (12)$$

where $\mathbf{S}'_a$ is the spin vector in local coordinates. When the deviation from the ground-state magnetization is small, $\mathbf{S}'_a$ can be written as

$$\mathbf{S}'_a = S_a \big( \hat{z} + P^{\top} \chi_a \big), \qquad (13)$$

where $S_a$ is the length of the a-th spin, $\hat{z} = (0, 0, 1)^{\top}$, $P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$, and $\chi_a$ is the projection of the spin on the local x-y plane, normalized by the spin length. We note that $\chi_a$ is a two-component object.
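A local frame $R_a$ of this kind can be constructed explicitly; the choice of the in-plane axes is arbitrary (any orthonormal pair perpendicular to the ground-state direction works), as this sketch assumes.

```python
import numpy as np

def local_frame(n):
    """Orthogonal matrix R whose third column is the ground-state direction n,
    so that S_global = R @ S_local maps the local z axis onto n."""
    n = n / np.linalg.norm(n)
    # pick a reference vector not parallel to n, then Gram-Schmidt
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = a - np.dot(a, n) * n
    x /= np.linalg.norm(x)
    y = np.cross(n, x)
    return np.column_stack([x, y, n])

n = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
R = local_frame(n)
assert np.allclose(R @ R.T, np.eye(3))             # orthogonal
assert np.allclose(R @ np.array([0.0, 0.0, 1.0]), n)  # local z -> n
```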
From the Heisenberg equation of motion, we obtain

$$\hbar\, \frac{\mathrm{d} \chi_a}{\mathrm{d} t} = i \sigma_y \sum_b D_{ab}\, \chi_b, \qquad (14)$$

where $\sigma_y$ is the Pauli-y matrix, and the spin dynamical matrix $D_{ab}$ is defined as

$$D_{ab} = 2 S_b\, P R_a^{\top} J_{ab} R_b P^{\top} - 2\, \delta_{ab} \sum_c S_c \big( \hat{z}^{\top} R_a^{\top} J_{ac} R_c\, \hat{z} \big)\, I_2, \qquad (15)$$

with $I_2$ the 2 × 2 identity matrix. In a system with translation symmetry, the dynamical matrix depends only on the difference between $\mathbf{r}_a$ and $\mathbf{r}_b$; therefore we can write

$$D_{ab} = D(\mathbf{r}_a - \mathbf{r}_b), \qquad (16)$$

where $\mathbf{r}_a$ is the position of the a-th spin. Finally, the magnon dispersion is obtained by diagonalizing the Fourier transformation of the spin dynamical matrix,

$$D(\mathbf{k}) = \sum_b D(\mathbf{r}_a - \mathbf{r}_b)\, e^{-i \mathbf{k} \cdot (\mathbf{r}_a - \mathbf{r}_b)}. \qquad (17)$$
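As a toy consistency check of this linearized-dynamics route to magnons, one can write down the linearized precession equations for a classical ferromagnetic ring (trivial local frames, nearest-neighbor exchange) and compare the numerically diagonalized spectrum with the textbook dispersion ω(k) = 2JS(1 − cos k); the ring model, J, S, and γ = ħ = 1 units are illustrative assumptions unrelated to the CrI3 calculation.

```python
import numpy as np

def ring_laplacian(N):
    """Discrete Laplacian on a ring: 2 on the diagonal, -1 for neighbors."""
    L = 2.0 * np.eye(N)
    for i in range(N):
        L[i, (i - 1) % N] -= 1.0
        L[i, (i + 1) % N] -= 1.0
    return L

def magnon_freqs(N, J=1.0, S=1.0):
    """Mode frequencies from the linearized precession equations
    d/dt (x, y) = A (x, y) of a ferromagnetic ring (gamma = hbar = 1);
    A is real antisymmetric, so its eigenvalues are +/- i*omega pairs."""
    L = ring_laplacian(N)
    A = np.block([[np.zeros((N, N)), -J * S * L],
                  [J * S * L, np.zeros((N, N))]])
    return np.sort(np.abs(np.linalg.eigvals(A).imag))

N, J, S = 8, 1.0, 1.0
k = 2 * np.pi * np.arange(N) / N
omega = 2 * J * S * (1 - np.cos(k))            # analytic dispersion
analytic = np.sort(np.concatenate([omega, omega]))  # each mode appears twice
assert np.allclose(magnon_freqs(N, J, S), analytic, atol=1e-8)
```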

FIG. 1. Function of MagNet. MagNet is an equivariant neural network model mapping from the atomic structure {R} and magnetic structure {M} to physical properties, including the total energy, atomic forces, and magnetic forces.

FIG. 2. Network architecture of MagNet. (a) Model sketch of MagNet. MagNet embeds the atomic and magnetic structures of magnetic materials, extracts the geometric information through a series of atomic and magnetic interaction blocks, and outputs the final vertex features through an E3Linear layer. N1 and N2 denote the number of atomic and magnetic interaction blocks, respectively. (b) Details of the embedding block. (c) Details of the atomic interaction block. (d) Details of the magnetic interaction block. 'Z' denotes the atom type embeddings. 'v_i^(l)' denotes the l-th layer vertex feature of atom i. 'eB(|rij|), Y(rij), eB(|mi|), Y(mi)' denote radial and spherical harmonics embeddings of the interatomic distance vectors rij and magnetic moment vectors mi, respectively. '(Ux) ⊗ (Vy)' denotes the tensor product operation between features x and y, where U and V are learnable parameters. '||' denotes vector concatenation and '•' denotes element-wise multiplication. 'Σ_j' denotes the summation of features over neighboring vertices.

FIG. 3. Example applications of MagNet. (a) Schematic diagrams showing monolayer CrI3 and its spin waves. MagNet is trained by DFT calculation results of monolayer CrI3 and used to study spin waves. (b) Magnon dispersion of monolayer CrI3 predicted by MagNet via neural-network automatic differentiation and further checked by DFT finite-difference calculations. (c) Generalization ability of MagNet. MagNet learns from DFT results of monolayer CrI3 and generalizes to study CrI3 nanotubes. (d) Energy difference (∆E) between the two possible magnetic configurations displayed in the insets as a function of nanotube curvature 1/R.

FIG. 4. Example applications of MagNet in studying moiré-twisted materials. (a) Generalization ability of MagNet. MagNet learns from DFT results of non-twisted bilayer CrI3 and generalizes to study moiré-twisted bilayer CrI3 with varying twist angles. (b) Initial magnetic configurations (adapted from Ref. [11]) and relaxed magnetic configurations of moiré-twisted bilayer CrI3 with a twist angle of 63.48° (4,336 atoms per supercell) as predicted by MagNet. The magnetic moments of the top CrI3 layer are represented by colored arrows, with the in-plane components denoted by the arrow length and the out-of-plane components denoted by the color. For the magnetic moments of the bottom CrI3 layer, the out-of-plane components are the same as in the top layer, whereas the in-plane components are opposite. (c) Band structure of the moiré-twisted bilayer CrI3 (displayed in (b)) predicted by the xDeepH method [11].