1 Introduction

Ab initio calculations employing density functional theory (DFT) have become an indispensable tool in materials discovery, but their practical applications are limited to small systems due to the high computational cost. Deep learning has been proposed as a viable solution to address the trade-off between efficiency and accuracy. Over the past decade, deep-learning ab initio methods have revolutionized electronic and atomistic modeling [1–13]. For electronic modeling, a series of deep neural networks has been developed to learn the relationship between DFT Hamiltonians and material structures [8, 9, 13]. By satisfying the principle of equivariance, the neural-network approach has demonstrated exceptional accuracy in example studies of various non-magnetic systems [8–10, 13]. Remarkably, an extended deep-learning method has been developed to learn the mapping from atomic and magnetic structures to DFT Hamiltonians, preserving a generalized principle of equivariance for magnetic materials [11]. For atomistic modeling, neural network force fields (NNFFs) have been devised for non-magnetic materials and widely applied in molecular dynamics and Monte Carlo simulations [14–25]. The corresponding research on magnetic materials is of equal importance; however, it remains largely unexplored.

The development of NNFFs for magnetic materials faces several challenges. Firstly, magnetic NNFFs double the degrees of freedom (6N) compared with conventional NNFFs (3N), which requires a substantial amount of extra data for machine learning. At the same time, training data for magnetic materials are significantly more costly than conventional DFT datasets due to the additional computational workload of constraining magnetic configurations, exacerbating data scarcity. Yang et al. [26] recently proposed a neural network method based on an existing descriptor-based model, but it suffers from data inefficiency because it includes only invariant features, as explicitly illustrated in previous works [22, 24]. Alternatively, incorporating prior knowledge of symmetry into the neural network design can alleviate this problem. Equivariant neural networks (ENNs) [27–29] aim to satisfy the equivariance requirements by ensuring that the input, output, and all internal features transform equivalently under the symmetry group. Therefore, ENNs can be applied in scenarios with limited data without requiring data augmentation, rendering them more viable for the energy-modeling task of magnetic materials.

Secondly, the derivatives of the total energy with respect to the orientations of magnetic moments (called magnetic forces in this work) play a role analogous to atomic forces in conventional NNFFs and are indispensable to the atomistic modeling of magnetic materials. Yu et al. [30] recently proposed a time-reversal equivariant neural network to map the energy of magnetic materials, but the training data of magnetic forces were neither used nor explicitly learned. For magnetic NNFFs, the absence of magnetic forces seriously increases the amount of additional data needed to fit the energy profile, an issue ignored in previous studies [31–37]. Furthermore, those models cannot accurately capture magnetic effects involving higher-order derivatives with respect to magnetic moments, such as the low-energy elementary excitations [38].

To address the above challenges, we propose MagNet, an equivariant deep-learning framework to represent the DFT total energy \(E(\{\mathcal{R}\},\{\mathcal{M}\})\) and its derivative forces as functions of atomic structures \(\{\mathcal{R}\}\) and complex non-collinear local atomic magnetic moments \(\{\mathcal{M}\}\). As a critical innovation, we design an ENN architecture that naturally integrates both atomic and magnetic degrees of freedom and incorporates a direct mapping of magnetic forces, enabling efficient and accurate learning of magnetic materials. The method is systematically tested, showing high reliability in calculating the magnon dispersion and good generalization capability in example studies of magnetic CrI3 nanotubes. Finally, we apply our method to study the spin dynamics of moiré-twisted bilayer CrI3. Benefiting from its high efficiency and accuracy, MagNet promises further applications in computing magnetic materials at large length/time scales.

2 Methods

Deep learning methods have enabled efficient material simulations with DFT accuracy. A significant generalization of these methods is required for studying magnetic materials. For nonmagnetic systems, the total energy E as a function of atomic structure \(\{\mathcal{R}\}\) is calculated by self-consistent field (SCF) iterations in DFT, and the function \(E(\{\mathcal{R}\})\) is the learning target of NNFFs. In contrast, for magnetic systems, the total energy depends not only on the atomic structure \(\{\mathcal{R}\}\) but also on the magnetic structure \(\{\mathcal{M}\}\). To compute the total energy for varying \(\{\mathcal{M}\}\), one needs to apply constrained DFT, which uses a Lagrangian approach to constrain a specific magnetic configuration and introduces constraining fields as an additional potential in the Kohn–Sham equation [39]; this significantly increases the computational workload and is much more time-consuming.

The function of MagNet is illustrated in Fig. 1. First, magnetic materials with different atomic and magnetic configurations are calculated by constrained DFT to prepare training datasets. The training datasets are then used to train MagNet, which predicts physical properties, including the DFT total energy, atomic forces, and magnetic forces, for atomic and magnetic structures unseen in the training datasets. By substituting the costly SCF calculation with neural networks, the method significantly lowers the computational overhead and enables an efficient and accurate mapping between properties and structures of magnetic materials. The critical point here is empowering neural networks by leveraging a priori knowledge, which will be discussed subsequently.

Figure 1

Function of MagNet. MagNet is an equivariant neural network model mapping from the atomic structure \(\{\mathcal{R}\}\) and magnetic structure \(\{\mathcal{M}\}\) to physical properties, including the total energy, atomic forces, and magnetic forces

Notably, for most magnetic materials, varying \(\{\mathcal{R}\}\) alters the strength of the interatomic bonding energies in the total energy, whereas varying \(\{\mathcal{M}\}\) mainly modifies the relatively weak and localized magnetic exchange interactions, leading to minor changes in the total energy. Consequently, the effects on the total energy due to alterations in \(\{\mathcal{M}\}\) are expected to be weaker in magnitude and shorter in length scale. These subtle interactions require an appropriate neural network design, distinct from the description of changes induced by \(\{\mathcal{R}\}\).

Equivariance is another essential consideration in the network design. For atomistic systems, the physical properties of materials are equivariant under rotation, inversion, and translation, which together comprise the three-dimensional Euclidean group E(3). Scalar quantities like the total energy are invariant under these symmetry operations, whereas vector quantities such as atomic forces and magnetic forces are equivariant and change when the atomic geometry is transformed. It is thus natural to incorporate equivariance into the design of neural networks. Given information about one structure, the target property of all symmetry-related structures can be obtained from the neural network via equivariant transformations, which enables a more efficient mapping in data-limited cases.

Here we present the ENN architecture of MagNet. The equivariant building blocks of the neural network model are implemented following the scheme proposed by DeepH-E3 [9]. Formally, a function f relating the input vector space X and the output vector space Y is regarded as equivariant provided that, for any input \(x \in {X} \), output \(y \in {Y} \), and any group element g within a transformation group G, the following condition is satisfied:

$$ f\bigl({D}_{{X}}(g)x\bigr) = {D}_{{Y}}(g)f(x), $$
(1)

where \({D}_{{X}}(g) \) and \({D}_{{Y}}(g) \) indicate transformation matrices in X and Y, parameterized by g. In MagNet, translation symmetry is guaranteed by operating on relative positions of atoms. For rotation, features \(v_{m}^{l}\) carry the irreducible representation of the \(\mathrm{SO(3)}\) group of dimension \(2l + 1\), where l represents the angular momentum quantum number, and m denotes the magnetic quantum number varying between −l and l. A key operation for coupling features with different l is the tensor product, denoted as ⊗, which uses Clebsch–Gordan coefficients \(C^{l_{3},m_{3}}_{l_{1},m_{1},l_{2},m_{2}}\) to combine features \(x^{l_{1}}\) and \(y^{l_{2}}\) and produce an output feature \(z^{l_{3}}\):

$$\begin{aligned} z^{l_{3}}_{m_{3}}&= x_{m_{1}}^{l_{1}} \otimes y_{m_{2}}^{l_{2}} \\ &= \sum_{m_{1},m_{2}} C^{l_{3},m_{3}}_{l_{1},m_{1},l_{2},m_{2}} x^{l_{1}}_{m_{1}} y^{l_{2}}_{m_{2}}. \end{aligned}$$
(2)
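As an illustration of Eq. (2), the following sketch couples two equivariant features and numerically verifies that the output rotates consistently with the inputs. It assumes the e3nn library, whose real Wigner 3j tensors play the role of the Clebsch–Gordan coefficients up to normalization; the actual MagNet implementation follows the DeepH-E3 building blocks [9] and may differ in detail.

```python
# Hedged sketch of Eq. (2): coupling an l1=1 feature with an l2=1 feature into
# an l3=2 feature via (real-basis) Clebsch-Gordan-type coefficients from e3nn.
import torch
from e3nn import o3

l1, l2, l3 = 1, 1, 2
w3j = o3.wigner_3j(l1, l2, l3)        # shape (3, 3, 5); proportional to the CG coefficients

x = torch.randn(2 * l1 + 1)           # feature x^{l1}, e.g. an edge-direction embedding
y = torch.randn(2 * l2 + 1)           # feature y^{l2}, e.g. a magnetic-moment embedding

# z^{l3}_{m3} = sum_{m1,m2} C^{l3,m3}_{l1,m1,l2,m2} x^{l1}_{m1} y^{l2}_{m2}
z = torch.einsum("ijk,i,j->k", w3j, x, y)

# Equivariance check: rotating both inputs rotates the output by the matching
# Wigner D matrix of order l3.
R = o3.rand_matrix()
D1, D2, D3 = (o3.Irrep(l, 1).D_from_matrix(R) for l in (l1, l2, l3))
z_rot = torch.einsum("ijk,i,j->k", w3j, D1 @ x, D2 @ y)
assert torch.allclose(z_rot, D3 @ z, atol=1e-4)
```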

Since the features are equivariant under rotation, the physical quantities represented by the features also change equivariantly under rotation. For spatial inversion, it is necessary to introduce an additional parity index into the features, which labels the behavior under spatial inversion as either even (\(p = 1\)) or odd (\(p = -1\)). Parity equivariance is ensured by permitting contributions to an output feature with parity \(p_{3}\) from two features possessing parities \(p_{1}\) and \(p_{2}\) in the tensor product only if the selection rule \(p_{3} = p_{1} p_{2}\) is satisfied. In addition, time-reversal symmetry can be treated essentially in the same way as parity and integrated into ENNs, as implemented in previous work [11].

In the context of the building blocks illustrated in Fig. 2, the operation ‘E3Linear’ is formulated as:

$$ \text{E3Linear}\bigl(v^{l}_{cm}\bigr) = \sum _{c'} W^{l}_{cc'} v^{l}_{c'm} + b^{l}_{c}, $$
(3)

where c and \(c'\) denote the channel indices, and \(W^{l}_{cc'}\) and \(b^{l}_{c}\) are the learnable weights and biases, respectively. It is essential to note that the biases \(b^{l}_{c}\) are nonzero only for equivariant features v with \(l = 0\), which preserves the equivariance requirements. 'Activation' introduces a non-linearity on the features depending on the index l: for features with \(l = 0\), a non-linear SiLU function is employed, whereas features with \(l > 0\) are only rescaled linearly by a scalar gate. The normalization of the features while preserving equivariance is achieved by the E3Layernorm proposed in Ref. [9]:

$$ \text{E3Layernorm}\bigl(v^{l}_{cm}\bigr) = g^{l}_{c} \cdot \frac{ v^{l}_{cm} - \mu ^{l}_{m}}{\sigma ^{l}_{m} + \epsilon} + h^{l}_{c}, $$
(4)

where \(\mu ^{l}_{m}\) and \(\sigma ^{l}_{m}\) are the mean and the standard deviation of features, respectively, \(g^{l}_{c}\) and \(h^{l}_{c}\) are learnable parameters, and ϵ is a small constant introduced for enhancing numerical stability. The term \(h^{l}_{c}\) is subject to the same equivariance requirements as \(b^{l}_{c}\) in Eq. (3).
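To make the equivariance constraint on the bias and normalization terms concrete, below is a minimal PyTorch sketch of the E3Linear operation in Eq. (3). The feature layout (..., channels, 2l+1) and the module name are our assumptions; the actual implementation likely builds on e3nn-style linear layers as in DeepH-E3 [9].

```python
# Hedged sketch of Eq. (3): channel mixing per angular momentum l, with a bias
# only for l = 0 features so that rotational equivariance is preserved.
import torch
import torch.nn as nn

class E3Linear(nn.Module):
    def __init__(self, l: int, c_in: int, c_out: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(c_out, c_in) / c_in ** 0.5)  # W^l_{cc'}
        # b^l_c is allowed only for scalar (l = 0) features; a bias on l > 0
        # features would not transform correctly under rotation.
        self.bias = nn.Parameter(torch.zeros(c_out)) if l == 0 else None

    def forward(self, v):                        # v: (..., c_in, 2l+1)
        out = torch.einsum("dc,...cm->...dm", self.weight, v)
        if self.bias is not None:
            out = out + self.bias[:, None]
        return out

# Mixing acts only on the channel index, so each (2l+1)-dimensional irrep is
# rescaled as a whole; the same reasoning applies to h^l_c in Eq. (4).
lin = E3Linear(l=1, c_in=8, c_out=16)
v = torch.randn(4, 8, 3)                         # 4 vertices, 8 channels of l = 1 features
print(lin(v).shape)                              # torch.Size([4, 16, 3])
```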

Figure 2

Network architecture of MagNet. (a) Model sketch of MagNet. MagNet embeds the atomic and magnetic structures of magnetic materials, extracts the geometric information through a series of atomic and magnetic interaction blocks, and outputs the final vertex features through an E3Linear layer. \(N_{1}\) and \(N_{2}\) denote the numbers of atomic and magnetic interaction blocks, respectively. (b) Details of the embedding block. (c) Details of the atomic interaction block. (d) Details of the magnetic interaction block. 'Z' denotes the atom type embeddings. '\(\mathrm{v}^{(l)}_{i}\)' denotes the l-th layer vertex feature of atom i. '\(\mathbf{e}_{\mathrm{B}}(|\mathbf{r}_{ij}|)\), \(\mathbf{Y}(\hat{r}_{ij})\)' and '\(\mathbf{e}_{\mathrm{B}}(|\mathbf{m}_{i}|)\), \(\mathbf{Y}(\hat{m}_{i})\)' denote the radial and spherical harmonics embeddings of the interatomic distance vectors \(\mathbf{r}_{ij}\) and magnetic moment vectors \(\mathbf{m}_{i}\), respectively. '\((\mathbf{U}x) \otimes (\mathbf{V}y)\)' denotes the tensor product operation between features x and y, where U and V are learnable parameters. '∥' denotes vector concatenation and '⋅' denotes element-wise multiplication. '\(\sum_{j}\)' denotes the summation of features over neighboring vertices

The network architecture of MagNet, as shown in Fig. 2, consists of an embedding block, followed by a series of atomic interaction blocks and magnetic interaction blocks, and outputs the final vertex features after an E3Linear layer. The purpose of the embedding block is to initialize equivariant features from the information of the magnetic material, including the interatomic distance vectors \(\mathbf{r}_{ij}\), the magnetic moment vectors \(\mathbf{m}_{i}\), and the atom species \(Z_{i}\). For a non-magnetic atom i, the magnetic moment \(\mathbf{m}_{i}\) is set to zero. Radial functions expand the interatomic distances and magnetic moment lengths in a Gaussian basis [14]. The directions of \(\mathbf{r}_{ij}\) and \(\mathbf{m}_{i}\) are incorporated through the real spherical harmonics \(Y^{l}_{m}\) with indices l and m. Atomic interaction blocks encode interactions between neighboring atoms, where different features are mixed and contracted through the tensor product. Gaussian functions and a polynomial envelope function [16] are fed into multi-layer perceptrons (MLPs) that produce the radial weights for the tensor-product interactions. The vertex features, now carrying magnetic moment information, then interact with other atomic features in a following series of magnetic interaction blocks. Since the influence of the magnetic moments is relatively more localized, we set a smaller number of layers for the magnetic interaction blocks than for the atomic interaction blocks. The final vertex features are obtained as the output of the E3Linear layer. The total energy is derived from the sum of the final vertex features with a rescaling, as shown in Eq. (5). Atomic forces and magnetic forces are subsequently determined as the negative gradients of the predicted total energy with respect to atomic positions and magnetic moments.

$$\begin{aligned} &\hat{E} = \sum_{i} (\sigma _{0} \mathrm{v}_{i} + {\mu _{0}}), \end{aligned}$$
(5)
$$\begin{aligned} &{\hat{F}_{i,\alpha}} = - \frac{\partial \hat{E}}{\partial r_{i,\alpha}}, \end{aligned}$$
(6)
$$\begin{aligned} &\hat{F}_{\mathrm{mag} i,\alpha} = -{ \frac{\partial \hat{E}}{\partial m_{i,\alpha}}}, \end{aligned}$$
(7)

where \(\sigma _{0}\) and \(N \mu _{0}\) are the standard deviation and the mean over the training set, respectively, N is the number of atoms, i is the atom index, and α is the coordinate index. MagNet is trained using a loss function based on a weighted sum of mean-squared-error terms for the total energy, atomic forces, and magnetic forces:

$$\begin{aligned} L={}&\lambda _{E} \Vert \hat{E}-E \Vert ^{2}+ \frac{\lambda _{F}}{3 N} \sum_{i=1}^{N} \sum _{\alpha =1}^{3} \biggl\Vert - \frac{\partial \hat{E}}{\partial r_{i, \alpha}}-F_{i, \alpha} \biggr\Vert ^{2} \\ &{} + \frac{\lambda _{F_{\mathrm{mag}}}}{3 N_{\mathrm{mag}}} \sum_{j=1}^{N_{ \mathrm{mag}}} \sum _{\beta =1}^{3} \biggl\Vert - \frac{\partial \hat{E}}{\partial m_{j, \beta}}-F_{\mathrm{mag} j, \beta} \biggr\Vert ^{2}, \end{aligned}$$
(8)

where \(\lambda _{E}\), \(\lambda _{F}\), and \(\lambda _{F_{\mathrm{mag}}}\) denote the weights of the total energy, atomic forces, and magnetic forces, respectively; N and \(N_{\mathrm{mag}}\) are the numbers of atoms and magnetic atoms, respectively; and α, β are the coordinate indices.
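For concreteness, the sketch below assembles the loss of Eq. (8), obtaining atomic and magnetic forces from the predicted energy by automatic differentiation as in Eqs. (6) and (7). The `model` callable and the toy quadratic energy are placeholders, not MagNet's actual interface.

```python
# Hedged sketch of the training loss in Eq. (8) with autograd forces.
import torch

def magnet_loss(model, r, m, E_ref, F_ref, Fmag_ref,
                lam_E=1.0, lam_F=1.0, lam_Fmag=1.0):
    r = r.clone().requires_grad_(True)           # (N, 3) atomic positions
    m = m.clone().requires_grad_(True)           # (N_mag, 3) magnetic moments
    E_pred = model(r, m)                         # scalar predicted total energy

    # Eqs. (6)-(7): forces are negative gradients of the predicted energy.
    dE_dr, dE_dm = torch.autograd.grad(E_pred, (r, m), create_graph=True)
    F_pred, Fmag_pred = -dE_dr, -dE_dm

    N, N_mag = r.shape[0], m.shape[0]
    return (lam_E * (E_pred - E_ref) ** 2
            + lam_F / (3 * N) * ((F_pred - F_ref) ** 2).sum()
            + lam_Fmag / (3 * N_mag) * ((Fmag_pred - Fmag_ref) ** 2).sum())

# Toy stand-in for the trained network, just to show the call pattern.
toy_energy = lambda r, m: (r ** 2).sum() + 0.1 * (m ** 2).sum()
r0, m0 = torch.randn(5, 3), torch.randn(2, 3)
loss = magnet_loss(toy_energy, r0, m0, torch.tensor(1.0),
                   torch.zeros(5, 3), torch.zeros(2, 3))
loss.backward()   # create_graph=True lets the force terms backpropagate to model weights
```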

3 Results and discussions

The capability of MagNet is tested by a series of example studies on the magnetic material CrI3. Our results demonstrate that MagNet reproduces DFT results well. Remarkably, once trained on DFT data for small structures with random magnetic orientations, MagNet can accurately predict new magnetic configurations unseen in the training datasets, especially large-scale magnetic structures. To generate the dataset and benchmark results, we calculated the DFT total energy, atomic forces, and magnetic forces for given magnetic configurations using constrained DFT as implemented in the DeltaSpin package [40], where the Kohn–Sham eigenstates and the constraining fields are updated alternately to obtain the target magnetic moments. Atomic and magnetic forces are obtained via the Hellmann–Feynman theorem [41].

Magnons, the elementary excitations of spin waves in magnetic materials, are regarded as prospective information carriers, facilitating the realization of diverse spin-wave-based logic gates [42–44] for potential computing applications [45]. As an example study, we predicted the magnon dispersion of a magnetic material with MagNet using neural-network automatic differentiation. We prepared DFT datasets by calculating supercells of monolayer \(\mathrm{CrI_{3}}\) with the equilibrium lattice structure and randomly perturbed magnetic moment orientations, up to 10° away from the ground-state ferromagnetic configuration [Fig. 3(a)]. The neural network model of MagNet was trained on the DFT data and then used to predict the magnon dispersion. To verify the results of neural-network automatic differentiation, the finite-difference method with DFT was used to compute the derivative of magnetic forces, \(f'(x) = [f(x+\Delta ) - f(x-\Delta )]/(2\Delta )\), where the step size Δ refers to the change of magnetic-moment orientation and \(\Delta = 5\)° was chosen. Details of deriving the magnon dispersion are described in Appendix C. The calculation results of DFT and MagNet are compared in Fig. 3(b). MagNet achieves a mean-absolute error (MAE) of \(1.67 \times 10^{-2}\,\text{meV/}\mu _{\mathrm{B}}\) for magnetic forces on the validation dataset, and the predicted magnon dispersion agrees well with the DFT reference, indicating the good reliability and high accuracy of MagNet.
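As a schematic of the finite-difference check described above, the snippet below rotates one magnetic moment by ±Δ about a chosen axis and forms the central difference of the energy. The `energy` callable stands in for a constrained-DFT (or MagNet) evaluation, and the site index and rotation axis are illustrative.

```python
# Hedged sketch of the central finite-difference derivative used as a DFT check.
import numpy as np

def rotate(v, axis, angle_deg):
    """Rotate vector v about a unit axis by angle_deg (Rodrigues' formula)."""
    k = np.asarray(axis, float) / np.linalg.norm(axis)
    t = np.deg2rad(angle_deg)
    return v * np.cos(t) + np.cross(k, v) * np.sin(t) + k * np.dot(k, v) * (1.0 - np.cos(t))

def central_difference(energy, moments, site, axis, delta_deg=5.0):
    """dE/dtheta at `site`: [f(x + D) - f(x - D)] / (2 D), with D in radians."""
    m_plus, m_minus = moments.copy(), moments.copy()
    m_plus[site] = rotate(moments[site], axis, +delta_deg)
    m_minus[site] = rotate(moments[site], axis, -delta_deg)
    return (energy(m_plus) - energy(m_minus)) / (2.0 * np.deg2rad(delta_deg))
```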

Figure 3

Example applications of MagNet. (a) Schematic diagrams showing monolayer CrI3 and its spin waves. MagNet is trained on DFT calculation results of monolayer CrI3 and used to study spin waves. (b) Magnon dispersion of monolayer CrI3 predicted by MagNet via neural-network automatic differentiation and further checked by DFT finite-difference calculations. (c) Generalization ability of MagNet. MagNet learns from DFT results of monolayer CrI3 and generalizes to study CrI3 nanotubes. (d) Energy difference (ΔE) between the two possible magnetic configurations displayed in the insets as a function of nanotube curvature \(1/R\)

Strain gradients can significantly affect the magnetism in curved structures [46, 47], and \(\mathrm{CrI}_{3}\) nanotubes have attracted considerable interest for the study of curved magnetism [48]. First-principles calculations, however, are limited by the large structures and diverse magnetic configurations involved. Modeling \(E(\{\mathcal{R}\},\{\mathcal{M}\})\) is a challenging task for neural networks when both \(\{\mathcal{R}\}\) and \(\{\mathcal{M}\}\) vary simultaneously. We prepared DFT datasets by calculating flat sheets of monolayer CrI3 featuring randomly perturbed atomic and magnetic configurations, and applied the trained neural-network model of MagNet to investigate CrI3 nanotubes [Fig. 3(c)]. We consider the energies of two possible magnetic configurations: one is a non-collinear magnet, with the magnetic moments aligned along the radial direction, and the other is a ferromagnet, as displayed in the insets of Fig. 3(d). The (10, 10), (12, 12), (14, 14), and (16, 16) nanotubes of CrI3 are used to investigate size effects. The MAE of the total energy predicted by MagNet reaches as low as \(0.129\,\text{meV/atom}\). The energy differences between the two magnetic configurations as a function of nanotube curvature are predicted by both DFT and MagNet. As shown in Fig. 3(d), the crossover from ferromagnet to non-collinear magnet with increasing nanotube radius is well captured by MagNet, as checked against the DFT benchmark data, demonstrating the good generalization ability of MagNet.
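As a usage illustration of the nanotube study, the sketch below compares the two magnetic configurations of Fig. 3(d) with a trained model. The `predict_energy` interface, the assumption that the tube axis lies along z, and the moment magnitude are all hypothetical; only the magnetic (Cr) sites carry nonzero moments, consistent with the Methods section.

```python
# Hedged sketch: energy difference between the radial (non-collinear) and
# ferromagnetic configurations of a nanotube, given some trained energy model.
import numpy as np

def radial_moments(cr_positions, mu=3.0):
    """Moments pointing outward along the tube radius (tube axis assumed along z)."""
    xy = cr_positions[:, :2]
    xy = xy / np.linalg.norm(xy, axis=1, keepdims=True)
    return mu * np.hstack([xy, np.zeros((len(xy), 1))])

def ferromagnetic_moments(cr_positions, mu=3.0):
    """All moments aligned along the tube axis."""
    m = np.zeros_like(cr_positions)
    m[:, 2] = mu
    return m

def energy_difference(predict_energy, cr_positions):
    """Delta E = E(radial) - E(ferromagnetic), cf. Fig. 3(d)."""
    return (predict_energy(cr_positions, radial_moments(cr_positions))
            - predict_energy(cr_positions, ferromagnetic_moments(cr_positions)))
```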

Finally, we turned to a more challenging study of twisted bilayer CrI3, which has been reported to exhibit abundant non-collinear magnetic textures both theoretically and experimentally [49–52]. The Landau–Lifshitz–Gilbert equation [53] was applied to update magnetic moment configurations according to the predicted magnetic forces:

$$ \frac{d m_{i}}{d t}=\gamma m_{i} \times \frac{\partial {E}}{\partial m_{i}}+\gamma \alpha m_{i} \times \biggl(m_{i} \times \frac{\partial {E}}{\partial m_{i}} \biggr), $$
(9)

where γ is the electron gyromagnetic ratio and α is a phenomenological damping parameter. More specifically, the new magnetic moment orientations could be efficiently updated with the dissipative term proposed in Ref. [54]:

$$ \hat{m_{i}}^{\prime}=\hat{m_{i}}+\lambda \hat{m_{i}}\times \biggl( \hat{m_{i}} \times \frac{\partial E}{\partial \hat{m_{i}}} \biggr), $$
(10)

where λ represents the step size, and the magnitude of the magnetic moment is normalized after each update step.
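A minimal sketch of the damped update loop of Eq. (10) is given below. It assumes a callable returning the magnetic forces \(-\partial E/\partial \hat{m}\) predicted by the trained model; the step size, tolerance, and maximum number of steps are illustrative.

```python
# Hedged sketch of the dissipative spin update of Eq. (10), with renormalization
# of the moment directions after every step.
import numpy as np

def relax_spins(magnetic_forces, m_hat, lam=0.1, tol=1e-6, max_steps=1000):
    """Iterate m' = m + lam * m x (m x dE/dm) until the update is negligible."""
    m_hat = m_hat / np.linalg.norm(m_hat, axis=1, keepdims=True)
    for _ in range(max_steps):
        dE_dm = -magnetic_forces(m_hat)                        # dE/dm from predicted forces
        m_new = m_hat + lam * np.cross(m_hat, np.cross(m_hat, dE_dm))
        m_new /= np.linalg.norm(m_new, axis=1, keepdims=True)  # keep |m| fixed
        if np.max(np.linalg.norm(m_new - m_hat, axis=1)) < tol:
            return m_new
        m_hat = m_new
    return m_hat
```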

As shown in Fig. 4(a), the non-twisted bilayer CrI3 datasets were used to train MagNet. Our simulations of twisted bilayer CrI3 were carried out on a supercell comprising 4,326 atoms with a twist angle of \(\theta = 63.48\)°, which was predicted to host non-collinear magnetic configurations [49, 50]. Using the skyrmion state [Fig. 4(b)] predicted by Ref. [49] as the initial magnetic configuration, we performed the spin dynamics simulation according to Eq. (10) with the magnetic forces predicted by MagNet. Converging within a few hundred steps, the skyrmion state transitions to a more stable magnetic configuration, in which the out-of-plane components are positive and the in-plane components are in opposite directions between the top and bottom layers [Fig. 4(b)]. Furthermore, we applied the extended deep-learning DFT Hamiltonian method (named xDeepH) [11] to predict the electronic structure of the relaxed magnetic configuration. As shown in Fig. 4(c), the valence bands near the Fermi level become flatter after performing the spin dynamics. The isolated flat bands could be useful for exploring correlated electronic and magnetic physics. This work demonstrates that the magnetic and electronic structures of magnetic superstructures can be predicted by deep learning methods.

Figure 4

Example applications of MagNet in studying moiré-twisted materials. (a) Generalization ability of MagNet. MagNet learns from DFT results of non-twisted bilayer CrI3 and generalizes to study moiré-twisted bilayer CrI3 with varying twist angles. (b) Initial magnetic configurations (adapted from Ref. [11]) and relaxed magnetic configurations of moiré-twisted bilayer CrI3 with a twist angle of 63.48° (4,336 atoms per supercell) as predicted by MagNet. The magnetic moments of the top CrI3 layer are represented by colored arrows, with the in-plane components denoted by the arrow length and the out-of-plane components denoted by the color. For the magnetic moments of the bottom CrI3 layer, the out-of-plane components are the same as in the top layer, whereas the in-plane components are opposite. (c) Band structure of the moiré-twisted bilayer CrI3 (displayed in (b)) predicted by the xDeepH method [11]

4 Conclusions

In summary, we have proposed MagNet, a general neural-network framework that represents the DFT total energy, atomic forces, and magnetic forces as functions of atomic and magnetic structures. MagNet incorporates the E(3) group symmetry, which significantly reduces the training complexity and the amount of training data required. The high accuracy and exceptional generalization ability of the method are demonstrated by investigating various kinds of magnets formed by CrI3. This approach creates opportunities for exploring novel magnetism and spin dynamics in magnetic structures at large length/time scales.