1 Introduction

Machine learning is currently revolutionizing many technological areas of modern society, ranging from image/speech recognition to content filtering on social networks and self-driving cars [1, 2]. Recently, its tools and techniques have been adopted to tackle intricate quantum many-body problems [3–14], where the exponential scaling of the Hilbert space dimension poses a notorious challenge. In particular, a number of supervised and unsupervised learning methods have made remarkable strides in identifying phases and phase transitions in various systems [13–32]. The integration of deep neural networks with prior knowledge about physical systems has enabled a more nuanced and accurate characterization of different phases in complex materials. These methodologies have proven particularly effective in handling large datasets and extracting intricate patterns that may be challenging for traditional analytical methods. Moreover, the application of machine learning in this domain has not only enhanced the speed of phase identification but has also contributed to a deeper understanding of the underlying mechanisms governing phase transitions [13, 16, 17, 20, 22, 23, 25]. Following these approaches, notable proof-of-principle experiments on different platforms [33–36], including electron spins in diamond nitrogen-vacancy (NV) centers [33], doped CuO2 [36], and cold atoms in optical lattices [34, 35], have subsequently been carried out, showing the great potential of machine learning approaches over traditional means.

An important question of both theoretical and experimental relevance concerns the reliability of such machine-learning approaches to condensed matter physics: are these approaches robust to adversarial perturbations, which are deliberately crafted in a way intended to fool the classifiers? In the realm of adversarial machine learning [37–43], it has been shown that machine learning models can be surprisingly vulnerable to adversarial perturbations if the dimension of the data is high enough: one can often synthesize small, imperceptible perturbations of the input data that cause the model to make highly confident but erroneous predictions. A prominent adversarial example that clearly manifests this vulnerability of classifiers based on deep neural networks was first observed by Szegedy et al. [44], where adding a small adversarial perturbation, although unnoticeable to human eyes, causes the classifier to miscategorize a panda as a gibbon with confidence larger than 99%. In this paper, we investigate the vulnerability of machine learning approaches in the context of classifying different phases of matter, with a focus on supervised learning based on deep neural networks (see Fig. 1 for an illustration).

Figure 1

A schematic illustration of the vulnerability of machine learning phases of matter. For a clean image, such as the time-of-flight image obtained in a recent cold-atom experiment [34], a trained neural network (i.e., the classifier) can successfully predict its corresponding Chern number with nearly unit accuracy. However, if we add a tiny adversarial perturbation (which is imperceptible to human eyes) to the original image, the same classifier will misclassify the resulting image into an incorrect category with nearly unit confidence

We find that typical phase classifiers based on deep neural networks are likewise extremely vulnerable to adversarial perturbations. This is demonstrated through two concrete examples, which cover different phases of matter (including both symmetry-breaking and symmetry-protected topological phases) and different strategies to obtain the adversarial perturbations. To better understand why these adversarial examples can fool the classifier in the physics context, we open up the neural network and use an idea borrowed from the machine learning community, called the activation map [45, 46], to study how the classifier infers different phases of matter. Further, we show that a defense strategy based on adversarial training improves the classifiers’ ability to resist the corresponding perturbations and helps them better capture the underlying physical principles. Our results shed light on the fledgling field of machine-learning applications in condensed matter physics, which may provide an important paradigm for future theoretical and experimental studies as the field matures.

To begin with, we introduce the main ideas of adversarial machine learning, which involve the generation of and defense against adversarial examples [37–40]. Adversarial examples are instances with small, intentionally crafted perturbations that cause the classifier to make incorrect predictions. In the supervised learning scenario, we have a labeled training data set \(\mathcal{D}_{n}=\{(\boldsymbol{x}^{(1)},y^{(1)}),\ldots ,( \boldsymbol{x}^{(n)},y^{(n)})\}\), a classifier \(h(\cdot ;\theta )\), and a loss function L that evaluates the classifier’s performance. The task of generating adversarial examples then reduces to an optimization problem: searching for a bounded perturbation \(\delta \in \Delta \) that maximizes the loss function (see Sec. I in the Additional file 1):

$$\begin{aligned} \max_{\delta \in \Delta} L\bigl(h\bigl(\boldsymbol{x}^{(i)}+\delta ; \theta \bigr),y^{(i)}\bigr). \end{aligned}$$
(1)
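As a concrete illustration of solving Eq. (1), the following minimal PyTorch sketch performs projected gradient ascent on the loss within an \(\ell _{\infty}\) ball of radius ε around a legitimate input. The classifier `model`, the cross-entropy loss, and the hyperparameters `eps`, `alpha`, and `steps` are illustrative assumptions, not the specific settings used in our experiments.

```python
import torch
import torch.nn as nn

def pgd_attack(model, x, y, eps=0.1, alpha=0.01, steps=40):
    """Maximize the loss over perturbations with ||delta||_inf <= eps,
    as in Eq. (1), by iterated signed-gradient ascent with projection."""
    loss_fn = nn.CrossEntropyLoss()
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = loss_fn(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # ascend the loss, then project back into the eps-ball
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return (x + delta).detach()
```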

In the machine learning literature, a number of methods have been proposed to solve the above optimization problem, along with corresponding defense strategies [41, 47–50]. We employ some of these methods and one general defense strategy, adversarial training, on two concrete examples: one concerns the conventional paramagnetic/ferromagnetic phases of the two-dimensional classical Ising model [14, 16, 17]; the other involves topological phases, with experimental raw data generated by a solid-state quantum simulator [33].

2 Results

2.1 The ferromagnetic Ising model

The first example we consider involves the ferromagnetic Ising model defined on a 2D square lattice: \(H_{\text{Ising}} = -J\sum_{\langle ij\rangle}\sigma _{i}^{z}\sigma _{j}^{z}\), where the Ising variables \(\sigma _{i}^{z}=\pm 1\) and the coupling strength \(J\equiv 1\) is set to be the energy unit. This model features a well-understood phase transition at the critical temperature \(T_{c}=2/\ln (1+\sqrt{2})\approx 2.269\) [51], between a high-temperature paramagnetic phase and a low-temperature ferromagnetic phase. In the context of machine learning phases of matter, several pioneering approaches, including those based on supervised learning [16], unsupervised learning [14], or a confusion scheme combining both [17], have been introduced to classify the ferromagnetic/paramagnetic phases hosted by the above 2D Ising model. In particular, Carrasquilla and Melko first explored a supervised learning scheme based on a fully connected feed-forward neural network [16]. They used equilibrium spin configurations sampled from Monte Carlo simulations to train the network and demonstrated that, after training, it can correctly classify new samples with notably high accuracy. Moreover, by scanning the temperature, the network can also locate the transition temperature \(T_{c}\) and extract the critical exponents that are crucial in the study of phase transitions.
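For concreteness, here is a minimal sketch of how such training data can be generated: single-spin-flip Metropolis sampling of the 2D Ising model at temperature T. The lattice size (chosen as 30, matching the 900-site lattice discussed below) and the sweep count are illustrative choices, not necessarily those of [16].

```python
import numpy as np

def metropolis_sample(L=30, T=2.0, n_sweeps=500, rng=None):
    """Sample an L x L Ising configuration at temperature T (J = 1)
    via single-spin-flip Metropolis updates with periodic boundaries."""
    if rng is None:
        rng = np.random.default_rng()
    spins = rng.choice([-1, 1], size=(L, L))
    for _ in range(n_sweeps * L * L):
        i, j = rng.integers(L), rng.integers(L)
        # energy cost of flipping spin (i, j)
        nn_sum = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                  + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2 * spins[i, j] * nn_sum
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1
    return spins
```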

To study the robustness of these machine learning approaches to adversarial perturbations, we first train a powerful classifier whose performance is comparable to the ones shown in [16]. After training, the network can successfully classify data from the test set with a high accuracy, larger than 97%. Then we try to obtain adversarial perturbations to attack this seemingly ideal classifier. It is natural to consider a discrete attack in this scenario, since a spin configuration of the Ising model can only be changed discretely, via spin flips. We apply the differential evolution algorithm (DEA) [52] to the Monte Carlo sampled spin configurations and obtain the corresponding adversarial perturbations. A concrete example found by DEA is illustrated in Fig. 2(a-b). Initially, the legitimate example shown in (a) is in the ferromagnetic phase, with magnetization \(M=|\sum_{i=1}^{N}\sigma _{i}|/N=0.791\), and the classifier assigns it to the correct phase with confidence 72%. DEA obtains the adversarial example shown in (b) by flipping only a single spin, marked by the red circle. This new spin configuration has almost the same magnetization, \(M=0.789\), and should still belong to the ferromagnetic phase, but the classifier misclassifies it into the paramagnetic phase with confidence 60%. If we regard \(H_{\text{Ising}}\) as a quantum Hamiltonian and allow the input data to be continuously modified, one can also consider a continuous attack scenario and obtain various adversarial examples, as shown in the Additional file 1.
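Since a single-flip budget on a 30 × 30 lattice spans only 900 candidates, the effect of the DEA search can also be mimicked by exhaustive enumeration. The sketch below assumes a PyTorch classifier `model` acting on flattened ±1 configurations, with `true_label` the index of the correct phase; it is a simplified stand-in, not the DEA procedure of [52].

```python
import torch

def best_single_flip(model, spins, true_label):
    """Exhaustively test all single spin flips and return the one that
    most reduces the classifier's confidence in the true phase."""
    L = spins.shape[0]
    best_conf, best_site = 1.0, None
    for i in range(L):
        for j in range(L):
            trial = spins.copy()
            trial[i, j] *= -1
            x = torch.tensor(trial.ravel(), dtype=torch.float32).unsqueeze(0)
            with torch.no_grad():
                conf = torch.softmax(model(x), dim=-1)[0, true_label].item()
            if conf < best_conf:
                best_conf, best_site = conf, (i, j)
    return best_site, best_conf
```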

Figure 2

(a) A legitimate sample of a spin configuration in the ferromagnetic phase with \(M=|\sum_{i=1}^{N}\sigma _{i}|/N=0.791\). (b) An adversarial example obtained by the differential evolution algorithm (DEA), which differs from the original legitimate one only by a single spin flip (red circle). (c) The activation map (AM) of the original classifier. Spins at positions with darker colors contribute more to the confidence of being in the ferromagnetic phase. (d) The activation map of the classifier after adversarial training. The map becomes much flatter and the variance of the activation values across positions becomes much smaller

To understand why this adversarial example, crafted with a tiny change, leads to misclassification, we dissect the classifier by estimating each position’s importance to the final prediction, which we call the activation map of the classifier (see Sec. II in the Additional file 1). In Fig. 2(c) we depict the activation map for the ferromagnetic phase. It is evident that the classifier makes its prediction mainly based on positions with large activation values (dark colors). The position where the adversarial spin flip happens in Fig. 2(b) has an activation value of 3.28, the fourth largest among all 900 positions. We then enumerate all positions with activation values larger than 2.6 and find that a single spin flip at any of them, which changes that position’s contribution to the ferromagnetic confidence from positive to negative, leads to misclassification. The activation map is nonuniform across positions, which contradicts the physical fact that each spin contributes equally to the order parameter M. This explains why the classifier is vulnerable to these particular spin flips. We remark that in the traditional machine learning realm of classifying daily-life images (such as images of cats and dogs), such an explanation is unattainable due to the absence of a sharply defined “order parameter”.
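The precise construction of the activation map is given in Sec. II of the Additional file 1. As a hedged illustration, one common gradient-based proxy assigns each input spin the average sensitivity of the chosen class logit to that spin; the flattened-input assumption, the 30 × 30 lattice shape, and the class index are illustrative.

```python
import torch

def saliency_map(model, samples, class_idx=0, lattice=(30, 30)):
    """Gradient-based proxy for the activation map: mean sensitivity
    of the class logit with respect to each input spin."""
    samples = samples.clone().detach().requires_grad_(True)
    logits = model(samples)                  # shape (batch, n_classes)
    logits[:, class_idx].sum().backward()
    return samples.grad.mean(dim=0).reshape(lattice)
```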

2.2 Topological phases of matter

Unlike conventional phases (such as the paramagnetic/ferromagnetic phases discussed above), topological phases do not fit into the paradigm of symmetry breaking [53] and are described by nonlocal topological invariants [54, 55], rather than local order parameters. This makes topological phases harder to learn in general. Notably, a number of different approaches, based on either supervised [15, 23, 56] or unsupervised [22, 57–67] learning paradigms, have been proposed recently, and some of them have been demonstrated in proof-of-principle experiments [33, 34].

Obtaining adversarial examples is more challenging here, since topological invariants capture global properties of the system and are insensitive to local perturbations. We consider a three-band model for 3D chiral topological insulators (TIs) [68, 69]: \(H_{\text{TI}} = \sum_{\boldsymbol{k}\in \text{BZ}}\Psi _{\boldsymbol{k}}^{\dagger}H_{\boldsymbol{k}}\Psi _{\boldsymbol{k}}\), where \(\Psi _{\boldsymbol{k}}^{\dagger}=(c_{\boldsymbol{k},1}^{\dagger},c_{\boldsymbol{k},0}^{\dagger},c_{\boldsymbol{k},-1}^{\dagger})\) with \(c_{\boldsymbol{k},\mu}^{\dagger}\) the fermion creation operator at momentum \(\boldsymbol{k}=(k_{x},k_{y},k_{z})\) in the orbital (spin) state \(\mu =-1,0,1\), and the summation runs over the Brillouin zone (BZ); \(H_{\boldsymbol{k}}=\lambda _{1}\sin k_{x}+\lambda _{2}\sin k_{y}+\lambda _{6}\sin k_{z}-\lambda _{7}(\cos k_{x}+\cos k_{y}+\cos k_{z}+h)\) denotes the single-particle Hamiltonian, with \(\lambda _{1,2,6,7}\) being four traceless Gell-Mann matrices [68]. The topological properties of each band can be characterized by a topological invariant \(\chi ^{(\eta )}\), where \(\eta = l,m,u\) denotes the lower, middle, and upper bands, respectively. \(\chi ^{(\eta )}\) can be written as an integral in the 3D momentum space: \(\chi ^{(\eta )} = \frac{1}{4\pi ^{2}} \int _{\text{BZ}}\epsilon ^{\mu \nu \tau}A_{\mu}^{(\eta )} \partial _{k^{\nu}}A_{\tau}^{(\eta )}\,d^{3} \mathbf{k}\), where \(\epsilon ^{\mu \nu \tau}\) is the Levi-Civita symbol with \(\mu ,\nu ,\tau \in \{x,y,z\}\), and the Berry connection is \(A_{\mu}^{(\eta )}=\langle \psi _{\mathbf{k}}^{(\eta )}|\partial _{k^{\mu}}|\psi _{\mathbf{k}}^{(\eta )}\rangle \) with \(|\psi _{\mathbf{k}}^{(\eta )}\rangle \) denoting the Bloch state of the η band. For this three-band chiral topological insulator model, the topological invariants of the bands are related as \(\chi ^{(u)} = \chi ^{(l)} = \chi ^{(m)}/4\), and one obtains \(\chi ^{(m)}=0,1,\text{and }-2\) for \(|h|>3\), \(1<|h|<3\), and \(|h|<1\), respectively. Recently, an experiment simulating \(H_{\text{TI}}\) with the electron spins of an NV center has been carried out, and a demonstration of the supervised learning approach to topological phases has been reported [33]. Using the measured density matrices in the momentum space (obtained through quantum state tomography) as input data, a trained 3D convolutional neural network (CNN) can correctly identify distinct topological phases with exceptionally high success probability, even when a large portion of the experimentally generated raw data is dropped out or inaccessible. Here, we show that this approach is highly vulnerable to adversarial perturbations.
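To make the input data concrete, the sketch below builds \(H_{\boldsymbol{k}}\) from the four Gell-Mann matrices and returns the middle-band density matrix \(|\psi _{\mathbf{k}}^{(m)}\rangle \langle \psi _{\mathbf{k}}^{(m)}|\), the momentum-space object that (after discretizing the BZ) forms the classifier’s input. The helper name and the choice of the middle band are ours, for illustration.

```python
import numpy as np

# The four traceless Gell-Mann matrices entering H_k
l1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=complex)
l2 = np.array([[0, -1j, 0], [1j, 0, 0], [0, 0, 0]], dtype=complex)
l6 = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=complex)
l7 = np.array([[0, 0, 0], [0, 0, -1j], [0, 1j, 0]], dtype=complex)

def middle_band_density_matrix(kx, ky, kz, h):
    """Diagonalize H_k and return the density matrix of the middle band,
    the momentum-space input datum for the 3D CNN classifier."""
    Hk = (l1 * np.sin(kx) + l2 * np.sin(ky) + l6 * np.sin(kz)
          - l7 * (np.cos(kx) + np.cos(ky) + np.cos(kz) + h))
    vals, vecs = np.linalg.eigh(Hk)      # bands in ascending order
    psi_m = vecs[:, 1]                   # middle band, eta = m
    return np.outer(psi_m, psi_m.conj())
```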

We first train a 3D CNN with numerically simulated data. The training curve is shown in Fig. 3(a), and the accuracy on validation data saturates at a high value (\(\approx 99\%\)). After the training, we fix the model parameters of the CNN and utilize the Fast Gradient Sign Method (FGSM) [49], projected gradient descent (PGD) [49], and the Momentum Iterative Method (MIM) [50] to generate adversarial perturbations [70]. Figure 3(b) shows the confidence probabilities for the classification of a sample with \(\chi ^{(m)}=0\) as functions of the number of MIM iterations. As the figure shows, \(P(\chi ^{(m)}=0)\) decreases rapidly as the iteration number increases and converges to a small value (\(\approx 2\%\)) after about eight iterations. Meanwhile, \(P(\chi ^{(m)}=1)\) increases rapidly and converges to a large value (\(\approx 98\%\)), indicating a misclassification: after about eight iterations, the sample originally from the category \(\chi ^{(m)}=0\) is misclassified into the category \(\chi ^{(m)}=1\) with a confidence level of \(\approx 98\%\). We note that a direct calculation of the topological invariant through integration confirms that \(\chi ^{(m)}=0\) for this adversarial example, indicating that the tiny perturbation would not affect the traditional methods.
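A minimal sketch of the MIM update [50], under the same assumptions as the PGD sketch above (hyperparameters illustrative): the loss gradients are L1-normalized and accumulated into a momentum term before each signed step.

```python
import torch
import torch.nn as nn

def mim_attack(model, x, y, eps=0.05, steps=10, mu=1.0):
    """Momentum Iterative Method: accumulate normalized loss gradients
    into a velocity g, then take sign steps of size eps/steps."""
    loss_fn = nn.CrossEntropyLoss()
    alpha = eps / steps
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        g = mu * g + grad / grad.abs().sum().clamp(min=1e-12)
        x_adv = (x_adv + alpha * g.sign()).detach()
    return x_adv
```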

Figure 3

(a) The average accuracy and loss of the 3D convolutional neural network trained to classify the topological phases. (b) We use the momentum iterative method to obtain the adversarial examples. This plot shows the classification probabilities as a function of the iteration number. After around two iterations, the network begins to misclassify the samples. (c-f) The activation maps (AM) of the sixth kernel in the first convolutional layer under different settings. (c) The average AM over all test-set samples with \(\chi ^{(m)}=0\). (d) The average AM over samples with \(\chi ^{(m)}=1\). (e) The AM obtained by feeding the legitimate sample to the topological-phase classifier and (f) the AM obtained by feeding it the adversarial example

It is more challenging to figure out why the topological-phase classifier is so vulnerable to the adversarial perturbation. Since the convolutional kernels are applied repeatedly across different spatial windows, we cannot directly compute the importance of each location as we did for the Ising classifier. Instead, we study the activation maps of all convolutional kernels in the first convolutional layer. We find that the sixth kernel exhibits completely different activation patterns for the topologically trivial and nontrivial phases, acting as a strong indicator by which the classifier distinguishes these phases (see Sec. II in the Additional file 1). Specifically, the activation patterns for the \(\chi ^{(m)}=0,1\) phases are illustrated in Fig. 3(c-d). We then compare the sixth kernel’s activation maps for the legitimate sample and the adversarial example. As shown in Fig. 3(e-f), the tiny adversarial perturbation makes the activation map much more correlated with the \(\chi ^{(m)}=1\) ones, which gives the classifier high confidence to assert that the adversarial example belongs to the \(\chi ^{(m)}=1\) phase. This explains why adversarial examples can deceive the classifier.
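One simple way to quantify the visual comparison in Fig. 3(e-f) is the Pearson correlation between a sample’s kernel activation map and the class-averaged maps; the helper below is a hypothetical sketch of that comparison, not the exact statistic used in our analysis.

```python
import numpy as np

def map_correlation(sample_map, class_avg_map):
    """Pearson correlation between a sample's kernel activation map
    and a class-averaged activation map (both flattened to 1D)."""
    a, b = np.ravel(sample_map), np.ravel(class_avg_map)
    return float(np.corrcoef(a, b)[0, 1])
```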

The above two examples clearly demonstrate the vulnerability of machine learning approaches to classifying different phases of matter. We mention that, although we have focused on these two examples only, the existence of adversarial perturbations is ubiquitous in learning various phases (independent of the learning model and input data type), and the methods used in the above examples can also be used to generate the desired adversarial perturbations for other phases. From a more theoretical computer science perspective, the vulnerability of the phase classifiers can be understood as a consequence of the strong “No Free Lunch” theorem: there exists an intrinsic tension between adversarial robustness and generalization accuracy [71–73]. The data distributions in the scenarios of learning phases of matter typically satisfy the \(W_{2}\) Talagrand transportation-cost inequality, so a phase classifier can be adversarially deceived with high probability [74].

2.3 Adversarial training

In adversarial machine learning, a number of countermeasures against adversarial examples have been developed [75, 76]. Adversarial training, whose essential idea is to first generate a substantial number of adversarial examples and then retrain the classifier on both the original and the crafted data, is one such countermeasure for making classifiers more robust. Here, in order to study how it works for machine learning phases of matter, we apply adversarial training to the 3D CNN classifier used in classifying topological phases. Partial results are plotted in Fig. 4(a-b). While the classifier’s performance on legitimate examples remains intact, the test accuracy on adversarial examples increases significantly (by the end of the adversarial training, it increases from about 60% to 98%). This result indicates that the retrained classifier is immune to the adversarial examples generated by the corresponding attacks.
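Schematically, one retraining step on a mixed batch might look as follows; this is a sketch assuming an `attack` function such as the PGD or MIM sketches above and a standard PyTorch optimizer, not our exact training loop.

```python
import torch

def adversarial_training_step(model, optimizer, loss_fn, attack, x, y):
    """Retrain on a batch containing both legitimate examples and
    adversarial examples generated on the fly against the current model."""
    x_adv = attack(model, x, y)
    x_mix = torch.cat([x, x_adv])
    y_mix = torch.cat([y, y])
    optimizer.zero_grad()
    loss = loss_fn(model(x_mix), y_mix)
    loss.backward()
    optimizer.step()
    return loss.item()
```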

Figure 4

The effectiveness of the adversarial training. (a) We first numerically generate adequate adversarial examples with the FGSM, and then retrain the CNN with both the legitimate and adversarial data. (b) Similar adversarial training for the defense against the PGD attack. (c) The original classifier’s output representing the confidence of being ferromagnetic. “Configurations” denotes different spin configurations with the same magnetization M. The corresponding temperature (listed in parentheses) for each M is calculated from Onsager’s formula [51]. The original classifier identifies the transition point at \(T=2.255\). (d) The refined classifier’s output after adversarial training against the BIM attack. The identified transition point moves to \(T=2.268\), closer to \(T_{c}=2.269\)

We also study how adversarial training can make the classifiers grasp physical quantities more thoroughly. As shown in Fig. 2(d), the activation map of the Ising model classifier becomes much flatter after adversarial training (the standard deviation across positions is reduced from 0.88 to 0.20), which indicates that after adversarial training the output of the classifier is more consistent with the physical order parameter, the magnetization, to which each spin contributes equally, and hence more robust to adversarial perturbations. This is also reflected in the fact that after adversarial training the classifier identifies the phase transition point more accurately, as shown in Fig. 4(c-d). Adversarial training likewise helps the topological-phase classifier make better inferences, as seen from its activation maps (see Sec. III in the Additional file 1).
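For reference, a transition point as quoted in Fig. 4(c-d) can be read off as the temperature at which the classifier’s mean ferromagnetic confidence crosses 1/2. A minimal sketch, assuming the scan starts inside the ferromagnetic phase and the confidence decreases monotonically across the crossing:

```python
import numpy as np

def transition_point(temps, mean_conf):
    """Estimate T_c as the temperature where the mean ferromagnetic
    confidence first drops below 1/2, with linear interpolation."""
    temps, mean_conf = np.asarray(temps), np.asarray(mean_conf)
    idx = int(np.argmax(mean_conf < 0.5))   # first sub-1/2 temperature
    t0, t1 = temps[idx - 1], temps[idx]
    p0, p1 = mean_conf[idx - 1], mean_conf[idx]
    return t0 + (0.5 - p0) * (t1 - t0) / (p1 - p0)
```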

3 Discussion and conclusion

We mention that the adversarial training method is effective only against adversarial examples crafted on the original classifier. The defense may not work for black-box attacks [77, 78], where an adversary generates malicious examples on a locally trained substitute model. To deal with such transferred black-box attacks, one may explore the recently proposed ensemble adversarial training method, which retrains the classifier with adversarial examples generated from multiple sources [79]. In the future, it would be interesting and desirable to find other defense strategies that strengthen the robustness of phase classifiers against adversarial perturbations. In addition, an experimental demonstration of adversarial learning of phases of matter, together with defense strategies, would be an important step towards reliable practical applications of machine learning in physics.

In summary, we have studied the robustness of machine learning approaches to classifying different phases of matter. Our discussion has mainly focused on supervised learning based on deep neural networks, but generalizations to other types of learning models (such as unsupervised learning or reinforcement learning) and other types of phases are possible. Through two concrete examples, we have demonstrated explicitly that typical phase classifiers based on deep neural networks are extremely vulnerable to tiny adversarial perturbations. We have studied the explainability of adversarial examples and demonstrated that adversarial training significantly improves the robustness of phase classifiers by helping the model learn the underlying physical principles and symmetries. Our results reveal a novel vulnerability of the growing field of machine learning phases of matter, which would benefit future studies across condensed matter physics, machine learning, and artificial intelligence.