Robustness Verification of Quantum Classifiers

Several important models of machine learning algorithms have been successfully generalized to the quantum world, with potential speedup to training classical classifiers and applications to data analytics in quantum physics that can be implemented on near-future quantum computers. However, quantum noise is a major obstacle to the practical implementation of quantum machine learning. In this work, we define a formal framework for the robustness verification and analysis of quantum machine learning algorithms against noise. A robust bound is derived and an algorithm is developed to check whether or not a quantum machine learning algorithm is robust with respect to quantum training data. In particular, this algorithm can find adversarial examples during checking. Our approach is implemented on Google's TensorFlow Quantum and can verify the robustness of quantum machine learning algorithms with respect to a small disturbance of noise derived from the surrounding environment. The effectiveness of our robust bound and algorithm is confirmed by experimental results, including quantum bits classification as the "Hello World" example, quantum phase recognition and cluster excitation detection from real-world intractable physical problems, and the classification of MNIST from the classical world.


arXiv:2008.07230v2 [quant-ph] 31 May 2021

Introduction
In the last few years, the successful interplay between machine learning and quantum physics has shed new light on both fields. On the one hand, machine learning has developed dramatically over the past two decades to satisfy the needs of industry. At the same time, many challenging quantum physical problems have been solved by automated learning. Notably, previously inaccessible quantum many-body problems have been solved by neural networks, one instance of machine learning [1]. On the other hand, quantum computing, the new model of computation based on quantum mechanics, has been proved to speed up (even exponentially) classical algorithms for some important problems [2]. This motivates the development of quantum machine learning and provides the possibility of raising the existing computational power of machine learning to a new level (see the review papers [3,4] for details). Subsequently, quantum machine learning was applied to real-world problems in quantum physics. One essential example is that quantum convolutional neural networks, inspired by machine learning, were proposed to implement quantum phase recognition [5]. Quantum phase recognition asks whether a given input quantum state belongs to a particular quantum phase of matter. At the same time, more provable advantages of quantum machine learning over its classical counterpart have been reported. For instance, the training complexity of quantum models has an exponential improvement on certain tasks [6]. Moving into industry, Google recently built TensorFlow Quantum, a framework for the design and training of quantum machine learning models within its well-known classical machine learning platform TensorFlow [7].
Even though quantum machine learning outperforms its classical counterpart in some ways, the difficulties of the classical world are expected to be encountered in the quantum case. Classical machine learning has been found to be vulnerable to intentionally crafted adversarial examples (e.g. [8,9]). Adversarial examples are inputs to a machine learning algorithm that an attacker has crafted to cause the algorithm to make a mistake. One essential mission of machine learning is to prove the absence of adversarial examples, or to detect them for use in the defense strategy of adversarial training [10]: appending adversarial examples to the training dataset and retraining the machine learning algorithm to be robust to these examples. However, this goal is not easily achieved [11]. The machine learning community has developed several interesting ideas for designing specific attack algorithms (e.g. [12,10]) to generate adversarial examples, which is far from measuring the robustness against any adversary. Recently, the formal methods community has taken initial steps in this direction [13,14,15,16] by verifying the robustness of classical machine learning algorithms in a provable way: either a formal guarantee is given that the algorithm is robust for a given input, or a counter-example (adversarial example) is provided if it is not. Some tools have been developed, such as VerifAI [17] and NNV [18]. This vulnerability is even more common in the quantum world since quantum noise is inevitable in quantum computation, at least in the current NISQ (Noisy Intermediate-Scale Quantum) era, which has led to a series of recent works on the robustness of quantum machine learning against specific noise. For example, Lu et al. [19] studied the robustness to various classical adversarial attacks; Du et al. [20] proved that by adding depolarization noise to quantum circuits for classification, a robust bound against adversaries can be derived; Liu and Wittek [21] gave a robust bound for quantum noise coming from a special unitary group. Very recently, Weber et al. [22] formalized a link between binary quantum hypothesis testing [23] and robust quantum machine learning algorithms for classification tasks.
To the best of our knowledge, existing studies of quantum machine learning robustness only consider the situation of a known noise source. However, a fundamental difference between quantum and classical machine learning is that the quantum attacker is usually the surrounding environment rather than a human as in the classical case, and information about the environment is unknown. To protect against an unknown adversary, we need to derive a robustness guarantee against the worst-case scenario, from which the commonly assumed known noise sources (e.g. depolarization noise [20]) are usually far. Yet in the case of unknown noise, several basic issues are still unsolved:
- In theory, it is unclear how to compute a tight, let alone optimal, bound on the robustness of a given quantum machine learning algorithm.
- In practice, an efficient way to find an adversarial example, which can be used to retrain the algorithm to defend against the noise, is lacking.
Indeed, we do not even know which metric is the better choice for measuring robustness against noise, just as in the classical case against human attackers [24].
In this work, we define a formal framework for the robustness verification and analysis of quantum machine learning algorithms against noise, in which the above problems can be studied in a principled way. More specifically, we choose fidelity as the metric for measuring robustness, as it is one of the most widely used quantities for quantifying the uncertainty of noise in the process of quantum computation and is commonly used in the quantum engineering and experimental communities (e.g. [25,26]). Based on this, an analytical robust bound for any quantum machine learning classification algorithm is obtained, which can be applied to approximately check the robustness of quantum machine learning algorithms. Furthermore, we show that computing the optimal robust bound can be reduced to solving a Semidefinite Programming (SDP) problem. These results lead to an algorithm to exactly and efficiently check whether or not a quantum machine learning algorithm is robust with respect to the training data. A special strength of this algorithm is that it can identify useful new training data (adversarial examples) during checking, and these data can be used to implement adversarial training in the same way as in classical robustness verification. The effectiveness of our robust bound and algorithms is confirmed by the case studies of quantum bits classification as the "Hello World" example of quantum machine learning algorithms, quantum phase recognition and cluster excitation detection from real-world intractable physical problems, and the classification of MNIST from the classical world.
In summary, the main technical contributions of the paper are as follows.
- Computing the optimal robust bound of quantum machine learning classification algorithms is reduced to an SDP (Semidefinite Programming) problem;
- An efficient algorithm to check the robustness of quantum machine learning algorithms and detect adversarial examples is developed;
- The robustness verification algorithm is implemented on Google's TensorFlow Quantum;
- Case studies: checking the robustness of several popular quantum machine learning algorithms for quantum bits classification, cluster excitation detection and the classification of MNIST (which are all implemented in Google's TensorFlow Quantum), and quantum phase recognition.

Quantum Data and Computation Models
For the convenience of the reader, in this section we recall some basic concepts of quantum data (states) and the quantum computation model. The basic data of classical computers are bits, represented by the two digits 0 and 1. In quantum computing, quantum bits (qubits) play the same role. A qubit is expressed by a normalized complex vector

$$|\phi\rangle = \begin{pmatrix} a \\ b \end{pmatrix} = a|0\rangle + b|1\rangle$$

with complex numbers $a$ and $b$ satisfying the normalization condition $|a|^2 + |b|^2 = 1$. Here, $|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ correspond to bits 0 and 1, respectively, and $\{|0\rangle, |1\rangle\}$ is an orthonormal basis of a 2-dimensional Hilbert (linear) space. In general, for a quantum computer consisting of $n$ qubits, a quantum datum is a normalized complex vector $|\psi\rangle$ in a $2^n$-dimensional Hilbert space $\mathcal{H}$. Such a $|\psi\rangle$ is usually called a pure state in the literature of quantum computation. As a model of computation, a quantum circuit consists of a sequence of, say, $m$ quantum logic gates. Each quantum gate can be mathematically represented by a unitary matrix $U_i$, i.e.,

$$U_i^\dagger U_i = U_i U_i^\dagger = I,$$

where $U_i^\dagger$ is the conjugate transpose of $U_i$ and $I$ is the identity matrix on $\mathcal{H}$. Then the circuit is represented by the unitary matrix $U = U_m \cdots U_1$. If the quantum datum $|\psi\rangle$ is input to the circuit, then the output is the quantum datum

$$|\psi'\rangle = U|\psi\rangle.$$

In practice, a quantum datum may not be completely known and can be thought of as a mixed state or ensemble $\{(p_k, |\psi_k\rangle)\}_k$, meaning that it is in state $|\psi_k\rangle$ with probability $p_k$. Mathematically, it can be described by a density operator $\rho$ (a Hermitian positive semidefinite matrix with unit trace) on $\mathcal{H}$:

$$\rho = \sum_k p_k |\psi_k\rangle\langle\psi_k|, \tag{2}$$

where $\langle\psi_k|$ is the conjugate transpose of $|\psi_k\rangle$. In this case, the model of quantum computation becomes a super-operator $\mathcal{E}$, i.e. a mapping from matrices to matrices. It can be written as

$$\rho' = \mathcal{E}(\rho). \tag{3}$$

Here, $\rho$ and $\rho'$ are the input and output data (mixed states) of the quantum computation $\mathcal{E}$, respectively. Not every super-operator $\mathcal{E}$ is physically meaningful. It is required to satisfy the following conditions:
- $\mathcal{E}$ is trace-preserving: $\mathrm{tr}(\mathcal{E}(\rho)) = \mathrm{tr}(\rho)$ for every mixed state $\rho$ on $\mathcal{H}$;
- $\mathcal{E}$ is completely positive: for any Hilbert space $\mathcal{H}'$, the trivially extended operator $\mathrm{id}_{\mathcal{H}'} \otimes \mathcal{E}$ maps density operators on $\mathcal{H}' \otimes \mathcal{H}$ to density operators, where $\otimes$ denotes the tensor product and $\mathrm{id}_{\mathcal{H}'}$ is the identity map on $\mathcal{H}'$.

Such a super-operator $\mathcal{E}$ admits a Kraus matrix form [2]: there exists a set of matrices $\{E_k\}_k$ on $\mathcal{H}$ such that

$$\mathcal{E}(\rho) = \sum_k E_k \rho E_k^\dagger.$$

Here $\{E_k\}_k$ are called the Kraus matrices of $\mathcal{E}$.
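To make the Kraus form concrete, the following minimal sketch applies a super-operator to a single-qubit density matrix in plain Python. The bit-flip noise model and all function names (`apply_channel`, `bit_flip_kraus`, and the small matrix helpers) are illustrative assumptions, not part of the paper.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

def apply_channel(kraus, rho):
    """Return E(rho) = sum_k E_k rho E_k^dagger for a list of Kraus matrices."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for e in kraus:
        term = matmul(matmul(e, rho), dagger(e))
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out

def bit_flip_kraus(p):
    """Kraus matrices sqrt(1-p) I and sqrt(p) X (hypothetical noise model)."""
    s0, s1 = math.sqrt(1 - p), math.sqrt(p)
    return [[[s0, 0], [0, s0]], [[0, s1], [s1, 0]]]

rho = [[1, 0], [0, 0]]                       # pure state |0><0| as a density matrix
rho_out = apply_channel(bit_flip_kraus(0.1), rho)
print([[round(x.real, 6) for x in row] for row in rho_out])
```

The output state is again a density matrix: its trace stays 1, as required by the trace-preserving condition, while part of the population has moved from |0> to |1>.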
The underlying dynamics of quantum computers is governed by quantum mechanics, which applies at the microscopic scale (near or below $10^{-9}$ meters). At this level, we cannot read out quantum data directly as we can with classical data. The only way to extract information from a quantum datum is through a quantum measurement, which is mathematically modeled by a set $\{M_k\}_{k=1}^m$ of matrices on its state (Hilbert) space $\mathcal{H}$ with $\sum_k M_k^\dagger M_k = I$. This observing process is probabilistic: if the system is currently in state $\rho$, then measurement outcome $k$ is obtained with probability

$$p_k = \mathrm{tr}(M_k^\dagger M_k \rho).$$

After the measurement, the system's state collapses (changes) depending on the measurement outcome $k$, which is fundamentally different from classical computation. If the outcome is $k$, the post-measurement state becomes

$$\rho_k' = \frac{M_k \rho M_k^\dagger}{p_k}.$$

This special property makes it hard to accurately estimate the distribution $\{p_k\}_k$ unless sufficiently many copies of $\rho$ are provided.

In summary, quantum data have two different forms, pure states $|\psi\rangle$ and mixed states $\rho$, corresponding to the computation model as a unitary matrix $U$ or a super-operator $\mathcal{E}$, respectively. Not surprisingly, the latter is a generalization of the former, obtained by putting

$$\rho = |\psi\rangle\langle\psi|, \qquad \mathcal{E}(\rho) = U \rho U^\dagger.$$

Because of this, the results obtained for mixed states $\rho$ can also be applied to pure states $|\psi\rangle$. Thus, in this paper, we mainly consider mixed states as the quantum data and super-operators as the model of quantum computation.
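The measurement rule can be sketched in a few lines of plain Python. This is only an illustration for a single qubit measured in the computational basis; the function name `measure` and the chosen input state are assumptions made for the example.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def measure(rho, operators):
    """Return a list of (p_k, post-measurement state) pairs.

    p_k is the trace of M_k rho M_k^dagger, which equals tr(M_k^dagger M_k rho);
    the post-measurement state is M_k rho M_k^dagger / p_k, or None when the
    outcome has (numerically) zero probability.
    """
    results = []
    for m in operators:
        unnormalized = matmul(matmul(m, rho), dagger(m))
        p = sum(unnormalized[i][i] for i in range(len(rho))).real
        post = [[x / p for x in row] for row in unnormalized] if p > 1e-12 else None
        results.append((p, post))
    return results

plus = [[0.5, 0.5], [0.5, 0.5]]              # |+><+|, an equal superposition
M0 = [[1, 0], [0, 0]]                        # projector onto |0>
M1 = [[0, 0], [0, 1]]                        # projector onto |1>
outcomes = measure(plus, [M0, M1])
for k, (p, post) in enumerate(outcomes):
    print(k, round(p, 6), post)
```

Each outcome occurs with probability 1/2 here, and the state collapses to the corresponding basis state, so a single run reveals very little about the original state; this is why many copies are needed to estimate the outcome distribution.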

Quantum Classification Algorithms
In this section, we briefly recall quantum classification algorithms. They are designed for the classification of quantum data. Essentially, they share the same basic ideas as their classical counterparts but deal with quantum data in the quantum computation model.

Basic Definitions
In this paper, we focus on a specific learning model called quantum supervised classification. Given a Hilbert space $\mathcal{H}$, we write $\mathcal{D}(\mathcal{H})$ for the set of all (mixed) quantum states on $\mathcal{H}$ (see its definition in Eq. (2)).
Definition 1. A quantum classification algorithm $\mathcal{A}$ is a mapping $\mathcal{D}(\mathcal{H}) \to \mathcal{C}$, where $\mathcal{C}$ is the set of classes we are interested in.
Following the training strategy of classical machine learning, the classifier $\mathcal{A}$ is learned from a dataset $T$ instead of being pre-defined. This training dataset $T = \{(\rho_i, c_i)\}_{i=1}^N$ consists of $N < \infty$ pairs $(\rho_i, c_i)$, meaning that quantum state $\rho_i$ belongs to class $c_i$. To learn $\mathcal{A}$, we initialize a quantum learning model: a parameterized quantum circuit (including measurement control) $\mathcal{E}_\theta$ and a measurement $\{M_k\}_{k \in \mathcal{C}}$. Mathematically, the circuit can be modelled as a quantum super-operator $\mathcal{E}_\theta$ (see its definition in Eq. (3)), and $\theta$ is a set of free parameters that can be tuned. Then for each $k \in \mathcal{C}$, we can compute the probability of the measurement outcome being $k$:

$$f_k(\theta, \rho) = \mathrm{tr}(M_k^\dagger M_k \mathcal{E}_\theta(\rho)). \tag{6}$$

It is worth noting that, as mentioned before, measuring a quantum state $\rho$ is probabilistic and $\rho$ is changed by the measurement. So, in practice, accurately estimating $f_k(\theta, \rho)$ for all $k \in \mathcal{C}$ requires sufficiently many copies of $\rho$, unlike the classical case, where a single copy of a classical datum usually suffices for training. The quantum classification algorithm $\mathcal{A}$ outputs the class label $c$ for a quantum state $\rho$ using the following condition:

$$\mathcal{A}(\theta, \rho) = \arg\max_{k \in \mathcal{C}} f_k(\theta, \rho). \tag{7}$$

The learning is carried out as $\theta$ is optimized to minimize the empirical risk

$$\min_\theta \frac{1}{N} \sum_{i=1}^{N} L(f(\theta, \rho_i), c_i), \tag{8}$$

where $L$ refers to a predefined loss function, $f(\theta, \rho)$ is a probability vector with each $f_k(\theta, \rho)$, $k \in \mathcal{C}$, as its element, and $c_i$ is also seen as a probability vector with the entry corresponding to $c_i$ being 1 and the others being 0. The goal is to find the optimized parameters $\theta^*$ minimizing the risk in Eq. (8) for the given dataset $T$. Mean-squared error (MSE) is the most popular instance of the empirical risk, i.e., the loss function $L$ is the squared error between $f(\theta, \rho_i)$ and $c_i$. As one can see from the above learning process, the main differences between classical and quantum machine learning algorithms are the learning models and the data.
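The pieces of this learning setup (outcome probabilities, the arg-max decision rule, and the empirical risk) can be sketched for a toy single-qubit model. The model below, a rotation R_y(θ) followed by a computational-basis measurement, is a hypothetical stand-in chosen for illustration, and the squared error is summed over classes without any normalization factor, which is one possible convention rather than the paper's stated one.

```python
import math

def ry(theta):
    """Real rotation matrix R_y(theta), used here as the hypothetical circuit."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def evolve(theta, rho):
    """E_theta(rho) = U rho U^dagger with U = R_y(theta); U is real, so U^dagger = U^T."""
    u = ry(theta)
    ur = [[sum(u[i][k] * rho[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(ur[i][k] * u[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

def f(theta, rho):
    """Outcome probabilities f_k for M_k = |k><k|: the diagonal of E_theta(rho)."""
    out = evolve(theta, rho)
    return [complex(out[k][k]).real for k in range(2)]

def classify(theta, rho):
    """Decision rule: the class with the largest outcome probability."""
    probs = f(theta, rho)
    return max(range(len(probs)), key=lambda k: probs[k])

def empirical_risk(theta, dataset):
    """Average squared error between f(theta, rho_i) and the one-hot label c_i."""
    total = 0.0
    for rho, c in dataset:
        probs = f(theta, rho)
        total += sum((probs[k] - (1.0 if k == c else 0.0)) ** 2
                     for k in range(len(probs)))
    return total / len(dataset)

zero = [[1, 0], [0, 0]]                      # |0><0|, labelled class 0
one = [[0, 0], [0, 1]]                       # |1><1|, labelled class 1
dataset = [(zero, 0), (one, 1)]
print(classify(0.0, zero), empirical_risk(0.0, dataset))
print(classify(math.pi, zero), empirical_risk(math.pi, dataset))
```

With θ = 0 the toy model classifies both states correctly and the risk vanishes; with θ = π the rotation swaps the two basis states, every label is wrong, and the risk is maximal. A training loop would search for the θ that minimizes this risk.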
In this paper, we focus on well-trained quantum classification algorithms $\mathcal{A}$, usually called quantum classifiers. Here, $\mathcal{A}$ is said to be well-trained if its training and validation accuracy are both high (≥ 95%). The training (validation) accuracy is the frequency with which $\mathcal{A}$ successfully classifies the data in a training (validation) dataset. A validation dataset is mathematically equivalent to a training dataset but is used only for testing $\mathcal{A}$ rather than learning $\mathcal{A}$. In this context, $\theta^*$ is naturally omitted, i.e., $\mathcal{A}(\rho) = \mathcal{A}(\theta^*, \rho)$ and $\mathcal{E} = \mathcal{E}_{\theta^*}$.

An Illustrative Example
Let us further illustrate the above definitions with a concrete example: Quantum Convolutional Neural Networks (QCNNs) [5], one of the most popular and successful quantum learning models. A QCNN extends the main features and structures of Convolutional Neural Networks (CNNs) to quantum computing. The QCNN model applies the convolution layer and the pooling layer from CNNs to quantum systems, as shown in Figure 1(b). The layout proceeds as follows:
1. The convolution layer (circuit) applies multiple qubit gates $U_i$ between adjacent qubits to find a new state;
2. The pooling layer reduces the size of the quantum system by measuring a fraction of the qubits, and the outcomes determine the unitaries $V_i$ applied to nearby qubits;
3. Repeat the convolution layer and pooling layer defined in 1-2;
4. When the size of the system is sufficiently small, the fully connected layer is applied as a unitary matrix $F$ on the remaining qubits.

Fig. 1: Simple example of CNN and QCNN. QCNN, like CNN, consists of a convolution layer that finds a new state and a pooling layer that reduces the size of the model. Here, MCUG stands for measurement control unitary gate, i.e., unitary matrix $V_1$ is applied on the circuit if and only if the measurement outcome is 1.
The input of a QCNN is an unknown quantum state $\rho_{in}$ and the output is obtained by measuring a fixed number of output qubits. As in the classical case, the learning model (defined by the number of convolution and pooling layers) is fixed, but the quantum gates involved (i.e. the unitary matrices $U_i$, $V_j$, $F$) are themselves learned by the above learning process.
Remark 1. Quantum machine learning can also be used for classical machine learning tasks. Image classification, for example, is one of the most successful applications of Neural Networks (NNs). To explore the possible advantage of quantum computing, Quantum Neural Networks (QNNs) have been used to classify images in [27,28]. It is shown that by encoding images into a quantum state $\rho_{in}$, QNNs can achieve high accuracy in image classification. We will present a quantum classifier for the classification of MNIST as an example in the evaluation section.

Robustness
An important issue in classical machine learning is how robust a classification algorithm is to adversarial perturbations. A similar issue exists for quantum classifiers against quantum noise. Intuitively, the robustness of a quantum classifier $\mathcal{A}$ is its ability to classify correctly under a small perturbation of the input state. A quantum state $\sigma$ is then considered an adversarial example if it is similar to a benign state $\rho$, where $\rho$ is correctly classified but $\sigma$ is classified into a class different from that of $\rho$. Formally,
Definition 2 (Adversarial Example). Suppose we are given a quantum classifier $\mathcal{A}(\cdot)$, an input example $(\rho, c)$, a distance metric $D(\cdot, \cdot)$ and a small enough threshold value $\varepsilon > 0$. Then $\sigma$ is said to be an $\varepsilon$-adversarial example of $\rho$ if the following is true:

$$(\mathcal{A}(\rho) = c) \wedge (\mathcal{A}(\sigma) \neq c) \wedge (D(\rho, \sigma) \le \varepsilon).$$

The leftmost condition $\mathcal{A}(\rho) = c$ asserts that $\rho$ is correctly classified, the middle condition $\mathcal{A}(\sigma) \neq c$ means that $\sigma$ is incorrectly classified, and the rightmost condition $D(\rho, \sigma) \le \varepsilon$ indicates that $\rho$ and $\sigma$ are similar (i.e., their distance is small). When $\varepsilon$ is preset and no ambiguity arises, $\sigma$ is simply called an adversarial example of $\rho$. Notably, by the above definition, if $\mathcal{A}$ incorrectly classifies $\rho$, then we do not need to consider the corresponding adversarial examples. This is a correctness issue of the quantum classifier $\mathcal{A}$ rather than a robustness issue. Hence, in the following discussions, we only consider the set of all correctly recognized states.

The absence of adversarial examples leads to robustness.
Definition 3 (Adversarial Robustness). Let $\mathcal{A}$ be a quantum classifier. Then $\rho$ is $\varepsilon$-robust for $\mathcal{A}$ if there is no $\varepsilon$-adversarial example of $\rho$.
The major problem concerning us in this paper is the following:
Problem 1 (Robustness Verification Problem). Given a quantum classifier $\mathcal{A}(\cdot)$ and an input example $(\rho, c)$, check whether or not $\mathcal{A}(\sigma) = c$ for all $\sigma \in \mathcal{N}_\varepsilon(\rho)$, where $\mathcal{N}_\varepsilon(\rho) = \{\sigma \in \mathcal{D}(\mathcal{H}) : D(\rho, \sigma) \le \varepsilon\}$ is the $\varepsilon$-neighbourhood of $\rho$. If not, then an adversarial example (counter-example) $\sigma \in \mathcal{N}_\varepsilon(\rho)$ is provided.
Obviously, if $\delta$ is a robust bound for an input example $(\rho, c)$, i.e. $\mathcal{A}(\sigma) = c$ for any state $\sigma \in \mathcal{N}_\delta(\rho)$, then for any $\varepsilon \le \delta$ (i.e. $\mathcal{N}_\varepsilon(\rho) \subseteq \mathcal{N}_\delta(\rho)$), there is no $\varepsilon$-adversarial example of $\rho$. It is a challenging problem to compute the optimal robust bound $\delta^* = \max \delta$, for which there is no $\varepsilon$-adversarial example if and only if $\varepsilon \le \delta^*$.
The above adversarial robustness of quantum states can be generalized to a notion of robustness for quantum classifiers:
Definition 4 (Robust Accuracy). Let $\mathcal{A}$ be a quantum classifier. The $\varepsilon$-robust accuracy of $\mathcal{A}$ is the proportion of $\varepsilon$-robust states in the training dataset.
Remark 2. Here, the robust accuracy is defined with respect to the training dataset. In some applications, the dataset can be chosen as another set of quantum states with correct classifications, such as a validation dataset or a combination of it with the training dataset.
The reader should notice that the above definitions of robustness for quantum classifiers are similar to those for classical classifiers. But an intrinsic difference between them comes from the choice of the distance $D(\cdot, \cdot)$. In the classical case, humans play the role of the adversary, so such a distance should guarantee that a small perturbation is imperceptible to humans, and vice versa. Otherwise, we cannot exploit the advantage of machine learning over human distinguishability. For instance, in image recognition, the distance should reflect perceptual similarity in the sense that humans would consider adversarial examples generated by it perceptually similar to benign images [24]. In the quantum case, it is essential to choose a distance $D$ that is meaningful in quantum physics. In this paper, we choose to use the distance

$$D(\rho, \sigma) = 1 - F(\rho, \sigma)$$

defined by the fidelity

$$F(\rho, \sigma) = \left[\mathrm{tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right]^2.$$

Here, $\sqrt{\rho}$ denotes the positive semidefinite square root of $\rho$. Fidelity is one of the most widely used quantities for quantifying the uncertainty of noise in the experimental quantum physics and quantum engineering communities (see e.g. [29,30]).
Remark 3. The trace distance has been used in recent literature (e.g. [20]) for some issues related to quantum robustness verification:

$$T(\rho, \sigma) = \frac{1}{2}\mathrm{tr}|\rho - \sigma|, \qquad |A| = \sqrt{A^\dagger A}.$$

It is a generalization of the total variation distance, which is a distance measure for probability distributions. So far, to the best of our knowledge, there is no discussion in the literature about which distance is better. Here, we argue that fidelity is better than trace distance in the context of quantum machine learning against quantum noise. As we know, state distinguishability is the basis of measuring the effect of noise on quantum computation. The main difference between the trace distance $T(\rho, \sigma)$ and the fidelity $F(\rho, \sigma)$ is the number of copies of the states $\rho$ and $\sigma$ required as a resource in experiments for distinguishing them. More precisely, trace distance quantifies the maximum probability of correctly guessing, through a single measurement, whether $\rho$ or $\sigma$ was prepared, while fidelity characterizes the same quantity when infinitely many copies of $\rho$ and $\sigma$ can be supplied (see Appendix A of the extended version of this paper [31] for more details). In quantum machine learning, a large enough number of copies of the states is the precondition for the statistics in Eq. (6) used for learning and classification. Thus, fidelity is more suitable than trace distance for our purpose.
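For single-qubit states, both distances discussed above can be evaluated without any matrix square root, which makes them easy to compare numerically. The sketch below assumes the squared form of the fidelity, for which the standard qubit identity F(ρ, σ) = tr(ρσ) + 2 sqrt(det ρ · det σ) holds; the function names are illustrative and the closed forms are valid for 2 × 2 density matrices only.

```python
import math

def tr_prod(a, b):
    """tr(a b) for 2x2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))

def det2(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def fidelity(rho, sigma):
    """Qubit-only closed form: tr(rho sigma) + 2 sqrt(det(rho) det(sigma))."""
    overlap = complex(tr_prod(rho, sigma)).real
    dets = complex(det2(rho)).real * complex(det2(sigma)).real
    return overlap + 2 * math.sqrt(max(dets, 0.0))

def fidelity_distance(rho, sigma):
    """D(rho, sigma) = 1 - F(rho, sigma)."""
    return 1 - fidelity(rho, sigma)

def trace_distance(rho, sigma):
    """T = (1/2) tr|rho - sigma|.

    For qubits, rho - sigma is traceless and Hermitian, so its eigenvalues
    are +lam and -lam with lam = sqrt(-det(rho - sigma)), and T = lam.
    """
    diff = [[rho[i][j] - sigma[i][j] for j in range(2)] for i in range(2)]
    return math.sqrt(max(-complex(det2(diff)).real, 0.0))

zero = [[1, 0], [0, 0]]                      # |0><0|
plus = [[0.5, 0.5], [0.5, 0.5]]              # |+><+|
mixed = [[0.5, 0], [0, 0.5]]                 # maximally mixed state I/2

print(fidelity_distance(zero, plus), trace_distance(zero, plus))
print(fidelity_distance(zero, mixed), trace_distance(zero, mixed))
```

The two metrics rank perturbations differently: |+><+| and I/2 are equally far from |0><0| in the fidelity distance, yet they have different trace distances, so an ε-neighbourhood depends on which distance is chosen.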

Robust Bound
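In this section, we develop a theoretical basis for the robustness verification of quantum classifiers. After setting the distance $D$ to be the one defined by fidelity, a robust bound can be derived.
Lemma 1 (Robust Bound). Given a quantum classifier $\mathcal{A} = (\mathcal{E}, \{M_k\}_{k \in \mathcal{C}})$ and a quantum state $\rho$, let $p_1$ and $p_2$ be the first and second largest elements of $\{\mathrm{tr}(M_k^\dagger M_k \mathcal{E}(\rho))\}_{k \in \mathcal{C}}$, respectively. If $\sqrt{p_1} - \sqrt{p_2} > \sqrt{2\varepsilon}$, then $\rho$ is $\varepsilon$-robust.
Proof. See Appendix B of the extended version of this paper [31].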
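A bound of this kind needs only the outcome probabilities of ρ, so checking it is a few arithmetic operations. The sketch below assumes the sufficient condition takes the form sqrt(p1) - sqrt(p2) > sqrt(2ε), equivalently ε < (sqrt(p1) - sqrt(p2))^2 / 2; this form is an assumption of the example, and the function names are hypothetical.

```python
import math

def lemma1_bound(probs):
    """Return (sqrt(p1) - sqrt(p2))^2 / 2 for the two largest outcome probabilities."""
    p1, p2 = sorted(probs, reverse=True)[:2]
    return (math.sqrt(p1) - math.sqrt(p2)) ** 2 / 2

def certified_robust(probs, eps):
    """True if the assumed condition sqrt(p1) - sqrt(p2) > sqrt(2 * eps) holds.

    A False result does not mean the state is non-robust; it only means
    this quick check cannot certify it and an exact check is needed.
    """
    return eps < lemma1_bound(probs)

confident = [0.9, 0.1]                       # outcome probabilities of a confident state
borderline = [0.6, 0.4]                      # outcome probabilities near the boundary
print(lemma1_bound(confident), certified_robust(confident, 0.1))
print(lemma1_bound(borderline), certified_robust(borderline, 0.1))
```

States classified with a large probability margin are certified immediately, while states near the decision boundary fall through to the exact SDP-based check.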
The above robust bound gives us a quick robustness verification from the measurement outcomes of $\rho$, without searching for possible adversarial examples. Furthermore, it can also be used to compute an under-approximation of the robust accuracy of $\mathcal{A}$ by checking the robustness of the quantum states in the training dataset one by one. We will see in the later experiments that the robust bound and the induced robust accuracy scale well. However, this bound is in general not optimal; the optimal robust bound can be computed by Semidefinite Programming (SDP). Recall that SDP is a subfield of convex optimization concerned with the optimization of a linear objective function over the intersection of the cone of positive semidefinite matrices with an affine space. It has the form

$$\min\ \mathrm{tr}(CX) \quad \text{subject to} \quad \mathrm{tr}(A_k X) \le b_k,\ k = 1, \ldots, m, \quad X \ge 0,$$

where $C, A_1, \ldots, A_m$ are all Hermitian $n \times n$ matrices (i.e. $A^\dagger = A$), $b_1, \ldots, b_m$ are real numbers, and $X$ is the $n \times n$ optimization variable with $X \ge 0$, i.e., $X$ is positive semidefinite. Many efficient solvers have been developed for solving SDPs; they not only compute the minimal value but also output a corresponding optimal solution $X$. The following two theorems show that checking $\varepsilon$-robustness and computing the optimal robust bound of quantum states can both be reduced to an SDP.
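Theorem 1 ($\varepsilon$-robustness Verification). Let $\mathcal{A} = (\mathcal{E}, \{M_k\}_{k \in \mathcal{C}})$ be a quantum classifier and $\rho$ be a state with $\mathcal{A}(\rho) = l$. Then $\rho$ is $\varepsilon$-robust if and only if for all $k \in \mathcal{C}$ with $k \neq l$, the following problem has no solution (feasibility problem):

$$\begin{aligned} \min_{\sigma}\ & 0 \\ \text{subject to}\ & \sigma \ge 0, \\ & \mathrm{tr}(\sigma) = 1, \\ & \mathrm{tr}(M_l^\dagger M_l \mathcal{E}(\sigma)) \le \mathrm{tr}(M_k^\dagger M_k \mathcal{E}(\sigma)), \\ & 1 - F(\rho, \sigma) \le \varepsilon. \end{aligned}$$

Proof. See Appendix C of the extended version of this paper [31].
Actually, the objective function 0 in the above theorem can be replaced by any constant.
Theorem 2 (Optimal Robust Bound). Let $\mathcal{A}$ and $\rho$ be as in Theorem 1 with $\mathcal{A}(\rho) = l$, and let $\delta_k$ be the optimal value of the following problem:

$$\begin{aligned} \delta_k = \min_{\sigma}\ & 1 - F(\rho, \sigma) \\ \text{subject to}\ & \sigma \ge 0, \\ & \mathrm{tr}(\sigma) = 1, \\ & \mathrm{tr}(M_l^\dagger M_l \mathcal{E}(\sigma)) \le \mathrm{tr}(M_k^\dagger M_k \mathcal{E}(\sigma)), \end{aligned}$$

where $\delta_k = +\infty$ if the problem has no feasible solution. Then $\delta = \min_{k \neq l} \delta_k$ is the optimal robust bound of $\rho$.
Proof. The proof is similar to that of Theorem 1.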
Remark 4. One may wonder why checking $\varepsilon$-robustness and computing the optimal robust bound can always be reduced to an SDP. This is indeed implied by the basic quantum mechanics postulate of linearity; more specifically, all of the super-operators and measurements used in quantum machine learning algorithms are linear. In contrast, the functions represented by neural networks in classical machine learning may be nonlinear, as the pooling layer is not linear. As a result, the reduced optimization problem for robustness verification is not convex (e.g. [32]). To overcome this difficulty, many different methods have been developed to encode the nonlinear activation functions as linear constraints.

Robustness Verification Algorithms
In this section, we develop several algorithms for verifying the robustness of quantum classifiers based on the theoretical results presented in the last section.
First, let us consider the robustness of a given quantum state $\rho$. In many applications (as shown in our experiments in Section 7), we are required to check whether $\rho$ is $\varepsilon$-robust for an arbitrarily given threshold $\varepsilon$. Note that once we have computed the optimal robust bound $\delta$, checking $\varepsilon$-robustness of $\rho$ is equivalent to comparing $\varepsilon$ and $\delta$; that is, $\varepsilon \le \delta$ if and only if $\rho$ is $\varepsilon$-robust. Combining this simple observation with Theorem 1, we obtain Algorithm 1 for checking the $\varepsilon$-robustness of $\rho$ and finding the minimum adversarial perturbation $\delta$ caused by quantum noise. The main cost of Algorithm 1 is incurred in solving the SDPs in Line 2, each of which costs $O(n^{6.5})$ by interior point methods [36], where $n$ is the number of rows of the semidefinite matrix $\rho$ in the SDP, i.e., the dimension of the Hilbert space of the quantum states in our case. As we need to apply an SDP solver $|\mathcal{C}| - 1$ times in Line 1, the total complexity is as follows.
Theorem 3. The worst-case complexity of Algorithm 1 is $O(|\mathcal{C}| \cdot n^{6.5})$, where $n$ is the dimension of the input state $\rho$ and $|\mathcal{C}|$ is the number of classes we are interested in.
Now we turn to the robustness of a quantum classifier $\mathcal{A}$. Algorithm 2 is designed for checking the robustness of $\mathcal{A}$ by combining Algorithm 1 with Lemma 1 (see the discussion in the paragraph after Lemma 1). A major benefit of formal robustness verification for classical classifiers is perhaps that it can be used to detect a counter-example (adversarial example) for a given input (see e.g. [13,14,15,16]). This benefit is kept in Algorithm 2 for the robustness verification of quantum classifiers. In particular, we are able to extend the technique of adversarial training in classical machine learning [10] to the quantum case: an adversarial example $\sigma$ is automatically generated once $\varepsilon$-robustness of $\rho$ fails, and then by appending $(\sigma, l)$ to the training dataset, we can retrain $\mathcal{A}$ to improve the robustness of the classifier.

Algorithm 2 RobustnessVerifier(A, ε, T )
Require: $\mathcal{A} = (\mathcal{E}, \{M_k\}_{k \in \mathcal{C}})$ is a well-trained quantum classifier, $\varepsilon < 1$ is a real number, $T = \{(\rho_i, l_i)\}$ is the training dataset of $\mathcal{A}$
Ensure: The robust accuracy RA and a set $R = \{(\sigma_j, i_j)\}$, where for each $j$, $\sigma_j$ is an $\varepsilon$-adversarial example of $\rho_{i_j}$; $R$ can be an empty set if all states in $T$ are $\varepsilon$-robust.
1: $R = \emptyset$ (an empty set)
...

To analyze the complexity of Algorithm 2, we first see by Theorem 2 that for evaluating the robustness of $\mathcal{A}$ (computing its robust accuracy and finding its adversarial examples), one needs to call Algorithm 1 for each quantum state in the training dataset, each call costing $O(|\mathcal{C}| \cdot n^{6.5})$. Thus, the total complexity of robustness verification is $O(|T| \cdot |\mathcal{C}| \cdot n^{6.5})$, where $|T|$ is the number of elements in the training dataset $T$. However, the robust bound given in Lemma 1 can help to speed up the process by quickly finding all potentially non-robust states, as the complexity of computing the bound is only $O(|\mathcal{C}| \cdot n^3)$, which is the cost of $|\mathcal{C}|$ multiplications of two $n \times n$ matrices. In practice, this bound scales well, as confirmed by our experiments presented in Section 7. Therefore, a good strategy for implementing the robustness verification is to first use the robust bound to pick out all potentially non-robust states from the given training dataset $T$ and store them in a set $T'$. Then we check all the remaining candidates in $T'$ one by one using Algorithm 1 and use a set $R$ to record the adversarial examples found and the corresponding indexes of the states. This strategy can significantly reduce the complexity to $O(|T'| \cdot |\mathcal{C}| \cdot n^{6.5})$. Indeed, our experiments show that the robust bound given in Lemma 1 scales very well in the sense that $|T'| \ll |T|$.
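The two-stage strategy described above can be sketched as follows. This is not the paper's Algorithm 2 listing but a hedged illustration: `exact_check` is a hypothetical callable standing in for Algorithm 1 (it returns an adversarial example or `None`), the quick filter assumes a bound of the form (sqrt(p1) - sqrt(p2))^2 / 2, and every state in the dataset is assumed to be correctly classified.

```python
import math

def lemma1_bound(probs):
    """Assumed quick bound (sqrt(p1) - sqrt(p2))^2 / 2 from the outcome probabilities."""
    p1, p2 = sorted(probs, reverse=True)[:2]
    return (math.sqrt(p1) - math.sqrt(p2)) ** 2 / 2

def robustness_verifier(dataset, eps, exact_check):
    """Two-stage robustness check over a dataset.

    dataset:     list of (state, outcome_probabilities) pairs.
    exact_check: stand-in for Algorithm 1; called as exact_check(state, eps),
                 returns an adversarial example or None if the state is robust.
    Returns (robust_accuracy, adversarial_examples, number_of_candidates).
    """
    # Stage 1: keep only the states that the quick bound cannot certify.
    candidates = [i for i, (_, probs) in enumerate(dataset)
                  if eps >= lemma1_bound(probs)]
    # Stage 2: run the expensive exact check on the remaining candidates only.
    adversarial = []
    for i in candidates:
        sigma = exact_check(dataset[i][0], eps)
        if sigma is not None:
            adversarial.append((sigma, i))
    robust_accuracy = 1 - len(adversarial) / len(dataset)
    return robust_accuracy, adversarial, len(candidates)

# Toy data: states are just names here, and the exact check is a stub.
toy_dataset = [("a", [0.99, 0.01]), ("b", [0.9, 0.1]),
               ("c", [0.55, 0.45]), ("d", [0.6, 0.4])]

def stub_exact_check(state, eps):
    """Pretend that only state "c" has an adversarial example."""
    return "adv-" + state if state == "c" else None

ra, found, n_candidates = robustness_verifier(toy_dataset, 0.05, stub_exact_check)
print(ra, found, n_candidates)
```

Only two of the four toy states reach the exact check, which mirrors the intended saving: the expensive solver is called |T'| times instead of |T| times, and each adversarial example is recorded with the index of the state it perturbs so that it can be appended to the training data.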
Remark 5. Thanks to the linearity of the quantum learning model determined by the basic postulate of quantum mechanics, the robustness verification of quantum classifiers can be done efficiently (with time complexity polynomial in the size of the input state). This is usually not the case in verifying the robustness of classical machine learning algorithms. For example, DNNs are often non-linear and non-convex, and verifying even some simple properties of them can be an NP-complete problem [37].
Surprisingly, the robustness verification problem for quantum classifiers becomes much harder if we are required to find adversarial examples among pure states. Roughly speaking, the reason is that the set of all pure states is not convex, and thus computing the optimal robust bound for pure states is not an SDP as in Theorem 2. We can prove that it is a Quadratically Constrained Quadratic Program (QCQP), an optimization problem where both the objective function and the constraints are quadratic functions (see Appendix D of the extended version of this paper [31] for the proof); solving QCQPs is NP-hard in general. Algorithm 1 can be adapted to this pure state robustness verification by calling a QCQP solver instead of an SDP solver in Line 2. Subsequently, Algorithm 2 can use this new version of Algorithm 1 as a subroutine to compute the corresponding robust accuracy and find adversarial examples among pure states. We will evaluate the QCQP-based robustness verification in the case study of MNIST classification, in which handwritten digits are encoded in pure states.

Evaluation
Algorithm 2 is implemented on TensorFlow Quantum, Google's platform for designing and training quantum machine learning algorithms, and solves the SDPs by calling CVXPY, a Python package for disciplined convex programming [38]. This section evaluates our approach with experiments on some concrete examples. It is arranged as follows. In Subsections 7.1-7.4, we present several well-trained quantum classifiers. Then the evaluation is carried out in Subsection 7.5 by applying Algorithm 2 to check the robustness of these classifiers and find their adversarial examples if any exist.
To demonstrate our method as thoroughly as possible, we check the robustness of four quantum classifiers. We begin with a "Hello World" example, qubits classification, and then move on to two quantum classifiers applied to real-world tasks, quantum phase recognition and cluster excitation detection, which are both fundamental and hard problems in quantum physics. Finally, to compare with classical robustness verification, we consider the classification of MNIST by encoding images of handwritten digits into quantum data. These experiments cover all illustrated examples of TensorFlow Quantum.

Quantum Bits Classification
A "Hello World" example of quantum machine learning is quantum bits classification [7]. The aim is to implement a binary classification for regions on a single qubit, i.e., a perceptron for qubits. Specifically, two random normalized vectors $|a\rangle$ and $|b\rangle$ (pure states) in the X-Z plane of the Bloch sphere are chosen. Around these two vectors, we randomly sample two sets of quantum data points; the objective is to learn a quantum gate to distinguish the two sets. A concrete instance of this type is shown in Fig. 2. To minimize the MSE form of Eq. (8), we use the Adam optimizer [39] to update the gate parameter θ. After training, we achieve 100% accuracy on both the training and validation sets, and the final parameter θ is 0.4835.
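To make the structure of such a single-qubit classifier concrete, the following is a minimal sketch under stated assumptions. Fig. 2 only specifies a parameterized rotation gate followed by a measurement along the Z-axis, so the choice of a Y rotation Ry(θ), the restriction to states lying exactly in the X-Z plane, and the decision rule (predict class 0 when outcome 0 is more likely) are our assumptions rather than the trained model itself.

```python
import math

def classify(phi, theta):
    """Classify the X-Z-plane state cos(phi/2)|0> + sin(phi/2)|1>.

    Assumption: the trained gate is a Y rotation Ry(theta), followed by a
    Z-basis measurement; class 0 is predicted when outcome 0 is more likely.
    """
    amp0, amp1 = math.cos(phi / 2), math.sin(phi / 2)
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    # Ry(theta) = [[c, -s], [s, c]] applied to the amplitude vector (amp0, amp1)
    out0 = c * amp0 - s * amp1
    out1 = s * amp0 + c * amp1
    p0 = out0 ** 2 / (out0 ** 2 + out1 ** 2)
    return 0 if p0 >= 0.5 else 1

theta = 0.4835  # final trained parameter reported in the text
print(classify(0.2, theta), classify(2.5, theta))
```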

Quantum Phase Recognition
Quantum phase recognition (QPR) of one dimensional many-body systems has been tackled with quantum convolutional neural networks (QCNNs) proposed by Cong et al. [5]. Consider a $\mathbb{Z}_2 \times \mathbb{Z}_2$ symmetry-protected topological (SPT) phase $P$ and the ground states of a family of Hamiltonians on a spin-1/2 chain with open boundary conditions:
$$H = -J \sum_{i=1}^{N-2} Z_i X_{i+1} Z_{i+2} - h_1 \sum_{i=1}^{N} X_i - h_2 \sum_{i=1}^{N-1} X_i X_{i+1},$$
where $X_i, Z_i$ are Pauli matrices [2] for the spin at site $i$, and the $\mathbb{Z}_2 \times \mathbb{Z}_2$ symmetry is generated by $X_{\mathrm{even(odd)}} = \prod_{i \in \mathrm{even(odd)}} X_i$. The goal is to identify whether the ground state $|\psi\rangle$ of $H$ belongs to phase $P$ when $H$ is regarded as a function of $(h_1/J, h_2/J)$. For small $N$, a numerical simulation can be used to solve this problem exactly [5]; see Fig. 4a in Appendix E of the extended version of this paper [31] for the exact phase boundary points (blue and red diamonds) between the SPT phase and the non-SPT (paramagnetic or antiferromagnetic) phase for $N = 6$. Thus the 6-qubit instance is an excellent testbed for new methods and techniques of QPR. Here, we train a QCNN model to implement 6-qubit QPR in this setting.
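For N = 6 the ground states can be obtained by exact diagonalization of a 64 × 64 matrix. The sketch below assumes the cluster-Ising form H = −J Σ Z_i X_{i+1} Z_{i+2} − h_1 Σ X_i − h_2 Σ X_i X_{i+1} used by Cong et al. [5]; the function names are ours, and this is an illustration of the numerical simulation rather than the code used for the experiments.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def op_on_sites(ops, n):
    """Tensor product over n sites; ops maps a site index to a 2x2 matrix."""
    out = np.array([[1.0]])
    for site in range(n):
        out = np.kron(out, ops.get(site, I2))
    return out

def spt_hamiltonian(n, h1, h2, j=1.0):
    """H = -J sum Z_i X_{i+1} Z_{i+2} - h1 sum X_i - h2 sum X_i X_{i+1}."""
    dim = 2 ** n
    h = np.zeros((dim, dim))
    for i in range(n - 2):
        h -= j * op_on_sites({i: Z, i + 1: X, i + 2: Z}, n)
    for i in range(n):
        h -= h1 * op_on_sites({i: X}, n)
    for i in range(n - 1):
        h -= h2 * op_on_sites({i: X, i + 1: X}, n)
    return h

def ground_state(h):
    """Lowest eigenvalue and one corresponding eigenvector of Hermitian h."""
    energies, vectors = np.linalg.eigh(h)
    return energies[0], vectors[:, 0]

energy, psi = ground_state(spt_hamiltonian(6, h1=0.5, h2=0.0))
print(energy, np.linalg.norm(psi))
```

At h_1 = h_2 = 0 the three-body terms commute, so the ground energy is −J(N − 2), which gives a quick sanity check of the construction.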
To generate the dataset for training, we sample a series of Hamiltonians $H$ with $h_2/J = 0$, uniformly varying $h_1/J$ from 0 to 1.2, and compute their corresponding ground states; see the gray line of Fig. 4a in Appendix E of the extended version of this paper [31]. For testing, we uniformly sample a set of validation data from two random regions of the 2-dimensional space $(h_1/J, h_2/J)$; see the two dashed rectangles of Fig. 4a. In total, we obtain 1000 training data and 400 validation data. Our parameterized QCNN circuit is shown in Fig. 4b in Appendix E of the extended version of this paper [31], and the unitaries $U_i, V_j, F$ are parameterized with the generalized Gell-Mann matrix basis [40]: $U = \exp(-i \sum_j \theta_j \Lambda_j)$, where $\Lambda_j$ is a basis matrix and $\theta_j$ is a real number; the total number of parameters $\theta_j$ is 114. For the outcome measurement of one qubit, we use the measurement $M = \{M_0 = |+\rangle\langle+|,\ M_1 = |-\rangle\langle-|\}$, where $|\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$ and output 0 predicts that the input state belongs to $P$. To minimize the MSE form of Eq. (8), we use the Adam optimizer to update the 114 parameters. After training, 97.7% training accuracy and 95.25% validation accuracy are obtained. At the same time, our classifier produces a phase diagram (the colored plot in Fig. 4a), in which the learned phase boundary almost perfectly matches the exact one obtained by the numerical simulation. All these results indicate that our classifier is well-trained.
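The parameterization U = exp(−i Σ_j θ_j Λ_j) can be sketched as follows. The code builds the standard generalized Gell-Mann matrices for dimension d (d² − 1 traceless Hermitian matrices) and exponentiates the Hermitian generator through its eigendecomposition. The function names, the random parameters, and the choice d = 4 (a two-qubit unitary) are illustrative assumptions; the text does not state how the 114 parameters are distributed over the circuit.

```python
import numpy as np

def gell_mann_basis(d):
    """Return the d*d - 1 generalized Gell-Mann matrices of dimension d."""
    basis = []
    for j in range(d):
        for k in range(j + 1, d):
            sym = np.zeros((d, d), dtype=complex)
            sym[j, k] = sym[k, j] = 1.0
            anti = np.zeros((d, d), dtype=complex)
            anti[j, k], anti[k, j] = -1j, 1j
            basis += [sym, anti]
    for l in range(1, d):
        diag = np.zeros((d, d), dtype=complex)
        diag[:l, :l] = np.eye(l)
        diag[l, l] = -l
        basis.append(np.sqrt(2.0 / (l * (l + 1))) * diag)
    return basis

def unitary_from_params(thetas, basis):
    """U = exp(-i sum_j theta_j Lambda_j), computed by eigendecomposition."""
    generator = sum(t * lam for t, lam in zip(thetas, basis))
    evals, evecs = np.linalg.eigh(generator)
    return evecs @ np.diag(np.exp(-1j * evals)) @ evecs.conj().T

basis = gell_mann_basis(4)           # generators for a two-qubit unitary
rng = np.random.default_rng(0)
U = unitary_from_params(rng.normal(size=len(basis)), basis)
print(len(basis), np.allclose(U @ U.conj().T, np.eye(4)))
```

For d = 2 the construction reduces to the Pauli matrices X, Y, Z.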

Cluster Excitation Detection
The task of cluster excitation detection is to train a quantum classifier to detect whether a prepared cluster state is "excited" or not [7]. Excitations are represented by an X rotation on one qubit. A large enough rotation is deemed to be an excited state and is labeled by 0, while a rotation that is not large enough is labeled by 1 and is not deemed to be an excited state. Here, we demonstrate this classification task with 6 qubits. We use the circuit shown in Fig. 5a of Appendix E in the extended version of this paper [31] to generate training (840) and validation (360) samples. The circuit prepares a cluster state and applies an X rotation (we omit the angle θ) to one qubit. The rotation angle θ ranges from −π to π; if −π/2 ≤ θ ≤ π/2, the label of the output state is 1, and otherwise the label is 0. The classification circuit model (a quantum convolutional neural network) uses the same structure as in TensorFlow Quantum [7], shown in Fig. 5b of Appendix E in the extended version of this paper [31]. The explicit parameterization of $C_i, P_j$ can be found in [7]. The circuit ends with a final measurement. To minimize the MSE form of Eq. (8), we use the Adam optimizer to update all $C_i, P_j$. We achieve 99.76% training accuracy and 99.44% validation accuracy.
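The labeling rule above is simple enough to state in code. The following sketch only generates rotation angles and their labels, not the quantum states themselves; drawing the angles uniformly from [−π, π] and the helper names are our assumptions.

```python
import math
import random

def excitation_label(theta):
    """Label 1 (not excited) if -pi/2 <= theta <= pi/2, else 0 (excited)."""
    return 1 if -math.pi / 2 <= theta <= math.pi / 2 else 0

def sample_labeled_angles(count, seed=0):
    """Draw rotation angles uniformly from [-pi, pi] with their labels."""
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(count)]
    return [(theta, excitation_label(theta)) for theta in angles]

data = sample_labeled_angles(840 + 360)   # training + validation sizes from the text
train, validation = data[:840], data[840:]
print(len(train), len(validation), sum(label for _, label in data))
```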

The Classification of MNIST
Handwritten digit recognition is one of the most popular tasks in the classical machine learning zoo. The archetypical training and validation data come from the MNIST dataset, which consists of 55,000 training samples of handwritten digits [41]. These digits have been labeled by humans as representing one of the ten digits from 0 to 9, and are in the form of gray-scale images that contain 28 × 28 pixels. Each pixel has a grayscale value ranging from 0 to 255. Quantum machine learning has been used to distinguish an oversimplified version of MNIST obtained by downscaling the images to 8 × 8 pixels. As a consequence, the digits in this version of MNIST cannot be perceptually recognized [7]. Here, we build a quantum classifier to recognize a version of MNIST with 16 × 16 pixels (see the second column images of Fig. 3). As demonstrated in [7], we select 700 images of number 3 and 700 images of number 6 to form our training (1000) and validation (400) datasets. Then we downscale those 28 × 28 images to $2^4 \times 2^4$ images (fitting the size of quantum data), and encode them into pure states of 8 qubits via amplitude encoding. Amplitude encoding uses the amplitudes of the computational basis to represent vectors with normalization:
$$(x_1, x_2, \ldots, x_n) \mapsto \sum_{i=1}^{n} \frac{x_i}{\sqrt{\sum_{j=1}^{n} x_j^2}}\, |i\rangle,$$
where $\{|i\rangle\}$ is an orthonormal basis of the 8-qubit state space. The normalization does not change the pattern of those images. For learning a quantum classifier, we use the QCNN model in Fig. 6 of Appendix E in the extended version of this paper [31] together with a measurement $M$ whose output indicates the digit: output 1 for number 3 and output 0 for number 6. The explicit parameterization of the $C_i, P_j$ can be found in [7]. Again we use the Adam optimizer to update the model parameters, minimizing the MSE form of Eq. (8). We finally achieve 98.4% training accuracy and 97.5% validation accuracy.
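Amplitude encoding of a downscaled image amounts to flattening and normalizing it. The sketch below uses a synthetic 16 × 16 array in place of real MNIST data and a helper name of our own choosing; it shows that 256 pixel values give a unit vector that fits exactly into the state space of 8 qubits.

```python
import math

def amplitude_encode(image):
    """Flatten a square grayscale image and normalize it to a unit vector."""
    pixels = [float(value) for row in image for value in row]
    norm = math.sqrt(sum(value * value for value in pixels))
    if norm == 0.0:
        raise ValueError("an all-zero image cannot be normalized")
    return [value / norm for value in pixels]

# A toy 16 x 16 "image" with grayscale values in 0..255 (not real MNIST data).
image = [[(16 * r + c) % 256 for c in range(16)] for r in range(16)]
state = amplitude_encode(image)
num_qubits = int(math.log2(len(state)))
print(num_qubits, sum(a * a for a in state))
```

Because every amplitude is divided by the same norm, the relative pixel intensities, and hence the pattern of the image, are preserved.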

Robustness Verification
We now check the ε-robustness of the four well-trained classifiers presented in the previous four subsections.
In practical applications, the value of ε in Definition 3 represents the ability of state preparation by quantum controls. For example, the state of the art is that a single qubit can be prepared with fidelity 99.99% (e.g., [29,30]). Here, we choose four different values of ε in each experiment. To show the scalability of our robust bound given in Lemma 1, we use it to develop an algorithm (Algorithm 3 in Appendix F of the extended version of this paper [31]) that under-approximates the robust accuracy computed by Algorithm 2. Algorithm 3 is a subroutine of Algorithm 2 that does not call an SDP solver (whenever a potential non-robust state can be detected by the robust bound in Lemma 1). We compare the verification times of Algorithms 2 and 3. The experiments are done on a computer with the following configuration: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz × 8 processor, 15.8 GiB memory, Ubuntu 18.04.5 LTS, with CVXPY: Python Software for Disciplined Convex Programming [38] for solving SDPs, and a SciPy solver for finding the minimum of a constrained nonlinear multivariable function for solving QCQPs. The experimental results are given in Tables 1-4. As an example, we illustrate the details of the result for the case of ε = 0.001 in Table 1. First, we apply only our robust bound in Lemma 1 to pick out all potential non-robust states from the 800 points in the training dataset. This leaves 95 potentially non-robust points. Thus, the under-approximation of the robust accuracy computed by Algorithm 3 (in Appendix F of the extended version of this paper [31]) is 88.13%. Next, we check the 0.001-robustness by Algorithm 2. Indeed, only 80 of the points detected by the above robust bound are non-robust, and the exact robust accuracy is 90.00%. We also compare the verification times of the two approaches. See the second column in Table 1 for the details; the other experimental results on ε-robustness are summarized in the same table. The verification results in Tables 1-4 show that in all of these experiments, the robust bound obtained in Lemma 1 scales very well, and the robustness verification by Algorithm 3 costs significantly less time (< 2s) than computing the optimal robust bound by Algorithm 2. For example, for quantum phase recognition, for ε = 0.0001, 0.0002

Conclusion
In this work, we initiate the research of the formal robustness verification of quantum machine learning algorithms against unknown quantum noise. We found an analytical robustness bound which can be efficiently computed to under-approximate the robust accuracy in practical applications. Furthermore, we developed a robustness verification algorithm that can exactly verify the ε-robustness of quantum machine learning algorithms.
As a topic for future research, it should be useful in practical applications to find an efficient method that over-approximates the robust accuracy of quantum classifiers. Combined with the under-approximation approach developed in this work, it can help us to estimate the robust accuracy more accurately and quickly. In classical machine learning, there exist some works in the literature that achieve this task. For instance, ImageStars, a new set representation, is introduced in [13] to perform efficient set-based analysis by combining operations on concrete images with linear programming, which leads to efficient over-approximative analysis of classical convolutional neural networks.
Tensor networks are one of the best-known data structures for implementing large-scale quantum classifiers (e.g., QCNNs with 45 qubits in [5]). For practical applications, we are going to incorporate tensor networks into our robustness verification algorithm so that it can scale up to meet the demand of NISQ devices (of ≥ 50 qubits).
More generally, further investigations are required to better understand the role of robustness in quantum machine learning, especially through more experiments on real world applications like learning phases of quantum many-body systems.
where $\eta = -\ln\frac{4\varepsilon}{\mathrm{const}}$. In terms of trace distance, we have a corresponding bound, which is derived by the Fuchs-van de Graaf inequalities. By the above inequalities about $N$, we can see that infidelity gives a better (linear) estimation of how many samples we need to accurately discriminate ρ from σ than trace distance, which gives a polynomial estimation of $N$. For instance, when $\varepsilon = \frac{1}{4}\,\mathrm{const} \cdot e^{-100}$ (then η = 100), for the same value of infidelity and trace distance, say 0.01, the estimations of $N$ are $[10^4, 2 \times 10^4]$ and $[10051, 10^6]$, respectively. Thus fidelity is more suitable as the metric $D(\rho, \sigma) = 1 - F(\rho, \sigma)$ in Eq. (9) in the scenario of quantum machine learning, in which many copies of quantum states are provided.
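Taking η = −ln(4ε/const) as above (the grouping of the fraction is our reading of the definition, chosen because it is the one consistent with the stated example), the value η = 100 follows directly by substitution:

```latex
\eta = -\ln\frac{4\varepsilon}{\mathrm{const}}
     = -\ln\frac{4 \cdot \tfrac{1}{4}\,\mathrm{const}\cdot e^{-100}}{\mathrm{const}}
     = -\ln e^{-100} = 100 .
```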

B Proof of Lemma 1
Proof. For any σ with $F(\rho, \sigma) \ge 1 - \varepsilon$, by the monotonicity of the fidelity [2, Theorem 9.6], we have [...] Without loss of generality, we assume $p_1 \ge p_2 \ge \ldots \ge p_n$. We use $\vec{x} \cdot \vec{y}$ and $\|\vec{x}\| = \sqrt{\vec{x} \cdot \vec{x}}$ to denote the inner product of $\vec{x}, \vec{y}$ and the 2-norm, respectively. Then $\vec{p}$ and $\vec{q}$ both have unit norms: $\|\vec{p}\| = \|\vec{q}\| = 1$. By the definition of fidelity, we have [...] We see that any probability distribution $R = (r_1, r_2, \ldots, r_n)$, with $r_k \ge 0$ and $\sum_k r_k = 1$, can be viewed as a unit vector $(\sqrt{r_1}, \sqrt{r_2}, \ldots, \sqrt{r_n})$, and the fidelity of two distributions is the inner product of their corresponding unit vectors.
Next, we prove that the unit vector form $\vec{p}$ of ρ can be used to obtain a robust bound for it. First, we find a vector that has maximum inner product with $\vec{p}$ and lies in a class other than the class of ρ. This can be done by solving the following optimization problem: [...] With the constraint $\|\vec{x}\| = 1$, we have $\vec{x} \cdot \vec{p} = (a\vec{x}) \cdot \vec{p} / \|a\vec{x}\|$ for any $a > 0$. Thus, letting $\vec{y} = a\vec{x}$, the above optimization problem is rewritten as: [...] The objective function is not changed by multiplying the numerator and denominator by a positive number. Thus, we can assume $\vec{y} \cdot \vec{p} = 1$ and obtain the following problem: min. $\|\vec{y}\|$ [...] then there exists $0 \le \lambda_0 \le 1$ such that [...] which is equivalent to [...] So the first and second elements of $\vec{x}(\lambda_0)$ are equal.
We have $\vec{x}(\lambda_0) - \vec{p} = (1 - \lambda_0)(\vec{y}^* - \vec{p})$ and $(\vec{y}^* - \vec{p}) \cdot \vec{p} = 0$, which means that $\vec{y}^* - \vec{p}$ and $\vec{p}$ are orthogonal. Then [...] We find that the vector $\vec{x}(\lambda_0)$ satisfies $y_1 - y_2 = 0$ and $\|\vec{x}(\lambda_0)\|^2 \le \|\vec{y}^*\|^2$. Now we consider the situation of $y^*_1 > y^*_2$. We have $y^*_j = y^*_1 > y^*_2$ and $p_2 \ge p_j$, and following the same analysis as above, we can find $0 < \lambda_0 \le 1$ such that the second and $j$-th elements of $\vec{x}(\lambda_0)$ are equal and [...] which contradicts the optimality of $\vec{y}^*$. Thus, the optimal value is achieved at a vector $\vec{y}$ satisfying $y_1 - y_2 = 0$, and the problem can be reformulated as min. $\|\vec{y}\|$ [...] Using the Lagrange multiplier method, the optimal value of problem (13) is [...] Then the optimal value of problem (10) is [...] Therefore, if [...] then for any vector $\vec{q}$ with $\sqrt{1 - \varepsilon} \le \vec{p} \cdot \vec{q}$, we have the corresponding quantum state σ with $F(\rho, \sigma) \ge 1 - \varepsilon$, and σ is classified into the class of ρ. In other words, ρ is ε-robust.
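The proof above views probability distributions as unit vectors. The following minimal sketch (helper names are ours, and the two distributions are made-up numbers) maps a distribution to its vector of square roots, checks that it has unit norm, and computes the overlap Σ_k √(p_k q_k) of two distributions. Whether the fidelity is this overlap or its square depends on the convention; the final step of the proof compares the overlap with √(1 − ε), which corresponds to F being the squared overlap.

```python
import math

def unit_vector(dist):
    """Map a probability distribution (r_1..r_n) to (sqrt(r_1)..sqrt(r_n))."""
    if any(r < 0 for r in dist) or abs(sum(dist) - 1.0) > 1e-9:
        raise ValueError("expected nonnegative entries summing to 1")
    return [math.sqrt(r) for r in dist]

def inner(x, y):
    """Euclidean inner product of two real vectors."""
    return sum(a * b for a, b in zip(x, y))

p = unit_vector([0.7, 0.2, 0.1])
q = unit_vector([0.6, 0.3, 0.1])
overlap = inner(p, q)              # sum_k sqrt(p_k * q_k)
print(inner(p, p), overlap, overlap ** 2)
```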

C Proof of Theorem 1
Proof. The sufficient direction directly follows from the definition of adversarial robustness in Definition 3. For the necessary direction, if there is $k \in C$ such that there is a solution σ in the above problem, then $A(\sigma) = k$, i.e., σ is classified in the class $k$. That is, σ is an ε-adversarial example of ρ, and ρ is not ε-robust. Now we prove that the above problem can be reduced to an SDP. First, it is easy to verify that $D(H)$, the set of quantum states on $H$, is a convex set of positive semidefinite matrices. Computing $F(\rho, \sigma)$ can be reduced to solving an SDP [44]. Thus, replacing $F(\rho, \sigma)$ by the SDP, the above problem is an SDP problem, noting that $\mathrm{tr}(\sigma) = 1$ is equivalent to $\mathrm{tr}(\sigma) \le 1$ and $-\mathrm{tr}(\sigma) \le -1$.
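On small instances, the value returned by an SDP solver for the fidelity can be cross-checked against the closed-form expression. The sketch below is not the SDP formulation of [44]; it evaluates F(ρ, σ) = (tr √(√ρ σ √ρ))² directly with NumPy, assuming the squared-fidelity convention suggested by the proof of Lemma 1. The function names are ours.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite Hermitian matrix."""
    evals, evecs = np.linalg.eigh(a)
    evals = np.clip(evals, 0.0, None)   # remove tiny negative round-off
    return evecs @ np.diag(np.sqrt(evals)) @ evecs.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = (tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    root = psd_sqrt(rho)
    inner = root @ sigma @ root
    inner = (inner + inner.conj().T) / 2   # enforce Hermiticity numerically
    evals = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(evals)) ** 2)

zero = np.array([[1.0, 0.0], [0.0, 0.0]])   # |0><0|
mixed = np.eye(2) / 2                       # maximally mixed qubit state
print(fidelity(zero, zero), fidelity(zero, mixed))
```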

D Pure State Robustness Verification
In this section, we discuss robustness verification for pure states, i.e., for a pure state $|\psi\rangle$ against adversarial examples that are also pure states. That is, all quantum states in the training dataset and their adversarial examples are pure states. Then, by Theorem 2 and noting that the set of all pure states is not convex, computing the optimal robust bound for pure states is not an SDP. However, we can prove that it is a Quadratically Constrained Quadratic Program (QCQP), which is hard to solve.
In mathematical optimization, a QCQP is an optimization problem in which both the objective function and the constraints are quadratic functions. It has the form
$$\begin{aligned} \min \quad & \tfrac{1}{2} x^T P_0 x + q_0^T x \\ \text{subject to} \quad & \tfrac{1}{2} x^T P_i x + q_i^T x + r_i \le 0, \quad i = 1, \ldots, m, \\ & Ax = b, \end{aligned}$$
where $P_0, P_1, \ldots, P_m$ are $n \times n$ matrices and $x \in \mathbb{R}^n$ is the optimization variable. The problem is convex if $P_0, P_1, \ldots, P_m$ are all positive semidefinite, and non-convex if these matrices are neither positive nor negative semidefinite. In general, solving a QCQP is an NP-hard problem. As we can see, the above QCQP is non-convex, so we cannot compute the pure state optimal robust bound efficiently, as we can in the mixed state case by SDP. However, there are numerical tools designed to solve QCQPs; like SDP solvers, they not only compute the minimal value but also output a corresponding optimal solution $|\varphi\rangle$. Some methods have been developed to approximately solve non-convex QCQPs in a reasonable time. For example, non-convex QCQPs with non-positive off-diagonal elements can be exactly solved by SDP relaxations [45]. Therefore, there may exist a polynomial-time algorithm to solve this specific form of QCQP, and we leave this problem for future research.
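The convexity condition stated above can be checked numerically for a given instance. The following sketch tests whether every matrix P_0, ..., P_m is positive semidefinite by inspecting the eigenvalues of its symmetric part; the function names and the two toy instances are ours, and the check covers only the sufficient condition for convexity named in the text.

```python
import numpy as np

def is_psd(matrix, tol=1e-9):
    """True if the symmetric part of a real square matrix is PSD."""
    sym = (matrix + matrix.T) / 2      # x^T P x depends only on this part
    return bool(np.linalg.eigvalsh(sym).min() >= -tol)

def qcqp_is_convex(p_matrices):
    """Sufficient check: every P_0, ..., P_m is positive semidefinite."""
    return all(is_psd(p) for p in p_matrices)

convex_case = [np.eye(2), np.array([[2.0, 0.0], [0.0, 1.0]])]
nonconvex_case = [np.eye(2), np.array([[1.0, 0.0], [0.0, -1.0]])]
print(qcqp_is_convex(convex_case), qcqp_is_convex(nonconvex_case))
```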

E Training Models
Algorithm 1
Require: A is a well-trained quantum classifier, ε < 1 is a real number, (ρ, l) is an element of the training dataset of A
Ensure: true indicates ρ is ε-robust, or false with an adversarial example σ indicates ρ is not ε-robust
1: for each k ∈ C and k ≠ l do
2:     By an SDP solver, compute $\delta_k$ with an optimal state $\sigma_k$ in the SDP of Theorem 2
3: end for
4: Let $\delta = \min_k \delta_k$ and $k^* = \arg\min_k \delta_k$
5: if δ > ε then
6:     return true
7: else
8:
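The control flow of the listing above can be sketched in a few lines. Here `solve_min_distance` is a hypothetical callable standing in for the SDP (or QCQP) solver call of Line 2, and the content of the final branch, which is cut off in the listing, is assumed from the Ensure clause (false together with an adversarial example). The toy solver returns fixed made-up values purely to exercise both branches.

```python
def verify_state(rho, label, classes, eps, solve_min_distance):
    """Return (True, None) if rho is eps-robust, else (False, adversarial).

    solve_min_distance(rho, k) is a hypothetical callable standing in for the
    SDP (or QCQP) solver call: it returns (delta_k, sigma_k), the smallest
    distance from rho to a state classified as k and a state attaining it.
    """
    best_delta, best_sigma = float("inf"), None
    for k in classes:
        if k == label:
            continue
        delta_k, sigma_k = solve_min_distance(rho, k)
        if delta_k < best_delta:
            best_delta, best_sigma = delta_k, sigma_k
    if best_delta > eps:
        return True, None
    # Assumption: the truncated else-branch returns the optimal state.
    return False, best_sigma

# Toy stand-in solver with fixed answers, for illustration only.
def fake_solver(rho, k):
    return {1: (0.004, "sigma_1"), 2: (0.0007, "sigma_2")}[k]

print(verify_state("rho", 0, [0, 1, 2], 0.0005, fake_solver))
print(verify_state("rho", 0, [0, 1, 2], 0.001, fake_solver))
```

Passing the solver as a parameter mirrors the remark that the pure state variant only swaps the SDP solver for a QCQP solver.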

Fig. 2: Training model of quantum bits classification: the left figure shows the samples of the quantum training dataset represented on the Bloch sphere. Samples are divided into two categories, marked by red and yellow, respectively. The vectors are the states around which the samples were taken. The first part of the right figure is a parameterized rotation gate, whose job is to remove the superpositions in the quantum data. The second part is a measurement M along the Z-axis of the Bloch sphere, converting the quantum data into classes.

Fig. 3: Two training states and their adversarial examples generated by Algorithm 2 with a QCQP solver: the first column shows 28 × 28 benign images from MNIST; the second column shows the two downscaled 16 × 16 grayscale images; the third column shows the grayscale difference between the benign and adversarial images; the last column shows images decoded from the adversarial examples found by Algorithm 2.

Fig. 4: (a) The phase diagram obtained by our trained QCNN model for input size N = 6 spins.(b) Our QCNN circuit model.

Fig. 5: (a) The circuit generating cluster state.(b) The classification model for cluster excitation detection.

Table 1: Verification Results of Quantum Bits Classification

Table 2: Verification Results of Quantum Phase Recognition

Table 3: Verification Results of Cluster Excitation Detection

Table 4: Verification Results of the Classification of MNIST
and 0.0003, the under-approximation of the robust accuracy is exactly the same as the true value. Even for the last case of ε = 0.0004, the difference is only 0.1%. Furthermore, the tables show that the verification time of Algorithm 2 increases with the value of ε, while the running time of the method based on the robust bound is almost unchanged. This is because the former algorithm uses an SDP or QCQP solver to search for adversarial examples of the potential non-robust states picked out by the robust bound, and the number of these states grows with the value of ε. The counter-examples detected by the algorithm confirm that our robustness framework is effective. For instance, see Fig. 3 for two visualized adversarial examples generated by Algorithm 2 with a QCQP solver. As we can see, the benign and adversarial images are perceptually similar. This also shows that our robustness verification algorithm can detect not only quantum but also classical adversarial examples.
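The robust accuracies quoted for ε = 0.001 in Table 1 follow from simple counting over the 800 training points. As a short worked example (the function and variable names are ours), the bound of Lemma 1 flags 95 points and Algorithm 2 confirms 80 of them as non-robust, which gives the under-approximation and the exact value respectively:

```python
def robust_accuracy(total, non_robust):
    """Percentage of dataset points that are robust."""
    return 100.0 * (total - non_robust) / total

total_points = 800          # training points in the example
flagged_by_bound = 95       # potential non-robust states from Lemma 1
confirmed_non_robust = 80   # non-robust states confirmed by Algorithm 2

under_approx = robust_accuracy(total_points, flagged_by_bound)
exact = robust_accuracy(total_points, confirmed_non_robust)
print(f"{under_approx:.3f}% <= {exact:.2f}%")   # prints "88.125% <= 90.00%"
```

The under-approximation 88.125% appears as 88.13% in the text after rounding, and the solver only has to be called on the 95 flagged points rather than on all 800.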