Analysis of classifiers’ robustness to adversarial perturbations
Abstract
The goal of this paper is to analyze the intriguing instability of classifiers to adversarial perturbations (Szegedy et al., in: International conference on learning representations (ICLR), 2014). We provide a theoretical framework for analyzing the robustness of classifiers to adversarial perturbations, and show fundamental upper bounds on the robustness of classifiers. Specifically, we establish a general upper bound on the robustness of classifiers to adversarial perturbations, and then illustrate the obtained upper bound on two practical classes of classifiers, namely the linear and quadratic classifiers. In both cases, our upper bound depends on a distinguishability measure that captures the notion of difficulty of the classification task. Our results for both classes imply that in tasks involving small distinguishability, no classifier in the considered set will be robust to adversarial perturbations, even if a good accuracy is achieved. Our theoretical framework moreover suggests that the phenomenon of adversarial instability is due to the low flexibility of classifiers, compared to the difficulty of the classification task (captured mathematically by the distinguishability measure). We further show the existence of a clear distinction between the robustness of a classifier to random noise and its robustness to adversarial perturbations. Specifically, the former is shown to be larger than the latter by a factor that is proportional to \(\sqrt{d}\) (with d being the signal dimension) for linear classifiers. This result gives a theoretical explanation for the discrepancy between the two robustness properties in high dimensional problems, which was empirically observed by Szegedy et al. in the context of neural networks. We finally show experimental results on controlled and real-world data that confirm the theoretical analysis and extend its spirit to more complex classification schemes.
Keywords
Adversarial examples, Classification robustness, Random noise, Instability, Deep networks
1 Introduction
State-of-the-art deep networks have recently been shown to be surprisingly unstable to adversarial perturbations (Szegedy et al. 2014). Unlike random noise, adversarial perturbations are minimal perturbations that are sought to switch the estimated label of the classifier. On vision tasks, the results of Szegedy et al. (2014) have shown that perturbations that are hardly perceptible to the human eye are sufficient to change the decision of a deep network, even if the classifier has a performance that is close to the human visual system. This surprising instability raises interesting theoretical questions that we initiate in this paper. What causes classifiers to be unstable to adversarial perturbations? Are deep networks the only classifiers that have such unstable behaviour? Is it at all possible to design training algorithms to build deep networks that are robust or is the instability to adversarial noise an inherent feature of all deep networks? Can we quantify the difference between random noise and adversarial noise? Providing theoretical answers to these questions is crucial in order to achieve the goal of building classifiers that are robust to adversarial hostile perturbations.
In this paper, we introduce a framework to formally study the robustness of classifiers to adversarial perturbations in the binary setting. We provide a general upper bound on the robustness of classifiers to adversarial perturbations, and then illustrate and specialize the obtained upper bound for the families of linear and quadratic classifiers. In both cases, our results show the existence of a fundamental limit on the robustness to adversarial perturbations. This limit is expressed in terms of a distinguishability measure between the classes, which depends on the considered family of classifiers. Specifically, for linear classifiers, the distinguishability is defined as the distance between the means of the two classes, while for quadratic classifiers, it is defined as the distance between the matrices of second order moments of the two classes. For both classes of functions, our upper bound on the robustness is valid for all classifiers in the family independently of the training procedure, and we see the fact that the bound is independent of the training procedure as a strength. This result has the following important implication: in difficult classification tasks involving a small value of distinguishability, any classifier in the set with low misclassification rate is vulnerable to adversarial perturbations. Importantly, the distinguishability parameter related to quadratic classifiers is much larger than that of linear classifiers for many datasets of interest, which suggests the existence of robust classifiers in flexible classification families, even for tasks where no linear robust and accurate classifiers exist (provably). We further compare the robustness to adversarial perturbations of linear classifiers to the more traditional notion of robustness to random uniform noise, where perturbation vectors are sampled uniformly at random from a sphere. 
The latter robustness is shown to be larger than the former by a factor of \(\sqrt{d}\) (with d the dimension of input signals), thereby showing that in high dimensional classification tasks, linear classifiers can be robust to random noise even for small values of the distinguishability. We illustrate the newly introduced concepts and our theoretical results on a running example used throughout the paper. We complement our theoretical analysis with experimental results, and show that the intuition obtained from the theoretical analysis also holds for more complex classifiers.
The phenomenon of adversarial instability has recently attracted a lot of attention from the deep network community. In Szegedy et al. (2014), the adversarial robustness of different classifiers is measured as the magnitude of the perturbation required to misclassify a data point. State-of-the-art classifiers are moreover shown to achieve small robustness. Several attempts have then been made to make deep networks robust to adversarial perturbations (Chalupka et al. 2014; Gu and Rigazio 2014; Bendale and Boult 2016), while more advanced techniques are proposed to defeat the classifiers (Carlini and Wagner 2016). Moreover, a distinct but related phenomenon has been explored in Nguyen et al. (2014). Closer to our work, the authors of Goodfellow et al. (2015) provided an empirical explanation of the phenomenon of adversarial instability, and designed an efficient method to find adversarial examples. Specifically, contrary to the original explanation provided in Szegedy et al. (2014), the authors argue that it is the “linear” nature of deep nets that causes the adversarial instability. Instead, our paper adopts a rigorous mathematical perspective to the problem of adversarial instability and shows that adversarial instability is due to the low flexibility of classifiers compared to the difficulty of the classification task.
Our work should not be confused with works on the security of machine learning algorithms under adversarial attacks (Biggio et al. 2012; Barreno et al. 2006; Dalvi et al. 2004). These works specifically study attacks that manipulate the learning system (e.g., change the decision function by injecting malicious training points), as well as defense strategies to counter these attacks. This setting significantly differs from ours, as we examine the robustness of a fixed classifier to adversarial perturbations (that is, the classifier cannot be manipulated). The stability of learning algorithms has also been defined and extensively studied in Bousquet and Elisseeff (2002), Lugosi and Pawlak (1994). Again, this notion of stability differs from the one studied here, as we are interested in the robustness of fixed classifiers, and not of learning algorithms. The security of machine learning algorithms at test time has also been previously examined in different scenarios, in particular when the adversary has only limited knowledge about the classifier (Biggio et al. 2013; Dekel et al. 2010; Srndic and Laskov 2014). Unlike these papers that provide an empirical assessment (and improvement) of the robustness of classifiers to different types of attacks, the goal of our work is significantly different, as we show fundamental upper bounds on the robustness of classifiers, which cannot be violated by any learning algorithm.
The construction of learning algorithms that achieve robustness of classifiers to data corruption has been an active area of research in machine learning and robust optimization (see e.g., Caramanis et al. 2012 and references therein). For a specific disturbance model on the data samples, the robust optimization approach for constructing robust classifiers seeks to minimize the worst possible empirical error under such disturbances (Lanckriet et al. 2003; Xu et al. 2009). It is shown that, for many disturbance models, the desired objective function can be written as a tractable convex optimization problem. Our work studies the robustness of classifiers from a different perspective; we establish upper bounds on the robustness of classifiers independently of the learning algorithms. That is, using our bounds, we can certify the instability of a class of classifiers to adversarial perturbations, independently of the learning mechanism. In other words, while algorithmic and optimization aspects of robust classifiers have been studied in the above works, we focus on fundamental limits on the adversarial robustness of classifiers that are independent of the learning scheme.
The paper is structured as follows: Sect. 2 introduces the problem setting. In Sect. 3, we introduce a running example that is used throughout the paper. We introduce in Sect. 4 a theoretical framework for studying the robustness to adversarial perturbations. In the following two sections, two case studies are analyzed in detail. The robustness of linear classifiers (to adversarial and random noise) is studied in Sect. 5. In Sect. 6, we study the adversarial robustness of quadratic classifiers. Experimental results illustrating our theoretical analysis are given in Sect. 7. Proofs and additional discussion on the choice of the norms to measure perturbations are finally deferred to the “Appendix”.
2 Problem setting
Quantities of interest in the paper and their dependencies
Quantity  Definition  Dependence 

Risk  \(R(f) = \mathbb {P}_{\mu } (\text {sign} (f(x)) \ne y(x))\)  \(\mu , y, f\) 
Robustness to adversarial perturbations  \(\rho _{\text {adv}} (f) = \mathbb {E}_{\mu } (\varDelta _{\text {adv}}(x; f))\)  \(\mu , f\) 
Robustness to random uniform noise  \(\rho _{\text {unif}, \epsilon } (f) = \mathbb {E}_{\mu } (\varDelta _{\text {unif}, \epsilon } (x;f))\)  \(\mu , f\) 
3 Running example
We introduce in this section a running example used throughout the paper to illustrate the notion of adversarial robustness, and highlight its difference with the notion of risk. We consider a binary classification task on square images of size \(\sqrt{d} \times \sqrt{d}\). Images of class 1 (resp. class −1) contain exactly one vertical line (resp. horizontal line), and a small constant positive number a (resp. negative number \(-a\)) is added to all the pixels of the images. That is, for class 1 (resp. −1) images, background pixels are set to a (resp. \(-a\)), and pixels belonging to the line are equal to \(1+a\) (resp. \(1-a\)). Figure 2 illustrates the classification problem for \(d = 25\). The number of datapoints to classify is \(N = 2 \sqrt{d}\).
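The dataset described above can be generated in a few lines. The following is a minimal sketch, not the authors' code: the function name `make_line_images` is hypothetical, and the value \(a = 0.1/\sqrt{d}\) is borrowed from the illustration in Sect. 5.3.

```python
import numpy as np

def make_line_images(d, a):
    """Return (X, y): all 2*sqrt(d) images of the running example, flattened to length d."""
    side = int(round(np.sqrt(d)))
    if side * side != d:
        raise ValueError("d must be a perfect square")
    images, labels = [], []
    for k in range(side):
        vertical = np.full((side, side), a, dtype=float)    # background pixels equal a
        vertical[:, k] = 1.0 + a                            # class 1: one vertical line
        images.append(vertical.ravel())
        labels.append(1)
    for k in range(side):
        horizontal = np.full((side, side), -a, dtype=float) # background pixels equal -a
        horizontal[k, :] = 1.0 - a                          # class -1: one horizontal line
        images.append(horizontal.ravel())
        labels.append(-1)
    return np.array(images), np.array(labels)

d = 25
X, y = make_line_images(d, a=0.1 / np.sqrt(d))
print(X.shape)  # one row per image: N = 2 * sqrt(d) rows, d columns
```

Note that the two classes differ both in line orientation and in the sign of the constant offset; only the former is the visually meaningful concept.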

Risk and adversarial robustness are two distinct properties of a classifier. While \(R(f_{\text {lin}}) = 0\), \(f_{\text {lin}}\) is definitely not robust to small adversarial perturbations.^{6} This is due to the fact that \(f_{\text {lin}}\) only captures the bias in the images and ignores the orientation of the line.

To capture orientation (i.e., the most visual concept), one has to use a classifier that is flexible enough for the task. Unlike the class of linear classifiers, the class of polynomial classifiers of degree 2 correctly captures the line orientation, for \(d = 4\).

The robustness to adversarial perturbations provides a quantitative measure of the strength of a concept. Since \(\rho _{\text {adv}}(f_{\text {lin}}) \ll \rho _{\text {adv}}(f_{\text {quad}})\), one can confidently say that the concept captured by \(f_{\text {quad}}\) is stronger than that of \(f_{\text {lin}}\), in the sense that the essence of the classification task is captured by \(f_{\text {quad}}\), but not by \(f_{\text {lin}}\) (while they are equal in terms of misclassification rate). In general classification problems, the quantity \(\rho _{\text {adv}} (f)\) provides a natural way to evaluate and compare the learned concept; larger values of \(\rho _{\text {adv}} (f)\) indicate that stronger concepts are learned, for comparable values of the risk.
In the next sections, our goal is to quantify how large the robustness to adversarial perturbations can be, for fixed classification families (e.g., family of linear classifiers). To do so, we establish upper bounds on the adversarial robustness \(\rho _{\text {adv}} (f)\) in terms of the classifier risk R(f) for all classifiers in the family. These learning-independent limits show that it is not possible to achieve a large robustness jointly with a small risk for many classification tasks of interest, independently of the training algorithm used to choose f.
4 Upper limit on the adversarial robustness
We now introduce our theoretical framework for analyzing the robustness to adversarial perturbations. We first present a key assumption on the classifier f for the analysis of adversarial robustness.
Bounds of the form Eq. (6) have been established for various classes of functions since the early work of Łojasiewicz (1961) in algebraic geometry and have found applications in areas such as mathematical optimization (Pang 1997; Lewis and Pang 1998). For example, Łojasiewicz (1961) and later Luo and Pang (1994) have shown that, quite remarkably, assumption (A) holds for the general class of analytic functions.^{7} In Ng and Zheng (2003), (A) is shown to hold with \(\gamma =1\) for piecewise linear functions. In Luo and Luo (1994), error bounds on polynomial systems are studied. Proving inequality (6) with explicit constants \(\tau \) and \(\gamma \) for different classes of functions is still an active area of research (Li et al. 2014). In Sects. 5 and 6, we provide examples of function classes for which (A) holds, and explicit formulas for the parameters \(\tau \) and \(\gamma \).
The following result establishes a general upper bound on the robustness to adversarial perturbations:
Lemma 1
The proof can be found in “Appendix A.1”. The above result provides an upper bound on the adversarial robustness that depends on the risk of the classifier, as well as a weighted difference between the expectations of the classifier values computed on distributions \(\mu _1\) and \(\mu _{-1}\). This result is general, as we only assume that f satisfies assumption (A). In the next two sections, we apply Lemma 1 to two classes of classifiers, and derive interpretable upper bounds in terms of a distinguishability measure (that depends only on the dataset) which quantifies the notion of difficulty of a classification task. Studying the general result in Lemma 1 through two practical classes of classifiers shows the implications of such a fundamental limit on the adversarial robustness, and illustrates the methodology for deriving class-specific and practical upper bounds on adversarial robustness from the general upper bound.
5 Robustness of linear classifiers to adversarial and random perturbations
The goal of this section is twofold; first, we specialize Lemma 1 to the class of linear functions, and derive interpretable upper bounds on the robustness of classifiers to adversarial perturbations (Sect. 5.1). Then, we derive a formal relation between the robustness of linear classifiers to adversarial perturbations, and the robustness to random uniform noise (Sect. 5.2).
5.1 Adversarial perturbations
We define the classification function \(f(x) = w^T x + b\). Note that any linear classifier for which \(|b| > M \Vert w \Vert _2\) is a trivial classifier that assigns the same label to all points, where we recall that M is defined such that \(\mathbb {P}_{\mu } (\Vert x \Vert _2 \le M) = 1\). We therefore assume that \(|b| \le M \Vert w \Vert _2\).
We first show that the family of linear classifiers satisfies assumption (A), with explicit parameters \(\tau \) and \(\gamma \).
Lemma 2
Assumption (A) holds for linear classifiers \(f(x) = w^T x + b\) with \(\tau = 1/\Vert w \Vert _2\) and \(\gamma = 1\).
Proof
Let x be such that \(f(x) \ge 0\), and the goal is to prove that \(\text {dist}(x, S_{-}) \le \tau f(x)^{\gamma }\) (the other inequality can be handled in a similar way). We have \(f(x) = w^T x + b\). Observe that \(\text {dist}(x, S_{-}) = \min _{z} \{ \Vert x - z \Vert _2: w^T z + b \le 0 \}\), which corresponds to the distance between x and its projection onto the affine plane \(\{z: w^T z + b = 0\}\). Hence, \(\text {dist}(x, S_{-}) = f(x) / \Vert w \Vert _2 \implies \tau = 1/\Vert w\Vert _2, \gamma =1\). \(\square \)
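The closed form used in this proof is easy to check numerically. The sketch below is illustrative only: the helper name `adversarial_perturbation_linear` and the values of `w`, `b` and `x` are not taken from the paper. It computes the smallest \(\ell _2\) perturbation r that places \(x + r\) on the hyperplane \(\{z: w^T z + b = 0\}\), and compares its norm with \(|f(x)| / \Vert w \Vert _2\).

```python
import numpy as np

def adversarial_perturbation_linear(x, w, b):
    """Smallest l2 perturbation r such that f(x + r) = 0 for f(x) = w^T x + b."""
    f_x = float(w @ x + b)
    # Move along -sign(f(x)) * w, just far enough to reach the hyperplane.
    return -f_x * w / float(w @ w)

# Illustrative values (assumptions, not from the paper).
w = np.array([3.0, 4.0])
b = -5.0
x = np.array([3.0, 4.0])                      # f(x) = 9 + 16 - 5 = 20

r = adversarial_perturbation_linear(x, w, b)
dist = np.linalg.norm(r)
closed_form = abs(w @ x + b) / np.linalg.norm(w)
print(dist, closed_form)                      # both are 20 / 5, up to rounding
```

The perturbed point satisfies \(f(x + r) = 0\), which is the condition assumed in the footnotes to flip the estimated label.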
Using Lemma 1, we now derive an interpretable upper bound on the robustness to adversarial perturbations. In particular, the following theorem bounds \(\rho _{\text {adv}} (f)\) from above in terms of the first moments of the distributions \(\mu _1\) and \(\mu _{-1}\), and the classifier’s risk:
Theorem 1
Proof
 i. \(w^T \left( p_1 \mathbb {E}_{\mu _1} (x) - p_{-1} \mathbb {E}_{\mu _{-1}} (x) \right) \le \Vert w \Vert _2 \Vert p_1 \mathbb {E}_{\mu _1} (x) - p_{-1} \mathbb {E}_{\mu _{-1}} (x) \Vert _2\) using the Cauchy-Schwarz inequality,
 ii. \(b (p_1 - p_{-1}) \le M \Vert w \Vert _2 |p_{1} - p_{-1}|\) using the assumption \(|b| \le M \Vert w\Vert _2\),
 iii. \(\Vert f \Vert _{\infty } = \max _{x: \Vert x \Vert _2 \le M} \{ |w^T x + b| \} \le 2 M \Vert w \Vert _2\).
When \(p_1 = p_{-1} = 1/2\), and the intercept \(b=0\), inequality (iii) can be tightened to \(\Vert f \Vert _{\infty } \le M \Vert w \Vert _2\), and directly leads to the stated result Eq. (8). \(\square \)
Our upper bound on \(\rho _{\text {adv}} (f)\) depends on the difference of means \(\Vert \mathbb {E}_{\mu _1} (x) - \mathbb {E}_{\mu _{-1}} (x) \Vert _2\), which measures the distinguishability between the classes. Note that this term is classifier-independent, and is only a property of the classification task. The only dependence on f in the upper bound is through the risk R(f). Thus, in classification tasks where the means of the two distributions are close (i.e., \(\Vert \mathbb {E}_{\mu _1} (x) - \mathbb {E}_{\mu _{-1}} (x) \Vert _2\) is small), any linear classifier with small risk will necessarily have a small robustness to adversarial perturbations. Note that the upper bound logically increases with the risk, as there clearly exist robust linear classifiers that achieve high risk (e.g., constant classifier). Figure 4a pictorially represents the \(\rho _{\text {adv}}\) versus R diagram as predicted by Theorem 1. Each linear classifier is represented by a point on the \(\rho _{\text {adv}}\)–R tradeoff diagram, and our result shows the existence of a region that linear classifiers cannot attain.
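Because the distinguishability term depends only on the data, it can be estimated directly from a labeled sample. The following sketch assumes the prior-weighted form \(\Vert p_1 \mathbb {E}_{\mu _1} (x) - p_{-1} \mathbb {E}_{\mu _{-1}} (x) \Vert _2\) that appears in the proof of Theorem 1, with priors and means replaced by their empirical counterparts; the function name and the toy data are hypothetical.

```python
import numpy as np

def linear_distinguishability(X, y):
    """Empirical || p_1 E[x | y=1] - p_{-1} E[x | y=-1] ||_2 for labels in {1, -1}."""
    pos, neg = X[y == 1], X[y == -1]
    p_pos, p_neg = len(pos) / len(X), len(neg) / len(X)   # empirical class priors
    return np.linalg.norm(p_pos * pos.mean(axis=0) - p_neg * neg.mean(axis=0))

# Toy data (illustrative): class means are (2, 0) and (0, 2), priors are 1/2 each.
X_toy = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
y_toy = np.array([1, 1, -1, -1])
print(linear_distinguishability(X_toy, y_toy))   # norm of (1, -1), i.e. sqrt(2)
```

A small value of this quantity signals, via Theorem 1, that no linear classifier can be both accurate and robust on the task, whatever the training procedure.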
5.2 Random uniform noise
We now examine the robustness of linear classifiers to random uniform noise. The following theorem compares the robustness of linear classifiers to random uniform noise with their robustness to adversarial perturbations.
Theorem 2
The proof can be found in “Appendix A.2”. In words, \(\rho _{\text {unif}, \epsilon } (f)\) behaves as \(\sqrt{d} \rho _{\text {adv}} (f)\) for linear classifiers (for constant \(\epsilon \)). Linear classifiers are therefore more robust to random noise than adversarial perturbations, by a factor of \(\sqrt{d}\). In typical high dimensional classification problems, this shows that a linear classifier can be robust to random noise even if \(\Vert \mathbb {E}_{\mu _1} (x) - \mathbb {E}_{\mu _{-1}} (x) \Vert _2\) is small. Note moreover that our result is tight for \(\epsilon = 0\), as we get \(\rho _{\text {unif}, 0} (f)= \rho _{\text {adv}} (f)\).
Our results can be put in perspective with the empirical results of Szegedy et al. (2014), which showed a large gap between the two notions of robustness on neural networks. Our analysis provides a confirmation of this high dimensional phenomenon on linear classifiers.
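One way to build intuition for the \(\sqrt{d}\) factor (this is an informal illustration, not the proof in Appendix A.2) is to note that a perturbation n changes a linear classifier by \(w^T n\), so only the component of n along \(w / \Vert w \Vert _2\) matters. For a direction drawn uniformly from the unit sphere, that component has magnitude of order \(1/\sqrt{d}\). The Monte Carlo sketch below, with hypothetical names and sample sizes, estimates it.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_projection(d, n_samples=1000):
    """Average |<n, u>| for n uniform on the unit sphere of R^d and a fixed unit vector u."""
    g = rng.standard_normal((n_samples, d))
    n = g / np.linalg.norm(g, axis=1, keepdims=True)  # normalized Gaussians are uniform on the sphere
    # By rotational invariance, u can be taken as the first basis vector.
    return np.abs(n[:, 0]).mean()

for d in (25, 400, 2500):
    # After rescaling by sqrt(d), the value stays of order 1 as d grows.
    print(d, mean_abs_projection(d) * np.sqrt(d))
```

Since a random perturbation of norm \(\eta \) moves the point towards the decision boundary by only about \(\eta / \sqrt{d}\), its norm must be roughly \(\sqrt{d}\) times larger than that of the adversarial perturbation to reach the boundary.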
5.3 Illustration of the results on the running example
We now focus on the robustness to uniform random noise of \(f_{\text {lin}}\). For various values of d, we compute the upper and lower bounds on the robustness to random uniform noise (Theorem 2) of \(f_{\text {lin}}\), where we fix \(\epsilon \) to 0.01. In addition, we compute a simple empirical estimate \(\widehat{\rho }_{\text {unif}, \epsilon }\) of the robustness to random uniform noise of \(f_{\text {lin}}\) (see Sect. 7 for details on the computation of this estimate). The results are illustrated in Fig. 5. While the adversarial noise robustness is constant with the dimension (equal to 0.1, as \(\rho _{\text {adv}} (f_{\text {lin}}) = \sqrt{d} a\) and \(a = 0.1/\sqrt{d}\)), the robustness to random uniform noise increases with d. For example, for \(d = 2500\), the value of \(\rho _{\text {unif}, \epsilon }\) is at least 15 times larger than the adversarial robustness \(\rho _{\text {adv}}\). In high dimensions, a linear classifier is therefore much more robust to random uniform noise than adversarial noise.
6 Adversarial robustness of quadratic classifiers
In this section, we derive specialized upper bounds on the robustness to adversarial perturbations of quadratic classifiers using Lemma 1.
6.1 Analysis of adversarial perturbations
We first show that the assumption (A) is satisfied for quadratic classifiers, and derive explicit formulas for \(\tau \) and \(\gamma \).
Lemma 3
Assumption (A) holds for the class of quadratic classifiers \(f(x) = x^T A x\) where \(\lambda _{\min } (A) < 0\), \(\lambda _{\max } (A) > 0\) with \(\tau = \max (|\lambda _{\min } (A)|^{-1/2}, |\lambda _{\max } (A)|^{-1/2})\), and \(\gamma = 1/2\).
Proof
The following result builds on Lemma 1 and bounds the adversarial robustness of quadratic classifiers as a function of the second order moments of the distribution and the risk.
Theorem 3
Proof
 i. Note first that
$$\begin{aligned} p_1 \mathbb {E}_{\mu _1} (x^T A x) - p_{-1} \mathbb {E}_{\mu _{-1}} (x^T A x)&= \sum _{i,j} a_{i,j} p_1 \mathbb {E}_{\mu _1} (x_i x_j) - \sum _{i,j} a_{i,j} p_{-1} \mathbb {E}_{\mu _{-1}} (x_i x_j) \\&= p_1 \text {Trace} (A^T C_1) - p_{-1} \text {Trace} (A^T C_{-1}) \\&= \text {Trace} (A^T (p_1 C_1 - p_{-1} C_{-1})) \\&= \langle A, p_1 C_1 - p_{-1} C_{-1} \rangle , \end{aligned}$$
where we have used the canonical inner product for matrices \(\langle Y, Z \rangle = \text {Trace} (Y^T Z)\). Using Hölder’s inequality for matrices (Bhatia 2013), we have \(\langle A, p_1 C_1 - p_{-1} C_{-1} \rangle \le \Vert A \Vert \Vert p_1 C_1 - p_{-1} C_{-1} \Vert _{*}\), where \(\Vert \cdot \Vert \) and \(\Vert \cdot \Vert _{*}\) denote respectively the spectral and nuclear matrix norms.
 ii. \(|f(x)| = |x^T A x| \le \Vert A \Vert \Vert x \Vert _2^2 \le \Vert A \Vert M^2\),
 iii. \(\Vert A \Vert ^{1/2} \tau = \max (|\lambda _{\min }(A)|, |\lambda _{\max }(A)|)^{1/2} \max (|\lambda _{\min } (A)|^{-1/2}, |\lambda _{\max } (A)|^{-1/2}) \le \sqrt{K}\).
In words, the upper bound on the adversarial robustness depends on a distinguishability measure, defined by \(\Vert C_1 - C_{-1} \Vert _{*}\), and the classifier’s risk. In difficult classification tasks, where \(\Vert C_1 - C_{-1} \Vert _{*}\) is small, all quadratic classifiers with low risk that satisfy our assumptions in Eqs. (12, 13) are not robust to adversarial perturbations.
It should be noted that, while the distinguishability is measured with the distance between the means of the two distributions in the linear case, it is defined here as the difference between the second order moments matrices \(\Vert C_1 - C_{-1} \Vert _{*}\). Therefore, in classification tasks involving two distributions with close means, and different second order moments, any zero-risk linear classifier will not be robust to adversarial noise, while zero-risk and robust quadratic classifiers are a priori possible according to our upper bound in Theorem 3. This suggests that robustness to adversarial perturbations can be larger for more flexible classifiers, for comparable values of the risk.
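The contrast between the two distinguishability measures can be seen on a toy dataset. The sketch below is an illustration under stated assumptions: the function name and the data are hypothetical, \(C_{\pm 1}\) is taken to be the matrix of second order moments \(\mathbb {E}_{\mu _{\pm 1}} (x x^T)\) as in the proof of Theorem 3, and the prior-weighted form \(\Vert p_1 C_1 - p_{-1} C_{-1} \Vert _{*}\) is estimated empirically.

```python
import numpy as np

def quadratic_distinguishability(X, y):
    """Empirical || p_1 C_1 - p_{-1} C_{-1} ||_* with C_c = E[x x^T | y = c]."""
    pos, neg = X[y == 1], X[y == -1]
    p_pos, p_neg = len(pos) / len(X), len(neg) / len(X)   # empirical class priors
    C_pos = pos.T @ pos / len(pos)                        # second order moment matrices
    C_neg = neg.T @ neg / len(neg)
    return np.linalg.norm(p_pos * C_pos - p_neg * C_neg, ord="nuc")

# Toy data (illustrative): both classes have zero mean but different second moments.
X_toy2 = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
y_toy2 = np.array([1, 1, -1, -1])

mean_gap = np.linalg.norm(X_toy2[y_toy2 == 1].mean(axis=0) - X_toy2[y_toy2 == -1].mean(axis=0))
print(mean_gap)                                      # 0.0: the class means coincide
print(quadratic_distinguishability(X_toy2, y_toy2))  # nuclear norm of diag(0.5, -0.5)
```

On this data the class means coincide, so the linear distinguishability vanishes, while the second order moments still separate the classes.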
Finally, it is important to emphasize that the above result does not show that any linear classifier is always less robust than any quadratic classifier, for a fixed problem. In contrast, we show that for a fixed problem, the upper bound on \(\rho _{\text {adv}} (f)\) obtained for the family of linear classifiers is usually much smaller than that of quadratic classifiers (for similar accuracy). This therefore suggests that, while for many problems of interest, it is not possible to find robust (and accurate) linear classifiers, we can find higher-order classifiers that achieve large robustness (and accuracy).
6.2 Illustration of the results on the running example
7 Experimental results
7.1 Setting
In this section, we illustrate our results on practical classification examples. Specifically, through experiments on real data, we seek to confirm the identified limit on the robustness of classifiers, and we show the large gap between adversarial and random robustness on real data. We also study more general classifiers to suggest that the trends obtained with our theoretical results are not limited to linear and quadratic classifiers.
7.2 Binary classification using SVM
We perform experiments on several classifiers: linear SVM (denoted LSVM), SVM with polynomial kernels of degree q (denoted polySVM (q)), and SVM with RBF kernel with a width parameter \(\sigma ^2\) (RBFSVM(\(\sigma ^2\))). To train the classifiers, we use the efficient Liblinear (Fan et al. 2008) and LibSVM (Chang and Lin 2011) implementations, and we fix the regularization parameters using a cross-validation procedure.
Training and testing accuracy of different models, and robustness to adversarial noise for the MNIST task
Model  Train error (%)  Test error (%)  \(\widehat{\rho }_{\text {adv}}\)  \(\widehat{\rho }_{\text {unif}, \epsilon }\) 

LSVM  4.8  7.0  0.08  0.97 
polySVM(2)  0  1  0.19  2.15 
polySVM(3)  0  0.6  0.24  2.51 
RBFSVM(1)  0  1.1  0.16  – 
RBFSVM(0.1)  0  0.5  0.32  – 
Training and testing accuracy of different models, and robustness to adversarial noise for the CIFAR task
Model  Train error (%)  Test error (%)  \(\widehat{\rho }_{\text {adv}}\)  \(\widehat{\rho }_{\text {unif}, \epsilon }\) 

LSVM  14.5  21.3  0.04  0.94 
polySVM(2)  4.2  15.3  0.03  0.73 
polySVM(3)  4  15  0.04  0.89 
RBFSVM(1)  7.6  16  0.04  – 
RBFSVM(0.1)  0  13.1  0.06  – 
The parameter \(\kappa \), and distinguishability measures for the two classification tasks
Quantity  Definition  Digits  Natural images 

Distance between classes  \(\kappa \) [see Eq. (14)]  0.72  0.39 
Distinguishability (linear class.)  \(\Vert p_1 \mathbb {E}_{\mu _1} (x) - p_{-1} \mathbb {E}_{\mu _{-1}} (x) \Vert _2\)  0.14  0.06 
Distinguishability (quadratic class.)  \(2 \sqrt{K \Vert p_{1} C_1 - p_{-1} C_{-1} \Vert _{*}}\)  1.4  0.87 
The instability of all classifiers to adversarial perturbations on this task suggests that the essence of the classification task was not correctly captured by these classifiers, even if a fairly good test accuracy is reached. To reach better robustness, two possibilities exist: use a more flexible family of classifiers (as our theoretical results suggest that more flexible families of classifiers achieve better robustness), or use a better training algorithm for the tested nonlinear classifiers. The latter solution seems possible, as the theoretical limit for quadratic classifiers suggests that there is still room to improve the robustness of these classifiers.
7.3 Multiclass classification using CNN
Since our theoretical results suggest that more flexible classifiers achieve better robustness to adversarial perturbations in the binary case, we now explore empirically whether the same intuitions hold in scenarios that depart from the theory in two different ways: (i) we consider multiclass classification problems, and (ii) we consider convolutional neural network architectures. While classifiers’ flexibility is relatively well quantified for polynomial classifiers by the degree of the polynomials, this is not straightforward to do for neural network architectures. In this section, we examine the effect of breadth and depth on the robustness to adversarial perturbations of classifiers.
We observe first that increasing the depth of the network leads to a significant increase in the robustness to adversarial perturbations, especially from 1 to 2 layers. The depth of a neural network has an important impact on the robustness of the classifier, just like the degree of a polynomial classifier is an important factor for the robustness. Going from 2 to 3 layers however seems to have a marginal effect on the robustness. It should be noted that, despite the increase of the robustness with the depth, the normalized robustness computed for all classifiers is relatively small, which suggests that none of these classifiers is really robust to adversarial perturbations. Note also that the results in Fig. 8a showing an increase of the robustness with the depth are in line with recent results showing that depth provides robustness to adversarial geometric transformations (Fawzi and Frossard 2015). In Fig. 8b, we show the effect of the number of feature maps in the CNN (for a one layer CNN) on the estimated normalized robustness to adversarial perturbations. Unlike the effect of depth, we observe that the number of feature maps has barely any effect on the robustness to adversarial perturbations. Finally, a comparison of the normalized robustness measures of very deep networks VGG16 and VGG19 (Simonyan and Zisserman 2014) on ImageNet shows that these two networks behave very similarly in terms of robustness (both achieve a normalized robustness of \(3 \cdot 10^{-3}\)). This experiment, along with the experiment in Fig. 8a, empirically suggests that adding layers on top of a shallow network helps in terms of adversarial robustness, but if the depth of the network is already sufficiently large, then adding layers only moderately changes that robustness.
8 Discussion and perspectives
In this paper, we provided a quantitative analysis of the robustness of classifiers to adversarial perturbations, and showed the existence of upper limits on the adversarial robustness of classifiers. We showed that for the family of linear classifiers, the established limit is very small for most problems of interest. Hence, linear classifiers are usually not robust to adversarial noise (even though robustness to random noise might be achieved). Linear classifiers are, however, seldom used directly on the input/pixel space. Instead, the features of the image (e.g., SIFT features Lowe 2004 or features resulting from the first layers of a convolutional neural network) are first computed, and only then fed to a linear classifier. While our bounds (in Sect. 5) can be directly applied in the feature space, such results would be difficult to interpret as they do not translate easily to the input space. In fact, the feature mapping is usually non-bijective (and non-smooth), which implies that the robustness of the linear classifier might significantly differ from the robustness of the overall classification system. Besides, the \(\ell _2\) metric in the feature space might not be well suited to measuring the robustness of the system.
Towards the goal of studying more realistic classifiers, we studied the robustness of quadratic classifiers, and provided a general result that is (in theory) applicable to a large set of classification functions (Lemma 1). Our results for quadratic classifiers show that the limit on the robustness for the family of quadratic classifiers is usually larger than for linear classifiers, which gives hope that classifiers robust to adversarial perturbations can be obtained. In fact, by using an appropriate training procedure, it might be possible to get closer to the theoretical bound. For general nonlinear classifiers (e.g., neural networks), designing training procedures that specifically take robustness into account during learning is an important direction for future work. We also believe that applying our general upper bound in Lemma 1 to derive explicit upper bounds that are specific to, e.g., deep neural networks is an important direction for future work. To do that, we believe that it is important to derive explicitly the parameters \((\tau , \gamma )\) of assumption (A) for the class of functions under consideration. Even though this problem is still open, results from algebraic geometry seem to suggest that establishing such bounds might be possible for general classes of functions (e.g., piecewise linear functions). In addition, experimental results suggest that, unlike the breadth of the neural network, the depth plays a crucial role in the adversarial robustness. Identifying an upper bound on the adversarial robustness of deep neural networks in terms of the depth of the network would be a great step towards a better understanding of such systems.
Footnotes
 1.
The label is assumed here to be a nonstochastic function of the datapoints.
 2.
We make the assumption that a perturbation r that satisfies the equality \(f(x+r) = 0\) flips the estimated label of x.
 3.
In that respect, our definition slightly differs from the one proposed in Szegedy et al. (2014), which defines the robustness to adversarial perturbations as the average of the norms of the minimal perturbations required to misclassify all datapoints. As our notion of robustness is larger (since \(\varDelta _{\text {adv}} (x; f)\) is positive even when x is misclassified), the upper bounds derived in our paper also directly apply to the definition of robustness in Szegedy et al. (2014).
 4.
To see this, note that the average of class 1 images is equal to the average of class −1 images. Hence, the convex hulls of the two sets of points intersect, which shows that these sets are not linearly separable.
 5.
We postpone the detailed analysis of \(f_{\text {quad}}\) to Sect. 6.
 6.
The opposite is also possible, since a constant classifier (e.g., \(f(x) = 1\) for all x) is clearly robust to perturbations, but does not achieve good accuracy.
 7.
It should be noted that such results assume that the function is defined on a compact set. When this condition is not met, it is easy to find functions where assumption (A) is not satisfied.
 8.
The exact curve is computed using a brute-force approach that enumerates all possible partitions of the data points with linear classifiers.
 9.
This procedure is not guaranteed to provide the optimal solution (for arbitrary classifiers f), as the problem is clearly non-convex. Strictly speaking, the optimization procedure is only guaranteed to provide an upper bound on \(\varDelta _{\text {adv}} (x;f)\).
 10.
We compute the robustness to uniform random noise of all classifiers except RBF-SVM, as this classifier is often asymmetric: it assigns “small pockets” of the input space to one of the classes, and the rest of the space to the other class. In these cases, the robustness to uniform random noise can be equal to infinity for one of the classes, for a given \(\epsilon \).
 11.
More precisely, we report the normalized robustness \(\frac{1}{m} \sum _{i=1}^m \frac{\hat{\varDelta }_{\text {adv}} (x_i; f)}{\Vert x_i \Vert _2}\). This normalized version is easier to interpret in practice; e.g., a normalized robustness of order 1 indicates that a perturbation of the same order as the image is necessary to change the estimated label.
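The argument of footnote 4 can be spelled out in one step. The following is a sketch in notation introduced here for illustration only: \(x_1, \dots , x_{n_1}\) denote the class 1 points, \(y_1, \dots , y_{n_2}\) the class −1 points, \(\mu \) their common average, and \(f(x) = w^T x + b\) a candidate linear separator. Suppose f separates the two classes strictly:

\[
w^T x_i + b > 0 \;\; \text{for all } i, \qquad w^T y_j + b < 0 \;\; \text{for all } j, \qquad \frac{1}{n_1} \sum _{i=1}^{n_1} x_i = \frac{1}{n_2} \sum _{j=1}^{n_2} y_j = \mu .
\]

Averaging the inequalities over each class gives

\[
w^T \mu + b = \frac{1}{n_1} \sum _{i=1}^{n_1} (w^T x_i + b) > 0 \qquad \text{and} \qquad w^T \mu + b = \frac{1}{n_2} \sum _{j=1}^{n_2} (w^T y_j + b) < 0,
\]

which is a contradiction. Equivalently, \(\mu \) belongs to the convex hulls of both sets, so no hyperplane can strictly separate them.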
Notes
Acknowledgements
We thank the anonymous reviewers for their detailed comments. We thank Hamza Fawzi and Ian Goodfellow for discussions and comments on an early draft of the paper, and Guillaume Aubrun for pointing out a reference for Theorem 4. We also thank Seyed Mohsen Moosavi for his help in preparing experiments.
References
 Barreno, M., Nelson, B., Sears, R., Joseph, A., & Tygar, D. (2006). Can machine learning be secure? In ACM symposium on information, computer and communications security (pp. 16–25).
 Bendale, A., & Boult, T. E. (2016). Towards open set deep networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1563–1572).
 Bhatia, R. (2013). Matrix analysis (Vol. 169). Berlin: Springer.
 Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., et al. (2013). Evasion attacks against machine learning at test time. In Joint European conference on machine learning and knowledge discovery in databases (pp. 387–402). Berlin: Springer.
 Biggio, B., Nelson, B., & Laskov, P. (2012). Poisoning attacks against support vector machines. In International conference on machine learning (ICML).
 Bousquet, O., & Elisseeff, A. (2002). Stability and generalization. The Journal of Machine Learning Research, 2, 499–526.
 Caramanis, C., Mannor, S., & Xu, H. (2012). Robust optimization in machine learning. In S. Sra, S. Nowozin, & S. J. Wright (Eds.), Optimization for machine learning. Cambridge: MIT Press. chap 14.
 Carlini, N., & Wagner, D. (2016). Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644.
 Chalupka, K., Perona, P., & Eberhardt, F. (2014). Visual causal feature learning. arXiv preprint arXiv:1412.2309.
 Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27.
 Chang, Y. W., Hsieh, C. J., Chang, K. W., Ringgaard, M., & Lin, C. J. (2010). Training and testing low-degree polynomial data mappings via linear SVM. The Journal of Machine Learning Research, 11, 1471–1490.
 Dalvi, N., Domingos, P., Sanghai, S., & Verma, D. (2004). Adversarial classification. In ACM SIGKDD (pp. 99–108).
 Dekel, O., Shamir, O., & Xiao, L. (2010). Learning to classify with missing and corrupted features. Machine Learning, 81(2), 149–178.
 Fan, R. W., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). Liblinear: A library for large linear classification. The Journal of Machine Learning Research, 9, 1871–1874.
 Fawzi, A., & Frossard, P. (2015). Manitest: Are classifiers really invariant? In British machine vision conference (BMVC) (pp. 106.1–106.13).
 Goldberg, Y., & Elhadad, M. (2008). splitsvm: Fast, space-efficient, non-heuristic, polynomial kernel computation for NLP applications. In 46th Annual meeting of the association for computational linguistics on human language technologies: Short papers (pp. 237–240).
 Goodfellow, I. (2015). Adversarial examples. http://www.iro.umontreal.ca/~memisevr/dlss2015/goodfellow_adv.pdf, presentation at the Deep Learning Summer School, Montreal.
 Goodfellow, I., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In International conference on learning representations.
 Gu, S., & Rigazio, L. (2014). Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv:1412.5068.
 Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto.
 Lanckriet, G., Ghaoui, L., Bhattacharyya, C., & Jordan, M. (2003). A robust minimax approach to classification. The Journal of Machine Learning Research, 3, 555–582.
 LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
 Lewis, A., & Pang, J. (1998). Error bounds for convex inequality systems. In J.-P. Crouzeix, J.-E. Martinez-Legaz, & M. Volle (Eds.), Generalized convexity, generalized monotonicity: Recent results (pp. 75–110). Berlin: Springer.
 Li, G., Mordukhovich, B. S., & Pham, T. S. (2015). New fractional error bounds for polynomial systems with applications to Hölderian stability in optimization and spectral theory of tensors. Mathematical Programming, 153(2), 333–362.
 Łojasiewicz, S. (1961). Sur le probleme de la division (to complete).
 Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
 Lugosi, G., & Pawlak, M. (1994). On the posterior-probability estimate of the error rate of nonparametric classification rules. IEEE Transactions on Information Theory, 40(2), 475–481.
 Luo, X., & Luo, Z. (1994). Extension of Hoffman’s error bound to polynomial systems. SIAM Journal on Optimization, 4(2), 383–392.
 Luo, Z. Q., & Pang, J. S. (1994). Error bounds for analytic systems and their applications. Mathematical Programming, 67(1–3), 1–28.
 Matoušek, J. (2002). Lectures on discrete geometry (Vol. 108). New York: Springer.
 Moosavi-Dezfooli, S. M., Fawzi, A., & Frossard, P. (2016). Deepfool: A simple and accurate method to fool deep neural networks. In IEEE conference on computer vision and pattern recognition (CVPR).
 Ng, K., & Zheng, X. (2003). Error bounds of constrained quadratic functions and piecewise affine inequality systems. Journal of Optimization Theory and Applications, 118(3), 601–618.
 Nguyen, A., Yosinski, J., & Clune, J. (2014). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. arXiv preprint arXiv:1412.1897.
 Pang, J. (1997). Error bounds in mathematical programming. Mathematical Programming, 79(1–3), 299–332.
 Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
 Šrndić, N., & Laskov, P. (2014). Practical evasion of a learning-based classifier: A case study. In IEEE symposium on security and privacy (pp. 197–211). IEEE.
 Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., et al. (2014). Intriguing properties of neural networks. In International conference on learning representations (ICLR).
 Xu, H., Caramanis, C., & Mannor, S. (2009). Robustness and regularization of support vector machines. The Journal of Machine Learning Research, 10, 1485–1510.