1 Introduction

In the last few years, there has been growing concern about adversarial example attacks in machine learning. An adversarial attack refers to a small (humanly imperceptible) change of an input specifically designed to fool a machine learning model. These attacks came to light thanks to the works of Biggio et al. (2013) and Szegedy et al. (2014) studying deep neural networks for image classification, although the topic had already been studied in spam filter analysis (Dalvi et al., 2004; Lowd & Meek, 2005; Globerson & Roweis, 2006). The vulnerability of state-of-the-art classifiers to these attacks has genuine security implications, especially for deep neural networks used in AI-driven technologies such as self-driving cars, as repeatedly demonstrated by Sharif et al. (2016), Sitawarin et al. (2018) and Yao et al. (2020). Beyond security issues, this vulnerability shows how little we know about the worst-case behavior of the models the industry uses daily. It is essential for the community to understand the very nature of this phenomenon in order to mitigate the threat.

Accordingly, a large body of work has tried to design new models that would be less vulnerable in the adversarial setting (Goodfellow et al., 2015; Metzen et al., 2017; Xie et al., 2018; Hu et al., 2019; Verma & Swami, 2019), but most of them were shown, in time, to offer only limited protection against more sophisticated attacks (Carlini & Wagner, 2017; He et al., 2017; Athalye et al., 2018; Croce & Hein, 2020; Tramer et al., 2020). Among the defense strategies, randomization has proven effective in some contexts (Xie et al., 2018; Dhillon et al., 2018; Liu et al., 2018; He et al., 2019). Despite these significant efforts, randomization techniques still lack theoretical arguments. In this paper, we generalize the prior results from Pinot et al. (2019) by studying a general class of randomized classifiers, including randomized neural networks, for which we demonstrate adversarial robustness guarantees and analyze their generalization properties (see Sect. 2.3 for more details).

1.1 Supervised learning for image classification

Let us consider the supervised classification problem with an input space \({\mathcal{X}}\) and an output space \({\mathcal{Y}}\). In the following, w.l.o.g. we consider \({\mathcal{X}}\subset [-1,1]^{d}\) to be a set of images, and \({\mathcal{Y}}:=[K] :=\{1,\dots ,K\}\) a set of labels describing them. The goal of a supervised machine learning algorithm is to design a classifier that maps any image \({\varvec{x}}\in {\mathcal{X}}\) to a label \(y \in {\mathcal{Y}}\). To do so, the learner has access to a training sample of n image-label pairs \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}),\dots ,({{\varvec{x}}_{\varvec{n}}},y_{n})\}\). Each training pair \(({{\varvec{x}}_{\varvec{i}}},y_{i})\) is assumed to be drawn i.i.d. from a ground-truth distribution \({\mathcal{D}}\). To build a classifier, the usual strategy is to select a hypothesis function \({\varvec{h}}: {\mathcal{X}} \rightarrow {\mathcal{Y}}\) from a pre-defined hypothesis class \({\mathcal{H}}\) that minimizes the risk with respect to \({\mathcal{D}}\). This risk minimization problem writes

$$\begin{aligned} \inf _{{\varvec{h}}\in {\mathcal{H}}} {\mathcal{R}}({\varvec{h}}) :={\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ {\mathcal{L}}_{0/1}\left( {\varvec{h}}({\varvec{x}}), y\right) \right] , \end{aligned}$$
(1)

where \({\mathcal{L}}_{0/1}\), the \(0/1\) loss, outputs 1 when \({\varvec{h}}({\varvec{x}}) \ne y\), and zero otherwise.

In practice, the learner does not have access to the ground-truth distribution; hence it cannot estimate the risk \({\mathcal{R}}({\varvec{h}})\). To find an approximate solution for Problem (1), a learning algorithm solves the empirical risk minimization problem instead. In this case, we simply replace the risk by its empirical counterpart over the training sample \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}),\ldots ,({{\varvec{x}}_{\varvec{n}}},y_{n})\}\). The empirical risk minimization problem writes

$$\begin{aligned} \inf _{{\varvec{h}}\in {\mathcal{H}}} {\mathcal{R}}_{{\mathcal{S}}}({\varvec{h}}) :=\frac{1}{n} \sum _{i=1}^{n} {\mathcal{L}}_{0/1}\left( {\varvec{h}}({{\varvec{x}}_{\varvec{i}}}), y_{i}\right) . \end{aligned}$$
(2)

Then, to evaluate how far the selected hypothesis is from the optimum, one wants to upper bound the difference between the risk and the empirical risk of any \({\varvec{h}}\in {\mathcal{H}}\). This difference is known as the generalization gap.
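
To make Eqs. (1) and (2) concrete, here is a minimal Python sketch (not part of the original analysis) that estimates the empirical \(0/1\) risk of a fixed hypothesis and compares it to a large-sample estimate of the risk; the data generator and the threshold classifier are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ground-truth distribution D: x in [-1, 1]^d, label = sign of the first coordinate.
def sample_data(n, d=5):
    x = rng.uniform(-1.0, 1.0, size=(n, d))
    y = (x[:, 0] > 0).astype(int)          # labels in {0, 1}
    return x, y

# A fixed (hand-crafted) hypothesis h: X -> Y, standing in for a learned model.
def h(x):
    return (x[:, 0] + 0.1 * x[:, 1] > 0).astype(int)

def empirical_risk(x, y):
    """Empirical 0/1 risk, Eq. (2): average misclassification over the sample."""
    return np.mean(h(x) != y)

x_train, y_train = sample_data(200)
x_test, y_test = sample_data(100_000)      # large sample as a proxy for the true risk, Eq. (1)

gap = empirical_risk(x_test, y_test) - empirical_risk(x_train, y_train)
print(f"empirical risk: {empirical_risk(x_train, y_train):.3f}, "
      f"estimated risk: {empirical_risk(x_test, y_test):.3f}, gap: {gap:.3f}")
```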

1.2 Classification in the presence of an adversary

Given a hypothesis \({\varvec{h}}\in {\mathcal{H}}\) and a sample \(({\varvec{x}},y) \sim {\mathcal{D}}\), the goal of an adversary is to find a perturbation \(\varvec{\tau } \in {\mathcal{X}}\) such that the following assertions both hold. First, the perturbation is imperceptible to humans. This means that a human cannot visually distinguish the standard example \({\varvec{x}}\) from the adversarial example \({\varvec{x}}+ \varvec{\tau }\). Second, the perturbation modifies \({\varvec{x}}\) enough to make the classifier misclassify. More formally, the adversary seeks a perturbation \(\varvec{\tau } \in {\mathcal{X}}\) such that \({\varvec{h}}({\varvec{x}}+\varvec{\tau }) \ne y\).

Although the notion of imperceptible modification is very natural for humans, it is genuinely hard to formalize. Despite these difficulties, in the image classification setting, a sufficient condition to ensure that the attack remains undetected is to constrain the perturbation \(\varvec{\tau }\) to have a small \(\ell _{p}\) norm. This means that for any \(p \in [1,\infty ]\), there exists a threshold \(\alpha _{p} > 0\) for which any perturbation \(\varvec{\tau }\) is imperceptible as soon as \(\Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\). It is worth noting that \(\ell _{p}\) norms are only surrogates for the perception distance, for which giving a formal definition remains an open question. In this paper, we focus solely on robustness with respect to \(\ell _{p}\) norms. The literature on adversarial attacks for image classification usually uses either an \(\ell _\infty\) norm, as in Madry et al. (2018), or an \(\ell _{2}\) norm, as in Carlini & Wagner (2017), as a surrogate for imperceptibility. Other authors such as Chen et al. (2018) and Papernot et al. (2016) also used an \(\ell _{1}\) norm or an \(\ell _{0}\) semi-norm.
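
As an illustration of the \(\ell _{p}\) constraint, the sketch below checks membership in \(B_{p}(\alpha _{p})\) and projects a perturbation back onto the ball for \(p=2\) and \(p=\infty\), a standard building block of attack algorithms; the perturbation shape and the budgets \(\alpha _{2}\), \(\alpha _{\infty }\) are illustrative values, not the paper's.

```python
import numpy as np

def in_ball(tau, alpha, p):
    """Check membership in B_p(alpha) = {tau : ||tau||_p <= alpha}."""
    return np.linalg.norm(tau.ravel(), ord=p) <= alpha

def project(tau, alpha, p):
    """Project tau onto the l_2 or l_inf ball of radius alpha."""
    if p == 2:
        norm = np.linalg.norm(tau)
        return tau if norm <= alpha else tau * (alpha / norm)
    if p == np.inf:
        return np.clip(tau, -alpha, alpha)
    raise NotImplementedError("only p = 2 and p = inf are sketched here")

rng = np.random.default_rng(0)
tau = rng.normal(scale=0.1, size=(3, 32, 32))   # illustrative image-shaped perturbation
alpha_2, alpha_inf = 0.5, 8.0 / 255.0           # illustrative budgets
print(in_ball(tau, alpha_2, 2), in_ball(tau, alpha_inf, np.inf))
print(np.linalg.norm(project(tau, alpha_2, 2)))        # <= 0.5
print(np.abs(project(tau, alpha_inf, np.inf)).max())   # <= 8/255
```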

To account for adversaries possibly manipulating the input images, one needs to revisit the standard risk minimization by incorporating the adversary in the problem. The goal becomes to minimize the worst-case risk under \(\alpha _{p}\)-bounded manipulations. We call this problem the adversarial risk minimization. It writes

$$\begin{aligned} \inf _{{\varvec{h}}\in {\mathcal{H}}} {{\mathcal{R}}^{\mathrm{adv}}}({\varvec{h}}; \alpha _{p}) :={\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})}{\mathcal{L}}_{0/1}\left( {\varvec{h}}({\varvec{x}}+ \varvec{\tau }), y\right) \right] , \end{aligned}$$
(3)

where \(B_{p}(\alpha _{p}) :=\{ \varvec{\tau } \in {\mathcal{X}}~{s.t.}~ \Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\}\). In this new formulation, the adversary focuses on optimizing the inner maximization, while the learner tries to get the best hypothesis from \({\mathcal{H}}\) “under attack”. By analogy with the standard setting, given n training examples \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}),\dots ,({{\varvec{x}}_{\varvec{n}}},y_{n})\}\), we want to find an approximate solution to the adversarial risk minimization by studying its empirical counterpart, the empirical adversarial risk minimization. This optimization problem writes

$$\begin{aligned} \inf _{{\varvec{h}}\in {\mathcal{H}}} {\mathcal{R}}^{\mathrm{adv}}_{{\mathcal{S}}}({\varvec{h}};\alpha _{p}) :=\frac{1}{n}\sum _{i=1}^{n} \sup _{\varvec{\tau }\in B_{p}(\alpha _{p})}{\mathcal{L}}_{0/1}\left( {\varvec{h}}({{\varvec{x}}_{\varvec{i}}} + \varvec{\tau }), y_{i}\right) . \end{aligned}$$
(4)
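
In practice, the inner supremum in Eq. (4) is intractable and is replaced by an attack heuristic that only lower-bounds it. The sketch below estimates the empirical adversarial risk of a toy linear classifier using random search inside \(B_{\infty }(\alpha _{\infty })\); both the classifier and the attack are illustrative stand-ins and not the procedures studied in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x):                                   # toy linear classifier on [-1, 1]^3
    return (x @ np.array([1.0, -0.5, 0.25]) > 0).astype(int)

def adv_loss(x_i, y_i, alpha, n_trials=100):
    """Lower bound of sup_{||tau||_inf <= alpha} L_0/1(h(x_i + tau), y_i) by random search."""
    taus = rng.uniform(-alpha, alpha, size=(n_trials, x_i.size))
    preds = h(x_i[None, :] + taus)
    return float(np.any(preds != y_i))      # 1 if some sampled perturbation fools h, else 0

x = rng.uniform(-1.0, 1.0, size=(200, 3))
y = h(x)                                    # labels taken from h itself, so the standard risk is 0
alpha = 0.1

emp_adv_risk = np.mean([adv_loss(x[i], y[i], alpha) for i in range(len(x))])
print(f"empirical adversarial risk (lower bound): {emp_adv_risk:.3f}")
```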

In the presence of an adversary, two major issues appear in the empirical risk minimization. First, as recently pointed out by Madry et al. (2018), the adversarial generalization error (i.e. the gap between the empirical adversarial risk and the adversarial risk) can be much larger than in the standard setting. Indeed, the adversary makes the problem depend on the dimension of \({\mathcal{X}}\). Hence, in high dimension (e.g. for images) one needs many more samples to classify correctly, as pointed out by Schmidt et al. (2018) as well as Simon-Gabriel et al. (2019). Moreover, finding an approximate solution to the adversarial risk minimization is not always sufficient. Indeed, recent works by Tsipras et al. (2019) and Zhang et al. (2019) gave theoretical evidence that training a robust model may lead to an increase of its standard risk. Hence finding a good approximation for Problem (3) may lead to a poor solution for Problem (1). Accordingly, it is natural to ask whether we can find a class of models \({{{\mathcal{H}}}}\) for which we can control both the standard and adversarial risks.

In this paper, we provide answers to the above question by conducting an in-depth analysis of a special class of models called randomized classifiers, i.e. classifiers that output random variables instead of labels. Our main contributions can be summarized as follows.

1.3 Contributions

Our first contribution consists in studying randomized classifiers. By analogy with the deterministic case, we define a notion of robustness for randomized classifiers. This definition amounts to making the classifier locally Lipschitz with respect to the \(\ell _{p}\) norm on \({\mathcal{X}}\), and a probability metric on \({\mathcal{Y}}\) (e.g. the total variation distance or the Renyi divergence). More precisely, if we denote D the probability metric at hand, a randomized classifier \(\mathrm{m}\) is called \((\alpha _{p}, \epsilon )\)-robust w.r.t. D if for any \({\varvec{x}},{\varvec{x}}' \in {\mathcal{X}}\)

$$\begin{aligned} \Vert {{\varvec{x}}- {\varvec{x}}'}\Vert _{p} \le \alpha _{p} \implies D(\mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}')) \le \epsilon . \end{aligned}$$

Denoting \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) the class of randomized classifiers that respect this local Lipschitz condition, we present the following results.

  1. 1.

    If D is either the total variation distance or the Renyi divergence, we show that for any \(\mathrm{m}\in {\mathcal{M}}_{D}(\alpha _{p},\epsilon )\), we can upper-bound the gap between the risk and the adversarial risk of \(\mathrm{m}\). Notably, if D is the total variation distance, for any \(\mathrm{m}\in {\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) we have \({\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p}) - {\mathcal{R}}(\mathrm{m}) \le \epsilon\). Hence, \(\epsilon\) controls the maximal trade-off between robust and standard accuracy for locally Lipschitz randomized classifiers. We demonstrate a similar result when D is the Renyi divergence, showing that \({\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p}) - {\mathcal{R}}(\mathrm{m}) \le 1- O\left( e^{-\epsilon }\right)\). This means that, for the class of locally Lipschitz randomized classifiers, solving the risk minimization problem, i.e. Problem (1), gives an approximate solution to the adversarial risk minimization problem, i.e. Problem (3), up to an additive factor that depends on the robustness parameter \(\epsilon\).

  2. 2.

    We devise an upper-bound on the generalization gap of any \(\mathrm{m}\) in \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\). In particular, when D is the total variation distance, we demonstrate that for any \(\mathrm{m}\in {\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) we have

    $$\begin{aligned} {\mathcal{R}}(\mathrm{m}) - {\mathcal{R}}_{{\mathcal{S}}}(\mathrm{m}) \le O\left( \sqrt{\frac{N \times K}{n}}\right) + \epsilon , \end{aligned}$$

    where N is the external \(\alpha _{p}\)-covering number of the input samples. This means that, when \(N/n \underset{n \rightarrow \infty }{\rightarrow } 0\), solving the empirical risk minimization problem, i.e. Problem (2), on \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) provides an approximate solution to the risk minimization problem, i.e. Problem (1). Since we can also bound the gap between the adversarial and the standard risk, we can combine the two results to bound the adversarial generalization gap on \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\). Note however, that this result relies on a strong assumption on \({\mathcal{X}}\) that does not always avoid dimensionality issues. The problem of finding a subclass of \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) that provides tighter generalization bounds is an open question.

For our second contribution, we present a practical way to design such a class \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) by using a simple yet efficient noise injection scheme. This allows us to build randomized classifiers from state-of-the-art machine learning models, including deep neural networks. More precisely, our contributions are as follows.

  1. 1.

    Based on information-theoretic properties of the total variation distance and the Renyi divergence (e.g. the data processing inequality), we design a noise injection scheme that turns a state-of-the-art machine learning model into a robust randomized classifier. More formally, let us denote \(\varPhi\) the c.d.f. of a standard Gaussian distribution. Given a deterministic hypothesis \({\varvec{h}}\), we show that the randomized classifier \(\mathrm{m}: {\varvec{x}}\mapsto {\varvec{h}}\left( {\varvec{x}}+n\right)\) with \(n\sim {\mathcal{N}}(0, \sigma ^{2} I_{d})\) is both \((\alpha _{2}, \frac{(\alpha _{2})^{2}}{2 \sigma })\)-robust w.r.t. the Renyi divergence and \((\alpha _{2},\ 2 \varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) - 1)\)-robust w.r.t. the total variation distance (an illustrative sketch of this construction is given after this list). Our results on randomized classifiers are applicable to a wide range of machine learning models, including deep neural networks.

  2. 2.

    We further corroborate our theoretical results with experiments using deep neural networks on standard image datasets, namely CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009). These models can simultaneously provide accurate predictions (over 0.82 clean accuracy on CIFAR-10) and reasonable robustness against \(\ell _{2}\) adversarial examples (0.45 accuracy under attack against \(\ell _{2}\) adversaries with magnitude 0.5 on CIFAR-10).
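
As announced in the first item above, the construction simply adds Gaussian noise to the input before applying a pre-trained hypothesis. Below is a minimal sketch of this noise injection scheme around an arbitrary deterministic classifier; the base model and the value of \(\sigma\) are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

class NoiseInjectionClassifier:
    """Randomized classifier m: x -> h(x + n), with n ~ N(0, sigma^2 I_d)."""

    def __init__(self, base_classifier, sigma):
        self.h = base_classifier     # any deterministic hypothesis h: X -> Y
        self.sigma = sigma

    def sample_label(self, x):
        """Draw one label y_hat ~ m(x): a single noisy forward pass."""
        noise = rng.normal(scale=self.sigma, size=x.shape)
        return self.h(x + noise)

# Placeholder deterministic hypothesis (stand-in for a trained neural network).
def base_h(x):
    return int(np.sum(x) > 0)

m = NoiseInjectionClassifier(base_h, sigma=0.25)
x = rng.uniform(-1.0, 1.0, size=(3 * 32 * 32,))
print([m.sample_label(x) for _ in range(5)])   # several draws from m(x)
```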

2 Related work

Contrary to other notions such as training-time corruption, a.k.a. poisoning attacks (Kearns & Li, 1993; Kearns et al., 1994), the theoretical study of adversarial robustness is still in its infancy. So far, empirical observations tend to show that (1) adversarial examples on state-of-the-art models are hard to mitigate and (2) robust training methods give poor generalization performance. Some recent works started to study the problem through the lens of learning theory, either to understand the links between robustness and accuracy or to provide bounds on the generalization gap of current learning procedures in the adversarial setting.

2.1 Accuracy versus robustness trade-off

A first line of research (Su et al., 2018; Jetley et al., 2018; Tsipras et al., 2019) suggests that designing robust models might be inconsistent with standard accuracy. These works argue, through experiments and toy examples, that robust and standard classification are two competing problems. Following this line, Zhang et al. (2019) observed that the adversarial risk of any hypothesis \({\varvec{h}}\) decomposes as follows,

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}({\varvec{h}};\alpha _{p}) = {\mathcal{R}}({\varvec{h}}) + {\mathcal{R}}^{\mathrm{adv}}_{>0}({\varvec{h}};\alpha _{p}), \end{aligned}$$
(5)

where \({\mathcal{R}}^{\mathrm{adv}}_{>0}({\varvec{h}};\alpha _{p})\) is the amount of risk that the adversary gets with non-null perturbations. Looking at Eq. (5), we realize that minimizing the adversarial risk is not enough to control standard accuracy, since one could minimize it by acting only on the second term. This indicates that adversarial risk minimization, i.e. Problem (3), is harder to solve than standard risk minimization, i.e. Problem (1).

While this indicates that both goals may be difficult to achieve simultaneously, Eq. (5), along with the empirical studies from the literature, does not highlight any fundamental trade-off between robustness and accuracy. Moreover, no upper bound on \({\mathcal{R}}^{\mathrm{adv}}_{>0}({\varvec{h}};\alpha _{p})\) has been demonstrated yet. Hence the questions of whether this trade-off exists and whether it can be controlled remain open. In this paper, we provide a rigorous answer to these questions by identifying classes \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) of randomized classifiers for which we can upper bound the trade-off term \({\mathcal{R}}^{\mathrm{adv}}_{>0}(\mathrm{m};\alpha _{p})\) for any \(\mathrm{m}\in {\mathcal{M}}_{D}(\alpha _{p},\epsilon )\). Hence, we can control the maximum loss of accuracy that the model can suffer in the adversarial setting. It also challenges the intuitions developed by previous works (Su et al., 2018; Jetley et al., 2018; Tsipras et al., 2019) and argues in favor of using randomized mechanisms as a defense against adversarial attacks.

2.2 Studying adversarial generalization

To further compare the hardness of the two problems, a recent line of research began to explore the notion of adversarial generalization gap. In this line, Schmidt et al. (2018) presented some first intuitions by studying a simplified binary classification framework where \({\mathcal{D}}\) is a mixture of multi-dimensional Gaussian distributions. In this framework the authors show that, without attacks, we only need O(1) training samples to have a small generalization gap, but against an \(\ell _{\infty }\) adversary, we need \(O(\sqrt{d})\) training samples instead. In the discussion of their work, the authors leave the problem of obtaining similar results without any assumption on the distribution as an open question.

This issue was recently studied using the Rademacher complexity by Khim and Loh (2018), Yin et al. (2019) and Awasthi et al. (2020). These papers relate the adversarial generalization error of linear classifiers and one-hidden-layer neural networks to the dimension of the problem, showing that adversarial generalization depends on the dimension. At first glance, the difficulty of adversarial generalization seems to contradict previous conclusions on the link between robustness and generalization presented by Xu and Mannor (2012). But, as we will discuss in the sequel, these results assume that the input space \({\mathcal{X}}\) can be partitioned into O(1) sub-spaces in which the classification function has small variations. This assumption may not always hold when dealing with high-dimensional input spaces (e.g. images) and very sophisticated classification algorithms (e.g. deep neural networks).

Going further, it should be noted that the generalization gap measures only the difference between empirical and theoretical risks. In practice, the empirical adversarial risk is hard to estimate, since we cannot compute the exact solution to the inner maximization problem. The following question therefore remains open: even if we can set up a learning procedure with a controlled generalization gap, can we give guarantees on the standard and adversarial risks? In this paper, we start answering this question by providing techniques that provably offer both small standard risk and reasonable robustness against adversarial examples (see Sect. 1.3 for more details).

2.3 Defense against adversarial examples based on noise injection

Injecting noise into algorithms to improve train-time robustness has long been used in detection and signal processing tasks (Zozor & Amblard, 1999; Chapeau-Blondeau & Rousseau, 2004; Mitaim & Kosko, 1998; Grandvalet et al., 1997). It has also been extensively studied in several machine learning and optimization fields, e.g. robust optimization (Ben-Tal et al., 2009) and data augmentation techniques (Perez & Wang, 2017). Concurrently to our work, noise injection techniques have been adopted by the adversarial defense community under the name of randomized smoothing. The idea of provable defense through noise injection was first proposed by Lecuyer et al. (2019) and refined by Li et al. (2019), Cohen et al. (2019), Salman et al. (2019) and Yang et al. (2020). The rationale behind randomized smoothing is very simple: smooth \({\varvec{h}}\) after training by convolution with a Gaussian measure to build a more stable classifier. Our work belongs to the same line of research, but the nature of our results is different. Randomized smoothing is an ensemble method that builds a deterministic classifier by smoothing a pre-trained model with a Gaussian kernel. This scheme requires computing a Monte Carlo estimate of the smoothed classifier; hence it requires many evaluation rounds to output a deterministic label. Our method is based on randomization and only requires one evaluation round to infer a label, making the prediction randomized and computationally efficient. While randomized smoothing focuses on the construction of certified defenses, we study the generalization properties of randomized mechanisms both in the standard and the adversarial setting. Our analysis presents the fundamental properties of randomized defenses, including (but not limited to) randomized smoothing (cf. Sect. 7).
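
To make the contrast with randomized smoothing explicit, the sketch below shows the two prediction procedures side by side for the same noise-injected base model: our randomized prediction draws a single noisy evaluation, while the smoothed prediction aggregates many Monte Carlo draws by majority vote. The base model, noise level and number of samples are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_h(x):                               # placeholder deterministic classifier
    return int(np.sum(x) > 0)

def randomized_predict(x, sigma):
    """Our setting: one noisy evaluation, the output label is itself random."""
    return base_h(x + rng.normal(scale=sigma, size=x.shape))

def smoothed_predict(x, sigma, n_samples=1000, n_classes=2):
    """Randomized smoothing: Monte Carlo majority vote; the output label is deterministic
    (up to sampling error) but requires n_samples evaluations of the base model."""
    votes = np.zeros(n_classes, dtype=int)
    for _ in range(n_samples):
        votes[base_h(x + rng.normal(scale=sigma, size=x.shape))] += 1
    return int(np.argmax(votes))

x = rng.uniform(-1.0, 1.0, size=(20,))
print([randomized_predict(x, sigma=0.5) for _ in range(5)])  # may vary across draws
print(smoothed_predict(x, sigma=0.5))                        # stable majority label
```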

This paper is an extended version of a work by Pinot et al. (2019). Since then, we considerably consolidated our theoretical results as follows.

  1. 1.

    Pinot et al. (2019) only studied neural networks defended with noise injection techniques; here we study the much more general class of randomized classifiers, which includes, but is not limited to, neural networks.

  2. 2.

    We provide a much more detailed treatment of our notion of distributional robustness by presenting an in-depth analysis based on the total variation distance that was missing from Pinot et al. (2019) (Theorems 1, 5 and 7).

  3. 3.

    Pinot et al. did not analyze the generalization of randomized classifiers. Here, we study the generalization of these classifiers according to the notion of robustness they respect (Theorem 5 and Corollary 1).

  4. 4.

    Last but not least, we added an in-depth discussion on the fundamental properties of randomized classifiers, and how they relate to the notion of randomized smoothing (Sect. 7).

3 Definition of risk and robustness for randomized classifiers

In this work, the goal is to analyze how randomized classifiers can solve the problem of classification in the presence of an adversary. Let us start by defining what we mean by randomized classifiers.

Remark 1

(Note on measurability) Throughout the paper, we assume that every space \({{\mathcal{Z}}}\) is associated with a \(\sigma\)-algebra denoted \({\mathcal{A}}\left( {{\mathcal{Z}}}\right)\). Furthermore, we denote \({\mathcal{P}}\left( {{\mathcal{Z}}} \right)\) the set of probability distributions defined on the measurable space \(\left( {{\mathcal{Z}}},{\mathcal{A}}\left( {\mathcal{Z}}\right) \right)\). In the following, for simplicity, we refer to \({\mathcal{A}}\left( {\mathcal{Z}}\right)\) only when necessary.

Definition 1

(Probabilistic mapping) Let \({\mathcal{Z}}\) and \({\mathcal{Z}}'\) be two arbitrary spaces. A probabilistic mapping from \({\mathcal{Z}}\) to \({\mathcal{Z}}'\) is a mapping \(\mathrm{m}: {\mathcal{Z}} \rightarrow {\mathcal{P}}\left( {\mathcal{Z}}' \right)\), where \({\mathcal{P}}\left( {\mathcal{Z}}' \right)\) is the space of probability measures on \({\mathcal{Z}}'\). When \({\mathcal{Z}} = {\mathcal{X}}\) and \({\mathcal{Z}}' ={\mathcal{Y}}\), \(\mathrm{m}\) is called a randomized classifier. To get a numerical answer for an input \({\varvec{x}}\), we sample \(\hat{y} \sim \mathrm{m}( {\varvec{x}})\).

Any mapping can be considered as a probabilistic mapping, whether it explicitly involves randomization or not. In fact, any deterministic classifier can be considered as a randomized one, since it can be characterized by a Dirac measure. Accordingly, the definition of a randomized classifier is fully general and equally covers classifiers with and without a randomization scheme.
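
For a finite label space, a probabilistic mapping can be represented by a probability vector over \([K]\); a deterministic classifier then corresponds to a one-hot (Dirac) vector, and a numerical answer is obtained by sampling \(\hat{y} \sim \mathrm{m}({\varvec{x}})\). A minimal sketch, with illustrative placeholder classifiers:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                        # number of labels, illustrative

def deterministic_as_randomized(h, x):
    """A deterministic classifier seen as a probabilistic mapping:
    m(x) is the Dirac measure on h(x), i.e. a one-hot probability vector."""
    p = np.zeros(K)
    p[h(x)] = 1.0
    return p

def genuinely_randomized(x):
    """A probabilistic mapping whose output distribution depends smoothly on x."""
    logits = np.array([np.sum(x), np.sum(x**2), 1.0])
    p = np.exp(logits - logits.max())
    return p / p.sum()

def sample_label(m, x):
    """Numerical answer for input x: draw y_hat ~ m(x)."""
    return rng.choice(K, p=m(x))

h = lambda x: int(np.sum(x) > 0)             # placeholder deterministic hypothesis
x = rng.uniform(-1.0, 1.0, size=(5,))
print(deterministic_as_randomized(h, x))     # one-hot (Dirac) distribution
print(genuinely_randomized(x))               # full probability vector over the K labels
print(sample_label(genuinely_randomized, x))
```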

3.1 Risk and adversarial risk for randomized classifiers

To analyze this new hypothesis class, we can adapt the concepts of risk and adversarial risk for a randomized classifier. The loss function we use is the natural extension of the \(0/1\) loss to the randomized regime. Given a randomized classifier \(\mathrm{m}\) and a sample \(({\varvec{x}},y) \sim {\mathcal{D}}\) it writes

$$\begin{aligned} {\mathcal{L}}_{0/1}(\mathrm{m}({\varvec{x}}),y) := {\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}})} \left[ {\mathbbm{1}} \left\{ \hat{y} \ne y\right\} \right] . \end{aligned}$$
(6)

This loss function evaluates the probability of misclassification of \(\mathrm{m}\) on a data sample \(({\varvec{x}},y) \sim {\mathcal{D}}\). Accordingly, the risk of \(\mathrm{m}\) with respect to \({\mathcal{D}}\) writes

$$\begin{aligned} {\mathcal{R}}(\mathrm{m})&:= {\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ {\mathcal{L}}_{0/1}(\mathrm{m}( {\varvec{x}}),y) \right] . \end{aligned}$$
(7)

Finally, given \(\mathrm{m}\) and \(({\varvec{x}},y) \sim {\mathcal{D}}\), the adversary seeks a perturbation \(\varvec{\tau }\in B_{p}(\alpha _{p})\) that maximizes the expected error of the classifier on \({\varvec{x}}\) (i.e. \({\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}}+ \varvec{\tau })} \left[ {\mathbbm{1}} \left\{ \hat{y} \ne y\right\} \right]\)). Therefore, the adversarial risk of \(\mathrm{m}\) under \(\alpha _{p}\)-bounded perturbations writes

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p})&:= {\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathcal{L}}_{0/1}(\mathrm{m}({\varvec{x}}+ \varvec{\tau }),y) \right] . \end{aligned}$$
(8)

By analogy with the deterministic setting, we denote

$$\begin{aligned} {\mathcal{R}}_{{\mathcal{S}}}\left( \mathrm{m}\right) :=\frac{1}{n}\sum _{i=1}^{n} {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}_{i}), y_{i}\right) , {\text{ and}} \end{aligned}$$
(9)
$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}_{{\mathcal{S}}}\left( \mathrm{m}; \alpha _{p} \right) := \frac{1}{n}\sum _{i=1}^{n} \sup _{\varvec{\tau }\in B_{p}(\alpha _{p})}{\mathcal{L}}_{0/1}\left( \mathrm{m}({{\varvec{x}}_{\varvec{i}}} + \varvec{\tau }), y_{i}\right) , \end{aligned}$$
(10)

the empirical risks of \(\mathrm{m}\) for a given training sample \({\mathcal{S}}:=\{ ({{\varvec{x}}_{\varvec{1}}},y_{1}), \dots , ({{\varvec{x}}_{\varvec{n}}},y_{n}) \}\).
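
When \(\mathrm{m}({\varvec{x}})\) is represented by a probability vector over the K labels, the loss in Eq. (6) has the closed form \(1 - \mathrm{m}({\varvec{x}})_{y}\), and Eqs. (9) and (10) are plain averages over the sample. A minimal sketch, with an illustrative randomized classifier and random search as a stand-in for the inner supremum of Eq. (10):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2

def m(x):
    """Illustrative randomized classifier: probability vector over the K labels."""
    p1 = 1.0 / (1.0 + np.exp(-4.0 * np.sum(x)))
    return np.array([1.0 - p1, p1])

def loss01(prob, y):
    """Eq. (6): E_{y_hat ~ m(x)}[1{y_hat != y}] = 1 - m(x)_y."""
    return 1.0 - prob[y]

def empirical_risk(xs, ys):
    """Eq. (9)."""
    return np.mean([loss01(m(x), y) for x, y in zip(xs, ys)])

def empirical_adv_risk(xs, ys, alpha, n_trials=50):
    """Eq. (10), with the supremum replaced by random search in the l_inf ball (lower bound)."""
    risks = []
    for x, y in zip(xs, ys):
        taus = rng.uniform(-alpha, alpha, size=(n_trials, x.size))
        risks.append(max(loss01(m(x + t), y) for t in taus))
    return float(np.mean(risks))

xs = rng.uniform(-1.0, 1.0, size=(100, 4))
ys = (np.sum(xs, axis=1) > 0).astype(int)
print(empirical_risk(xs, ys), empirical_adv_risk(xs, ys, alpha=0.1))
```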

3.2 Robustness for randomized classifiers

We could define the notion of robustness for a randomized classifier depending on whether it misclassifies any test sample \(({\varvec{x}},y) \sim {\mathcal{D}}\). But in practice, neither the adversary nor the model provider has access to the ground-truth distribution \({\mathcal{D}}\). Furthermore, in real-world scenarios, one wants to check that the model is robust before deploying it. Therefore, the classifier is required to be stable on the regions of the space where it already classifies correctly. Formally, a (deterministic) classifier \(c: {\mathcal{X}}\rightarrow {\mathcal{Y}}\) is called robust if for any \(({\varvec{x}}, y) \sim {\mathcal{D}}\) such that \(c({\varvec{x}}) = y\), and for any \(\varvec{\tau }\in {\mathcal{X}}\), one has

$$\begin{aligned} \Vert { \varvec{\tau }}\Vert _{p} \le \alpha _{p} \implies c({\varvec{x}}) = c({\varvec{x}}+ \varvec{\tau }). \end{aligned}$$
(11)

By analogy with this, we define robustness for a randomized classifier below.

Definition 2

(Robustness for a randomized classifier) A randomized classifier \(\mathrm{m}: {\mathcal{X}}\rightarrow {\mathcal{P}}({\mathcal{Y}})\) is called \((\alpha _{p},\epsilon )\)-robust w.r.t. D if for any \({\varvec{x}}, \varvec{\tau }\in {\mathcal{X}}\), one has

$$\begin{aligned} \Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p} \implies D\left( \mathrm{m}({\varvec{x}}) , \mathrm{m}({\varvec{x}}+ \varvec{\tau })\right) \le \epsilon . \end{aligned}$$

where D is a metric/divergence between two probability measures. Given such a metric/divergence D, we denote \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) the set of all randomized classifiers that are \((\alpha _{p},\epsilon )\)-robust w.r.t. D.

Note that we did not add the constraint that \(\mathrm{m}\) classifies well on \(({\varvec{x}},y) \sim {\mathcal{D}}\), since it is already encompassed in the probability distribution itself. If the two probabilities \(\mathrm{m}({\varvec{x}})\) and \(\mathrm{m}({\varvec{x}}+ \varvec{\tau })\) are close, and if \(\mathrm{m}({\varvec{x}})\) outputs y with high probability, then so does \(\mathrm{m}({\varvec{x}}+ \varvec{\tau })\). This formulation naturally raises the question of the choice of the metric D. Any choice of metric/divergence instantiates a notion of adversarial robustness, and it should be carefully selected. In the present work, we focus our study on the total variation distance and the Renyi divergence. The question of whether these metrics/divergences are more appropriate than others remains open, but these two divergences are general enough to cover a wide range of other definitions (see “Appendix 2” for more details). Furthermore, these notions of distance comply with both a theoretical analysis (Sect. 5) and practical considerations (Sect. 8).

3.3 Divergence and probability metrics

Let us now recall the definitions of the total variation distance and the Renyi divergence. Let \({\mathcal{Z}}\) be an arbitrary space, and \(\rho\), \(\rho '\) be two measures in \({\mathcal{P}}({\mathcal{Z}})\). The total variation distance between \(\rho\) and \(\rho '\) is

$$\begin{aligned} D_{TV}\left( \rho , \rho ' \right) := \sup \limits _{Z \in {\mathcal{A}} ({\mathcal{Z}})} \vert \rho (Z) - \rho ' (Z) \vert , \end{aligned}$$
(12)

where \({\mathcal{A}}({\mathcal{Z}})\) is the \(\sigma\)-algebra associated with \({\mathcal{Z}}\). The total variation distance is one of the most commonly used probability metrics. It admits several very simple interpretations and is a very useful tool in many mathematical fields such as probability theory, Bayesian statistics and optimal transport (Villani, 2003; Robert, 2007; Peyré & Cuturi, 2019). In optimal transport, it can be rewritten as the solution of the Monge–Kantorovich problem with the cost function \({\text{cost}}(\varvec{z},\varvec{z}') ={\mathbbm{1}}\left\{ \varvec{z}\ne \varvec{z}'\right\}\),

$$\begin{aligned} D_{TV}(\rho , \rho ' ) = \inf \int _{{\mathcal{Z}}^{2}}{\mathbbm{1}}\left\{ \varvec{z} \ne \varvec{z}'\right\} d\pi (\varvec{z},\varvec{z}') , \end{aligned}$$
(13)

where the infimum is taken over all joint probability measures \(\pi\) in \({\mathcal{P}}\left( {\mathcal{Z}}\times {\mathcal{Z}} \right)\) with marginals \(\rho\) and \(\rho '\). According to this interpretation, it seems quite natural to consider the total variation distance as a relaxation of the trivial distance on [0, 1] (for deterministic classifiers).
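
For the finite label space \({\mathcal{Y}}= [K]\) used in this paper, the supremum in Eq. (12) reduces to half the \(\ell _{1}\) distance between the two probability vectors. A quick numerical check of this identity on random distributions (illustrative only):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K = 4

def tv_half_l1(p, q):
    """TV distance on a finite space: 0.5 * ||p - q||_1."""
    return 0.5 * np.abs(p - q).sum()

def tv_sup_over_events(p, q):
    """Direct definition, Eq. (12): supremum of |p(Z) - q(Z)| over all subsets Z of [K]."""
    best = 0.0
    for r in range(K + 1):
        for Z in itertools.combinations(range(K), r):
            best = max(best, abs(p[list(Z)].sum() - q[list(Z)].sum()))
    return best

p = rng.dirichlet(np.ones(K))
q = rng.dirichlet(np.ones(K))
print(tv_half_l1(p, q), tv_sup_over_events(p, q))   # the two values coincide
```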

Let us now suppose that \(\rho\) and \(\rho '\) admit probability density functions g and \(g'\) with respect to a third measure \(\nu\). Then the Renyi divergence of order \(\beta\) between \(\rho\) and \(\rho '\) writes

$$\begin{aligned} D_{\beta }\left( \rho , \rho ' \right) :=\frac{1}{\beta -1}\log \int _{\mathcal{Z}} g' (\varvec{z}) \left( \frac{g(\varvec{z})}{g' (\varvec{z})}\right) ^{\beta } d\nu (\varvec{z}). \end{aligned}$$
(14)

The Renyi divergence (Rényi, 1961) is a generalized divergence defined for any \(\beta\) on the interval \([1,\infty ]\). It equals the Kullback–Leibler divergence when \(\beta \rightarrow 1\), and the maximum divergence when \(\beta \rightarrow \infty\). It also has the property of being non-decreasing with respect to \(\beta\). This divergence is very common in machine learning and information theory (van Erven & Harremos, 2014), especially in its Kullback–Leibler form, which is widely used as the loss function, i.e. cross entropy, of classification algorithms. In the remainder, we denote \({\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) the set of \((\alpha _{p},\epsilon )\)-robust classifiers w.r.t. \(D_{\beta }\).
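
Here is a minimal implementation of Eq. (14) for discrete distributions over the labels (with the counting measure playing the role of \(\nu\)), together with a numerical check of the Kullback–Leibler limit and of the monotonicity in \(\beta\); the distributions are drawn at random for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def renyi(p, q, beta):
    """Eq. (14) on a finite space: D_beta(p, q) = 1/(beta-1) * log sum_k p_k^beta * q_k^(1-beta)."""
    return np.log(np.sum(p**beta * q**(1.0 - beta))) / (beta - 1.0)

def kl(p, q):
    """Kullback-Leibler divergence, the beta -> 1 limit of D_beta."""
    return np.sum(p * np.log(p / q))

p = rng.dirichlet(np.ones(5))
q = rng.dirichlet(np.ones(5))

print(kl(p, q), renyi(p, q, beta=1.0001))                         # close: KL is the beta -> 1 limit
print([round(renyi(p, q, b), 4) for b in (1.5, 2.0, 5.0, 20.0)])  # non-decreasing in beta
```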

Let us now give some properties of these divergences that will be useful for our analysis. First we recall the probability preservation property of the Renyi divergence, first presented by Langlois et al. (2014).

Proposition 1

(Langlois et al., 2014) Let \(\rho\) and \(\rho '\) be two measures in \({\mathcal{P}}({\mathcal{Z}})\). Then for any \(Z \in {\mathcal{A}}({\mathcal{Z}})\), the following holds,

$$\begin{aligned} \rho (Z)\le \left( \exp \left( D_{\beta }(\rho , \rho ' )\right) \rho ' (Z)\right) ^{\frac{\beta -1}{\beta }}. \end{aligned}$$

Now thanks to previous works by Gilardoni (2010) and Vajda (1970), we also get the following results relating the total variation distance and the Renyi divergence.

Proposition 2

(Inequality between total variation and Renyi divergence) Let \(\rho\) and \(\rho '\) be two measures in \({\mathcal{P}}({\mathcal{Z}})\), and \(\beta \ge 1\). Then the following holds,

$$\begin{aligned} D_{TV}(\rho , \rho ' ) \le \min \left( \frac{3}{2}\left( \sqrt{1 + \frac{4 D_{\beta }(\rho , \rho ' )}{9}} - 1\right) ^{1/2} ,\ \frac{\exp \left( D_{\beta }(\rho , \rho ' ) +1 \right) -1}{\exp \left( D_{\beta }(\rho , \rho ' ) +1 \right) +1} \right) . \end{aligned}$$

Proof

Thanks to Gilardoni (2010), one has

$$\begin{aligned}&D_{1}(\rho , \rho ') \ge 2D_{TV}(\rho , \rho ')^{2}+ \frac{4D_{TV}(\rho , \rho ')^{4}}{9}. \end{aligned}$$

From which it follows that

$$\begin{aligned}&D_{TV}(\rho , \rho ') \le \frac{3}{2}\left( \sqrt{1 + \frac{4D_{1}(\rho , \rho ')}{9}} - 1\right) ^{1/2}. \end{aligned}$$

Moreover, using inequality from Vajda (1970), one gets

$$\begin{aligned}&D_{1}(\rho , \rho ') +1 \ge \log \left( \frac{1 + D_{TV}(\rho , \rho ')}{1 - D_{TV}(\rho , \rho ')} \right) . \end{aligned}$$

This inequality leads to the following

$$\begin{aligned}&\frac{\exp (D_{1}(\rho , \rho ') +1) -1}{\exp (D_{1}(\rho , \rho ') +1) +1} \ge D_{TV}(\rho , \rho '). \end{aligned}$$

By combining the above inequalities and by monotonicity of the Renyi divergence with respect to \(\beta\), one obtains the expected result. \(\square\)
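
As a quick numerical sanity check of Proposition 2 (not part of the original analysis), the snippet below compares the total variation distance between two Bernoulli distributions with the two Renyi-based upper bounds, instantiated at \(\beta = 1\) (the Kullback–Leibler case, which by monotonicity yields the tightest right-hand side); the Bernoulli parameters are illustrative.

```python
import numpy as np

def tv_bernoulli(a, b):
    return abs(a - b)

def kl_bernoulli(a, b):
    return a * np.log(a / b) + (1.0 - a) * np.log((1.0 - a) / (1.0 - b))

a, b = 0.6, 0.4                    # illustrative Bernoulli parameters
tv = tv_bernoulli(a, b)
d1 = kl_bernoulli(a, b)            # D_1 = Kullback-Leibler divergence

bound_gilardoni = 1.5 * np.sqrt(np.sqrt(1.0 + 4.0 * d1 / 9.0) - 1.0)
bound_vajda = (np.exp(d1 + 1.0) - 1.0) / (np.exp(d1 + 1.0) + 1.0)

print(f"TV = {tv:.4f} <= min({bound_gilardoni:.4f}, {bound_vajda:.4f}) "
      f"= {min(bound_gilardoni, bound_vajda):.4f}")
```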

From now on, we denote \({\mathcal{M}}_{TV}\left( \alpha ,\epsilon \right)\) and \({\mathcal{M}}_{\beta }\left( \alpha ,\epsilon \right)\) the sets of \((\alpha ,\epsilon )\)-robust classifiers w.r.t. \(D_{TV}\) and \(D_{\beta }\) respectively. The next sections give bounds on the risks’ gap and on the generalization gap, in the standard and the adversarial settings, for these specific hypothesis classes.

4 Risks’ gap and generalization gap for robust randomized classifiers

As discussed in Sect. 2.1, we can always decompose the adversarial risk of a classifier \({\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p})\) in two terms. First the standard risk \({\mathcal{R}}(\mathrm{m})\) and second the amount of risk the adversary creates with non-zero perturbations \({\mathcal{R}}^{\mathrm{adv}}_{>0}(\mathrm{m};\alpha _{p})\). Hence minimizing \({\mathcal{R}}(\mathrm{m})\) can give poor values for \({\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p})\) and vice-versa. In this section, we upper-bound the risks’ gap \({\mathcal{R}}^{\mathrm{adv}}_{>0}(\mathrm{m};\alpha _{p})\), i.e. the gap between the risk and the adversarial risk of a robust classifier.

4.1 Risks’ gap for robust classifiers w.r.t. \(D_{TV}\)

First, let us consider \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\). We can control the loss of accuracy under attack of this classifier with the robustness parameter \(\epsilon\).

Theorem 3

(Risk’s gap for robust classifiers w.r.t \(D_{TV}\)) Let \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) . Then we have

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m}; \alpha _{p}) \le {\mathcal{R}}(\mathrm{m}) + \epsilon . \end{aligned}$$

Proof

Let \(\mathrm{m}\) be an \((\alpha _{p},\epsilon )\)-robust classifier w.r.t. \(D_{TV}\) , \(({\varvec{x}},y ) \sim {\mathcal{D}}\) and \(\varvec{\tau }\in {\mathcal{X}}\) such that \(\Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\). By definition of the \(0/1\) loss we have

$$\begin{aligned}&{\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}+ \varvec{\tau }), y \right) = {\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}}+ \varvec{\tau })} \left[ {\mathbbm{1}}\left\{ \hat{y} \ne y\right\} \right] . \end{aligned}$$

Furthermore, by definition of the total variation distance we have

$$\begin{aligned}&{\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}}+ \varvec{\tau })} \left[ {\mathbbm{1}}\left\{ \hat{y} \ne y\right\} \right] - {\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}})} \left[ {\mathbbm{1}}\left\{ \hat{y} \ne y\right\} \right] \le D_{TV}( \mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+\varvec{\tau })). \end{aligned}$$

Since \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\), the above amounts to write

$$\begin{aligned}&{\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}+ \varvec{\tau }), y \right) - {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}), y \right) \le \epsilon . \end{aligned}$$

Finally, since this holds for any \(({\varvec{x}},y) \sim {\mathcal{D}}\) and any \(\alpha _{p}\)-bounded perturbation \(\varvec{\tau }\), we get

$$\begin{aligned}&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}} \left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}+ \varvec{\tau }), y \right) \right] - {\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}} \left[ {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}), y \right) \right] \le \epsilon . \end{aligned}$$

The above inequality concludes the proof. \(\square\)

This result means that if we can design a class \({\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) with a small enough \(\epsilon\), then minimizing the risk of \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) is also sufficient to control the adversarial risk. The result is relatively easy to obtain, but it has an interesting consequence for our understanding of the trade-off between robustness and accuracy. It says that there exist classes of randomized classifiers for which robustness and standard accuracy may not be at odds, since we can upper-bound the maximal loss of accuracy the model may suffer under attack. This challenges the intuitions developed on deterministic classifiers by Su et al. (2018), Jetley et al. (2018), Tsipras et al. (2019) and Zhang et al. (2019), and advocates for the use of randomization schemes as defenses against adversarial attacks. Note, however, that we did not evade the trade-off between robustness and accuracy; we only showed that with certain hypothesis classes it can be controlled.
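
To illustrate Theorem 3, the sketch below estimates the risk and the adversarial risk of a one-dimensional Gaussian noise injection classifier, for which both losses have closed forms, and compares their gap to the total variation robustness parameter \(\epsilon = 2\varPhi (\frac{\alpha _{2}}{2\sigma }) - 1\) announced in Sect. 1.3. The data distribution and all numerical values are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

sigma, alpha, mu, n = 1.0, 0.5, 2.0, 200_000   # illustrative parameters

# Toy distribution D: y uniform in {-1, +1}, x | y ~ N(mu * y, 1).
y = rng.choice([-1.0, 1.0], size=n)
x = mu * y + rng.normal(size=n)

# Randomized classifier m(x): predict sign(x + n) with n ~ N(0, sigma^2).
# Closed-form loss: L_0/1(m(x), y) = Phi(-y x / sigma); the worst perturbation
# |tau| <= alpha shifts x by -y * alpha.
risk = np.mean(norm.cdf(-y * x / sigma))
adv_risk = np.mean(norm.cdf((alpha - y * x) / sigma))

eps_tv = 2.0 * norm.cdf(alpha / (2.0 * sigma)) - 1.0
print(f"R = {risk:.4f}, R_adv = {adv_risk:.4f}, "
      f"gap = {adv_risk - risk:.4f} <= eps = {eps_tv:.4f}")
```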

4.2 Risks’ gap for robust classifiers w.r.t. \(D_{\beta }\)

We now extend the previous result to the Renyi divergence. We show that, for any randomized classifier in \({\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\), we can bound the gap between the risk and the adversarial risk of \(\mathrm{m}\). With the Renyi divergence, the factor that controls the classifier’s loss of accuracy under attack can be either multiplicative or additive, and it depends both on the robustness parameter \(\epsilon\) and on the divergence parameter \(\beta\).

Theorem 4

(Multiplicative risks’ gap for Renyi-robust classifiers) Let \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\). Then we have

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p}) \le \left( e^{\epsilon } {\mathcal{R}}(\mathrm{m})\right) ^{\frac{\beta -1}{\beta }}. \end{aligned}$$

Proof

Let \(\mathrm{m}\) be an \((\alpha _{p},\epsilon )\)-robust classifier w.r.t. \(D_{\beta }\), \(({\varvec{x}},y ) \sim {\mathcal{D}}\) and \(\varvec{\tau }\in {\mathcal{X}}\) such that \(\Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\). With the same reasoning as above, and with Proposition 1, we get

$$\begin{aligned} {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}+ \varvec{\tau }), y \right) = ~&{\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}}+ \varvec{\tau })} \left[ {\mathbbm{1}}\left\{ \hat{y} \ne y\right\} \right] \\ = ~&{\mathbb{P}}_{\hat{y} \sim \mathrm{m}({\varvec{x}}+ \varvec{\tau })} \left[ \hat{y} \ne y\right] \\ \le ~&\left( e^{ D_{\beta }\left( \mathrm{m}({\varvec{x}}+\varvec{\tau }),\mathrm{m}({\varvec{x}}) \right) } {\mathbb{P}}_{\hat{y} \sim \mathrm{m}({\varvec{x}})} \left[ \hat{y} \ne y \right] \right) ^{\frac{\beta -1}{\beta }} \quad ({\text{Prop.}}\,1)\\ = ~&\left( e^{ D_{\beta }\left( \mathrm{m}({\varvec{x}}+\varvec{\tau }),\mathrm{m}({\varvec{x}}) \right) } {\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}})} \left[ {\mathbbm{1}}\left\{ \hat{y} \ne y\right\} \right] \right) ^{\frac{\beta -1}{\beta }}\\ \le ~&\left( e^{\epsilon } {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}), y \right) \right) ^{\frac{\beta -1}{\beta }} . \end{aligned}$$

Since this holds for any \(({\varvec{x}},y) \sim {\mathcal{D}}\) and any \(\alpha _{p}\) bounded perturbation \(\varvec{\tau }\), we get

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m}; \alpha _{p}) = ~&{\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathcal{L}}_{0/1}\left( \mathrm{m}( {\varvec{x}}+\varvec{\tau }), y\right) \right] \\ \le ~&{\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ e^{\frac{\beta -1}{\beta }\epsilon } {\mathcal{L}}_{0/1}\left( \mathrm{m}( {\varvec{x}}), y\right) ^{\frac{\beta -1}{\beta }} \right] \\ \le ~&e^{\frac{\beta -1}{\beta }\epsilon } {\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ {\mathcal{L}}_{0/1}\left( \mathrm{m}( {\varvec{x}}), y\right) ^{\frac{\beta -1}{\beta }}\right] . \end{aligned}$$

Finally, using the Jensen inequality, one gets

$$\begin{aligned} \le ~&e^{\frac{\beta -1}{\beta }\epsilon } {\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ {\mathcal{L}}_{0/1}\left( \mathrm{m}( {\varvec{x}}), y\right) \right] ^{\frac{\beta -1}{\beta }} =\left( e^{\epsilon } {\mathcal{R}}(\mathrm{m})\right) ^{\frac{\beta -1}{\beta }} . \end{aligned}$$

The above inequality concludes the proof. \(\square\)

This first result gives a multiplicative bound on the gap between the standard and adversarial risks. This means that if we can design a class \({\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) with small enough \(\epsilon\), and big enough \(\beta\), then minimizing the risk of any \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) is sufficient to also minimize the adversarial risk of \(\mathrm{m}\). Nevertheless, multiplicative factors are not easy to analyze.

Remark 2

More general bounds can be computed if we assume that for every randomized classifier \(\mathrm{m}\) there exists a convex function \({\mathbf{f}}\) such that for all \({\varvec{x}}\) and \(\varvec{\tau }\) with \(\Vert \varvec{\tau }\Vert _{p}\le \alpha _{p}\), we have \(\mathrm{m}({\varvec{x}})(Z)\le {\mathbf{f}}(\mathrm{m}({\varvec{x}}+\varvec{\tau })(Z))\) for all measurable sets Z. In this case, we get \({\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p}) \le {\mathbf{f}}\left( {\mathcal{R}}(\mathrm{m})\right)\). This has a close link with randomized smoothing (Cohen et al., 2019) and f-differential privacy (Bu et al., 2020) where both try to fit the best possible \({\mathbf{f}}\) using Neyman–Pearson lemma.

The following result provides an additive counterpart to Theorem 4. It gives a control over the loss of accuracy under attack with respect to the robustness parameter \(\epsilon\) and the Shannon entropy of \(\mathrm{m}\).

Theorem 5

(Additive risks’ gap for Renyi-robust classifiers) Let \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\), then we have

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m}; \alpha _{p})-{\mathcal{R}}(\mathrm{m}) \le 1-e^{-\epsilon } {\mathbb{E}}_{{\varvec{x}}\sim {\mathcal{D}}_{\mid {\mathcal{X}}}}\left[ e^{-H(\mathrm{m}({\varvec{x}}))}\right] \end{aligned}$$

where H is the Shannon entropy (i.e. for any \(\rho \in {\mathcal{P}}\left( {\mathcal{Y}}\right) , H(\rho )= -\sum \nolimits _{k \in {\mathcal{Y}}} \rho _{k} \log (\rho _{k})\)) and \({\mathcal{D}}_{\mid {\mathcal{X}}}\) is the marginal distribution of \({\mathcal{D}}\) for \({\mathcal{X}}\).

Proof

Let \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\), then

$$\begin{aligned}&{\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p})-{\mathcal{R}}(\mathrm{m}) \\ = ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}+ \varvec{\tau }) , y \right) - {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}) , y \right) \right] . \end{aligned}$$

By definition of the \(0/1\) loss, this amounts to write

$$\begin{aligned} = ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathbb{E}}_{\hat{y}_{\mathrm{adv}}\sim \mathrm{m}({\varvec{x}}+\varvec{\tau }), \hat{y} \sim \mathrm{m}({\varvec{x}}) }\left[ {\mathbbm{1}}\left( \hat{y}_{\mathrm{adv}}\ne y\right) - {\mathbbm{1}}\left( \hat{y}\ne y\right) \right] \right] \\ \le ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathbb{E}}_{\hat{y}_{\mathrm{adv}}\sim \mathrm{m}({\varvec{x}}+\varvec{\tau }), \hat{y} \sim \mathrm{m}({\varvec{x}})}\left[ {\mathbbm{1}}\left( \hat{y}_{\mathrm{adv}}\ne \hat{y}\right) \right] \right] \\ = ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})}{\mathbb{P}}_{\hat{y}_{\mathrm{adv}}\sim \mathrm{m}({\varvec{x}}+\varvec{\tau }),\hat{y}\sim \mathrm{m}({\varvec{x}})} \left[ \hat{y}_{\mathrm{adv}}\ne \hat{y} \right] \right] \\ = ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} 1 - {\mathbb{P}}_{\hat{y}_{\mathrm{adv}}\sim \mathrm{m}({\varvec{x}}+\varvec{\tau }),\hat{y}\sim \mathrm{m}({\varvec{x}})} \left[ \hat{y}_{\mathrm{adv}} = \hat{y} \right] \right] \\ = ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} 1 - \sum _{i=1}^{K} \mathrm{m}({\varvec{x}})_{i} \times \mathrm{m}({\varvec{x}}+ \varvec{\tau })_{i} \right] . \end{aligned}$$

Now, note that for any \(({\varvec{x}},y) \sim {\mathcal{D}}\) and \(\varvec{\tau }\in {\mathcal{X}}\), by definition of a probability vector in \({\mathcal{P}}\left( {\mathcal{Y}}\right)\), and thanks to Jensen inequality we can write

$$\begin{aligned}&\sum _{i=1}^{K} \mathrm{m}({\varvec{x}})_{i} \times \mathrm{m}({\varvec{x}}+ \varvec{\tau })_{i} \ge \exp \left( \sum _{i=1}^{K} \mathrm{m}({\varvec{x}})_{i} \log \mathrm{m}({\varvec{x}}+ \varvec{\tau })_{i}\right) . \end{aligned}$$

Then by definition of the entropy and the Kullback Leibler divergence we have

$$\begin{aligned}&\exp \left( \sum _{i=1}^{K} \mathrm{m}({\varvec{x}})_{i} \log \mathrm{m}({\varvec{x}}+ \varvec{\tau })_{i}\right) =\exp \big (-D_{1}\left( \mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+ \varvec{\tau }) \right) - H\left( \mathrm{m}({\varvec{x}}) \right) \big ). \end{aligned}$$

Finally, by combining the above inequalities and since \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) we get

$$\begin{aligned}&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})}{\mathbb{P}}_{\hat{y}_{\mathrm{adv}}\sim \mathrm{m}({\varvec{x}}+\varvec{\tau }),\hat{y}\sim \mathrm{m}({\varvec{x}})}(\hat{y}_{\mathrm{adv}}\ne \hat{y})\right] \\ \le ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} 1-e^{- D_{1}(\mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+\varvec{\tau }))-H(\mathrm{m}({\varvec{x}}))} \right] \\ \le ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ 1-e^{-\epsilon -H(\mathrm{m}({\varvec{x}}))} \right] = 1-e^{-\epsilon }{\mathbb{E}}_{{\varvec{x}}\sim {\mathcal{D}}_{\mid {\mathcal{X}}}}\left[ e^{-H(\mathrm{m}({\varvec{x}}))}\right] . \end{aligned}$$

The above inequality concludes the proof. \(\square\)

This result is interesting because it relates the accuracy of \(\mathrm{m}\) to the bound we obtain. In words, when \(\mathrm{m}({\varvec{x}})\) has large entropy (i.e. \(H(\mathrm{m}({\varvec{x}}))\rightarrow \log (K)\)), the output distribution tends towards the uniform distribution; hence \(\epsilon \rightarrow 0\). This means that the classifier is very robust but also completely inaccurate, since it outputs classes uniformly at random. Conversely, if \(H(\mathrm{m}({\varvec{x}}))\rightarrow 0\), then \(\epsilon \rightarrow \infty\). The classifier may be accurate, but it is not robust anymore (at least according to our definition). Hence we need to find a classifier that achieves a trade-off between robustness and accuracy.
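
As an illustration of Theorem 5, the snippet below computes the entropy term and the resulting additive bound for a batch of output distributions; the distributions, the number of classes and the robustness parameter \(\epsilon\) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy H(p) = -sum_k p_k log p_k (with 0 log 0 = 0)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

K, eps = 10, 0.2                                        # illustrative values
outputs = rng.dirichlet(np.ones(K) * 0.5, size=1000)    # stand-in for {m(x) : x ~ D|X}

bound = 1.0 - np.exp(-eps) * np.mean(np.exp([-entropy(p) for p in outputs]))
print(f"Theorem 5 bound on R_adv - R: {bound:.4f}")

# Sharper outputs (lower entropy) push the bound down for the same eps,
# but in practice a sharper classifier usually needs a larger eps to stay Renyi-robust.
sharp = rng.dirichlet(np.ones(K) * 0.05, size=1000)
print(1.0 - np.exp(-eps) * np.mean(np.exp([-entropy(p) for p in sharp])))
```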

5 Standard generalization gap

In this section we devise generalization gap bounds for randomized classifiers that are robust with respect to either the total variation distance or the Renyi divergence. To do so, we upper-bound the Rademacher complexity of the loss space of TV-robust classifiers

$$\begin{aligned} {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }:=\{ ({\varvec{x}},y) \mapsto {\mathcal{L}}_{0/1}(\mathrm{m}({\varvec{x}}),y) \mid \mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) \}. \end{aligned}$$

The empirical Rademacher complexity, first introduced by Bartlett and Mendelson (2002), is one of the standard measures of the generalization gap. It is particularly useful for obtaining quality bounds for complex classes such as neural networks since, contrary to combinatorial notions such as the VC dimension, it does not depend on the number of parameters in the network.

Definition 3

(Rademacher complexity) For any class \({\mathcal{F}}\) of real-valued functions \(({\varvec{x}},y)\mapsto {\mathbb{R}}\), given a training sample \({\mathcal{S}}=\{({{\varvec{x}}_{\varvec{1}}},y_{1}), \dots ,({{\varvec{x}}_{\varvec{n}}},y_{n})\}\), the empirical Rademacher complexity of \({\mathcal{F}}\) is defined as

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}({\mathcal{F}}):=\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{f \in {\mathcal{F}}} \sum _{i=1}^{n} r_{i} f({{\varvec{x}}_{\varvec{i}}},y_{i}) \right] , \end{aligned}$$

with \(r_{i}\) i.i.d. drawn from a Rademacher measure, i.e. \({\mathbb{P}}(r_{i} = 1) = {\mathbb{P}}(r_{i} = -1) = \frac{1}{2}\).

The empirical Rademacher complexity measures the uniform convergence rate of the empirical risk towards the risk on the function class \({\mathcal{F}}\) as demonstrated by Mohri et al. (2018). Thanks to this notion of complexity, we can bound with high probability the generalization gap of any hypothesis \(\mathrm{m}\) in a class \({\mathcal{M}}\).
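
For a small finite function class, the empirical Rademacher complexity can be estimated directly by Monte Carlo over the Rademacher variables, as in the illustrative sketch below; the class of threshold classifiers and the sample are toy choices made only to exercise Definition 3.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200
xs = rng.uniform(-1.0, 1.0, size=n)
ys = (xs > 0).astype(float)

# Toy finite function class F: 0/1 losses of threshold classifiers h_t(x) = 1{x > t}.
thresholds = np.linspace(-1.0, 1.0, 41)
loss_table = np.array([((xs > t).astype(float) != ys).astype(float) for t in thresholds])

def empirical_rademacher(loss_table, n_draws=2000):
    """Monte Carlo estimate of (1/n) E_r [ sup_f sum_i r_i f(x_i, y_i) ]."""
    vals = []
    for _ in range(n_draws):
        r = rng.choice([-1.0, 1.0], size=n)
        vals.append(np.max(loss_table @ r) / n)
    return float(np.mean(vals))

print(f"estimated empirical Rademacher complexity: {empirical_rademacher(loss_table):.4f}")
```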

Theorem 6

(Mohri et al., 2018) Let \({\mathcal{M}}\) be a class of possibly randomized classifiers and \({\mathcal{L}}_{{\mathcal{M}}} :=\{ {\mathcal{L}}_{\mathrm{m}} :({\varvec{x}},y) \mapsto {\mathcal{L}}_{0/1}\left( \mathrm{m}(\varvec{x}),y\right) \mid \mathrm{m}\in {\mathcal{M}}\}\). Then for any \(\delta \in (0,1)\), with probability at least \(1-\delta\), the following holds for any \(\mathrm{m}\in {\mathcal{M}}\),

$$\begin{aligned} {\mathcal{R}}\left( \mathrm{m}\right) - {\mathcal{R}}_{{\mathcal{S}}}\left( \mathrm{m}\right) \le 2 {\mathfrak{R}}_{{\mathcal{S}}}({\mathcal{L}}_{{\mathcal{M}}}) + 3 \sqrt{\frac{\ln (2/\delta )}{2n}} . \end{aligned}$$

5.1 Generalization error for robust classifiers

Accordingly, we want to upper bound the empirical Rademacher complexity of \({\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\), which motivates the following definition.

Definition 4

(\(\alpha\)-covering and external covering number) Let us consider \(( {\mathcal{X}}, \Vert {.}\Vert _{p})\) a vector space equipped with the \(\ell _{p}\) norm, \(B \subset {\mathcal{X}}\) and \(\alpha \ge 0\). Then

  • \(C =\{ {{\varvec{c}}_{\varvec{1}}}, \dots , {{\varvec{c}}_{\varvec{m}}} \}\) is an \(\alpha\)-covering of B for the \(\ell _{p}\) norm if for any \({\varvec{x}}\in B\) there exists \({{\varvec{c}}_{\varvec{i}}} \in C\) such that \(\Vert {{\varvec{x}}- {{\varvec{c}}_{\varvec{i}}}}\Vert _{p} \le \alpha\).

  • The external covering number of B, denoted \(N\left( B,\Vert {.}\Vert _{p},\alpha \right)\), is the minimal number of points needed to build an \(\alpha\)-covering of B for the \(\ell _{p}\) norm.

The covering number is a well-known measure that is often used in statistical learning theory (Shalev-Shwartz & Ben-David, 2014) and asymptotic statistics (Van der Vaart, 2000) to evaluate the complexity of a set of functions. Here we use it to evaluate the number of \(\ell _{p}\) balls we need to cover the training samples, which gives us the following bound on the Rademacher complexity of \({\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\).
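
On a given sample, the covering number N appearing in Theorem 7 below can be upper bounded by a simple greedy procedure: repeatedly promote an uncovered point to a center until every point lies within \(\alpha _{p}\) of some center. A minimal sketch on synthetic data (data and radius are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_covering_number(points, alpha, p=2):
    """Greedy upper bound on the external covering number N({x_1..x_n}, ||.||_p, alpha)."""
    centers = []
    remaining = list(range(len(points)))
    while remaining:
        c = points[remaining[0]]          # promote the first uncovered point to a center
        centers.append(c)
        remaining = [i for i in remaining
                     if np.linalg.norm(points[i] - c, ord=p) > alpha]
    return len(centers)

# Clustered synthetic inputs: a few balls can cover many samples, keeping N / n small.
points = np.concatenate([rng.normal(loc=c, scale=0.05, size=(300, 10))
                         for c in (-0.5, 0.0, 0.5)])
print("n =", len(points), " greedy cover size N <=", greedy_covering_number(points, alpha=0.5))
```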

Theorem 7

(Rademacher complexity for TV-robust classifiers) Let \({\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\) be the loss function class associated with \({\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\). Then, for any \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}), \ldots , ({{\varvec{x}}_{\varvec{n}}},y_{n})\}\), the following holds,

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\right) \le \sqrt{\frac{ N \times K }{n}}+\epsilon . \end{aligned}$$

where \(N =N\left( \{{{\varvec{x}}_{\varvec{1}}},\ldots , {{\varvec{x}}_{\varvec{n}}}\}, \Vert {.}\Vert _{p}, \alpha _{p} \right)\) is the \(\alpha _{p}\)-external covering number of the inputs \(\{{{\varvec{x}}_{\varvec{1}}},\ldots , {{\varvec{x}}_{\varvec{n}}}\}\) for the \(\ell _{p}\) norm and \(K = | {\mathcal{Y}}|\) is the number of labels in the classification task.

Proof

We denote \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}), \dots , ({{\varvec{x}}_{\varvec{n}}},y_{n})\}\) and \(N=N\left( \{{{\varvec{x}}_{\varvec{1}}},\dots , {{\varvec{x}}_{\varvec{n}}}\}, \Vert {.}\Vert _{p}, \alpha _{p} \right)\). By definition of a covering number, there exists \(C= \{{{\varvec{c}}_{\varvec{1}}} , \dots , {{\varvec{c}}_{\varvec{N}}}\}\) an \(\alpha _{p}\)-covering of \(\{{{\varvec{x}}_{\varvec{1}}},\dots {{\varvec{x}}_{\varvec{n}}}\}\) for the \(\ell _{p}\) norm. Furthermore, for \(j\in \{1,\dots ,N\}\) and \(y \in \{1,\dots ,K\}\), we define

$$\begin{aligned} E_{y,j} = \left\{ i \in \{1,\dots , n\} ~{s.t.}~ y_{i} = y {\text{ and }} \mathop {\mathrm{argmin}}\limits _{l \in \{ 1, \dots , N\}} \Vert {{{\varvec{x}}_{\varvec{i}}} - {{\varvec{c}}_{\varvec{l}}}}\Vert _{p} = j\right\} . \end{aligned}$$

We also denote \(E_{j} = \mathop {\cup }\nolimits _{y \in [K]} E_{y,j}\). Finally, we denote \({\mathcal{L}}_{\mathrm{m}} :({\varvec{x}},y) \mapsto {\mathcal{L}}_{0/1}\left( \mathrm{m}(\varvec{x}),y\right)\). Then, by definition of the empirical Rademacher complexity, we can write

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\right) = ~&\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \sum _{i=1}^{n} r_{i} {\mathcal{L}}_{\mathrm{m}}({{\varvec{x}}_{\varvec{i}}}, y_{i})\right] . \end{aligned}$$

Then we can use \(E_{j}\) to write

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\right) = \,&\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \sum _{j=1}^{N}\sum _{i\in E_{j}} r_{i} {\mathcal{L}}_{\mathrm{m}}({{\varvec{x}}_{\varvec{i}}}, y_{i}) \right] . \end{aligned}$$

Furthermore for any \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) and \(i\in E_{j}\), there exists \(\epsilon _{i} \in [-\epsilon ,\epsilon ]\) such that: \({\mathcal{L}}_{\mathrm{m}}({{\varvec{x}}_{\varvec{i}}}, y_{i}) = {\mathcal{L}}_{\mathrm{m}}({{\varvec{c}}_{\varvec{j}}},y_{i})+\epsilon _{i}\). Then we have

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \right)&\le \,\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \sum _{j=1}^{N}\sum _{i\in E_{j}} r_{i} {\mathcal{L}}_{\mathrm{m}}({{\varvec{c}}_{\varvec{j}}},y_{i}) \right] \\&\quad + \, \frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\epsilon _{i}\in [-\epsilon ,\epsilon ]} \sum _{j=1}^{N}\sum _{i\in E_{j}} r_{i} \epsilon _{i} \right] . \end{aligned}$$

Let us start by studying the second term. We have

$$\begin{aligned} \frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\epsilon _{i}\in [-\epsilon ,\epsilon ]} \sum _{j=1}^{N}\sum _{i\in E_{j}} r_{i} \epsilon _{i} \right] =\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\epsilon _{i}\in [-\epsilon ,\epsilon ]} \sum _{i=1}^{n} r_{i} \epsilon _{i} \right] = \frac{1}{n} \sum _{i=1}^{n} \epsilon =\epsilon . \end{aligned}$$

Let us now look at the first term. Since \({\mathcal{L}}_{\mathrm{m}}({\varvec{x}},y)\in [0,1]\) for all \(({\varvec{x}},y)\), we have

$$\begin{aligned}&\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \sum _{j=1}^{N}\sum _{i\in E_{j}} r_{i} {\mathcal{L}}_{\mathrm{m}}({{\varvec{c}}_{\varvec{j}}},y_{i}) \right] \\ = \,&\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \sum _{j=1}^{N}\sum _{y=1}^{K} {\mathcal{L}}_{\mathrm{m}}({{\varvec{c}}_{\varvec{j}}},y) \sum _{i\in E_{y,j}}r_{i} \right] \\ \le \,&\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sum _{j=1}^{N}\sum _{y=1}^{K} \left| { \sum _{i\in E_{y,j}}r_{i}}\right| \right] . \end{aligned}$$

Finally, using the Khintchine inequality and the Cauchy-Schwarz inequality, we get

$$\begin{aligned} \frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sum _{j=1}^{N}\sum _{y=1}^{K} \left| { \sum _{i\in E_{y,j}}r_{i}}\right| \right] \le \,&\frac{1}{n} \sum _{j=1}^{N}\sum _{y=1}^{K} \sqrt{\big | {E_{y,j}} \big |} \quad {\text{(Khintchine)}}\\ \le \,&\frac{1}{n} \sqrt{N\times K}\sqrt{\sum _{j=1}^{N}\sum _{y=1}^{K} \big | {E_{y,j}} \big |} \quad {\text{(Cauchy)}} \\ = \,&\sqrt{\frac{N\times K}{n}}. \end{aligned}$$

By combining the upper-bounds we have for each term, we get the expected result,

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\right) \le \sqrt{\frac{N\times K}{n}}+\epsilon . \end{aligned}$$

\(\square\)
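To make the quantity in Theorem 7 more concrete, the following sketch evaluates the bound \(\sqrt{NK/n}+\epsilon\) on a sample of flattened inputs. It is only an illustration under our own assumptions: the greedy cover it computes merely upper-bounds the optimal covering number \(N\), which suffices since \(N\) only appears in an upper bound, and the data and function names are ours.

```python
import numpy as np

def greedy_covering_number(xs, alpha, p=2):
    """Greedy alpha-cover of the rows of xs for the l_p norm.

    Returns the number of centers used; this upper-bounds the optimal
    covering number N(xs, ||.||_p, alpha).
    """
    centers = []
    for x in xs:
        if not any(np.linalg.norm(x - c, ord=p) <= alpha for c in centers):
            centers.append(x)
    return len(centers)

def rademacher_bound(xs, alpha, num_classes, eps, p=2):
    """Evaluate the right-hand side of Theorem 7: sqrt(N * K / n) + eps."""
    n = xs.shape[0]
    big_n = greedy_covering_number(xs, alpha, p=p)
    return np.sqrt(big_n * num_classes / n) + eps

# Example: 1000 inputs concentrated around 10 cluster centers in dimension 50,
# so the sample can be covered with a handful of balls and the bound stays small.
rng = np.random.default_rng(0)
clusters = rng.uniform(-1.0, 1.0, size=(10, 50))
xs = clusters[rng.integers(0, 10, size=1000)] + 0.01 * rng.standard_normal((1000, 50))
print(rademacher_bound(xs, alpha=0.5, num_classes=10, eps=0.1))
```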

Remark 3

Generalization bounds usually involve covering numbers of the hypothesis class, e.g. through Dudley's entropy integral (Shalev-Shwartz & Ben-David, 2014). In the bound of Theorem 7, the covering number is instead taken on the training inputs, while the hypothesis class is restricted to TV-robust classifiers. This is a fundamental difference between the two types of bounds. Some works (Xu & Mannor, 2012; Petzka et al., 2021) studied the generalization of slowly varying classifiers. The bounds they derive are similar to ours, even though they do not apply to the same objects.

The above result means that, if we can cover the n training samples with O(1) balls, then we can bound the generalization gap of any randomized classifier \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) by \(O\left( \frac{1}{\sqrt{n}}\right) + \epsilon\). Furthermore, a natural corollary of Theorem 7 bounds the Rademacher complexity of the class \({\mathcal{L}}_{{\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right) }\).

Corollary 1

Let \({\mathcal{L}}_{{\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right) }\) be the loss function class associated with \({\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\). Then, for any \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}), \ldots , ({{\varvec{x}}_{\varvec{n}}},y_{n})\}\), the following holds,

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right) }\right) \le \sqrt{\frac{ N \times K }{n}}+ \min \left( \frac{3}{2}\left( \sqrt{1 + \frac{4\epsilon }{9}} - 1\right) ^{1/2}, \frac{e^{\epsilon +1} -1}{e^{\epsilon +1} +1}\right) . \end{aligned}$$

where \(N =N\left( \{{{\varvec{x}}_{\varvec{1}}},\ldots , {{\varvec{x}}_{\varvec{n}}}\}, \Vert {.}\Vert _{p}, \alpha _{p} \right)\) is the \(\alpha _{p}\)-external covering number of the inputs \(\{{{\varvec{x}}_{\varvec{1}}},\ldots , {{\varvec{x}}_{\varvec{n}}}\}\) for the \(\ell _{p}\) norm.

Proof

This corollary is an immediate consequence of Theorem 7 and Proposition 2. \(\square\)

Thanks to Theorems 6 and 7 and Corollary 1, one can easily bound the generalization gap of robust randomized classifiers.

5.2 Discussion and dimensionality issues

Xu and Mannor (2012) previously studied generalization bounds for learning algorithms based on their robustness. Although we use very different proof techniques, their results and ours are similar. More precisely, both analyses conclude that robust models generalize well if the training samples have a small covering number. Note, however, that we base our formulation on an adaptive partition of the samples, while the initial paper from Xu and Mannor (2012) only focuses on a fixed partition of the input space. We refer the reader to the discussion section in Xu and Mannor (2012) for more details.

These findings seem to contradict the current line of work on the hardness of generalization in the adversarial setting. In fact, if the ground-truth distribution is sufficiently concentrated (e.g. it lies in a low-dimensional subspace of \({\mathcal{X}}\)), a small number of balls can cover \({\mathcal{S}}\) with high probability; hence \(N = O(1)\). This means that we can learn robust classifiers with the same sample complexity as in the standard setting. But if the ground-truth distribution is not concentrated enough, the training samples will be far from one another, forcing the covering number to be large. In the worst-case scenario, we need to cover the whole space \([-1,1]^{d}\), giving a covering number \(N = O\left( \frac{1}{(\alpha _{p})^{d} }\right)\), which is exponential in the dimension of the problem.

In this worst-case scenario, our bound is therefore in \(O\left( \sqrt{\frac{1}{(\alpha _{p})^{d} n}}\right) + \epsilon\). When \(\alpha _{p}\) is small and the dimension of the problem is high, this bound is too large to give any meaningful insight into the generalization gap. We thus still need to tighten our analysis to show that robust learning for randomized classifiers is possible in high-dimensional spaces.
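As a rough illustration (our own back-of-the-envelope computation, taking the worst-case estimate \(N = O\left( 1/(\alpha _{p})^{d}\right)\) at face value and ignoring constants), consider CIFAR-sized images with \(d = 3\times 32\times 32 = 3072\), \(\alpha _{p} = 0.1\), \(K=10\) and \(n = 5\times 10^{4}\):

$$\begin{aligned} N \approx \left( \tfrac{1}{\alpha _{p}}\right) ^{d} = 10^{3072}, \qquad \sqrt{\tfrac{N \times K}{n}} \approx \sqrt{\tfrac{10 \times 10^{3072}}{5\times 10^{4}}} \approx 10^{1534}. \end{aligned}$$

The bound is thus vacuous long before the \(\epsilon\) term even matters.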

Remark 4

Note that we provided a very general result for randomized classifiers under the sole assumption that they are robust w.r.t. the total variation distance. Our result applies to any class of classifiers, not only linear classifiers or one-hidden-layer neural networks. To build a finer analysis, and to evade the curse of dimensionality, we should consider designing specific sub-classes \({\mathcal{M}}\subset {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) and adapting the proofs to make the term N smaller in the worst-case scenario.

6 Building robust randomized classifiers

In this section we present a simple yet efficient way to transform a non-robust, non-randomized classifier into a robust randomized classifier. To do so, we use a key property of both the Renyi divergence and the total variation distance called the data processing inequality. It is a well-known result from information theory which states that “post-processing cannot increase information”. The data processing inequality is stated as follows.

Theorem 8

(Cover & Thomas, 2012) Let us consider two arbitrary spaces \({\mathcal{Z}}, {\mathcal{Z}}'\), \(\rho ,\rho ' \in {\mathcal{P}}\left( {\mathcal{Z}} \right)\) and \(D \in \{D_{TV},D_{\beta }\}\). Then for any \(\psi : {\mathcal{Z}} \rightarrow {\mathcal{Z}}'\) we have

$$\begin{aligned} D\left( \psi \#\rho , \psi \#\rho ' \right) \le D\left( \rho ,\rho ' \right) , \end{aligned}$$

where \(\psi \#\rho\) denotes the pushforward of distribution \(\rho\) by \(\psi\).

In the context of robustness to adversarial examples, we use the data processing inequality to ease the design of robust randomized classifiers. In particular, let us suppose that we can build a randomized pre-processing \({\mathfrak{p}}: {\mathcal{X}}\rightarrow {\mathcal{P}}\left( {\mathcal{X}}\right)\) such that for any \({\varvec{x}}\in {\mathcal{X}}\) and any \(\alpha _{p}\)-bounded perturbation \(\varvec{\tau }\), we have

$$\begin{aligned} D\left( {\mathfrak{p}}({\varvec{x}}), {\mathfrak{p}}({\varvec{x}}+ \varvec{\tau }) \right) \le \epsilon , {\text{ with }}D \in \{D_{TV}, D_{\beta } \}. \end{aligned}$$
(15)

Then, thanks to the data processing inequality, we can take any deterministic classifier \({\varvec{h}}\) and build an \((\alpha _{p},\epsilon )\)-robust classifier w.r.t. D defined as \(\mathrm{m}: {\varvec{x}}\mapsto {\varvec{h}}\# {\mathfrak{p}}({\varvec{x}})\). This considerably simplifies the problem of building a class of robust models. Therefore, we want to build a randomized pre-processing \({\mathfrak{p}}\) for which we can control the Renyi divergence and/or the total variation distance between \({\mathfrak{p}}({\varvec{x}})\) and \({\mathfrak{p}}({\varvec{x}}+ \varvec{\tau })\). To do this, we analyze the simple procedure of injecting random noise directly onto the image before sending it to a classifier. Since the Renyi divergence and the total variation distance are particularly well suited to the study of Gaussian distributions, we first use this type of noise injection. More precisely, in this section, we focus on a mapping that writes as follows.

$$\begin{aligned} {\mathfrak{p}}: {\varvec{x}}\mapsto {\mathcal{N}}\left( {\varvec{x}}, \varSigma \right) , \end{aligned}$$
(16)

for some given non-degenerate covariance matrix \(\varSigma \in {\mathcal{M}}_{d\times d}({\mathbb{R}})\). We refer the interested reader to Pinot et al. (2019) for more general classes of noise, namely exponential families. Let us now evaluate the maximal variation of Gaussian pre-processing \({\mathfrak{p}}\) when applied to an image \({\varvec{x}}\in {\mathcal{X}}\) with and without perturbation.

Lemma 1

Let \(\beta >1\), \({\varvec{x}}, \varvec{\tau }\in {\mathcal{X}}\) and \(\varSigma \in {\mathcal{M}}_{d \times d}({\mathbb{R}})\) a non-degenerate covariance matrix. Let \(\rho = {\mathcal{N}}({\varvec{x}},\varSigma )\) and \(\rho '={\mathcal{N}}({\varvec{x}}+ \varvec{\tau },\varSigma )\), then \(D_{\beta }(\rho ,\rho ') = \frac{ \beta }{2} \Vert {\varvec{\tau }}\Vert _{\varSigma ^{- 1}}^{2}\).

Thanks to the above lemma, we know how to evaluate the level of Renyi-robustness that a Gaussian noise pre-processing brings to a classifier. With this result and Proposition 2, we can also upper-bound the total variation distance between \({\mathcal{N}}({\varvec{x}},\varSigma )\) and \({\mathcal{N}}({\varvec{x}}+ \varvec{\tau },\varSigma )\), but this bound is not always tight. Fortunately, we can directly evaluate the total variation distance between two Gaussian distributions as follows.

Lemma 2

Let \({\varvec{x}}, \varvec{\tau }\in {\mathcal{X}}\) and \(\varSigma \in {\mathcal{M}}_{d \times d}({\mathbb{R}})\) a non-degenerate covariance matrix. Let \(\rho = {\mathcal{N}}({\varvec{x}},\varSigma )\) and \(\rho '={\mathcal{N}}( {\varvec{x}}+ \varvec{\tau },\varSigma )\), then \(D_{TV}(\rho ,\rho ') = 2\varPhi \left( \frac{\Vert {\varvec{\tau }}\Vert _{\varSigma ^{-1}}}{2}\right) -1\) with \(\varPhi\) the cumulative distribution function of the standard Gaussian distribution.

Note that both divergences increase with the Mahalanobis norm of \(\varvec{\tau }\). Furthermore, we see that the greater the entropy of the Gaussian noise we inject, the smaller the distance between the two distributions. If we simplify the covariance matrix by setting \(\varSigma = \sigma ^{2} I_{d}\), this means that we can build more or less robust randomized classifiers against \(\ell _{2}\) adversaries, depending on \(\sigma\).
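As an illustration, the closed forms of Lemmas 1 and 2 are straightforward to evaluate numerically in the isotropic case \(\varSigma = \sigma ^{2} I_{d}\); the sketch below is ours and only restates the two formulas.

```python
import numpy as np
from scipy.stats import norm

def renyi_divergence_isotropic(tau, sigma, beta):
    """Lemma 1 with Sigma = sigma^2 I: D_beta = (beta / 2) * ||tau||_2^2 / sigma^2."""
    return 0.5 * beta * np.dot(tau, tau) / sigma**2

def tv_distance_isotropic(tau, sigma):
    """Lemma 2 with Sigma = sigma^2 I: D_TV = 2 * Phi(||tau||_2 / (2 * sigma)) - 1."""
    return 2.0 * norm.cdf(np.linalg.norm(tau) / (2.0 * sigma)) - 1.0

# Both quantities grow with ||tau||_2 and shrink as sigma increases.
tau = np.full(3072, 0.5 / np.sqrt(3072))   # an l2 perturbation of norm 0.5
print(renyi_divergence_isotropic(tau, sigma=0.25, beta=2.0),
      tv_distance_isotropic(tau, sigma=0.25))
```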

Theorem 9

(Robustness of Gaussian pre-processing) Let us consider \(c: {\mathcal{X}} \rightarrow {\mathcal{Y}}\) a deterministic classifier, \(\sigma > 0\) and \({\mathfrak{p}}: {\varvec{x}}\mapsto {\mathcal{N}}({\varvec{x}}, \sigma ^{2} I_{d})\) a pre-processing probabilistic mapping. Then the randomized classifier \(\mathrm{m}:=c \# {\mathfrak{p}}\) is

  • \((\alpha _{2}, \frac{\beta (\alpha _{2})^{2}}{2 \sigma ^{2}})\)-robust w.r.t. \(D_{\beta }\) against \(\ell _{2}\) adversaries.

  • \((\alpha _{2},\ 2 \varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) - 1)\)-robust w.r.t. \(D_{TV}\) against \(\ell _{2}\) adversaries.

Proof

Let \({\varvec{x}}, \varvec{\tau }\in {\mathcal{X}}\) such that \(\Vert {\varvec{\tau }}\Vert _{2} \le \alpha _{2}\). Thanks to Lemma 1 we have

$$\begin{aligned} D_{\beta }({\mathfrak{p}}({\varvec{x}}),{\mathfrak{p}}({\varvec{x}}+ \varvec{\tau }))&=\frac{\beta }{2}\Vert \varvec{\tau }\Vert _{\varSigma ^{-1}}^{2} = \frac{\beta }{2 \sigma ^{2}}\Vert \varvec{\tau }\Vert _{2}^{2} \le \frac{\beta (\alpha _{2})^{2}}{2 \sigma ^{2}}. \end{aligned}$$

Similarly, thanks to Lemma 2, we get

$$\begin{aligned} D_{TV}({\mathfrak{p}}({\varvec{x}}),{\mathfrak{p}}({\varvec{x}}+ \varvec{\tau }))&= 2\varPhi \left( \frac{\Vert \varvec{\tau }\Vert _{\varSigma ^{-1}}}{2} \right) -1 \le 2\varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) -1. \end{aligned}$$

Finally, from the data processing inequality, i.e.  Theorem 8, we get both

$$\begin{aligned} D_{\beta }(\mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+ \varvec{\tau }))&\le \frac{\beta (\alpha _{2})^{2}}{2 \sigma ^{2}}, \end{aligned}$$

and

$$\begin{aligned} D_{TV}(\mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+ \varvec{\tau }))&\le 2\varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) -1. \end{aligned}$$

The above inequalities conclude the proof. \(\square\)

Theorem 9 means that we can build simple noise injection schemes as pre-processing of state-of-the-art image classification models and keep track of the maximal loss of accuracy under attack of the resulting randomized classifier. These results also highlight the profound link between randomized classifiers and randomized smoothing as presented by Cohen et al. (2019). Even though our findings are of a different nature, both techniques use the same base mechanism (Gaussian noise injection). Therefore, Gaussian pre-processing is a principled defense method that can be analyzed from several standpoints, including certified robustness and statistical learning theory.
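A minimal sketch of this construction, assuming an arbitrary deterministic classifier `h` (e.g. a trained network, which is not part of the paper's code): it samples from \(\mathrm{m}= {\varvec{h}}\# {\mathfrak{p}}\) and returns the two robustness levels given by Theorem 9.

```python
import numpy as np
from scipy.stats import norm

def gaussian_pushforward_classifier(h, x, sigma, rng):
    """Sample a label from m(x) = h # N(x, sigma^2 I): inject noise, then apply h."""
    return h(x + sigma * rng.standard_normal(x.shape))

def theorem9_certificates(alpha2, sigma, beta):
    """Robustness levels of m = h # p against l2 adversaries of size alpha2 (Theorem 9)."""
    eps_renyi = beta * alpha2**2 / (2.0 * sigma**2)        # w.r.t. D_beta
    eps_tv = 2.0 * norm.cdf(alpha2 / (2.0 * sigma)) - 1.0  # w.r.t. D_TV
    return eps_renyi, eps_tv

# Example values (sigma and alpha2 are illustrative, not the paper's settings).
print(theorem9_certificates(alpha2=0.5, sigma=0.25, beta=1.0))
```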

7 Discussion: mode preservation property and randomized smoothing

Even though randomized classifiers have some interesting properties regarding generalization error, we can also study them through the prism of deterministic robustness. Let us for example consider the classifier that outputs the class with the highest probability for \(\mathrm{m}({\varvec{x}})\), a.k.a. the mode of \(\mathrm{m}({\varvec{x}})\). It writes

$$\begin{aligned} {\varvec{h}}_{\mathrm{rob}}: {\varvec{x}}\mapsto \mathop {\mathrm{argmax}}\limits _{k \in [K]} \mathrm{m}({\varvec{x}})_{k} \end{aligned}$$
(17)

Then checking whether \({\varvec{h}}_{\mathrm{rob}}\) is robust boils down to demonstrating that the mode of \(\mathrm{m}({\varvec{x}})\) does not change under perturbation. It turns out that \(D_{TV}\) robust classifiers have this property. We call it the mode preservation property of \({\mathcal{M}}_{TV}(\alpha _{p},\epsilon )\).

Proposition 10

(Mode preservation for \(D_{TV}\)-robust classifiers) Let \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) be a robust randomized classifier and \({\varvec{x}}\in {\mathcal{X}}\) such that \(\mathrm{m}({\varvec{x}})_{(1)} \ge \mathrm{m}({\varvec{x}})_{(2)} +2 \epsilon\). Then, for any \(\varvec{\tau }\in {\mathcal{X}}\), the following holds,

$$\begin{aligned} \Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p} \implies {\varvec{h}}_{\mathrm{rob}}({\varvec{x}}) = {\varvec{h}}_{\mathrm{rob}}({\varvec{x}}+ \varvec{\tau }). \end{aligned}$$

Proof

Let \({\varvec{x}},\varvec{\tau }\in {\mathcal{X}}\) such that \(\Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\) and \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) such that

$$\begin{aligned} \mathrm{m}({\varvec{x}})_{(1)} \ge \mathrm{m}({\varvec{x}})_{(2)} +2\epsilon . \end{aligned}$$

By definition of \({\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\), we have that

$$\begin{aligned} D_{TV}(\mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+\varvec{\tau }))\le \epsilon . \end{aligned}$$

Then, for all \(k \in \{1, \dots , K\}\) we have

$$\begin{aligned} \mathrm{m}({\varvec{x}})_{k}-\epsilon \le \mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}\le \mathrm{m}({\varvec{x}})_{k}+\epsilon . \end{aligned}$$

Let us denote \(k^{*}\) the index of the biggest value in \(\mathrm{m}({\varvec{x}})\), i.e. \(\mathrm{m}({\varvec{x}})_{k^{*}} =\mathrm{m}({\varvec{x}})_{(1)}\). For any \(k\in \{1, \dots , K\}\) with \(k \ne k^{*}\), we have \(\mathrm{m}({\varvec{x}})_{k^{*}} \ge \mathrm{m}({\varvec{x}})_{k} + 2\epsilon\). Finally, for any \(k \ne k^{*}\), we get

$$\begin{aligned} \mathrm{m}({\varvec{x}}+\varvec{\tau })_{k^{*}}\ge \mathrm{m}({\varvec{x}})_{k^{*}}-\epsilon \ge \mathrm{m}({\varvec{x}})_{k}+\epsilon \ge \mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}. \end{aligned}$$

Then, \(\mathop {\mathrm{argmax}}\nolimits _{k \in [K]}\mathrm{m}({\varvec{x}})_{k}=\mathop {\mathrm{argmax}}\nolimits _{k \in [K]}\mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}\). This concludes the proof. \(\square\)
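In practice, Proposition 10 gives a simple pointwise certificate: if the two largest entries of \(\mathrm{m}({\varvec{x}})\) are separated by at least \(2\epsilon\), the prediction of \({\varvec{h}}_{\mathrm{rob}}\) cannot change under an \(\alpha _{p}\)-bounded perturbation. A minimal sketch of this check, assuming \(\mathrm{m}({\varvec{x}})\) is available as a probability vector:

```python
import numpy as np

def certified_mode(probs, eps):
    """Return (predicted class, certified?) for a TV-robust randomized classifier.

    `probs` is the probability vector m(x); `eps` is the TV-robustness level of m.
    The prediction is certified whenever m(x)_(1) >= m(x)_(2) + 2*eps (Proposition 10).
    """
    order = np.argsort(probs)[::-1]
    top, runner_up = probs[order[0]], probs[order[1]]
    return int(order[0]), bool(top >= runner_up + 2.0 * eps)

# A confident prediction survives eps = 0.1, a marginal one does not.
print(certified_mode(np.array([0.70, 0.20, 0.10]), eps=0.1))   # (0, True)
print(certified_mode(np.array([0.45, 0.40, 0.15]), eps=0.1))   # (0, False)
```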

Similarly, we can demonstrate a mode preservation property for robust classifiers w.r.t. the Renyi divergence.

Proposition 11

(Mode preservation for Renyi-robust classifiers) Let \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) be a robust randomized classifier and \({\varvec{x}}\in {\mathcal{X}}\) such that

$$\begin{aligned} \left( \mathrm{m}({\varvec{x}})_{(1) }\right) ^{\frac{\beta }{\beta - 1}} \ge \exp \left( (2-\frac{1}{\beta }) \epsilon \right) \left( \mathrm{m}({\varvec{x}})_{(2)}\right) ^{\frac{\beta -1}{\beta }}. \end{aligned}$$

Then, for any \(\varvec{\tau }\in {\mathcal{X}}\), the following holds,

$$\begin{aligned} \Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p} \implies {\varvec{h}}_{\mathrm{rob}}({\varvec{x}}) = {\varvec{h}}_{\mathrm{rob}}({\varvec{x}}+ \varvec{\tau }), \end{aligned}$$

where \({\varvec{h}}_{\mathrm{rob}}({\varvec{x}}) :=\mathop {\mathrm{argmax}}\nolimits _{k \in [K]}\mathrm{m}({\varvec{x}})_{k}\).

Proof

Let \({\varvec{x}}, \varvec{\tau }\in {\mathcal{X}}\) such that \(\Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\) and \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) such that

$$\begin{aligned} \left( \mathrm{m}({\varvec{x}})_{(1)}\right) ^{\frac{\beta }{\beta - 1}} \ge \exp \left( \left( 2-\frac{1}{\beta }\right) \epsilon \right) \left( \mathrm{m}({\varvec{x}})_{(2)}\right) ^{\frac{\beta -1}{\beta }}. \end{aligned}$$

Then by definition of \({\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\), we have

$$\begin{aligned} D_{\beta }(\mathrm{m}({\varvec{x}}),\mathrm{m}( {\varvec{x}}+\varvec{\tau })) \le \epsilon . \end{aligned}$$

Furthermore, by using Proposition 1, for any \(k \in \{1 ,\dots , K \}\) we have

$$\begin{aligned} (*) \mathrm{m}({\varvec{x}})_{k}\le \left( \exp (\epsilon )\mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}\right) ^{\frac{\beta -1}{\beta }}{\text{ and }} (**) \mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}\le \left( \exp (\epsilon )\mathrm{m}({\varvec{x}})_{k}\right) ^{\frac{\beta -1}{\beta }} . \end{aligned}$$

Let us denote \(k^{*}\) the index such that \(\mathrm{m}({\varvec{x}})_{k^{*}} =\mathrm{m}({\varvec{x}})_{(1)}\). Then using \((*)\) we get

$$\begin{aligned} \mathrm{m}({\varvec{x}}+\varvec{\tau })_{k^{*}} \ge \exp (-\epsilon )(\mathrm{m}({\varvec{x}})_{k^{*}})^{\frac{\beta }{\beta -1}}. \end{aligned}$$

Furthermore for any \(k \in \{1, \dots ,K\}\) where \(k \ne k^{*}\), we can use the assumption we made on \(\mathrm{m}\) to get

$$\begin{aligned} \exp (-\epsilon )(\mathrm{m}({\varvec{x}})_{k^{*}})^{\frac{\beta }{\beta -1}}\ge \exp \left( \frac{\beta -1}{\beta }\epsilon \right) (\mathrm{m}({\varvec{x}})_{k})^{\frac{\beta -1}{\beta }}. \end{aligned}$$

Finally, using \((**)\) we have

$$\begin{aligned} \exp \left( \frac{\beta -1}{\beta }\epsilon \right) (\mathrm{m}({\varvec{x}})_{k})^{\frac{\beta -1}{\beta }} \ge \mathrm{m}({\varvec{x}}+ \varvec{\tau })_{k}. \end{aligned}$$

The above gives us \(\mathop {\mathrm{argmax}}\nolimits _{k \in [K] }\mathrm{m}({\varvec{x}})_{k}=\mathop {\mathrm{argmax}}\nolimits _{k \in [K] }\mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}\). This concludes the proof. \(\square\)

Coming back to the decomposition in Eq. (5), with the above result, we can bound the risk the adversary induces with non-zero perturbations by the mass of points on which the classifier \({\varvec{h}}_{\mathrm{rob}}\) gives the correct answer but only with low confidence, i.e. with a small margin over the runner-up class

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}_{>0}(\mathrm{m}) \le {\mathbb{P}}_{({\varvec{x}},y)\sim {\mathcal{D}}} \left[ {\varvec{h}}_{\mathrm{rob}}({\varvec{x}})=y {\text{ and }} \mathrm{m}({\varvec{x}})_{(1)} < \mathrm{m}({\varvec{x}})_{(2)} +2 \epsilon \right] . \end{aligned}$$
(18)

This means that the only points on which the adversary may induce misclassification are the points on which \(\mathrm{m}\) already has a high risk. Once more, this says something fundamental about the behavior of robust randomized classifiers. On undefended models, the adversary could change the decision on any point it wanted; now it is limited to points on which the classifier is already unreliable. This considerably mitigates the threat model we need to consider. Furthermore, for any deterministic classifier designed as in Eq. (17), we can also bound the maximal loss of accuracy under attack that the classifier may suffer. This bound may, however, be harder to evaluate since it depends on both the classifier and the data distribution. The classifier we define in Eq. (17) and the mode preservation property of \(\mathrm{m}\) are closely related to provable defenses based on randomized smoothing. The core idea of randomized smoothing is to take a hypothesis \({\varvec{h}}\) and to build a robust classifier that writes

$$\begin{aligned} c_{rob}: {\varvec{x}}\mapsto \mathop {\mathrm{argmax}}\limits _{k \in [K]}{\mathbb{P}}_{\varvec{z} \sim {\mathcal{N}}\left( 0,\sigma ^{2} I\right) }\left[ {\varvec{h}}({\varvec{x}}+\varvec{z}) = k\right] . \end{aligned}$$
(19)

From a probabilistic point of view, for any input \({\varvec{x}}\), randomized smoothing amounts to outputting the most probable class of the probability measure \(\mathrm{m}({\varvec{x}}) :={\varvec{h}}\# {\mathcal{N}}\left( {\varvec{x}},\sigma ^{2} I\right)\). Hence, randomized smoothing uses the mode preservation property of \(\mathrm{m}\) to build a provably robust (deterministic) classifier. Therefore, the above results (Proposition 10 and Eq. 18) also hold for provable defenses based on randomized smoothing. Studying randomized smoothing from our point of view could give an interesting new perspective on that method. So far, no results have been published on the generalization gap of this defense in the adversarial setting; we could devise such generalization bounds by analogy with our analysis. Furthermore, the probabilistic interpretation stresses that randomized smoothing is somewhat restrictive, since it only considers probability measures that arise from a simple noise injection scheme. The mode preservation property explains the behavior of randomized smoothing, but it also reveals fundamental properties of randomized defenses that could be used to construct more general defense schemes.
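For completeness, here is a minimal Monte Carlo sketch of the smoothed classifier of Eq. (19); the sampling budget and the classifier `h` are assumptions of ours, and a certified implementation would additionally require the statistical tests used by Cohen et al. (2019).

```python
import numpy as np

def smoothed_prediction(h, x, sigma, num_classes, n_samples, rng):
    """Estimate argmax_k P_{z ~ N(0, sigma^2 I)}[h(x + z) = k] by Monte Carlo.

    `h` is assumed to return an integer label in {0, ..., num_classes - 1}.
    """
    counts = np.zeros(num_classes, dtype=np.int64)
    for _ in range(n_samples):
        z = sigma * rng.standard_normal(x.shape)
        counts[h(x + z)] += 1
    return int(np.argmax(counts))
```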

8 Numerical validations against \(\ell _{2}\) adversary

To illustrate our findings, we train randomized neural networks with Gaussian pre-processing during training and inference on CIFAR-10 and CIFAR-100. Based on this randomized classifier, we study the impact of randomization on the standard accuracy of the network, and observe the theoretical trade-off between accuracy and robustness.

8.1 Architecture and training procedure

All the neural networks we use in this section are WideResNets (Zagoruyko & Komodakis, 2016) with 28 layers, a widening factor of 10, a dropout rate of 0.3 and LeakyReLU activations with a 0.1 slope. To train an undefended standard classifier, we use the following hyper-parameters.

  • Number of Epochs: 200

  • Batch size: 400

  • Loss function: Cross Entropy Loss

  • Optimizer: Stochastic gradient descent algorithm with momentum 0.9, weight decay of \(2\times 10^{-4}\) and a learning rate that decreases during the training as follows:

    $$\begin{aligned} lr = \left\{ \begin{array}{ll} 0.1 &{}\quad {\text{if}} \; 0 \le {\text{epoch}}< 60\\ 0.02 &{}\quad {\text{if}} \; 60 \le {\text{epoch}}< 120\\ 0.004 &{}\quad {\text{if}} \; 120 \le {\text{epoch}}< 160\\ 0.0008 &{}\quad {\text{if}} \; 160 \le {\text{epoch}} < 200.\\ \end{array} \right. \end{aligned}$$

To transform these standard networks into randomized classifiers, we inject Gaussian noise with various standard deviations directly onto the image before passing it through the network. Both during training and at test time, for computational efficiency, we evaluate the performance of the algorithm with a single noise draw per image; hence no Monte Carlo estimator is used. In practice, the test-time accuracy is nevertheless stable when evaluated over the entire test set.
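A minimal PyTorch-style sketch of this noise injection, assuming a `backbone` network such as the WideResNet described above (the wrapper and its name are ours, not code from the paper):

```python
import torch
import torch.nn as nn

class GaussianNoiseInjection(nn.Module):
    """Wrap a classifier so that N(0, sigma^2 I) noise is added to the input image."""

    def __init__(self, backbone: nn.Module, sigma: float):
        super().__init__()
        self.backbone = backbone
        self.sigma = sigma

    def forward(self, x):
        # A single noise draw per image, both at training and at test time.
        return self.backbone(x + self.sigma * torch.randn_like(x))
```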

8.2 Results

Figures 1 and 2 show the accuracy and the minimum level of accuracy under attack of our randomized neural networks for several levels of injected noise. We can see (Fig. 1) that the accuracy decreases as the noise intensity grows. The noise must therefore be calibrated to preserve both accuracy and robustness against adversarial attacks. This is to be expected: the greater the entropy of the classifier, the less accurate it gets.

Fig. 1

Impact of the standard deviation of the Gaussian noise on accuracy of a randomized model on the CIFAR-10 and CIFAR-100 datasets

Furthermore, when injecting Gaussian noise as a defense mechanism, the resulting randomized network \(\mathrm{m}\) is both \((\alpha _{2}, \frac{(\alpha _{2})^{2}}{2 \sigma ^{2}})\)-robust w.r.t. \(D_{1}\) and \((\alpha _{2},2 \varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) - 1)\)-robust w.r.t. \(D_{TV}\) against \(\ell _{2}\) adversaries. Therefore, thanks to Theorems 3 and 5, we have that

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m}; \alpha _{2}) - {\mathcal{R}}(\mathrm{m})&\le 2 \varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) - 1, {\text{ and}} \end{aligned}$$
(20)
$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m}; \alpha _{2}) - {\mathcal{R}}(\mathrm{m})&\le 1-e^{-\frac{(\alpha _{2})^{2}}{2 \sigma ^{2}}} {\mathbb{E}}_{{\varvec{x}}\sim {\mathcal{D}}_{\mid {\mathcal{X}}}}\left[ e^{-H(\mathrm{m}({\varvec{x}}))}\right] . \end{aligned}$$
(21)
Fig. 2

Guaranteed accuracy of different randomized models with Gaussian noise given the \(\ell _{2}\) norm of the adversarial perturbations

Figure 2 illustrates the theoretical lower bound on accuracy under attack [based on the minimum of the gaps in Eqs. (20) and (21)] for different standard deviations. The entropy term has been estimated using a Monte Carlo method with \(10^{4}\) simulations. The trade-off between accuracy and robustness appears with respect to the noise intensity. With small noise, the accuracy is high, but the guaranteed accuracy drops fast with respect to the magnitude of the adversarial perturbation. Conversely, with larger noise, the accuracy is lower but decreases more slowly with respect to the magnitude of the perturbation. Overall, we get strong accuracy guarantees against small adversarial perturbations, but when the perturbation is larger than 0.5 on CIFAR-10 (resp. 0.3 on CIFAR-100), the guarantees are no longer sufficient.
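The guaranteed-accuracy curves can be reproduced, up to our assumptions on the estimators, by evaluating the two gaps of Eqs. (20) and (21) and keeping the smaller one; the sketch below takes the measured clean accuracy and a Monte Carlo estimate of \({\mathbb{E}}_{{\varvec{x}}}\left[ e^{-H(\mathrm{m}({\varvec{x}}))}\right]\) as inputs (both names are ours).

```python
import numpy as np
from scipy.stats import norm

def guaranteed_accuracy(clean_accuracy, alpha2, sigma, exp_neg_entropy):
    """Lower bound on accuracy under an l2 attack of size alpha2.

    `clean_accuracy` is 1 - R(m) measured on the test set; `exp_neg_entropy`
    is a Monte Carlo estimate of E_x[exp(-H(m(x)))].
    """
    gap_tv = 2.0 * norm.cdf(alpha2 / (2.0 * sigma)) - 1.0                        # Eq. (20)
    gap_renyi = 1.0 - np.exp(-alpha2**2 / (2.0 * sigma**2)) * exp_neg_entropy    # Eq. (21)
    return clean_accuracy - min(gap_tv, gap_renyi)
```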

9 Lesson learned and future work

This paper brings new contributions to the theory of robustness to adversarial attacks. We provided an in-depth analysis of randomized classifiers, demonstrating their interest for defending against adversarial attacks. We first defined a notion of robustness for randomized classifiers using probability metrics/divergences, namely the total variation distance and the Renyi divergence. Second, we demonstrated that when a randomized classifier complies with this definition of robustness, we can bound its loss of accuracy under attack. We also studied the generalization properties of this class of functions and gave results indicating that robust randomized classifiers can generalize. Finally, we showed that randomized classifiers have a mode preservation property. This is a fundamental property of randomized defenses that can be used to explain randomized smoothing from a probabilistic point of view. To support our theoretical findings, we presented a simple yet efficient scheme for building robust randomized classifiers and showed that Gaussian noise injection can provide principled robustness against \(\ell _{2}\) adversarial attacks. We ran a set of experiments on CIFAR-10 and CIFAR-100 using Gaussian noise injection with advanced neural network architectures to build accurate models with a controlled loss of accuracy under attack.

Future work will focus on studying the combination of randomization with more sophisticated defenses and on devising new, tighter bounds on the adversarial generalization gap and the adversarial risk gap of randomized classifiers. Based on the connections we established with randomized smoothing in Sect. 7, we will also aim at devising bounds on the gap between the standard and adversarial risks for this defense. Another interesting direction would be to show that classifiers based on randomized smoothing have a generalization gap similar to that of the classes of randomized classifiers we studied.