1 Introduction

In the last few years, there has been growing concern about adversarial example attacks in machine learning. An adversarial attack refers to a small (humanly imperceptible) change of an input specifically designed to fool a machine learning model. These attacks came to light thanks to the works of Biggio et al. (2013) and Szegedy et al. (2014) studying deep neural networks for image classification, although the topic had already been studied in spam filter analysis (Dalvi et al., 2004; Lowd & Meek, 2005; Globerson & Roweis, 2006). The vulnerability of state-of-the-art classifiers to these attacks has genuine security implications, especially for deep neural networks used in AI-driven technologies such as self-driving cars, as repeatedly demonstrated by Sharif et al. (2016), Sitawarin et al. (2018) and Yao et al. (2020). Beyond security issues, this vulnerability shows how little we know about the worst-case behavior of the models the industry uses daily. It is essential for the community to understand the very nature of this phenomenon in order to mitigate the threat.

Accordingly, a large body of work has tried to design new models that would be less vulnerable in the adversarial setting (Goodfellow et al., 2015; Metzen et al., 2017; Xie et al., 2018; Hu et al., 2019; Verma & Swami, 2019), but most of them were shown, in time, to offer only limited protection against more sophisticated attacks (Carlini & Wagner, 2017; He et al., 2017; Athalye et al., 2018; Croce & Hein, 2020; Tramer et al., 2020). Among the defense strategies, randomization has proven effective in some contexts (Xie et al., 2018; Dhillon et al., 2018; Liu et al., 2018; He et al., 2019). Despite these significant efforts, randomization techniques still lack theoretical arguments. In this paper, we generalize the prior results from Pinot et al. (2019) by studying a general class of randomized classifiers, including randomized neural networks, for which we demonstrate adversarial robustness guarantees and analyze their generalization properties (see Sect. 2.3 for more details).

1.1 Supervised learning for image classification

Let us consider the supervised classification problem with an input space \({\mathcal{X}}\) and an output space \({\mathcal{Y}}\). In the following, w.l.o.g. we consider \({\mathcal{X}}\subset [-1,1]^{d}\) to be a set of images, and \({\mathcal{Y}}:=[K] :=\{1,\dots ,K\}\) a set of labels describing them. The goal of a supervised machine learning algorithm is to design a classifier that maps any image \({\varvec{x}}\in {\mathcal{X}}\) to a label \(y \in {\mathcal{Y}}\). To do so, the learner has access to a training sample of n image-label pairs \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}),\dots ,({{\varvec{x}}_{\varvec{n}}},y_{n})\}\). Each training pair \(({{\varvec{x}}_{\varvec{i}}},y_{i})\) is assumed to be drawn i.i.d. from a ground-truth distribution \({\mathcal{D}}\). To build a classifier, the usual strategy is to select a hypothesis function \({\varvec{h}}: {\mathcal{X}} \rightarrow {\mathcal{Y}}\) from a pre-defined hypothesis class \({\mathcal{H}}\) that minimizes the risk with respect to \({\mathcal{D}}\). This risk minimization problem writes

$$\begin{aligned} \inf _{{\varvec{h}}\in {\mathcal{H}}} {\mathcal{R}}({\varvec{h}}) :={\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ {\mathcal{L}}_{0/1}\left( {\varvec{h}}({\varvec{x}}), y\right) \right] , \end{aligned}$$
(1)

where \({\mathcal{L}}_{0/1}\), the \(0/1\) loss, outputs 1 when \({\varvec{h}}({\varvec{x}}) \ne y\), and zero otherwise.

In practice, the learner does not have access to the ground-truth distribution; hence it cannot estimate the risk \({\mathcal{R}}({\varvec{h}})\). To find an approximate solution for Problem (1), a learning algorithm solves the empirical risk minimization problem instead. In this case, we simply replace the risk by its empirical counterpart over the training sample \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}),\ldots ,({{\varvec{x}}_{\varvec{n}}},y_{n})\}\). The empirical risk minimization problem writes

$$\begin{aligned} \inf _{{\varvec{h}}\in {\mathcal{H}}} {\mathcal{R}}_{{\mathcal{S}}}({\varvec{h}}) :=\frac{1}{n} \sum _{i=1}^{n} {\mathcal{L}}_{0/1}\left( {\varvec{h}}({{\varvec{x}}_{\varvec{i}}}), y_{i}\right) . \end{aligned}$$
(2)

Then, to evaluate how far the selected hypothesis is from the optimum, one wants to upper bound the difference between the risk and the empirical risk of any \({\varvec{h}}\in {\mathcal{H}}\). This difference is known as the generalization gap.
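
To make Eqs. (1) and (2) concrete, here is a minimal Python sketch (not part of the original analysis) that estimates the empirical \(0/1\) risk of a fixed hypothesis and compares it to a large-sample estimate of the risk; the data generator and the threshold classifier are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ground-truth distribution D: x in [-1, 1]^d, label = sign of the first coordinate.
def sample_data(n, d=5):
    x = rng.uniform(-1.0, 1.0, size=(n, d))
    y = (x[:, 0] > 0).astype(int)          # labels in {0, 1}
    return x, y

# A fixed (hand-crafted) hypothesis h: X -> Y, standing in for a learned model.
def h(x):
    return (x[:, 0] + 0.1 * x[:, 1] > 0).astype(int)

def empirical_risk(x, y):
    """Empirical 0/1 risk, Eq. (2): average misclassification over the sample."""
    return np.mean(h(x) != y)

x_train, y_train = sample_data(200)
x_test, y_test = sample_data(100_000)      # large sample as a proxy for the true risk, Eq. (1)

gap = empirical_risk(x_test, y_test) - empirical_risk(x_train, y_train)
print(f"empirical risk: {empirical_risk(x_train, y_train):.3f}, "
      f"estimated risk: {empirical_risk(x_test, y_test):.3f}, gap: {gap:.3f}")
```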

1.2 Classification in the presence of an adversary

Given a hypothesis \({\varvec{h}}\in {\mathcal{H}}\) and a sample \(({\varvec{x}},y) \sim {\mathcal{D}}\), the goal of an adversary is to find a perturbation \(\varvec{\tau } \in {\mathcal{X}}\) such that the following assertions both hold. First, the perturbation is imperceptible to humans. This means that a human cannot visually distinguish the standard example \({\varvec{x}}\) from the adversarial example \({\varvec{x}}+ \varvec{\tau }\). Second, the perturbation modifies \({\varvec{x}}\) enough to make the classifier misclassify. More formally, the adversary seeks a perturbation \(\varvec{\tau } \in {\mathcal{X}}\) such that \({\varvec{h}}({\varvec{x}}+\varvec{\tau }) \ne y\).

Although the notion of imperceptible modification is very natural for humans, it is genuinely hard to formalize. Despite these difficulties, in the image classification setting, a sufficient condition to ensure that the attack remains undetected is to constrain the perturbation \(\varvec{\tau }\) to have a small \(\ell _{p}\) norm. This means that for any \(p \in [1,\infty ]\), there exists a threshold \(\alpha _{p} > 0\) for which any perturbation \(\varvec{\tau }\) is imperceptible as soon as \(\Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\). It is worth noting that \(\ell _{p}\) norms are only surrogates for the perception distance, for which giving a formal definition remains an open question. In this paper, we focus solely on robustness with respect to \(\ell _{p}\) norms. The literature on adversarial attacks for image classification usually uses either an \(\ell _\infty\) norm, as in Madry et al. (2018), or an \(\ell _{2}\) norm, as in Carlini & Wagner (2017), as a surrogate for imperceptibility. Other authors such as Chen et al. (2018) and Papernot et al. (2016) also used an \(\ell _{1}\) norm or an \(\ell _{0}\) semi-norm.
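
As an illustration of the \(\ell _{p}\) constraint, the sketch below checks membership in \(B_{p}(\alpha _{p})\) and projects a perturbation back onto the ball for \(p=2\) and \(p=\infty\), a standard building block of attack algorithms; the perturbation shape and the budgets \(\alpha _{2}\), \(\alpha _{\infty }\) are illustrative values, not the paper's.

```python
import numpy as np

def in_ball(tau, alpha, p):
    """Check membership in B_p(alpha) = {tau : ||tau||_p <= alpha}."""
    return np.linalg.norm(tau.ravel(), ord=p) <= alpha

def project(tau, alpha, p):
    """Project tau onto the l_2 or l_inf ball of radius alpha."""
    if p == 2:
        norm = np.linalg.norm(tau)
        return tau if norm <= alpha else tau * (alpha / norm)
    if p == np.inf:
        return np.clip(tau, -alpha, alpha)
    raise NotImplementedError("only p = 2 and p = inf are sketched here")

rng = np.random.default_rng(0)
tau = rng.normal(scale=0.1, size=(3, 32, 32))   # illustrative image-shaped perturbation
alpha_2, alpha_inf = 0.5, 8.0 / 255.0           # illustrative budgets
print(in_ball(tau, alpha_2, 2), in_ball(tau, alpha_inf, np.inf))
print(np.linalg.norm(project(tau, alpha_2, 2)))        # <= 0.5
print(np.abs(project(tau, alpha_inf, np.inf)).max())   # <= 8/255
```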

To account for adversaries possibly manipulating the input images, one needs to revisit the standard risk minimization by incorporating the adversary in the problem. The goal becomes to minimize the worst-case risk under \(\alpha _{p}\)-bounded manipulations. We call this problem the adversarial risk minimization. It writes

$$\begin{aligned} \inf _{{\varvec{h}}\in {\mathcal{H}}} {{\mathcal{R}}^{\mathrm{adv}}}({\varvec{h}}; \alpha _{p}) :={\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})}{\mathcal{L}}_{0/1}\left( {\varvec{h}}({\varvec{x}}+ \varvec{\tau }), y\right) \right] , \end{aligned}$$
(3)

where \(B_{p}(\alpha _{p}) :=\{ \varvec{\tau } \in {\mathcal{X}}~{s.t.}~ \Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\}\). In this new formulation, the adversary focuses on optimizing the inner maximization, while the learner tries to get the best hypothesis from \({\mathcal{H}}\) “under attack”. By analogy with the standard setting, given n training examples \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}),\dots ,({{\varvec{x}}_{\varvec{n}}},y_{n})\}\), we want to find an approximate solution to the adversarial risk minimization by studying its empirical counterpart, the empirical adversarial risk minimization. This optimization problem writes

$$\begin{aligned} \inf _{{\varvec{h}}\in {\mathcal{H}}} {\mathcal{R}}^{\mathrm{adv}}_{{\mathcal{S}}}({\varvec{h}};\alpha _{p}) :=\frac{1}{n}\sum _{i=1}^{n} \sup _{\varvec{\tau }\in B_{p}(\alpha _{p})}{\mathcal{L}}_{0/1}\left( {\varvec{h}}({{\varvec{x}}_{\varvec{i}}} + \varvec{\tau }), y_{i}\right) . \end{aligned}$$
(4)
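
In practice, the inner supremum in Eq. (4) is intractable and is replaced by an attack heuristic that only lower-bounds it. The sketch below estimates the empirical adversarial risk of a toy linear classifier using random search inside \(B_{\infty }(\alpha _{\infty })\); both the classifier and the attack are illustrative stand-ins and not the procedures studied in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x):                                   # toy linear classifier on [-1, 1]^3
    return (x @ np.array([1.0, -0.5, 0.25]) > 0).astype(int)

def adv_loss(x_i, y_i, alpha, n_trials=100):
    """Lower bound of sup_{||tau||_inf <= alpha} L_0/1(h(x_i + tau), y_i) by random search."""
    taus = rng.uniform(-alpha, alpha, size=(n_trials, x_i.size))
    preds = h(x_i[None, :] + taus)
    return float(np.any(preds != y_i))      # 1 if some sampled perturbation fools h, else 0

x = rng.uniform(-1.0, 1.0, size=(200, 3))
y = h(x)                                    # labels taken from h itself, so the standard risk is 0
alpha = 0.1

emp_adv_risk = np.mean([adv_loss(x[i], y[i], alpha) for i in range(len(x))])
print(f"empirical adversarial risk (lower bound): {emp_adv_risk:.3f}")
```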

In the presence of an adversary, two major issues appear in the empirical risk minimization. First, as recently pointed out by Madry et al. (2018), the adversarial generalization error (i.e. the gap between the empirical adversarial risk and the adversarial risk) can be much larger than in the standard setting. Indeed, the adversary makes the problem depend on the dimension of \({\mathcal{X}}\). Hence, in high dimension (e.g. for images) one needs many more samples to classify correctly, as pointed out by Schmidt et al. (2018) as well as Simon-Gabriel et al. (2019). Moreover, finding an approximate solution to the adversarial risk minimization is not always sufficient. Indeed, recent works by Tsipras et al. (2019) and Zhang et al. (2019) gave theoretical evidence that training a robust model may lead to an increase of its standard risk. Hence finding a good approximation for Problem (3) may lead to a poor solution for Problem (1). Accordingly, it is natural to ask whether we can find a class of models \({{{\mathcal{H}}}}\) for which we can control both the standard and adversarial risks.

In this paper, we provide answers to the above question by conducting an in-depth analysis of a special class of models called randomized classifiers, i.e. classifiers that output random variables instead of labels. Our main contributions can be summarized as follows.

1.3 Contributions

Our first contribution consists in studying randomized classifiers. By analogy with the deterministic case, we define a notion of robustness for randomized classifiers. This definition amounts to making the classifier locally Lipschitz with respect to the \(\ell _{p}\) norm on \({\mathcal{X}}\), and a probability metric on \({\mathcal{Y}}\) (e.g. the total variation distance or the Renyi divergence). More precisely, if we denote D the probability metric at hand, a randomized classifier \(\mathrm{m}\) is called \((\alpha _{p}, \epsilon )\)-robust w.r.t. D if for any \({\varvec{x}},{\varvec{x}}' \in {\mathcal{X}}\)

$$\begin{aligned} \Vert {{\varvec{x}}- {\varvec{x}}'}\Vert _{p} \le \alpha _{p} \implies D(\mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}')) \le \epsilon . \end{aligned}$$

Denoting \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) the class of randomized classifiers that respect this local Lipschitz condition, we present the following results.

  1. 1.

    If D is either the total variation distance or the Renyi divergence, we show that for any \(\mathrm{m}\in {\mathcal{M}}_{D}(\alpha _{p},\epsilon )\), we can upper-bound the gap between the risk and the adversarial risk of \(\mathrm{m}\). Notably, if D is the total variation distance, for any \(\mathrm{m}\in {\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) we have \({\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p}) - {\mathcal{R}}(\mathrm{m}) \le \epsilon\). Hence, \(\epsilon\) controls the maximal trade-off between robust and standard accuracy for locally Lipschitz randomized classifiers. We demonstrate a similar result when D is the Renyi divergence, showing that \({\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p}) - {\mathcal{R}}(\mathrm{m}) \le 1- O\left( e^{-\epsilon }\right)\). This means that, for the class of locally Lipschitz randomized classifiers, solving the risk minimization problem, i.e. Problem (1), gives an approximate solution to the adversarial risk minimization problem, i.e. Problem (3), up to an additive factor that depends on the robustness parameter \(\epsilon\).

  2. 2.

    We devise an upper-bound on the generalization gap of any \(\mathrm{m}\) in \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\). In particular, when D is the total variation distance, we demonstrate that for any \(\mathrm{m}\in {\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) we have

    $$\begin{aligned} {\mathcal{R}}(\mathrm{m}) - {\mathcal{R}}_{{\mathcal{S}}}(\mathrm{m}) \le O\left( \sqrt{\frac{N \times K}{n}}\right) + \epsilon , \end{aligned}$$

    where N is the external \(\alpha _{p}\)-covering number of the input samples. This means that, when \(N/n \underset{n \rightarrow \infty }{\rightarrow } 0\), solving the empirical risk minimization problem, i.e. Problem (2), on \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) provides an approximate solution to the risk minimization problem, i.e. Problem (1). Since we can also bound the gap between the adversarial and the standard risk, we can combine the two results to bound the adversarial generalization gap on \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\). Note however, that this result relies on a strong assumption on \({\mathcal{X}}\) that does not always avoid dimensionality issues. The problem of finding a subclass of \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) that provides tighter generalization bounds is an open question.

For our second contribution, we present a practical way to design such a class \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) by using a simple yet efficient noise injection scheme. This allows us to build randomized classifiers from state-of-the-art machine learning models, including deep neural networks. More precisely, our contributions are as follows.

  1. 1.

    Based on information-theoretic properties of the total variation distance and the Renyi divergence (e.g. the data processing inequality), we design a noise injection scheme that turns a state-of-the-art machine learning model into a robust randomized classifier. More formally, let us denote \(\varPhi\) the c.d.f. of a standard Gaussian distribution. Given a deterministic hypothesis \({\varvec{h}}\), we show that the randomized classifier \(\mathrm{m}: {\varvec{x}}\mapsto {\varvec{h}}\left( {\varvec{x}}+n\right)\) with \(n\sim {\mathcal{N}}(0, \sigma ^{2} I_{d})\) is both \((\alpha _{2}, \frac{(\alpha _{2})^{2}}{2 \sigma })\)-robust w.r.t. the Renyi divergence and \((\alpha _{2},\ 2 \varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) - 1)\)-robust w.r.t. the total variation distance (an illustrative sketch of this construction is given after this list). Our results on randomized classifiers are applicable to a wide range of machine learning models, including deep neural networks.

  2. 2.

    We further corroborate our theoretical results with experiments using deep neural networks on standard image datasets, namely CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009). These models can simultaneously provide accurate predictions (over 0.82 clean accuracy on CIFAR-10) and reasonable robustness against \(\ell _{2}\) adversarial examples (0.45 accuracy under attack against \(\ell _{2}\) adversaries with magnitude 0.5 on CIFAR-10).
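
As announced in the first item above, the construction simply adds Gaussian noise to the input before applying a pre-trained hypothesis. Below is a minimal sketch of this noise injection scheme around an arbitrary deterministic classifier; the base model and the value of \(\sigma\) are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

class NoiseInjectionClassifier:
    """Randomized classifier m: x -> h(x + n), with n ~ N(0, sigma^2 I_d)."""

    def __init__(self, base_classifier, sigma):
        self.h = base_classifier     # any deterministic hypothesis h: X -> Y
        self.sigma = sigma

    def sample_label(self, x):
        """Draw one label y_hat ~ m(x): a single noisy forward pass."""
        noise = rng.normal(scale=self.sigma, size=x.shape)
        return self.h(x + noise)

# Placeholder deterministic hypothesis (stand-in for a trained neural network).
def base_h(x):
    return int(np.sum(x) > 0)

m = NoiseInjectionClassifier(base_h, sigma=0.25)
x = rng.uniform(-1.0, 1.0, size=(3 * 32 * 32,))
print([m.sample_label(x) for _ in range(5)])   # several draws from m(x)
```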

2 Related work

Contrary to other notions such as training-time corruption, a.k.a. poisoning attacks (Kearns & Li, 1993; Kearns et al., 1994), the theoretical study of adversarial robustness is still in its infancy. So far, empirical observations tend to show that (1) adversarial examples on state-of-the-art models are hard to mitigate and (2) robust training methods give poor generalization performance. Some recent works started to study the problem through the lens of learning theory, either to understand the links between robustness and accuracy or to provide bounds on the generalization gap of current learning procedures in the adversarial setting.

2.1 Accuracy versus robustness trade-off

A first line of research (Su et al., 2018; Jetley et al., 2018; Tsipras et al., 2019) suggests that designing robust models might be inconsistent with standard accuracy. These works argue, through experiments and toy examples, that robust and standard classification are two competing problems. Following this line, Zhang et al. (2019) observed that the adversarial risk of any hypothesis \({\varvec{h}}\) decomposes as follows,

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}({\varvec{h}};\alpha _{p}) = {\mathcal{R}}({\varvec{h}}) + {\mathcal{R}}^{\mathrm{adv}}_{>0}({\varvec{h}};\alpha _{p}), \end{aligned}$$
(5)

where \({\mathcal{R}}^{\mathrm{adv}}_{>0}({\varvec{h}};\alpha _{p})\) is the amount of risk that the adversary gets with non-null perturbations. Looking at Eq. (5), we realize that minimizing the adversarial risk is not enough to control standard accuracy, since one could minimize it by acting only on the second term. This indicates that adversarial risk minimization, i.e. Problem (3), is harder to solve than standard risk minimization, i.e. Problem (1).

While this indicates that both goals may be difficult to achieve simultaneously, Eq. (5), along with the empirical studies from the literature, does not highlight any fundamental trade-off between robustness and accuracy. Moreover, no upper bound on \({\mathcal{R}}^{\mathrm{adv}}_{>0}({\varvec{h}};\alpha _{p})\) has been demonstrated yet. Hence the questions of whether this trade-off exists and whether it can be controlled remain open. In this paper, we provide a rigorous answer to these questions by identifying classes \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) of randomized classifiers for which we can upper bound the trade-off term \({\mathcal{R}}^{\mathrm{adv}}_{>0}(\mathrm{m};\alpha _{p})\) for any \(\mathrm{m}\in {\mathcal{M}}_{D}(\alpha _{p},\epsilon )\). Hence, we can control the maximum loss of accuracy that the model can suffer in the adversarial setting. It also challenges the intuitions developed by previous works (Su et al., 2018; Jetley et al., 2018; Tsipras et al., 2019) and argues in favor of using randomized mechanisms as a defense against adversarial attacks.

2.2 Studying adversarial generalization

To further compare the hardness of the two problems, a recent line of research began to explore the notion of adversarial generalization gap. In this line, Schmidt et al. (2018) presented some first intuitions by studying a simplified binary classification framework where \({\mathcal{D}}\) is a mixture of multi-dimensional Gaussian distributions. In this framework the authors show that, without attacks, we only need O(1) training samples to have a small generalization gap, but against an \(\ell _{\infty }\) adversary, we need \(O(\sqrt{d})\) training samples instead. In the discussion of their work, the authors leave the problem of obtaining similar results without any assumption on the distribution as an open question.

This issue was recently studied using the Rademacher complexity by Khim and Loh (2018), Yin et al. (2019) and Awasthi et al. (2020). These papers relate the adversarial generalization error of linear classifiers and one-hidden-layer neural networks to the dimension of the problem, showing that adversarial generalization depends on the dimension. At first glance, the difficulty of adversarial generalization seems to contradict previous conclusions on the link between robustness and generalization presented by Xu and Mannor (2012). But, as we will discuss in the sequel, these results assume that the input space \({\mathcal{X}}\) can be partitioned into O(1) sub-spaces in which the classification function has small variations. This assumption may not always hold when dealing with high-dimensional input spaces (e.g. images) and very sophisticated classification algorithms (e.g. deep neural networks).

Going further, it should be noted that the generalization gap measures only the difference between empirical and theoretical risks. In practice, the empirical adversarial risk is hard to estimate, since we cannot compute the exact solution to the inner maximization problem. The following question therefore remains open: even if we can set up a learning procedure with a controlled generalization gap, can we give guarantees on the standard and adversarial risks? In this paper, we start answering this question by providing techniques that provably offer both small standard risk and reasonable robustness against adversarial examples (see Sect. 1.3 for more details).

2.3 Defense against adversarial examples based on noise injection

Injecting noise into algorithms to improve train-time robustness has long been used in detection and signal processing tasks (Zozor & Amblard, 1999; Chapeau-Blondeau & Rousseau, 2004; Mitaim & Kosko, 1998; Grandvalet et al., 1997). It has also been extensively studied in several machine learning and optimization fields, e.g. robust optimization (Ben-Tal et al., 2009) and data augmentation techniques (Perez & Wang, 2017). Concurrently to our work, noise injection techniques have been adopted by the adversarial defense community under the name of randomized smoothing. The idea of provable defense through noise injection was first proposed by Lecuyer et al. (2019) and refined by Li et al. (2019), Cohen et al. (2019), Salman et al. (2019) and Yang et al. (2020). The rationale behind randomized smoothing is very simple: smooth \({\varvec{h}}\) after training by convolution with a Gaussian measure to build a more stable classifier. Our work belongs to the same line of research, but the nature of our results is different. Randomized smoothing is an ensemble method that builds a deterministic classifier by smoothing a pre-trained model with a Gaussian kernel. This scheme requires computing a Monte Carlo estimate of the smoothed classifier; hence it requires many evaluation rounds to output a deterministic label. Our method is based on randomization and only requires one evaluation round to infer a label, making the prediction randomized and computationally efficient. While randomized smoothing focuses on the construction of certified defenses, we study the generalization properties of randomized mechanisms both in the standard and the adversarial setting. Our analysis presents the fundamental properties of randomized defenses, including (but not limited to) randomized smoothing (cf. Sect. 7).
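
To make the contrast with randomized smoothing explicit, the sketch below shows the two prediction procedures side by side for the same noise-injected base model: our randomized prediction draws a single noisy evaluation, while the smoothed prediction aggregates many Monte Carlo draws by majority vote. The base model, noise level and number of samples are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_h(x):                               # placeholder deterministic classifier
    return int(np.sum(x) > 0)

def randomized_predict(x, sigma):
    """Our setting: one noisy evaluation, the output label is itself random."""
    return base_h(x + rng.normal(scale=sigma, size=x.shape))

def smoothed_predict(x, sigma, n_samples=1000, n_classes=2):
    """Randomized smoothing: Monte Carlo majority vote; the output label is deterministic
    (up to sampling error) but requires n_samples evaluations of the base model."""
    votes = np.zeros(n_classes, dtype=int)
    for _ in range(n_samples):
        votes[base_h(x + rng.normal(scale=sigma, size=x.shape))] += 1
    return int(np.argmax(votes))

x = rng.uniform(-1.0, 1.0, size=(20,))
print([randomized_predict(x, sigma=0.5) for _ in range(5)])  # may vary across draws
print(smoothed_predict(x, sigma=0.5))                        # stable majority label
```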

This paper is an extended version of a work by Pinot et al. (2019). Since then, we considerably consolidated our theoretical results as follows.

  1. 1.

    Pinot et al. (2019) only studied neural networks defended with noise injection techniques; here we study the much more general class of randomized classifiers, which includes, but is not limited to, neural networks.

  2. 2.

    We provide a much more detailed treatment of our notion of distributional robustness by presenting an in-depth analysis based on the total variation distance that was missing from Pinot et al. (2019) (Theorems 1, 5 and 7).

  3. 3.

    Pinot et al. did not analyze the generalization of randomized classifiers. Here, we study the generalization of these classifiers according to the notion of robustness they respect (Theorem 5 and Corollary 1).

  4. 4.

    Last but not least, we added an in-depth discussion on the fundamental properties of randomized classifiers, and how they relate to the notion of randomized smoothing (Sect. 7).

3 Definition of risk and robustness for randomized classifiers

In this work, the goal is to analyze how randomized classifiers can solve the problem of classification in the presence of an adversary. Let us start by defining what we mean by randomized classifiers.

Remark 1

(Note on measurability) Throughout the paper, we assume that every space \({{\mathcal{Z}}}\) is associated with a \(\sigma\)-algebra denoted \({\mathcal{A}}\left( {{\mathcal{Z}}}\right)\). Furthermore, we denote \({\mathcal{P}}\left( {{\mathcal{Z}}} \right)\) the set of probability distributions defined on the measurable space \(\left( {{\mathcal{Z}}},{\mathcal{A}}\left( {\mathcal{Z}}\right) \right)\). In the following, for simplicity, we refer to \({\mathcal{A}}\left( {\mathcal{Z}}\right)\) only when necessary.

Definition 1

(Probabilistic mapping) Let \({\mathcal{Z}}\) and \({\mathcal{Z}}'\) be two arbitrary spaces. A probabilistic mapping from \({\mathcal{Z}}\) to \({\mathcal{Z}}'\) is a mapping \(\mathrm{m}: {\mathcal{Z}} \rightarrow {\mathcal{P}}\left( {\mathcal{Z}}' \right)\), where \({\mathcal{P}}\left( {\mathcal{Z}}' \right)\) is the space of probability measures on \({\mathcal{Z}}'\). When \({\mathcal{Z}} = {\mathcal{X}}\) and \({\mathcal{Z}}' ={\mathcal{Y}}\), \(\mathrm{m}\) is called a randomized classifier. To get a numerical answer for an input \({\varvec{x}}\), we sample \(\hat{y} \sim \mathrm{m}( {\varvec{x}})\).

Any mapping can be considered as a probabilistic mapping, whether it explicitly involves randomization or not. In fact, any deterministic classifier can be considered as a randomized one, since it can be characterized by a Dirac measure. Accordingly, the definition of a randomized classifier is fully general and equally covers classifiers with and without a randomization scheme.
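
For a finite label space, a probabilistic mapping can be represented by a probability vector over \([K]\); a deterministic classifier then corresponds to a one-hot (Dirac) vector, and a numerical answer is obtained by sampling \(\hat{y} \sim \mathrm{m}({\varvec{x}})\). A minimal sketch, with illustrative placeholder classifiers:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                        # number of labels, illustrative

def deterministic_as_randomized(h, x):
    """A deterministic classifier seen as a probabilistic mapping:
    m(x) is the Dirac measure on h(x), i.e. a one-hot probability vector."""
    p = np.zeros(K)
    p[h(x)] = 1.0
    return p

def genuinely_randomized(x):
    """A probabilistic mapping whose output distribution depends smoothly on x."""
    logits = np.array([np.sum(x), np.sum(x**2), 1.0])
    p = np.exp(logits - logits.max())
    return p / p.sum()

def sample_label(m, x):
    """Numerical answer for input x: draw y_hat ~ m(x)."""
    return rng.choice(K, p=m(x))

h = lambda x: int(np.sum(x) > 0)             # placeholder deterministic hypothesis
x = rng.uniform(-1.0, 1.0, size=(5,))
print(deterministic_as_randomized(h, x))     # one-hot (Dirac) distribution
print(genuinely_randomized(x))               # full probability vector over the K labels
print(sample_label(genuinely_randomized, x))
```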

3.1 Risk and adversarial risk for randomized classifiers

To analyze this new hypothesis class, we can adapt the concepts of risk and adversarial risk for a randomized classifier. The loss function we use is the natural extension of the \(0/1\) loss to the randomized regime. Given a randomized classifier \(\mathrm{m}\) and a sample \(({\varvec{x}},y) \sim {\mathcal{D}}\) it writes

$$\begin{aligned} {\mathcal{L}}_{0/1}(\mathrm{m}({\varvec{x}}),y) := {\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}})} \left[ {\mathbbm{1}} \left\{ \hat{y} \ne y\right\} \right] . \end{aligned}$$
(6)

This loss function evaluates the probability of misclassification of \(\mathrm{m}\) on a data sample \(({\varvec{x}},y) \sim {\mathcal{D}}\). Accordingly, the risk of \(\mathrm{m}\) with respect to \({\mathcal{D}}\) writes

$$\begin{aligned} {\mathcal{R}}(\mathrm{m})&:= {\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ {\mathcal{L}}_{0/1}(\mathrm{m}( {\varvec{x}}),y) \right] . \end{aligned}$$
(7)

Finally, given \(\mathrm{m}\) and \(({\varvec{x}},y) \sim {\mathcal{D}}\), the adversary seeks a perturbation \(\varvec{\tau }\in B_{p}(\alpha _{p})\) that maximizes the expected error of the classifier on \({\varvec{x}}\) (i.e. \({\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}}+ \varvec{\tau })} \left[ {\mathbbm{1}} \left\{ \hat{y} \ne y\right\} \right]\)). Therefore, the adversarial risk of \(\mathrm{m}\) under \(\alpha _{p}\)-bounded perturbations writes

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p})&:= {\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathcal{L}}_{0/1}(\mathrm{m}({\varvec{x}}+ \varvec{\tau }),y) \right] . \end{aligned}$$
(8)

By analogy with the deterministic setting, we denote

$$\begin{aligned} {\mathcal{R}}_{{\mathcal{S}}}\left( \mathrm{m}\right) :=\frac{1}{n}\sum _{i=1}^{n} {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}_{i}), y_{i}\right) , {\text{ and}} \end{aligned}$$
(9)
$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}_{{\mathcal{S}}}\left( \mathrm{m}; \alpha _{p} \right) := \frac{1}{n}\sum _{i=1}^{n} \sup _{\varvec{\tau }\in B_{p}(\alpha _{p})}{\mathcal{L}}_{0/1}\left( \mathrm{m}({{\varvec{x}}_{\varvec{i}}} + \varvec{\tau }), y_{i}\right) , \end{aligned}$$
(10)

the empirical risks of \(\mathrm{m}\) for a given training sample \({\mathcal{S}}:=\{ ({{\varvec{x}}_{\varvec{1}}},y_{1}), \dots , ({{\varvec{x}}_{\varvec{n}}},y_{n}) \}\).
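
When \(\mathrm{m}({\varvec{x}})\) is represented by a probability vector over the K labels, the loss in Eq. (6) has the closed form \(1 - \mathrm{m}({\varvec{x}})_{y}\), and Eqs. (9) and (10) are plain averages over the sample. A minimal sketch, with an illustrative randomized classifier and random search as a stand-in for the inner supremum of Eq. (10):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2

def m(x):
    """Illustrative randomized classifier: probability vector over the K labels."""
    p1 = 1.0 / (1.0 + np.exp(-4.0 * np.sum(x)))
    return np.array([1.0 - p1, p1])

def loss01(prob, y):
    """Eq. (6): E_{y_hat ~ m(x)}[1{y_hat != y}] = 1 - m(x)_y."""
    return 1.0 - prob[y]

def empirical_risk(xs, ys):
    """Eq. (9)."""
    return np.mean([loss01(m(x), y) for x, y in zip(xs, ys)])

def empirical_adv_risk(xs, ys, alpha, n_trials=50):
    """Eq. (10), with the supremum replaced by random search in the l_inf ball (lower bound)."""
    risks = []
    for x, y in zip(xs, ys):
        taus = rng.uniform(-alpha, alpha, size=(n_trials, x.size))
        risks.append(max(loss01(m(x + t), y) for t in taus))
    return float(np.mean(risks))

xs = rng.uniform(-1.0, 1.0, size=(100, 4))
ys = (np.sum(xs, axis=1) > 0).astype(int)
print(empirical_risk(xs, ys), empirical_adv_risk(xs, ys, alpha=0.1))
```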

3.2 Robustness for randomized classifiers

We could define the notion of robustness for a randomized classifier depending on whether it misclassifies any test sample \(({\varvec{x}},y) \sim {\mathcal{D}}\). But in practice, neither the adversary nor the model provider has access to the ground-truth distribution \({\mathcal{D}}\). Furthermore, in real-world scenarios, one wants to check that the model is robust before deploying it. Therefore, the classifier is required to be stable on the regions of the space where it already classifies correctly. Formally, a (deterministic) classifier \(c: {\mathcal{X}}\rightarrow {\mathcal{Y}}\) is called robust if for any \(({\varvec{x}}, y) \sim {\mathcal{D}}\) such that \(c({\varvec{x}}) = y\), and for any \(\varvec{\tau }\in {\mathcal{X}}\), one has

$$\begin{aligned} \Vert { \varvec{\tau }}\Vert _{p} \le \alpha _{p} \implies c({\varvec{x}}) = c({\varvec{x}}+ \varvec{\tau }). \end{aligned}$$
(11)

By analogy with this, we define robustness for a randomized classifier below.

Definition 2

(Robustness for a randomized classifier) A randomized classifier \(\mathrm{m}: {\mathcal{X}}\rightarrow {\mathcal{P}}({\mathcal{Y}})\) is called \((\alpha _{p},\epsilon )\)-robust w.r.t. D if for any \({\varvec{x}}, \varvec{\tau }\in {\mathcal{X}}\), one has

$$\begin{aligned} \Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p} \implies D\left( \mathrm{m}({\varvec{x}}) , \mathrm{m}({\varvec{x}}+ \varvec{\tau })\right) \le \epsilon . \end{aligned}$$

where D is a metric/divergence between two probability measures. Given such a metric/divergence D, we denote \({\mathcal{M}}_{D}(\alpha _{p},\epsilon )\) the set of all randomized classifiers that are \((\alpha _{p},\epsilon )\)-robust w.r.t. D.

Note that we did not add the constraint that \(\mathrm{m}\) classifies well on \(({\varvec{x}},y) \sim {\mathcal{D}}\), since it is already encompassed in the probability distribution itself. If the two probabilities \(\mathrm{m}({\varvec{x}})\) and \(\mathrm{m}({\varvec{x}}+ \varvec{\tau })\) are close, and if \(\mathrm{m}({\varvec{x}})\) outputs y with high probability, then so does \(\mathrm{m}({\varvec{x}}+ \varvec{\tau })\). This formulation naturally raises the question of the choice of the metric D. Any choice of metric/divergence instantiates a notion of adversarial robustness, and it should be carefully selected. In the present work, we focus our study on the total variation distance and the Renyi divergence. The question of whether these metrics/divergences are more appropriate than others remains open, but these two divergences are general enough to cover a wide range of other definitions (see “Appendix 2” for more details). Furthermore, these notions of distance comply with both a theoretical analysis (Sect. 5) and practical considerations (Sect. 8).

3.3 Divergence and probability metrics

Let us now recall the definitions of the total variation distance and the Renyi divergence. Let \({\mathcal{Z}}\) be an arbitrary space, and \(\rho\), \(\rho '\) be two measures in \({\mathcal{P}}({\mathcal{Z}})\). The total variation distance between \(\rho\) and \(\rho '\) is

$$\begin{aligned} D_{TV}\left( \rho , \rho ' \right) := \sup \limits _{Z \in {\mathcal{A}} ({\mathcal{Z}})} \vert \rho (Z) - \rho ' (Z) \vert , \end{aligned}$$
(12)

where \({\mathcal{A}}({\mathcal{Z}})\) is the \(\sigma\)-algebra associated with \({\mathcal{Z}}\). The total variation distance is one of the most commonly used probability metrics. It admits several very simple interpretations and is a very useful tool in many mathematical fields such as probability theory, Bayesian statistics and optimal transport (Villani, 2003; Robert, 2007; Peyré & Cuturi, 2019). In optimal transport, it can be rewritten as the solution of the Monge–Kantorovich problem with the cost function \({\text{cost}}(\varvec{z},\varvec{z}') ={\mathbbm{1}}\left\{ \varvec{z}\ne \varvec{z}'\right\}\),

$$\begin{aligned} D_{TV}(\rho , \rho ' ) = \inf \int _{{\mathcal{Z}}^{2}}{\mathbbm{1}}\left\{ \varvec{z} \ne \varvec{z}'\right\} d\pi (\varvec{z},\varvec{z}') , \end{aligned}$$
(13)

where the infimum is taken over all joint probability measures \(\pi\) in \({\mathcal{P}}\left( {\mathcal{Z}}\times {\mathcal{Z}} \right)\) with marginals \(\rho\) and \(\rho '\). According to this interpretation, it seems quite natural to consider the total variation distance as a relaxation of the trivial distance on [0, 1] (for deterministic classifiers).
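
For the finite label space \({\mathcal{Y}}= [K]\) used in this paper, the supremum in Eq. (12) reduces to half the \(\ell _{1}\) distance between the two probability vectors. A quick numerical check of this identity on random distributions (illustrative only):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K = 4

def tv_half_l1(p, q):
    """TV distance on a finite space: 0.5 * ||p - q||_1."""
    return 0.5 * np.abs(p - q).sum()

def tv_sup_over_events(p, q):
    """Direct definition, Eq. (12): supremum of |p(Z) - q(Z)| over all subsets Z of [K]."""
    best = 0.0
    for r in range(K + 1):
        for Z in itertools.combinations(range(K), r):
            best = max(best, abs(p[list(Z)].sum() - q[list(Z)].sum()))
    return best

p = rng.dirichlet(np.ones(K))
q = rng.dirichlet(np.ones(K))
print(tv_half_l1(p, q), tv_sup_over_events(p, q))   # the two values coincide
```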

Let us now suppose that \(\rho\) and \(\rho '\) admit probability density functions g and \(g'\) with respect to a third measure \(\nu\). Then the Renyi divergence of order \(\beta\) between \(\rho\) and \(\rho '\) writes

$$\begin{aligned} D_{\beta }\left( \rho , \rho ' \right) :=\frac{1}{\beta -1}\log \int _{\mathcal{Z}} g' (\varvec{z}) \left( \frac{g(\varvec{z})}{g' (\varvec{z})}\right) ^{\beta } d\nu (\varvec{z}). \end{aligned}$$
(14)

The Renyi divergence (Rényi, 1961) is a generalized divergence defined for any \(\beta\) on the interval \([1,\infty ]\). It equals the Kullback–Leibler divergence when \(\beta \rightarrow 1\), and the maximum divergence when \(\beta \rightarrow \infty\). It also has the property of being non-decreasing with respect to \(\beta\). This divergence is very common in machine learning and information theory (van Erven & Harremos, 2014), especially in its Kullback–Leibler form, which is widely used as the loss function, i.e. cross entropy, of classification algorithms. In the remainder, we denote \({\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) the set of \((\alpha _{p},\epsilon )\)-robust classifiers w.r.t. \(D_{\beta }\).
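
Here is a minimal implementation of Eq. (14) for discrete distributions over the labels (with the counting measure playing the role of \(\nu\)), together with a numerical check of the Kullback–Leibler limit and of the monotonicity in \(\beta\); the distributions are drawn at random for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def renyi(p, q, beta):
    """Eq. (14) on a finite space: D_beta(p, q) = 1/(beta-1) * log sum_k p_k^beta * q_k^(1-beta)."""
    return np.log(np.sum(p**beta * q**(1.0 - beta))) / (beta - 1.0)

def kl(p, q):
    """Kullback-Leibler divergence, the beta -> 1 limit of D_beta."""
    return np.sum(p * np.log(p / q))

p = rng.dirichlet(np.ones(5))
q = rng.dirichlet(np.ones(5))

print(kl(p, q), renyi(p, q, beta=1.0001))                         # close: KL is the beta -> 1 limit
print([round(renyi(p, q, b), 4) for b in (1.5, 2.0, 5.0, 20.0)])  # non-decreasing in beta
```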

Let us now give some properties of these divergences that will be useful for our analysis. First we recall the probability preservation property of the Renyi divergence, first presented by Langlois et al. (2014).

Proposition 1

(Langlois et al., 2014) Let \(\rho\) and \(\rho '\) be two measures in \({\mathcal{P}}({\mathcal{Z}})\). Then for any \(Z \in {\mathcal{A}}({\mathcal{Z}})\), the following holds,

$$\begin{aligned} \rho (Z)\le \left( \exp \left( D_{\beta }(\rho , \rho ' )\right) \rho ' (Z)\right) ^{\frac{\beta -1}{\beta }}. \end{aligned}$$

Now thanks to previous works by Gilardoni (2010) and Vajda (1970), we also get the following results relating the total variation distance and the Renyi divergence.

Proposition 2

(Inequality between total variation and Renyi divergence) Let \(\rho\) and \(\rho '\) be two measures in \({\mathcal{P}}({\mathcal{Z}})\), and \(\beta \ge 1\). Then the following holds,

$$\begin{aligned} D_{TV}(\rho , \rho ' ) \le \min \left( \frac{3}{2}\left( \sqrt{1 + \frac{4 D_{\beta }(\rho , \rho ' )}{9}} - 1\right) ^{1/2} ,\ \frac{\exp \left( D_{\beta }(\rho , \rho ' ) +1 \right) -1}{\exp \left( D_{\beta }(\rho , \rho ' ) +1 \right) +1} \right) . \end{aligned}$$

Proof

Thanks to Gilardoni (2010), one has

$$\begin{aligned}&D_{1}(\rho , \rho ') \ge 2D_{TV}(\rho , \rho ')^{2}+ \frac{4D_{TV}(\rho , \rho ')^{4}}{9}. \end{aligned}$$

From which it follows that

$$\begin{aligned}&D_{TV}(\rho , \rho ') \le \frac{3}{2}\left( \sqrt{1 + \frac{4D_{1}(\rho , \rho ')}{9}} - 1\right) ^{1/2}. \end{aligned}$$

Moreover, using inequality from Vajda (1970), one gets

$$\begin{aligned}&D_{1}(\rho , \rho ') +1 \ge \log \left( \frac{1 + D_{TV}(\rho , \rho ')}{1 - D_{TV}(\rho , \rho ')} \right) . \end{aligned}$$

This inequality leads to the following

$$\begin{aligned}&\frac{\exp (D_{1}(\rho , \rho ') +1) -1}{\exp (D_{1}(\rho , \rho ') +1) +1} \ge D_{TV}(\rho , \rho '). \end{aligned}$$

By combining the above inequalities and by monotonicity of the Renyi divergence with respect to \(\beta\), one obtains the expected result. \(\square\)
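
As a quick numerical sanity check of Proposition 2 (not part of the original analysis), the snippet below compares the total variation distance between two Bernoulli distributions with the two Renyi-based upper bounds, instantiated at \(\beta = 1\) (the Kullback–Leibler case, which by monotonicity yields the tightest right-hand side); the Bernoulli parameters are illustrative.

```python
import numpy as np

def tv_bernoulli(a, b):
    return abs(a - b)

def kl_bernoulli(a, b):
    return a * np.log(a / b) + (1.0 - a) * np.log((1.0 - a) / (1.0 - b))

a, b = 0.6, 0.4                    # illustrative Bernoulli parameters
tv = tv_bernoulli(a, b)
d1 = kl_bernoulli(a, b)            # D_1 = Kullback-Leibler divergence

bound_gilardoni = 1.5 * np.sqrt(np.sqrt(1.0 + 4.0 * d1 / 9.0) - 1.0)
bound_vajda = (np.exp(d1 + 1.0) - 1.0) / (np.exp(d1 + 1.0) + 1.0)

print(f"TV = {tv:.4f} <= min({bound_gilardoni:.4f}, {bound_vajda:.4f}) "
      f"= {min(bound_gilardoni, bound_vajda):.4f}")
```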

From now on, we denote \({\mathcal{M}}_{TV}\left( \alpha ,\epsilon \right)\) and \({\mathcal{M}}_{\beta }\left( \alpha ,\epsilon \right)\) the sets of \((\alpha ,\epsilon )\)-robust classifiers w.r.t. \(D_{TV}\) and \(D_{\beta }\) respectively. The next sections give bounds on the risks’ gap and on the generalization gap, in the standard and the adversarial settings, for these specific hypothesis classes.

4 Risks’ gap and generalization gap for robust randomized classifiers

As discussed in Sect. 2.1, we can always decompose the adversarial risk of a classifier \({\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p})\) in two terms. First the standard risk \({\mathcal{R}}(\mathrm{m})\) and second the amount of risk the adversary creates with non-zero perturbations \({\mathcal{R}}^{\mathrm{adv}}_{>0}(\mathrm{m};\alpha _{p})\). Hence minimizing \({\mathcal{R}}(\mathrm{m})\) can give poor values for \({\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p})\) and vice-versa. In this section, we upper-bound the risks’ gap \({\mathcal{R}}^{\mathrm{adv}}_{>0}(\mathrm{m};\alpha _{p})\), i.e. the gap between the risk and the adversarial risk of a robust classifier.

4.1 Risks’ gap for robust classifiers w.r.t. \(D_{TV}\)

First, let us consider \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\). We can control the loss of accuracy under attack of this classifier with the robustness parameter \(\epsilon\).

Theorem 3

(Risk’s gap for robust classifiers w.r.t \(D_{TV}\)) Let \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) . Then we have

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m}; \alpha _{p}) \le {\mathcal{R}}(\mathrm{m}) + \epsilon . \end{aligned}$$

Proof

Let \(\mathrm{m}\) be an \((\alpha _{p},\epsilon )\)-robust classifier w.r.t. \(D_{TV}\) , \(({\varvec{x}},y ) \sim {\mathcal{D}}\) and \(\varvec{\tau }\in {\mathcal{X}}\) such that \(\Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\). By definition of the \(0/1\) loss we have

$$\begin{aligned}&{\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}+ \varvec{\tau }), y \right) = {\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}}+ \varvec{\tau })} \left[ {\mathbbm{1}}\left\{ \hat{y} \ne y\right\} \right] . \end{aligned}$$

Furthermore, by definition of the total variation distance we have

$$\begin{aligned}&{\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}}+ \varvec{\tau })} \left[ {\mathbbm{1}}\left\{ \hat{y} \ne y\right\} \right] - {\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}})} \left[ {\mathbbm{1}}\left\{ \hat{y} \ne y\right\} \right] \le D_{TV}( \mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+\varvec{\tau })). \end{aligned}$$

Since \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\), the above amounts to write

$$\begin{aligned}&{\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}+ \varvec{\tau }), y \right) - {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}), y \right) \le \epsilon . \end{aligned}$$

Finally, since this holds for any \(({\varvec{x}},y) \sim {\mathcal{D}}\) and any \(\alpha _{p}\)-bounded perturbation \(\varvec{\tau }\), we get

$$\begin{aligned}&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}} \left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}+ \varvec{\tau }), y \right) \right] - {\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}} \left[ {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}), y \right) \right] \le \epsilon . \end{aligned}$$

The above inequality concludes the proof. \(\square\)

This result means that if we can design a class \({\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) with a small enough \(\epsilon\), then minimizing the risk of \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) is also sufficient to control the adversarial risk. The result is relatively easy to obtain, but it has an interesting consequence for our understanding of the trade-off between robustness and accuracy. It says that there exist classes of randomized classifiers for which robustness and standard accuracy may not be at odds, since we can upper-bound the maximal loss of accuracy the model may suffer under attack. This challenges the intuitions developed on deterministic classifiers by Su et al. (2018), Jetley et al. (2018), Tsipras et al. (2019) and Zhang et al. (2019), and advocates for the use of randomization schemes as defenses against adversarial attacks. Note, however, that we did not evade the trade-off between robustness and accuracy; we only showed that with certain hypothesis classes it can be controlled.
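
To illustrate Theorem 3, the sketch below estimates the risk and the adversarial risk of a one-dimensional Gaussian noise injection classifier, for which both losses have closed forms, and compares their gap to the total variation robustness parameter \(\epsilon = 2\varPhi (\frac{\alpha _{2}}{2\sigma }) - 1\) announced in Sect. 1.3. The data distribution and all numerical values are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

sigma, alpha, mu, n = 1.0, 0.5, 2.0, 200_000   # illustrative parameters

# Toy distribution D: y uniform in {-1, +1}, x | y ~ N(mu * y, 1).
y = rng.choice([-1.0, 1.0], size=n)
x = mu * y + rng.normal(size=n)

# Randomized classifier m(x): predict sign(x + n) with n ~ N(0, sigma^2).
# Closed-form loss: L_0/1(m(x), y) = Phi(-y x / sigma); the worst perturbation
# |tau| <= alpha shifts x by -y * alpha.
risk = np.mean(norm.cdf(-y * x / sigma))
adv_risk = np.mean(norm.cdf((alpha - y * x) / sigma))

eps_tv = 2.0 * norm.cdf(alpha / (2.0 * sigma)) - 1.0
print(f"R = {risk:.4f}, R_adv = {adv_risk:.4f}, "
      f"gap = {adv_risk - risk:.4f} <= eps = {eps_tv:.4f}")
```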

4.2 Risks’ gap for robust classifiers w.r.t. \(D_{\beta }\)

We now extend the previous result to the Renyi divergence. We show that, for any randomized classifier in \({\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\), we can bound the gap between the risk and the adversarial risk of \(\mathrm{m}\). With the Renyi divergence, the factor that controls the classifier’s loss of accuracy under attack can be either multiplicative or additive, and it depends both on the robustness parameter \(\epsilon\) and on the divergence parameter \(\beta\).

Theorem 4

(Multiplicative risks’ gap for Renyi-robust classifiers) Let \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\). Then we have

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p}) \le \left( e^{\epsilon } {\mathcal{R}}(\mathrm{m})\right) ^{\frac{\beta -1}{\beta }}. \end{aligned}$$

Proof

Let \(\mathrm{m}\) be an \((\alpha _{p},\epsilon )\)-robust classifier w.r.t. \(D_{\beta }\), \(({\varvec{x}},y ) \sim {\mathcal{D}}\) and \(\varvec{\tau }\in {\mathcal{X}}\) such that \(\Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\). With the same reasoning as above, and with Proposition 1, we get

$$\begin{aligned} {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}+ \varvec{\tau }), y \right) = ~&{\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}}+ \varvec{\tau })} \left[ {\mathbbm{1}}\left\{ \hat{y} \ne y\right\} \right] \\ = ~&{\mathbb{P}}_{\hat{y} \sim \mathrm{m}({\varvec{x}}+ \varvec{\tau })} \left[ \hat{y} \ne y\right] \\ \le ~&\left( e^{ D_{\beta }\left( \mathrm{m}({\varvec{x}}+\varvec{\tau }),\mathrm{m}({\varvec{x}}) \right) } {\mathbb{P}}_{\hat{y} \sim \mathrm{m}({\varvec{x}})} \left[ \hat{y} \ne y \right] \right) ^{\frac{\beta -1}{\beta }} \quad ({\text{Prop.}}\,1)\\ = ~&\left( e^{ D_{\beta }\left( \mathrm{m}({\varvec{x}}+\varvec{\tau }),\mathrm{m}({\varvec{x}}) \right) } {\mathbb{E}}_{\hat{y} \sim \mathrm{m}({\varvec{x}})} \left[ {\mathbbm{1}}\left\{ \hat{y} \ne y\right\} \right] \right) ^{\frac{\beta -1}{\beta }}\\ \le ~&\left( e^{\epsilon } {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}), y \right) \right) ^{\frac{\beta -1}{\beta }} . \end{aligned}$$

Since this holds for any \(({\varvec{x}},y) \sim {\mathcal{D}}\) and any \(\alpha _{p}\) bounded perturbation \(\varvec{\tau }\), we get

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m}; \alpha _{p}) = ~&{\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathcal{L}}_{0/1}\left( \mathrm{m}( {\varvec{x}}+\varvec{\tau }), y\right) \right] \\ \le ~&{\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ e^{\frac{\beta -1}{\beta }\epsilon } {\mathcal{L}}_{0/1}\left( \mathrm{m}( {\varvec{x}}), y\right) ^{\frac{\beta -1}{\beta }} \right] \\ \le ~&e^{\frac{\beta -1}{\beta }\epsilon } {\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ {\mathcal{L}}_{0/1}\left( \mathrm{m}( {\varvec{x}}), y\right) ^{\frac{\beta -1}{\beta }}\right] . \end{aligned}$$

Finally, using the Jensen inequality, one gets

$$\begin{aligned} \le ~&e^{\frac{\beta -1}{\beta }\epsilon } {\mathbb{E}}_{({\varvec{x}},y)\sim {\mathcal{D}}}\left[ {\mathcal{L}}_{0/1}\left( \mathrm{m}( {\varvec{x}}), y\right) \right] ^{\frac{\beta -1}{\beta }} =\left( e^{\epsilon } {\mathcal{R}}(\mathrm{m})\right) ^{\frac{\beta -1}{\beta }} . \end{aligned}$$

The above inequality concludes the proof. \(\square\)

This first result gives a multiplicative bound on the gap between the standard and adversarial risks. This means that if we can design a class \({\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) with small enough \(\epsilon\), and big enough \(\beta\), then minimizing the risk of any \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) is sufficient to also minimize the adversarial risk of \(\mathrm{m}\). Nevertheless, multiplicative factors are not easy to analyze.

Remark 2

More general bounds can be computed if we assume that for every randomized classifier \(\mathrm{m}\) there exists a convex function \({\mathbf{f}}\) such that for all \({\varvec{x}}\) and \(\varvec{\tau }\) with \(\Vert \varvec{\tau }\Vert _{p}\le \alpha _{p}\), we have \(\mathrm{m}({\varvec{x}})(Z)\le {\mathbf{f}}(\mathrm{m}({\varvec{x}}+\varvec{\tau })(Z))\) for all measurable sets Z. In this case, we get \({\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p}) \le {\mathbf{f}}\left( {\mathcal{R}}(\mathrm{m})\right)\). This has a close link with randomized smoothing (Cohen et al., 2019) and f-differential privacy (Bu et al., 2020) where both try to fit the best possible \({\mathbf{f}}\) using Neyman–Pearson lemma.

The following result provides an additive counterpart to Theorem 4. It gives a control over the loss of accuracy under attack with respect to the robustness parameter \(\epsilon\) and the Shannon entropy of \(\mathrm{m}\).

Theorem 5

(Additive risks’ gap for Renyi-robust classifiers) Let \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\), then we have

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m}; \alpha _{p})-{\mathcal{R}}(\mathrm{m}) \le 1-e^{-\epsilon } {\mathbb{E}}_{{\varvec{x}}\sim {\mathcal{D}}_{\mid {\mathcal{X}}}}\left[ e^{-H(\mathrm{m}({\varvec{x}}))}\right] \end{aligned}$$

where H is the Shannon entropy (i.e. for any \(\rho \in {\mathcal{P}}\left( {\mathcal{Y}}\right) , H(\rho )= -\sum \nolimits _{k \in {\mathcal{Y}}} \rho _{k} \log (\rho _{k})\)) and \({\mathcal{D}}_{\mid {\mathcal{X}}}\) is the marginal distribution of \({\mathcal{D}}\) for \({\mathcal{X}}\).

Proof

Let \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\), then

$$\begin{aligned}&{\mathcal{R}}^{\mathrm{adv}}(\mathrm{m};\alpha _{p})-{\mathcal{R}}(\mathrm{m}) \\ = ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}+ \varvec{\tau }) , y \right) - {\mathcal{L}}_{0/1}\left( \mathrm{m}({\varvec{x}}) , y \right) \right] . \end{aligned}$$

By definition of the \(0/1\) loss, this amounts to write

$$\begin{aligned} = ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathbb{E}}_{\hat{y}_{\mathrm{adv}}\sim \mathrm{m}({\varvec{x}}+\varvec{\tau }), \hat{y} \sim \mathrm{m}({\varvec{x}}) }\left[ {\mathbbm{1}}\left( \hat{y}_{\mathrm{adv}}\ne y\right) - {\mathbbm{1}}\left( \hat{y}\ne y\right) \right] \right] \\ \le ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} {\mathbb{E}}_{\hat{y}_{\mathrm{adv}}\sim \mathrm{m}({\varvec{x}}+\varvec{\tau }), \hat{y} \sim \mathrm{m}({\varvec{x}})}\left[ {\mathbbm{1}}\left( \hat{y}_{\mathrm{adv}}\ne \hat{y}\right) \right] \right] \\ = ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})}{\mathbb{P}}_{\hat{y}_{\mathrm{adv}}\sim \mathrm{m}({\varvec{x}}+\varvec{\tau }),\hat{y}\sim \mathrm{m}({\varvec{x}})} \left[ \hat{y}_{\mathrm{adv}}\ne \hat{y} \right] \right] \\ = ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} 1 - {\mathbb{P}}_{\hat{y}_{\mathrm{adv}}\sim \mathrm{m}({\varvec{x}}+\varvec{\tau }),\hat{y}\sim \mathrm{m}({\varvec{x}})} \left[ \hat{y}_{\mathrm{adv}} = \hat{y} \right] \right] \\ = ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} 1 - \sum _{i=1}^{K} \mathrm{m}({\varvec{x}})_{i} \times \mathrm{m}({\varvec{x}}+ \varvec{\tau })_{i} \right] . \end{aligned}$$

Now, note that for any \(({\varvec{x}},y) \sim {\mathcal{D}}\) and \(\varvec{\tau }\in {\mathcal{X}}\), by definition of a probability vector in \({\mathcal{P}}\left( {\mathcal{Y}}\right)\), and thanks to Jensen inequality we can write

$$\begin{aligned}&\sum _{i=1}^{K} \mathrm{m}({\varvec{x}})_{i} \times \mathrm{m}({\varvec{x}}+ \varvec{\tau })_{i} \ge \exp \left( \sum _{i=1}^{K} \mathrm{m}({\varvec{x}})_{i} \log \mathrm{m}({\varvec{x}}+ \varvec{\tau })_{i}\right) . \end{aligned}$$

Then by definition of the entropy and the Kullback Leibler divergence we have

$$\begin{aligned}&\exp \left( \sum _{i=1}^{K} \mathrm{m}({\varvec{x}})_{i} \log \mathrm{m}({\varvec{x}}+ \varvec{\tau })_{i}\right) =\exp \big (-D_{1}\left( \mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+ \varvec{\tau }) \right) - H\left( \mathrm{m}({\varvec{x}}) \right) \big ). \end{aligned}$$

Finally, by combining the above inequalities and since \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) we get

$$\begin{aligned}&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})}{\mathbb{P}}_{\hat{y}_{\mathrm{adv}}\sim \mathrm{m}({\varvec{x}}+\varvec{\tau }),\hat{y}\sim \mathrm{m}({\varvec{x}})}(\hat{y}_{\mathrm{adv}}\ne \hat{y})\right] \\ \le ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ \sup _{ \varvec{\tau }\in B_{p}(\alpha _{p})} 1-e^{- D_{1}(\mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+\varvec{\tau }))-H(\mathrm{m}({\varvec{x}}))} \right] \\ \le ~&{\mathbb{E}}_{({\varvec{x}},y) \sim {\mathcal{D}}}\left[ 1-e^{-\epsilon -H(\mathrm{m}({\varvec{x}}))} \right] = 1-e^{-\epsilon }{\mathbb{E}}_{{\varvec{x}}\sim {\mathcal{D}}_{\mid {\mathcal{X}}}}\left[ e^{-H(\mathrm{m}({\varvec{x}}))}\right] . \end{aligned}$$

The above inequality concludes the proof. \(\square\)

This result is interesting because it relates the accuracy of \(\mathrm{m}\) to the bound we obtain. In words, when \(\mathrm{m}({\varvec{x}})\) has large entropy (i.e. \(H(\mathrm{m}({\varvec{x}}))\rightarrow \log (K)\)), the output distribution tends towards the uniform distribution; hence \(\epsilon \rightarrow 0\). This means that the classifier is very robust but also completely inaccurate, since it outputs classes uniformly at random. Conversely, if \(H(\mathrm{m}({\varvec{x}}))\rightarrow 0\), then \(\epsilon \rightarrow \infty\). The classifier may be accurate, but it is not robust anymore (at least according to our definition). Hence we need to find a classifier that achieves a trade-off between robustness and accuracy.
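
As an illustration of Theorem 5, the snippet below computes the entropy term and the resulting additive bound for a batch of output distributions; the distributions, the number of classes and the robustness parameter \(\epsilon\) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy H(p) = -sum_k p_k log p_k (with 0 log 0 = 0)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

K, eps = 10, 0.2                                        # illustrative values
outputs = rng.dirichlet(np.ones(K) * 0.5, size=1000)    # stand-in for {m(x) : x ~ D|X}

bound = 1.0 - np.exp(-eps) * np.mean(np.exp([-entropy(p) for p in outputs]))
print(f"Theorem 5 bound on R_adv - R: {bound:.4f}")

# Sharper outputs (lower entropy) push the bound down for the same eps,
# but in practice a sharper classifier usually needs a larger eps to stay Renyi-robust.
sharp = rng.dirichlet(np.ones(K) * 0.05, size=1000)
print(1.0 - np.exp(-eps) * np.mean(np.exp([-entropy(p) for p in sharp])))
```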

5 Standard generalization gap

In this section we devise generalization gap bounds for randomized classifiers that are robust with respect to either the total variation distance or the Renyi divergence. To do so, we upper-bound the Rademacher complexity of the loss space of TV-robust classifiers

$$\begin{aligned} {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }:=\{ ({\varvec{x}},y) \mapsto {\mathcal{L}}_{0/1}(\mathrm{m}({\varvec{x}}),y) \mid \mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) \}. \end{aligned}$$

The empirical Rademacher complexity, first introduced by Bartlett and Mendelson (2002), is one of the standard measures of the generalization gap. It is particularly useful for obtaining quality bounds for complex classes such as neural networks since, contrary to combinatorial notions such as the VC dimension, it does not depend on the number of parameters in the network.

Definition 3

(Rademacher complexity) For any class \({\mathcal{F}}\) of real-valued functions \(({\varvec{x}},y)\mapsto {\mathbb{R}}\), given a training sample \({\mathcal{S}}=\{({{\varvec{x}}_{\varvec{1}}},y_{1}), \dots ,({{\varvec{x}}_{\varvec{n}}},y_{n})\}\), the empirical Rademacher complexity of \({\mathcal{F}}\) is defined as

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}({\mathcal{F}}):=\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{f \in {\mathcal{F}}} \sum _{i=1}^{n} r_{i} f({{\varvec{x}}_{\varvec{i}}},y_{i}) \right] , \end{aligned}$$

with \(r_{i}\) i.i.d. drawn from a Rademacher measure, i.e. \({\mathbb{P}}(r_{i} = 1) = {\mathbb{P}}(r_{i} = -1) = \frac{1}{2}\).

The empirical Rademacher complexity measures the uniform convergence rate of the empirical risk towards the risk on the function class \({\mathcal{F}}\) as demonstrated by Mohri et al. (2018). Thanks to this notion of complexity, we can bound with high probability the generalization gap of any hypothesis \(\mathrm{m}\) in a class \({\mathcal{M}}\).
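
For a small finite function class, the empirical Rademacher complexity can be estimated directly by Monte Carlo over the Rademacher variables, as in the illustrative sketch below; the class of threshold classifiers and the sample are toy choices made only to exercise Definition 3.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200
xs = rng.uniform(-1.0, 1.0, size=n)
ys = (xs > 0).astype(float)

# Toy finite function class F: 0/1 losses of threshold classifiers h_t(x) = 1{x > t}.
thresholds = np.linspace(-1.0, 1.0, 41)
loss_table = np.array([((xs > t).astype(float) != ys).astype(float) for t in thresholds])

def empirical_rademacher(loss_table, n_draws=2000):
    """Monte Carlo estimate of (1/n) E_r [ sup_f sum_i r_i f(x_i, y_i) ]."""
    vals = []
    for _ in range(n_draws):
        r = rng.choice([-1.0, 1.0], size=n)
        vals.append(np.max(loss_table @ r) / n)
    return float(np.mean(vals))

print(f"estimated empirical Rademacher complexity: {empirical_rademacher(loss_table):.4f}")
```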

Theorem 6

(Mohri et al., 2018) Let \({\mathcal{M}}\) be a class of possibly randomized classifiers and \({\mathcal{L}}_{{\mathcal{M}}} :=\{ {\mathcal{L}}_{\mathrm{m}} :({\varvec{x}},y) \mapsto {\mathcal{L}}_{0/1}\left( \mathrm{m}(\varvec{x}),y\right) \mid \mathrm{m}\in {\mathcal{M}}\}\). Then for any \(\delta \in (0,1)\), with probability at least \(1-\delta\), the following holds for any \(\mathrm{m}\in {\mathcal{M}}\),

$$\begin{aligned} {\mathcal{R}}\left( \mathrm{m}\right) - {\mathcal{R}}_{{\mathcal{S}}}\left( \mathrm{m}\right) \le 2 {\mathfrak{R}}_{{\mathcal{S}}}({\mathcal{L}}_{{\mathcal{M}}}) + 3 \sqrt{\frac{\ln (2/\delta )}{2n}} . \end{aligned}$$

5.1 Generalization error for robust classifiers

Accordingly, we want to upper bound the empirical Rademacher complexity of \({\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\), which motivates the following definition.

Definition 4

(\(\alpha\)-covering and external covering number) Let us consider \(( {\mathcal{X}}, \Vert {.}\Vert _{p})\) a vector space equipped with the \(\ell _{p}\) norm, \(B \subset {\mathcal{X}}\) and \(\alpha \ge 0\). Then

  • \(C =\{ {{\varvec{c}}_{\varvec{1}}}, \dots , {{\varvec{c}}_{\varvec{m}}} \}\) is an \(\alpha\)-covering of B for the \(\ell _{p}\) norm if for any \({\varvec{x}}\in B\) there exists \({{\varvec{c}}_{\varvec{i}}} \in C\) such that \(\Vert {{\varvec{x}}- {{\varvec{c}}_{\varvec{i}}}}\Vert _{p} \le \alpha\).

  • The external covering number of B, denoted \(N\left( B,\Vert {.}\Vert _{p},\alpha \right)\), is the minimal number of points needed to build an \(\alpha\)-covering of B for the \(\ell _{p}\) norm.

The covering number is a well-known measure that is often used in statistical learning theory (Shalev-Shwartz & Ben-David, 2014) and asymptotic statistics (Van der Vaart, 2000) to evaluate the complexity of a set of functions. Here we use it to evaluate the number of \(\ell _{p}\) balls we need to cover the training samples, which gives us the following bound on the Rademacher complexity of \({\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\).
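
On a given sample, the covering number N appearing in Theorem 7 below can be upper bounded by a simple greedy procedure: repeatedly promote an uncovered point to a center until every point lies within \(\alpha _{p}\) of some center. A minimal sketch on synthetic data (data and radius are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_covering_number(points, alpha, p=2):
    """Greedy upper bound on the external covering number N({x_1..x_n}, ||.||_p, alpha)."""
    centers = []
    remaining = list(range(len(points)))
    while remaining:
        c = points[remaining[0]]          # promote the first uncovered point to a center
        centers.append(c)
        remaining = [i for i in remaining
                     if np.linalg.norm(points[i] - c, ord=p) > alpha]
    return len(centers)

# Clustered synthetic inputs: a few balls can cover many samples, keeping N / n small.
points = np.concatenate([rng.normal(loc=c, scale=0.05, size=(300, 10))
                         for c in (-0.5, 0.0, 0.5)])
print("n =", len(points), " greedy cover size N <=", greedy_covering_number(points, alpha=0.5))
```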

Theorem 7

(Rademacher complexity for TV-robust classifiers) Let \({\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\) be the loss function class associated with \({\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\). Then, for any \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}), \ldots , ({{\varvec{x}}_{\varvec{n}}},y_{n})\}\), the following holds,

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\right) \le \sqrt{\frac{ N \times K }{n}}+\epsilon . \end{aligned}$$

where \(N =N\left( \{{{\varvec{x}}_{\varvec{1}}},\ldots , {{\varvec{x}}_{\varvec{n}}}\}, \Vert {.}\Vert _{p}, \alpha _{p} \right)\) is the \(\alpha _{p}\)-external covering number of the inputs \(\{{{\varvec{x}}_{\varvec{1}}},\ldots , {{\varvec{x}}_{\varvec{n}}}\}\) for the \(\ell _{p}\) norm and \(K = | {\mathcal{Y}}|\) is the number of labels in the classification task.

Proof

We denote \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}), \dots , ({{\varvec{x}}_{\varvec{n}}},y_{n})\}\) and \(N=N\left( \{{{\varvec{x}}_{\varvec{1}}},\dots , {{\varvec{x}}_{\varvec{n}}}\}, \Vert {.}\Vert _{p}, \alpha _{p} \right)\). By definition of a covering number, there exists \(C= \{{{\varvec{c}}_{\varvec{1}}} , \dots , {{\varvec{c}}_{\varvec{N}}}\}\) an \(\alpha _{p}\)-covering of \(\{{{\varvec{x}}_{\varvec{1}}},\dots {{\varvec{x}}_{\varvec{n}}}\}\) for the \(\ell _{p}\) norm. Furthermore, for \(j\in \{1,\dots ,N\}\) and \(y \in \{1,\dots ,K\}\), we define

$$\begin{aligned} E_{y,j} = \left\{ i \in \{1,\dots , n\} ~{s.t.}~ y_{i} = y {\text{ and }} \mathop {\mathrm{argmin}}\limits _{l \in \{ 1, \dots , N\}} \Vert {{{\varvec{x}}_{\varvec{i}}} - {{\varvec{c}}_{\varvec{l}}}}\Vert _{p} = j\right\} . \end{aligned}$$

We also denote \(E_{j} = \mathop {\cup }\nolimits _{y \in [K]} E_{y,j}\). Finally, we denote \({\mathcal{L}}_{\mathrm{m}} :({\varvec{x}},y) \mapsto {\mathcal{L}}_{0/1}\left( \mathrm{m}(\varvec{x}),y\right)\). Then, by definition of the empirical Rademacher complexity, we can write

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\right) = ~&\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \sum _{i=1}^{n} r_{i} {\mathcal{L}}_{\mathrm{m}}({{\varvec{x}}_{\varvec{i}}}, y_{i})\right] . \end{aligned}$$

Then we can use \(E_{j}\) to write

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\right) = \,&\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \sum _{j=1}^{N}\sum _{i\in E_{j}} r_{i} {\mathcal{L}}_{\mathrm{m}}({{\varvec{x}}_{\varvec{i}}}, y_{i}) \right] . \end{aligned}$$

Furthermore for any \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) and \(i\in E_{j}\), there exists \(\epsilon _{i} \in [-\epsilon ,\epsilon ]\) such that: \({\mathcal{L}}_{\mathrm{m}}({{\varvec{x}}_{\varvec{i}}}, y_{i}) = {\mathcal{L}}_{\mathrm{m}}({{\varvec{c}}_{\varvec{j}}},y_{i})+\epsilon _{i}\). Then we have

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \right)&\le \,\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \sum _{j=1}^{N}\sum _{i\in E_{j}} r_{i} {\mathcal{L}}_{\mathrm{m}}({{\varvec{c}}_{\varvec{j}}},y_{i}) \right] \\&\quad + \, \frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\epsilon _{i}\in [-\epsilon ,\epsilon ]} \sum _{j=1}^{N}\sum _{i\in E_{j}} r_{i} \epsilon _{i} \right] . \end{aligned}$$

Let us start by studying the second term. We have

$$\begin{aligned} \frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\epsilon _{i}\in [-\epsilon ,\epsilon ]} \sum _{j=1}^{N}\sum _{i\in E_{j}} r_{i} \epsilon _{i} \right] =\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\epsilon _{i}\in [-\epsilon ,\epsilon ]} \sum _{i=1}^{n} r_{i} \epsilon _{i} \right] = \frac{1}{n} \sum _{i=1}^{n} \epsilon =\epsilon . \end{aligned}$$

Let us now look at the first term. Since \({\mathcal{L}}_{\mathrm{m}}({\varvec{x}},y)\in [0,1]\) for all \(({\varvec{x}},y)\), we have

$$\begin{aligned}&\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \sum _{j=1}^{N}\sum _{i\in E_{j}} r_{i} {\mathcal{L}}_{\mathrm{m}}({{\varvec{c}}_{\varvec{j}}},y_{i}) \right] \\ = \,&\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sup _{\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) } \sum _{j=1}^{N}\sum _{y=1}^{K} {\mathcal{L}}_{\mathrm{m}}({{\varvec{c}}_{\varvec{j}}},y) \sum _{i\in E_{y,j}}r_{i} \right] \\ \le \,&\frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sum _{j=1}^{N}\sum _{y=1}^{K} \left| { \sum _{i\in E_{y,j}}r_{i}}\right| \right] . \end{aligned}$$

Finally, using the Khintchine inequality and the Cauchy-Schwarz inequality, we get

$$\begin{aligned} \frac{1}{n} {\mathbb{E}}_{r_{i}}\left[ \sum _{j=1}^{N}\sum _{y=1}^{K} \left| { \sum _{i\in E_{y,j}}r_{i}}\right| \right] \le \,&\frac{1}{n} \sum _{j=1}^{N}\sum _{y=1}^{K} \sqrt{\big | {E_{y,j}} \big |} \quad {\text{(Khintchine)}}\\ \le \,&\frac{1}{n} \sqrt{N\times K}\sqrt{\sum _{j=1}^{N}\sum _{y=1}^{K} \big | {E_{y,j}} \big |} \quad {\text{(Cauchy)}} \\ = \,&\sqrt{\frac{N\times K}{n}}. \end{aligned}$$

By combining the upper-bounds we have for each term, we get the expected result,

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right) }\right) \le \sqrt{\frac{N\times K}{n}}+\epsilon . \end{aligned}$$

\(\square\)
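To make the quantity in Theorem 7 more concrete, the following sketch evaluates the bound \(\sqrt{NK/n}+\epsilon\) on a sample of flattened inputs. It is only an illustration under our own assumptions: the greedy cover it computes merely upper-bounds the optimal covering number \(N\), which suffices since \(N\) only appears in an upper bound, and the data and function names are ours.

```python
import numpy as np

def greedy_covering_number(xs, alpha, p=2):
    """Greedy alpha-cover of the rows of xs for the l_p norm.

    Returns the number of centers used; this upper-bounds the optimal
    covering number N(xs, ||.||_p, alpha).
    """
    centers = []
    for x in xs:
        if not any(np.linalg.norm(x - c, ord=p) <= alpha for c in centers):
            centers.append(x)
    return len(centers)

def rademacher_bound(xs, alpha, num_classes, eps, p=2):
    """Evaluate the right-hand side of Theorem 7: sqrt(N * K / n) + eps."""
    n = xs.shape[0]
    big_n = greedy_covering_number(xs, alpha, p=p)
    return np.sqrt(big_n * num_classes / n) + eps

# Example: 1000 inputs concentrated around 10 cluster centers in dimension 50,
# so the sample can be covered with a handful of balls and the bound stays small.
rng = np.random.default_rng(0)
clusters = rng.uniform(-1.0, 1.0, size=(10, 50))
xs = clusters[rng.integers(0, 10, size=1000)] + 0.01 * rng.standard_normal((1000, 50))
print(rademacher_bound(xs, alpha=0.5, num_classes=10, eps=0.1))
```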

Remark 3

Generalization bounds usually involve covering numbers of the hypothesis class, e.g. through Dudley's entropy integral (Shalev-Shwartz & Ben-David, 2014). In the bound of Theorem 7, the covering number is instead taken on the training inputs, while the hypothesis class is restricted to TV-robust classifiers. This is a fundamental difference between the two types of bounds. Some works (Xu & Mannor, 2012; Petzka et al., 2021) studied the generalization of slowly varying classifiers. The bounds they derive are similar to ours, even though they do not apply to the same objects.

The above result means that, if we can cover the n training samples with O(1) balls, then we can bound the generalization gap of any randomized classifier \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) by \(O\left( \frac{1}{\sqrt{n}}\right) + \epsilon\). Furthermore, a natural corollary of Theorem 7 bounds the Rademacher complexity of the class \({\mathcal{L}}_{{\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right) }\).

Corollary 1

Let \({\mathcal{L}}_{{\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right) }\) be the loss function class associated with \({\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\). Then, for any \({\mathcal{S}}:=\{({{\varvec{x}}_{\varvec{1}}},y_{1}), \ldots , ({{\varvec{x}}_{\varvec{n}}},y_{n})\}\), the following holds,

$$\begin{aligned} {\mathfrak{R}}_{{\mathcal{S}}}\left( {\mathcal{L}}_{{\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right) }\right) \le \sqrt{\frac{ N \times K }{n}}+ \min \left( \frac{3}{2}\left( \sqrt{1 + \frac{4\epsilon }{9}} - 1\right) ^{1/2}, \frac{e^{\epsilon +1} -1}{e^{\epsilon +1} +1}\right) . \end{aligned}$$

where \(N =N\left( \{{{\varvec{x}}_{\varvec{1}}},\ldots , {{\varvec{x}}_{\varvec{n}}}\}, \Vert {.}\Vert _{p}, \alpha _{p} \right)\) is the \(\alpha _{p}\)-external covering number of the inputs \(\{{{\varvec{x}}_{\varvec{1}}},\ldots , {{\varvec{x}}_{\varvec{n}}}\}\) for the \(\ell _{p}\) norm.

Proof

This corollary is an immediate consequence of Theorem 7 and Proposition 2. \(\square\)

Thanks to Theorems 6 and 7 and Corollary 1, one can easily bound the generalization gap of robust randomized classifiers.

5.2 Discussion and dimensionality issues

Xu and Mannor (2012) previously studied generalization bounds for learning algorithms based on their robustness. Although we use very different proof techniques, their results and ours are similar. More precisely, both analyses conclude that robust models generalize well if the training samples have a small covering number. Note, however, that we base our formulation on an adaptive partition of the samples, while the initial paper from Xu and Mannor (2012) only focuses on a fixed partition of the input space. We refer the reader to the discussion section in Xu and Mannor (2012) for more details.

These findings seem to contradict the current line of work on the hardness of generalization in the adversarial setting. In fact, if the ground-truth distribution is sufficiently concentrated (e.g. it lies in a low-dimensional subspace of \({\mathcal{X}}\)), a small number of balls can cover \({\mathcal{S}}\) with high probability; hence \(N = O(1)\). This means that we can learn robust classifiers with the same sample complexity as in the standard setting. But if the ground-truth distribution is not concentrated enough, the training samples will be far from one another, forcing the covering number to be large. In the worst-case scenario, we need to cover the whole space \([-1,1]^{d}\), giving a covering number \(N = O\left( \frac{1}{(\alpha _{p})^{d} }\right)\), which is exponential in the dimension of the problem.

In this worst-case scenario, our bound is therefore in \(O\left( \sqrt{\frac{1}{(\alpha _{p})^{d} n}}\right) + \epsilon\). When \(\alpha _{p}\) is small and the dimension of the problem is high, this bound is too large to give any meaningful insight into the generalization gap. We thus still need to tighten our analysis to show that robust learning for randomized classifiers is possible in high-dimensional spaces.
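As a rough illustration (our own back-of-the-envelope computation, taking the worst-case estimate \(N = O\left( 1/(\alpha _{p})^{d}\right)\) at face value and ignoring constants), consider CIFAR-sized images with \(d = 3\times 32\times 32 = 3072\), \(\alpha _{p} = 0.1\), \(K=10\) and \(n = 5\times 10^{4}\):

$$\begin{aligned} N \approx \left( \tfrac{1}{\alpha _{p}}\right) ^{d} = 10^{3072}, \qquad \sqrt{\tfrac{N \times K}{n}} \approx \sqrt{\tfrac{10 \times 10^{3072}}{5\times 10^{4}}} \approx 10^{1534}. \end{aligned}$$

The bound is thus vacuous long before the \(\epsilon\) term even matters.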

Remark 4

Note that we provided a very general result for randomized classifiers under the sole assumption that they are robust w.r.t. the total variation distance. Our result applies to any class of classifiers, not only linear classifiers or one-hidden-layer neural networks. To build a finer analysis, and to evade the curse of dimensionality, we should consider designing specific sub-classes \({\mathcal{M}}\subset {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) and adapting the proofs to make the term N smaller in the worst-case scenario.

6 Building robust randomized classifiers

In this section we present a simple yet efficient way to transform a non-robust, non-randomized classifier into a robust randomized classifier. To do so, we use a key property of both the Renyi divergence and the total variation distance called the data processing inequality. It is a well-known result from information theory which states that “post-processing cannot increase information”. The data processing inequality is stated as follows.

Theorem 8

(Cover & Thomas, 2012) Let us consider two arbitrary spaces \({\mathcal{Z}}, {\mathcal{Z}}'\), \(\rho ,\rho ' \in {\mathcal{P}}\left( {\mathcal{Z}} \right)\) and \(D \in \{D_{TV},D_{\beta }\}\). Then for any \(\psi : {\mathcal{Z}} \rightarrow {\mathcal{Z}}'\) we have

$$\begin{aligned} D\left( \psi \#\rho , \psi \#\rho ' \right) \le D\left( \rho ,\rho ' \right) , \end{aligned}$$

where \(\psi \#\rho\) denotes the pushforward of distribution \(\rho\) by \(\psi\).

In the context of robustness to adversarial examples, we use the data processing inequality to ease the design of robust randomized classifiers. In particular, let us suppose that we can build a randomized pre-processing \({\mathfrak{p}}: {\mathcal{X}}\rightarrow {\mathcal{P}}\left( {\mathcal{X}}\right)\) such that for any \({\varvec{x}}\in {\mathcal{X}}\) and any \(\alpha _{p}\)-bounded perturbation \(\varvec{\tau }\), we have

$$\begin{aligned} D\left( {\mathfrak{p}}({\varvec{x}}), {\mathfrak{p}}({\varvec{x}}+ \varvec{\tau }) \right) \le \epsilon , {\text{ with }}D \in \{D_{TV}, D_{\beta } \}. \end{aligned}$$
(15)

Then, thanks to the data processing inequality, we can take any deterministic classifier \({\varvec{h}}\) and build an \((\alpha _{p},\epsilon )\)-robust classifier w.r.t. D defined as \(\mathrm{m}: {\varvec{x}}\mapsto {\varvec{h}}\# {\mathfrak{p}}({\varvec{x}})\). This considerably simplifies the problem of building a class of robust models. Therefore, we want to build a randomized pre-processing \({\mathfrak{p}}\) for which we can control the Renyi divergence and/or the total variation distance between \({\mathfrak{p}}({\varvec{x}})\) and \({\mathfrak{p}}({\varvec{x}}+ \varvec{\tau })\). To do this, we analyze the simple procedure of injecting random noise directly onto the image before sending it to a classifier. Since the Renyi divergence and the total variation distance are particularly well suited to the study of Gaussian distributions, we first use this type of noise injection. More precisely, in this section, we focus on a mapping that writes as follows.

$$\begin{aligned} {\mathfrak{p}}: {\varvec{x}}\mapsto {\mathcal{N}}\left( {\varvec{x}}, \varSigma \right) , \end{aligned}$$
(16)

for some given non-degenerate covariance matrix \(\varSigma \in {\mathcal{M}}_{d\times d}({\mathbb{R}})\). We refer the interested reader to Pinot et al. (2019) for more general classes of noise, namely exponential families. Let us now evaluate the maximal variation of Gaussian pre-processing \({\mathfrak{p}}\) when applied to an image \({\varvec{x}}\in {\mathcal{X}}\) with and without perturbation.

Lemma 1

Let \(\beta >1\), \({\varvec{x}}, \varvec{\tau }\in {\mathcal{X}}\) and \(\varSigma \in {\mathcal{M}}_{d \times d}({\mathbb{R}})\) a non-degenerate covariance matrix. Let \(\rho = {\mathcal{N}}({\varvec{x}},\varSigma )\) and \(\rho '={\mathcal{N}}({\varvec{x}}+ \varvec{\tau },\varSigma )\), then \(D_{\beta }(\rho ,\rho ') = \frac{ \beta }{2} \Vert {\varvec{\tau }}\Vert _{\varSigma ^{- 1}}^{2}\).

Thanks to the above lemma, we know how to evaluate the level of Renyi-robustness that a Gaussian noise pre-processing brings to a classifier. With this result and Proposition 2, we can also upper-bound the total variation distance between \({\mathcal{N}}({\varvec{x}},\varSigma )\) and \({\mathcal{N}}({\varvec{x}}+ \varvec{\tau },\varSigma )\), but this bound is not always tight. Fortunately, we can directly evaluate the total variation distance between two Gaussian distributions as follows.

Lemma 2

Let \({\varvec{x}}, \varvec{\tau }\in {\mathcal{X}}\) and \(\varSigma \in {\mathcal{M}}_{d \times d}({\mathbb{R}})\) a non-degenerate covariance matrix. Let \(\rho = {\mathcal{N}}({\varvec{x}},\varSigma )\) and \(\rho '={\mathcal{N}}( {\varvec{x}}+ \varvec{\tau },\varSigma )\), then \(D_{TV}(\rho ,\rho ') = 2\varPhi \left( \frac{\Vert {\varvec{\tau }}\Vert _{\varSigma ^{-1}}}{2}\right) -1\) with \(\varPhi\) the cumulative distribution function of the standard Gaussian distribution.

Note that both divergences increase with the Mahalanobis norm of \(\varvec{\tau }\). Furthermore, we see that the greater the entropy of the Gaussian noise we inject, the smaller the distance between the two distributions. If we simplify the covariance matrix by setting \(\varSigma = \sigma ^{2} I_{d}\), this means that we can build more or less robust randomized classifiers against \(\ell _{2}\) adversaries, depending on \(\sigma\).
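As an illustration, the closed forms of Lemmas 1 and 2 are straightforward to evaluate numerically in the isotropic case \(\varSigma = \sigma ^{2} I_{d}\); the sketch below is ours and only restates the two formulas.

```python
import numpy as np
from scipy.stats import norm

def renyi_divergence_isotropic(tau, sigma, beta):
    """Lemma 1 with Sigma = sigma^2 I: D_beta = (beta / 2) * ||tau||_2^2 / sigma^2."""
    return 0.5 * beta * np.dot(tau, tau) / sigma**2

def tv_distance_isotropic(tau, sigma):
    """Lemma 2 with Sigma = sigma^2 I: D_TV = 2 * Phi(||tau||_2 / (2 * sigma)) - 1."""
    return 2.0 * norm.cdf(np.linalg.norm(tau) / (2.0 * sigma)) - 1.0

# Both quantities grow with ||tau||_2 and shrink as sigma increases.
tau = np.full(3072, 0.5 / np.sqrt(3072))   # an l2 perturbation of norm 0.5
print(renyi_divergence_isotropic(tau, sigma=0.25, beta=2.0),
      tv_distance_isotropic(tau, sigma=0.25))
```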

Theorem 9

(Robustness of Gaussian pre-processing) Let us consider \(c: {\mathcal{X}} \rightarrow {\mathcal{Y}}\) a deterministic classifier, \(\sigma > 0\) and \({\mathfrak{p}}: {\varvec{x}}\mapsto {\mathcal{N}}({\varvec{x}}, \sigma ^{2} I_{d})\) a pre-processing probabilistic mapping. Then the randomized classifier \(\mathrm{m}:=c \# {\mathfrak{p}}\) is

  • \((\alpha _{2}, \frac{\beta (\alpha _{2})^{2}}{2 \sigma ^{2}})\)-robust w.r.t. \(D_{\beta }\) against \(\ell _{2}\) adversaries.

  • \((\alpha _{2},\ 2 \varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) - 1)\)-robust w.r.t. \(D_{TV}\) against \(\ell _{2}\) adversaries.

Proof

Let \({\varvec{x}}, \varvec{\tau }\in {\mathcal{X}}\) such that \(\Vert {\varvec{\tau }}\Vert _{2} \le \alpha _{2}\). Thanks to Lemma 1 we have

$$\begin{aligned} D_{\beta }({\mathfrak{p}}({\varvec{x}}),{\mathfrak{p}}({\varvec{x}}+ \varvec{\tau }))&=\frac{\beta }{2}\Vert \varvec{\tau }\Vert _{\varSigma ^{-1}}^{2} = \frac{\beta }{2 \sigma ^{2}}\Vert \varvec{\tau }\Vert _{2}^{2} \le \frac{\beta (\alpha _{2})^{2}}{2 \sigma ^{2}}. \end{aligned}$$

Similarly, thanks to Lemma 2, we get

$$\begin{aligned} D_{TV}({\mathfrak{p}}({\varvec{x}}),{\mathfrak{p}}({\varvec{x}}+ \varvec{\tau }))&= 2\varPhi \left( \frac{\Vert \varvec{\tau }\Vert _{\varSigma ^{-1}}}{2} \right) -1 \le 2\varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) -1. \end{aligned}$$

Finally, from the data processing inequality, i.e.  Theorem 8, we get both

$$\begin{aligned} D_{\beta }(\mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+ \varvec{\tau }))&\le \frac{\beta (\alpha _{2})^{2}}{2 \sigma ^{2}}, \end{aligned}$$

and

$$\begin{aligned} D_{TV}(\mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+ \varvec{\tau }))&\le 2\varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) -1. \end{aligned}$$

The above inequalities conclude the proof. \(\square\)

Theorem 9 means that we can build simple noise injection schemes as pre-processing of state-of-the-art image classification models and keep track of the maximal loss of accuracy under attack of the resulting randomized classifier. These results also highlight the profound link between randomized classifiers and randomized smoothing as presented by Cohen et al. (2019). Even though our findings are of a different nature, both techniques use the same base mechanism (Gaussian noise injection). Therefore, Gaussian pre-processing is a principled defense method that can be analyzed from several standpoints, including certified robustness and statistical learning theory.
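A minimal sketch of this construction, assuming an arbitrary deterministic classifier `h` (e.g. a trained network, which is not part of the paper's code): it samples from \(\mathrm{m}= {\varvec{h}}\# {\mathfrak{p}}\) and returns the two robustness levels given by Theorem 9.

```python
import numpy as np
from scipy.stats import norm

def gaussian_pushforward_classifier(h, x, sigma, rng):
    """Sample a label from m(x) = h # N(x, sigma^2 I): inject noise, then apply h."""
    return h(x + sigma * rng.standard_normal(x.shape))

def theorem9_certificates(alpha2, sigma, beta):
    """Robustness levels of m = h # p against l2 adversaries of size alpha2 (Theorem 9)."""
    eps_renyi = beta * alpha2**2 / (2.0 * sigma**2)        # w.r.t. D_beta
    eps_tv = 2.0 * norm.cdf(alpha2 / (2.0 * sigma)) - 1.0  # w.r.t. D_TV
    return eps_renyi, eps_tv

# Example values (sigma and alpha2 are illustrative, not the paper's settings).
print(theorem9_certificates(alpha2=0.5, sigma=0.25, beta=1.0))
```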

7 Discussion: mode preservation property and randomized smoothing

Even though randomized classifiers have some interesting properties regarding generalization error, we can also study them through the prism of deterministic robustness. Let us for example consider the classifier that outputs the class with the highest probability for \(\mathrm{m}({\varvec{x}})\), a.k.a. the mode of \(\mathrm{m}({\varvec{x}})\). It writes

$$\begin{aligned} {\varvec{h}}_{\mathrm{rob}}: {\varvec{x}}\mapsto \mathop {\mathrm{argmax}}\limits _{k \in [K]} \mathrm{m}({\varvec{x}})_{k} \end{aligned}$$
(17)

Then checking whether \({\varvec{h}}_{\mathrm{rob}}\) is robust boils down to demonstrating that the mode of \(\mathrm{m}({\varvec{x}})\) does not change under perturbation. It turns out that \(D_{TV}\) robust classifiers have this property. We call it the mode preservation property of \({\mathcal{M}}_{TV}(\alpha _{p},\epsilon )\).

Proposition 10

(Mode preservation for \(D_{TV}\)-robust classifiers) Let \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) be a robust randomized classifier and \({\varvec{x}}\in {\mathcal{X}}\) such that \(\mathrm{m}({\varvec{x}})_{(1)} \ge \mathrm{m}({\varvec{x}})_{(2)} +2 \epsilon\). Then, for any \(\varvec{\tau }\in {\mathcal{X}}\), the following holds,

$$\begin{aligned} \Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p} \implies {\varvec{h}}_{\mathrm{rob}}({\varvec{x}}) = {\varvec{h}}_{\mathrm{rob}}({\varvec{x}}+ \varvec{\tau }). \end{aligned}$$

Proof

Let \({\varvec{x}},\varvec{\tau }\in {\mathcal{X}}\) such that \(\Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\) and \(\mathrm{m}\in {\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\) such that

$$\begin{aligned} \mathrm{m}({\varvec{x}})_{(1)} \ge \mathrm{m}({\varvec{x}})_{(2)} +2\epsilon . \end{aligned}$$

By definition of \({\mathcal{M}}_{TV}\left( \alpha _{p},\epsilon \right)\), we have that

$$\begin{aligned} D_{TV}(\mathrm{m}({\varvec{x}}),\mathrm{m}({\varvec{x}}+\varvec{\tau }))\le \epsilon . \end{aligned}$$

Then, for all \(k \in \{1, \dots , K\}\) we have

$$\begin{aligned} \mathrm{m}({\varvec{x}})_{k}-\epsilon \le \mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}\le \mathrm{m}({\varvec{x}})_{k}+\epsilon . \end{aligned}$$

Let us denote \(k^{*}\) the index of the biggest value in \(\mathrm{m}({\varvec{x}})\), i.e. \(\mathrm{m}({\varvec{x}})_{k^{*}} =\mathrm{m}({\varvec{x}})_{(1)}\). For any \(k\in \{1, \dots , K\}\) with \(k \ne k^{*}\), we have \(\mathrm{m}({\varvec{x}})_{k^{*}} \ge \mathrm{m}({\varvec{x}})_{k} + 2\epsilon\). Finally, for any \(k \ne k^{*}\), we get

$$\begin{aligned} \mathrm{m}({\varvec{x}}+\varvec{\tau })_{k^{*}}\ge \mathrm{m}({\varvec{x}})_{k^{*}}-\epsilon \ge \mathrm{m}({\varvec{x}})_{k}+\epsilon \ge \mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}. \end{aligned}$$

Then, \(\mathop {\mathrm{argmax}}\nolimits _{k \in [K]}\mathrm{m}({\varvec{x}})_{k}=\mathop {\mathrm{argmax}}\nolimits _{k \in [K]}\mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}\). This concludes the proof. \(\square\)
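In practice, Proposition 10 gives a simple pointwise certificate: if the two largest entries of \(\mathrm{m}({\varvec{x}})\) are separated by at least \(2\epsilon\), the prediction of \({\varvec{h}}_{\mathrm{rob}}\) cannot change under an \(\alpha _{p}\)-bounded perturbation. A minimal sketch of this check, assuming \(\mathrm{m}({\varvec{x}})\) is available as a probability vector:

```python
import numpy as np

def certified_mode(probs, eps):
    """Return (predicted class, certified?) for a TV-robust randomized classifier.

    `probs` is the probability vector m(x); `eps` is the TV-robustness level of m.
    The prediction is certified whenever m(x)_(1) >= m(x)_(2) + 2*eps (Proposition 10).
    """
    order = np.argsort(probs)[::-1]
    top, runner_up = probs[order[0]], probs[order[1]]
    return int(order[0]), bool(top >= runner_up + 2.0 * eps)

# A confident prediction survives eps = 0.1, a marginal one does not.
print(certified_mode(np.array([0.70, 0.20, 0.10]), eps=0.1))   # (0, True)
print(certified_mode(np.array([0.45, 0.40, 0.15]), eps=0.1))   # (0, False)
```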

Similarly, we can demonstrate a mode preservation property for robust classifiers w.r.t. the Renyi divergence.

Proposition 11

(Mode preservation for Renyi-robust classifiers) Let \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) be a robust randomized classifier and \({\varvec{x}}\in {\mathcal{X}}\) such that

$$\begin{aligned} \left( \mathrm{m}({\varvec{x}})_{(1) }\right) ^{\frac{\beta }{\beta - 1}} \ge \exp \left( (2-\frac{1}{\beta }) \epsilon \right) \left( \mathrm{m}({\varvec{x}})_{(2)}\right) ^{\frac{\beta -1}{\beta }}. \end{aligned}$$

Then, for any \(\varvec{\tau }\in {\mathcal{X}}\), the following holds,

$$\begin{aligned} \Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p} \implies {\varvec{h}}_{\mathrm{rob}}({\varvec{x}}) = {\varvec{h}}_{\mathrm{rob}}({\varvec{x}}+ \varvec{\tau }), \end{aligned}$$

where \({\varvec{h}}_{\mathrm{rob}}({\varvec{x}}) :=\mathop {\mathrm{argmax}}\nolimits _{k \in [K]}\mathrm{m}({\varvec{x}})_{k}\).

Proof

Let \({\varvec{x}}, \varvec{\tau }\in {\mathcal{X}}\) such that \(\Vert {\varvec{\tau }}\Vert _{p} \le \alpha _{p}\) and \(\mathrm{m}\in {\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\) such that

$$\begin{aligned} \left( \mathrm{m}({\varvec{x}})_{(1)}\right) ^{\frac{\beta }{\beta - 1}} \ge \exp \left( \left( 2-\frac{1}{\beta }\right) \epsilon \right) \left( \mathrm{m}({\varvec{x}})_{(2)}\right) ^{\frac{\beta -1}{\beta }}. \end{aligned}$$

Then by definition of \({\mathcal{M}}_{\beta }\left( \alpha _{p},\epsilon \right)\), we have

$$\begin{aligned} D_{\beta }(\mathrm{m}({\varvec{x}}),\mathrm{m}( {\varvec{x}}+\varvec{\tau })) \le \epsilon . \end{aligned}$$

Furthermore, by using Proposition 1, for any \(k \in \{1 ,\dots , K \}\) we have

$$\begin{aligned} (*) \mathrm{m}({\varvec{x}})_{k}\le \left( \exp (\epsilon )\mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}\right) ^{\frac{\beta -1}{\beta }}{\text{ and }} (**) \mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}\le \left( \exp (\epsilon )\mathrm{m}({\varvec{x}})_{k}\right) ^{\frac{\beta -1}{\beta }} . \end{aligned}$$

Let us denote \(k^{*}\) the index such that \(\mathrm{m}({\varvec{x}})_{k^{*}} =\mathrm{m}({\varvec{x}})_{(1)}\). Then using \((*)\) we get

$$\begin{aligned} \mathrm{m}({\varvec{x}}+\varvec{\tau })_{k^{*}} \ge \exp (-\epsilon )(\mathrm{m}({\varvec{x}})_{k^{*}})^{\frac{\beta }{\beta -1}}. \end{aligned}$$

Furthermore for any \(k \in \{1, \dots ,K\}\) where \(k \ne k^{*}\), we can use the assumption we made on \(\mathrm{m}\) to get

$$\begin{aligned} \exp (-\epsilon )(\mathrm{m}({\varvec{x}})_{k^{*}})^{\frac{\beta }{\beta -1}}\ge \exp \left( \frac{\beta -1}{\beta }\epsilon \right) (\mathrm{m}({\varvec{x}})_{k})^{\frac{\beta -1}{\beta }}. \end{aligned}$$

Finally, using \((**)\) we have

$$\begin{aligned} \exp \left( \frac{\beta -1}{\beta }\epsilon \right) (\mathrm{m}({\varvec{x}})_{k})^{\frac{\beta -1}{\beta }} \ge \mathrm{m}({\varvec{x}}+ \varvec{\tau })_{k}. \end{aligned}$$

The above gives us \(\mathop {\mathrm{argmax}}\nolimits _{k \in [K] }\mathrm{m}({\varvec{x}})_{k}=\mathop {\mathrm{argmax}}\nolimits _{k \in [K] }\mathrm{m}({\varvec{x}}+\varvec{\tau })_{k}\). This concludes the proof. \(\square\)

Coming back to the decomposition in Eq. (5), with the above result, we can bound the risk the adversary induces with non-zero perturbations by the mass of points on which the classifier \({\varvec{h}}_{\mathrm{rob}}\) gives the correct answer but only with low confidence, i.e. with a small margin over the runner-up class

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}_{>0}(\mathrm{m}) \le {\mathbb{P}}_{({\varvec{x}},y)\sim {\mathcal{D}}} \left[ {\varvec{h}}_{\mathrm{rob}}({\varvec{x}})=y {\text{ and }} \mathrm{m}({\varvec{x}})_{(1)} < \mathrm{m}({\varvec{x}})_{(2)} +2 \epsilon \right] . \end{aligned}$$
(18)

This means that the only points on which the adversary may induce misclassification are the points on which \(\mathrm{m}\) already has a high risk. Once more, this says something fundamental about the behavior of robust randomized classifiers. On undefended models, the adversary could change the decision on any point it wanted; now it is limited to points on which the classifier is already unreliable. This considerably mitigates the threat model we need to consider. Furthermore, for any deterministic classifier designed as in Eq. (17), we can also bound the maximal loss of accuracy under attack that the classifier may suffer. This bound may, however, be harder to evaluate since it depends on both the classifier and the data distribution. The classifier we define in Eq. (17) and the mode preservation property of \(\mathrm{m}\) are closely related to provable defenses based on randomized smoothing. The core idea of randomized smoothing is to take a hypothesis \({\varvec{h}}\) and to build a robust classifier that writes

$$\begin{aligned} c_{rob}: {\varvec{x}}\mapsto \mathop {\mathrm{argmax}}\limits _{k \in [K]}{\mathbb{P}}_{\varvec{z} \sim {\mathcal{N}}\left( 0,\sigma ^{2} I\right) }\left[ {\varvec{h}}({\varvec{x}}+\varvec{z}) = k\right] . \end{aligned}$$
(19)

From a probabilistic point of view, for any input \({\varvec{x}}\), randomized smoothing amounts to outputting the most probable class of the probability measure \(\mathrm{m}({\varvec{x}}) :={\varvec{h}}\# {\mathcal{N}}\left( {\varvec{x}},\sigma ^{2} I\right)\). Hence, randomized smoothing uses the mode preservation property of \(\mathrm{m}\) to build a provably robust (deterministic) classifier. Therefore, the above results (Proposition 10 and Eq. 18) also hold for provable defenses based on randomized smoothing. Studying randomized smoothing from our point of view could give an interesting new perspective on that method. So far, no results have been published on the generalization gap of this defense in the adversarial setting; we could devise such generalization bounds by analogy with our analysis. Furthermore, the probabilistic interpretation stresses that randomized smoothing is somewhat restrictive, since it only considers probability measures that arise from a simple noise injection scheme. The mode preservation property explains the behavior of randomized smoothing, but it also reveals fundamental properties of randomized defenses that could be used to construct more general defense schemes.
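For completeness, here is a minimal Monte Carlo sketch of the smoothed classifier of Eq. (19); the sampling budget and the classifier `h` are assumptions of ours, and a certified implementation would additionally require the statistical tests used by Cohen et al. (2019).

```python
import numpy as np

def smoothed_prediction(h, x, sigma, num_classes, n_samples, rng):
    """Estimate argmax_k P_{z ~ N(0, sigma^2 I)}[h(x + z) = k] by Monte Carlo.

    `h` is assumed to return an integer label in {0, ..., num_classes - 1}.
    """
    counts = np.zeros(num_classes, dtype=np.int64)
    for _ in range(n_samples):
        z = sigma * rng.standard_normal(x.shape)
        counts[h(x + z)] += 1
    return int(np.argmax(counts))
```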

8 Numerical validations against \(\ell _{2}\) adversary

To illustrate our findings, we train randomized neural networks with Gaussian pre-processing during training and inference on CIFAR-10 and CIFAR-100. Based on this randomized classifier, we study the impact of randomization on the standard accuracy of the network, and observe the theoretical trade-off between accuracy and robustness.

8.1 Architecture and training procedure

All the neural networks we use in this section are WideResNets (Zagoruyko & Komodakis, 2016) with 28 layers, a widening factor of 10, a dropout rate of 0.3 and LeakyReLU activations with a 0.1 slope. To train an undefended standard classifier, we use the following hyper-parameters.

  • Number of Epochs: 200

  • Batch size: 400

  • Loss function: Cross Entropy Loss

  • Optimizer: Stochastic gradient descent algorithm with momentum 0.9, weight decay of \(2\times 10^{-4}\) and a learning rate that decreases during the training as follows:

    $$\begin{aligned} lr = \left\{ \begin{array}{ll} 0.1 &{}\quad {\text{if}} \; 0 \le {\text{epoch}}< 60\\ 0.02 &{}\quad {\text{if}} \; 60 \le {\text{epoch}}< 120\\ 0.004 &{}\quad {\text{if}} \; 120 \le {\text{epoch}}< 160\\ 0.0008 &{}\quad {\text{if}} \; 160 \le {\text{epoch}} < 200.\\ \end{array} \right. \end{aligned}$$

To transform these standard networks into randomized classifiers, we inject Gaussian noise with various standard deviations directly onto the image before passing it through the network. Both during training and at test time, for computational efficiency, we evaluate the performance of the algorithm with a single noise draw per image; hence no Monte Carlo estimator is used. In practice, the test-time accuracy is nevertheless stable when evaluated over the entire test set.
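A minimal PyTorch-style sketch of this noise injection, assuming a `backbone` network such as the WideResNet described above (the wrapper and its name are ours, not code from the paper):

```python
import torch
import torch.nn as nn

class GaussianNoiseInjection(nn.Module):
    """Wrap a classifier so that N(0, sigma^2 I) noise is added to the input image."""

    def __init__(self, backbone: nn.Module, sigma: float):
        super().__init__()
        self.backbone = backbone
        self.sigma = sigma

    def forward(self, x):
        # A single noise draw per image, both at training and at test time.
        return self.backbone(x + self.sigma * torch.randn_like(x))
```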

8.2 Results

Figures 1 and 2 show the accuracy and the minimum level of accuracy under attack of our randomized neural networks for several levels of injected noise. We can see (Fig. 1) that the accuracy decreases as the noise intensity grows. The noise must therefore be calibrated to preserve both accuracy and robustness against adversarial attacks. This is to be expected: the greater the entropy of the classifier, the less accurate it gets.

Fig. 1

Impact of the standard deviation of the Gaussian noise on accuracy of a randomized model on the CIFAR-10 and CIFAR-100 datasets

Furthermore, when injecting Gaussian noise as a defense mechanism, the resulting randomized network \(\mathrm{m}\) is both \((\alpha _{2}, \frac{(\alpha _{2})^{2}}{2 \sigma ^{2}})\)-robust w.r.t. \(D_{1}\) and \((\alpha _{2},2 \varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) - 1)\)-robust w.r.t. \(D_{TV}\) against \(\ell _{2}\) adversaries. Therefore, thanks to Theorems 3 and 5, we have that

$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m}; \alpha _{2}) - {\mathcal{R}}(\mathrm{m})&\le 2 \varPhi \left( \frac{\alpha _{2}}{2 \sigma } \right) - 1, {\text{ and}} \end{aligned}$$
(20)
$$\begin{aligned} {\mathcal{R}}^{\mathrm{adv}}(\mathrm{m}; \alpha _{2}) - {\mathcal{R}}(\mathrm{m})&\le 1-e^{-\frac{(\alpha _{2})^{2}}{2 \sigma ^{2}}} {\mathbb{E}}_{{\varvec{x}}\sim {\mathcal{D}}_{\mid {\mathcal{X}}}}\left[ e^{-H(\mathrm{m}({\varvec{x}}))}\right] . \end{aligned}$$
(21)
Fig. 2

Guaranteed accuracy of different randomized models with Gaussian noise given the \(\ell _{2}\) norm of the adversarial perturbations

Figure 2 illustrates the theoretical lower bound on accuracy under attack [based on the minimum of the gaps in Eqs. (20) and (21)] for different standard deviations. The entropy term has been estimated using a Monte Carlo method with \(10^{4}\) simulations. The trade-off between accuracy and robustness appears with respect to the noise intensity. With small noise, the accuracy is high, but the guaranteed accuracy drops fast with respect to the magnitude of the adversarial perturbation. Conversely, with larger noise, the accuracy is lower but decreases more slowly with respect to the magnitude of the perturbation. Overall, we get strong accuracy guarantees against small adversarial perturbations, but when the perturbation is larger than 0.5 on CIFAR-10 (resp. 0.3 on CIFAR-100), the guarantees are no longer sufficient.
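The guaranteed-accuracy curves can be reproduced, up to our assumptions on the estimators, by evaluating the two gaps of Eqs. (20) and (21) and keeping the smaller one; the sketch below takes the measured clean accuracy and a Monte Carlo estimate of \({\mathbb{E}}_{{\varvec{x}}}\left[ e^{-H(\mathrm{m}({\varvec{x}}))}\right]\) as inputs (both names are ours).

```python
import numpy as np
from scipy.stats import norm

def guaranteed_accuracy(clean_accuracy, alpha2, sigma, exp_neg_entropy):
    """Lower bound on accuracy under an l2 attack of size alpha2.

    `clean_accuracy` is 1 - R(m) measured on the test set; `exp_neg_entropy`
    is a Monte Carlo estimate of E_x[exp(-H(m(x)))].
    """
    gap_tv = 2.0 * norm.cdf(alpha2 / (2.0 * sigma)) - 1.0                        # Eq. (20)
    gap_renyi = 1.0 - np.exp(-alpha2**2 / (2.0 * sigma**2)) * exp_neg_entropy    # Eq. (21)
    return clean_accuracy - min(gap_tv, gap_renyi)
```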

9 Lesson learned and future work

This paper brings new contributions to the theory of robustness to adversarial attacks. We provided an in-depth analysis of randomized classifiers, demonstrating their interest for defending against adversarial attacks. We first defined a notion of robustness for randomized classifiers using probability metrics/divergences, namely the total variation distance and the Renyi divergence. Second, we demonstrated that when a randomized classifier complies with this definition of robustness, we can bound its loss of accuracy under attack. We also studied the generalization properties of this class of functions and gave results indicating that robust randomized classifiers can generalize. Finally, we showed that randomized classifiers have a mode preservation property. This is a fundamental property of randomized defenses that can be used to explain randomized smoothing from a probabilistic point of view. To support our theoretical findings, we presented a simple yet efficient scheme for building robust randomized classifiers and showed that Gaussian noise injection can provide principled robustness against \(\ell _{2}\) adversarial attacks. We ran a set of experiments on CIFAR-10 and CIFAR-100 using Gaussian noise injection with advanced neural network architectures to build accurate models with a controlled loss of accuracy under attack.

Future work will focus on studying the combination of randomization with more sophisticated defenses and on devising new, tighter bounds on the adversarial generalization gap and the adversarial risk gap of randomized classifiers. Based on the connections we established with randomized smoothing in Sect. 7, we will also aim at devising bounds on the gap between the standard and adversarial risks for this defense. Another interesting direction would be to show that classifiers based on randomized smoothing have a generalization gap similar to that of the classes of randomized classifiers we studied.