1 Introduction

Neural networks have achieved great success in the past decade. Today, they are one of the primary candidates for solving a wide variety of machine learning tasks, from object detection and classification (He et al., 2016; Wu et al., 2019) to photo-realistic image generation (Karras et al., 2020; Vahdat & Kautz, 2020) and beyond. Despite their impressive performance, neural networks are vulnerable to adversarial attacks (Biggio et al., 2013; Szegedy et al., 2014): adding well-crafted, imperceptible perturbations to their input can change their output. This unexpected behavior of neural networks prevents their widespread deployment in safety-critical applications, including autonomous driving (Eykholt et al., 2018) and medical diagnosis (Ma et al., 2021). As such, training neural networks that are robust against adversarial attacks is of paramount importance and has gained ample attention.

Adversarial training is one of the most successful approaches in defending neural networks against adversarial attacks. This approach first constructs a perturbed version of the training data. Then, the neural network is optimized over these perturbed inputs instead of the clean samples. This procedure must be done iteratively as the perturbations depend on the neural network weights. Since the weights are optimized during training, the perturbations also need to be adjusted for each data sample in every iteration.Footnote 1

Various adversarial training methods primarily differ in how they define and find the perturbed version of the input (Madry et al., 2018; Laidlaw et al., 2021; Zhang et al., 2019). However, they all require repetitive construction of these perturbations during training, which is often cast as another non-linear optimization problem. Therefore, the time/computational complexity of adversarial training is much higher than that of vanilla training. In practice, neural networks require massive amounts of training data (Adadi, 2021) and need to be trained multiple times with various hyper-parameters to reach their best performance (Killamsetty et al., 2021a). Thus, reducing the time/computational complexity of adversarial training is critical to enabling the environmentally efficient application of robust neural networks in real-world scenarios (Schwartz et al., 2020; Strubell et al., 2019).

Fast Adversarial Training (FAT) (Wong et al., 2020) is a successful approach proposed for efficient training of robust neural networks. Contrary to the common belief that building the perturbed versions of the inputs using the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015) does not help in training arbitrarily robust models (Madry et al., 2018; Tramèr et al., 2018), Wong et al. (2020) show that by carefully applying uniformly random initialization before the FGSM step, one can make this training approach work. By using FGSM to generate the perturbed input in a single step, combined with implementation tricks such as mixed precision and a cyclic learning rate (Smith, 2017), FAT can significantly reduce the training time of robust neural networks.

Despite its success, FAT may exhibit unexpected behaviors in different settings. For instance, it was shown that FAT suffers from catastrophic overfitting, where the robust accuracy during training suddenly drops to 0% (Andriushchenko & Flammarion, 2020; Wong et al., 2020). Another fundamental issue with FAT and its variations, such as GradAlign (Andriushchenko & Flammarion, 2020) and N-FGSM (de Jorge Aranda et al., 2022), is that they are specifically designed and implemented for \(\ell _\infty \) adversarial training. This is because FGSM, which is inherently an \(\ell _\infty \) perturbation generator, is at the heart of these methods. As a result, the quest for a unified approach that can reduce the time complexity of all types of adversarial training is not over.

Motivated by the limited scope of FAT, in this paper, we take an important step toward finding a general yet principled approach for reducing the time complexity of adversarial training. We notice that repetitive construction of adversarial examples for each data point is the main bottleneck of robust training. While this needs to be done iteratively, we speculate that perhaps we can find a subset of the training data that is more important to robust network optimization than the rest. Specifically, we ask the following research question: Can we train an adversarially robust neural network using a subset of the entire training data without sacrificing clean or robust accuracy?

Fig. 1 Overview of neural network training using coreset selection. Contrary to vanilla coreset selection, in our adversarial version we first need to construct adversarial examples and then perform coreset selection

This paper shows that the answer to this question is affirmative: we select a weighted subset of the data based on the current state of the neural network and run weighted adversarial training only on this selected subset. To achieve this goal, we first theoretically analyze the convergence of adversarial training over a subset of the data under gradient descent for a few idealistic settings. Our study demonstrates that the convergence bound is directly related to the capability of the weighted subset in approximating the loss gradient over the entire training set. Motivated by this analysis, we propose using the gradient approximation error as our adversarial coreset selection objective for training robust neural networks. We then draw an elegant connection between adversarial training and vanilla coreset selection algorithms. In particular, we use Danskin’s theorem and demonstrate how the entire training data can effectively be approximated with an informative weighted subset. Our study shows that to conduct this selection, one needs to build adversarial examples for the entire training data and solve the respective subset selection objective. Afterward, training can be performed on this selected subset of the training data. In our approach, shown in Fig. 1, adversarial coreset selection is only required every few epochs, effectively reducing the time complexity of robust training algorithms. We demonstrate how our proposed approach can be used as a general framework in conjunction with different adversarial training objectives, opening the door to a more principled approach for efficient training of robust neural networks in a general setting. Our experimental results show that one can reduce the time complexity of various robust training objectives by a factor of 2–3 without significantly sacrificing clean or robust accuracy.

In summary, we make the following contributions:

  • We propose a practical yet principled algorithm for efficient training of robust neural networks based on adaptive coreset selection. To the best of our knowledge, we are the first to use coreset selection in adversarial training.

  • We provide theoretical guarantees for the convergence of our adversarial coreset selection algorithm under different settings.

  • Based on our theoretical study, we develop adversarial coreset selection for neural networks and show that our approach can be applied to a variety of robust learning objectives, including TRADES (Zhang et al., 2019), \(\ell _p\)-PGD (Madry et al., 2018) and Perceptual (Laidlaw et al., 2021) Adversarial Training. Our approach encompasses a broader range of robust training objectives compared to the limited scope of the existing methods.

  • Our experiments demonstrate that the proposed approach can result in a 2–3 fold reduction of the training time in adversarial training, with only a slight reduction in the clean and robust accuracy.

The rest of this paper is organized as follows. In Sect. 2, we go over the preliminaries of our work and review the related work. We then propose our approach in Sect. 3. Next, we present and discuss our experimental results in Sect. 4. Finally, we conclude the paper in Sect. 5.

2 Preliminaries

In this section, we review the related background to our work.

2.1 Adversarial Training

Let \(\mathcal {D}=\left\{ \left( \varvec{x}_{i}, y_{i}\right) \right\} _{i=1}^{n} \subset \mathcal {X} \times \mathcal {Y}\) denote a training dataset consisting of n i.i.d. samples. Each data point consists of an input \(\varvec{x}_{i}\) from domain \(\mathcal {X}\) and an associated label \(y_{i}\) taking one of k possible values \({\mathcal {Y}=\left[ k\right] =\left\{ 1, 2, \ldots , k\right\} }\). Without loss of generality, in this paper we focus on the image domain \(\mathcal {X}\). Furthermore, assume that \({f_{\varvec{\theta }}: \mathcal {X} \rightarrow \mathbb {R}^{k}}\) denotes a neural network classifier with parameters \(\varvec{\theta }\) that takes \(\varvec{x} \in \mathcal {X}\) as input and maps it to a logit vector \(f_{\varvec{\theta }}(\varvec{x}) \in \mathbb {R}^{k}\). Then, training a neural network in its most general format can be written as the following minimization problem:

$$\begin{aligned} \min _{\varvec{\theta }} \sum _{i \in \mathcal {V}} \varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}_{i}, y_{i}\right) , \end{aligned}$$
(1)

Here, \(\varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}, y\right) \) is a function that takes a data point \(\left( \varvec{x}, y\right) \) and a function \(f_{\varvec{\theta }}\) as its inputs, and its output is a measure of discrepancy between the network's prediction on \(\varvec{x}\) and the ground-truth label y. Also, \(\mathcal {V}=\left[ n\right] =\left\{ 1, 2, \ldots , n\right\} \) denotes the set of training data indices. By writing the training objective in this format, we can denote both vanilla and adversarial training using the same notation. Below we show how various choices of the function \(\varvec{\Phi }\) amount to different training objectives.

2.1.1 Vanilla Training

In the case of vanilla training, the function \(\varvec{\Phi }\) is simply the evaluation of an appropriate loss function over the neural network output \(f_{\varvec{\theta }}(\varvec{x})\) and the ground-truth label y. In other words, for vanilla training we have:

$$\begin{aligned} \varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}, y\right) = \mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\varvec{x}), y\right) , \end{aligned}$$
(2)

where \(\mathcal {L}_{\textrm{CE}}(\cdot , \cdot )\) is the cross-entropy loss.

Fig. 2 Coreset selection aims at finding a weighted subset of the data that can approximate certain behaviors of the entire data samples. In this figure, we denote the behavior of interest as a function \(\mathcal {B}(\cdot , \cdot )\) that receives a set and its associated weights. The goal of coreset selection is to move from the original data \(\mathcal {V}\) with uniform weights \(\varvec{1}\) to a weighted subset \(\mathcal {S}^{*}\) with weights \(\varvec{\gamma }^{*}\)

2.1.2 FGSM, \(\ell _p\)-PGD, and Perceptual Adversarial Training

In adversarial training, the objective is itself an optimization problem:

$$\begin{aligned} \varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}, y\right) = \max _{\tilde{\varvec{x}}} \mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\tilde{\varvec{x}}), y\right) ~ \text {s.t.}~\textrm{d}\left( {\tilde{\varvec{x}}, \varvec{x}}\right) \le \varepsilon \end{aligned}$$
(3)

where \(\textrm{d}(\cdot , \cdot )\) is an appropriate distance measure over the image domain \(\mathcal {X}\), and \(\varepsilon \) is a scalar that bounds the perturbation size. The constraint over \(\textrm{d}\left( {\tilde{\varvec{x}}, \varvec{x}}\right) \) is used to ensure visual similarity between \(\tilde{\varvec{x}}\) and \(\varvec{x}\). It can be shown that solving Eq. (3) amounts to finding an adversarial example \(\tilde{\varvec{x}}\) for the clean sample \(\varvec{x}\) (Madry et al., 2018). Different choices of the visual similarity measure \(\textrm{d}(\cdot , \cdot )\) and solvers for Eq. (3) result in different adversarial training objectives:

  • FGSM (Goodfellow et al., 2015) assumes that \(\textrm{d}({\tilde{\varvec{x}}, \varvec{x}}) = \left\Vert {\tilde{\varvec{x}}-\varvec{x}}\right\Vert _{\infty }\). Using this \(\ell _\infty \) assumption, the solution to Eq. (3) is computed using one iteration of gradient ascent.

  • \(\ell _p\)-PGD (Madry et al., 2018) utilizes \(\ell _p\) norms as a proxy for the visual similarity \(\textrm{d}(\cdot , \cdot )\). Then, several steps of projected gradient ascent are taken to solve Eq. (3) (a sketch of this procedure is given after this list).

  • Finally, Perceptual Adversarial Training (PAT) (Laidlaw et al., 2021) uses the Learned Perceptual Image Patch Similarity (LPIPS) (Zhang et al., 2018) as its distance measure. Laidlaw et al. (2021) propose solving the inner maximization in Eq. (3) using either projected gradient ascent or a Lagrangian relaxation.
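As an illustration of how the inner maximization in Eq. (3) is typically solved, the sketch below implements an \(\ell _\infty \) projected gradient ascent (PGD) attack in PyTorch. It is a minimal example under illustrative assumptions (step size, iteration count, \(\varepsilon \), and inputs in [0, 1]) rather than the exact attack configuration used in any particular experiment.

```python
# Minimal sketch of the inner maximization in Eq. (3) for the l_inf case,
# solved with projected gradient ascent (PGD). Hyper-parameters are illustrative.
import torch
import torch.nn.functional as F

def linf_pgd(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Approximately maximize the cross-entropy loss within the l_inf ball
    of radius eps around x (inputs assumed to lie in [0, 1])."""
    # Random start inside the epsilon ball.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto the ball
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```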

2.1.3 TRADES Adversarial Training

This approach combines Eqs. (2) and (3). The intuition behind TRADES (Zhang et al., 2019) is to create a trade-off between clean and robust accuracy. In particular, the objective is written as:

$$\begin{aligned} \varvec{\Phi } \left( \varvec{x}, y; f_{\varvec{\theta }}\right) =~&\mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\varvec{x}), y\right) \nonumber \\&+ \max _{\tilde{\varvec{x}}} \mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\tilde{\varvec{x}}), f_{\varvec{\theta }}(\varvec{x})\right) /\lambda , \end{aligned}$$
(4)

such that \(\textrm{d}\left( {\tilde{\varvec{x}}, \varvec{x}}\right) \le \varepsilon \). Here, \(\lambda \) is a regularization parameter that controls the trade-off.
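For concreteness, the snippet below sketches how the TRADES objective of Eq. (4) is typically evaluated. Following the reference TRADES implementation, the discrepancy between the predictions on \(\tilde{\varvec{x}}\) and \(\varvec{x}\) is measured with a KL term; the adversarial input is assumed to come from an inner-maximization solver such as the PGD sketch above, and the value of \(1/\lambda \) is illustrative.

```python
# Minimal sketch of the TRADES objective in Eq. (4). The robustness term is
# written as a KL divergence between predictions on the perturbed and clean
# inputs, as in the reference TRADES implementation. inv_lambda plays 1/lambda.
import torch.nn.functional as F

def trades_loss(model, x, x_adv, y, inv_lambda=6.0):
    logits_clean = model(x)
    clean_term = F.cross_entropy(logits_clean, y)
    robust_term = F.kl_div(
        F.log_softmax(model(x_adv), dim=1),   # prediction on the perturbed input
        F.softmax(logits_clean, dim=1),       # prediction on the clean input
        reduction="batchmean",
    )
    return clean_term + inv_lambda * robust_term
```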

2.2 Coreset Selection

Coreset selection (also referred to as adaptive data subset selection) attempts to find a weighted subset of the data that can approximate specific attributes of the entire population (Feldman, 2020). Coreset selection algorithms start with defining a criterion based on which the subset of interest is found:

$$\begin{aligned} \mathcal {S}^{*}, \varvec{\gamma }^{*} = {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{\mathcal {S} \subseteq \mathcal {V}, \varvec{\gamma }}} \mathcal {C}(\mathcal {S}, \varvec{\gamma }). \end{aligned}$$
(5)

In this definition, \(\mathcal {S}\) is a subset of the entire data \(\mathcal {V}\), and \(\varvec{\gamma }\) denotes the weights associated with each sample in the subset \(\mathcal {S}\). Moreover, \(\mathcal {C}(\cdot , \cdot )\) denotes the selection criterion based on which the coreset \(\mathcal {S}^{*}\) and its weights \(\varvec{\gamma }^{*}\) are found. Once the coreset is found, one can work with these samples to represent the entire dataset. Figure 2 depicts this definition of coreset selection.

Traditionally, coreset selection has been used for different machine learning tasks such as k-means and k-medians (Har-Peled & Mazumdar, 2004), Naïve Bayes and nearest neighbor classifiers (Wei et al., 2015), and Bayesian inference (Campbell & Broderick, 2018). Recently, coreset selection algorithms have also been developed for neural network training (Killamsetty et al., 2021a; b; c; Mirzasoleiman et al., 2020a; b). The main idea behind such methods is often to approximate the full gradient using a subset of the training data.

Existing coreset selection algorithms can only be used for the vanilla training of neural networks. As such, models trained with them still suffer from adversarial vulnerability. This paper extends coreset selection algorithms to robust neural network training and shows how they can be adapted to various robust training objectives.

3 Proposed Method

The main bottleneck in the time/computational complexity of adversarial training stems from constructing adversarial examples for the entire training set at each epoch. FAT (Wong et al., 2020) tries to eliminate this issue by using FGSM as its adversarial example generator. However, this simplification (1) may lead to catastrophic overfitting (Andriushchenko & Flammarion, 2020; Wong et al., 2020), and (2) is not easily applicable to different types of adversarial training as FGSM is designed explicitly for \(\ell _\infty \) attacks.

Instead of using a faster adversarial example generator, here we take a different, orthogonal path and try to effectively reduce the training set size. This way, the original adversarial training algorithm can still be used on this smaller subset of the training data. This approach can reduce the time/computational complexity while optimizing a similar objective as the initial training. In this sense, it leads to a unified method that can be used alongside various types of adversarial training objectives, both existing and yet to be proposed.

The main hurdle in materializing this idea is the following question:

How should we select this subset of the training data while minimizing the impact on the clean or robust accuracy?

To answer this question, we next provide convergence guarantees for adversarial training using a subset of the training data. This analysis lays the foundation for our adversarial coreset selection objective in the subsequent sections.

3.1 Convergence Guarantees

This section provides theoretical insights into our proposed adversarial coreset selection. Specifically, we aim to find a convergence bound for adversarial training over a subset of the data and see how it relates to the optimal solution.

Let \(L(\varvec{\theta })\) denote the adversarial training objective over the entire training dataset such that:Footnote 2

$$\begin{aligned} L(\varvec{\theta }) = \sum _{i \in \mathcal {V}} \max _{\tilde{\varvec{x}}_i} \mathcal {L}(\varvec{\theta }; \tilde{\varvec{x}}_i), \end{aligned}$$
(6)

where \(\mathcal {L}(\varvec{\theta }; \tilde{\varvec{x}}_i)\) is the evaluation of the loss over input \(\tilde{\varvec{x}}_i\) with network parameters \(\varvec{\theta }\).Footnote 3 The goal is to find the optimal set of parameters \(\varvec{\theta }\) such that this objective is minimized. To optimize the parameters \(\varvec{\theta }\) of the underlying learning algorithm, we use gradient descent. Let \({t = 0, 1, \ldots , T-1}\) denote the current epoch. Then, gradient descent update can be written as:

$$\begin{aligned} \varvec{\theta }_{t + 1} = \varvec{\theta }_{t} - \alpha _{t} \nabla _{\varvec{\theta }}L(\varvec{\theta }_{t}), \end{aligned}$$
(7)

where \(\alpha _{t}\) is the learning rate.

As demonstrated in Eq. (5), the ultimate goal of coreset selection is to find a subset \(\mathcal {S} \subseteq \mathcal {V}\) of the training data with weights \(\varvec{\gamma }\) to approximate certain behaviors of the entire population \(\mathcal {V}\). In our case, the aim is to successfully train a robust neural network over this weighted dataset using:

$$\begin{aligned} L^{\mathcal {S}}_{\varvec{\gamma }}{(\varvec{\theta })} = \sum _{j \in \mathcal {S}} \gamma _{j} \max _{\tilde{\varvec{x}}_j} \mathcal {L}(\varvec{\theta }; \tilde{\varvec{x}}_j) \end{aligned}$$
(8)

which is the weighted loss over the coreset \(\mathcal {S}\).Footnote 4 Once a coreset \(\mathcal {S}\) is found, we can replace the gradient descent update rule in Eq. (7) with:

$$\begin{aligned} \varvec{\theta }_{t + 1} \approx \varvec{\theta }_{t} - \alpha _{t} \nabla _{\varvec{\theta }}L^{\mathcal {S}^{t}}_{\varvec{\gamma }^{t}}(\varvec{\theta }_{t}), \end{aligned}$$
(9)

where

$$\begin{aligned} L^{\mathcal {S}^{t}}_{\varvec{\gamma }^{t}}{(\varvec{\theta }_{t})} = \sum _{j \in \mathcal {S}^{t}} \gamma ^{t}_{j} \max _{\tilde{\varvec{x}}_j} \mathcal {L}(\varvec{\theta }_{t}; \tilde{\varvec{x}}_j) \end{aligned}$$
(10)

is the weighted empirical loss over the coreset \(\mathcal {S}^{t}\) at iteration t.
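To make the update rule of Eqs. (9)–(10) concrete, the following sketch performs one full-batch gradient step on the weighted adversarial loss restricted to a given coreset, matching the full gradient descent setting of the analysis (the whole dataset is assumed to fit in the tensors x and y). The `attack` callable stands for any solver of the inner maximization, such as the PGD sketch of Sect. 2.1.2; the interface is an assumption for illustration.

```python
# Minimal sketch of Eqs. (9)-(10): one gradient step on the weighted
# adversarial loss over the coreset S^t. `gamma` is aligned with `coreset_idx`.
import torch
import torch.nn.functional as F

def coreset_adversarial_step(model, x, y, coreset_idx, gamma, attack, lr):
    x_s, y_s = x[coreset_idx], y[coreset_idx]
    x_adv = attack(model, x_s, y_s)                          # max over x_tilde_j
    per_sample = F.cross_entropy(model(x_adv), y_s, reduction="none")
    loss = (gamma * per_sample).sum()                        # weighted loss of Eq. (10)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):          # theta_{t+1} update of Eq. (9)
            p -= lr * g
```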

The following theorem extends the convergence guarantees of Killamsetty et al. (2021a) to adversarial training.

Theorem 1

Let \(\varvec{\gamma }^{t}\) and \(\mathcal {S}^{t}\) denote the weights and subset derived by any adversarial coreset selection algorithm at iteration t of the full gradient descent. Also, let \(\varvec{\theta }^{*}\) be the optimal model parameters, let \(\mathcal {L}\) be a convex loss function with respect to \(\varvec{\theta }\), and assume that the parameters are bounded such that \(\left\Vert \varvec{\theta } - \varvec{\theta }^{*}\right\Vert \le \Delta \). Moreover, let us define the gradient approximation error at iteration t as:

$$\begin{aligned} \Gamma (L, L_{\varvec{\gamma }}, \varvec{\gamma }^{t}, \mathcal {S}^{t}, \varvec{\theta }_t) :=\left\Vert \nabla _{\varvec{\theta }}L(\varvec{\theta }_t) - \nabla _{\varvec{\theta }}L^{\mathcal {S}^{t}}_{\varvec{\gamma }^{t}}{(\varvec{\theta }_t)}\right\Vert . \end{aligned}$$

Then, for \(t=0, 1, \cdots , T-1\) the following guarantees hold:

(1) For a Lipschitz continuous loss function \(\mathcal {L}\) with parameter \(\sigma \) and constant learning rate \({\alpha =\frac{\Delta }{\sigma \sqrt{T}}}\) we have:

$$\begin{aligned} \min _{t=0: T-1} L(\varvec{\theta }_{t})-L(\varvec{\theta }^{*}) \le \frac{\Delta \sigma }{\sqrt{T}} + \frac{\Delta }{T} \sum _{t=0}^{T-1} \Gamma (L, L_{\varvec{\gamma }^t}, \varvec{\gamma }^{t}, \mathcal {S}^{t}, \varvec{\theta }_t). \end{aligned}$$

(2) Moreover, for a Lipschitz continuous loss \(\mathcal {L}\) with parameter \(\sigma \) and strongly convex with parameter \(\mu \), by setting a learning rate \(\alpha _{t}=\frac{2}{n\mu (1+t)}\) we have:

$$\begin{aligned} \min _{t=0: T-1} L(\varvec{\theta }_{t})-L(\varvec{\theta }^{*}) \le \frac{2 \sigma ^{2}}{n\mu (T-1)} + \sum _{t=0}^{T-1} \frac{2 \Delta t}{T(T-1)} \Gamma (L, L_{\varvec{\gamma }^t}, \varvec{\gamma }^{t}, \mathcal {S}^{t}, \varvec{\theta }_t), \end{aligned}$$

where n is the total number of training data.

Proof

(Proof Sketch) We first draw a connection between the Lipschitz and strongly convex properties of the loss function \(\mathcal {L}\) and its max function \(\max \mathcal {L}\). Then, we exploit these lemmas as well as Danskin’s theorem (Theorem 2) to provide the convergence guarantees. For more details, please see Appendix B. \(\square \)

3.2 Coreset Selection for Efficient Adversarial Training

As our analysis in Theorem 1 indicates, the convergence bound consists of two terms: an irreducible noise term and an additional term consisting of gradient approximation errors. Motivated by our analysis for this idealistic setting, we set our adversarial coreset selection objective to minimize the gradient approximation error.

In particular, let us assume that we have a neural network that we aim to robustly train using:

$$\begin{aligned} \min _{\varvec{\theta }} \sum _{i \in \mathcal {V}} \varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}_{i}, y_{i}\right) , \end{aligned}$$
(11)

where \(\mathcal {V}\) denotes the entire training data, and \(\varvec{\Phi }(\cdot )\) takes the form of either Eq. (3) or Eq. (4). We saw in Theorem 1 that, to obtain a tight convergence bound, we need a subset of the data that minimizes the gradient approximation error. This choice also makes intuitive sense: since the gradient contains the relevant information for training a neural network with gradient descent, we must attempt to find a subset of the data that can approximate the full gradient. As such, we set the adversarial coreset selection criterion to:

$$\begin{aligned} \mathcal {S}^{*}, \varvec{\gamma }^{*} = {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{\mathcal {S} \subseteq \mathcal {V}, \varvec{\gamma }}} \Big \Vert \sum _{i \in \mathcal {V}} \nabla _{\varvec{\theta }}\varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}_{i}, y_{i}\right) - \sum _{j \in \mathcal {S}} \gamma _{j}\nabla _{\varvec{\theta }}\varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}_{j}, y_{j}\right) \Big \Vert , \end{aligned}$$
(12)

where \(\mathcal {S}^{*} \subseteq \mathcal {V}\) is the coreset, and \(\gamma ^{*}_{j}\)’s are the weights of each sample in the coreset. Once the coreset is found, instead of training the neural network using Eq. (11), we can optimize it just over the coreset using a weighted training objective

$$\begin{aligned} \min _{\varvec{\theta }} \sum _{j \in \mathcal {S}^{*}} \gamma ^{*}_{j} \varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}_{j}, y_{j}\right) . \end{aligned}$$
(13)

It can be shown that solving Eq. (12) is NP-hard (Mirzasoleiman et al., 2020a; b). Roughly speaking, various coreset selection methods differ in how they approximate the solution of this objective. For instance, Craig (Mirzasoleiman et al., 2020a) casts the objective as a submodular set cover problem and uses existing greedy solvers to get an approximate solution. As another example, GradMatch (Killamsetty et al., 2021a) analyzes the convergence of stochastic gradient descent using adaptive data subset selection. Based on this study, Killamsetty et al. (2021a) propose using Orthogonal Matching Pursuit (OMP) (Elenberg et al., 2016; Pati et al., 1993) as a greedy solver of the data selection objective. More information about these methods is provided in Appendix A.
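To give a feel for how greedy solvers tackle Eq. (12), the sketch below implements a simplified matching-pursuit-style selection: it repeatedly picks the per-sample gradient that best aligns with the current residual and re-fits non-negative weights by least squares. This is only illustrative of the idea behind OMP-based solvers; the actual algorithms of Mirzasoleiman et al. (2020a) and Killamsetty et al. (2021a) include regularization and other refinements not shown here.

```python
# Simplified greedy sketch of the gradient-matching objective in Eq. (12).
# grads: (n, d) tensor holding one (approximate) per-sample gradient per row.
import torch

def greedy_gradient_matching(grads, budget):
    target = grads.sum(dim=0)                 # full-data gradient to approximate
    selected, gamma = [], None
    residual = target.clone()
    for _ in range(budget):
        scores = grads @ residual             # alignment with the residual
        if selected:                          # do not pick a sample twice
            scores[selected] = float("-inf")
        selected.append(int(scores.argmax()))
        A = grads[selected].T                 # (d, |S|) matrix of chosen gradients
        # Re-fit the coreset weights by (clamped) least squares.
        gamma = torch.linalg.lstsq(A, target.unsqueeze(1)).solution.squeeze(1)
        gamma = gamma.clamp(min=0.0)
        residual = target - A @ gamma
    return selected, gamma
```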

The issue with the aforementioned coreset selection methods is that they are designed explicitly for vanilla training of neural networks (see Fig. 1b), and they do not reflect the requirements of adversarial training. As such, we should modify these methods to make them suitable for our purpose of robust neural network training. Meanwhile, we should also consider the fact that the field of coreset selection is still evolving. Thus, we aim to find a general modification that can later be used alongside newer versions of greedy coreset selection algorithms.

We notice that various coreset selection methods proposed for vanilla neural network training only differ in their choice of greedy solvers. Therefore, we narrow down the changes we want to make to the first step of coreset selection: gradient computation. Then, existing greedy solvers can be used to find the subset of training data that we are looking for. To this end, we draw a connection between coreset selection methods and adversarial training using Danskin’s theorem, as outlined next. Our analysis shows that for adversarial coreset selection, one needs to add a pre-processing step in which adversarial examples for the raw training data are computed (see Fig. 1c).

3.3 From Vanilla to Adversarial Coreset Selection

To construct the coreset selection objective given in Eq. (12), we need to compute the loss gradient with respect to the neural network weights. Once done, we can use existing greedy solvers to find the solution. The gradient computation needs to be performed for the entire training set. In particular, using our notation from Sect. 2.1, this step can be written as:

$$\begin{aligned} \nabla _{\varvec{\theta }}\varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}_{i}, y_{i}\right) \quad \forall \quad i \in \mathcal {V}, \end{aligned}$$
(14)

where \(\mathcal {V}\) denotes the training set.

For vanilla neural network training (see Sect. 2.1) the above gradient is simply equal to \(\nabla _{\varvec{\theta }}\mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\varvec{x}_{i}), y_{i}\right) \) which can be computed using standard backpropagation. In contrast, for the adversarial training objectives in Eqs. (3) and (4), this gradient requires taking partial derivative of a maximization objective. To this end, we use the famous Danskin’s theorem (Danskin, 1967) as stated below.

Theorem 2

(Theorem A.1 in Madry et al. (2018)) Let \(\mathcal {K}\) be a nonempty compact topological space, \({\mathcal {L}: \mathbb {R}^{m} \times \mathcal {K} \rightarrow \mathbb {R}}\) be such that \(\mathcal {L}(\cdot , \varvec{\delta })\) is differentiable and convex for every \(\varvec{\delta } \in \mathcal {K}\), and \(\nabla _{\varvec{\theta }} \mathcal {L}(\varvec{\theta }, \varvec{\delta })\) is continuous on \(\mathbb {R}^{m} \times \mathcal {K}\). Also, let \({\varvec{\delta }^{*}(\varvec{\theta })=\left\{ \varvec{\delta } \in \arg \max _{\varvec{\delta } \in \mathcal {K}} \mathcal {L}(\varvec{\theta }, \varvec{\delta })\right\} }\). Then, the corresponding max-function

$$\begin{aligned} \phi (\varvec{\theta })=\max _{\delta \in \mathcal {K}} \mathcal {L}(\varvec{\theta }, \varvec{\delta }) \end{aligned}$$

is locally Lipschitz continuous, convex, directionally differentiable, and its directional derivatives along vector \(\varvec{h}\) satisfy

$$\begin{aligned} \phi ^{\prime }(\varvec{\theta }, \varvec{h})=\sup _{\varvec{\delta } \in \varvec{\delta }^{*}(\varvec{\theta })} \varvec{h}^{\top } \nabla _{\varvec{\theta }} \mathcal {L}(\varvec{\theta }, \varvec{\delta }). \end{aligned}$$

In particular, if for some \(\varvec{\theta } \in \mathbb {R}^{m}\) the set \(\varvec{\delta }^{*}(\varvec{\theta })=\left\{ \varvec{\delta }_{\varvec{\theta }}^{*}\right\} \) is a singleton, then the max-function is differentiable at \(\varvec{\theta }\) and

$$\begin{aligned} \nabla \phi (\varvec{\theta })=\nabla _{\varvec{\theta }} \mathcal {L}\left( \varvec{\theta }, \varvec{\delta }_{\varvec{\theta }}^{*}\right) . \end{aligned}$$

In summary, Theorem 2 indicates how to take the gradient of a max-function. To this end, it suffices to (1) find the maximizer, and (2) evaluate the normal gradient at this point.

Now that we have stated Danskin’s theorem, we are ready to show how it provides the connection between vanilla coreset selection and the adversarial training objectives of Eqs. (3) and (4). We show this for the two cases of \(\ell _p\)-PGD/Perceptual Adversarial Training and TRADES, but the same reasoning applies to any other robust training objective.

3.3.1 Case 1 (\(\ell _p\)-PGD and Perceptual Adversarial Training)

Going back to Eq. (14), we need to compute this gradient term for the adversarial training objective of Eq. (3). In particular, we need to compute:

$$\begin{aligned} \nabla _{\varvec{\theta }}\varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}, y\right) = \nabla _{\varvec{\theta }} \max _{\tilde{\varvec{x}}} \mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\tilde{\varvec{x}}), y\right) \end{aligned}$$
(15)

under the constraint \(\textrm{d}({\tilde{\varvec{x}}, \varvec{x}})\le \varepsilon \) for every training sample. Based on Danskin’s theorem, we deduce:

$$\begin{aligned} \nabla _{\varvec{\theta }}\varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}, y\right) = \nabla _{\varvec{\theta }} \mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}({\varvec{x}^{*}}), y\right) , \end{aligned}$$
(16)

where \(\varvec{x}^{*}\) is the solution to:

$$\begin{aligned} \max _{\tilde{\varvec{x}}} \mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\tilde{\varvec{x}}), y\right) \quad \text {s.t.} \quad \textrm{d}({\tilde{\varvec{x}}, \varvec{x}})\le \varepsilon . \end{aligned}$$
(17)

The conditions under which Danskin’s theorem holds might not be satisfied for neural networks in general. This is due to the presence of functions with discontinuous gradients, such as the ReLU activation, in neural networks. More importantly, finding the exact solution of Eq. (17) is not straightforward as neural networks are highly non-convex. Usually, the exact solution \(\varvec{x}^{*}\) is replaced with an approximation, namely an adversarial example generated under the objective of Eq. (17) (Kolter & Madry, 2018). Based on this approximation, we can re-write Eq. (16) as:

$$\begin{aligned} \nabla _{\varvec{\theta }}\varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}, y\right) \approx \nabla _{\varvec{\theta }} \mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}({\varvec{x}_{\mathrm{{adv}}}}), y\right) . \end{aligned}$$
(18)

In other words, to perform coreset selection for \(\ell _p\)-PGD (Madry et al., 2018) and Perceptual (Laidlaw et al., 2021) Adversarial Training, one needs to add a pre-processing step to the gradient computation. In this step, adversarial examples for the entire training set must be constructed. Then, the coresets can be built as in vanilla neural network training.
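The following sketch illustrates this two-step recipe: first solve Eq. (17) to obtain adversarial examples, then evaluate the per-sample gradients of Eq. (18) that are handed to a vanilla coreset solver. For clarity it computes full per-sample gradients; in practice the last-layer approximation of Sect. 3.4 is used instead. The `attack` callable and loader interface are assumptions for illustration.

```python
# Minimal sketch of the pre-processing step of Eq. (18): attack the data,
# then evaluate gradients at the adversarial points. In practice, the
# last-layer approximation of Sect. 3.4 replaces the full gradients below.
import torch
import torch.nn.functional as F

def adversarial_selection_gradients(model, loader, attack):
    params = list(model.parameters())
    per_sample_grads = []
    for x, y in loader:
        x_adv = attack(model, x, y)                    # step 1: solve Eq. (17)
        for xi, yi in zip(x_adv, y):                   # step 2: gradient at x_adv
            loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
            g = torch.autograd.grad(loss, params)
            per_sample_grads.append(torch.cat([gi.flatten() for gi in g]))
    return torch.stack(per_sample_grads)               # rows feed Eq. (12)
```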

3.3.2 Case 2 (TRADES Adversarial Training)

For TRADES (Zhang et al., 2019), the gradient computation is slightly different as the objective in Eq. (4) consists of two terms. In this case, the gradient can be written as:

$$\begin{aligned} \nabla _{\varvec{\theta }}\varvec{\Phi } \left( \varvec{x}, y; f_{\varvec{\theta }}\right) = \nabla _{\varvec{\theta }}\mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\varvec{x}), y\right) + \nabla _{\varvec{\theta }}\max _{\tilde{\varvec{x}}} \mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\tilde{\varvec{x}}), f_{\varvec{\theta }}(\varvec{x})\right) /\lambda . \end{aligned}$$
(19)

The first term is the normal gradient of the neural network. For the second term, we apply Danskin’s theorem to obtain:

$$\begin{aligned} \nabla _{\varvec{\theta }}\varvec{\Phi } \left( \varvec{x}, y; f_{\varvec{\theta }}\right) \approx \nabla _{\varvec{\theta }}\mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\varvec{x}), y\right) + \nabla _{\varvec{\theta }} \mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\varvec{x}_{\mathrm{{adv}}}), f_{\varvec{\theta }}(\varvec{x})\right) /\lambda , \end{aligned}$$
(20)

where \(\varvec{x}_{\mathrm{{adv}}}\) is an approximate solution to:

$$\begin{aligned} \max _{\tilde{\varvec{x}}} \mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\tilde{\varvec{x}}), f_{\varvec{\theta }}(\varvec{x})\right) /\lambda ~ \text {s.t.} ~ \textrm{d}({\tilde{\varvec{x}}, \varvec{x}})\le \varepsilon . \end{aligned}$$
(21)

Then, we compute the second gradient term in Eq. (20) using the multi-variable chain rule (see Sect. B.1). We can write the final TRADES gradient as:

$$\begin{aligned} \nabla _{\varvec{\theta }}\varvec{\Phi } \left( \varvec{x}, y; f_{\varvec{\theta }}\right) = \nabla _{\varvec{\theta }}\mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\varvec{x}), y\right) + \nabla _{\varvec{\theta }} \mathcal {L}_{\textrm{CE}}\left( f_{\varvec{\theta }}(\varvec{x}_{\mathrm{{adv}}}), {\texttt {freeze}}\left( f_{\varvec{\theta }}(\varvec{x})\right) \right) /\lambda + \nabla _{\varvec{\theta }} \mathcal {L}_{\textrm{CE}}\left( {\texttt {freeze}}\left( f_{\varvec{\theta }}(\varvec{x}_{\mathrm{{adv}}})\right) , f_{\varvec{\theta }}(\varvec{x})\right) /\lambda , \end{aligned}$$
(22)

where \({\texttt {freeze}}(\cdot )\) stops the gradients from backpropagating through its argument function.
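In an automatic-differentiation framework, freeze(·) corresponds to detaching a tensor from the computation graph. The sketch below shows one way to realize the gradient decomposition of Eq. (22) in PyTorch, writing the cross-entropy between two predictive distributions explicitly; it is a schematic rendering of the decomposition under these assumptions, not our exact implementation.

```python
# Minimal sketch of Eq. (22): the gradient of the TRADES regularizer is split
# into a path through f(x_adv) (with f(x) frozen) and a path through f(x)
# (with f(x_adv) frozen). freeze(.) is realized with .detach().
import torch.nn.functional as F

def soft_cross_entropy(logits, target_probs):
    # CE between the distribution induced by `logits` and a target distribution.
    return -(target_probs * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

def trades_selection_loss(model, x, x_adv, y, inv_lambda):
    logits_clean, logits_adv = model(x), model(x_adv)
    p_clean = F.softmax(logits_clean, dim=1)

    loss = F.cross_entropy(logits_clean, y)                        # first term
    # Second term: gradients flow only through f(x_adv); f(x) is frozen.
    loss = loss + inv_lambda * soft_cross_entropy(logits_adv, p_clean.detach())
    # Third term: gradients flow only through f(x); f(x_adv) is frozen.
    loss = loss + inv_lambda * soft_cross_entropy(logits_adv.detach(), p_clean)
    return loss   # backpropagating this loss yields the gradient of Eq. (22)
```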

Having found the loss gradients \(\nabla _{\varvec{\theta }}\varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}_{i}, y_{i}\right) \) for \(\ell _p\)-PGD, PAT (Case 1), and TRADES (Case 2), we can construct Eq. (12) and use existing greedy solvers like Craig (Mirzasoleiman et al., 2020a) or GradMatch (Killamsetty et al., 2021a) to find the coreset. Conceptually, adversarial coreset selection amounts to adding a pre-processing step where we need to build perturbed versions of the training data using their respective objectives in Eqs. (17) and (21). Afterward, greedy subset selection algorithms are used to construct the coresets based on the value of the gradients. Finally, having selected the coreset data, one can run a weighted adversarial training only on the data that remains in the coreset:

$$\begin{aligned} \min _{\varvec{\theta }} \sum _{j \in {\mathcal {S}^{*}}} {\gamma ^{*}_{j}} \varvec{\Phi } \left( f_{\varvec{\theta }}; \varvec{x}_{j}, y_{j}\right) . \end{aligned}$$
(23)

As can be seen, we are not changing the essence of the training objective in this process. We are just reducing the training set size to enhance computational efficiency, and as such, we can use this approach alongside any adversarial training objective.

3.4 Practical Considerations

Algorithm 1 Adversarial Training with Coreset Selection

Since coreset selection depends on the current values of the neural network weights, it is crucial to update the coresets as the training progresses. Prior work (Killamsetty et al., 2021a; b) has shown that this selection only needs to be done every T epochs, where T is usually greater than 15. Also, we employ small yet critical practical changes while using coreset selection to increase efficiency. We summarize these practical tweaks below. Further details can be found in (Killamsetty et al., 2021a; Mirzasoleiman et al., 2020a).

Gradient Approximation As we saw, both Eqs. (18) and (22) require computing the loss gradient with respect to the neural network weights. This requires backpropagation through the entire neural network, which is inefficient. Instead, it is common to replace the exact gradients in Eqs. (18) and (22) with their last-layer approximation (Katharopoulos & Fleuret, 2018; Killamsetty et al., 2021a; Mirzasoleiman et al., 2020a). In other words, instead of backpropagating through the entire network, one can stop the backward pass at the penultimate layer and only compute the gradients of the last layer. This estimate has an approximate complexity equal to that of forward propagation, and it has been shown to work well in practice (Mirzasoleiman et al., 2020a; b; Killamsetty et al., 2021a; b).Footnote 5
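As a concrete illustration, for a linear classification head trained with softmax cross-entropy the per-sample gradient with respect to the head's parameters has a closed form: the outer product of the prediction error with the penultimate features. The sketch below computes it without any backward pass; the tensor shapes are assumptions about a generic architecture.

```python
# Minimal sketch of the last-layer gradient approximation for a linear head
# z = W h + b with softmax cross-entropy: dL/dW = (softmax(z) - y_onehot) h^T,
# dL/db = softmax(z) - y_onehot. No full backward pass is required.
import torch
import torch.nn.functional as F

def last_layer_grads(features, logits, y, num_classes):
    """features: (n, d) penultimate embeddings (of the adversarial inputs),
    logits: (n, k), y: (n,). Returns (n, k*d + k) per-sample gradients."""
    err = F.softmax(logits, dim=1) - F.one_hot(y, num_classes).float()  # (n, k)
    grad_w = err.unsqueeze(2) * features.unsqueeze(1)                   # (n, k, d)
    return torch.cat([grad_w.flatten(1), err], dim=1)
```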

Batch-wise Coreset Selection As discussed in Sect. 3.2, data selection is usually done in a sample-wise fashion, where each data sample is considered for selection separately. In this setting, the candidates must be chosen from among all individual training samples. To increase efficiency, Killamsetty et al. (2021a) proposed a batch-wise variant. In this type of coreset selection, the data is first split into several batches, and the algorithm then makes its selection among these batches. Intuitively, this change increases efficiency as the selection pool shrinks from the number of data points to the number of batches.

Warm-start with the Entire Data Finally, as we shall see in the experiments, it is important to warm-start the adversarial training using the entire dataset. Afterward, the coreset selection is activated, and adversarial training is only performed using the coreset data.

3.5 Final Algorithm

Figure 1 and Algorithm 1 summarize our coreset selection approach for adversarial training. As can be seen, our proposed method is a generic and principled approach in contrast to existing methods such as FAT (Wong et al., 2020). In particular, our approach provides the following advantages compared to existing methods:

  1. 1.

    The proposed approach does not involve algorithmic-level manipulations or dependence on specific training attributes such as an \(\ell _\infty \) bound or a cyclic learning rate. Also, it controls the training speed through the coreset size, which can be specified solely based on the available computational resources.

  2. 2.

    The simplicity of our method makes it compatible with any existing/future adversarial training objectives. Furthermore, as we will see in Sect. 4, our approach can be combined with any greedy coreset selection algorithm to deliver robust neural networks.

These characteristics are important as they increase the likelihood of applying our proposed method to robust neural network training, regardless of the training objective. This is in stark contrast with existing methods that focus solely on a particular training objective.
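For completeness, the sketch below outlines the training loop of Algorithm 1 with the practical choices of Sect. 3.4: a warm-start on the full data, coreset renewal every T epochs, and weighted adversarial training on the coreset in between. It is a minimal illustration; the `select_coreset` and `attack` callables, the batch size, and the data-access pattern are assumptions rather than our exact implementation.

```python
# Minimal sketch of the overall procedure summarized in Algorithm 1.
import torch
import torch.nn.functional as F

def robust_train_with_coresets(model, dataset, attack, select_coreset,
                               optimizer, epochs, warm_epochs, T, budget,
                               batch_size=128):
    idx = list(range(len(dataset)))               # warm-start: use all the data
    weights = torch.ones(len(idx))
    for epoch in range(epochs):
        if epoch >= warm_epochs and (epoch - warm_epochs) % T == 0:
            # Renew the coreset (Sect. 3.3): attack, compute selection
            # gradients, and run a greedy solver of Eq. (12).
            idx, weights = select_coreset(model, dataset, attack, budget)
        order = torch.randperm(len(idx)).tolist()
        for start in range(0, len(order), batch_size):
            sel = order[start:start + batch_size]
            x = torch.stack([dataset[idx[i]][0] for i in sel])
            y = torch.tensor([dataset[idx[i]][1] for i in sel])
            x_adv = attack(model, x, y)           # inner maximization
            per_sample = F.cross_entropy(model(x_adv), y, reduction="none")
            loss = (weights[sel] * per_sample).mean()   # weighted objective, Eq. (23)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```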

4 Experimental Results

In this section, we present our experimental results. We show how our proposed approach can efficiently reduce the training time of various robust objectives in different settings. To this end, we apply our approach to TRADES (Zhang et al., 2019), \(\ell _p\)-PGD (Madry et al., 2018) and PAT (Laidlaw et al., 2021) training on CIFAR-10 (Krizhevsky & Hinton, 2009), SVHN (Netzer et al., 2011), and a subset of ImageNet (Russakovsky et al., 2015) with 12 and 100 classes. For TRADES and \(\ell _p\)-PGD training, we use ResNet-18 (He et al., 2016) and WideResNet-28-10 (Zagoruyko & Komodakis, 2016) classifiers. For PAT and ImageNet experiments, we use ResNet-34 and ResNet-50 architectures. Further implementation details can be found in Appendix C.

Table 1 Clean (ACC) and robust (RACC) accuracy, and total training time (T) of different adversarial training methods. For each method, all the hyper-parameters were kept the same as full training. For our proposed approach, the difference with full training is shown in parentheses. The information on the computation of RACC in each case is given in Appendix C

4.1 TRADES and \(\ell _p\)-PGD Robust Training

In our first set of experiments, we train well-known neural network classifiers on the CIFAR-10, SVHN, and ImageNet-100 datasets using the TRADES, \(\ell _\infty \)-PGD, and \(\ell _2\)-PGD adversarial training objectives. In each case, we fix the training hyper-parameters such as the learning rate, the number of epochs, and the attack parameters. Then, we train the network using both the entire training data and our adversarial coreset selection approach. For our approach, we use batch-wise versions of Craig (Mirzasoleiman et al., 2020a) and GradMatch (Killamsetty et al., 2021a) with warm-start. We set the coreset size (the percentage of training data to be selected) to 50% for CIFAR-10 and ImageNet-100, and 30% for SVHN, to get a reasonable balance between accuracy and training time. We report the clean and robust accuracy (in %) as well as the total training time (in minutes) in Table 1. For our approach, we also report the difference with full training in parentheses. In each case, we evaluate the robust accuracy using an attack with attributes similar to the training objective.

Table 2 Clean (ACC) and robust (RACC) accuracy, and total training time (T) of Perceptual Adversarial Training for CIFAR-10 and ImageNet-12 datasets. At inference, the networks are evaluated against five attacks that were not seen during training (Unseen RACC), as well as different versions of Perceptual Adversarial Attack (Seen RACC). In each case, the average is reported. For more information and details about the experiment, please see Appendices C and D
Table 3 Clean (ACC) and robust (RACC) accuracy, and total training time (T) of different adversarial training methods over WideResNet-28-10. For each method, all the hyper-parameters were kept the same as Table 1. The only exception is that all the epoch-related parameters were halved. The difference with full training is shown in parentheses for our proposed approach. The information on the computation of RACC in each case is given in Appendix C

As can be seen in Table 1, in most cases, we reduce the training time by more than a factor of two, while keeping the clean and robust accuracy almost intact. Note that in these experiments, all the training attributes such as the hyper-parameters, learning rate scheduler, etc. are the same among different training schemes. This is important since we want to clearly show the relative boost in performance that one can achieve just by using coreset selection. Nonetheless, it is likely that by tweaking the hyper-parameters for our approach, one can obtain even better results in terms of clean and robust accuracy.Footnote 6

4.2 Perceptual Adversarial Training Versus Unseen Attacks

As discussed in Sect. 2, PAT (Laidlaw et al., 2021) replaces the visual similarity measure \(\textrm{d}(\cdot , \cdot )\) in Eq. (3) with the LPIPS (Zhang et al., 2018) distance. The logic behind this choice is that \(\ell _p\) norms can only capture a small portion of images similar to the clean one, limiting the search space of adversarial attacks. Motivated by this reasoning, Laidlaw et al. (2021) propose two different ways of finding the solution to Eq. (3) when \(\textrm{d}(\cdot , \cdot )\) is the LPIPS distance. The first version uses PGD, and the second is a relaxation of the original problem using the Lagrangian form. We refer to these two versions as PPGD (Perceptual PGD) and LPA (Lagrangian Perceptual Attack), respectively. Laidlaw et al. (2021) then propose a fast version of LPA (Fast-LPA) to enable its efficient usage in adversarial training. More information on this approach can be found in (Laidlaw et al., 2021).

For our next set of experiments, we show how our approach can be adapted to this unusual training objective. This showcases the compatibility of our proposed method with different training objectives, as opposed to existing methods that are carefully tuned for a particular one. To this end, we train ResNet-50 classifiers using Fast-LPA on the CIFAR-10 and ImageNet-12 datasets. As in our previous experiments, we fix the training hyper-parameters and then train the models using both the entire training data and our adversarial coreset selection method. For our method, we use batch-wise versions of Craig (Mirzasoleiman et al., 2020a) and GradMatch (Killamsetty et al., 2021a) with warm-start. The coreset size for CIFAR-10 and ImageNet-12 was set to 40% and 50%, respectively. As in Laidlaw et al. (2021), we measure the performance of the trained models against attacks unseen during training, as well as the two variants of perceptual attacks. The unseen attacks for each dataset were selected in a similar manner to Laidlaw et al. (2021), and the attack parameters can be found in Appendix C. We also record the total training time taken by each method.

Table 2 summarizes our results on PAT using Fast-LPA (full results can be found in Appendix D). As seen, our adversarial coreset selection approach delivers competitive performance in terms of clean and average unseen-attack accuracy while reducing the training time by at least a factor of two. These results indicate the flexibility of adversarial coreset selection, which can be combined with various objectives. Since the proposed approach is orthogonal to existing efficient adversarial training methods, in this case we can make Fast-LPA even faster by using it.

Table 4 Clean (ACC) and robust (RACC) accuracy, and average training speed (Savg) of Fast Adversarial Training (Wong et al., 2020) without and with our adversarial coreset selection on CIFAR-10. For our proposed approach, the difference with full training is shown in parentheses

4.3 Compatibility with Existing Methods

To showcase that our adversarial coreset selection approach is complementary to existing methods, we integrate it with two existing baselines that aim to improve the efficiency of adversarial training.

Early Termination Going through our results in Tables 1 and 2, one might wonder what would happen if we simply halved the number of training epochs. To perform this experiment, we select the WideResNet-28-10 architecture and train robust neural networks over the CIFAR-10 and SVHN datasets. We set all our hyper-parameters in a similar manner to the ones used for the experiments in Table 1, and only halve the number of training epochs. To make sure that the learning rate schedule is also comparable, we halve the learning rate scheduler epochs as well. Then, we train the neural networks using \(\ell _\infty \) and \(\ell _2\)-PGD adversarial training.

Table 3 shows our results compared to the ones reported in Table 1. As can be seen, adversarial coreset selection obtains a performance similar to using the entire data while consuming 2–3 times less training time.

Fast Adversarial Training Additionally, we integrate adversarial coreset selection with a stable version of Fast Adversarial Training (FAT) (Wong et al., 2020) that does not use a cyclic learning rate. Specifically, we train a neural network using FAT (Wong et al., 2020), then add adversarial coreset selection to this approach and record the training time as well as the clean and robust accuracy. We run the experiments on the CIFAR-10 dataset and train a ResNet-18 for each case. We set the coreset size of our method to 50%. The results are shown in Table 4. As seen, our approach can be easily combined with existing methods to deliver faster training. This is due to the orthogonality of our approach with existing methods, as discussed previously.

Fig. 3 Robust accuracy as a function of the \(\ell _\infty \) attack norm. We train neural networks with a perturbation norm of \(\left\Vert \varepsilon \right\Vert _\infty \le 8\) on CIFAR-10. At inference, we evaluate the robust accuracy against PGD-50 with various attack strengths

Moreover, we show that adversarial coreset selection gives a better approximation to \(\ell _\infty \)-PGD adversarial training compared to using FGSM (Goodfellow et al., 2015) as done in FAT (Wong et al., 2020). To this end, we use our adversarial GradMatch to train neural networks with the original \(\ell _\infty \)-PGD objective. We also train these networks using FAT (Wong et al., 2020) that uses FGSM. We train neural networks with a perturbation norm of \(\left\Vert \varepsilon \right\Vert _\infty \le 8\). Then, we evaluate the trained networks against PGD-50 adversarial attacks with different attack strengths to see how each network generalizes to unseen perturbations. As seen in Fig. 3, adversarial coreset selection is a closer approximation to \(\ell _\infty \)-PGD compared to FAT (Wong et al., 2020). This indicates the success of the proposed approach in retaining the characteristics of the original objective as opposed to existing methods like (Andriushchenko & Flammarion, 2020; Wong et al., 2020).

Table 5 Performance of \(\ell _\infty \)-PGD. In “Half-Half”, we mix half adversarial coreset selection samples with another half of clean samples and train a neural network similar to (Tsipras et al., 2019). In “ONLY-Core” we just use adversarial coreset samples. Settings are given in Table 8. The results are averaged over 5 runs
Fig. 4 Relative robust error versus speed-up for TRADES. For a given subset size, we compare our adversarial coreset selection (GradMatch) against random data selection. Furthermore, we show our results for a selection of different warm-start settings

4.4 Ablation Studies

In this section, we perform a few ablation studies to examine the effectiveness of our adversarial coreset selection method. In our first set of experiments, we compare random data selection with adversarial GradMatch. Figure 4 shows that for any given coreset size, our adversarial coreset selection method results in a lower robust error. Furthermore, we vary the number of warm-start epochs for a fixed coreset size of 50%. The proposed method is not very sensitive to the number of warm-start epochs, although a longer warm-start is generally beneficial.

Table 6 Clean (ACC) and robust (RACC) accuracy, and total training time (T) of \(\ell _\infty \)-PGD adversarial training over CIFAR-10 for the WideResNet-28-10 architecture. For each method, all the hyper-parameters were kept the same as in Table 3. The frequency column indicates the number of epochs we wait before updating the coreset using Algorithm 1. The information on the computation of RACC in each case is given in Appendix C

In another comparison, we run an experiment similar to that of Tsipras et al. (2019). Specifically, we minimize the average of the adversarial and vanilla training objectives in each epoch. The non-coreset data is treated as clean samples to minimize the vanilla objective, while for the coreset samples we perform adversarial training. Table 5 shows the results of this experiment. As seen, adding the non-coreset data as clean inputs to the training improves the clean accuracy while decreasing the robust accuracy. These results align with the observations of Tsipras et al. (2019) regarding the existence of a trade-off between clean and robust accuracy.

Next, we investigate the effect of the adversarial coreset selection frequency. Recall from Sect. 3.4 that performing adversarial coreset selection only every T epochs helps with the speed-up. However, one must note that setting T to a large number might come at the cost of sacrificing clean and robust accuracy. To show this, we repeat our early termination experiments from Table 3 with different coreset renewal frequencies. Our results are given in Table 6. As can be seen, decreasing the coreset renewal frequency (i.e., increasing T) helps gain more speed-up, but it can hurt the overall model performance.

Finally, we study the accuracy versus speed-up trade-off in different versions of adversarial coreset selection. For this study, we train our adversarial coreset selection method using different versions of Craig (Mirzasoleiman et al., 2020a) and GradMatch (Killamsetty et al., 2021a) on CIFAR-10 using TRADES. In particular, for each method, we start with the base algorithm and add the batch-wise selection and warm-start one by one. Also, to capture the effect of the coreset size, we vary this value from 50% down to 10%. Figure 5 shows the clean and robust error versus speed-up compared to full adversarial training. In each case, the combination of the warm-start and batch-wise versions of adversarial coreset selection gives the best performance. Moreover, as we gradually decrease the coreset size, the training speed goes up. However, this gain in training speed comes at the cost of increasing the clean and robust error. Both of these observations align with those of Killamsetty et al. (2021a) for vanilla coreset selection.

Fig. 5 Relative error versus speed-up curves for different versions of adversarial coreset selection in training CIFAR-10 models using the TRADES objective. In each curve, the coreset size is changed from 50% to 10% (left to right). a, b Clean and robust error versus speed-up compared to full TRADES for different versions of Adversarial Craig. c, d Clean and robust error versus speed-up compared to full TRADES for different versions of Adversarial GradMatch

5 Conclusion

In this paper, we proposed a general yet principled approach for efficient adversarial training based on the theory of coreset selection. We discussed how the repetitive computation of adversarial attacks for the entire training data impedes the training speed. Unlike previous works that try to solve this issue by simplifying the adversarial example generation, here we took an orthogonal path and reduced the training set size without modifying the attacker. We first provided convergence bounds for adversarial training using a subset of the training data. Our analysis showed that the convergence bound is related to how well this selected subset can approximate the loss gradient computed with the entire data. Based on this study, we proposed using the gradient approximation error as our coreset selection objective and drew a connection to vanilla coreset selection. To this end, we discussed how coreset selection can be viewed as a two-step process: first, the gradients for the entire training data are computed; then, greedy solvers choose a weighted subset of data that can approximate the full gradient. Using Danskin’s theorem, we drew a connection between greedy coreset selection algorithms and adversarial training. We then showed the flexibility of our adversarial coreset selection method by utilizing it for TRADES, \(\ell _p\)-PGD, and Perceptual Adversarial Training. Our experimental results indicate that adversarial coreset selection can reduce the training time by a factor of 2–3 while only slightly reducing the clean and robust accuracy.