1 Introduction

Deep neural networks have achieved considerable success in many fields (He et al. 2016; Devlin et al. 2018; Mnih et al. 2015), such as computer vision, natural language processing, and reinforcement learning, and have emerged as a transformative force due to their remarkable efficacy and broad applicability. However, these powerful models are vulnerable to imperceptible perturbations (Goodfellow et al. 2014), i.e., adversarial examples (AEs). An AE, denoted as \(x^{\prime }\), can be crafted by adding an adversarial perturbation \(\varvec{\delta }\) to a natural example x, i.e., \(x^{\prime } = x + \varvec{\delta }\). Adversarial attacks typically cause the classifier \(h_{\varvec{\theta }}\) to make an incorrect prediction. Such a perturbation \(\varvec{\delta }\) is small and imperceptible to humans, bounded by an \(L_p\)-norm ball, which can be written as \(\left\| x^{\prime } - x \right\| _p \le \epsilon\).

AEs were first introduced by Szegedy et al. (2013), which made the community aware of the vulnerability of neural networks and inspired the development of adversarial defenses, including defensive distillation (Papernot et al. 2016), feature squeezing (Xu et al. 2017), and adversarial training (AT) (Goodfellow et al. 2014). Among them, AT has been regarded as the most powerful one (Athalye et al. 2018). The basic idea of AT is to incorporate both natural examples and AEs during the training stage, enabling models to perform better against AEs compared to standard training (ST). Formally, AT can be formulated as a min-max problem, i.e.,

$$\begin{aligned} \min _{\varvec{\theta }} \mathbb {E}_{(\varvec{x}, y) \sim \mathcal {D}}\left[ \max _{\Vert \varvec{\delta }\Vert _p \le \epsilon } L\left( h_{\varvec{\theta }}(\varvec{x}+\varvec{\delta }), y\right) \right] , \end{aligned}$$
(1)

where the inner maximization searches for perturbations that maximize the loss, while the outer minimization optimizes the network parameters. Madry et al. (2017) proposed a multi-step gradient-based attack, known as the PGD attack, to solve the inner maximization of AT; it significantly improves the adversarial robustness of neural networks against various attacks and has been deemed the standard baseline method of AT, referred to as PGD-AT in this paper. Building on this idea, researchers have proposed various variants, such as TRADES (Zhang et al. 2019), MART (Wang et al. 2019), AWP (Wu et al. 2020), and S\(^2\)O (Jin et al. 2022). These methods broadly follow three directions: objective functions, data augmentation, and weight perturbation. However, most of them do not take the presence of noisy labels into account, whereas real-world datasets are reported to have an inherent label noise rate between 8% and 38.5% (Xiao et al. 2015).

Therefore, designing an effective AT algorithm in the presence of inherent label noise is a nontrivial research challenge, yet it remains under-explored by the community. Although some literature discusses the relationship between AT and noisy labels (Zhu et al. 2021; Zhang et al. 2021; Dong et al. 2021), these works are mainly concerned with how to strengthen AT by injecting noisy labels, rather than with AT under inherent label noise, i.e., noisy labels that already exist in the original dataset. This paper makes the first attempt to tackle this research challenge. Two metrics are important for evaluating the performance of AT: (1) the natural-robust trade-off, i.e., the trade-off between natural accuracy and robust accuracy; and (2) the extent of robust overfitting, i.e., the phenomenon that robust accuracy decreases after a certain training epoch while natural accuracy keeps increasing or remains relatively constant, which in turn hinders the natural-robust trade-off.

First, we empirically evaluate the performance of three recent AT methods on CIFAR-10 with injected inherent noisy labels. We observe that both natural accuracy and robust accuracy decrease significantly with increasing noise rate, across all AT methods. Furthermore, in the presence of inherent label noise, the natural accuracy also begins to decline after a specific training epoch, i.e., natural overfitting, in addition to the already observed robust overfitting; we call this phenomenon “double overfitting” in this paper. Conversely, when there is no inherent label noise, the natural accuracy consistently improves or remains stable throughout training. The Cross-Entropy (CE) loss, although widely used in the aforementioned mainstream AT methods, has been shown to be non-robust to label noise (Feng et al. 2021), which may degrade their generalization performance. To address this issue, we propose incorporating noisy-robust loss functions into AT to enhance generalizability in the presence of label noise. The performance of these methods is shown in Fig. 1.

Fig. 1

The learning curves of natural accuracy and PGD robust accuracy for PGD-AT, MART, and TRADES under 0% (natural/robust), 20% and 40% inherent symmetric label noise on CIFAR-10 with \(\ell _{\infty }\) threat model

To accurately assess the true performance of a model trained with inherent label noise, the common practice is to train the model on a training dataset that contains noisy labels and then evaluate it on a clean test dataset without any noisy labels. This ensures that the model's performance is evaluated correctly: if the test dataset also contained a proportionate amount of noisy labels, the label noise would confound the evaluation and the measured accuracy would not reflect the model's actual effectiveness. Figure 2 provides an overview of AT with inherent label noise, which can be seen as a general framework for our setting, including the noisy training set, adversarial example generation, the classifier, and prediction on the clean test dataset. Our main contributions are as follows:

Fig. 2

Overview of AT with inherent label noise

\(\bullet\) We investigate AT with inherent label noise and observe that it is typically unstable and prone to poor generalization. Furthermore, we empirically identify the occurrence of the “double overfitting” phenomenon, where both the natural accuracy on natural examples and the robust accuracy on AEs start to decline after a certain training stage;

\(\bullet\) From the perspective of objective functions for AT with inherent label noise, we replace the non-robust CE loss with a noisy-robust loss function and further propose Noisy-Robust Adversarial Training (NRAT);

\(\bullet\) Theoretically and empirically, we demonstrate that NRAT achieves comparable or superior performance to recent AT methods when dealing with inherent label noise.

2 Related works

2.1 Adversarial training algorithms

This section introduces three widely recognized AT algorithms, which are PGD-AT (Madry et al. 2017), TRADES (Zhang et al. 2019), and MART (Wang et al. 2019) (We use \(h_{\varvec{\theta }}\) to denote the classifier with model parameter \(\varvec{\theta }\)). First, we elaborate on their objective functions employed during the training process. Subsequently, we present the objective function of our NRAT and conduct a comparative analysis.

PGD-AT The idea of PGD-AT is straightforward: it first generates AEs and then directly optimizes the loss on them. Despite its simple and intuitive formulation, it has been empirically shown to achieve excellent adversarial robustness. The training loss function is given by

$$\begin{aligned} \ell ^{PGD-AT}\left( \textbf{x}_{i}, y_{i}, \varvec{\theta }\right) ={\text {CE}}\left( h_{\varvec{\theta }} \left( \textbf{x}_{i}^{\prime }\right) ,y_{i}\right) . \end{aligned}$$
(2)
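For concreteness, the following is a minimal PyTorch-style sketch of the loss in Eq. (2), not the authors' reference implementation; the `pgd_attack` helper is a hypothetical function performing the inner maximization (a concrete sketch of such a function is given in Sect. 3.1).

```python
import torch.nn.functional as F

def pgd_at_loss(model, x, y, pgd_attack):
    """Eq. (2): cross-entropy evaluated on adversarial examples.

    `pgd_attack` is a hypothetical helper implementing the inner
    maximization (see the PGD sketch in Sect. 3.1); it returns x' = x + delta.
    """
    x_adv = pgd_attack(model, x, y)           # inner maximization
    logits_adv = model(x_adv)                 # h_theta(x')
    return F.cross_entropy(logits_adv, y)     # CE(h_theta(x'), y_i)
```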

TRADES TRADES aims to trade off natural accuracy and robust accuracy by employing the CE loss for natural examples and incorporating a KL-divergence regularization term for adversarial examples. Its objective function can be written as

$$\begin{aligned} \ell ^{TRADES}\left( \textbf{x}_{i}, y_{i}, \varvec{\theta }\right) =\sum _{i=1}^{n} \left\{ {\text {CE}}\left( h_{\varvec{\theta }}\left( \textbf{x}_{i}\right) , y_{i}\right) +\beta \cdot \max _{\textbf{x}_{i}^{\prime } \in \mathcal {S}} {\text {KL}}\left( h_{\varvec{\theta }}\left( \textbf{x}_{i}\right) \Vert h_{\varvec{\theta }}\left( \textbf{x}_{i}^{\prime }\right) \right) \right\} , \end{aligned}$$
(3)

where the first term aims to maximize natural accuracy, and the second term aims to improve robust accuracy by minimizing the distance between the predictions of natural examples and adversarial examples, thereby encouraging the outputs to be smooth. The hyperparameter \(\beta\) controls the trade-off between natural accuracy and robust accuracy.
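A hedged PyTorch-style sketch of the TRADES objective in Eq. (3) follows; the adversarial example \(\textbf{x}_i^{\prime }\) is assumed to have been generated beforehand by (approximately) maximizing the KL term, and `beta` is the trade-off hyperparameter.

```python
import torch.nn.functional as F

def trades_loss(model, x, x_adv, y, beta=6.0):
    """Eq. (3): CE on natural examples plus beta * KL(p(x) || p(x')).

    x_adv is assumed to be pre-generated by maximizing the KL term
    (the inner max in Eq. (3)); beta controls the natural-robust trade-off.
    """
    logits_nat = model(x)
    logits_adv = model(x_adv)
    ce = F.cross_entropy(logits_nat, y)
    # KL(p_nat || p_adv): target = natural distribution, input = log p_adv
    kl = F.kl_div(F.log_softmax(logits_adv, dim=1),
                  F.softmax(logits_nat, dim=1),
                  reduction='batchmean')
    return ce + beta * kl
```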

MART The fundamental idea behind MART is to treat misclassified and correctly classified examples as distinct instances and assign different optimization directions for them. The training loss can be formulated as follows

$$\begin{aligned}{} & {} \ell ^{MART}\left( \textbf{x}_{i}, y_{i}, \varvec{\theta }\right) =\text {BCE} \left( h_{\varvec{\theta }}\left( \textbf{x}_{i}^{\prime }\right) ,y_{i}\right) \nonumber \\{} & {} \quad +\lambda \cdot \text {KL}\left( h_{\varvec{\theta }}\left( \textbf{x}_{i}\right) \Vert h_{\varvec{\theta }}\left( \textbf{x}_{i}^{\prime }\right) \right) \cdot \left( 1-h_{\varvec{\theta }}\left( \textbf{x}_{i}\right) \right) , \end{aligned}$$
(4)

where BCE denotes the boosted cross-entropy, which can be written as

$$\begin{aligned} \text {BCE}\left( h_{\varvec{\theta }} \left( \textbf{x}_{i}^{\prime }\right) ,y_{i}\right) = \text {CE}\left( h_{\varvec{\theta }} \left( \textbf{x}_{i}^{\prime }\right) ,y_{i}\right) -\log \left( 1-\max _{k \ne y_{i}} \varvec{p}_{k}\left( \textbf{x}_{i}^{\prime }, \varvec{\theta }\right) \right) , \end{aligned}$$
(5)

where \(\varvec{p}_{k}\left( \textbf{x}_{i}^{\prime }, \varvec{\theta }\right)\) denotes the softmax probability that the model assigns to class k for input \(\textbf{x}_{i}^{\prime }\), so the second term penalizes the largest probability assigned to any wrong class. Empirically, BCE can mitigate the insufficient learning of CE to some extent. The hyperparameter \(\lambda\) balances the influence of misclassified and correctly classified examples.
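A minimal sketch of the BCE and MART losses in Eqs. (4)-(5) is given below; it follows the formulas above rather than any particular reference implementation, and again assumes a pre-generated adversarial batch.

```python
import torch
import torch.nn.functional as F

def bce_loss(logits_adv, y):
    """Eq. (5): CE plus a margin term on the largest wrong-class probability."""
    probs = F.softmax(logits_adv, dim=1)
    ce = F.cross_entropy(logits_adv, y)
    # zero out the true class, then take the largest remaining probability
    wrong_probs = probs.clone()
    wrong_probs.scatter_(1, y.unsqueeze(1), 0.0)
    margin = -torch.log(1.0 - wrong_probs.max(dim=1).values + 1e-12)
    return ce + margin.mean()

def mart_loss(model, x, x_adv, y, lam=5.0):
    """Eq. (4): BCE on adversarial examples plus a KL term weighted by 1 - p_{y_i}."""
    logits_nat, logits_adv = model(x), model(x_adv)
    p_nat = F.softmax(logits_nat, dim=1)
    # per-example KL(p_nat || p_adv)
    kl = (p_nat * (F.log_softmax(logits_nat, dim=1)
                   - F.log_softmax(logits_adv, dim=1))).sum(dim=1)
    weight = 1.0 - p_nat.gather(1, y.unsqueeze(1)).squeeze(1)
    return bce_loss(logits_adv, y) + lam * (kl * weight).mean()
```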

2.2 Interactions between adversarial training and noisy labels

Noisy labels are unavoidable in real-world datasets due to errors in manual annotation or annotation platforms (Xiao et al. 2015). Accordingly, research on learning with noisy labels has emerged. While the analysis of noisy labels in ST has been extensively explored (Li et al. 2017; Natarajan et al. 2013), researchers have recently started investigating the relationship between noisy labels and AT, i.e., AT with noisy labels. Zhu et al. (2021) explored the distinctions between AT and ST in the presence of noisy labels from the perspective of the smoothing effects of AT and the loss landscape. Zhang et al. (2021) proposed NoiLIn, which gradually injects noisy labels into both the inner maximization and outer minimization stages of AT to improve adversarial robustness. Dong et al. (2021) studied AT under random labels (almost 100% noisy labels), identified the memorization of one-hot labels as the cause of robust overfitting, and adopted temporal ensembling to mitigate this overfitting. In short, these works explore the properties of noisy labels in AT and how to enhance AT's performance on clean datasets by introducing noisy labels. Inherent label noise, in contrast, refers to the situation where the dataset itself already contains noisy labels, which is the setting studied in this paper.

2.3 Robust loss functions and learning with noisy labels

It has been demonstrated that a DNN trained with a suitably adjusted loss function \(\mathcal {L}\), namely a noisy-robust loss function (referred to as a robust loss function hereafter), can approach the best classifier \(h_{\varvec{\theta }}\) under some mild assumptions for both symmetric and asymmetric label noise (Ghosh et al. 2017). These robust loss functions satisfy the following equation (for a K-class classification problem, \(K>1\), and any training example x)

$$\begin{aligned} \sum _{j=1}^{K} \mathcal {L}(h(\varvec{x}), j)=C, \end{aligned}$$
(6)

where C is a constant. Equation (6) indicates that these loss functions are symmetric and considered noise-tolerant according to the definitions in Ghosh et al. (2017). Although many loss functions satisfy this symmetry, the most commonly used CE loss does not. Recent studies (Zhang and Sabuncu 2018; Amid et al. 2019) have demonstrated that adopting robust loss functions is the most straightforward and generic approach for effectively training deep neural networks with inherent label noise. Specifically, Ma et al. (2020) proposed the robust loss function NCE\(+\)RCE following an active-passive loss (APL) framework, which currently achieves state-of-the-art performance in ST. For a K-class classification task, NCE (Ma et al. 2020) is the normalized version of CE, and RCE (Wang et al. 2019) is the reverse version of CE. We provide a more detailed explanation of both NCE and RCE in the next section.

In addition to noisy-robust loss functions, several other methods are available for learning under label noise, including label correction (Zheng et al. 2021) and collaborative learning (Han et al. 2018). However, these methods often involve complicated procedures, making their application to AT challenging, especially since AT already requires significant computational resources and such additions may incur further costs. Therefore, in this paper, we focus on realizing AT under inherent label noise using a robust loss function.

3 Noisy-robust adversarial training

In the previous sections, we discussed robust loss functions that exhibit the symmetry property. Based on this, we present a novel perspective on AT with inherent label noise, i.e., replacing the non-robust loss function with a robust counterpart to enhance the performance of AT in the presence of inherent label noise. Finally, we conduct a comprehensive comparison between our proposed NRAT and existing approaches.

3.1 Basic notation of AT with inherent label noise

For a K-class classification task, let \(X=\{(x_i, y_i)\}_{i=1,\ldots ,n}\) be the training dataset drawn from an input distribution \(\mathcal {D}\) with n training instances, where \(x_i\in \mathbb {R}^d\) represents a natural example and \(y_i\in \{1,\ldots , K\}\) denotes its annotated label, which may be incorrect; we therefore denote by \(y_i^*\) the true label of \(x_i\). We use \(\varvec{q}(k \mid \varvec{x})\) to represent the label distribution of sample x over classes \(k\in \{1,\ldots ,K\}\), with \(\sum _{k=1}^{K} \varvec{q}(k \mid \varvec{x})=1\). We consider two types of label noise, symmetric and asymmetric, with an overall noise rate \(\eta \in [0, 1]\). For each class j flipped to k, we denote the class-wise noise rate by \(\eta _{jk}\). Symmetric label noise means that each label has the same probability of being flipped to any other class, i.e., \(\eta _{jk} = \frac{\eta }{K-1}, j \ne k\), while asymmetric noise refers to labels being flipped between similar classes, e.g., the class “truck” being flipped to “automobile”.

Given a classifier \(h_{\varvec{\theta }}\) with model parameters \(\varvec{\theta }\) (for simplicity, we may omit \(\varvec{\theta }\) in what follows), it predicts the class of an input example as

$$\begin{aligned} h_{\varvec{\theta }}\left( {x}\right) =\arg \max _{k} \varvec{p}_k(x,\varvec{\theta }), \ \text{ where } \ \varvec{p}_k(x,\varvec{\theta })=\frac{e^{\varvec{z}_{k}(x, \varvec{\theta })}}{\sum _{j=1}^{K} e^{\varvec{z}_{j}(x, \varvec{\theta })}} , \end{aligned}$$
(7)

where \(\varvec{z}_{k}(x, \varvec{\theta })\) denotes the logit output of the network for class k and \(\varvec{p}_k(x,\varvec{\theta })\) represents the softmax output for x. We then denote by \(x^\prime\) an AE, and let \(X^\prime\) and \(\mathcal {D}^\prime\) be the adversarial set and distribution, respectively. We perform the PGD attack to produce AEs, i.e.,

$$\begin{aligned} x^0= & {} x + \sigma ,\; \text{ where }\; \sigma \sim \mathcal {N}(0,1), \end{aligned}$$
(8)
$$\begin{aligned} x^{t+1}= & {} \Pi _{x+\mathcal {S}}\left( x^t + \alpha \, \text{ sign }\left( \nabla _x\mathcal {L}(\varvec{\theta }, x^t, y)\right) \right) , \end{aligned}$$
(9)

where x denotes the natural example, \(x^0\) is obtained by perturbing x with random noise \(\sigma\) sampled from the normal distribution \(\mathcal {N}(0,1)\), t denotes the current step, \(\alpha\) is the step size, \(\Pi\) denotes the projection operator, and \(\mathcal {S} \subseteq \mathbb {R}^{d}\) denotes the perturbation set of AEs. Based on these definitions, the adversarial risk to be optimized for the given dataset and classifier \(h_{\varvec{\theta }}\) is defined as follows:

$$\begin{aligned} \mathcal {R}_{adv}\left( h_{\varvec{\theta }}\right) =\frac{1}{n} \sum _{i=1}^{n} \max _{\textbf{x}_{i}^{\prime } \in \mathcal {S}} \mathbb {1}\left( h_{\varvec{\theta }}\left( \textbf{x}_{i}^{\prime }\right) \ne y_{i}\right) . \end{aligned}$$
(10)
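For illustration, a minimal PyTorch-style sketch of the PGD generation in Eqs. (8)-(9) under the \(\ell _\infty\) threat model is given below. The Gaussian initialization of Eq. (8) is scaled down here so that the starting point stays close to x (many implementations instead use uniform noise in \([-\epsilon , \epsilon ]\)), and the attack loss defaults to CE as in PGD-AT; this is a sketch, not a reference implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Eqs. (8)-(9): random start, then iterated signed-gradient ascent steps
    projected back onto the l_inf ball of radius eps around x."""
    x_adv = (x + 0.01 * torch.randn_like(x)).clamp(0, 1).detach()  # Eq. (8), scaled noise
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()          # ascent step of Eq. (9)
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # projection Pi_{x+S}
            x_adv = x_adv.clamp(0, 1)                    # keep a valid image
    return x_adv.detach()
```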

3.2 Interactions between adversarial training and robust loss functions

In Sect. 2.3, we noted that loss functions with the symmetry property are robust to inherent label noise. In this section, we delve into the theoretical details. Following Ghosh et al. (2017) and Ma et al. (2020), we first show that a symmetric loss function exhibits noise tolerance under both symmetric and asymmetric label noise with certain mild assumptions.

Lemma 1

In a multi-class classification problem, let a loss function L satisfy Eq. (6). Then L is noisy-robust under symmetric label noise if the noise rate \(\eta < \frac{K-1}{K}\) (Ghosh et al. 2017).

Lemma 2

In a multi-class classification problem, suppose L satisfies Eq. (6) and \(0\le L(h(x),k)\le \frac{C}{K-1},\ \forall k \in \{1,\ldots ,K\}\). If \(R(h^*) = 0\), then L is noisy-robust under asymmetric label noise provided that \(\eta _{jk} < \eta _{jj}\) (Ghosh et al. 2017), where C is a constant and \(h^*\) denotes the global minimizer.

Ghosh et al. (2017) provide detailed proofs of these two lemmas; since Lemma 2 is not easy to follow, we also provide our own proof in "Appendix A". Under the above conditions on the label noise rate \(\eta\), the learning risk under clean labels, R(h), and the risk under noisy labels with noise rate \(\eta\), \(R^\eta (h)\), share the same global minimizer \(h^*\), i.e., the loss function L is noisy-robust.

The above discussion focuses on robust loss functions for ST, i.e., for natural examples. Referring to Eqs. (8) and (9), AEs are generated at the input level. Therefore, under symmetric label noise, a loss function L remains noisy-robust for AEs if it is already noisy-robust for natural examples, since Lemma 1 imposes no additional conditions on the inputs. For asymmetric label noise, first note that the condition \(0\le L(h(x),k)\le \frac{C}{K-1},\ \forall k\) can easily be satisfied by a typical loss function (Ma et al. 2020). However, when replacing natural examples with AEs, the value of L will intuitively increase significantly, potentially exceeding this upper bound. The other condition, \(R(h^*) = 0\), is a restrictive requirement for the noisy-robust theory to hold, as it means that \(h^*\) achieves 100% classification accuracy. In practice, satisfactory performance can still be achieved as long as \(R(h^*)\) is close to 0 (Ma et al. 2020); as \(R(h^*)\) increases, the corresponding performance tends to decline. Currently, the SOTA robust accuracy against AEs on CIFAR-10 is below 70%, indicating a high value of \(R(h^*)\) and hence low utility of noisy-robust loss functions for AEs. Based on the above analysis, in the case of asymmetric label noise, AEs may not satisfy the two conditions for the noisy-robust property to hold. We can conclude that:

Proposition 1

In a multi-class classification problem, suppose L is a noisy-robust loss function. Then L remains noisy-robust for AEs under symmetric label noise, while it may be non-robust for AEs under asymmetric label noise.

Given the above analysis, note that mainstream AT algorithms currently rely on the CE loss, and some of them directly optimize the loss on AEs, as in PGD-AT in Eq. (2) and MART in Eq. (4). By Proposition 1, replacing CE with a robust loss function may therefore not be effective for these methods under asymmetric label noise. In contrast, TRADES in Eq. (3) optimizes the loss on natural examples and uses a regularization term to align the distributions of AEs and natural examples, which makes it mathematically well-suited for incorporating a robust loss function. We also verify this proposition empirically in the experiments.

3.3 Noisy robust cross entropy loss

First, we demonstrate why the simple mean absolute error (MAE) is symmetric while CE is not. For a K-class classification task, recall Eq. (6): for MAE, \(\sum _{j=1}^{K} \mathcal {L}(h(\varvec{x}), j) =\sum _{j=1}^{K}\left( 2-2\varvec{p}(j \mid \varvec{x})\right) = 2(K-1)\), which is a constant. For CE, \(\sum _{j=1}^{K} \mathcal {L}(h(\varvec{x}), j) = -\sum _{j=1}^{K}\log \varvec{p}(j \mid \varvec{x})\), whose value clearly varies across different x, i.e., the sum is not a constant for CE. However, the simplicity of MAE makes it susceptible to underfitting on large datasets, while CE, although widely used and effective, lacks the noisy-robust property. Thus, it is natural to consider combining the advantages of both MAE and CE.
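This symmetry argument can be checked numerically; the small sketch below sums MAE and CE over all candidate labels for a few random softmax outputs and confirms that the MAE sum equals the constant \(2(K-1)\) while the CE sum varies with x.

```python
import torch
import torch.nn.functional as F

K = 10
p = F.softmax(torch.randn(5, K), dim=1)     # 5 random prediction vectors

# MAE(h(x), j) = ||e_j - p||_1 = 2 - 2 p_j; summing over j gives 2(K-1)
mae_sum = (2.0 - 2.0 * p).sum(dim=1)
print(mae_sum)                              # 18.0 (= 2*(K-1)) for every example

# CE(h(x), j) = -log p_j; the sum over j depends on x, so CE is not symmetric
ce_sum = (-torch.log(p)).sum(dim=1)
print(ce_sum)                               # varies from example to example
```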

Ma et al. (2020) divide loss functions into active and passive ones according to whether the loss depends only on the value of \(\varvec{p}(k=y|x)\). Specifically, CE is considered an active loss function, while MAE is regarded as a passive loss function. They further argue that combining an active loss function with a passive one benefits from complementary learning, as demonstrated by Kim et al. (2019); this combination is referred to as the Active-Passive Loss (APL) framework. Furthermore, if both the active and passive components are noisy-robust, the resulting APL loss is also noisy-robust.

To convert CE into a noisy-robust APL form, we require noisy-robust active and passive versions of CE. The normalized CE (NCE) in Eq. (11), obtained by dividing CE by \(\sum _{j=1}^{K} \mathcal {L}(h(\varvec{x}), j)\), is proven to be a noisy-robust active loss function in Ma et al. (2020), while the reverse CE (RCE) in Eq. (12) is a noisy-robust passive loss function. Currently, the SOTA robust loss function is the combination NCE+RCE.

$$\begin{aligned}{} & {} \begin{aligned} N C E = \frac{CE}{{\sum _{j=1}^K}\mathcal {L}(h(x), j)}&= \frac{-\sum _{k=1}^{K} \varvec{q}(k \mid \varvec{x}) \log \varvec{p}(k \mid \varvec{x})}{-\sum _{j=1}^{K} \sum _{k=1}^{K} \varvec{q}(y=j \mid \varvec{x}) \log \varvec{p}(k \mid \varvec{x})}\\&= \log _{\prod _{k}^{K} \varvec{p}(k \mid \varvec{x})} \varvec{p}(y \mid \varvec{x}), \end{aligned} \end{aligned}$$
(11)
$$\begin{aligned} RCE = {-\sum _{k=1}^{K} \varvec{p}(k \mid \varvec{x}) \log \varvec{q}(k \mid \varvec{x})}. \end{aligned}$$
(12)

By definition, both NCE and RCE satisfy the symmetry in Eq. (6). Therefore, NCE+RCE can be regarded as a robust variant of CE when noisy labels are inherent in the training dataset.
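A hedged sketch of the NCE and RCE terms in Eqs. (11)-(12) is shown below. For one-hot labels, \(\log \varvec{q}(k \mid \varvec{x})\) is \(\log 0\) for \(k \ne y\); following a common implementation convention, we clamp \(\varvec{q}\) to a small floor so the logarithm stays finite (the exact floor is an implementation choice, not prescribed by the equations).

```python
import torch
import torch.nn.functional as F

def nce_rce_loss(logits, y, alpha=1.0, beta=1.0, eps=1e-7, label_floor=1e-4):
    """APL loss: alpha * NCE (Eq. (11)) + beta * RCE (Eq. (12))."""
    K = logits.size(1)
    log_p = F.log_softmax(logits, dim=1)
    p = log_p.exp().clamp(min=eps, max=1.0)
    q = F.one_hot(y, K).float()

    # NCE: CE divided by the sum of CE over all K candidate labels
    nce = (-(q * log_p).sum(dim=1)) / (-log_p.sum(dim=1))

    # RCE: -sum_k p(k|x) log q(k|x), with q clamped away from zero
    rce = -(p * torch.log(q.clamp(min=label_floor))).sum(dim=1)

    return (alpha * nce + beta * rce).mean()
```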

3.4 Noisy robust adversarial training

In Sect. 3.2, we analyzed why TRADES is suitable for incorporating a robust loss function while PGD-AT and MART are less compatible in this regard. Our NRAT is formally based on TRADES, with enhancements to both of its components, aimed at making it more effective on datasets with inherent label noise. We first restate the original objective function of TRADES:

$$\begin{aligned} \ell ^{TRADES}\left( \textbf{x}_{i}, y_{i}, \varvec{\theta }\right) =\sum _{i=1}^{n} \left\{ \text {CE}\left( h_{\varvec{\theta }}\left( \textbf{x}_{i}\right) , y_{i}\right) +\beta \cdot \max _{\textbf{x}_{i}^{\prime } \in \mathcal {S}} \text {KL}\left( h_{\varvec{\theta }}\left( \textbf{x}_{i}\right) \Vert h_{\varvec{\theta }}\left( \textbf{x}_{i}^{\prime }\right) \right) \right\} , \end{aligned}$$
(13)

We already know that CE is not robust to inherent label noise. We now show that the KL-divergence term can also be replaced by a more robust alternative. The KL-divergence is given by

$$\begin{aligned} \text {KL}(p(x_i, \varvec{\theta })||p(x_i^\prime , \varvec{\theta })) = \sum _{k=1}^{K}p_k(x_i, \varvec{\theta })\log \frac{p_k(x_i, \varvec{\theta })}{p_k(x_i^\prime , \varvec{\theta })}, \end{aligned}$$
(14)

which is an asymmetric measure that treats the two distributions unequally. In learning with noisy labels, the given label distribution \(\varvec{q}(k|x)\) for x may not accurately reflect the true distribution, whereas the predicted distribution \(\varvec{p}(k|x)\) may represent the true distribution better to some extent (Wang et al. 2019). Therefore, in learning with noisy labels, it is more robust to use both \(\text {KL}(p_{1}||p_{2})\) and \(\text {KL}(p_{2}||p_{1})\) to obtain a symmetric divergence measure

$$\begin{aligned} \text {KL}_{sym}(p(x_i, \varvec{\theta })||p(x_i^\prime , \varvec{\theta })) = \frac{1}{2}\left\{ \text {KL}(p(x_i, \varvec{\theta })||p(x_i^\prime , \varvec{\theta })) + \text {KL}(p(x_i^\prime , \varvec{\theta })||p(x_i, \varvec{\theta }))\right\} . \end{aligned}$$
(15)
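A minimal sketch of the symmetric KL term in Eq. (15), computed directly from the two logit vectors, might look as follows.

```python
import torch.nn.functional as F

def symmetric_kl(logits_nat, logits_adv):
    """Eq. (15): 0.5 * [ KL(p_nat || p_adv) + KL(p_adv || p_nat) ]."""
    log_p_nat = F.log_softmax(logits_nat, dim=1)
    log_p_adv = F.log_softmax(logits_adv, dim=1)
    p_nat, p_adv = log_p_nat.exp(), log_p_adv.exp()
    kl_na = (p_nat * (log_p_nat - log_p_adv)).sum(dim=1)   # KL(p_nat || p_adv)
    kl_an = (p_adv * (log_p_adv - log_p_nat)).sum(dim=1)   # KL(p_adv || p_nat)
    return 0.5 * (kl_na + kl_an).mean()
```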

Another argument supporting the use of the symmetric KL-divergence is the memorization effect (Dong et al. 2021) of neural networks, i.e., their capability to fit the training data well even in the presence of noisy labels. Mislabeled examples tend to incur larger losses than correctly labeled examples (Song et al. 2019), leading to increased uncertainty in the output probabilities \(p(x_i, \varvec{\theta })\) and \(p(x_i^\prime , \varvec{\theta })\) (assuming \(x_i\) is a mislabeled example). Consequently, as the label noise rate increases, the asymmetry between the two directions of the KL-divergence is magnified across the dataset. We also analyze the robust risk of the symmetric KL-divergence under symmetric label noise in Appendix B and show that it tends to have a tighter bound than the KL-divergence. Based on this analysis, we replace the two terms in the original TRADES with more robust alternatives that are better suited to datasets with inherent noisy labels; the training objective function of NRAT is defined as follows

$$\begin{aligned}{} & {} \ell ^{NRAT}\left( \textrm{x}_{i}, y_{i}, \varvec{\theta }\right) : =\sum _{i=1}^{n}\bigg \{ L_{apl}\left( h_{\varvec{\theta }}\left( \textrm{x}_{i}\right) , y_{i}\right) \nonumber \\{} & {} \quad +\lambda \cdot \max _{\textbf{x}_{i}^{\prime } \in \mathcal {S}} \text {KL}_{\text{ sym } }\left( h_{\varvec{\theta }}\left( \textrm{x}_{i}\right) \Vert h_{\varvec{\theta }}\left( \textrm{x}_{i}^{\prime }\right) \right) \bigg \}, \end{aligned}$$
(16)

where \(L_{apl}\) denotes the robust loss function NCE\(+\)RCE following the APL framework in Eqs. (11) and (12), and \(\text {KL}_{sym}\) denotes the symmetric KL-divergence in Eq. (15). The pseudocode of the training algorithm for NRAT is given below.

(Pseudocode of the NRAT training algorithm)
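Since the pseudocode figure is only summarized here, we also give a minimal sketch of one NRAT training step implementing Eq. (16). It reuses the hypothetical `pgd_attack`, `nce_rce_loss`, and `symmetric_kl` sketches from the previous sections and is not the authors' reference implementation; in particular, for brevity the adversarial example is produced with the CE-based PGD sketch, whereas Eq. (16) prescribes maximizing the symmetric KL term in the inner loop.

```python
def nrat_step(model, optimizer, x, y, lam=6.0):
    """One NRAT update (Eq. (16)): APL loss on natural examples plus
    lam times the symmetric KL between natural and adversarial predictions."""
    x_adv = pgd_attack(model, x, y)    # inner maximization (simplified; see note above)
    model.train()
    logits_nat, logits_adv = model(x), model(x_adv)
    loss = nce_rce_loss(logits_nat, y) + lam * symmetric_kl(logits_nat, logits_adv)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```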

3.5 Relation to existing work

PGD-AT, MART, and TRADES are commonly used as baselines for newly proposed AT algorithms, as in Wu et al. (2020). In Sect. 3.4, we introduced NRAT as an enhanced, more noisy-robust objective function compared to TRADES. In this section, we focus on the distinctions between NRAT and MART, since the two share a similar formulation of their objective functions.

MART divides the training examples into correctly classified and misclassified ones, which is analogous to the division into correctly labeled and mislabeled examples. When facing noisy labels, we can likewise divide the natural training set into two subsets: examples with correct labels, \(\mathcal {S}^+\), and examples with noisy labels, \(\mathcal {S}^-\). Given a classifier \(h_{\varvec{\theta }}^*\) that satisfies \(\mathcal {R}(h_{\varvec{\theta }}^*) = 0\), we have:

$$\begin{aligned} \begin{aligned} \mathcal {S}_{h_{\varvec{\theta }}}^{+}=\left\{ i: i \in [n], h_{\varvec{\theta }}^* \left( \textbf{x}_{i}\right) =y_{i} = y_{i}^*\right\} ; \\ \mathcal {S}_{h_{\varvec{\theta }}}^{-}=\left\{ i: i \in [n], h_{\varvec{\theta }}^* \left( \textbf{x}_{i}\right) = y_{i} \ne y_{i}^*\right\} . \end{aligned} \end{aligned}$$
(17)

However, in learning with noisy labels, we do not know in advance which labels are correct and which are not. Hence, it becomes necessary to minimize the overall risk \(\mathcal {R}\left( h_{\varvec{\theta }}\right)\) rather than dividing it into subsets as MART does. Our NRAT algorithm therefore follows PGD-AT and TRADES in minimizing the risk over the whole dataset. MART, on the other hand, applies different objective functions to correctly classified and misclassified examples. This design achieves strong performance on clean datasets, but in the presence of noisy labels an example flagged as misclassified may actually be predicted correctly, and vice versa, so the two objectives may be applied to inappropriate examples, thereby diminishing MART's performance.

4 Experiments

In this section, we empirically evaluate the performance of the proposed NRAT on the CIFAR-10 dataset against two types of injected inherent label noise: symmetric noise and asymmetric noise. We compare our method with three existing AT methods and their variants on the noisy dataset with varying label noise rates.

4.1 Experimental setup

Baselines We consider three well-known AT algorithms as baselines: (1) PGD-AT; (2) TRADES; (3) MART. To evaluate the effectiveness of robust loss functions in these algorithms, we also replace the CE loss used in these algorithms with NCE\(+\)RCE as three additional baselines, i.e., (4) PGD-AT-APL; (5) TRADES-APL; (6) MART-APL.

Generation of label noise To simulate real-world datasets that may contain inherent label noise, we inject two types of research-oriented label noise into the original CIFAR-10 dataset. Symmetric label noise means that each label has an equal probability of being flipped to any other class; in contrast, asymmetric noise involves label flipping between similar classes, which is more representative of real-world scenarios. For asymmetric label noise, we flip labels between \(TRUCK \leftrightarrow AUTOMOBILE\), \(BIRD \leftrightarrow AIRPLANE\), \(DEER \leftrightarrow HORSE\), and \(CAT\leftrightarrow DOG\), following (Zhang and Sabuncu 2018); a sketch of this injection procedure is given below. We consider noise rates of 20% and 40% to simulate the noise levels in real-world datasets, and we also report the performance on the clean dataset without any label noise (0%). We also provide the results of NRAT on MNIST and FashionMNIST in Table 4 in "Appendix C".
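The following is a sketch of the injection procedure under our assumptions (symmetric flips drawn uniformly over the other classes; asymmetric flips restricted to the class pairs listed above); the exact sampling details are an implementation choice.

```python
import numpy as np

# CIFAR-10 class indices: 0 airplane, 1 automobile, 2 bird, 3 cat, 4 deer,
# 5 dog, 6 frog, 7 horse, 8 ship, 9 truck
ASYM_MAP = {9: 1, 1: 9, 2: 0, 0: 2, 4: 7, 7: 4, 3: 5, 5: 3}

def inject_label_noise(labels, noise_rate, noise_type="symmetric",
                       num_classes=10, seed=0):
    """Return a noisy copy of `labels`.

    symmetric : a `noise_rate` fraction of labels is flipped, each to a
                uniformly chosen different class.
    asymmetric: labels of the paired classes above are flipped to their
                counterpart with probability `noise_rate`.
    """
    rng = np.random.RandomState(seed)
    noisy = np.asarray(labels, dtype=np.int64).copy()
    if noise_type == "symmetric":
        flip_idx = rng.choice(len(noisy), size=int(noise_rate * len(noisy)),
                              replace=False)
        for i in flip_idx:
            candidates = [c for c in range(num_classes) if c != noisy[i]]
            noisy[i] = rng.choice(candidates)
    else:
        for i in range(len(noisy)):
            c = int(noisy[i])
            if c in ASYM_MAP and rng.rand() < noise_rate:
                noisy[i] = ASYM_MAP[c]
    return noisy
```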

Adversarial training settings For AT, we train ResNet18 for all algorithms, largely following the standard settings in Rice et al. (2020), with some adjustments to better suit AT with noisy labels. Specifically, we use stochastic gradient descent (SGD) with momentum 0.9 and weight decay 5e-4 for 200 training epochs. We use standard data augmentation, i.e., random crops and random horizontal flips, and apply data normalization for all methods. For the training attack, we use PGD-10 with random initialization, perturbation limit \(\epsilon =8/255\), and step size 2/255. The standard default initial learning rate is 0.1; however, since training is prone to gradient collapse under AT with inherent label noise, we choose smaller initial learning rates from [0.01, 0.05, 0.1] for different noise rates, the general principle being to choose the largest learning rate that avoids gradient collapse. (In "Appendix D", we provide an additional experiment that replaces the CE loss in the PGD attack with our proposed loss function.)

We use the \(\ell _\infty\) threat model for all methods. We do not train any WideResNet, since it usually shows a similar trend to ResNet18 while being much more time-consuming. For NCE\(+\)RCE in NRAT, we follow the CIFAR-10 setting in Ma et al. (2020), i.e., the coefficients of both terms are 1. The hyperparameters of the baselines are consistent with their original papers: \(\lambda = 5\) for MART and \(\beta = 6\) for TRADES. For our NRAT, we try \(\lambda \in \{4, 6, 8, 10\}\) and find that \(\lambda = 6\) yields the best empirical results across different noise rates; we report the best natural-robust trade-off performance for all methods. All experiments are run on a server with an Intel i7-12700F CPU and an RTX 3090 GPU. Note that we do not apply any training tricks such as gradient clipping or label smoothing, in order to accurately compare the different objective strategies.

4.2 Performance evaluation

Adversarial attacks We use two typical white-box attacks, PGD-20 and CW-20 (Carlini and Wagner 2017) (the \(\ell _\infty\) version of the CW loss optimized by PGD-20), as well as the more powerful AutoAttack (Croce et al. 2020), to evaluate the baselines and NRAT. AutoAttack is an ensemble of parameter-free attacks and serves as a reliable metric for assessing the robustness of a model. While some of these attacks may not behave as strict white-box attacks in the noisy-label setting (for instance, CW-20 may be affected by gradient obfuscation caused by the random label flipping), we believe they reflect the robustness of the model to a reasonable extent.

To evaluate performance, we report "natural" and "robust" accuracy, i.e., the accuracy on natural test images and on adversarial test images generated by the different attacks, respectively, from which the natural-robust trade-off of each method can be observed. Another metric for AT is the degree of robust overfitting, so we also report the "Best" (highest) and "Last" (final-epoch) natural and robust accuracy; the smaller the gap between them, the lower the degree of overfitting. Results for symmetric and asymmetric label noise are shown in Tables 1 and 2, respectively. Recalling Fig. 1, even natural accuracy overfits when label noise is present for the baselines; our results show that this double overfitting can be largely mitigated by our method.

Table 1 Best and last robustness performance (%) on CIFAR-10 with inherent symmetric label noise at 0%, 20% and 40% noise rates
Table 2 Best and last robustness performance (%) on CIFAR-10 with inherent asymmetric label noise at 20% and 40% noise rates

Remark for Tables 1 and 2. Under symmetric label noise, Table 1 shows that NRAT outperforms the baselines in terms of best robust accuracy, while on the clean dataset TRADES exhibits superior robust performance. Comparing MART with MART-APL and TRADES with TRADES-APL, we observe that robust performance improves in both cases under 20% and 40% symmetric label noise, with MART-APL showing a particularly significant improvement. These results indicate that, in the presence of noisy labels, a robust loss function is a more robust alternative to the CE loss.

While MART and TRADES exhibit significant robust overfitting, with gaps between the last and best performance of around 11% to 16% under 20% symmetric label noise and 18% to 28% under 40% symmetric label noise, the APL versions significantly mitigate the double overfitting issue (around 6% to 10% under 20% symmetric label noise and 8% to 10% under 40% symmetric label noise). This demonstrates the effectiveness of robust loss functions in addressing double overfitting. We provide the learning curves for MART-APL and TRADES-APL in Fig. 3 below:

Fig. 3

The learning curves of natural accuracy and PGD robust accuracy for MART-APL and TRADES-APL under 0% (natural/robust), 20% and 40% inherent symmetric/asymmetric label noise on CIFAR-10 with \(\ell _{\infty }\) threat model

Under asymmetric label noise, Table 2 further demonstrates that NRAT achieves the highest robust performance under both 20% and 40% noise. Another noteworthy observation is that the best robust performance of MART-APL consistently falls below that of MART, which aligns with Proposition 1: robust loss functions may be non-robust for AEs under asymmetric label noise, and here NCE+RCE performs around 1% to 3.5% lower than the already non-robust CE. A further notable phenomenon is that the robust overfitting of PGD-AT, MART, and TRADES is less pronounced under asymmetric label noise than under symmetric label noise. This suggests that, in AT, the CE loss is relatively more robust to asymmetric than to symmetric label noise, whereas in ST, asymmetric label noise is generally considered more challenging.

The effectiveness of symmetric KL-divergence The key difference between TRADES-APL and our NRAT is whether a symmetric KL-divergence is used. Considering their performance in Tables 1 and 2, NRAT consistently achieves higher robust performance but lower natural performance than TRADES-APL. This highlights the role of the symmetric KL-divergence: it narrows the gap between natural and robust performance, albeit at the cost of some natural accuracy. Given that robust performance is the primary focus of AT, we consider this trade-off an appropriate compromise.

The performance of PGD-AT-APL As analyzed in Sect. 3.2, PGD-AT is not well-suited for APL. Empirically, the training process of PGD-AT-APL exhibits a peculiar tendency, with very low natural accuracy (less than 30%) and high robust accuracy (more than 60%). We therefore do not include the results of PGD-AT-APL in the tables above. The underlying reasons for this behavior remain an open issue that requires further investigation.

Further discussion with TRADES We make a fuller comparison between NRAT and TRADES here. First, under symmetric label noise, NRAT attains a higher best robust accuracy, and as the noise rate increases from 20 to 40%, the improvement becomes more apparent, around 2% to 4% under different attacks. Another improvement is that NRAT mitigates double overfitting: in the second part of Table 1, TRADES shows a significant double overfitting issue, with a large gap between the last and best accuracy, whereas the last performance of NRAT is much higher than that of TRADES. Second, asymmetric label noise shows the opposite trend: although NRAT still outperforms TRADES in best robust accuracy, the gap narrows as the noise rate increases. This is related to the condition \(R(h^{*}) = 0\) in Lemma 2; as the noise rate increases, \(R(h^{*})\) tends to move further from 0, which limits the performance of the NCE+RCE loss.

4.3 Mitigating double overfitting

Although NRAT partially mitigates the issue of double overfitting, there is still significant robust overfitting, resulting in a substantial gap between the best and last performance: around 7% to 10% for symmetric label noise and 8% to 12% for asymmetric label noise. To further address this, we introduce weight perturbation. Adversarial weight perturbation (AWP) (Wu et al. 2020) adversarially perturbs both the inputs and the weights during training. The input perturbation is produced via the PGD attack, while the weight perturbation can be written as

$$\begin{aligned} \textbf{v} \leftarrow \Pi _{\gamma }\left( \textbf{v}+\eta \frac{\nabla _{\textbf{v}} \frac{1}{n} \sum _{i=1}^{n} \ell \left( \textbf{f}_{\textbf{w}+\textbf{v}}\left( \textbf{x}_{i}^{\prime }\right) , y_{i}\right) }{\left\| \nabla _{\textbf{v}} \frac{1}{n} \sum _{i=1}^{n} \ell \left( \textbf{f}_{\textbf{w}+\textbf{v}}\left( \textbf{x}_{i}^{\prime }\right) , y_{i}\right) \right\| }\Vert \textbf{w}\Vert \right) , \end{aligned}$$
(18)

where \(\textbf{v}\) denotes the weight perturbation, which can be solved by multi-step methods such as PGD, \(\eta\) here denotes the step size of the weight perturbation (not to be confused with the noise rate), and n is the batch size. Combining \(x^{\prime }\) and \(\textbf{v}\) in adversarial training has been shown to enhance adversarial robustness as well as alleviate robust overfitting. Furthermore, we empirically demonstrate that NRAT is compatible with AWP and can effectively mitigate the issue of double overfitting in the presence of label noise. The comparison between NRAT and NRAT-AWP is shown in Fig. 4 and Table 3.
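A much-simplified, single-step sketch of the weight perturbation in Eq. (18) is given below; the reference AWP implementation (Wu et al. 2020) normalizes the perturbation layer by layer and projects it onto a \(\gamma\)-ball, which we only approximate here on a per-parameter basis.

```python
import torch

def awp_perturb(model, loss_fn, x_adv, y, gamma=0.005):
    """One simplified ascent step of Eq. (18): move each weight tensor in the
    normalized gradient direction, scaled by gamma * ||w||.
    Returns the applied perturbation so it can be removed after the update."""
    loss = loss_fn(model(x_adv), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    perturbation = []
    with torch.no_grad():
        for p, g in zip(params, grads):
            v = gamma * p.norm() * g / (g.norm() + 1e-12)   # normalized ascent step
            p.add_(v)                                       # w <- w + v
            perturbation.append(v)
    return perturbation

def awp_restore(model, perturbation):
    """Undo the weight perturbation after the outer minimization step."""
    params = [p for p in model.parameters() if p.requires_grad]
    with torch.no_grad():
        for p, v in zip(params, perturbation):
            p.sub_(v)
```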

Fig. 4

The learning curves of natural accuracy and PGD robust accuracy for NRAT and NRAT-AWP under 0% (natural/robust), 20% and 40% inherent symmetric/asymmetric label noise on CIFAR-10 with \(\ell _{\infty }\) threat model

Table 3 Robustness performance (%) on CIFAR-10 of NRAT-AWP and NRAT with 20% and 40% symmetric/asymmetric label noise

It is clear that NRAT-AWP achieves higher robust accuracy and significantly mitigates robust overfitting. The performance gap is less than 5% across all label noise rates.

4.4 AT with generated data

Currently, one of the most effective approaches in AT is leveraging additional data. For instance, Wang et al. (2023) used the elucidating diffusion model (EDM) (Turkeltaub et al. 2023) to generate millions of additional images for AT, leading to state-of-the-art performance on the RobustBench (Croce et al. 2020) leaderboard. However, it is worth noting that these generated datasets may also contain an unknown proportion of noisy labels. Out of curiosity, we also trained NRAT on these additional data. (We refer to their method as DM_AT in this section.)

Settings for this part We use the 1M generated images provided by Wang et al. (2023), following most of the settings outlined in Sect. 4.1. For each method (DM_AT and NRAT), we train a WideResNet-28-10 model on this large dataset. Additionally, as per Wang et al. (2023), we apply label smoothing with a value of 0.1 and set aside the first 1024 images of the training set as a fixed validation set in place of the CIFAR-10 test data: since the distribution of the generated data still differs from that of the CIFAR-10 test set, a fixed validation set sampled from the generated data provides a fairer comparison by eliminating the impact of the distribution gap. We train each method for 150 epochs to observe the training tendency. The performance on the validation set is shown in Fig. 5.

Fig. 5

The learning curves of natural accuracy and PGD robust accuracy on the validation set for DM_AT and NRAT using 1 M generated data with \(\ell _{\infty }\) threat model

Although the exact number of noisy labels in the generated dataset is unknown, it is clear from Fig. 5 that NRAT exhibits higher clean accuracy on the validation set compared to DM_AT. However, the robust accuracy of NRAT appears slightly lower than that of DM_AT. The best natural accuracy achieved is 77.4% for DM_AT and 82.66% for NRAT, while the best robust accuracy is 49.41% for DM_AT and 49.02% for NRAT.

5 Conclusion

In this paper, we first investigate the performance of existing AT methods when confronted with inherent label noise and observe that they exhibit poor generalization in this setting. To address this issue, we propose a novel noisy-robust adversarial training algorithm, NRAT, which incorporates a robust loss function and a more robust regularization term to enhance adversarial robustness in the presence of inherent label noise. This work combines techniques from the fields of noisy labels and AT, aiming to improve adversarial robustness on more realistic datasets. Comprehensive experiments show that, under inherent label noise, NRAT achieves comparable or superior performance to existing AT algorithms in terms of robust accuracy and robust overfitting. Furthermore, we empirically show that NRAT is well-suited for training with large generated datasets, which is the state-of-the-art practice for improving adversarial training.