1 Introduction

With the widespread application of deep learning in image recognition, research on adversarial samples has prompted people to rethink the security of deep networks in practical applications. Szegedy et al. [1] first introduced adversarial samples as small perturbations that do not affect human judgment but can make a deep learning model recognize the wrong category with high confidence. Since then, research on adversarial samples has developed rapidly. Within this research, transferable adversarial examples have become an important direction due to their flexible and extensive application scenarios. Among white-box attack algorithms, the Fast Gradient Sign Method (FGSM) [2] first creates adversarial samples by superimposing the sign of the single-step loss gradient on the original image. Building on this algorithm, subsequent studies proposed the Basic Iterative Method (BIM) [3], the Momentum Iterative Method (MIM) [4], and the Projected Gradient Descent method (PGD) [5] to improve the transferability of adversarial examples, which has been continuously improved through iteration, momentum, and gradient projection. Among black-box algorithms, the Natural Evolution Strategies (NES) [6] and Simultaneous Perturbation Stochastic Approximation (SPSA) [7] algorithms improve the effectiveness of adversarial perturbations obtained from gradient estimates, reducing the number of queries needed to enhance the performance of black-box attacks.

To promote robustness, ensembles have become an essential research direction for defending against adversarial samples. Intuitively, the voting mechanism produces a wrong prediction only when a majority of sub-models converge to the same wrong prediction. Essentially, well-calibrated uncertainty estimation for adversarial samples outside the training data distribution ensures the robustness of the ensemble model [8]. Related test results in [9] introduced the concept of sub-model diversity under ensemble conditions and experimentally demonstrated a specific correlation between ensemble robustness and sub-model diversity. Through wide competition between the attack and defense sides, the diversity metric is commonly reduced to diversity of model architecture. Attackers tend to include as many different sub-model architectures as possible in an ensemble to empirically build superior black-box surrogate models. For a defender, a more architecturally diverse ensemble likewise implies stronger robustness against adversarial examples [10]. Such empirical conclusions based on sub-model architecture rely on the weak correlation between architecture and gradients. Further studies have demonstrated that models trained on the same dataset without additional constraints tend to extract the same non-robust features [11, 12], reducing the effectiveness of such an empirical defense in practice.

Further research defines the diversity between sub-models through descriptions of adversarial transferability to enhance the ensemble's robustness [13,14,15]. These methods associate sub-model transferability with three diversity hypotheses: 1. diversity of output logits' distributions [13]; 2. diversity of adversarial subspaces [14]; 3. diversified overlap of non-robust features [15]. Each hypothesis has experimentally improved ensemble robustness. Their common problem is that transferability rests on abstract hypotheses rather than an accurate mathematical definition. This paper adopts a first-order approximation of model outputs to define adversarial transferability through the target singular value whose singular vector has the smallest Wasserstein distance to the source singular vector. Through the mathematical theory behind this metric, the hypotheses above are further explained theoretically, and their shortcomings in transferability evaluation are analyzed. Geometrically, Fig. 1 demonstrates the difference between the proposed evaluation method and these methods via the level set of the gradient optimization problem. This paper further utilizes the proposed transferability metric as a regularization constraint in ensemble training to isolate the transferability between sub-models. An accurate definition of transferability also disentangles the correlation between robustness and transferability through further analysis of ensemble robustness. The fundamental novelties of this paper are listed as follows:

  1. Through the first-order approximation of the Jacobian matrix of the CNN model, the Jacobian matrix's singular value decomposition is employed to quantitatively characterize the adversarial distribution. Given the complexity of the parameters and algorithms of adversarial sample generation, such a definition has more general applicability in defense evaluation.

  2. Based on the interpretability of optimal transport theory for CNN optimization, the optimal transport distances between different adversarial distributions are employed to describe the adversarial transferability between models and to analyze the deficiencies of existing transferability metrics.

  3. This paper employs the proposed adversarial transferability metric as a regularization term in ensemble training to realize transferability isolation between sub-models. The robustness analysis demonstrates and disentangles the correlation between transferability and robustness in ensembles.

Fig. 1 The illustration of different transferability metrics based on the optimization problem's level set. (a) The transferability upper bound defined by cosine distance; (b) The Kullback–Leibler divergence (KLD) constraint on transferability through the loss function; (c) The transferability metric defined in this paper

In the following, prior transferability hypotheses are reviewed in Section 2, and the preliminaries of transfer-based attacks and the first-order analysis of the Jacobian matrix are introduced in Section 3. Then, the proposed transferability metric is described in Section 4. Finally, Section 5 gives the experiments and discussion.

2 Related work

Evaluation metric for adversarial transferability

The concept of adversarial transferability was defined as a diversity metric in the study of ensemble robustness [8]. In early practice, sub-model transferability was first described as the diversity of model architectures. However, this evaluation metric limits the improvement of ensemble robustness [10]. Subsequent studies mostly start from the hypothesized correlation between diversity and sub-model transferability to propose different evaluation metrics for ensemble robustness. Based on the model's logit outputs, transferability was evaluated through the diversity of non-maximal predictions between sub-models in Adaptive Diversity Promoting (ADP) [13]. Based on the overlap of adversarial subspaces, it was evaluated through the diversity of gradient directions between sub-models in Gradient Alignment Loss (GAL) [14]. Based on the non-robust features extracted by the model, it was evaluated through the degree of non-robust feature space overlap in Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles (DVERGE) [15]. Unlike the assumptions above, this paper gives a first-order approximate expression of the adversarial distribution through optimization theory, and the characterization of transferability through optimal transport theory moves the definition of transferability toward a mathematical analysis of model attributes. Being quantitative, the metric can serve as a regularization constraint during training.

Model attribute analysis based on the Jacobian

With the growth of CNN technology, scholars are increasingly inclined to reveal the model's black-box attributes through theoretical analysis, and Jacobian matrix-based analysis has received growing attention. On the one hand, the Jacobian matrix's Frobenius norm is utilized to regularize the model's robustness during training [16, 17]. When adversarial samples were first discovered, the spectral norm of a given layer's weights was considered a metric for evaluating the model's sensitivity [1]. This spectral norm has also been studied as a constraint on the generalization performance of CNNs [18], and even to promote the efficiency of generative adversarial networks (GAN) in generating diverse pictures [19, 20]. The weight matrix of a particular layer is the Jacobian of the feature extraction that layer performs. Building on model attribute extraction through the spectral norm, the Jacobian matrix's global spectral norm has been further utilized to constrain the model's robustness [21, 22].

Additionally, in contrast to the robustness problem, research on black-box attack algorithms [23, 24] adopted the eigendecomposition of the Jacobian matrix to improve the transferability of adversarial perturbations from a single gradient estimate, reflecting that the eigenvectors are more essential quantities for evaluating gradient changes. In [24, 25], it was revealed that the iterative generation procedure of adversarial samples is a first-order approximation to the maximum singular vector of the Jacobian matrix through the power method [26]. Thus, adversarial training is equivalent to spectral norm regularization of the Jacobian matrix. Based on these studies, the Jacobian matrix's Frobenius norm was connected with the transferability of Universal Adversarial Perturbations (UAP) through further mathematical analysis [27]. It can be theoretically proved that the variation of the upper bound under transfer attack is quantified by the cosine distance between the Jacobian matrices when the singular vectors are fully aligned.

Most existing robust optimization approaches for the Jacobian matrix are based on solving its norm. Although the norm computation skips the complete singular value decomposition and thereby simplifies the calculation, it cannot fully describe the distribution of adversarial samples. This paper converts the high-dimensional Jacobian matrix into a two-dimensional form for batched singular value decomposition, which serves as a simplified expression of the adversarial distribution, without generating adversarial samples, to evaluate transferability under the Jacobian matrix.

3 Background

3.1 Transfer-based black-box attacks algorithm

The white-box attack assumes that the model's classification is approximately linear in the high-dimensional feature space. Thus, the gradient of the loss function reflects the optimal direction toward the classification boundary. Under such a linear assumption, gradient-based white-box adversarial attack algorithms were developed. The FGSM algorithm was proposed under the single-step gradient direction as (1):

$$ x^{\prime}=x+\epsilon \cdot \operatorname{sign}(\nabla_{x}J(f,x,y)) $$
(1)

where \(x^{\prime }\) is the adversarial example, and ∇xJ(f,x,y) is the gradient of the loss function calculated under model f, image x, and label y.
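For concreteness, (1) translates directly into a few lines of PyTorch. The sketch below is a minimal illustration, assuming a classifier `model` returning logits, cross-entropy as the loss J, and pixel values in [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step FGSM as in (1): x' = x + eps * sign(grad_x J(f, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # J(f, x, y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()     # keep pixels in the valid range (assumed [0, 1])
```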

This paper considers the attack transferability of adversarial examples generated from a surrogate model to a target model. Although the attacker knows neither the parameters nor the gradient information of the target model, it can obtain a surrogate model trained on the same dataset. The adversarial sample generated from the surrogate model with a gradient-based algorithm thus makes the scenario a transfer-based black-box attack. To simultaneously improve the attack performance of adversarial examples on the surrogate model and their transferability to the target model, iterative gradient-based methods have been studied, described with the following equation:

$$ x_{t+1}=\underset{X+s}{\Pi}\left( x_{t}+\epsilon \cdot \operatorname{sign}\left( \nabla_{x} J(f,x_{t},y)\right)\right) $$
(2)

where \(\Pi _{X+s}\) represents the constraint on the perturbation after iteration t. Among various iterative attack algorithms, the PGD method can improve adversarial transferability. The PGD method has been widely utilized in robustness evaluation and adversarial training due to its excellent attack performance. In PGD, \(\Pi _{X+s}\) projects the perturbation back onto the allowed sphere, combined with random initialization of the perturbation to improve transferability. In the following experiments, the adversarial distillation of DVERGE and the attack-based robustness analysis are performed with the PGD method.
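A minimal L2-PGD sketch matching the evaluation settings used later in this paper (50 iterations, step size eps/5, random initialization) is given below; the Linf variant in (2) would use the sign of the gradient instead of its L2 normalization. The helper is illustrative, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def pgd_l2(model, x, y, eps, steps=50):
    """Untargeted L2 PGD: random start, normalized gradient ascent, projection onto the eps-ball."""
    alpha = eps / 5
    delta = torch.randn_like(x)                               # random initialization
    delta = eps * delta / delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        g = grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12
        delta = (delta + alpha * grad / g).detach()
        d = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
        delta = delta * (eps / d).clamp(max=1.0)              # projection Pi_{X+s}
    return (x + delta).clamp(0, 1)
```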

3.2 Robust first-order analysis based on the Jacobian matrix

Define f(x) as the logit output of a convolutional neural network f under image x, while \(J_{f}(x) = \left (\frac {\partial f_{i}}{\partial x}\right )\mid _{x}\) is the Jacobian matrix under image x. When the disturbance δ is small enough, the model's output f can be linearly represented as the first-order Taylor series expansion through the Jacobian matrix Jf(x) by ignoring the higher-order terms:

$$ f(x+\delta) \simeq f(x)+J_{f}(x) \delta $$
(3)

The Lq norm measures the degree of variation of the model's output, and the Cauchy–Schwarz inequality gives an upper bound on the output variation:

$$ \left\| f(x+\delta)-f(x) \right\| \approx \left\| J_{f}(x)\delta \right\|_{q}\leq \left\| J_{f}(x) \right\|_{F}\left\| \delta \right\| $$
(4)

According to the above mathematical analysis, the Jacobian matrix's Frobenius norm defines an upper bound on a single model's robustness. By converting the Jacobian matrix to a stacked Jacobian matrix \(\bar {J}_{N}\), the upper bound in (4) can also define a theoretical upper bound of the transfer attack, which is described with the cosine distance between the Jacobian matrices in [14, 27] as GAL:

$$ \cos \left( J_{i}, J_{j}\right)_{i \neq j} = \frac{\left\langle J_{i}(x), J_{j}(x)\right\rangle}{\left\|J_{i}(x)\right\|_{F}\left\|J_{j}(x)\right\|_{F}} \leq 1 $$
(5)

The upper bound in (5) is attained when different Jacobian matrices share their corresponding singular vectors and their singular values agree up to a fixed scalar. Through the relationship between the adversarial perturbation direction and non-maximal predictions, ADP [13] defines the diversity of adversarial perturbation gradients through the divergence of the non-maximal predictions. Understanding adversarial transferability from a distance perspective, ADP intuitively characterizes the KLD between the optimal adversarial perturbation gradients through the classification loss.

As shown in Fig. 1, there are problems with these two definitions of transferability: (1) The Frobenius norm only describes the upper-bound requirement on the degree of alignment [14, 27]. Since minimizing this metric during training is an indirect way to isolate transferability, it cannot accurately characterize transferability under incomplete alignment; (2) ADP achieves the KLD constraint between the optimal perturbations through the classification loss. However, adversarial transferability does not necessarily occur between the optimal perturbations: transferability can still exist when a sub-optimal perturbation at a local minimum differs little from the optimal one in the model's output yet has a closer KLD. Taking these two issues as the starting point, this article defines an evaluation metric in the next section for measuring the transferability of adversarial samples and optimizes this metric as a regularization term in network training to diversify sub-models, which finally isolates the transferability between sub-models.

4 Method

4.1 Preliminaries of SVD and transferability

From the perspective of optimization theory, the adversarial sample optimization in (4) aims to maximize \(\left \|J_{f}(x) \delta \right \|_{q}\). When q = 2, the goal of adversarial sample optimization can be simplified to a constrained optimization problem over quadratic forms:

$$ \begin{array}{@{}rl@{}} \underset{\delta}{\text{maximize}} & \delta^{T} Q \delta \\ \text{subject to} & \delta^{T} P \delta = k \end{array} $$
(6)

where Q = J^TJ. Since the norm is homogeneous, k is set to 1 to solve the optimization problem (6). Given the Lagrangian function \(l(\delta, \lambda )=\delta^{T} Q \delta+\lambda \left (1-\delta^{T} P \delta\right )\) for the constrained optimization problem (6), the following Lagrange condition can be obtained:

$$ P^{-1} Q \delta = \lambda \delta $$
(7)

Therefore, the eigenvectors of P− 1Q are the candidate optimal solutions of the objective function (6). When the perturbation constraint is also under the L2 norm, P is the identity matrix, and the maximum eigenvalue of Q is the maximum value of the cost function (6). It can be seen that the singular vectors of the Jacobian matrix J essentially define the possible locally optimal solutions of δ. Thus, the entire singular matrix describes the model's adversarial distribution. The singular value corresponding to each singular vector reflects the output disturbance that the corresponding adversarial direction can generate, while the maximum singular value defines the maximum output variation of the model under the L2 norm. This also reveals why the spectral norm is a more stringent constraint than the Frobenius norm for a single model's robustness. Each locally optimal perturbation is approximated under the first-order condition through the singular value decomposition of the Jacobian matrix. The transferability of adversarial examples thus essentially becomes the degree of alignment between the singular vectors of different Jacobian matrices [25].
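Under the L2 setting (P is the identity), this analysis can be checked numerically. The sketch below, assuming a small model and a single image (the full Jacobian is expensive for large inputs), recovers the adversarial distribution of (6)-(7) via SVD:

```python
import torch
from torch.autograd.functional import jacobian

def adversarial_distribution(model, x):
    """SVD of the Jacobian at x: the right singular vectors are the candidate
    locally optimal perturbation directions delta of (6)-(7), and each singular
    value is the output variation that direction produces under the L2 norm."""
    J = jacobian(lambda inp: model(inp.unsqueeze(0)).squeeze(0), x)  # [num_classes, C, H, W]
    J = J.flatten(1)                                                 # [num_classes, C*H*W]
    _, S, Vh = torch.linalg.svd(J, full_matrices=False)
    return S, Vh   # S[0] is the spectral norm; Vh[0] is the worst-case L2 direction
```

Reshaping `Vh[0]` back to the image shape gives the first-order worst-case perturbation direction.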

In the DVERGE method [15], the degree of alignment is defined by (8) as the adversarial loss of adversarial examples distilled between sub-models. Transferability is explicitly defined by the loss of adversarial examples under the target model. \(x_{f_{i}}^{\prime }\) is the ys-targeted adversarial example distilled from the source model fi and evaluated on the target model fj (i ≠ j), based on image x. Due to the randomness of the image input pairs (x, y) and (xs, ys), the overall transferability metric through the loss function is expressed as an expectation.

$$ \begin{array}{@{}rcl@{}} d\left( f_{i}, f_{j}\right)_{DVE}&=&\frac{1}{2} \mathrm{E}_{(x, y),\left( x_{s}, y_{s}\right)}\left[l_{f_{j}}\left( x_{f_{i}}^{\prime}\left( x, x_{s}\right), y\right)\right.\\ &&\left.+l_{f_{i}}\left( x_{f_{j}}^{\prime}\left( x, x_{s}\right), y\right)\right] \end{array} $$
(8)

By minimizing this adversarial loss, the transferability of the largest singular vectors from the source model to the target model can be explicitly constrained, but the effect depends on the adversarial sample generation algorithm and its parameters. The starting point and main innovation of this paper is therefore: how to evaluate the degree of alignment more reasonably, without relying on prior information about adversarial samples, and thereby achieve a widely applicable evaluation of transferability.

This paper accurately defines adversarial transferability through the adversarial distribution characterized by the Jacobian matrix's singular value decomposition. As shown in Fig. 2, a parameter update based on the classification loss function can be considered a constraint on the KLD between two distributions. Wasserstein GAN (WGAN) [28] indicated that constraining the variation between the logit outputs of different images is essentially a constraint on the Wasserstein distance. Transferred to the adversarial setting, the transferability metric defined through adversarial samples in DVERGE [15] can similarly be expressed as a GAN-style discrimination constraint that distinguishes adversarial examples coming from other sub-models. Without a loss function, the Wasserstein distance is the more efficient optimization objective between two distributions in optimal transport theory [28]. More experiments on other distances can be found in Section 5.1.

Fig. 2 The transferability is analyzed by the optimization level set based on the SVD and adversarial distribution. (a) DVERGE illustrates transferability through the target singular value with the smallest KLD; (b) The quantitative metric of transferability based on the singular matrix with the smallest Wasserstein distance

4.2 Transferability metric based on Wasserstein distance

Based on the above characterization of the adversarial distribution and of distances related to transferability, a more accurate assessment of transferability can be defined: given the singular vector corresponding to the maximum singular value of the source Jacobian matrix, the singular value corresponding to the target Jacobian singular vector that minimizes the Wasserstein distance reveals the approximate output variation under a transferred attack.

The following equation describes this transferability in terms of the singular vectors (s_vec) and the singular values (s_val):

$$ Metric\_trans_{S \to T} = \frac{\underset{s\_val_{{J_{T}}}}{\arg\max} P\left( \max \left( s\_vec_{J_{S}}\right) \rightarrow s\_vec_{J_{T}}\right)}{\underset{C}{\arg\max} P\left( \max \left( s\_vec_{J_{S}}\right) \rightarrow s\_vec_{J_{T}}\right)} $$
(9)

where the subscripts S and T stand for the source and target models, respectively. P is the distribution transfer matrix obtained by calculating the Wasserstein distance, while C is the corresponding transport cost factor. Algorithm 1 shows the flow of the method in this section and focuses on the dimensional changes of the Jacobian matrix through the SVD solution process.

Algorithm 1 Transferability metric based on Jacobian matrix singular value decomposition

The CNN model maps high-dimensional data input to low-dimensional feature vectors. After projection along the loss gradient direction, the Jacobian matrix still has dimensions [batch-size, image-channel, image-size, image-size]. Among the available multi-way SVD methods [29], Higher-Order SVD (HOSVD) reveals universal adversarial samples when the Jacobian matrix is flattened along the batch-size dimension. Since adversarial transferability is defined as misclassification across different models under the same perturbation and sample, the calculation of the transferability metric should be sample-centric across models. HOSVD is therefore not a sample-centric transferability evaluation method, and it incurs a high computational cost from high-dimensional operations and constraints. Similarly, the Two-Way SVD on Average of X method ignores the sample-centric definition of transferability, and the singular values and singular vectors calculated from Canonical Polyadic Decomposition lack a sample-to-sample correspondence. With Replicated SVD, the singular value matrix under different samples can be obtained, and the dimension of the singular value matrix is compatible with the image size, reducing the computational complexity and GPU memory footprint. To obtain a sample-centric singular value decomposition at an acceptable computational cost for the high-dimensional matrix, the Jacobian matrix is merged after loss projection along the batch-size and image-channel dimensions. Finally, 2D batched SVD is achieved through Replicated SVD in the dimensions [image-size, image-size]. This is also the main computational difference from methods that depend on adversarial sample generation: although the proposed method does not generate adversarial samples, it still yields a numerical evaluation of transferability.
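As one concrete reading of (9) and Algorithm 1 (which is not reproduced here), the sketch below operates on two per-sample 2D Jacobians of shape [image-size, image-size] obtained after the loss projection and dimension merging described above. The normalization by the transport cost C is an assumption of this sketch, and `scipy.stats.wasserstein_distance` stands in for the paper's optimal transport computation:

```python
import torch
from scipy.stats import wasserstein_distance

def metric_trans(J_src, J_tgt):
    """Hedged sketch of (9): match the top source singular vector to the target
    singular vector with the smallest Wasserstein distance, then report the
    associated target singular value normalized by the transport cost."""
    _, S_s, Vh_s = torch.linalg.svd(J_src)
    _, S_t, Vh_t = torch.linalg.svd(J_tgt)
    src = Vh_s[0].abs().detach().numpy()          # singular vector of the largest source singular value
    dists = [wasserstein_distance(src, v.abs().detach().numpy()) for v in Vh_t]
    k = min(range(len(dists)), key=dists.__getitem__)
    return float(S_t[k]) / (dists[k] + 1e-12)     # matched singular value over transport cost C (assumed)
```

Note that this NumPy-based version is evaluation-only; a training-time regularizer would need a differentiable transport distance.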

4.3 Training routine of ensemble based on transferability metrics

After calculating the transferability metric between the different sub-models, the obtained metric is added to the traditional classification loss as a joint objective for ensemble training. After obtaining the constrained loss of each sub-model, this section optimizes the parameters using the ensemble as a whole model. Equation (10) shows the overall loss function for the optimization problem.

$$ \text{ensemble\_loss} = \lambda \sum\limits_{i=1}^{N} l_{\text{class}}\left( f_{i}(x), y\right)+\sum\limits_{i=1}^{N} \sum\limits_{j \neq i} \text{Metric\_trans}_{i,j} $$
(10)

Algorithm 2 shows the flow of the overall optimization process. It indicates the practical importance of L2 regularization of the Jacobian matrix and of gradient clipping while updating parameters. Under these conditions, the validity of the Wasserstein distance solution can be guaranteed under the first-order approximation.

Algorithm 2 Ensemble network optimization based on transferability metric
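A minimal sketch of one update step of Algorithm 2 and the joint objective (10) is shown below. `metric_trans_pair` is a hypothetical differentiable stand-in for the pairwise metric of Section 4.2, and the clipping threshold is an assumption:

```python
import torch
import torch.nn.functional as F

def train_step(sub_models, optimizer, x, y, lam, metric_trans_pair):
    """One optimization step of the joint loss (10) over the whole ensemble."""
    optimizer.zero_grad()
    class_loss = sum(F.cross_entropy(f(x), y) for f in sub_models)
    trans_loss = sum(metric_trans_pair(f_i, f_j, x, y)          # hypothetical helper (Section 4.2)
                     for i, f_i in enumerate(sub_models)
                     for j, f_j in enumerate(sub_models) if i != j)
    loss = lam * class_loss + trans_loss
    loss.backward()
    params = [p for f in sub_models for p in f.parameters()]
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)        # gradient clipping (Algorithm 2); threshold assumed
    optimizer.step()
    return loss.item()
```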

Different experimental settings are all trained in parallel on 4 GPUs. The batch size and the initial learning rate are 100 and 0.001, respectively, and the sub-model architectures are all ResNet-20 [30]. The experiments confirm the same conclusion as DVERGE during training: an Adam optimizer obtains better transferability isolation when training from scratch. It should be emphasized that the parameter update in this paper does not depend on any adversarial sample generation process, so there is no random factor from the random initialization of adversarial samples to improve diversity. The transferability constraints on the ensemble model are derived only from the evaluation metric.

5 Experiment and results

The experiments were first conducted on the CIFAR-10 dataset [31]. The benchmark model (Baseline) is an ensemble network with ResNet-20 as the sub-model architecture, where each sub-model employs the cross-entropy loss as its optimization goal. The batch size was set to 128, and the learning rate to 0.001. The Adam optimizer is employed for 200 epochs of model training, and the learning rate decays by a factor of 0.1 at epochs 100 and 150. Taking the baseline model as the benchmark, the comparison models ADP, GAL, and DVERGE are obtained under the same parameter conditions and model architecture with different transferability constraints in the loss function. Among the different constraint methods, the ensemble models of Baseline and Adversarial training [32] set the optimizer and loss function of each sub-model to train independently to ensure the sub-models' independence. The sub-model architecture, training parameters, and optimizer for the proposed transferability metric are the same as the benchmark setting. Under the same training parameters, the following sections present a fair and effective comparison of the models trained with different constraints.

5.1 Transferability evaluation of adversarial distributions with different distance functions

In this part, under the theoretical characterization of the adversarial distribution by SVD, the influence of different distance functions in the denominator of (9) on the performance of the transferability metric is discussed to demonstrate the advantage of the Wasserstein distance. Following the isolation effects of DVERGE (DVE) [15], GAL [14], ADP [13], and Baseline (Base) on transferability in previous studies, this paper employs the attack success rate of adversarial samples between sub-models as the gold standard for the transferability metric. The degree of consistency between the metric under different distance measures and the gold standard demonstrates the effectiveness of the Wasserstein distance (Wass) in transferability evaluation. For comparison, distance metrics commonly used in machine learning are chosen, including the Euclidean distance (L2), the Cosine distance (Cos), and the Maximum Mean Discrepancy (MMD) applied in domain transfer problems.

Regarding the specific details of this part, the experiment is performed on the same set of 2000 CIFAR-10 test images. For each sample, the transferability metric of a sub-model against the other sub-models is obtained as described in Section 4.2, merging the batch-size and image-channel dimensions, and the average over all sub-models and samples is finally given as the evaluation result for the ensemble model. Tables 1, 2 and 3 show the contrastive experiment results with 3, 5, and 8 sub-models, respectively. The best transferability-isolated model evaluated by the corresponding metric is marked with black font.

Table 1 Contrastive experiments on transferability evaluation through different distance metrics under three sub-models. The best transferability evaluated by the corresponding metric is marked with black font
Table 2 Contrastive experiments on transferability evaluation through different distance metrics under five sub-models. The best transferability evaluated by the corresponding metric is marked with black font
Table 3 Contrastive experiments on transferability evaluation through different distance metrics under eight sub-models. The best transferability evaluated by the corresponding metric is marked with black font

It can be seen from the results that, under the gold-standard transferability metric, the transferability isolation effect from high to low is DVERGE, GAL, ADP, and Baseline, respectively. Evaluating these models under the different distance measures, the transferability results under the Cosine distance deviate too far from the gold standard to serve as a metric. The evaluation results of the other distances agree on the best-isolated model. However, the MMD and Euclidean distances differ from the gold standard in the remaining rankings. Overall, the transferability evaluations with the Wasserstein distance are the most consistent with the actual attack results, demonstrating the validity of (9) in evaluating transferability under the Wasserstein distance.

5.2 Experiment of transferability evaluation between sub-models

The experiments in this section use different transferability metrics to cross-evaluate the different transferability-isolated ensemble models, in order to study the correlation between the metrics and demonstrate their effectiveness. The experiment contains four evaluation metrics: the metrics in DVERGE, GAL, and this paper, and the success rate of adversarial transfer attacks. In addition to the models obtained with the above three metrics as constraints, this experiment further includes the Baseline without any constraints, the ADP model, the adversarial training (AT) model, and the DVERGE model combined with adversarial training (DVE+AT) for comparative analysis. The attacker employs the PGD untargeted attack to generate adversarial samples on each sub-model and attack the other sub-models in the ensemble. The gold standard for evaluation is the average success rate of transfer attacks from a single sub-model against all other sub-models. The experiments report results under different disturbance constraints to show their consistency. All experiments are performed on the CIFAR-10 dataset. The metric in DVERGE through the classification loss is defined in (8). The metric in GAL is defined by (11) as follows:

$$ d\left( f_{i}, f_{j}\right)_{GAL} = \log \left( \sum\limits_{1\leq a < b\leq N} \exp \left( \frac{\left\langle J_{a}, J_{b}\right\rangle}{\left\|J_{a}\right\|_{F}\left\|J_{b}\right\|_{F}}\right)\right) $$
(11)
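For reference, (11) amounts to a log-sum-exp over the pairwise cosine similarities of flattened Jacobians; a minimal sketch:

```python
import torch

def gal_diversity(jacobians):
    """GAL term in (11): log-sum-exp of pairwise cosine similarities <J_a, J_b>."""
    flat = [J.flatten() for J in jacobians]
    sims = [torch.dot(flat[a], flat[b]) / (flat[a].norm() * flat[b].norm())
            for a in range(len(flat)) for b in range(a + 1, len(flat))]
    return torch.logsumexp(torch.stack(sims), dim=0)
```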

Equation (8) depicts, through the recognition loss under random target categories, the degree of output variation caused by adversarial examples distilled from one sub-model and applied to the others. The higher this diversity metric, the smaller the adversarial sample transfer between sub-models. Conversely, (9) and (11) are directly proportional to transferability under numerical quantification. For comparison, this section gives the average of the different metrics over 2000 test samples under different numbers of ensemble sub-models. Tables 4, 5 and 6 show the results under different numbers of ensemble sub-models. The top two methods in transferability evaluated under each metric are marked in black. The percentage behind each model represents its accuracy on clean test samples.

Table 4 Cross-evaluation of different transferability-isolated models under three sub-models. The top two methods of transferability evaluated under each metric are marked in black
Table 5 Cross-evaluation of different transferability-isolated models under five sub-models. The top two methods of transferability evaluated under each metric are marked in black
Table 6 Cross-evaluation of different transferability-isolated models under eight sub-models. The top two methods of transferability evaluated under each metric are marked in black

Observing the proportion of top-two results occupied by each method across the different ensemble models, the ensemble model obtained with the proposed metric shows evaluation consistency superior to the other metrics. A relatively effective transferability isolation effect is obtained under the gold standard for both L2 and Linf constraints, reflecting an effective transferability evaluation metric. It is worth noting that, as the number of sub-models increases, the transferability-constraint-based methods fall behind adversarial training, indicating that taking transferability isolation as the optimization goal hits a bottleneck as the number of sub-models grows. Although adversarial training imposes no constraint on transferability, it performs well at isolating it. The relevant reasons can be analyzed in two aspects: (a) the robustness that adversarial training brings to a single model is also reflected in the transfer attack success rate; (b) the random initialization of the adversarial sample algorithm brings more diverse parameter update directions to model training. Further discussion is given in the subsequent analysis of transferability and robustness.

Excluding the evaluation results related to adversarial training, the proposed method achieves a transfer isolation effect closest to that of DVERGE, which defines transferability explicitly through adversarial examples. Since no adversarial samples are generated, the proposed metric is free of the randomness from the initialization and attack categories of adversarial samples, making the quantification more accurate; it also lacks the gradient update diversification brought by that randomness during optimization. Because the evaluation of adversarial transferability in the presented method is unrelated to the complexity of the adversarial sample algorithm and its parameters, the metric has greater general applicability in defense evaluation.

5.3 Experiment of ensemble robustness

After obtaining the transferability isolation performance of the sub-models, this section aims to demonstrate the relation between sub-model transferability and ensemble robustness through adversarial robustness analysis. The ensemble robustness experiments are categorized into white-box and black-box attacks. Under the white-box attack, the experiments evaluate the PGD method under different L2 perturbation limits. PGD generates untargeted adversarial perturbations using 50 iterations with an eps/5 iteration step size and 5 random initializations. The ensemble robustness of the models trained under the different transferability constraints is shown in Fig. 3(a)-(c). The horizontal axis stands for the attack perturbation, while the vertical axis stands for recognition accuracy.
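The white-box curves of Fig. 3(a)-(c) correspond to sweeping the perturbation budget. A sketch of that evaluation loop, reusing the `pgd_l2` helper sketched in Section 3.1 (simplified to one random restart):

```python
def robust_accuracy(model, loader, eps_list):
    """Accuracy of an ensemble (callable returning averaged logits) under L2 PGD
    for each perturbation budget; yields the accuracy-vs-eps curves of Fig. 3."""
    curve = {}
    for eps in eps_list:
        correct = total = 0
        for x, y in loader:
            x_adv = pgd_l2(model, x, y, eps, steps=50)
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
            total += y.numel()
        curve[eps] = correct / total
    return curve
```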

Fig. 3 Robustness evaluation results for different perturbations. (a)-(c) results of the PGD white-box attack under the L2 norm with 3, 5, and 8 sub-models in the ensemble, respectively; (d)-(f) results of the black-box attack using the baseline model as the surrogate with 3, 5, and 8 sub-models in the ensemble, respectively

As shown in Fig. 3(a)-(c), DVERGE's ensemble model provides the best robustness among all methods under the white-box attack. DVERGE combined with adversarial training has the worst clean accuracy but the smallest drop in accuracy under the white-box attack, indicating the best robustness. Compared with the Baseline, ADP shows no noticeable improvement in robustness; GAL cannot provide better robustness than ADP under small disturbances. However, the robustness of GAL exceeds that of ADP as the perturbation increases, ranking second only to DVERGE. In contrast, although the proposed method achieves better transferability isolation than GAL and ADP, it provides the worst robustness under white-box attacks. This conclusion differs from previous conclusions on ensemble robustness and transferability. Through the theoretical analysis of singular value decomposition in this paper, this difference may arise because the previous methods implicitly improve the sub-models' robustness.

According to (9), transferability is mainly characterized by the target singular value with the smallest distance to the source singular vector. Considering the robustness constraint on the largest singular value, reducing the largest singular value (improving robustness) will in any case indirectly decrease the singular value with the closest distance, thereby achieving transferability isolation. The GAL method takes transferability as its starting point; however, due to the upper-bound constraint condition, GAL essentially constrains the singular values during optimization without considering the mutual distance between singular vectors. GAL tends to decrease all models' singular values through sub-model interaction constraints in the optimization process. The transferability upper bound proposed by GAL attains a small value when the Frobenius norm of each sub-model's Jacobian matrix is small, i.e., when the sub-models are relatively robust through (4). Although the DVERGE method explicitly defines adversarial transferability, the number of iterations, the iteration step size, and the initialization direction affect the convergence rate of the adversarial samples from the gradient descent perspective of adversarial optimization. The small number of iterations (DVERGE sets it to 10) and the random initialization conditions (random initialization of adversarial samples and randomization of adversarial classes) in the generation process may cause incomplete convergence of the distilled adversarial examples and, to some extent, achieve adversarial training on the other sub-models. This also explains why different iteration step sizes affect the robustness of the ensemble under the DVERGE method.

The metric in this paper does not constrain the maximum singular value of the sub-models, which is an important aspect distinguishing the proposed method from the others. Looking only at the success rate of adversarial transfer attacks, adversarial training also achieves a good transfer isolation effect, but this effect is based on an overall reduction of the singular values. Even though DVERGE achieves excellent transferability isolation, its best robust performance is still obtained by adding adversarial training. Based on Tables 4, 5 and 6, DVERGE with adversarial training shows a decline in transferability isolation. This shows, from another perspective, that there is no inherent correlation between ensemble robustness and transferability. The transferability metric adopted in this paper accurately disentangles the correlation between transferability and robustness in ensembles: although good robustness can achieve better transferability isolation, transferability isolation cannot achieve robustness, even under ensemble conditions.

The black-box experiments use transferable adversarial samples generated by different attack methods with the baseline model as the surrogate, following DVERGE. Three types of attack methods, PGD [5], M-DI2-FGSM [33], and SGM [34], are employed, and the final accuracy is computed comprehensively over the different types of adversarial samples. For each sample, 30 adversarial counterparts are collected for black-box testing according to the permutations and combinations of the loss function, the number of surrogate models, and the attack methods. In terms of evaluation criteria, following DVERGE's settings, the model is deemed robust on a sample only if all 30 types of adversarial counterparts are correctly identified. Figure 3(d)-(f) presents the black-box attack evaluation results. Owing to transferability isolation against more transfer attacks, the proposed method provides better results than the Baseline and ADP. However, it remains inferior to the DVERGE and GAL methods in terms of robustness.

The above black-box experiments show that transferability isolation provides relatively limited black-box robustness. To further clarify the advantages of transferability isolation, this section assumes different attack scenarios to explore the key role of transferability in adversarial example defense. When the model parameters are treated as protected private data, the attacker has limited knowledge of the sub-models. The experiment evaluates the influence of partial model leakage on the robustness of the ensemble. The ensemble is attacked with untargeted adversarial samples generated on each sub-model under white-box conditions, and the average attack success rate over the different sub-models is taken as the robustness result. The attack adopts PGD under the L2 constraint with 50 iterations, an eps/5 iteration step size, and 5 random initializations. Table 7 presents the relevant results, with the best results marked in red.

Table 7 Robustness analysis of the ensemble model in the case of leakage of a single sub-model

Based on Table 7, the ensemble model that optimizes the transferability metric of this paper shows good robustness under the white-box attack caused by the leakage of some sub-model parameters, and this robustness increases with the number of sub-models. This shows that the contribution of transferability isolation to robustness lies not in traditional white-box or black-box attack scenarios but in white-box transfer attacks caused by the leakage of some model parameters. Transferability isolation without constraining the maximum singular value steers the white-box attacker's adversarial example generation in directions that do not transfer to the remaining sub-models, reducing the impact of model data leakage on overall robustness.

5.4 Robust radius analysis based on the decision boundary

The above contradictory correlation results were found under different experimental settings. This section combines them with the analysis of singular values to disentangle the correlation between robustness and transferability. In the previous analysis, the attack success rate over many samples served as the common gold standard of both transferability and robustness. Based on the results in Tables 4, 5 and 6 alone, it is impossible to judge which factors influence the attack success rate. This also leads to inconsistencies between the different metrics and the gold standard in the comparison, which is not conducive to concluding the disentanglement of transferability and robustness. To further illustrate their correlation and support our conclusion, this section uses the robust radius represented by the decision boundary of a single sample for a complete demonstration.

The decision boundary sample points lie within a 2D plane spanned by an adversarial direction and a random Rademacher vector around a test image. The classification output of these sampled points is then evaluated and visualized. The model prediction under different perturbations is drawn with the gradient direction obtained from the sub-model's loss function as the vertical axis and the random Rademacher direction as the horizontal axis. Different colors represent different categories. Figure 4 presents the relevant results for the 3 sub-models of the ensemble.
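A sketch of this visualization, assuming a single image `x` of shape [C, H, W] and a precomputed loss-gradient direction `adv_dir` of the same shape; the grid extent and resolution are illustrative:

```python
import torch
import matplotlib.pyplot as plt

def decision_plane(model, x, adv_dir, extent=2.0, n=51):
    """Predicted classes on the plane spanned by the (normalized) gradient
    direction and a random Rademacher direction around a test image."""
    rad = (torch.randint(0, 2, x.shape).float() * 2 - 1)   # random Rademacher vector
    rad, adv = rad / rad.norm(), adv_dir / adv_dir.norm()
    ts = torch.linspace(-extent, extent, n)
    preds = torch.zeros(n, n, dtype=torch.long)
    with torch.no_grad():
        for i, a in enumerate(ts):          # gradient (vertical) axis
            for j, r in enumerate(ts):      # Rademacher (horizontal) axis
                preds[i, j] = model((x + a * adv + r * rad).unsqueeze(0)).argmax(dim=1).item()
    plt.imshow(preds.numpy(), origin="lower", extent=[-extent, extent, -extent, extent])
    plt.xlabel("Rademacher direction"); plt.ylabel("gradient direction")
    plt.show()
```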

Fig. 4 Decision boundaries on the same sample for 3 sub-models under different transferability constraints

Taking the baseline model as the robustness benchmark, it can be seen that the baseline has a certain robustness in both the sub-models and the final ensemble, rather than being trivially attackable. AT significantly expands the robust region (blue area in the graph) in both the sub-models and the ensemble compared with the baseline model. Consistent with the analysis in the previous section, although the GAL and DVERGE methods take transferability as their starting point, they still promote the model's robustness to some extent; this improvement is reflected in the decision boundary as an expansion of the blue area relative to the baseline. In contrast, due to the lack of a constraint on the largest singular value, the transferability-isolated model under the proposed method even shows weaker robustness than the baseline. These robust-radius evaluations agree with the robustness experiments under white-box attacks. They also confirm that the improvement in ensemble robustness in previous research is partly due to an imprecise definition of transferability, which makes the constraints tend to optimize the robustness of the sub-models. This further confirms our disentanglement conclusion from the previous section: the isolation of transferability cannot by itself achieve ensemble robustness.

5.5 Transferability contrastive analysis under ImageNet

To evaluate the effectiveness of the proposed method on different benchmark datasets, this section conducts experiments with the different transferability constraint methods on the ImageNet dataset [35]. This section reproduces the transferability constraints of DVERGE, ADP, GAL, Baseline, and AT under ImageNet. The benchmark model is an ensemble with 3 ResNet-50 sub-models, where each sub-model employs the cross-entropy loss as its optimization goal. The optimizer is SGD with 100 epochs. Among the constraint methods, the ensemble models of Baseline and AT set the optimizer and loss function of each sub-model to train independently to ensure the sub-models' independence. The other constraint methods are applied with a unified optimizer and a joint loss function to ensure that adversarial transferability is not affected by random factors in the optimization process. For the training hyperparameters, the batch size is 256 and the learning rate is 0.1, reduced by a factor of 0.5 every 10 epochs. Due to the significant memory footprint of the Jacobian matrix, the proposed method sets the batch size to 128; the other training hyperparameters and the sub-models of the ensemble are identical to the other methods. All models are trained with parallel acceleration on 8 RTX 3060 GPUs.

After obtaining models under the different loss function constraints on the ImageNet dataset, experiments are performed on sub-model transferability and on ensemble robustness under single-model leakage, to detect differences from the conclusions on the small-scale dataset. As discussed in [5], for transfer-based attacks on models with significant capacity, the stronger the attack, the worse it generalizes as a transferable adversarial sample. Therefore, this section evaluates transferability using the FGSM attack algorithm under the L2 constraint. The evaluation criteria follow Sections 5.2 and 5.3, with accuracy and attack success rate reported as top-1 accuracy. Table 8 shows the relevant results.

Table 8 Experiments on transferability and robustness on the ImageNet with FGSM attack under L2 constraint

Analyzing the adversarial transferability, the Baseline can achieve transferability isolation under FGSM attacks through random initialization of sub-models and independent SGD optimization. This illustrates the negative correlation of network capacity and attack capability with adversarial transferability, and indicates the significant effect of randomness on transferability isolation under a large-scale dataset. Compared with the Baseline, DVERGE, GAL, ADP, and AT each provide a certain improvement in transferability isolation. Unlike the conclusion under CIFAR-10, the transferability isolation effect of ADP is more effective than that of GAL, reflecting the specific performance boundaries of first-order analysis on ImageNet. Furthermore, the proposed method achieves better transferability isolation than ADP and GAL within a small perturbation range. However, this conclusion does not hold for larger perturbations, indicating that the effectiveness of the first-order analysis is limited by the perturbation range for larger models and datasets. The proposed method thus reaches the same conclusion as on CIFAR-10 only within a small perturbation range on ImageNet. The main reason is that the complexity of the model architecture and the classification task further narrows the perturbation range over which the approximation in (3) holds, so the first-order analysis is not sufficiently accurate under all perturbations. In general, the influence of network depth and width on the accuracy of the first-order approximation determines the complexity of transferability analysis on different datasets, which is also a common limitation of existing research on robustness and even transferability on ImageNet through first-order analysis.

6 Conclusion

This paper takes the transferability of adversarial samples between sub-models as the starting point for studying ensemble robustness. Through first-order approximation analysis under the Lagrange conditions of optimization theory, this paper characterizes the model's adversarial distribution and output variation based on the singular value decomposition of the Jacobian matrix. Based on this theory, the level set of the gradient optimization problem exposes the shortcomings of previous transferability metrics. This paper effectively redefines the transferability metric between models by applying optimal transport theory to the singular matrices: given the singular vector corresponding to the maximum singular value of the source Jacobian matrix, the singular value corresponding to the target Jacobian singular vector that minimizes the Wasserstein distance reveals the approximate output variation. The sub-models obtained with this transferability metric as a regularization term achieve the best transfer isolation performance without prior information about adversarial samples. Such a definition, as a mathematical analysis of model attributes, has more general applicability in defense evaluation given the complex parameters and algorithms of adversarial sample generation. Further ensemble robustness experiments and theoretical analyses disentangle the correlation between robustness and transferability: the surrogate transfer attack under partial parameter leakage better reflects the robustness benefit of transferability isolation. In future research, there are several potential directions: (1) ensemble robustness based on transferability should consider richer ensemble strategies, so that transferability isolation can improve robustness; (2) beyond the Replicated SVD method for multi-way Jacobian data, the singular values and vectors obtained with HOSVD form an important research direction for studying more properties of adversarial samples; (3) on large datasets such as ImageNet, effective transferability metrics based on higher-order analysis with greater capacity should be further discussed and studied.