SGMA: a novel adversarial attack approach with improved transferability

Deep learning models are easily deceived by adversarial examples, and transferable attacks are crucial because of the inaccessibility of model information. Existing SOTA attack approaches tend to destroy important features of objects to generate adversarial examples. This paper proposes the split grid mask attack (SGMA), which reduces the intensity of model-specific features by split grid mask transformation, effectively highlighting the important features of the input image. Perturbing these important features can guide the development of adversarial examples in a more transferable direction. Specifically, we introduce the split grid mask transformation into the input image. Due to the vulnerability of model-specific features to image transformations, the intensity of model-specific features decreases after aggregation while the intensities of important features remain. The generated adversarial examples guided by destroying important features have excellent transferability. Extensive experimental results demonstrate the effectiveness of the proposed SGMA. Compared to the SOTA attack approaches, our method improves the black-box attack success rates by an average of 6.4% and 8.2% against the normally trained models and the defense ones respectively.


Introduction
The last decade has witnessed the rapid development of deep neural networks (DNNs), convolutional neural networks (CNNs), and their applications in various vision-related tasks such as pedestrian trajectory prediction, image restoration, and face recognition [1][2][3][4][5][6]. Despite the impressive progress, prior studies show that deep learning systems are not always reliable and can be easily misled by carefully designed perturbations. Such malicious perturbations are referred to as adversarial noise, and the resulting data (raw data plus perturbation noise) are called adversarial examples or adversarial attacks [7,8]. The discovery of adversarial examples poses a huge threat to various security- and safety-sensitive applications [9][10][11][12].
Adversarial attacks are usually categorized into two types, white-box attacks and black-box attacks, depending on the access to the target model. Briefly, in white-box attacks, adversaries have knowledge of the target model and fabricate adversarial examples accordingly; therefore, the attack success rates can be very high. By contrast, in black-box attacks, where adversaries have partial or no information regarding the target model, a surrogate model with known information is used to generate adversarial examples. However, since model structures vary, adversarial examples generated on the surrogate model may have low transferability [13] and thus poor attack success rates on the target model [14]. Because the black-box scenario is more common in practice, the investigation of highly transferable attacks attracts much attention. In the literature, methods to generate adversarial examples with high transferability can be categorized into three paradigms: (1) gradient optimization [15,16]; (2) input transformation [17][18][19][20]; and (3) ensemble-model attack [13,21,22]. On the other hand, there is also much research on methods to combat adversarial attacks, such as studying the stability of neural network systems [23][24][25] or using adversarial training [26] to improve the robustness of models. In addition, knowledge distillation, denoising, and random smoothing have been explored to improve the robustness of deep models against adversarial attacks [27][28][29][30].
In adversarial example generation, perturbing the intrinsic features in raw data usually leads to adversarial examples with high transferability. Such intrinsic features are model-agnostic and contribute to the final decision regardless of the specific model structure. However, these model-agnostic features usually mingle with model-specific patterns during model optimization; thus, identifying them is non-trivial. Prior studies show that many adversarial attacks are likely to overfit the surrogate model and modify the model-specific features instead of the model-agnostic ones, reducing the transferability of adversarial examples. To differentiate the intrinsic features from "noisy" model-specific patterns, Wang et al. [31] introduced a feature importance-aware (FIA) strategy in which a random pixel-dropping transformation followed by gradient aggregation is employed. FIA then destroys the intrinsic, model-agnostic features, and thus the transferability of the resulting adversarial examples is significantly improved. Nevertheless, FIA still has a drawback: we find that the noisy, model-specific features are not well filtered out because of the high correlations among adjacent image pixels.
To address the aforementioned issue, we propose a novel adversarial attack approach, the Split Grid Mask Attack (referred to as SGMA for simplicity). To generate adversarial examples with improved transferability, we introduce split grid masks to identify model-agnostic important features in images and then suppress these important features. During recognition, different models tend to focus on diverse discriminative regions; hence, by removing partial information from the image, the CNN is pushed to use information to which it was previously insensitive. Consequently, overfitting to the source model can be effectively alleviated by aggregating over a number of transformed images. Furthermore, the SGM-transformed images preserve the spatial structural and textural information related to the principal part of the image. As a result, the model's response fluctuates on the noisy features while the important features are consistently retained. Specifically, we divide the image into grids of the same size; then, in each grid, we generate a mask with a random position and a random side length. The important features are identified by computing the aggregate gradient of the features in an intermediate layer over a set of transformed images. Finally, we suppress the influence of the important features on the model's decision by adding perturbations, which significantly improves the transferability of the adversarial examples.
The remainder of this manuscript is organized as follows. The "Related works" section reviews related works. In the "Methodology" section, we describe the proposed approach, i.e., SGMA, in detail. Extensive experiments and the corresponding results are provided in the "Experimental analysis" section. The "Conclusion" section concludes this paper and discusses future work.

Related works
In [32], Szegedy et al. first proposed the concept of adversarial examples and demonstrated the security issues of deep learning models. Since then, the derivation of adversarial examples has received considerable attention. Under the black-box attack setting, the inaccessibility of information about the target model encourages the investigation of transfer-based attacks. In this regard, many methods have been proposed for adversarial attacks with high transferability [33][34][35].
Among these methods, gradient-based black-box attacks are of great importance. These methods first adopt a well-trained model as the surrogate source model for adversarial example generation; the resulting adversarial examples are then applied to the target model. In [14], Goodfellow et al. proposed the fast gradient sign method (FGSM), a single-step gradient attack, of which I-FGSM is the iterative variant. To improve the transferability of adversarial attacks, Dong et al. proposed to integrate I-FGSM with the momentum method (MI-FGSM for short) in adversary generation [16]. In this method, the gradient vector from previous iterations is accumulated to guide the calculation of the next gradient, helping to escape from local optima in adversary optimization and thus improving transferability. Later, Lin et al. proposed a new approach that replaces momentum with the Nesterov Accelerated Gradient, i.e., NI-FGSM [15]. Before calculating the gradient in each iteration, NI-FGSM makes a prediction in the direction of the previously accumulated gradient. Thanks to this look-ahead property, it is much easier and faster to escape from a local optimum. Then, Wang et al. proposed VMI-FGSM, which adjusts the iterative gradient according to the gradient variance [37]. When computing the gradient at each iteration, the current gradient is no longer used directly for momentum accumulation; instead, the gradient variance of the previous iteration is also considered to adjust the current gradient.
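For reference, the momentum and look-ahead updates described above are usually written as follows. The notation here (input x, label y, loss J, step size α, decay factor μ) follows the common presentation of MI-FGSM and NI-FGSM and is an assumption of this sketch, not the notation used in the rest of this manuscript.

```latex
% MI-FGSM: accumulate the L1-normalised gradient, then take a signed step
g_{t+1} = \mu\, g_t + \frac{\nabla_x J(x_t^{adv}, y)}{\lVert \nabla_x J(x_t^{adv}, y) \rVert_1},
\qquad
x_{t+1}^{adv} = x_t^{adv} + \alpha \cdot \mathrm{sign}(g_{t+1})

% NI-FGSM: evaluate the gradient at a look-ahead point instead of x_t^{adv}
x_t^{nes} = x_t^{adv} + \alpha\, \mu\, g_t,
\qquad
g_{t+1} = \mu\, g_t + \frac{\nabla_x J(x_t^{nes}, y)}{\lVert \nabla_x J(x_t^{nes}, y) \rVert_1}
```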
Input transformations are also explored to promote the transferability of adversarial examples. Inspired by the fact that data augmentation can alleviate overfitting in model optimization, Xie et al. proposed the Diverse Input Method (DIM), which improves transferability through input transformation [17]. Briefly, gradients are computed on data crafted by randomly resizing and randomly padding the clean image with a certain probability. Later, Dong et al. presented the Translation-Invariant Method (TIM), in which the gradient of the original image is replaced with the average gradient of a set of translated images [18]. Lin et al. then considered the scale invariance of CNNs and proposed the Scale-Invariant Method (SIM) [15]. In SIM, the generation of perturbations is guided by the average gradient of a set of scaled images. Moreover, Zhang et al. proposed the Admix Attack Method (Admix) to introduce information from images of other categories [19]. Specifically, this approach first randomly selects images of other classes and mixes them with the original image; the average gradient of the mixed images is then used to update the perturbation. On the basis of information deletion, Hong et al. developed an adversarial example algorithm named Grid Mask Attack (GM-Attack) [38]. Here, proper information removal encourages the CNN to extract more features; thus, overfitting of the adversarial examples to the source model can be effectively alleviated by GM-Attack.
The aforementioned methods mainly focus on attacking the last layer of the CNN. It is natural to ask whether it is also effective to attack the features of the middle layers, and various scholars have devoted their efforts to this question [39,40]. Zhou et al. presented Transferable Adversarial Perturbations (TAP), which improves transferability by increasing the feature distance between the original image and its adversarial example in the intermediate layers [41]. Later, Ganeshan et al. proposed the Feature Disruptive Attack (FDA) [42]. In FDA, the features of each layer in the model are corrupted, which eventually leads to wrong classification results. Inkawhich et al. modify the features of the original image such that the resulting middle-layer representation is closer to that of an image from another class, thereby improving the transferability of adversarial examples [43]. To further promote transferability, Huang et al. proposed the Intermediate Level Attack (ILA), which fine-tunes previously generated adversarial examples by increasing their perturbation at pre-specified layers of the source model [44].
Recently, the differentiation of model-agnostic and model-specific features has been proposed in the study of adversarial robustness. Specifically, a deep learning model might extract noisy features that adapt to its own structure. However, these model-specific patterns may not be utilized by other models with different structures. Hence, perturbing these features is of limited significance in improving the transferability of adversarial examples, and it is very important to identify and attack the important features that dominate the decisions of different models. For instance, Wang et al. presented the Feature Importance-aware Attack (FIA) [31]. Different from previous methods that modify all features indiscriminately, FIA distinguishes the model-specific features from the model-agnostic, intrinsic ones through gradient aggregation and focuses on perturbing the important features, which improves the transferability of adversarial examples. Similar to FIA, Zhang et al. proposed the Random Patch Attack (RPA) [45], where a random patch transformation is used to identify model-agnostic features for enhancing attack transferability. To further improve the transferability of adversarial attacks, this study proposes a novel method, namely SGMA. It utilizes the aggregate gradient to differentiate features and selects important features to attack. Details of the proposed SGMA are given in the following section.

Methodology
In this section, we illustrate the proposed SGMA explicitly with the overall architecture being provided in Fig. 1. Firstly, we propose the SGM-transformed approach to handle the pre-processing of initial input images while a variant, i.e., SGMR-transformed approach, is also provided. Then, details of the adopted gradient aggregating model are provided. After that, we present the overall architecture to illustrate the process of our SGMA.

SGM-transformed image-processing approach
Given an input image m, we randomly generate some grid areas and then remove the corresponding pixels inside the grids, similar to [38]. Specifically, we first specify a matrix M whose elements are all 1 and whose size is the same as that of the input image m. Then, the matrix M is divided into a number of grids, where the side length of each grid is d. In each grid, we generate a square mask with a side length of d × r, where r denotes the side length keep-ratio of the mask, and we set the value of the pixels inside the mask to 0. Let the relative positions of each mask to the upper left corner of the grid be represented as ψ_x and ψ_y respectively. Then, the generated masked image can be expressed as:

$$\tilde{m} = m \odot M \qquad (1)$$

where $\odot$ denotes the element-wise product.
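As a minimal sketch of the mask construction described above, the following NumPy code builds M and applies the element-wise product. The function names, the rounding of d × r to whole pixels, and the clipping of each square to its own grid cell are assumptions made for illustration; they are not taken from the original implementation.

```python
import numpy as np

def grid_mask(height, width, d, r, psi_x=0, psi_y=0):
    """Binary mask M: ones everywhere, one zeroed square per d x d grid cell.

    d            -- side length of each grid cell in pixels
    r            -- side length of the zeroed square relative to d
    psi_x, psi_y -- offset of the square from the cell's upper-left corner
    """
    side = int(round(d * r))
    mask = np.ones((height, width), dtype=np.float32)
    for top in range(0, height, d):
        for left in range(0, width, d):
            y0, x0 = top + psi_y, left + psi_x
            # Assumption: the square is clipped to its own cell and to the image
            y1 = min(y0 + side, top + d, height)
            x1 = min(x0 + side, left + d, width)
            mask[y0:y1, x0:x1] = 0.0
    return mask

def apply_mask(image, mask):
    """Element-wise product m * M, broadcasting the mask over the channel axis."""
    return image * mask[..., None]

mask = grid_mask(8, 8, d=4, r=0.5)
masked = apply_mask(np.ones((8, 8, 3), dtype=np.float32), mask)
```

With d = 4 and r = 0.5 on an 8 × 8 image, each of the four cells loses a 2 × 2 square.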

SGM-transformed approach
In this study, we introduce a novel input-transformation approach, i.e., Split Grid Mask-transformed (referred to as SGM-transformed for simplicity), to incorporate randomness into the adversarial example generation process and thereby improve transferability. Different from GM-Attack [38], where the derived masks are uniformly distributed, we propose to assign a random offset (s_x, s_y) to the mask in each grid. s_x and s_y follow the uniform distribution on (−S, S), where S represents the maximum offset. Furthermore, the side length of each mask also changes randomly, with the change following the uniform distribution on (−C, C), where C denotes the maximum varying of the side length. Thus, the final generated masked image can be represented as:

$$\tilde{m} = m \odot M_{\mathrm{SGM}} \qquad (2)$$

where $M_{\mathrm{SGM}}$ denotes the mask obtained from M by applying the random offsets and side-length changes described above, and p represents the occurrence probability of side length varying.
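The random offsets and side-length changes can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_grid_mask` is hypothetical, the unshifted square is assumed to sit at the centre of its grid cell, one side-length change is drawn per cell with probability p, and squares are clipped at the image border.

```python
import numpy as np

def split_grid_mask(height, width, d, r, S, C, p, rng):
    """Binary mask with one randomly shifted, randomly sized zero square per cell."""
    mask = np.ones((height, width), dtype=np.float32)
    base = d * r  # nominal side length of the square
    for top in range(0, height, d):
        for left in range(0, width, d):
            # Independent offset per cell, uniform on (-S, S)
            s_x, s_y = rng.uniform(-S, S, size=2)
            # With probability p, change the side length by a uniform draw on (-C, C)
            delta = rng.uniform(-C, C) if rng.random() < p else 0.0
            side = int(round(max(base + delta, 0.0)))
            # Assumption: the unshifted square is centred in its cell
            y0 = int(round(top + (d - base) / 2 + s_y))
            x0 = int(round(left + (d - base) / 2 + s_x))
            y0, y1 = np.clip([y0, y0 + side], 0, height)
            x0, x1 = np.clip([x0, x0 + side], 0, width)
            mask[y0:y1, x0:x1] = 0.0
    return mask

rng = np.random.default_rng(0)
random_mask = split_grid_mask(32, 32, d=8, r=0.5, S=2, C=2, p=0.5, rng=rng)
# With S = C = 0 the construction reduces to a regular, centred grid mask
regular_mask = split_grid_mask(8, 8, d=4, r=0.5, S=0, C=0, p=0.0,
                               rng=np.random.default_rng(1))
```

Passing the random generator explicitly keeps the sampled masks reproducible, which is convenient when the same set of masks has to be reused across experiments.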

SGMR-transformed approach
We further improve the proposed SGM-transformed approach by incorporating random mask rotation. In this variant, after the generation of the masks in each grid, we introduce a rotating operation on the obtained masks, i.e., rotating all masks clockwise by the same angle θ. We assume the rotation angle follows a pre-defined distribution such as the uniform distribution between 0° and 90°. The proposed variant is denoted as the SGMR-transformed approach, with the overall formula being represented as:

$$\tilde{m} = m \odot R_{\theta}(M_{\mathrm{SGM}}) \qquad (3)$$

where $R_{\theta}(\cdot)$ denotes the clockwise rotation by θ and $M_{\mathrm{SGM}}$ is the mask used in Eq. (2). After generating masks using Eq. (2) or Eq. (3), the input images are pre-processed, generating a set of masked images for downstream gradient aggregation and feature significance estimation. For better interpretation, an illustrative example of the mask generation process is provided in Fig. 2.
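One possible way to realise the rotation step on a binary mask is nearest-neighbour resampling about the mask centre, sketched below. The helper name `rotate_mask`, the choice of rotation centre, and leaving pixels without a source unmasked are assumptions of this sketch.

```python
import numpy as np

def rotate_mask(mask, theta_deg):
    """Rotate a binary mask clockwise by theta_deg about its centre.

    Nearest-neighbour sampling keeps the mask binary. Output pixels whose
    source falls outside the mask are left unmasked (value 1).
    """
    h, w = mask.shape
    theta = np.deg2rad(theta_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - cy, xs - cx
    # Inverse mapping: for each output pixel, locate the source pixel
    src_x = np.rint(np.cos(theta) * dx + np.sin(theta) * dy + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * dx + np.cos(theta) * dy + cy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.ones_like(mask)
    out[inside] = mask[src_y[inside], src_x[inside]]
    return out

base = np.ones((5, 5), dtype=np.float32)
base[1, 3] = 0.0
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 90.0)  # angle drawn uniformly between 0 and 90 degrees
rotated = rotate_mask(base, theta)
```

For a square mask and θ = 90°, this agrees with `np.rot90(mask, k=-1)`, which is a quick sanity check on the rotation direction.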

Gradient aggregating
The aggregate gradient over the SGM- and SGMR-transformed images is capable of highlighting the object-related features while attenuating the noisy, model-specific features. From this perspective, it helps to boost the transferability of the generated adversarial examples. We illustrate the gradient aggregating process adopted in this work in Fig. 3.
Let f denote the source/surrogate model. For a given input m, the corresponding feature maps of the kth layer are denoted as f_k(m). Here, we adopt the gradient to reflect the relative importance of the features:

$$\Delta_k^{m} = \frac{\partial\, l(m, y_{real})}{\partial f_k(m)} \qquad (4)$$

where l(·, ·) represents the logit output corresponding to the true label y_real. Note that the gradient on the original image usually does not highlight the object-related features very well. To identify and highlight significant features, we propose to calculate the aggregate gradient from the set of masked images produced by the proposed SGM-transformed/SGMR-transformed approach. The corresponding aggregate gradient is calculated as

$$\bar{\Delta}_k^{m} = \frac{1}{ens} \sum_{n=1}^{ens} \Delta_k^{\tilde{m}_n} \qquad (5)$$

where ens represents the number of masked images and $\tilde{m}_n$ denotes the nth masked image generated by the SGM-transformed or SGMR-transformed approach.
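The aggregation step can be illustrated with a toy surrogate whose feature gradient is available in closed form. Everything model-related here is hypothetical: `toy_features` and `toy_feature_grad` stand in for a CNN layer and for automatic differentiation, and dividing by the number of masks is an assumed normalisation (feature-importance attacks such as FIA normalise the sum instead).

```python
import numpy as np

def aggregate_gradient(image, masks, feature_fn, feature_grad_fn):
    """Average d(logit)/d(f_k) over masked copies of `image`.

    feature_fn(m)      -> feature map f_k(m) of the chosen layer
    feature_grad_fn(f) -> gradient of the true-class logit w.r.t. f
    Dividing by len(masks) is an assumed normalisation.
    """
    total = None
    for mask in masks:
        grad = feature_grad_fn(feature_fn(image * mask))
        total = grad if total is None else total + grad
    return total / len(masks)

# Hypothetical toy surrogate: f_k(m) = W1 @ m, logit = w2 . tanh(f_k)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(6, 16))
w2 = rng.normal(size=6)

def toy_features(m):
    return W1 @ m.ravel()

def toy_feature_grad(f):
    # Analytic gradient of w2 . tanh(f) with respect to f
    return w2 * (1.0 - np.tanh(f) ** 2)

image = rng.uniform(size=(4, 4))
masks = [(rng.random((4, 4)) > 0.3).astype(float) for _ in range(30)]
agg = aggregate_gradient(image, masks, toy_features, toy_feature_grad)
```

In a real attack the two callables would be replaced by a forward pass to the selected layer and a backward pass from the true-class logit.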

Proposed SGMA approach
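As illustrated in Fig. 1, after obtaining the aggregate gradients from the masked images, we obtain the weighted feature maps of the input image. Based on the weighted feature maps, important features can be highlighted accordingly. During the adversarial example generation process, the following loss function is used to represent the important features:

$$L(m^{adv}) = \sum \left( \bar{\Delta}_k^{m} \odot f_k(m^{adv}) \right) \qquad (6)$$

where m^adv represents the adversarial example corresponding to the input image m. Suppressing the important features then amounts to solving

$$\arg\min_{m^{adv}} L(m^{adv}) \quad \text{s.t.} \quad \lVert m^{adv} - m \rVert_\infty \le \epsilon \qquad (7)$$

where $\epsilon$ represents a pre-specified threshold, while the term $\lVert m^{adv} - m \rVert_\infty \le \epsilon$ indicates that the difference between m^adv and m must not exceed $\epsilon$.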
We utilize momentum optimization to solve Eq. (7), and the detailed process of the proposed SGMA approach is specified in Algorithm 1 (lines 10-13). Furthermore, a variant, SGMRA, can be derived accordingly by replacing the SGM-transformed image processing approach with the SGMR-transformed one [i.e., utilizing Eq. (3) instead of Eq. (2)].
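A momentum update of the kind referred to above can be sketched as follows. This is an MI-FGSM-style step written under several assumptions: the loss is minimised (hence the minus sign), the gradient is L1-normalised before accumulation, pixel values lie in [0, 255], and the decay factor defaults to 1.0. The toy loss simply weights the pixels directly and stands in for the feature-level loss.

```python
import numpy as np

def momentum_step(m_adv, m, grad, g_prev, alpha, eps, mu=1.0):
    """One signed-gradient descent step with momentum, projected onto the eps-ball."""
    g = mu * g_prev + grad / (np.abs(grad).sum() + 1e-12)  # L1-normalised gradient
    m_new = m_adv - alpha * np.sign(g)                     # descend on the loss
    m_new = np.clip(m_new, m - eps, m + eps)               # L-infinity constraint
    m_new = np.clip(m_new, 0.0, 255.0)                     # valid pixel range (assumed)
    return m_new, g

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))          # stands in for the aggregate gradient
m = rng.uniform(50.0, 200.0, size=(8, 8))  # toy "clean image"

def toy_loss(x):
    # Toy loss: feature map taken to be the image itself
    return float(np.sum(weights * x))

eps, T = 16.0, 10
alpha = eps / T
m_adv, g = m.copy(), np.zeros_like(m)
for _ in range(T):
    # For this linear toy loss the gradient w.r.t. the image is `weights`
    m_adv, g = momentum_step(m_adv, m, weights, g, alpha, eps)
```

After T steps of size ε/T every pixel has moved to the boundary of the ε-ball, and the toy loss is lower than on the clean input.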

Dataset description
In this manuscript, without loss of generality, we select images from a widely adopted ImageNet-compatible dataset [46]. The surrogate models are well trained with high classification accuracy.

Algorithm 1 The SGMA Algorithm.
Require: Input image m, true label y_real, classifier f with loss function L; perturbation size ε, number of iterations T, the decay factor μ; length of the grid d, side length keep-ratio r, maximum offset S; maximum varying of the side length C, the number of masks ens;
Ensure: An adversarial example m^adv;
1: Initialization: α = ε/T; m^adv_0 = m; g_0 = 0;
2: Calculate the aggregate gradients:
3: for i = 0 → ens do
4:     Derive the masked image by adopting Eq. (2);
5:     Determine the gradient of the masked image:

Hyper-parameters
For consistency, we adopt the attack parameter settings in [31]: the maximum perturbation ε is set to 16, the number of iterations T is set to 10, and the step size α equals 1.6. For DIM, the transformation probability p is 0.7, while for TIM, the adopted kernel size is 15. For FIA, we set the ensemble number N to 30; the drop probability p_d is set to 0.3 and 0.1 when attacking normally trained models and defense ones respectively. For RPA, we set the ensemble number N to 60; the modify probability p_m is set to 0.3 and 0.2 when attacking normally trained models and defense ones respectively. For the proposed SGMA, the ensemble number ens is set to 30, and the length of the grid d is randomly selected in the range [3, 105]. The side length keep-ratio r is 0.5 and 0.6 for normally trained models and defense ones respectively, and the maximum offset S and the maximum varying of the side length C vary within d. Our choice of layers to be attacked is the same as that in [31].
Table notes: Bold values represent the best performance. The first column lists source models, and the first row lists target models. Our methods are SGMA and SGMA + PIDIM. * marks the values obtained for white-box attacks.
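For illustration, the SGMA settings listed above can be collected in a small configuration object. The class and field names are hypothetical; only the numeric values come from the text, and the step size is derived as ε/T, which gives 16/10 = 1.6.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SGMAConfig:
    """Hypothetical container for the SGMA settings stated in the text."""
    eps: float = 16.0        # maximum perturbation
    iterations: int = 10     # number of iterations T
    ens: int = 30            # number of masked images
    d_min: int = 3           # grid length d is drawn from [d_min, d_max]
    d_max: int = 105
    keep_ratio: float = 0.5  # r for normally trained models

    @property
    def alpha(self) -> float:
        # Step size: eps / T
        return self.eps / self.iterations

normal_cfg = SGMAConfig()
defense_cfg = SGMAConfig(keep_ratio=0.6)  # r for defense models
```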

Comparison of transferability
In this section, to illustrate the superiority of the SGMA approach, we conduct extensive experiments and compare the transferability of the obtained adversarial examples with that of the examples derived by the considered baseline approaches. We consider attacks against two types of models, i.e., vanilla trained models and adversarially trained ones.

Attacking vanilla trained models
Here, we adopt Inc-v3, IncRes-v2, Res-152, and Vgg-16 as the source/surrogate models and then apply the attacks to all vanilla trained models. We do not choose TIM for comparison because it is designed for defense models. The corresponding results are presented in Table 1. As indicated, the attack success rates of our method in the black-box setting are increased by an average of 6.4% over the other state-of-the-art methods. In particular, for adversarial examples generated on Inc-v3 and IncRes-v2, our method improves the transferability by over 10%. Furthermore, our attack approach can also be effectively combined with other attack approaches to further improve the transferability. When our method is combined with PIDIM (i.e., SGMA + PIDIM) to craft adversarial examples on Res-152 and Vgg-16, the average attack success rates on all the models are above 95%. Compared with other feature-level attack approaches, the performance of our method is superior in both white-box and black-box scenarios. To improve the transferability, FIA and RPA sacrifice a certain level of white-box performance; on the IncRes-v2 model in particular, their attack success rates are reduced to 89.2% and 80% respectively. By contrast, the white-box success rates of the SGMA approach are all higher than 99%.

Attacking adversarially trained models
Next, we consider attacks against defense models. The corresponding results are provided in Table 2, where they are compared with those obtained by the considered baselines. Since the defense models are usually robust to adversarial examples, the attack success rates are likely to decrease compared with those obtained for normally trained models. However, as indicated, our approach still significantly outperforms the other SOTA attack approaches, increasing the black-box attack success rate by 8.2% on average. In particular, when the adopted source model is Res-152, the attack success rate of the SGMA approach on Ens-IncRes-v2 is approximately 26% higher than the previous best result obtained by RPA. When generating adversarial examples on Vgg-16, the attack success rates of SGMA exceed 90% on all the considered defense models; in particular, the success rate against Adv-Inc-v3 is increased to 95.9%. As to the combined methods, when attacking Inc-v3 and IncRes-v2, the SGMA + PIDITIM approach can further improve the transferability by 18.52% and 12.36% on average respectively.

Varying layer selection for the source model
The selection of the feature layer is a key factor affecting the performance of feature-level attacks. In DNNs, early layers usually extract only low-level features, while data-specific feature sets are still under construction. Once enough features have been extracted to model the data, later layers combine and optimize the low-level features to form high-level ones, aiming to improve the eventual classification accuracy [51]. Therefore, the early layers have not yet learned the semantic information and important features related to the object, whereas the later layers are likely to extract too many noisy features. Thus, we aim to find the layer that has sufficiently learned the semantic information and important features related to the object without incorporating too many noisy features. In fact, the features of a middle layer can avoid both shortcomings, i.e., insufficient features and high correlation with the model; hence, selecting an appropriate middle layer might lead to improved transferability. For simplicity, we adopt only Vgg-16 and Inc-v3 for illustration, with the corresponding results shown in Fig. 4a, b, respectively. As revealed, our experiments on Vgg-16 and Inc-v3 support the above conclusion regarding the selection of middle layers. We find that, for the two considered models, there usually exists an optimal layer corresponding to a maximum attack success rate. This indicates that adversarial examples obtained by attacking a middle layer are likely to have higher transferability than those generated by attacking other layers.

Effects of varying r and ens
In this section, we explore the effects of varying the other hyper-parameters on the final performance of the proposed SGMA approach. Here, we mainly consider two hyper-parameters, i.e., the keep-ratio of the side length (r) and the ensemble number (ens); the success rates under the black-box setting for the different scenarios are provided in Fig. 5. For simplicity, we adopt Inc-v3 as the source model, and adversarial examples are derived by setting different r and ens. Here, r varies from 0.3 to 0.7 with an increment of 0.1. For each r, ens is varied from 10 to 70 with a step size of 10. Four models, i.e., Res-152, Vgg-16, Ens3-Inc-v3, and Ens-IncRes-v2, are considered as the target models to be attacked by the generated adversarial examples, and the transferability is reflected by the attack success rate.
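The search grid described above contains 5 values of r and 7 values of ens, i.e., 35 settings per target model. A minimal sketch of enumerating it is given below; `sweep` and the `evaluate` callable are hypothetical, and the placeholder evaluator returns a constant rather than any measured success rate.

```python
def sweep(evaluate):
    """Evaluate every (r, ens) pair of the grid and return the results as a dict."""
    r_values = [round(0.3 + 0.1 * i, 1) for i in range(5)]  # 0.3, 0.4, ..., 0.7
    ens_values = list(range(10, 71, 10))                    # 10, 20, ..., 70
    return {(r, ens): evaluate(r, ens) for r in r_values for ens in ens_values}

def placeholder_evaluate(r, ens):
    # Placeholder only: a real evaluator would run the attack and
    # return the black-box success rate on a target model.
    return 0.0

results = sweep(placeholder_evaluate)
```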
In practice, a small r removes too much information from the input image, which may result in a failure to extract object-related features, whereas a large r retains too much information, which might limit the performance of the method. This is reflected by the results in Fig. 5. As shown, the optimal r is around 0.5 when attacking the considered normally trained models (Res-152 and Vgg-16), while the value is around 0.6 when attacking the considered defense models (Ens3-Inc-v3 and Ens-IncRes-v2). Generally speaking, the attack success rates increase with the ensemble number ens. Nevertheless, when ens is large enough, the success rate tends to saturate and may even decline. Hence, for the SGMA approach, we set ens = 30, while r is set to 0.5 and 0.6 for normally trained models and defense ones respectively.

Comparison of GM-attack versus SGMA versus SGMRA
The aggregate gradients reduce the intensity of the noisy features by taking advantage of their vulnerability to image transformation. Therefore, increasing the sampling space of the transformed images can help highlight important features. Similar to GM-Attack, SGMA and SGMRA belong to the category of image transformations based on information deletion. All these approaches aim to achieve a good balance between deleting and retaining information, which helps DNNs to extract sufficient features. The difference is that SGMA and SGMRA increase the randomness of the image transformation and thus efficiently expand the example space of transformed images. Therefore, they further highlight the important features of the image, which plays an important role in guiding the generation of adversarial examples.
To validate this, we perform experiments attacking different models with IncRes-v2 adopted as the source model; the corresponding results are provided in Fig. 6. As indicated, the attack success rates of GM-Attack, SGMA and SGMRA increase gradually, while the performances of SGMA and SGMRA are always much better than that of GM-Attack. For the majority of the considered scenarios, the transferability of the adversarial examples crafted by SGMRA is higher than that obtained through SGMA, as indicated by larger attack success rates.
[Fig. 7 caption: Adversarial images are generated on the source model (VGG16 is considered) and utilized to attack the target model (Inception-V3 is adopted). Our SGMA reduces the model's ability to capture important features of objects and focuses on completely irrelevant regions instead, whereas the model's attention to the adversarial example generated by FIA partially overlaps with that on the clean image.]

Capability of distorting attention region
To visualize the capability of adversarial examples generated by our proposed SGMA in distorting the model's attention, we present the corresponding results in Fig. 7, together with the results for the adversarial examples derived by FIA.
We find that, for SGMA, the attention region of the model on the adversarial example is completely different from that on the clean image. In contrast, the model's attention to adversarial examples generated by FIA partially overlaps with that on the clean image. This illustrates the effectiveness of the split grid mask transform in capturing the important features.

Conclusion
In this paper, we propose a novel Split Grid Mask Attack (SGMA), which can generate adversarial examples with higher transferability. The SGM transform alleviates the overfitting problem by randomly removing some discontinuous regions so that the model can extract more features. Using the aggregate gradients of the SGM-transformed images reduces the intensity of model-specific features and effectively highlights the important object-related features of the input images. Perturbing these important features guides the development of adversarial examples in a more transferable direction. As demonstrated by the experimental results, compared with the SOTA transfer-based attack approaches, SGMA achieves higher success rates when attacking both normally trained models and defense models. Although our algorithm improves the transferability of adversarial examples, some directions remain to be investigated in the future. Under the constraints of the recognized modification range, the adversarial examples can still be noticed through careful observation, and the success rates of attacking models with defense mechanisms are not yet high enough. We hope that these problems can be resolved in future research.

Data availability
The data sets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Declarations
Conflict of interest On behalf of all authors, the corresponding author states that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.