Multi-layer noise reshaping and perceptual optimization for effective adversarial attack of images

Adversarial attack aims to cause a deep neural network to fail by adding a small amount of perturbation to the input image, where the attack success rate and the resulting image quality are maximized under an $l_p$ norm perturbation constraint. However, the $l_p$ norm is not accurately correlated to human perception of image quality. Attack methods based on the $l_0$ norm constraint usually suffer from high computational cost due to the iterative search for candidate pixels to modify. In this work, we explore how perceptual quality optimization can be incorporated into adversarial attack design and propose a two-stage attack method that reshapes the adversarial noise generated by an initial attack and optimizes the visual quality of the attacked images without sacrificing the attack success rate. Specifically, we construct a visual attention network to generate a perceptual attention map that modulates the adversarial noise generated by a base attack method. The network is trained to maximize the visual quality in terms of the Structural Similarity Index (SSIM) while achieving the same attack success rate. To further improve the image perceptual quality, we propose a fast search algorithm that performs an iterative block-wise pruning of the adversarial noise. We evaluate our method on the mini-ImageNet dataset against three different defense schemes. The results demonstrate that our method achieves better attack performance in image quality, attack success rate, and efficiency than state-of-the-art attack methods.

Recently, a number of dense adversarial attack methods have been developed, such as FGSM [9], BIM [15], and PGD [17]. The main objective of these methods is to maximize the attack success rate under an $l_2$ or $l_\infty$ norm noise constraint [17]. However, the resulting attacked images usually have low perceptual quality.
Sparse attack methods based on the $l_0$ norm try to modify as few pixels as possible without limiting the noise magnitude. These methods usually suffer from low efficiency in achieving a high attack success rate because searching the image space for candidate pixels is computationally intensive [4,7,12]. To alleviate this problem, JSMA [21] and GreedyFool [7] predict a saliency or distortion map using a trained network to guide the search process. However, these maps are not directly optimized for image quality and attack success rate. Moreover, attacking under the $l_0$ norm constraint alone does not guarantee high visual quality of the attacked images.
The problem is to design an adversarial attack method that simultaneously optimizes the attack success rate, the visual quality of the attacked images, and the time efficiency or attack complexity, which can be measured by the number of model inferences. To this end, we propose to take advantage of the high attack efficiency and success rate of $l_2$ or $l_\infty$ constraint-based attack methods and reshape the perturbation noise to optimize the image quality while keeping the attack success rate. However, it has been well recognized that the $l_p$ norm of image noise is not accurately correlated to human perception of image quality [23,32]. In the literature, the structural similarity index (SSIM) has been demonstrated to be an effective metric for perceptual image quality [20,23,28,31]. Perceptual color distance is also a good metric for human perception [16,32]. Figure 1 shows two adversarial examples that have almost the same Peak Signal-to-Noise Ratio (PSNR) between the original images (left) and the attacked images (right). However, the adversarial noise in (a) is clearly much more visible and annoying than that in (b), which corresponds to the significant difference between their SSIM values. We recognize that the adversarial noise has different levels of visibility in different regions of the image. Specifically, with the same amount of noise, the SSIM values of different image regions are quite different; for example, the SSIM values of structural regions are often higher than those of smooth image regions. This motivates us to argue that the perceptual quality of adversarial images can be improved by modulating the adversarial noise with the perceptual weights, or noise sensitivity levels, of different image regions.
Based on this, we propose a perceptually optimized noise reshaping (PONS) scheme that reshapes the adversarial noise generated by a base attacker and optimizes the visual quality of the attacked images while achieving the same attack success rate. Specifically, in the first stage, we use a convolutional SSIM model to calculate the SSIM value between a clean image and its attacked version. Guided by this SSIM visual quality module, the proposed method learns a perceptual attention network that predicts a perceptual sensitivity map to modulate the adversarial noise in the input image, aiming to maximize the visual quality while achieving the same attack success rate. The perceptual attention network and the attack method are jointly trained.
Within the context of image classification, we recognize that the decision of the network is binary with the network inference score being compared to a decision threshold. This implies that some regions of the adversarial noise can be removed or pruned to further improve the perceptual quality while ensuring that the classification score remains above the threshold to achieve the same adversarial state. To this end, in the second stage, we propose a fast search algorithm to perform an iterative block-wise pruning of the adversarial noise without affecting the attack success rate.
We have tested our method on the mini-ImageNet dataset against different defense methods. Our method is able to significantly improve the image visual quality over the base attack method without significant loss of the attack success rate. Compared with the state-of-the-art attack methods, our method can achieve better attack performance in terms of image quality, attack success rate, and efficiency.
The major contributions of this work can be summarized as follows: (1) We incorporate perceptual quality optimization into the adversarial attack design and propose a two-stage attack method to reshape the adversarial noise generated by an initial attack while achieving the same attack success rate. (2) We construct a perceptual attention network, trained jointly with an SSIM quality module, to modulate the adversarial noise. (3) We propose a fast block-wise binary pruning algorithm to further improve the perceptual quality of the attacked images without affecting the attack success rate.

Adversarial attack methods

Dense attack methods such as FGSM [9], BIM [15], and PGD [17] maximize the attack success rate under an $l_2$ or $l_\infty$ norm constraint. To circumvent the "obfuscated gradients" problem introduced by some defense methods, the BPDA attack approximates non-differentiable defense components with differentiable functions (e.g., the identity) during the backward pass. Sparse attack methods attempt to make the perturbations imperceptible by constraining the magnitude of the adversarial noise using an $l_p$ norm, such as the $l_0$, $l_2$, and $l_\infty$ norms [22]. To search for a minimal adversarial perturbation for a given image, Moosavi-Dezfooli et al. proposed DeepFool [19], which generates adversarial examples with a smaller amount of perturbation than the FGSM method [9] while achieving similar attack success rates. An extreme case of minimizing the image perturbation is the one-pixel attack, in which only one pixel in the image is changed to fool the classifier; Su et al. achieved an attack success rate of 70.97% on the tested images by changing just one pixel of the input image [24]. SparseFool [18] converts the $l_0$-constrained problem into an $l_1$-constrained problem and exploits the low mean curvature of the decision boundary to compute sparse adversarial perturbations. GreedyFool [7] selects a few of the most effective candidate pixels to modify, using a predicted distortion map as guidance. TSAA [12] translates a benign image into an adversarial image with a generator network trained to learn the mapping between natural images and sparse adversarial images.

Defense against adversarial attack
In this work, we evaluate the attack methods against different defense schemes; here we briefly review the related defense works. Recently, many methods have been developed for defending deep neural networks against adversarial attacks. Adversarial training is a common method to increase network robustness by adding adversarial examples to the training data [14,26,27]. Tramèr et al. proposed an ensemble adversarial training method that augments the training data with perturbations transferred from other models [27]. Transforming the input is another way to defend against adversarial attacks, for example by bit-depth reduction, JPEG compression, or total variance minimization [10]. The network structure can also be modified to improve its robustness to adversarial attacks. Dhillon et al. pruned a random subset of activations according to their magnitude to enhance network robustness [6]. Xie et al. proposed a feature denoising method using non-local means or other filters; combined with adversarial training, it substantially improves adversarial robustness [29].

Method
In this section, we present the proposed perceptually optimized noise reshaping (PONS) method for adversarial attacks.

Problem formulation
Let X be a natural image and $y_T$ the corresponding ground-truth label, also referred to as the target label. A classifier $\theta(X) = y$ takes an image X as input and predicts its label $y \in \mathcal{Y}$, where $\mathcal{Y}$ is the output label space. The goal of the adversarial attack is to generate an adversarial noise Z, constrained by an $l_p$ norm, such that the attacked image $X_a = X + Z$ is misclassified by the target model. In this work, the image quality $SSIM(X, X_a)$ is maximized during the attack process. Mathematically, it is written as

$$\max_{Z} \; SSIM(X, X_a) \quad \text{s.t.} \quad \theta(X_a) \neq y_T, \;\; \|Z\|_p \leq \epsilon, \;\; X_a = X + Z, \tag{1}$$

where $\epsilon$ is the perturbation budget and p is the order of the matrix norm, which can be 0, 1, 2, or $\infty$.

Method overview
In this work, we propose a novel two-stage adversarial attack framework that takes advantage of the high attack efficiency and success rate of a base attack method and reshapes the perturbation noise to optimize the visual quality of the attacked images while maintaining the same attack success rate. As illustrated in Fig. 2, the proposed PONS method uses SSIM perceptual quality and sensitivity analysis to drive the adversarial noise generation process. In the first stage, we use a convolutional model to calculate the SSIM index between the attacked image and the input, and a perceptual attention network to predict the perceptual sensitivity map, which is used to modulate the adversarial noise generated by the base attack method. The attacker network and the perceptual attention map prediction network are learned end-to-end to optimize the SSIM image quality while achieving successful adversarial attacks. In the second stage, we propose a fast binary search method that performs block-wise pruning of the adversarial noise, guided by the perceptual sensitivity map, to further improve the image quality.

SSIM prediction network and perceptual sensitivity map
SSIM has been extensively used as a metric to measure the perceptual quality of images. Here we give a brief definition; for more details, please refer to [28]. Let A and B be the two images being compared. A window moves pixel by pixel from the top left corner to the bottom right corner of the image. At each step, the local statistics of the patches $(A_j, B_j)$ within local window j are calculated as follows [28]:

$$SSIM(A_j, B_j) = \frac{(2 m_{A_j} m_{B_j} + C_1)(2 \sigma_{A_j B_j} + C_2)}{(m_{A_j}^2 + m_{B_j}^2 + C_1)(\sigma_{A_j}^2 + \sigma_{B_j}^2 + C_2)}, \tag{2}$$

where $m_{A_j}$, $m_{B_j}$, $\sigma_{A_j}$, $\sigma_{B_j}$, and $\sigma_{A_j B_j}$ represent the average intensities of image patches $A_j$ and $B_j$, the standard deviations of $A_j$ and $B_j$, and the covariance between $A_j$ and $B_j$, respectively. $C_1$ and $C_2$ are two small constants introduced to avoid numerical instability. The SSIM index between A and B is defined by [28]

$$SSIM(A, B) = \frac{1}{N_s} \sum_{j=1}^{N_s} W(A_j, B_j) \, SSIM(A_j, B_j), \tag{3}$$

where $N_s$ is the number of local windows in the image and $W(A_j, B_j)$ is the weight applied to window j [28]. It should be noted that the above computation is highly nonlinear. In this work, we use a PyTorch model (https://github.com/aserdega/ssim-pytorch) to approximate the SSIM function $SSIM(\cdot, \cdot)$.

[Fig. 2: Diagram of the proposed PONS method. The lower figure shows the two-stage attack process. In the first stage, the perceptual attention network (PAN) predicts a perceptual attention mask from features extracted from the images to modulate the adversarial noise generated by the base attack method. In the second stage, to further reduce the perturbation, block-wise perturbation pruning is applied to find image blocks in which to turn off the perturbation. In the upper figure, the perceptual attention network is trained to optimize the image quality ($L_q$) and the attack success rate ($L_c$), where the SSIM model calculates the SSIM score between the clean image and the adversarial image.]
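To make the windowed computation in Eqs. (2) and (3) concrete, the following is a minimal PyTorch sketch of a convolutional SSIM module. It uses a uniform window and unit weights $W(A_j, B_j) = 1$ for simplicity (a Gaussian window is a common refinement); all names are ours, not those of the referenced repository.

```python
import torch
import torch.nn.functional as F

def ssim(a, b, window_size=11, C1=0.01**2, C2=0.03**2):
    """Windowed SSIM between image batches a, b of shape [B, C, H, W],
    with pixel values in [0, 1]. Returns the mean SSIM score and the
    per-pixel SSIM map (used later for the sensitivity map)."""
    pad = window_size // 2
    channels = a.shape[1]
    # Uniform averaging kernel, applied per channel (depthwise conv).
    kernel = torch.ones(channels, 1, window_size, window_size,
                        device=a.device) / window_size ** 2
    mean = lambda x: F.conv2d(x, kernel, padding=pad, groups=channels)
    mu_a, mu_b = mean(a), mean(b)
    # Local (co)variances via E[xy] - E[x]E[y].
    var_a = mean(a * a) - mu_a ** 2
    var_b = mean(b * b) - mu_b ** 2
    cov = mean(a * b) - mu_a * mu_b
    ssim_map = ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) \
             / ((mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2))
    return ssim_map.mean(), ssim_map
```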
We introduce the perceptual sensitivity map to measure the impact of adversarial noise on the image quality at different locations. For images A and B of size [W, H, C], we propose to use the feature map $F(m, n, c)$, $1 \le m \le W$, $1 \le n \le H$, $1 \le c \le C$, from the last layer of the SSIM calculation network. This feature map represents the impact of the difference between the attacked image and the original clean image on the overall SSIM quality. The perceptual sensitivity map $\sigma(m, n)$ is defined as

$$\sigma(m, n) = 1 - \frac{1}{C} \sum_{c=1}^{C} F(m, n, c). \tag{4}$$

Figure 3 shows an example of the perceptual sensitivity map. The smooth areas are relatively whiter than the texture-rich regions, which means that smooth areas are more sensitive to adversarial perturbation.
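Continuing the sketch above, and under our reconstruction of Eq. (4) (the exact normalization is an assumption), the sensitivity map simply inverts the channel-averaged local SSIM map, so regions whose local SSIM degrades most under noise receive high sensitivity:

```python
def sensitivity_map(ssim_map):
    """Per-pixel perceptual sensitivity from the local SSIM map of
    shape [B, C, H, W]: average over channels and invert, following
    our reconstruction of Eq. (4)."""
    return 1.0 - ssim_map.mean(dim=1)  # shape [B, H, W]
```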

SSIM-optimized adversarial attack
The core idea of our method is to reshape the perturbation noise in adversarial examples that have been successfully attacked by a base attack method, which usually has a high attack success rate. This work considers PGD and its variant BPDA as the base attack methods. PGD is an iterative version of the original FGSM method, which adds a perturbation along the gradient direction of the loss $\nabla_X L(X, y_T; \theta)$ to the input X to generate the adversarial example [9]:

$$X_a = X + \epsilon \cdot \text{sign}\left( \nabla_X L(X, y_T; \theta) \right), \tag{5}$$

where $L(\cdot)$ is usually defined as the cross-entropy loss and $\epsilon$ controls the $l_\infty$ norm of the difference between X and $X_a$. The PGD method iteratively updates the image as [17]

$$X_a^{t+1} = \text{Clip}_{X, \epsilon}\left( X_a^t + \alpha \cdot \text{sign}\left( \nabla_X L(X_a^t, y_T; \theta) \right) \right), \tag{6}$$

where t is the iteration index, $\alpha$ is the step size, and $\text{Clip}_{X, \epsilon}(\cdot)$ is a clipping function that ensures the largest difference between X and $X_a$ is less than $\epsilon$.
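A minimal sketch of the PGD update in Eq. (6), using the default parameters reported later in Section 4.2.1 ($\epsilon = 0.05$, $\alpha = 0.01$, 10 iterations); the untargeted cross-entropy ascent and the [0, 1] pixel clamp are standard choices we assume here:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y_true, eps=0.05, alpha=0.01, num_iter=10):
    """Iterative FGSM steps with l_inf clipping back into the
    eps-ball around the clean image x, as in Eq. (6)."""
    x_adv = x.clone().detach()
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Step along the gradient sign to increase the loss.
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Clip_{X,eps}: keep ||x_adv - x||_inf <= eps, pixels in [0, 1].
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```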
As illustrated in Fig. 3, different image regions have different levels of sensitivity to the adversarial noise. Motivated by this observation, we propose to modulate the adversarial noise by a learned perceptual weight mask $M = [M(m, n)]$, as illustrated in Fig. 2. The perceptually modulated PGD attack is given by

$$X_a^{t+1} = X + M \odot Z^{t+1}, \tag{7}$$

where $\odot$ denotes element-wise multiplication and $Z^{t+1}$ is the final perturbation generated by the base attack method. The perceptual mask M reshapes the adversarial noise such that the noise magnitude is reduced in those image regions whose visual quality is sensitive to noise. In our proposed method, this perceptual mask M is predicted by the perceptual attention network, which is learned together with the target network, the attack network, and the SSIM network. The output mask has the same size as the input image. The proposed perceptual mask prediction network is based on an encoder-decoder structure with a Resnet-18 backbone [11]. The training process maximizes the SSIM quality while achieving successful attacks of the original images. Specifically, let $y_A$ be the attack target, i.e., the incorrect label that the attack forces the classifier $\theta$ to produce, and let $y_a$ be the current classifier softmax output. We use the following cross-entropy loss $L_C$ between the attack target $y_A$ and the actual classifier output $y_a$:

$$L_C = -\sum_{k} y_A(k) \log y_a(k). \tag{8}$$

The following loss is then used to train the perceptual attention network:

$$L = L_C - \lambda \cdot SSIM(X_a^{t+1}, X), \tag{9}$$

where $\lambda$ is a weight factor and $SSIM(X_a^{t+1}, X)$ measures the SSIM quality between the current attacked image $X_a^{t+1}$ and the original image X.
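Putting Eqs. (7)-(9) together, one training step of the perceptual attention network might look as follows. This is a sketch under our assumptions: `pan`, the attack target `y_attack`, and the base perturbation `z_base` are our names, and `ssim` is the function sketched earlier.

```python
import torch
import torch.nn.functional as F

def pan_train_step(pan, target_model, x, y_attack, z_base, opt, lam=0.001):
    """One optimization step for the perceptual attention network.
    x: clean images; z_base: perturbation from the base attacker;
    y_attack: the (incorrect) attack target labels y_A."""
    mask = pan(x)                                  # M, same size as x
    x_adv = (x + mask * z_base).clamp(0, 1)        # Eq. (7)
    logits = target_model(x_adv)
    l_c = F.cross_entropy(logits, y_attack)        # Eq. (8)
    ssim_score, _ = ssim(x_adv, x)
    loss = l_c - lam * ssim_score                  # Eq. (9)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```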

Block-wise binary pruning of adversarial noise
In the above section, we learned a network to predict a perceptual attention mask $M(m, n)$ to modulate the adversarial noise. It should be noted that this learned mask takes continuous values and aims to maximize the overall SSIM quality of the attacked image. This prediction and quality optimization is a global process: all entries of the mask M are jointly predicted from one network inference. From our experiments, we observe that the individual entries $M(m, n)$ can be fine-tuned to further improve the perceptual quality of the attacked image. To this end, we propose a fast and efficient method, called block-wise binary pruning, to perform local adjustment of the perceptual mask. Specifically, we partition the mask M into equal-sized non-overlapping blocks and, for each block, the proposed algorithm makes a binary decision: the adversarial noise within the block is kept (indicated by 1) or entirely removed, i.e., pruned (indicated by 0). Conceptually, we need to turn off the adversarial noise in those image blocks with high sensitivity to noise so as to maximize the perceptual quality while still attacking the image successfully. In (4), we derived the perceptual sensitivity map from the SSIM index calculation network, and we propose to use this map $[\sigma(m, n)]_{W \times H}$, each entry of which represents the perceptual sensitivity of an image pixel, to guide the block-wise binary pruning of the adversarial noise. The central task is to select a subset of image blocks whose adversarial noise is pruned, i.e., set to zero, such that the SSIM perceptual quality of the attacked image is maximized without changing the adversarial state. Clearly, an exhaustive search over subsets of image blocks is computationally prohibitive. Most importantly, at each search step we need to run the classification network to check whether the resulting image is still successfully attacked, so the number of search steps must be kept very small.
To address this issue, we propose a fast binary search algorithm to obtain a sub-optimal solution. Specifically, we first sort the perceptual sensitivity values $\sigma[k]$ of all blocks in ascending order, $\sigma[1] \le \sigma[2] \le \cdots \le \sigma[N]$, and let $Z[I]$ denote the adversarial noise kept only in the first I blocks (the least sensitive ones), with the noise in the remaining blocks pruned. We maintain $I_s$, the smallest index found so far with a successful attack, and $I_f$, the largest index found so far with a failed attack. Then, in our proposed binary search, we set

$$I_{t+1} = \left\lfloor \frac{I_s + I_f}{2} \right\rfloor.$$

Once $I_{t+1}$ is determined, we evaluate $\theta(X + Z[I_{t+1}])$, update $I_s$ or $I_f$ depending on whether the attack succeeds, and repeat the above steps until $I_{t+1} - I_t < 1$. The overall binary search-based method is outlined in Algorithm 1.
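A sketch of Algorithm 1 under our assumptions: a square input, a per-pixel sensitivity map as in Eq. (4), and an untargeted success test. It keeps the noise in the least sensitive blocks found by the binary search; all names are ours.

```python
import torch
import torch.nn.functional as F

def prune_noise(model, x, z, sens_map, y_true, block=4):
    """Block-wise binary pruning (Algorithm 1 sketch). x, z: clean image
    and adversarial noise, [1, C, H, W]; sens_map: [1, 1, H, W] per-pixel
    sensitivity. Assumes keeping all blocks succeeds and keeping none fails."""
    side = x.shape[-1] // block                    # blocks per side
    # Mean sensitivity per block, then ascending order.
    sens = F.avg_pool2d(sens_map, block).flatten()
    order = torch.argsort(sens)

    def masked_noise(num_kept):
        keep = torch.zeros_like(sens)
        keep[order[:num_kept]] = 1.0               # least sensitive kept
        keep = keep.view(1, 1, side, side)
        keep = F.interpolate(keep, scale_factor=block)  # back to H x W
        return z * keep

    def succeeds(num_kept):
        x_adv = (x + masked_noise(num_kept)).clamp(0, 1)
        return bool(model(x_adv).argmax(1) != y_true)

    lo, hi = 0, sens.numel()   # lo: attack fails; hi: attack succeeds
    while hi - lo > 1:         # at most ceil(log2 N) model inferences
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid           # still adversarial: try pruning more
        else:
            lo = mid           # attack lost: keep more noise
    return masked_noise(hi)
```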
Figure 4 shows four examples of the above binary search process, where the horizontal axis is the total number of blocks in which the adversarial noise is removed. As the number of pruned blocks increases, the SSIM quality of the attacked image improves significantly. The end point of each curve is where the attack fails; this boundary is the value that the binary search algorithm aims to find.

Attack complexity analysis
Besides the computation consumed by the base attack method, the extra complexity of our method comes from the inference of the perceptual attention prediction network and the binary search for adversarial noise pruning. The first part is a one-time cost, which is relatively small and depends on the model complexity. The complexity of the second part is the number of search steps multiplied by the complexity of the target classification network. Theoretically, our binary search in Algorithm 1 has a complexity of $O(\log_2 N)$, where N is the total number of blocks. In our experiments, the image size is 224 × 224 and the block size is 4 × 4, so the total number of blocks is N = 3136. Therefore, the maximum number of model inferences is $\lceil \log_2 3136 \rceil = 12$.
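For the setting above, the bound works out as follows (a quick sanity check, not part of the paper's code):

```python
import math

blocks = (224 // 4) ** 2                      # N = 56 * 56 = 3136 blocks
max_inferences = math.ceil(math.log2(blocks))
print(max_inferences)                         # 12
```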

Experiments
In this section, we provide extensive experimental results to evaluate the performance of our proposed perceptual attention-guided visual quality optimization algorithm for adversarial attacks.

Experimental setup and datasets
In this section, we compare the performance of our method to state-of-the-art attack methods under different defense schemes. The methods include two dense attack methods, PGD and BPDA, and four sparse attack methods, Perc CW [32], PGD$_{L_0+\sigma}$ [4], GreedyFool [7], and TSAA [12]. The performance is measured by the SSIM score between the clean image and the adversarial image, the attack success rate, and the attack efficiency in terms of the number of target model inferences.
As Perc CW is optimized on a color distance score, we also use perceptual color distance to measure the quality of the adversarial images [32]. For Perc CW and PGD$_{L_0+\sigma}$, we only compare the adversarial image quality at five different attack success rates, as it is time-consuming to reach a desired attack success rate due to the high-dimensional search space of their hyperparameters; for example, Perc CW has five parameters that affect the attack success rate. We keep running these two tools with different hyperparameter settings and use an attack result if its attack success rate falls within [0.85, 0.95]. GreedyFool [7] and TSAA [12] are $l_0$ norm based attack methods. GreedyFool requires a large number of iterations to obtain good attack results, so we only evaluate it on one model. TSAA is slightly different from the others as it is designed mainly for black-box attacks to improve the transferability of adversarial examples; we still show its results for comparison.
The dataset is mini-ImageNet [5], which consists of 60,000 images in 100 classes. We select 20 classes; for each class, 100 images are randomly selected as test images, and the rest are used for training.

Resnet-34 with adversarial training for defense
In this experiment, the target model is Resnet-34, which is adversarially trained on the training dataset; its classification accuracy on clean images is 76%. During adversarial training, the adversarial examples are generated by PGD with default parameters $\epsilon = 0.05$, $\alpha = 0.01$, and num_iter = 10, where $\alpha$ and num_iter are the step size and the number of iterations, respectively. The parameter $\lambda$ in (9) is set to 0.001 during training of the mask prediction network. Table 1 shows the average SSIM scores of PGD and our method at different values of $\epsilon$. The scores are calculated over all test images that are correctly classified by the target model. We can see that the SSIM gain obtained by the proposed PONS algorithm is quite significant: even when $\epsilon = 0.03$, which is quite small, the SSIM gain is about 7%, while the loss of attack success rate is quite small. Table 2 shows the average SSIM scores of Perc CW [32], PGD$_{L_0+\sigma}$ [4], and our method. Our method outperforms these two state-of-the-art methods by large margins, especially at high attack success rates. For example, compared to Perc CW, the SSIM gain of our method is 0.35 at the attack success rate of 0.95; compared to PGD$_{L_0+\sigma}$, the SSIM gain is 0.1 at the attack success rate of 0.94.

Resnet-101 with feature denoising for defense
In the following experiment, the target model is a Resnet-101 network modified by adding feature denoising layers [29] to reduce the effect of adversarial noise. The model is also adversarially trained using PGD-attacked images and has a classification accuracy of 80% on clean images. Table 3 shows the average SSIM scores of the baseline PGD method and our PONS method at different values of $\epsilon$. Similar to the previous experiment, our method significantly improves the SSIM quality, while the loss of attack success rate is very small, mostly less than 1%; for the tests with $\epsilon$ = 0.06, 0.07, 0.09, and 0.1, the drop in success rate is zero. Table 4 shows the average SSIM scores of Perc CW, PGD$_{L_0+\sigma}$, and our method against this defense (the target model is Resnet-101 with feature denoising layers), where we can see that our method achieves much higher SSIM scores, especially when the attack success rate is greater than 0.9.

Resnet-50 with input transformation for defense
Input transformation has been demonstrated to be an effective method for adversarial defense [10]. In this experiment, the input transformation is bit-depth reduction [10], which removes the 5 least significant bits of each pixel value; equivalently, the pixel values are quantized into the set {0, 32, 64, 96, 128, 160, 192, 224}. It should be noted that this bit reduction processing is not differentiable. In this case, the BPDA method uses an identity function during the gradient backpropagation process. During the BPDA attack, the number of steps is set to 200 and the learning rate is set to 0.1. The target model is Resnet-50, which is also adversarially trained using PGD-attacked images; its baseline classification accuracy on clean images is 77.3%. Table 5 shows the average SSIM scores of BPDA and our method at different values of $\epsilon$. Our PONS method significantly improves the SSIM visual quality of the images attacked by BPDA, especially when $\epsilon \ge 0.03$, while the drop in attack success rate remains negligible. Again, Table 6 compares the average SSIM against Perc CW and PGD$_{L_0+\sigma}$; the improvement is consistent and significant.
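As a concrete illustration, the bit-depth reduction and the identity backward pass described above can be sketched as a custom autograd function; the class name and integration details are ours, not the paper's code:

```python
import torch

class BitReduceBPDA(torch.autograd.Function):
    """Forward: non-differentiable 5-bit reduction, quantizing pixels
    (in [0, 1]) to multiples of 32/255. Backward: identity, the BPDA
    approximation used during the attack."""

    @staticmethod
    def forward(ctx, x):
        q = 32.0 / 255.0
        return torch.floor(x / q) * q

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through identity gradient

# Usage inside the defended model's forward pass:
#   x = BitReduceBPDA.apply(x)
```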

Color distance comparison
We note that Perc CW is optimized on the color distance [16,32] between the attacked image and the original one. Table 7 shows the color distance scores of Perc CW and our method at different attack success rates in each of the previous tests. A smaller color distance with respect to the natural image means better quality of the adversarial image. We can see that even though our method is not optimized on the color distance, its average color distance is still comparable to that of Perc CW, and in most cases even better.

Comparison to GreedyFool and TSAA
TSAA [12] and GreedyFool [7] generate adversarial samples under an $l_0$ norm constraint, trying to modify the fewest number of image pixels. We study their attack performance when a perturbation magnitude constraint, i.e., an $l_\infty$ norm constraint, is also applied. Table 8 shows the attack results using the Resnet-34 model of Section 4.2.1 when $\epsilon$ changes from 0.1 to 0.5 and 1.0, where $\epsilon = 1.0$ means the adversarial attack is fully under the $l_0$ norm constraint. GF(12) denotes the GreedyFool method with the iteration parameter set to 12, and GF(500) sets it to 500; the three numbers in the table are the average SSIM, the attack success rate, and the number of model inferences. For TSAA, the performance improves when $\epsilon$ increases from 0.1 to 0.5 and 1.0, achieving a best average SSIM of 0.65 and attack success rate of 0.53. For GreedyFool, the number of iterations used to search for candidate pixels affects its performance significantly. When $\epsilon = 1.0$, i.e., fully under the $l_0$ norm constraint, GreedyFool achieves its best average SSIM (0.97) and attack success rate (0.98) within 500 iterations, i.e., GF(500) in the table. However, the actual average number of model inferences is as high as 669, which costs a significant amount of time. In comparison, from Table 1, our method achieves an average SSIM of 0.93 and an attack success rate of 0.97 with only 12 model inferences at $\epsilon = 0.1$. We further checked the Mean Squared Error (MSE) of the attacked images and found that GF(500) has an average MSE of 104.95, while ours is 80.02, meaning that GreedyFool adds much more perturbation to the images than our method. For GF(500), when $\epsilon = 0.5$, the attack success rate drops to 0.89 at the cost of 1009 model inferences. When only 12 iterations are used for GreedyFool, i.e., GF(12), it fails to attack the images; the success rate is almost zero.

[Fig. 5: Sub-figures (a), (b), and (c) show the SSIM improvement from the perceptual attention prediction and the block-wise pruning of the adversarial noise. In all three figures, the black line is the base attack method, the brown line is the SSIM score with only perceptual attention prediction, and the orange line is the SSIM score with both attention mask prediction and block-wise noise pruning.]

Ablation studies
In the following experiments, we conduct ablation studies to further understand the performance of the proposed PONS method.

Contribution of algorithm components
Our method has two stages: perceptual attention mask prediction and block-wise adversarial noise pruning. In Fig. 5, we show the average SSIM improvement brought by each stage in the previous tests, with the perturbation budget $\epsilon$ on the horizontal axis. In each sub-figure, we show the average SSIM score for (a) the baseline attack method, (b) the baseline method plus the perceptual attention noise reshaping, and (c) the baseline method plus both algorithm modules. In all three tests, both algorithm modules contribute significantly to the overall performance. The performance gain achieved by the binary pruning is more significant than that of the perceptual attention noise reshaping, especially for the BPDA attack method. For example, in Fig. 5(b), at $\epsilon = 0.05$, the SSIM score of the base attack is 0.84; the perceptual attention module increases it to 0.87, and the block-wise perturbation pruning further increases it to 0.95.

Comparison of different block sorting schemes
In our method, we use the SSIM sensitivity score to sort the image blocks and find candidate blocks in which to turn off the perturbation via the binary search algorithm. Table 9 compares the average SSIM scores when different block sorting schemes are used: 1) sorting by the SSIM sensitivity, 2) original order (no sorting), and 3) random order. From Table 9, we can see that sorting image blocks by the SSIM sensitivity score achieves the largest improvement of the SSIM score, while random sorting performs the worst. For example, in the test of BPDA with input transformation, when $\epsilon = 0.05$, the average SSIM scores are 0.88 (random), 0.90 (no sorting), and 0.95 (SSIM sensitivity); sorting by the SSIM sensitivity improves the average SSIM by 5 percentage points compared to the case without sorting.

Effect of SSIM loss ratio
To train the perceptual attention model, Eq. (9) has two loss terms: the cross-entropy between the attack target and the predicted label, and the SSIM image quality between the attacked image and the input image. The value of the ratio $\lambda$ affects the quality of the attention mask predicted by the perceptual attention network: more weight on the SSIM term will likely improve the image quality but decrease the attack success rate. We redo the test of Table 1 in Section 4.2.1, but without the block-wise noise pruning. Table 10 shows the average SSIM and attack success rate at different values of $\lambda$. We can see that the performance is quite robust as $\lambda$ increases from 0.001 to 0.5. When it reaches 1.0, the average SSIM increases, but the attack success rate drops significantly compared with the case of 0.001.

Effect of block size in perturbation pruning
In block-wise perturbation pruning, increasing the block size speeds up the process but will likely lower the image quality. The results, reporting the average SSIM and the number of model inferences, show that the average SSIM is highest when the block size is 2, which is reasonable as smaller blocks allow a finer search for regions in which to turn off the perturbation. However, this setting is the slowest among the four. For block sizes of 4 and 8, the average SSIM is very close, which means our method can be made even faster without significant loss of image quality. Figure 6 compares the image quality of several adversarial examples generated by the base attack method and by our method. For each pair, the left image is generated by the base attack method and the right one by ours. The three rows correspond to $\epsilon$ = 0.05, 0.06, and 0.07, respectively. For each adversarial image, we also show the SSIM score with respect to the clean image. We can see that our method significantly improves the visual quality of the attacked images. In Fig. 7, we show several pairs comparing the quality of the adversarial images generated by Perc CW, PGD$_{L_0+\sigma}$, and our method, together with the SSIM score of each adversarial image. These sample pairs are taken from attack tests with similar attack success rates. We can see that, under the same attack success rate, the attacked images produced by Perc CW and PGD$_{L_0+\sigma}$ have worse quality than ours.

Discussion and future work
In the previous sections, we compared our method to state-of-the-art dense and sparse attack methods against different defense schemes. The experimental results demonstrate that our method achieves significantly better attack performance in terms of the image quality between the clean image and the adversarial image, the attack success rate, and the attack efficiency measured by the number of model inferences. GreedyFool under the pure $l_0$ norm constraint, i.e., GF(500) in Table 8 at $\epsilon = 1.0$, achieves a significantly better average SSIM than ours at $\epsilon = 0.1$ in Table 1; however, its average MSE and attack complexity are substantially worse than ours.

[Fig. 7: Image quality comparison against Perc CW and PGD$_{L_0+\sigma}$. The first row compares ours with Perc CW, and the second row with PGD$_{L_0+\sigma}$. The numbers in parentheses are the SSIM scores. Each pair is taken from tests with similar attack success rates.]
The advantage of our method is that it uses a base attack method with strong attack capability, so we only need to optimize the image quality without significant loss of attack success rate, as demonstrated in Tables 1, 3, and 5. On the other hand, the attack success rate of our method is largely determined by the base attack method, since we only reshape the adversarial noise to improve the image quality. To obtain a higher attack success rate, we can increase the perturbation budget $\epsilon$, which will inevitably introduce more noise into the images and require a more powerful perceptual attention network to produce high-quality perceptual attention masks. A better base attack method will certainly help improve the performance of our method.
In this work, the perceptual attention network is designed based on the Resnet-18 backbone to demonstrate its effectiveness. If we use a more advanced architecture for the perceptual attention network, we expect to see more improvement in the image quality and attack success rate.

Conclusion
In this work, we have observed that most existing adversarial attack methods are designed to maximize the attack success rate under an $l_p$ norm constraint, which does not fully consider the perceptual sensitivity of the adversarial noise in different image regions. Motivated by this, we propose a novel two-stage attack method to maximize the image perceptual quality as well as the attack success rate. Specifically, we construct and learn a perceptual attention network to generate a perceptual attention mask that modulates the adversarial noise generated by a base attack method in the input image, aiming to maximize the visual quality while achieving the same attack success rate. To further improve the image perceptual quality, we propose a fast binary search algorithm that performs an iterative pruning of the adversarial noise based on the perceptual sensitivity map. We have conducted comprehensive evaluations and demonstrated that our method can significantly improve the image visual quality over the base attack method without sacrificing the attack success rate. Compared with state-of-the-art adversarial attack methods, our method achieves better attack performance in terms of image quality, attack success rate, and attack efficiency.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.