Introduction

DNNs have achieved remarkable performance in a variety of traditional computer vision tasks, e.g., image classification [1], image processing [2,3,4], image quality assessment [5,6,7,8], etc.

Research has demonstrated that deep learning algorithms can surpass human performance in specific tasks, such as face recognition [9] and image classification [10]. These models have also been applied in edge computing, which emphasizes distributing computation and data to enhance system efficiency and performance. Edge computing [11,12,13,14] provides real-time data processing, minimizes network latency, and enhances system reliability and security. It has a broad range of applications, including industrial automation [15], healthcare [16], smart cities [17], the environment [18], the IoT [19], and so on.

However, the robustness of DNN models is still weak, and DNN-based systems are highly vulnerable to adversarial examples [20]. For instance, adversarial examples, which are generated by injecting elaborated perturbation into clean images, lead the classifier to misclassify them. If such weak models are deployed in edge-cloud computing, the security of identification becomes a severe issue. Although Kurakin et al. [21] showed that adversarial examples pose a hidden threat to classifiers in physical-world scenarios, adversarial perturbation also has significant value for privacy preservation and related applications. Massive numbers of face photos are shared on social networking services (SNS) in daily life. To prevent identity information from being used maliciously, users' private information can be protected by injecting elaborated perturbation into the photos. However, perturbation commonly degrades perceptual quality, which diminishes the photo-sharing experience on social media. Hence, how to generate perturbation that evades perception by the human visual system (HVS) while still fooling DNNs is a significant problem to settle.

Most existing methods [20,21,22,23,24,25,26,27,28,29] focus on fooling DNN-based recognition systems. Only a few of them [27,28,29] try to preserve the quality of adversarial examples, and even then the perceptual quality of the adversarial images remains poor.

In this paper, an HVS-inspired adversarial example generation method with high perceptual quality is proposed for privacy preservation. To achieve this, we first propose a novel perceptual loss based on just noticeable difference (JND) characteristics, which assesses an adversarial example against its associated original image along three quality-related factors. Only the perturbation that exceeds the JND thresholds is perceived by the HVS, and therefore only this part is counted in the perceptual loss. Besides, we develop a perturbation adjustment strategy, integrated into the JND-based perceptual loss, that assigns more perturbation to the less sensitive color channels. Together, these designs keep the perturbation as imperceptible to the HVS as possible while successfully attacking the DNN-based recognition system, so that the adversarial examples maintain high perceptual quality. Experimental results indicate that the proposed method achieves state-of-the-art perceptual quality in adversarial example generation.

Related work

Adversarial attack

Szegedy et al. [22] suggested the L-BFGS algorithm along with a box constraint to create adversarial examples. Goodfellow et al. [23] proposed the Fast Gradient Sign Method (FGSM), which takes a single step from the clean image along the sign of the computed gradient. Most subsequent SOTA works [20, 21, 24,25,26] build on FGSM. For example, Rozsa et al. [24] used the raw gradient values rather than only their sign and proposed the Fast Gradient Value (FGV) method. Madry et al. [25] presented Projected Gradient Descent (PGD), a white-box attack with access to the model gradients. Kurakin et al. [21] proposed the Iterative Fast Gradient Sign Method (I-FGSM), which updates the adversarial image iteratively. Dong et al. [26] developed the Momentum Iterative Fast Gradient Sign Method (MI-FGSM), which integrates a momentum term to stabilize update directions and escape poor local maxima during iteration. Xie et al. [20] presented the Diverse Inputs Iterative Fast Gradient Sign Method (DI\(^2\)-FGSM) and the Momentum Diverse Inputs Iterative Fast Gradient Sign Method (M-DI\(^2\)-FGSM), which adopt an input diversity strategy to generate adversarial examples.
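For concreteness, the following is a minimal sketch of an I-FGSM-style attack of the kind reviewed above, assuming a PyTorch classifier; `model`, the step size, iteration count, and \(\epsilon\) are illustrative assumptions rather than settings of any cited work.

```python
import torch
import torch.nn.functional as F

def ifgsm(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Iterative FGSM: repeat small signed-gradient steps and keep the result
    inside an L-infinity ball of radius eps around the clean image x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # one signed-gradient (FGSM) step
        x_adv = x_adv.detach() + alpha * grad.sign()
        # project back into the eps-ball and the valid pixel range
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv
```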

The methods mentioned above focus only on generating adversarial images that fool DNNs efficiently rather than preserving adversarial image quality. To hide the unnecessary distortion, Zhang et al. [27] first attempted to integrate hand-crafted JND coefficients into FGSM when generating adversarial examples. Although the perceptual quality was improved to some degree, the low attack success rate limited its use for privacy preservation. To keep a high attack success rate, Adil et al. [28] iteratively distorted a clean image until the classifier made wrong predictions, which reduced the injected perturbation and the distortion of adversarial examples. After that, Sun et al. [29] used the quality metric SSIM [30] to supervise adversarial example generation. However, SSIM cannot fully capture the perceptual quality of the HVS, and visible distortions can still be observed in [29].

Hence, generating adversarial examples with high perceptual quality is still an open problem. As reviewed above, the perceptual quality of adversarial examples generated by the SOTA methods is still unsatisfactory: the perturbation is not hidden in perceptually insensitive areas of images, so the distortion remains obvious.

Just noticeable difference

JND [31] reflects the minimum change in a visual signal that can be detected by the HVS, and thus characterizes the perceptual redundancy contained in visual signals. Generally, there are two categories of JND models: HVS-inspired [32, 33] and learning-based [34,35,36,37,38]. HVS-inspired JND models are built from the characteristics of the HVS. For example, Chou et al. [31] put forward a JND model in the spatial domain by merging contrast masking (CM) and luminance adaptation (LA). Yang et al. [39] generated a JND model by introducing a nonlinear additivity model for masking effects (NAMM). Wu et al. [33] used the pattern complexity (PC) of visual content to further improve the accuracy of HVS-inspired JND models. However, because the HVS is not yet fully understood, these hand-crafted, HVS-feature-based methods cannot produce sufficiently accurate JND thresholds.

As deep learning achieved remarkable success in dozens of visual problems, learning-based JND models were proposed [35,36,37, 40,41,42,43,44,45,46,47]. Because building labeled JND datasets is costly, unsupervised JND models have been proposed recently. Jin et al. [37] proposed the RGB-JND model, which takes the stimuli of the whole color space into account. They also proposed the HVS-SD JND model [38], which uses prior information from an image reconstruction task to better guide JND generation and has achieved the best MOS performance among all SOTA models.

JND based privacy-preserving adversarial image generation

Problem formulation

As stated in Sec. 1, the perturbation is required to fool DNN-based recognition systems without being perceived by the HVS, so that the identity information in adversarial examples is protected while their high perceptual quality is maintained. Figure 1 gives an overview of the adversarial attack process. The attack is driven by two components: perceptual quality preserving, \(L_{per}\), and adversarial attack, \(L_{adv}\). The generated adversarial image \(I_{adv}\) is fed into the DNN recognition system; if the recognition result does not yet satisfy the attack goal, the iteration continues, otherwise the attack finishes.

Fig. 1
figure 1

The process of the HVS-inspired adversarial attack: \(I_{ori}\) is the original clean image, \(I_{jnd}\) is the three-channel JND map obtained by the HVS-SD JND method, and \(I_{adv}\) is the generated adversarial image. \(L_{adv}\) and \(L_{per}\) denote the adversarial attack process and the quality preserving process, respectively

Here, the clean image and its associated adversarial example are denoted by X and Z, respectively. The injected perturbation is denoted by P, so that \(P = Z - X\). As the reference in our adversarial image generation method, we adopt the HVS-SD JND [38] and denote the JND map of the clean image by J. HVS-SD JND is a recent learning-based model that outperforms both handcrafted JND methods [32, 33] and other learning-based JND methods [34,35,36,37], achieving state-of-the-art performance. Then, the generation of adversarial examples for privacy preservation can be formulated as follows

$$\begin{aligned} \arg \min _{Z} \mathcal {L}=\left\{ \begin{array}{l} \alpha \cdot \mathcal {L}_{a d v}(Z, X)+\mathcal {L}_{per}(Z, X, J), \text {if non-targeted.}\\ \alpha \cdot \mathcal {L}_{a d v}(Z, C)+\mathcal {L}_{per}(Z, X, J), \text {otherwise.} \end{array} \right. \end{aligned}$$
(1)

Both non-targeted and targeted attacks are considered in this work. For a non-targeted attack, we have \(\mathcal {L}_{adv}(Z,X)=-\mathcal {E}(\theta (Z),\theta (X))\), where \(\mathcal {E}(\cdot , \cdot )\) is a variant of cross-entropy loss [29]. It ensures that the recognition result of the adversarial example Z (denoted by \(\theta (Z)\), where \(\theta (\cdot )\) denotes the DNN-based recognition system) deviates from that of the clean image X (denoted by \(\theta (X)\)). For a targeted attack, we have \(\mathcal {L}_{adv}(Z,C)=\mathcal {E}(\theta (Z),C)\), which ensures that the recognition result of Z is close to a targeted label C. \(\mathcal {L}_{per}(\cdot ,\cdot ,\cdot )\) is a JND-based perceptual loss used to maintain the high perceptual quality of Z; it is introduced in the next subsection. J is the JND threshold map generated by the HVS-SD JND model [38], as described above. The hyper-parameter \(\alpha\) balances the two terms and is set to 1 in this work.
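As a rough illustration of how the objective in Eq. (1) can be assembled in code, the sketch below uses a standard cross-entropy against the clean prediction as a stand-in for the variant \(\mathcal {E}\) of [29]; `recognizer` and `perceptual_loss` are hypothetical callables, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def total_loss(Z, X, J, recognizer, perceptual_loss, alpha=1.0, target=None):
    """Eq. (1): alpha * L_adv + L_per. target=None selects the non-targeted case."""
    logits_Z = recognizer(Z)
    if target is None:
        # non-targeted: push theta(Z) away from the clean prediction theta(X)
        with torch.no_grad():
            y_clean = recognizer(X).argmax(dim=1)
        l_adv = -F.cross_entropy(logits_Z, y_clean)
    else:
        # targeted: pull theta(Z) toward the target label C
        l_adv = F.cross_entropy(logits_Z, target)
    return alpha * l_adv + perceptual_loss(Z, X, J)
```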

JND based perceptual loss

In this subsection, we introduce a new JND-based perceptual loss, \(\mathcal {L}_{per}(\cdot , \cdot , \cdot )\), in which three quality-related factors (the deviation, fidelity, and gradient of the adversarial example) are formulated as three sub-losses that take the JND into account. Besides, we design a perturbation adjustment matrix \(\mathcal {M}(\cdot )\) to assign more perturbation to the less sensitive color channels. The details are as follows.

\(\mathcal {L}_{per}(\cdot , \cdot , \cdot )\) mainly contains three JND-based sub-losses: 1) deviation loss \(\mathcal {L}_{1}(\cdot ,\cdot ,\cdot )\), 2) fidelity loss \(\mathcal {L}_{2}(\cdot ,\cdot ,\cdot )\), and 3) gradient loss \(\mathcal {G}(\cdot ,\cdot ,\cdot )\). The effectiveness of the three sub-losses is demonstrated by the ablation experiments in Sec. 4.2. We have

$$\begin{aligned} \mathcal {L}_{per} =\beta _{1}\cdot \mathcal {L}_{1}(Z,X,J)+\beta _{2}\cdot \mathcal {L}_{2}(Z,X,J)+\beta _{3}\cdot \mathcal {G}(Z,X,J) \end{aligned}$$
(2)

\(\mathcal {L}_{1}(Z,X,J)\) is used to control the magnitude of the perturbation. \(\mathcal {L}_{2}(Z,X,J)\) is used to constrain the fidelity distortion between X and Z. \(\mathcal {G}(Z,X,J)\) ensures that Z and X keep similar gradients. Hyper-parameters \(\beta _{1}\), \(\beta _{2}\), and \(\beta _{3}\) are used to balance the three terms. Notice that all three terms are designed based on the JND.
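In code, Eq. (2) is simply a weighted sum of the three sub-losses; a trivial sketch is given below, with the sub-losses passed in as callables (their sketches appear later in this subsection).

```python
def perceptual_loss(Z, X, J, deviation, fidelity, gradient, betas=(1.0, 1.0, 1.0)):
    """Eq. (2): weighted sum of the deviation, fidelity, and gradient sub-losses.
    The beta values here are placeholders; the experimental settings give the ones used."""
    b1, b2, b3 = betas
    return b1 * deviation(Z, X, J) + b2 * fidelity(Z, X, J) + b3 * gradient(Z, X, J)
```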

Deviation loss \(\mathcal {L}_{1}(Z,X,J)\) describes the actual deviation of the adversarial example Z from its associated clean image X. Here, only the deviation beyond the JND threshold is considered actual deviation that the HVS can perceive. In other words, deviation below the JND is not perceived by the HVS and is therefore not taken into account when calculating the deviation loss. Hence, we have

$$\begin{aligned} \mathcal {L}_{1}(Z, X, J)= \sum _{n} \sum _{h} \sum _{w} \left( |Z(n,h,w)-X(n,h,w)| - J(n,h,w)\right) \cdot \lambda (n,h,w) \end{aligned}$$
(3)

and

$$\begin{aligned} \lambda (n,h,w) =\left\{ \begin{array}{l} 1, \text {if}\ |Z(n,h,w)-X(n,h,w)|>J(n,h,w), \\ 0, \text {otherwise.} \end{array}\right. \end{aligned}$$
(4)

where \(Z(n,h,w)\), \(X(n,h,w)\), and \(J(n,h,w)\) denote the pixel values located at \((n,h,w)\) of Z, X, and J, respectively; n denotes the color channel index, with \(n\in \{r,g,b\}\), and (h, w) denotes the spatial location.
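A minimal NumPy sketch of Eqs. (3)-(4) follows; the array layout and helper name are assumptions for illustration.

```python
import numpy as np

def deviation_loss(Z, X, J):
    """Z, X, J: float arrays of shape (3, H, W), channel order (r, g, b)."""
    diff = np.abs(Z - X)
    lam = (diff > J).astype(Z.dtype)   # Eq. (4): 1 where the perturbation exceeds the JND
    return np.sum((diff - J) * lam)    # Eq. (3): only the excess over the JND is counted
```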

Similarly, for the fidelity loss \(\mathcal {L}_{2}\), only the perturbation beyond the JND threshold is counted, since only the fidelity distortion beyond the JND threshold is perceived by the HVS. We have

$$\begin{aligned} \mathcal {L}_{2}(Z, X, J)= \sum _{n} \sum _{h} \sum _{w} \left( |Z(n,h,w)-X(n,h,w)| - J(n,h,w)\right) ^2 \cdot \lambda (n,h,w). \end{aligned}$$
(5)
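The fidelity loss of Eq. (5) differs from the deviation loss only in squaring the excess over the JND; a corresponding sketch under the same assumptions:

```python
import numpy as np

def fidelity_loss(Z, X, J):
    """Eq. (5): squared excess of the perturbation over the JND threshold."""
    diff = np.abs(Z - X)
    lam = (diff > J).astype(Z.dtype)
    return np.sum(((diff - J) ** 2) * lam)
```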

Gradient similarity, as a major perceptual cue for the HVS, is also taken into account based on the JND so that we can better constrain adversarial example generation. We use an \(\ell _{1}\)-norm formulation to describe the gradient loss. The gradient loss \(\mathcal {G}(Z,X,J)\) can be formulated as:

$$\begin{aligned} \mathcal {G}(Z,X,J) = \mathcal {L}_{1}(g(Z),g(X),g(J)) \end{aligned}$$
(6)

where \(g(\cdot )\) is the Sobel operator [48], used to calculate gradients of Z, X, and J in both the horizontal and vertical directions. Likewise, only the gradient deviation beyond the gradient of the JND map is counted in the gradient loss.
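A possible implementation of Eq. (6) is sketched below with SciPy's Sobel filter; combining the horizontal and vertical responses into a gradient magnitude is an assumption here, not necessarily the authors' exact choice.

```python
import numpy as np
from scipy import ndimage

def sobel_magnitude(img):
    """Per-channel gradient magnitude from horizontal and vertical Sobel responses."""
    gx = np.stack([ndimage.sobel(c, axis=1) for c in img])  # horizontal
    gy = np.stack([ndimage.sobel(c, axis=0) for c in img])  # vertical
    return np.hypot(gx, gy)

def gradient_loss(Z, X, J):
    """Eq. (6): the JND-masked L1 form applied to Sobel gradients of Z, X, and J."""
    gZ, gX, gJ = sobel_magnitude(Z), sobel_magnitude(X), sobel_magnitude(J)
    diff = np.abs(gZ - gX)
    lam = (diff > gJ).astype(Z.dtype)
    return np.sum((diff - gJ) * lam)
```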

The HVS has different sensitivities to different colors: it is most sensitive to green, moderately sensitive to red, and least sensitive to blue. Correspondingly, the green, red, and blue channels exhibit small, medium, and large JND thresholds, respectively, as demonstrated in [37].

In view of this, we design a perturbation adjustment matrix \(\mathcal {M}(n)\) to adjust the distribution of perturbation among different color channels. Hence, Eq. (4) can be reformulated as:

$$\begin{aligned} \lambda (n,h,w) =\left\{ \begin{array}{l} \mathcal {M}(n), \text {if}\ |Z(n,h,w)-X(n,h,w)|>J(n,h,w), \\ 0, \text {otherwise.} \end{array}\right. \end{aligned}$$
(7)

\(\mathcal {M}(n)\) assigns more perturbation to the less sensitive color channels, such as the blue and red channels, while assigning less perturbation to the green channel. The specific values of \(\mathcal {M}(n)\) are set according to the channel-wise sensitivity pattern reported in [38].
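The channel weighting of Eq. (7) can be sketched as follows, using the \(\mathcal {M}(n)\) values reported in the experimental settings; the helper name is illustrative.

```python
import numpy as np

# Per-channel weights M(n) from the experimental settings: a larger weight
# penalizes perturbation in that channel more heavily, so the sensitive green
# channel ends up carrying less perturbation than the red and blue channels.
M = {"r": 3.0, "g": 5.0, "b": 1.0}

def weighted_lambda(Z, X, J, weights=M):
    """Eq. (7): M(n) where the perturbation exceeds the JND, 0 otherwise."""
    w = np.array([weights["r"], weights["g"], weights["b"]], dtype=Z.dtype).reshape(3, 1, 1)
    exceed = (np.abs(Z - X) > J).astype(Z.dtype)
    return exceed * w
```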

Optimization

To optimize the objective in Eq. (1), we use the gradient descent algorithm. Since the proposed objective balances perceptual quality against classification deviation, unsuccessful attacks are inevitable when the perceptual loss is traded off too strongly against the classification deviation loss. To make sure that all generated examples successfully fool the recognition system and can thus be used for privacy preservation, the hyper-parameters \(\beta _{i}\) in Eq. (2) are adjusted whenever an attack fails. The adjustment of \(\beta _{i}\) is as follows

$$\begin{aligned} \beta _{i}=\beta _{i} + \delta \end{aligned}$$
(8)

where \(\delta\) is an adjustment step. The update of \(\beta _{i}\) is activated whenever an attack fails; a new adversarial example is then generated and used to attack the recognition system again, until the attack succeeds. With this optimization, we achieve a 100% attack success rate on VGGFace2.
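For illustration, the loop below sketches this procedure under stated assumptions: `build_loss` (assembling the Eq. (1) objective from the current \(\beta _{i}\)) and `succeeded` (checking the recognition result) are hypothetical helpers, and the step size and iteration budget are placeholders.

```python
import torch

def generate_adversarial(X, build_loss, succeeded, betas, delta, steps=200, lr=0.01):
    """build_loss(betas) -> callable returning the Eq. (1) objective for a candidate Z;
    succeeded(Z) -> True once the recognition system is fooled (non-targeted)
    or outputs the target label (targeted)."""
    betas = list(betas)
    while True:
        Z = X.clone().detach().requires_grad_(True)
        optimizer = torch.optim.SGD([Z], lr=lr)      # plain gradient descent, as in the text
        loss_fn = build_loss(betas)
        for _ in range(steps):
            optimizer.zero_grad()
            loss_fn(Z).backward()
            optimizer.step()
            with torch.no_grad():
                Z.clamp_(0.0, 1.0)                   # keep Z a valid image
        if succeeded(Z):
            return Z.detach()
        betas = [b + delta for b in betas]           # Eq. (8): beta_i <- beta_i + delta
```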

Experiments

Experimental settings

Datasets and Anchors. Experiments are conducted on the VGGFace2 [9] dataset. The DNN-based face recognition system is trained on VGGFace2, which contains 8,631 identities. To determine whether a given face image is matched to the corresponding identity, 100 images are randomly selected from VGGFace2 for non-targeted and targeted attack evaluation of six SOTA anchor methods and the proposed method. The anchor methods are BIM [21], PGD [25], MIFGSM [26], DI\(^2\)FGSM [20], JNDMGA [28], and MND [29]. Note that all results reported for the proposed method are obtained with successful attacks.

Evaluation metric. The objective metrics used to assess the quality of adversarial examples generated by the anchor and proposed methods are the Structural Similarity Index (SSIM) and the Peak Signal-to-Noise Ratio (PSNR). In addition, 60 subjects are invited to conduct a subjective viewing test based on the ITU-R BT.500-11 criterion [49]. The participants come from diverse backgrounds, including students, teachers, doctors, artists, researchers, and others, all of whom hold a bachelor's degree or higher. They are aged between 18 and 45 years and have no vision impairments. Of the 60 participants, 38 are male and 22 are female.
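For reference, PSNR and SSIM between a clean image and its adversarial example can be computed with scikit-image as sketched below (images assumed to be float arrays in [0, 1]; the `channel_axis` argument depends on the library version, and the paper's exact evaluation pipeline is not specified here).

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(clean, adv):
    """clean, adv: float arrays in [0, 1] with shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(clean, adv, data_range=1.0)
    ssim = structural_similarity(clean, adv, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```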

Ablation Setting. To verify the reasonableness of \(\mathcal {L}_{per}\), ablation experiments are conducted with different settings, as listed below:

  • \(\mathcal {L}_{1}\) evaluation: only the deviation loss \(\mathcal {L}_{1}(\cdot ,\cdot ,\cdot )\) is used as the perceptual loss in Eq. (2). That is, \(\beta _{1}\not =0\), \(\beta _{2}=\beta _{3}=0\).

  • \(L1_{ori}\) evaluation: the deviation loss is replaced by the original (non-JND) L1 loss \(L1_{ori}(\cdot ,\cdot )\). That is, \(\beta _{1}\not =0\), \(\beta _{2}=\beta _{3}=0\).

  • \(\mathcal {L}_{2}\) evaluation: only the fidelity loss \(\mathcal {L}_{2}(\cdot ,\cdot ,\cdot )\) is used as the perceptual loss in Eq. (2). We have \(\beta _{2}\not =0\), \(\beta _{1}=\beta _{3}=0\).

  • \(L2_{ori}\) evaluation: the fidelity loss is replaced by the original (non-JND) L2 loss \(L2_{ori}(\cdot ,\cdot )\). We have \(\beta _{2}\not =0\), \(\beta _{1}=\beta _{3}=0\).

  • \(\mathcal {L}_{1} + \mathcal {L}_{2}\) evaluation: the deviation loss \(\mathcal {L}_{1}(\cdot ,\cdot ,\cdot )\) and the fidelity loss \(\mathcal {L}_{2}(\cdot ,\cdot ,\cdot )\) are used for generating Z. That is, \(\beta _{1}, \beta _{2}\not =0\), \(\beta _{3}=0\).

  • \(\mathcal {L}_{1} + \mathcal {L}_{2} + \mathcal {G}\) evaluation: all three sub-losses in Eq. (2) are used to constrain Z. That is, \(\beta _{1}, \beta _{2}, \beta _{3}\not =0\).

For a targeted attack, if \(\beta _{i}\not =0\), we set \(\beta _{i}=30\) and the adjuster \(\delta =-1\). For a non-targeted attack, if \(\beta _{i}\not =0\), we set \(\beta _{i}=1500\) and the adjuster \(\delta =-50\). \(\alpha\) is set to 1 in both targeted and non-targeted attacks. For the perturbation adjustment matrix \(\mathcal {M}(n)\), we set \(\mathcal {M}(r) = 3\), \(\mathcal {M}(g) = 5\), and \(\mathcal {M}(b) = 1\).

Comparison and objective evaluation

Figure 2 compares the adversarial images generated by the anchor methods and our proposed method under targeted and non-targeted attacks. Since our method keeps as much of the perturbation as possible below the JND thresholds, our adversarial examples are the closest to the clean images. The generation procedure is summarized in Algorithm 1: an adversarial image is first generated under the JND constraint; if it cannot successfully attack the DNN-based recognition system, the process iterates and the perturbation is updated according to the JND thresholds.

Fig. 2
figure 2

Comparison among six anchor methods and the proposed method under targeted and non-targeted attacks on face recognition. The anchor methods are BIM [21], DI\(^2\)FGSM [20], MIFGSM [26], MND [29], JNDMGA [28], and PGD [25]. A specific part of each image is enlarged in the red box

figure a

Algorithm 1 Optimization for generating HVS-inspired Adversarial image

Table 1 displays the PSNR and SSIM values obtained by comparing a clean image with its corresponding adversarial example for the various targeted and non-targeted attack methods. Our method achieves SOTA performance in both PSNR and SSIM. Besides, the ablation experiments reported in Table 1 demonstrate the effectiveness of our loss setting for perceptual quality: by gradually adding \(\mathcal {L}_{1}\), \(\mathcal {L}_{2}\), and \(\mathcal {G}\) to the perceptual loss, the PSNR and SSIM of our method improve, and the best results are achieved when \(\mathcal {L}_{1}\), \(\mathcal {L}_{2}\), and \(\mathcal {G}\) are combined. To better evaluate the gap between our \(\mathcal {L}_{1}\) loss and the original \(L_1\) loss, denoted as \(L1_{ori}\), we include their comparison in the ablation experiments. Similarly, we also compare our \(\mathcal {L}_{2}\) loss with the original \(L_2\) loss, denoted as \(L2_{ori}\).

Table 1 Objective Evaluation Of The Proposed Method And Anchor Methods Conducted Under Targeted/Non-targeted Attacks

Subjective viewing test

Eight images are randomly selected from the VGGFace2 dataset for the subjective viewing test. The results of non-targeted and targeted attacks are shown in Tables 2 and 3. A positive (or negative) 'mean' value indicates that the adversarial example generated by our method has higher (or lower) perceptual quality than that of the corresponding anchor, and a larger 'mean' value indicates a larger quality advantage. The consistently positive average 'mean' values demonstrate that the adversarial examples generated by our method are of higher perceptual quality than those generated by the anchor methods. To demonstrate the generalizability of our method, we also select eight images from the ImageNet dataset [10] for the subjective viewing test; similar results are shown in Tables 4 and 5.

Table 2 Comparison Of Subjective Viewing Tests Conducted Between The Proposed Method And The Anchor Methods Using The VGGFace2 Dataset Under Non-targeted Attacks
Table 3 Comparison Of Subjective Viewing Tests Conducted Between The Proposed Method And The Anchor Methods Using The VGGFace2 Dataset Under Targeted Attacks
Table 4 Comparison Of Subjective Viewing Tests Conducted Between The Proposed Method And The Anchor Methods Using The ImageNet Dataset Under Non-targeted Attacks
Table 5 Comparison Of Subjective Viewing Tests Conducted Between The Proposed Method And The Anchor Methods Using The ImageNet Dataset Under Targeted Attacks

Conclusion

In this work, an HVS-inspired adversarial example generation method with high perceptual quality has been proposed. Specifically, a JND-based perceptual loss has been designed by taking three quality-related factors into account under the constraint of the JND thresholds. Besides, a perturbation adjustment strategy has been designed to adjust the distribution of perturbation among the color channels. These designs keep the perturbation as tolerable to the HVS as possible and yield adversarial examples with high perceptual quality. Ablation experiments have demonstrated the reasonableness of the proposed JND-based perceptual loss. Extensive comparisons show that the proposed method achieves SOTA performance in both subjective and objective evaluations.

It should be mentioned that our adversarial examples are iteratively generated under the constraint of the proposed JND-based perceptual loss and the adversarial loss. On the one hand, the iterative generation allows us to achieve a high attack success rate. On the other hand, it reduces the efficiency of adversarial example generation compared with non-iterative methods. Hence, in future work we will focus on improving efficiency while keeping high perceptual quality.