Verifying Attention Robustness of Deep Neural Networks Against Semantic Perturbations

  • Conference paper, NASA Formal Methods (NFM 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13903)

Abstract

It is known that deep neural networks (DNNs) classify an input image by paying particular attention to certain specific pixels; a graphical representation of the magnitude of attention to each pixel is called a saliency-map. Saliency-maps are used to check the validity of the classification decision basis, e.g., it is not a valid basis for classification if a DNN pays more attention to the background rather than the subject of an image. Semantic perturbations can significantly change the saliency-map. In this work, we propose the first verification method for attention robustness, i.e., the local robustness of the changes in the saliency-map against combinations of semantic perturbations. Specifically, our method determines the range of the perturbation parameters (e.g., the brightness change) that maintains the difference between the actual saliency-map change and the expected saliency-map change below a given threshold value. Our method is based on activation region traversals, focusing on the outermost robust boundary for scalability on larger DNNs. We empirically evaluate the effectiveness and performance of our method on DNNs trained on popular image classification datasets.


Notes

  1. In this paper, the term “attention” refers to the focus on certain specific pixels in the image, not to the “attention mechanism” used in transformer models [39].

  2. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html.

References

  1. AIRC, A: ABCI system overview (2022). https://docs.abci.ai/en/system-overview/

  2. Ashmore, R., Calinescu, R., Paterson, C.: Assuring the machine learning lifecycle: desiderata, methods, and challenges. ACM Comput. Surv. (CSUR) 54(5), 1–39 (2021)

  3. Balunovic, M., Baader, M., Singh, G., Gehr, T., Vechev, M.: Certifying geometric robustness of neural networks. In: NeurIPS, vol. 32 (2019)

  4. Chen, J., Wu, X., Rastogi, V., Liang, Y., Jha, S.: Robust attribution regularization. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  5. Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise. In: ICMLVIZ. PMLR (2017)

  6. Deng, L.: The MNIST database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. 29(6), 141–142 (2012)

  7. Engstrom, L., Tran, B., Tsipras, D., Schmidt, L., Madry, A.: Exploring the landscape of spatial robustness. In: ICML, pp. 1802–1811. PMLR (2019)

  8. Fawzi, A., Frossard, P.: Manitest: are classifiers really invariant? In: BMVC, pp. 106.1–106.13 (2015)

  9. Fromherz, A., Leino, K., Fredrikson, M., Parno, B., Pasareanu, C.: Fast geometric projections for local robustness certification. In: ICLR (2021)

  10. Fromherz, A., Leino, K., Fredrikson, M., Parno, B., Pasareanu, C.: Fast geometric projections for local robustness certification—openreview (2021). https://openreview.net/forum?id=zWy1uxjDdZJ

  11. Gao, X., Saha, R.K., Prasad, M.R., Roychoudhury, A.: Fuzz testing based data augmentation to improve robustness of deep neural networks. In: ICSE, pp. 1147–1158. IEEE, ACM (2020)

  12. Guo, H., Zheng, K., Fan, X., Yu, H., Wang, S.: Visual attention consistency under image transforms for multi-label image classification. In: CVPR, pp. 729–739. IEEE, CVF (2019)

  13. Han, T., Tu, W.W., Li, Y.F.: Explanation consistency training: facilitating consistency-based semi-supervised learning with interpretability. In: AAAI, vol. 35, pp. 7639–7646. AAAI (2021)

  14. Hanin, B., Rolnick, D.: Deep ReLU networks have surprisingly few activation patterns. In: NeurIPS, vol. 32 (2019)

  15. Hinz, P.: An analysis of the piece-wise affine structure of ReLU feed-forward neural networks. Ph.D. thesis, ETH Zurich (2021)

  16. Huang, X., et al.: A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev. 37, 100270 (2020)

  17. Jha, S.K., Ewetz, R., Velasquez, A., Ramanathan, A., Jha, S.: Shaping noise for robust attributions in neural stochastic differential equations. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 9567–9574 (2022)

  18. Jordan, M., Lewis, J., Dimakis, A.G.: Provable certificates for adversarial examples: fitting a ball in the union of polytopes. In: NeurIPS, vol. 32 (2019)

  19. Kanbak, C., Moosavi-Dezfooli, S., Frossard, P.: Geometric robustness of deep networks: analysis and improvement. In: CVPR, pp. 4441–4449 (2018)

  20. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NeurIPS, vol. 25 (2012)

  21. Lim, C.H., Urtasun, R., Yumer, E.: Hierarchical verification for adversarial robustness. In: ICML, vol. 119, pp. 6072–6082. PMLR (2020)

  22. Mirman, M., Hägele, A., Bielik, P., Gehr, T., Vechev, M.: Robustness certification with generative models. In: PLDI, pp. 1141–1154. ACM SIGPLAN (2021)

  23. Mohapatra, J., Weng, T.W., Chen, P.Y., Liu, S., Daniel, L.: Towards verifying robustness of neural networks against a family of semantic perturbations. In: CVPR, pp. 244–252. IEEE, CVF (2020)

  24. Montavon, G., Samek, W., Müller, K.R.: Methods for interpreting and understanding deep neural networks. Digit. Signal Process. 73, 1–15 (2018)

  25. Müller, M.N., Makarchuk, G., Singh, G., Püschel, M., Vechev, M.T.: PRIMA: general and precise neural network certification via scalable convex hull approximations. Proc. ACM Program. Lang. 6(POPL), 1–33 (2022)

  26. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM SIGKDD (2016)

  27. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: ICCV. IEEE (2017)

  28. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. In: ICLR (2014)

  29. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)

  30. Singh, G., Gehr, T., Püschel, M., Vechev, M.: An abstract domain for certifying neural networks. In: POPL, pp. 1–30. ACM New York (2019)

  31. Sotoudeh, M., Thakur, A.V.: Computing linear restrictions of neural networks. In: NeurIPS, vol. 32 (2019)

  32. Sotoudeh, M., Thakur, A.V.: Provable repair of deep neural networks. In: PLDI, pp. 588–603. ACM SIGPLAN (2021)

  33. Sotoudeh, M., Thakur, A.V.: SyReNN: a tool for analyzing deep neural networks. In: TACAS, pp. 281–302 (2021)

  34. Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: ICML, pp. 3319–3328. PMLR (2017)

  35. Szegedy, C., et al.: Intriguing properties of neural networks. In: ICLR (2014)

  36. Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. In: ICLR (2019)

  37. Urban, C., Christakis, M., Wüstholz, V., Zhang, F.: Perfectly parallel fairness certification of neural networks. Proc. ACM Program. Lang. 4(OOPSLA), 1–30 (2020)

  38. Urban, C., Miné, A.: A review of formal methods applied to machine learning. CoRR abs/2104.02466 (2021). https://arxiv.org/abs/2104.02466

  39. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  40. Xiao, C., Zhu, J., Li, B., He, W., Liu, M., Song, D.: Spatially transformed adversarial examples. In: ICLR (2018)

  41. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms (2017). arXiv:1708.07747

  42. Xu, S., Vaughan, J., Chen, J., Zhang, A., Sudjianto, A.: Traversing the local polytopes of ReLU neural networks: a unified approach for network verification. In: AdvML. AAAI (2022)

  43. Yang, P., et al.: Enhancing robustness verification for deep neural networks via symbolic propagation. Formal Aspects Comput. 33(3), 407–435 (2021)

Author information

Correspondence to Satoshi Munakata.


H Appendix

H.1 Linearity of Activation Regions

Given an activation pattern \(p \in AP^f\) as a constant, each output \(f_j(x \in ar^f(p))\) of the ReLU-FNN is linear in x within the activation region \(ar^f(p)\) (cf. Fig. 8), because every ReLU operator has already resolved to 0 or x [14]; i.e., \(f_j(x \in ar^f(p)) = A'_j x + b'_j\), where \(A'_j\) and \(b'_j\) denote the weights and bias simplified under activation pattern p for class j. That is, the gradient of each ReLU-FNN output \(f_j(x)\) within activation region \(ar^f(p)\) is constant, i.e., Eq. (1) below holds, where \(C \in \mathbb {R}\) is a constant value.

Fig. 8. An example of activation regions [14]. ReLU-FNN output is linear on each activation region, i.e., each output plane painted for each activation region is flat.

$$\begin{aligned} Feasible^f(p \in AP^f) \Rightarrow \frac{\partial f_j(x)}{\partial x_i} = C \;\; (x \in ar^f(p)) \end{aligned}$$
(1)

An activation region can be interpreted as the H-representation of a convex polytope in input space \(\mathbb {R}^{N^f}\). Specifically, each neuron activity \(p_{l,n}\) corresponds one-to-one with a half-space, and p with the convex polytope defined by the intersection (conjunction) of all those half-spaces, because \(f_n^{(l)}(x)\) is also linear when \(p \in AP^f\) is constant. Therefore, we identify activation region \(ar^f(p)\) with the following H-representation of a convex polytope \(HConvex^f(x;p)\) and use the two interchangeably as needed, where \(A''\) and \(b''\) denote the weights and bias simplified under activation pattern p, and \(A''_{l,n} x \le b''_{l,n}\) is the half-space corresponding to the n-th neuron activity \(p_{l,n}\) in the l-th layer:

$$\begin{aligned} \begin{aligned} HConvex^f(x;p) {\mathop {=}\limits ^{\textrm{def}}}A'' x \le b'' \;\equiv \; \bigwedge _{l,n} A''_{l,n} x \le b''_{l,n} \end{aligned} \end{aligned}$$
(2)
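To make this concrete, the following is a minimal sketch (not the paper's implementation; all function names are ours): given per-layer weights Ws and biases bs of a ReLU-FNN whose last layer is linear, it computes the activation pattern at a point x0, the affine map \(A' x + b'\) that f realizes on the corresponding activation region, and the H-representation \(A'' x \le b''\) of that region.

```python
import numpy as np

def pattern_at(Ws, bs, x0):
    """Activation pattern ap^f(x0): one boolean per hidden neuron (True = active)."""
    pattern, h = [], np.asarray(x0, dtype=float)
    for W, c in zip(Ws[:-1], bs[:-1]):
        z = W @ h + c
        pattern.append(z >= 0)
        h = np.maximum(z, 0.0)
    return pattern

def region_from_pattern(Ws, bs, pattern):
    """Treat pattern p as a constant: return (A', b') with f(x) = A' x + b' on ar^f(p),
    and (A'', b'') with ar^f(p) = { x : A'' x <= b'' } (cf. Eq. 2)."""
    d = Ws[0].shape[1]
    A, b = np.eye(d), np.zeros(d)                    # running affine map of the input x
    rows_A, rows_b = [], []
    for W, c, p in zip(Ws[:-1], bs[:-1], pattern):
        A_pre, b_pre = W @ A, W @ b + c              # pre-activation as an affine map of x
        sign = np.where(p, -1.0, 1.0)                # active neuron:   -(A_pre x + b_pre) <= 0
        rows_A.append(sign[:, None] * A_pre)         # inactive neuron:   A_pre x + b_pre  <= 0
        rows_b.append(-sign * b_pre)
        D = np.diag(p.astype(float))                 # ReLU fixed to identity/zero by p
        A, b = D @ A_pre, D @ b_pre
    A_out, b_out = Ws[-1] @ A, Ws[-1] @ b + bs[-1]   # A'_j, b'_j are row j of A_out, b_out
    return (A_out, b_out), (np.vstack(rows_A), np.concatenate(rows_b))

# Example: a toy 2-2-2 network; f is affine on the activation region of x0 = (1.0, -0.5).
Ws = [np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([[1.0, 0.0], [0.0, 1.0]])]
bs = [np.array([0.1, -0.2]), np.array([0.0, 0.0])]
p = pattern_at(Ws, bs, [1.0, -0.5])
(A1, b1), (A2, b2) = region_from_pattern(Ws, bs, p)
```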

H.2 Connectivity of Activation Regions

When feasible activation patterns \(p, p' \in AP^f\) differ only by flipping a single neuron activity \(p_{l,n} \in \{0,1\}\), their activation regions are connected because they share the single face \(HFace^f_{l,n}(x;p) {\mathop {=}\limits ^{\textrm{def}}}A''_{l,n} x = b''_{l,n}\) corresponding to the flipped \(p_{l,n}\) [18]. It is thus possible to traverse activation regions flexibly while ensuring connectivity by choosing which neuron activity to flip according to a prioritization; several such traversal methods have been proposed [9, 18, 21]. In general, however, many neuron activities become infeasible when flipped [18]. For instance, half-space \(h_{1,3}\) is a face of activation region \(\eta \) in Fig. 6(1a); thus, by flipping neuron activity \(p_{1,3}\), GBS can traverse to the connected region \(\eta \) in Fig. 6(1b). In contrast, half-space \(h_{1,1}\) is not a face of activation region \(\eta \) in Fig. 6(1a); thus, when neuron activity \(p_{1,1}\) is flipped, the corresponding activation region is infeasible (i.e., the intersection of the flipped half-spaces is empty).
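As a small follow-up to the sketch above, a single traversal step only has to flip one entry of the pattern; whether the flipped pattern is actually feasible (and hence whether the two regions really share a face) must still be checked by LP, as described in H.4. The helper below is hypothetical and reuses the list-of-boolean-arrays representation from the H.1 sketch.

```python
def flipped(pattern, layer, neuron):
    """Return a copy of the activation pattern with neuron activity p_{layer,neuron} flipped."""
    q = [layer_bits.copy() for layer_bits in pattern]
    q[layer][neuron] = not q[layer][neuron]
    return q
```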

H.3 Hierarchy of Activation Regions

When feasible activation patterns \(p, p' \in AP^f\) agree on the entire \(L'^f\)-th upstream activation pattern \(p_{\le L'^f} {\mathop {=}\limits ^{\textrm{def}}}[ p_{l,n} \mid 1 \le l \le L'^f, 1 \le n \le N^f_l ] \;\; (1 \le L'^f \le L^f)\), both are included in the parent activation region \(ar^f_{\le L'^f}(p)\) corresponding to the convex polytope \(HConvex^f_{\le L'^f}(x;p) {\mathop {=}\limits ^{\textrm{def}}}\bigwedge _{l \le L'^f,n} A''_{l,n} x \le b''_{l,n}\) [21]. That is, \(\forall x \in ar^f(p).\; x \in ar^f_{\le L'^f}(p)\) and \(\forall x \in \mathbb {R}^{N^f}.\; HConvex^f(x;p) \Rightarrow HConvex^f_{\le L'^f}(x;p)\).

Similarly, we define \(L'^f\)-th downstream activation pattern as \(p_{\ge L'^f} {\mathop {=}\limits ^{\textrm{def}}}[ p_{l,n} \mid L'^f \le l \le L^f, 1 \le n \le N^f_l ] \;\; (1 \le L'^f \le L^f)\).
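As a minor illustration (assuming the pattern is stored as a list of per-layer boolean arrays, as in the H.1 sketch), the upstream and downstream patterns are simply a prefix and a suffix of that list; the parent polytope \(HConvex^f_{\le L'^f}\) keeps only the half-space rows contributed by the first \(L'^f\) layers.

```python
def upstream(pattern, L):
    """p_{<= L}: neuron activities of layers 1..L (layers are 1-indexed)."""
    return pattern[:L]

def downstream(pattern, L):
    """p_{>= L}: neuron activities of layers L..L^f."""
    return pattern[L - 1:]
```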

H.4 Linear Programming on an Activation Region

Based on the linearity of activation regions and ReLU-FNN outputs, we can use Linear Programming (LP) to compute (a) the feasibility of an activation region, (b) the flippability of a neuron activity, and (c) the minimum (or maximum) of a ReLU-FNN output within an activation region. We show the LP encoding of each problem (a, b, c) in the SciPy LP form (Footnote 2), where \(p \in AP^f\) is a given activation pattern of ReLU-FNN f and \(p_{l,n}\) is a given neuron activity to be flipped:

$$\begin{aligned} \begin{aligned} \mathbf {(a)} \;\;&\exists x \in \mathbb {R}^{N^f}.\; HConvex^f(x;p) {\mathop {\longrightarrow }\limits ^{\textrm{encode}}}\min _{x}{\textbf{0} x} \;\; \mathbf {s.t.,} \; A'' x \le b'' \\ \mathbf {(b)} \;\;&\exists x \in \mathbb {R}^{N^f}.\; HConvex^f(x;p) \wedge HFace^f_{l,n}(x;p) \\&\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; {\mathop {\longrightarrow }\limits ^{\textrm{encode}}}\min _{x}{\textbf{0} x} \;\; \mathbf {s.t.,} \; A'' x \le b'' ,\; A''_{l,n} x = b''_{l,n} \\ \mathbf {(c)} \;\;&\min _{x}{f_j(x)} \;\; \mathbf {s.t.,} \; HConvex^f(x;p) {\mathop {\longrightarrow }\limits ^{\textrm{encode}}}\Bigl ( \min _{x}{A'_j x} \;\; \mathbf {s.t.,} \; A'' x \le b'' \Bigr ) + b'_j \\ \end{aligned} \end{aligned}$$
(3)
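For concreteness, the three encodings of Eq. (3) can be written with scipy.optimize.linprog as in the following sketch. It is illustrative only: it assumes the \((A'', b'')\) and \((A'_j, b'_j)\) matrices produced by the H.1 sketch, and it lifts linprog's default non-negativity bounds because x is unconstrained.

```python
import numpy as np
from scipy.optimize import linprog   # minimizes c @ x  s.t.  A_ub x <= b_ub, A_eq x = b_eq

def is_feasible(A_pp, b_pp):
    # (a) does ar^f(p) contain any point?  Zero objective, constraints A'' x <= b''.
    c = np.zeros(A_pp.shape[1])
    return linprog(c, A_ub=A_pp, b_ub=b_pp, bounds=(None, None)).success

def is_flippable(A_pp, b_pp, row):
    # (b) does the region touch the face A''_{l,n} x = b''_{l,n} (constraint index `row`)?
    c = np.zeros(A_pp.shape[1])
    res = linprog(c, A_ub=A_pp, b_ub=b_pp,
                  A_eq=A_pp[row:row + 1], b_eq=b_pp[row:row + 1], bounds=(None, None))
    return res.success

def min_output(A_j, b_j, A_pp, b_pp):
    # (c) min_x f_j(x) over the region: minimize A'_j x, then add the constant b'_j.
    res = linprog(A_j, A_ub=A_pp, b_ub=b_pp, bounds=(None, None))
    return res.fun + b_j if res.success else None   # None if infeasible or unbounded
```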

H.5 Full Encoding of Semantic Perturbations

We focus here on the perturbations of brightness change (B), patch (P), and translation (T), and describe how to encode their combination into a ReLU-FNN \(g^{x0}: \varTheta \rightarrow X\), where \(|\theta ^{(l)}| = \dim \theta ^{(l)}\), w is the width of image x0, px, py, pw, and ph are the patch x-position, y-position, width, and height, and tx is the amount of movement in the x-axis direction. Here, perturbation parameter \(\theta \in \varTheta \) consists of the amount of brightness change for (B), the density of the patch for (P), and the amount of translation for (T). In contrast, perturbation parameters not included in the dimensions of \(\varTheta \), such as w, px, py, pw, ph, and tx, are assumed to be given as constants before verification.

$$\begin{aligned} \begin{aligned}&A^{(B)} = \left[ a^{(B)}_{r,c}\right] , A^{(P)} = \left[ a^{(P)}_{r,c}\right] , A^{(T)} = \left[ a^{(T)}_{r,c}\right] \\ \end{aligned} \end{aligned}$$
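As a hedged illustration of such an encoding (the paper's exact matrices \(A^{(B)}, A^{(P)}, A^{(T)}\) and the clipping and translation layers are not reproduced here), a brightness change and a patch with fixed position and size can be written as an affine map from \(\theta = (\theta_B, \theta_P)\) to the perturbed image; the names and shapes below are assumptions of this sketch.

```python
import numpy as np

def perturbation_matrix(w, h, px, py, pw, ph):
    """Columns of the affine part of g^{x0} for brightness (B) and patch (P)."""
    n = w * h
    A_B = np.ones((n, 1))                      # brightness: add theta_B to every pixel
    mask = np.zeros((h, w))
    mask[py:py + ph, px:px + pw] = 1.0         # patch: add theta_P only inside the rectangle
    A_P = mask.reshape(n, 1)
    return np.hstack([A_B, A_P])

def g(x0, theta, A):
    return x0 + A @ theta                      # clipping to [0, 1] (encodable with ReLUs) omitted

# Example: a 28x28 image with a 5x5 patch whose top-left corner is at (10, 10).
# A = perturbation_matrix(28, 28, 10, 10, 5, 5);  x = g(x0.flatten(), theta, A)
```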
Fig. 9. MNIST images used for experiments.

Fig. 10. Fashion-MNIST images used for experiments.

H.6 Images Used for Our Experiments

We used 10 images each (Indexes 69990–69999) selected from the end of the MNIST dataset (cf. Fig. 9) and the Fashion-MNIST dataset (cf. Fig. 10). None of these images were used in the training of any ReLU-FNN.

H.7 An Example of Lemma 1

Lemma 1 is reprinted below (Fig. 11).

$$\begin{aligned} \frac{\partial f_j(x)}{\partial x_i} = C \;\; (x \in \{ g^{x0}(\theta ) \mid \theta \in ar^{f \circ g}(p) \}) \end{aligned}$$
Fig. 11. An image for a small example of Lemma 1.

H.8 Algorithm BFS

Algorithm BFS traverses all activation regions in the perturbation parameter space \(\varTheta \), as shown in Fig. 12.

Fig. 12. Examples of BFS results. (Near the edges, polygons may fail to render, resulting in blank regions.)

Algorithm BFS initializes Q with \(ap^{f \circ g^{x0}}(\textbf{0})\) (Line 3). Then, for each activation pattern p in Q (Lines 5–6), it reconstructs the corresponding activation region \(\eta \) (subroutine constructActivationRegion, Line 8) as the H-representation of p (cf. Eq. 2). Next, for each neuron in \(f \circ g^{x0}\) (Line 12), it checks whether the neuron activity \(p_{l,n}\) cannot flip within the perturbation parameter space \(\varTheta \), i.e., whether one of the half-spaces has no feasible points within \(\varTheta \) (subroutine isStable, Line 13). Otherwise, a new activation pattern \(p'\) is constructed by flipping \(p_{l,n}\) (subroutine flipped, Line 14) and added to the queue (Line 20) if \(p'\) is feasible (subroutine calcInteriorPointOnFace, Lines 17–18). Finally, the activation region \(\eta \) is simplified (Line 24) and used for verification (subroutines solveCR and solveVR, Lines 25–27, and subroutines solveAR and solveIR, Lines 32–34, cf. Sect. 4.4).
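A much-simplified sketch of such a breadth-first traversal is shown below. It reuses the hypothetical helpers from the earlier sketches (pattern_at, region_from_pattern, flipped, is_feasible, is_flippable) and omits the restriction to \(\varTheta \), the region simplification, and the robustness solvers, so it illustrates only the traversal idea, not the paper's algorithm.

```python
import numpy as np
from collections import deque

def bfs_regions(Ws, bs, theta0, max_regions=1000):
    """Enumerate activation regions reachable from the region containing theta0."""
    p0 = pattern_at(Ws, bs, theta0)
    key = lambda pat: tuple(np.concatenate(pat).tolist())   # hashable id of a pattern
    queue, seen, regions = deque([p0]), {key(p0)}, []
    while queue and len(regions) < max_regions:
        p = queue.popleft()
        _, (A_pp, b_pp) = region_from_pattern(Ws, bs, p)
        if not is_feasible(A_pp, b_pp):                  # a flipped pattern may have an empty region
            continue
        regions.append((p, A_pp, b_pp))                  # a real verifier would solve CR/AR here
        row = 0
        for l, layer_bits in enumerate(p):
            for n in range(len(layer_bits)):
                if is_flippable(A_pp, b_pp, row):        # region shares this face with a neighbour
                    q = flipped(p, l, n)
                    if key(q) not in seen:
                        seen.add(key(q))
                        queue.append(q)
                row += 1
    return regions
```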

H.9 Details of Experimental Results

Table 2 shows the breakdown of verification statuses in the experimental results for each algorithm and each DNN size (cf. Sect. 5). In particular, for traversing AR boundaries, the ratio of “Timeout” and “Failed (out-of-memory)” statuses increases as the DNN size increases. This is because gbs-AR traverses more activation regions than gbs-CR, by the width of the hyperparameter \(w^\delta \). In the future, it would therefore be desirable to traverse only the small number of activation regions near the AR boundary.

Table 2. Breakdown of verification statuses. “Robust” and “NotRobust” mean that the algorithm found only robust regions or at least one not-robust region, respectively. “Timeout” and “Failed” mean that the algorithm did not finish within 2 h or ran out of memory, respectively.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Munakata, S., Urban, C., Yokoyama, H., Yamamoto, K., Munakata, K. (2023). Verifying Attention Robustness of Deep Neural Networks Against Semantic Perturbations. In: Rozier, K.Y., Chaudhuri, S. (eds) NASA Formal Methods. NFM 2023. Lecture Notes in Computer Science, vol 13903. Springer, Cham. https://doi.org/10.1007/978-3-031-33170-1_3

  • DOI: https://doi.org/10.1007/978-3-031-33170-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33169-5

  • Online ISBN: 978-3-031-33170-1
