Abstract
It is known that deep neural networks (DNNs) classify an input image by paying particular attention to certain pixels; a graphical representation of the magnitude of attention to each pixel is called a saliency-map. Saliency-maps are used to check the validity of the basis for a classification decision, e.g., it is not a valid basis for classification if a DNN pays more attention to the background than to the subject of an image. Semantic perturbations can significantly change the saliency-map. In this work, we propose the first verification method for attention robustness, i.e., the local robustness of the changes in the saliency-map against combinations of semantic perturbations. Specifically, our method determines the range of the perturbation parameters (e.g., the amount of brightness change) that keeps the difference between the actual saliency-map change and the expected saliency-map change below a given threshold value. Our method is based on activation-region traversal, focusing on the outermost robust boundary for scalability on larger DNNs. We empirically evaluate the effectiveness and performance of our method on DNNs trained on popular image classification datasets.
Notes
- 1.
In this paper, the term “attention” refers to the focus on certain pixels in the image, not to the “attention mechanism” used in transformer models [39].
- 2.
References
AIRC, A.: ABCI system overview (2022). https://docs.abci.ai/en/system-overview/
Ashmore, R., Calinescu, R., Paterson, C.: Assuring the machine learning lifecycle: desiderata, methods, and challenges. ACM Comput. Surv. (CSUR) 54(5), 1–39 (2021)
Balunovic, M., Baader, M., Singh, G., Gehr, T., Vechev, M.: Certifying geometric robustness of neural networks. In: NeurIPS, vol. 32 (2019)
Chen, J., Wu, X., Rastogi, V., Liang, Y., Jha, S.: Robust attribution regularization. In: NeurIPS, vol. 32 (2019)
Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise. In: ICMLVIZ. PMLR (2017)
Deng, L.: The MNIST database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. 29(6), 141–142 (2012)
Engstrom, L., Tran, B., Tsipras, D., Schmidt, L., Madry, A.: Exploring the landscape of spatial robustness. In: ICML, pp. 1802–1811. PMLR (2019)
Fawzi, A., Frossard, P.: Manitest: are classifiers really invariant? In: BMVC, pp. 106.1–106.13 (2015)
Fromherz, A., Leino, K., Fredrikson, M., Parno, B., Pasareanu, C.: Fast geometric projections for local robustness certification. In: ICLR (2021)
Fromherz, A., Leino, K., Fredrikson, M., Parno, B., Pasareanu, C.: Fast geometric projections for local robustness certification—openreview (2021). https://openreview.net/forum?id=zWy1uxjDdZJ
Gao, X., Saha, R.K., Prasad, M.R., Roychoudhury, A.: Fuzz testing based data augmentation to improve robustness of deep neural networks. In: ICSE, pp. 1147–1158. IEEE, ACM (2020)
Guo, H., Zheng, K., Fan, X., Yu, H., Wang, S.: Visual attention consistency under image transforms for multi-label image classification. In: CVPR, pp. 729–739. IEEE, CVF (2019)
Han, T., Tu, W.W., Li, Y.F.: Explanation consistency training: facilitating consistency-based semi-supervised learning with interpretability. In: AAAI, vol. 35, pp. 7639–7646. AAAI (2021)
Hanin, B., Rolnick, D.: Deep ReLU networks have surprisingly few activation patterns. In: NeurIPS, vol. 32 (2019)
Hinz, P.: An analysis of the piece-wise affine structure of ReLU feed-forward neural networks. Ph.D. thesis, ETH Zurich (2021)
Huang, X., et al.: A survey of safety and trustworthiness of deep neural networks: verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev. 37, 100270 (2020)
Jha, S.K., Ewetz, R., Velasquez, A., Ramanathan, A., Jha, S.: Shaping noise for robust attributions in neural stochastic differential equations. In: AAAI, vol. 36, pp. 9567–9574. AAAI (2022)
Jordan, M., Lewis, J., Dimakis, A.G.: Provable certificates for adversarial examples: fitting a ball in the union of polytopes. In: NeurIPS, vol. 32 (2019)
Kanbak, C., Moosavi-Dezfooli, S., Frossard, P.: Geometric robustness of deep networks: analysis and improvement. In: CVPR, pp. 4441–4449 (2018)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NeurIPS, vol. 25 (2012)
Lim, C.H., Urtasun, R., Yumer, E.: Hierarchical verification for adversarial robustness. In: ICML, vol. 119, pp. 6072–6082. PMLR (2020)
Mirman, M., Hägele, A., Bielik, P., Gehr, T., Vechev, M.: Robustness certification with generative models. In: PLDI, pp. 1141–1154. ACM SIGPLAN (2021)
Mohapatra, J., Weng, T.W., Chen, P.Y., Liu, S., Daniel, L.: Towards verifying robustness of neural networks against a family of semantic perturbations. In: CVPR, pp. 244–252. IEEE, CVF (2020)
Montavon, G., Samek, W., Müller, K.R.: Methods for interpreting and understanding deep neural networks. Digit. Signal Process. 73, 1–15 (2018)
Müller, M.N., Makarchuk, G., Singh, G., Püschel, M., Vechev, M.T.: PRIMA: general and precise neural network certification via scalable convex hull approximations. Proc. ACM Program. Lang. 6(POPL), 1–33 (2022)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM SIGKDD (2016)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: ICCV. IEEE (2017)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. In: ICLR (2014)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
Singh, G., Gehr, T., Püschel, M., Vechev, M.: An abstract domain for certifying neural networks. In: POPL, pp. 1–30. ACM New York (2019)
Sotoudeh, M., Thakur, A.V.: Computing linear restrictions of neural networks. In: NeurIPS, vol. 32 (2019)
Sotoudeh, M., Thakur, A.V.: Provable repair of deep neural networks. In: PLDI, pp. 588–603. ACM SIGPLAN (2021)
Sotoudeh, M., Thakur, A.V.: SyReNN: a tool for analyzing deep neural networks. In: TACAS, pp. 281–302 (2021)
Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: ICML, pp. 3319–3328. PMLR (2017)
Szegedy, C., et al.: Intriguing properties of neural networks. In: ICLR (2014)
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. In: ICLR (2019)
Urban, C., Christakis, M., Wüstholz, V., Zhang, F.: Perfectly parallel fairness certification of neural networks. Proc. ACM Program. Lang. 4(OOPSLA), 1–30 (2020)
Urban, C., Miné, A.: A review of formal methods applied to machine learning. CoRR abs/2104.02466 (2021). https://arxiv.org/abs/2104.02466
Vaswani, A., et al.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Xiao, C., Zhu, J., Li, B., He, W., Liu, M., Song, D.: Spatially transformed adversarial examples. In: ICLR (2018)
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms (2017). arXiv:1708.07747
Xu, S., Vaughan, J., Chen, J., Zhang, A., Sudjianto, A.: Traversing the local polytopes of ReLU neural networks: a unified approach for network verification. In: AdvML. AAAI (2022)
Yang, P., et al.: Enhancing robustness verification for deep neural networks via symbolic propagation. Formal Aspects Comput. 33(3), 407–435 (2021)
H Appendix
H.1 Linearity of Activation Regions
Given activation pattern \(p \in AP^f\) as constant, each output of ReLU-FNN \(f_j\) is linear in x within activation region \(ar^f(p)\) (cf. Fig. 8), because all ReLU operators have already resolved to 0 or x [14]; i.e., \(f_j(x \in ar^f(p)) = A'_j x + b'_j\), where \(A'_j\) and \(b'_j\) denote the weights and bias simplified according to activation pattern p and class j. That is, the gradient of each ReLU-FNN output \(f_j(x)\) within activation region \(ar^f(p)\) is constant: \(\nabla _x f_j(x \in ar^f(p)) = A'_j\), i.e., each partial derivative \(\partial f_j(x) / \partial x_i\) equals a constant value \(C \in \mathbb {R}\).
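To make the simplification concrete, the following NumPy sketch (ours, with hypothetical names; not the paper's implementation) collapses a ReLU-FNN to the affine map \(A'_j x + b'_j\) it computes on the activation region of a given pattern, and checks it against an ordinary forward pass:

```python
import numpy as np

def region_affine_map(weights, biases, pattern):
    """Collapse a ReLU-FNN to the affine map f(x) = A' x + b' it computes on
    ar^f(p): each hidden ReLU is resolved to 0 or identity by the 0/1 pattern."""
    A = np.eye(weights[0].shape[1])          # running Jacobian, starts at identity
    b = np.zeros(weights[0].shape[1])
    for W, c, p in zip(weights[:-1], biases[:-1], pattern):
        A, b = W @ A, W @ b + c              # affine pre-activation of layer l
        A, b = p[:, None] * A, p * b         # ReLU fixed by p: keep or zero out
    return weights[-1] @ A, weights[-1] @ b + biases[-1]

# Sanity check on a random network: inside the region containing x0, the
# collapsed affine map agrees with the network's forward pass.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((5, 3)), rng.standard_normal((4, 5)), rng.standard_normal((2, 4))]
bs = [rng.standard_normal(5), rng.standard_normal(4), rng.standard_normal(2)]
x0 = rng.standard_normal(3)
z, pattern = x0, []
for W, c in zip(Ws[:-1], bs[:-1]):
    pre = W @ z + c
    pattern.append((pre > 0).astype(float))  # activation pattern ap^f(x0)
    z = np.maximum(pre, 0.0)
A, b = region_affine_map(Ws, bs, pattern)
assert np.allclose(A @ x0 + b, Ws[-1] @ z + bs[-1])
```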
An activation region can be interpreted as the H-representation of a convex polytope in input space \(\mathbb {R}^{N^f}\). Specifically, each neuron activity \(p_{l,n}\) corresponds one-to-one with a half-space, and p with the convex polytope defined by the intersection (conjunction) of all these half-spaces, because \(f_n^{(l)}(x)\) is also linear when \(p \in AP^f\) is constant. Therefore, we interchange activation region \(ar^f(p)\) and the H-representation of its convex polytope, \(HConvex^f(x;p) {\mathop {=}\limits ^{\textrm{def}}}\bigwedge _{l,n} A''_{l,n} x \le b''_{l,n}\), as needed, where \(A''\) and \(b''\) denote the weights and bias simplified according to activation pattern p, and \(A''_{l,n} x \le b''_{l,n}\) is the half-space corresponding to the n-th neuron activity \(p_{l,n}\) in the l-th layer.
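Under the same conventions, a sketch of how the half-space matrices \(A''\) and \(b''\) can be assembled (one row per hidden neuron; we assume the sign convention that an active neuron keeps its pre-activation nonnegative):

```python
import numpy as np

def region_halfspaces(weights, biases, pattern):
    """H-representation of ar^f(p): stack one half-space A''_{l,n} x <= b''_{l,n}
    per hidden neuron (l, n), derived from its affine pre-activation under p."""
    A_rows, b_rows = [], []
    A = np.eye(weights[0].shape[1])
    b = np.zeros(weights[0].shape[1])
    for W, c, p in zip(weights[:-1], biases[:-1], pattern):
        A, b = W @ A, W @ b + c            # pre-activation of layer l, affine in x
        sign = 1.0 - 2.0 * p               # p=1: -(A x + b) <= 0;  p=0: A x + b <= 0
        A_rows.append(sign[:, None] * A)
        b_rows.append(-sign * b)
        A, b = p[:, None] * A, p * b       # resolve the ReLU and continue downstream
    return np.vstack(A_rows), np.concatenate(b_rows)   # A'' x <= b''
```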
H.2 Connectivity of Activation Regions
When feasible activation patterns \(p,p' \in AP^f\) differ by flipping a single neuron activity \(p_{l,n} \in \{0,1\}\), they correspond to connected regions, because the regions share the single face \(HFace^f_{l,n}(x;p) {\mathop {=}\limits ^{\textrm{def}}}A''_{l,n} x = b''_{l,n}\) corresponding to the flipped \(p_{l,n}\) [18]. It is possible to traverse activation regions flexibly while ensuring connectivity by selecting the neuron activity to flip according to a prioritization; several such traversal methods have been proposed [9, 18, 21]. In general, however, many neuron activities become infeasible when flipped [18]. For instance, half-space \(h_{1,3}\) is a face of activation region \(\eta \) in Fig. 6(1a); thus, by flipping neuron activity \(p_{1,3}\), GBS can traverse the connected region \(\eta \) in Fig. 6(1b). In contrast, half-space \(h_{1,1}\) is not a face of activation region \(\eta \) in Fig. 6(1a); thus, when neuron activity \(p_{1,1}\) is flipped, the corresponding activation region is infeasible (i.e., the intersection of the flipped half-spaces has no area).
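In code terms (continuing the hypothetical sketches above), flipping a neuron activity is a one-bit edit of the pattern; whether the result is feasible is exactly LP question (b) of Appendix H.4:

```python
def flipped(pattern, l, n):
    """Activation pattern p' differing from p only in neuron activity p_{l,n}.
    If p' is feasible, ar^f(p) and ar^f(p') are connected through the shared
    face A''_{l,n} x = b''_{l,n}; feasibility is checked by the LP in H.4."""
    q = [layer.copy() for layer in pattern]
    q[l][n] = 1.0 - q[l][n]
    return q
```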
H.3 Hierarchy of Activation Regions
When feasible activation patterns \(p,p' \in AP^f\) agree on the entire \(L'^f\)-th upstream activation pattern \(p_{\le L'^f} {\mathop {=}\limits ^{\textrm{def}}}[ p_{l,n} \mid 1 \le l \le L'^f, 1 \le n \le N^f_l ] \;\; (1 \le L'^f \le L^f)\), they are both included in the parent activation region \(ar^f_{\le L'^f}(p)\) corresponding to convex polytope \(HConvex^f_{\le L'^f}(x;p) {\mathop {=}\limits ^{\textrm{def}}}\bigwedge _{l \le L'^f,n} A''_{l,n} x \le b''_{l,n}\) [21]. That is, \(\forall x \in ar^f(p).\; x \in ar^f_{\le L'^f}(p)\) and \(\forall x \in \mathbb {R}^{N^f}.\; HConvex^f(x;p) \Rightarrow HConvex^f_{\le L'^f}(x;p)\).
Similarly, we define the \(L'^f\)-th downstream activation pattern as \(p_{\ge L'^f} {\mathop {=}\limits ^{\textrm{def}}}[ p_{l,n} \mid L'^f \le l \le L^f, 1 \le n \le N^f_l ] \;\; (1 \le L'^f \le L^f)\).
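Viewing a pattern as a list of per-layer activity vectors, as in the sketches above, both notions are plain slices; the parent region's H-representation keeps only the rows that the first \(L'^f\) layers contribute in region_halfspaces:

```python
def upstream_pattern(pattern, L_prime):
    """p_{<=L'}: activities of the first L' hidden layers. Patterns agreeing on
    p_{<=L'} lie in the same parent region ar_{<=L'}(p), whose H-representation
    is the corresponding prefix of the rows built by region_halfspaces."""
    return pattern[:L_prime]

def downstream_pattern(pattern, L_prime):
    """p_{>=L'}: activities from the L'-th hidden layer onward (1-indexed)."""
    return pattern[L_prime - 1:]
```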
H.4 Linear Programming on an Activation Region
Based on the linearity of activation regions and ReLU-FNN outputs, we can use Linear Programming (LP) to compute (a) the feasibility of an activation region, (b) the flippability of a neuron activity, and (c) the minimum (maximum) of a ReLU-FNN output within an activation region. We show each LP encoding of problems (a), (b), and (c) in the SciPy LP form (Note 2), where \(p \in AP^f\) is a given activation pattern of ReLU-FNN f, and \(p_{l,n}\) is a given neuron activity to be flipped.
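The SciPy LP form is \(\min _x c^\top x\) subject to \(A_{ub} x \le b_{ub}\), plus optional equalities and variable bounds. A minimal sketch of the three encodings, assuming the matrices \(A''\), \(b''\) built as in H.1 and passing the box \(\varTheta \) through the bounds argument:

```python
import numpy as np
from scipy.optimize import linprog

def is_feasible(A, b, bounds):
    """(a) Feasibility of an activation region: is {x : A x <= b} within the
    bounds nonempty? A zero objective turns the solver into a pure check."""
    res = linprog(c=np.zeros(A.shape[1]), A_ub=A, b_ub=b, bounds=bounds)
    return res.status == 0                 # 0: solved, 2: infeasible

def is_flippable(A, b, k, bounds):
    """(b) Flippability of the neuron activity behind row k: does the region
    touch the hyperplane A_k x = b_k? Encode that face as an equality."""
    res = linprog(c=np.zeros(A.shape[1]),
                  A_ub=np.delete(A, k, axis=0), b_ub=np.delete(b, k),
                  A_eq=A[[k]], b_eq=b[[k]], bounds=bounds)
    return res.status == 0

def output_min(A, b, a_j, bounds):
    """(c) Minimum of output f_j(x) = a_j . x + c_j over the region; add the
    constant c_j to the result, and negate a_j for the maximum instead."""
    res = linprog(c=a_j, A_ub=A, b_ub=b, bounds=bounds)
    return res.fun if res.status == 0 else None

# e.g., a two-dimensional parameter space (hypothetical ranges):
# bounds = [(-0.3, 0.3), (0.0, 1.0)]
```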
H.5 Full Encoding of Semantic Perturbations
We focus here on the perturbations of brightness change (B), patch (P), and translation (T), and describe how to encode their combination into ReLU-FNN \(g^{x0}: \varTheta \rightarrow X\); here \(|\theta ^{(l)}| = \dim \theta ^{(l)}\), w is the width of image x0, px, py, pw, ph are the patch x-position, y-position, width, and height, and tx is the amount of movement in the x-axis direction. Perturbation parameter \(\theta \in \varTheta \) consists of the amount of brightness change for (B), the density of the patch for (P), and the amount of translation for (T). In contrast, perturbation parameters not included in the dimensions of \(\varTheta \), such as w, px, py, pw, ph, and tx, are assumed to be given as constants before verification.
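Each perturbation acts affinely in its parameter, which is what allows \(g^{x0}\) to be expressed as a ReLU-FNN. A sketch under our own reading (hypothetical names; in particular, we assume the translation parameter interpolates between x0 and its copy shifted by the constant offset tx, and that tx is nonnegative):

```python
import numpy as np

def g_x0(theta, x0, w, px, py, pw, ph, tx):
    """Sketch of g^{x0}: Theta -> X for a flattened w-by-w grayscale image x0.
    theta = (brightness change, patch density, translation amount); the patch
    geometry px, py, pw, ph and the offset tx are fixed before verification.
    Every step below is affine in theta."""
    t_b, t_p, t_t = theta
    img = x0.reshape(w, w).astype(float)
    out = img + t_b                               # (B): uniform brightness change
    mask = np.zeros_like(img)
    mask[py:py + ph, px:px + pw] = 1.0            # (P): fixed patch region
    out = out + t_p * mask                        # patch density scales the mask
    shifted = np.roll(img, tx, axis=1)            # (T): shift right by tx pixels...
    if tx > 0:
        shifted[:, :tx] = 0.0                     # ...with zero padding, no wrap-around
    out = out + t_t * (shifted - img)             # interpolate original <-> translated
    return out.reshape(-1)
```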
H.6 Images Used for Our Experiments
We used the 10 images with indexes 69990–69999 from the end of the MNIST dataset (cf. Fig. 9) and of the Fashion-MNIST dataset (cf. Fig. 10), respectively. None of these images were used to train any of the ReLU-FNNs.
H.7 An Example of Lemma 1
Lemma 1 is reprinted below (Fig. 11).
H.8 Algorithm BFS
Algorithm BFS traverses all activation regions in perturbation parameter space \(\varTheta \), as shown in Fig. 12.
Algorithm BFS initializes queue Q with \(ap^{f \circ g^{x0}}(\textbf{0})\) (Line 3). Then, for each activation pattern p in Q (Lines 5–6), it reconstructs the corresponding activation region \(\eta \) (subroutine constructActivationRegion, Line 8) as the H-representation of p (cf. Eq. 2). Next, for each neuron in \(f \circ g^{x0}\) (Line 12), it checks whether neuron activity \(p_{l,n}\) cannot flip within the perturbation parameter space \(\varTheta \), i.e., whether the corresponding half-space has no feasible points within \(\varTheta \) (subroutine isStable, Line 13). Otherwise, a new activation pattern \(p'\) is constructed by flipping \(p_{l,n}\) (subroutine flipped, Line 14) and added to the queue (Line 20) if \(p'\) is feasible (subroutine calcInteriorPointOnFace, Lines 17–18). Finally, the activation region \(\eta \) is simplified (Line 24) and used to verify CR and VR (subroutines solveCR and solveVR, Lines 25–27, cf. Sect. 4.4) as well as AR and IR (subroutines solveAR and solveIR, Lines 32–34, cf. Sect. 4.4).
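A compact sketch of this loop (ours, with hypothetical helpers; region_halfspaces is the H.1 sketch, and verify stands in for the solveCR/solveVR/solveAR/solveIR subroutines). The face LP plays the roles of isStable and calcInteriorPointOnFace, and a small step across the face recovers the flipped pattern; a robust implementation would scale that step rather than fixing it:

```python
import numpy as np
from collections import deque
from scipy.optimize import linprog

def pattern_at(weights, biases, x):
    """Activation pattern of the network at input x (0/1 per hidden neuron)."""
    p, z = [], x
    for W, c in zip(weights[:-1], biases[:-1]):
        pre = W @ z + c
        p.append((pre > 0).astype(float))
        z = np.maximum(pre, 0.0)
    return p

def face_interior_point(A, b, k, bounds):
    """A point on face k strictly inside all other half-spaces and the bounds,
    found by maximizing a shared slack t; None if the face is unreachable."""
    m, n = A.shape
    others = np.delete(np.arange(m), k)
    c = np.zeros(n + 1); c[-1] = -1.0                     # maximize slack t
    A_ub = np.hstack([A[others], np.ones((m - 1, 1))])    # A x + t <= b (others)
    res = linprog(c, A_ub=A_ub, b_ub=b[others],
                  A_eq=np.hstack([A[[k]], [[0.0]]]), b_eq=b[[k]],
                  bounds=list(bounds) + [(0, None)])
    return res.x[:n] if res.status == 0 and res.x[-1] > 1e-7 else None

def bfs_regions(weights, biases, theta0, bounds, verify=lambda A, b: None):
    """Enumerate every activation region intersecting the parameter box,
    starting from the region containing theta0."""
    key = lambda p: tuple(np.concatenate(p).astype(int))
    p0 = pattern_at(weights, biases, theta0)
    seen, queue = {key(p0)}, deque([p0])
    while queue:
        p = queue.popleft()
        A, b = region_halfspaces(weights, biases, p)      # H-representation of ar(p)
        verify(A, b)                                      # verification hooks go here
        for k in range(A.shape[0]):                       # one row per neuron activity
            x_face = face_interior_point(A, b, k, bounds)
            if x_face is None:                            # stable within the bounds
                continue
            x_next = x_face + 1e-6 * A[k] / np.linalg.norm(A[k])   # cross face k
            q = pattern_at(weights, biases, x_next)
            if key(q) not in seen:
                seen.add(key(q)); queue.append(q)
    return seen
```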
H.9 Details of Experimental Results
Table 2 shows the breakdown of verification statuses for each algorithm and each DNN size (cf. Sect. 5). In particular, when traversing AR boundaries, the ratio of “Timeout” and “Failed (out-of-memory)” statuses increases with the size of the DNN. This happens because gbs-AR traverses more activation regions than gbs-CR, in proportion to the width of the hyperparameter \(w^\delta \). In future work, it would be desirable to traverse only the small number of activation regions near the AR boundary.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Munakata, S., Urban, C., Yokoyama, H., Yamamoto, K., Munakata, K. (2023). Verifying Attention Robustness of Deep Neural Networks Against Semantic Perturbations. In: Rozier, K.Y., Chaudhuri, S. (eds) NASA Formal Methods. NFM 2023. Lecture Notes in Computer Science, vol 13903. Springer, Cham. https://doi.org/10.1007/978-3-031-33170-1_3
Print ISBN: 978-3-031-33169-5
Online ISBN: 978-3-031-33170-1