Abstract
It is known that deep neural networks (DNNs) classify an input image by paying particular attention to certain pixels; a graphical representation of the magnitude of attention to each pixel is called a saliency-map. Saliency-maps are used to check the validity of the basis for a classification decision, e.g., it is not a valid basis for classification if a DNN pays more attention to the background than to the subject of an image. Semantic perturbations can significantly change the saliency-map. In this work, we propose the first verification method for attention robustness, i.e., the local robustness of the changes in the saliency-map against combinations of semantic perturbations. Specifically, our method determines the range of the perturbation parameters (e.g., the amount of brightness change) that keeps the difference between the actual saliency-map change and the expected saliency-map change below a given threshold value. Our method is based on activation-region traversal, focusing on the outermost robust boundary for scalability on larger DNNs. We empirically evaluate the effectiveness and performance of our method on DNNs trained on popular image classification datasets.
Notes
- 1.
In this paper, the term “attention” refers to the focus on certain pixels in the image, not to the “attention mechanism” used in transformer models [39].
- 2.
References
AIRC, A.: ABCI system overview (2022). https://docs.abci.ai/en/system-overview/
Ashmore, R., Calinescu, R., Paterson, C.: Assuring the machine learning lifecycle: desiderata, methods, and challenges. ACM Comput. Surv. (CSUR) 54(5), 1–39 (2021)
Balunovic, M., Baader, M., Singh, G., Gehr, T., Vechev, M.: Certifying geometric robustness of neural networks. In: NeurIPS, vol. 32 (2019)
Chen, J., Wu, X., Rastogi, V., Liang, Y., Jha, S.: Robust attribution regularization. In: NeurIPS, vol. 32 (2019)
Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise. In: ICMLVIZ. PMLR (2017)
Deng, L.: The MNIST database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. 29(6), 141–142 (2012)
Engstrom, L., Tran, B., Tsipras, D., Schmidt, L., Madry, A.: Exploring the landscape of spatial robustness. In: ICML, pp. 1802–1811. PMLR (2019)
Fawzi, A., Frossard, P.: Manitest: are classifiers really invariant? In: BMVC, pp. 106.1–106.13 (2015)
Fromherz, A., Leino, K., Fredrikson, M., Parno, B., Pasareanu, C.: Fast geometric projections for local robustness certification. In: ICLR (2021)
Fromherz, A., Leino, K., Fredrikson, M., Parno, B., Pasareanu, C.: Fast geometric projections for local robustness certification—openreview (2021). https://openreview.net/forum?id=zWy1uxjDdZJ
Gao, X., Saha, R.K., Prasad, M.R., Roychoudhury, A.: Fuzz testing based data augmentation to improve robustness of deep neural networks. In: ICSE, pp. 1147–1158. IEEE, ACM (2020)
Guo, H., Zheng, K., Fan, X., Yu, H., Wang, S.: Visual attention consistency under image transforms for multi-label image classification. In: CVPR, pp. 729–739. IEEE, CVF (2019)
Han, T., Tu, W.W., Li, Y.F.: Explanation consistency training: facilitating consistency-based semi-supervised learning with interpretability. In: AAAI, vol. 35, pp. 7639–7646. AAAI (2021)
Hanin, B., Rolnick, D.: Deep ReLU networks have surprisingly few activation patterns. In: NeurIPS, vol. 32 (2019)
Hinz, P.: An analysis of the piece-wise affine structure of ReLU feed-forward neural networks. Ph.D. thesis, ETH Zurich (2021)
Huang, X., et al.: A survey of safety and trustworthiness of deep neural networks: verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev. 37, 100270 (2020)
Jha, S.K., Ewetz, R., Velasquez, A., Ramanathan, A., Jha, S.: Shaping noise for robust attributions in neural stochastic differential equations. In: AAAI, vol. 36, pp. 9567–9574. AAAI (2022)
Jordan, M., Lewis, J., Dimakis, A.G.: Provable certificates for adversarial examples: fitting a ball in the union of polytopes. In: NeurIPS, vol. 32 (2019)
Kanbak, C., Moosavi-Dezfooli, S., Frossard, P.: Geometric robustness of deep networks: analysis and improvement. In: CVPR, pp. 4441–4449 (2018)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NeurIPS, vol. 25 (2012)
Lim, C.H., Urtasun, R., Yumer, E.: Hierarchical verification for adversarial robustness. In: ICML, vol. 119, pp. 6072–6082. PMLR (2020)
Mirman, M., Hägele, A., Bielik, P., Gehr, T., Vechev, M.: Robustness certification with generative models. In: PLDI, pp. 1141–1154. ACM SIGPLAN (2021)
Mohapatra, J., Weng, T.W., Chen, P.Y., Liu, S., Daniel, L.: Towards verifying robustness of neural networks against a family of semantic perturbations. In: CVPR, pp. 244–252. IEEE, CVF (2020)
Montavon, G., Samek, W., Müller, K.R.: Methods for interpreting and understanding deep neural networks. Digit. Signal Process. 73, 1–15 (2018)
Müller, M.N., Makarchuk, G., Singh, G., Püschel, M., Vechev, M.T.: PRIMA: general and precise neural network certification via scalable convex hull approximations. Proc. ACM Program. Lang. 6(POPL), 1–33 (2022)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM SIGKDD (2016)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: ICCV. IEEE (2017)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. In: ICLR (2014)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
Singh, G., Gehr, T., Püschel, M., Vechev, M.: An abstract domain for certifying neural networks. In: POPL, pp. 1–30. ACM New York (2019)
Sotoudeh, M., Thakur, A.V.: Computing linear restrictions of neural networks. In: NeurIPS, vol. 32 (2019)
Sotoudeh, M., Thakur, A.V.: Provable repair of deep neural networks. In: PLDI, pp. 588–603. ACM SIGPLAN (2021)
Sotoudeh, M., Thakur, A.V.: SyReNN: a tool for analyzing deep neural networks. In: TACAS, pp. 281–302 (2021)
Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: ICML, pp. 3319–3328. PMLR (2017)
Szegedy, C., et al.: Intriguing properties of neural networks. In: ICLR (2014)
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. In: ICLR (2019)
Urban, C., Christakis, M., Wüstholz, V., Zhang, F.: Perfectly parallel fairness certification of neural networks. Proc. ACM Program. Lang. 4(OOPSLA), 1–30 (2020)
Urban, C., Miné, A.: A review of formal methods applied to machine learning. CoRR abs/2104.02466 (2021). https://arxiv.org/abs/2104.02466
Vaswani, A., et al.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Xiao, C., Zhu, J., Li, B., He, W., Liu, M., Song, D.: Spatially transformed adversarial examples. In: ICLR (2018)
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms (2017). arXiv:1708.07747
Xu, S., Vaughan, J., Chen, J., Zhang, A., Sudjianto, A.: Traversing the local polytopes of ReLU neural networks: a unified approach for network verification. In: AdvML. AAAI (2022)
Yang, P., et al.: Enhancing robustness verification for deep neural networks via symbolic propagation. Formal Aspects Comput. 33(3), 407–435 (2021)
H Appendix
H.1 Linearity of Activation Regions
Given activation pattern \(p \in AP^f\) as constant, each output of ReLU-FNN \(f_j\) is linear in x within activation region \(ar^f(p)\) (cf. Fig. 8), because all ReLU operators have already resolved to 0 or x [14]; i.e., \(f_j(x \in ar^f(p)) = A'_j x + b'_j\), where \(A'_j\) and \(b'_j\) denote the weights and bias simplified according to activation pattern p and class j. That is, the gradient of each ReLU-FNN output \(f_j(x)\) within activation region \(ar^f(p)\) is constant: \(\nabla _x f_j(x \in ar^f(p)) = A'_j\), i.e., each partial derivative \(\partial f_j(x) / \partial x_i\) equals a constant value \(C \in \mathbb {R}\).
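To make the simplification concrete, the following NumPy sketch (ours, with hypothetical names; not the paper's implementation) collapses a ReLU-FNN to the affine map \(A'_j x + b'_j\) it computes on the activation region of a given pattern, and checks it against an ordinary forward pass:

```python
import numpy as np

def region_affine_map(weights, biases, pattern):
    """Collapse a ReLU-FNN to the affine map f(x) = A' x + b' it computes on
    ar^f(p): each hidden ReLU is resolved to 0 or identity by the 0/1 pattern."""
    A = np.eye(weights[0].shape[1])          # running Jacobian, starts at identity
    b = np.zeros(weights[0].shape[1])
    for W, c, p in zip(weights[:-1], biases[:-1], pattern):
        A, b = W @ A, W @ b + c              # affine pre-activation of layer l
        A, b = p[:, None] * A, p * b         # ReLU fixed by p: keep or zero out
    return weights[-1] @ A, weights[-1] @ b + biases[-1]

# Sanity check on a random network: inside the region containing x0, the
# collapsed affine map agrees with the network's forward pass.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((5, 3)), rng.standard_normal((4, 5)), rng.standard_normal((2, 4))]
bs = [rng.standard_normal(5), rng.standard_normal(4), rng.standard_normal(2)]
x0 = rng.standard_normal(3)
z, pattern = x0, []
for W, c in zip(Ws[:-1], bs[:-1]):
    pre = W @ z + c
    pattern.append((pre > 0).astype(float))  # activation pattern ap^f(x0)
    z = np.maximum(pre, 0.0)
A, b = region_affine_map(Ws, bs, pattern)
assert np.allclose(A @ x0 + b, Ws[-1] @ z + bs[-1])
```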
An activation region can be interpreted as the H-representation of a convex polytope in input space \(\mathbb {R}^{N^f}\). Specifically, each neuron activity \(p_{l,n}\) corresponds one-to-one with a half-space, and p with the convex polytope defined by the intersection (conjunction) of all these half-spaces, because \(f_n^{(l)}(x)\) is also linear when \(p \in AP^f\) is constant. Therefore, we interchange activation region \(ar^f(p)\) and the H-representation of its convex polytope, \(HConvex^f(x;p) {\mathop {=}\limits ^{\textrm{def}}}\bigwedge _{l,n} A''_{l,n} x \le b''_{l,n}\), as needed, where \(A''\) and \(b''\) denote the weights and bias simplified according to activation pattern p, and \(A''_{l,n} x \le b''_{l,n}\) is the half-space corresponding to the n-th neuron activity \(p_{l,n}\) in the l-th layer.
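Under the same conventions, a sketch of how the half-space matrices \(A''\) and \(b''\) can be assembled (one row per hidden neuron; we assume the sign convention that an active neuron keeps its pre-activation nonnegative):

```python
import numpy as np

def region_halfspaces(weights, biases, pattern):
    """H-representation of ar^f(p): stack one half-space A''_{l,n} x <= b''_{l,n}
    per hidden neuron (l, n), derived from its affine pre-activation under p."""
    A_rows, b_rows = [], []
    A = np.eye(weights[0].shape[1])
    b = np.zeros(weights[0].shape[1])
    for W, c, p in zip(weights[:-1], biases[:-1], pattern):
        A, b = W @ A, W @ b + c            # pre-activation of layer l, affine in x
        sign = 1.0 - 2.0 * p               # p=1: -(A x + b) <= 0;  p=0: A x + b <= 0
        A_rows.append(sign[:, None] * A)
        b_rows.append(-sign * b)
        A, b = p[:, None] * A, p * b       # resolve the ReLU and continue downstream
    return np.vstack(A_rows), np.concatenate(b_rows)   # A'' x <= b''
```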
H.2 Connectivity of Activation Regions
When feasible activation patterns \(p,p' \in AP^f\) differ by flipping a single neuron activity \(p_{l,n} \in \{0,1\}\), they correspond to connected regions, because the regions share the single face \(HFace^f_{l,n}(x;p) {\mathop {=}\limits ^{\textrm{def}}}A''_{l,n} x = b''_{l,n}\) corresponding to the flipped \(p_{l,n}\) [18]. It is possible to traverse activation regions flexibly while ensuring connectivity by selecting the neuron activity to flip according to a prioritization; several such traversal methods have been proposed [9, 18, 21]. In general, however, many neuron activities become infeasible when flipped [18]. For instance, half-space \(h_{1,3}\) is a face of activation region \(\eta \) in Fig. 6(1a); thus, by flipping neuron activity \(p_{1,3}\), GBS can traverse the connected region \(\eta \) in Fig. 6(1b). In contrast, half-space \(h_{1,1}\) is not a face of activation region \(\eta \) in Fig. 6(1a); thus, when neuron activity \(p_{1,1}\) is flipped, the corresponding activation region is infeasible (i.e., the intersection of the flipped half-spaces has no area).
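In code terms (continuing the hypothetical sketches above), flipping a neuron activity is a one-bit edit of the pattern; whether the result is feasible is exactly LP question (b) of Appendix H.4:

```python
def flipped(pattern, l, n):
    """Activation pattern p' differing from p only in neuron activity p_{l,n}.
    If p' is feasible, ar^f(p) and ar^f(p') are connected through the shared
    face A''_{l,n} x = b''_{l,n}; feasibility is checked by the LP in H.4."""
    q = [layer.copy() for layer in pattern]
    q[l][n] = 1.0 - q[l][n]
    return q
```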
H.3 Hierarchy of Activation Regions
When feasible activation patterns \(p,p' \in AP^f\) agree on the entire \(L'^f\)-th upstream activation pattern \(p_{\le L'^f} {\mathop {=}\limits ^{\textrm{def}}}[ p_{l,n} \mid 1 \le l \le L'^f, 1 \le n \le N^f_l ] \;\; (1 \le L'^f \le L^f)\), they are both included in the parent activation region \(ar^f_{\le L'^f}(p)\) corresponding to convex polytope \(HConvex^f_{\le L'^f}(x;p) {\mathop {=}\limits ^{\textrm{def}}}\bigwedge _{l \le L'^f,n} A''_{l,n} x \le b''_{l,n}\) [21]. That is, \(\forall x \in ar^f(p).\; x \in ar^f_{\le L'^f}(p)\) and \(\forall x \in \mathbb {R}^{N^f}.\; HConvex^f(x;p) \Rightarrow HConvex^f_{\le L'^f}(x;p)\).
Similarly, we define the \(L'^f\)-th downstream activation pattern as \(p_{\ge L'^f} {\mathop {=}\limits ^{\textrm{def}}}[ p_{l,n} \mid L'^f \le l \le L^f, 1 \le n \le N^f_l ] \;\; (1 \le L'^f \le L^f)\).
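Viewing a pattern as a list of per-layer activity vectors, as in the sketches above, both notions are plain slices; the parent region's H-representation keeps only the rows that the first \(L'^f\) layers contribute in region_halfspaces:

```python
def upstream_pattern(pattern, L_prime):
    """p_{<=L'}: activities of the first L' hidden layers. Patterns agreeing on
    p_{<=L'} lie in the same parent region ar_{<=L'}(p), whose H-representation
    is the corresponding prefix of the rows built by region_halfspaces."""
    return pattern[:L_prime]

def downstream_pattern(pattern, L_prime):
    """p_{>=L'}: activities from the L'-th hidden layer onward (1-indexed)."""
    return pattern[L_prime - 1:]
```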
H.4 Linear Programming on an Activation Region
Based on the linearity of activation regions and ReLU-FNN outputs, we can use Linear Programming (LP) to compute (a) the feasibility of an activation region, (b) the flippability of a neuron activity, and (c) the minimum (maximum) of a ReLU-FNN output within an activation region. We show each LP encoding of problems (a), (b), and (c) in the SciPy LP form (Note 2), where \(p \in AP^f\) is a given activation pattern of ReLU-FNN f, and \(p_{l,n}\) is a given neuron activity to be flipped.
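The SciPy LP form is \(\min _x c^\top x\) subject to \(A_{ub} x \le b_{ub}\), plus optional equalities and variable bounds. A minimal sketch of the three encodings, assuming the matrices \(A''\), \(b''\) built as in H.1 and passing the box \(\varTheta \) through the bounds argument:

```python
import numpy as np
from scipy.optimize import linprog

def is_feasible(A, b, bounds):
    """(a) Feasibility of an activation region: is {x : A x <= b} within the
    bounds nonempty? A zero objective turns the solver into a pure check."""
    res = linprog(c=np.zeros(A.shape[1]), A_ub=A, b_ub=b, bounds=bounds)
    return res.status == 0                 # 0: solved, 2: infeasible

def is_flippable(A, b, k, bounds):
    """(b) Flippability of the neuron activity behind row k: does the region
    touch the hyperplane A_k x = b_k? Encode that face as an equality."""
    res = linprog(c=np.zeros(A.shape[1]),
                  A_ub=np.delete(A, k, axis=0), b_ub=np.delete(b, k),
                  A_eq=A[[k]], b_eq=b[[k]], bounds=bounds)
    return res.status == 0

def output_min(A, b, a_j, bounds):
    """(c) Minimum of output f_j(x) = a_j . x + c_j over the region; add the
    constant c_j to the result, and negate a_j for the maximum instead."""
    res = linprog(c=a_j, A_ub=A, b_ub=b, bounds=bounds)
    return res.fun if res.status == 0 else None

# e.g., a two-dimensional parameter space (hypothetical ranges):
# bounds = [(-0.3, 0.3), (0.0, 1.0)]
```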
H.5 Full Encoding of Semantic Perturbations
We focus here on the perturbations of brightness change (B), patch (P), and translation (T), and describe how to encode their combination into ReLU-FNN \(g^{x0}: \varTheta \rightarrow X\); here \(|\theta ^{(l)}| = \dim \theta ^{(l)}\), w is the width of image x0, px, py, pw, ph are the patch x-position, y-position, width, and height, and tx is the amount of movement in the x-axis direction. Perturbation parameter \(\theta \in \varTheta \) consists of the amount of brightness change for (B), the density of the patch for (P), and the amount of translation for (T). In contrast, perturbation parameters not included in the dimensions of \(\varTheta \), such as w, px, py, pw, ph, and tx, are assumed to be given as constants before verification.
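Each perturbation acts affinely in its parameter, which is what allows \(g^{x0}\) to be expressed as a ReLU-FNN. A sketch under our own reading (hypothetical names; in particular, we assume the translation parameter interpolates between x0 and its copy shifted by the constant offset tx, and that tx is nonnegative):

```python
import numpy as np

def g_x0(theta, x0, w, px, py, pw, ph, tx):
    """Sketch of g^{x0}: Theta -> X for a flattened w-by-w grayscale image x0.
    theta = (brightness change, patch density, translation amount); the patch
    geometry px, py, pw, ph and the offset tx are fixed before verification.
    Every step below is affine in theta."""
    t_b, t_p, t_t = theta
    img = x0.reshape(w, w).astype(float)
    out = img + t_b                               # (B): uniform brightness change
    mask = np.zeros_like(img)
    mask[py:py + ph, px:px + pw] = 1.0            # (P): fixed patch region
    out = out + t_p * mask                        # patch density scales the mask
    shifted = np.roll(img, tx, axis=1)            # (T): shift right by tx pixels...
    if tx > 0:
        shifted[:, :tx] = 0.0                     # ...with zero padding, no wrap-around
    out = out + t_t * (shifted - img)             # interpolate original <-> translated
    return out.reshape(-1)
```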
H.6 Images Used for Our Experiments
We used the 10 images with indexes 69990–69999 from the end of the MNIST dataset (cf. Fig. 9) and of the Fashion-MNIST dataset (cf. Fig. 10), respectively. None of these images were used to train any of the ReLU-FNNs.
H.7 An Example of Lemma 1
Lemma 1 is reprinted below (Fig. 11).
H.8 Algorithm BFS
Algorithm BFS traverses all activation regions in perturbation parameter space \(\varTheta \), as shown in Fig. 12.
Algorithm BFS initializes queue Q with \(ap^{f \circ g^{x0}}(\textbf{0})\) (Line 3). Then, for each activation pattern p in Q (Lines 5–6), it reconstructs the corresponding activation region \(\eta \) (subroutine constructActivationRegion, Line 8) as the H-representation of p (cf. Eq. 2). Next, for each neuron in \(f \circ g^{x0}\) (Line 12), it checks whether neuron activity \(p_{l,n}\) cannot flip within the perturbation parameter space \(\varTheta \), i.e., whether the corresponding half-space has no feasible points within \(\varTheta \) (subroutine isStable, Line 13). Otherwise, a new activation pattern \(p'\) is constructed by flipping \(p_{l,n}\) (subroutine flipped, Line 14) and added to the queue (Line 20) if \(p'\) is feasible (subroutine calcInteriorPointOnFace, Lines 17–18). Finally, the activation region \(\eta \) is simplified (Line 24) and used to verify CR and VR (subroutines solveCR and solveVR, Lines 25–27, cf. Sect. 4.4) as well as AR and IR (subroutines solveAR and solveIR, Lines 32–34, cf. Sect. 4.4).
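A compact sketch of this loop (ours, with hypothetical helpers; region_halfspaces is the H.1 sketch, and verify stands in for the solveCR/solveVR/solveAR/solveIR subroutines). The face LP plays the roles of isStable and calcInteriorPointOnFace, and a small step across the face recovers the flipped pattern; a robust implementation would scale that step rather than fixing it:

```python
import numpy as np
from collections import deque
from scipy.optimize import linprog

def pattern_at(weights, biases, x):
    """Activation pattern of the network at input x (0/1 per hidden neuron)."""
    p, z = [], x
    for W, c in zip(weights[:-1], biases[:-1]):
        pre = W @ z + c
        p.append((pre > 0).astype(float))
        z = np.maximum(pre, 0.0)
    return p

def face_interior_point(A, b, k, bounds):
    """A point on face k strictly inside all other half-spaces and the bounds,
    found by maximizing a shared slack t; None if the face is unreachable."""
    m, n = A.shape
    others = np.delete(np.arange(m), k)
    c = np.zeros(n + 1); c[-1] = -1.0                     # maximize slack t
    A_ub = np.hstack([A[others], np.ones((m - 1, 1))])    # A x + t <= b (others)
    res = linprog(c, A_ub=A_ub, b_ub=b[others],
                  A_eq=np.hstack([A[[k]], [[0.0]]]), b_eq=b[[k]],
                  bounds=list(bounds) + [(0, None)])
    return res.x[:n] if res.status == 0 and res.x[-1] > 1e-7 else None

def bfs_regions(weights, biases, theta0, bounds, verify=lambda A, b: None):
    """Enumerate every activation region intersecting the parameter box,
    starting from the region containing theta0."""
    key = lambda p: tuple(np.concatenate(p).astype(int))
    p0 = pattern_at(weights, biases, theta0)
    seen, queue = {key(p0)}, deque([p0])
    while queue:
        p = queue.popleft()
        A, b = region_halfspaces(weights, biases, p)      # H-representation of ar(p)
        verify(A, b)                                      # verification hooks go here
        for k in range(A.shape[0]):                       # one row per neuron activity
            x_face = face_interior_point(A, b, k, bounds)
            if x_face is None:                            # stable within the bounds
                continue
            x_next = x_face + 1e-6 * A[k] / np.linalg.norm(A[k])   # cross face k
            q = pattern_at(weights, biases, x_next)
            if key(q) not in seen:
                seen.add(key(q)); queue.append(q)
    return seen
```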
H.9 Details of Experimental Results
Table 2 shows the breakdown of verification statuses for each algorithm and each DNN size (cf. Sect. 5). In particular, when traversing AR boundaries, the ratio of “Timeout” and “Failed (out-of-memory)” statuses increases with the size of the DNN. This happens because gbs-AR traverses more activation regions than gbs-CR, in proportion to the width of the hyperparameter \(w^\delta \). In future work, it would be desirable to traverse only the small number of activation regions near the AR boundary.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Munakata, S., Urban, C., Yokoyama, H., Yamamoto, K., Munakata, K. (2023). Verifying Attention Robustness of Deep Neural Networks Against Semantic Perturbations. In: Rozier, K.Y., Chaudhuri, S. (eds) NASA Formal Methods. NFM 2023. Lecture Notes in Computer Science, vol 13903. Springer, Cham. https://doi.org/10.1007/978-3-031-33170-1_3
Print ISBN: 978-3-031-33169-5
Online ISBN: 978-3-031-33170-1