Abstract
Despite their impressive performance, deep convolutional neural networks (CNNs) have been shown to be sensitive to small adversarial perturbations. These nuisances, which one can barely notice, are powerful enough to fool sophisticated and well-performing classifiers, leading to absurd misclassification results. In this paper, we analyze the stability of state-of-the-art deep learning classification machines to adversarial perturbations, where we assume that the signals belong to the (possibly multilayer) sparse representation model. We start with convolutional sparsity and then proceed to its multilayered version, which is tightly connected to CNNs. Our analysis links the stability of the classification to noise with the underlying structure of the signal, quantified by the sparsity of its representation under a fixed dictionary. In addition, we offer similar stability theorems for two practical pursuit algorithms, which are posed as two different deep learning architectures: the layered thresholding and the layered basis pursuit. Our analysis establishes the better robustness of the latter to adversarial attacks. We corroborate these theoretical results with numerical experiments on three datasets: MNIST, CIFAR-10 and CIFAR-100.
Notes
Note that in this scheme, the number of iterations for each BP pursuit stage is implicit, hidden by the number of loops to apply. More on this is given in later sections.
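The implicit per-stage iteration count can be made explicit by unrolling each BP stage into a fixed number of ISTA iterations. The following is a minimal sketch under that interpretation, not the authors' implementation; all function names and parameters are illustrative:

```python
import numpy as np

def soft(v, t):
    """Elementwise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, y, lam, n_iter=100):
    """One BP stage: min_g 0.5*||y - D g||_2^2 + lam*||g||_1, solved by ISTA."""
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the data-fidelity gradient
    g = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = soft(g - D.T @ (D @ g - y) / L, lam / L)
    return g

def layered_bp(y, dictionaries, lams, n_iter=100):
    """Layered basis pursuit: feed each stage's code into the next pursuit."""
    codes, g = [], y
    for D, lam in zip(dictionaries, lams):
        g = ista(D, g, lam, n_iter)
        codes.append(g)
    return codes
```

With a fixed, small `n_iter`, each stage becomes a small unrolled network, so the number of loops per stage is an explicit hyperparameter rather than a hidden one.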
Locally bounded noise results exist for the CSC as well [22], and can be leveraged in a similar fashion.
References
Aberdam, A., Sulam, J., Elad, M.: Multi-layer sparse coding: the holistic way. SIAM J. Math. Data Sci. 1(1), 46–77 (2019)
Bibi, A., Ghanem, B., Koltun, V., Ranftl, R.: Deep layers as stochastic solvers. In: International Conference on Learning Representations (2019)
Bishop, C.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
Bredensteiner, E.J., Bennett, K.P.: Multicategory classification by support vector machines. In: Computational Optimization, pp. 53–79. Springer, Berlin (1999)
Candes, E.J.: The restricted isometry property and its implications for compressed sensing. C.R. Math. 346(9–10), 589–592 (2008)
Elad, M.: Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, 1st edn. Springer, Berlin (2010)
Fawzi, A., Fawzi, H., Fawzi, O.: Adversarial vulnerability for any classifier. arXiv preprint arXiv:1802.08686 (2018)
Fawzi, A., Fawzi, O., Frossard, P.: Analysis of classifiers’ robustness to adversarial perturbations. Mach. Learn. 107(3), 481–508 (2018)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. ICLR (2015)
Gregor, K., LeCun, Y.: Learning fast approximations of sparse coding. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 399–406 (2010)
Krizhevsky, A., Nair, V., Hinton, G.: The CIFAR-10 dataset. online: http://www.cs.toronto.edu/kriz/cifar.html (2014)
Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
LeCun, Y., Cortes, C., Burges, C.J.: MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2 (2010)
Liao, F., Liang, M., Dong, Y., Pang, T., Zhu, J., Hu, X.: Defense against adversarial attacks using high-level representation guided denoiser. In: IEEE-CVPR (2018)
Liu, Y., Chen, X., Liu, C., Song, D.: Delving into transferable adversarial examples and black-box attacks. In: ICLR (2017)
Mahdizadehaghdam, S., Panahi, A., Krim, H., Dai, L.: Deep dictionary learning: a parametric network approach. arXiv preprint arXiv:1803.04022 (2018)
Mairal, J., Bach, F., Ponce, J.: Sparse modeling for image and vision processing. arXiv preprint arXiv:1411.3230 (2014)
Cisse, M., Bojanowski, P., Grave, E., Dauphin, Y., Usunier, N.: Parseval networks: improving robustness to adversarial examples. In: ICML (2017)
Papyan, V., Romano, Y., Elad, M.: Convolutional neural networks analyzed via convolutional sparse coding. J. Mach. Learn. Res. 18(83), 1–52 (2017)
Papyan, V., Sulam, J., Elad, M.: Working locally thinking globally: theoretical guarantees for convolutional sparse coding. IEEE Trans. Signal Process. 65(21), 5687–5701 (2017)
Sokolić, J., Giryes, R., Sapiro, G., Rodrigues, M.R.D.: Robust large margin deep neural networks. IEEE Trans. Signal Process. 65(16), 4265–4280 (2016)
Sulam, J., Aberdam, A., Beck, A., Elad, M.: On multi-layer basis pursuit, efficient algorithms and convolutional neural networks. IEEE Trans. Pattern Anal. Mach. Intell. (2019)
Sulam, J., Papyan, V., Romano, Y., Elad, M.: Multilayer convolutional sparse modeling: pursuit and dictionary learning. IEEE Trans. Signal Process. 66(15), 4090–4104 (2018)
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Dumitru, E., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: ICLR (2014)
Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: IEEE-CVPR (2010)
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Y. Romano and A. Aberdam contributed equally to this work.
The research leading to these results has received funding from the Technion Hiroshi Fujiwara Cyber Security Research Center and the Israel Cyber Directorate, and from Israel Science Foundation (ISF) grant no. 335/18.
Y. R. thanks the Zuckerman Institute, ISEF Foundation and the Viterbi Fellowship, Technion, for supporting this research.
Appendices
Appendix A: Proof of Theorem 7: Stable Binary Classification of the CSC Model
Theorem 5
(Stable binary classification of the CSC model) Suppose we are given a CSC signal \( {\mathbf {X}}\), \( \Vert {\varvec{\Upgamma }}\Vert _{0,\infty }^{\scriptscriptstyle {{\mathbf {S}}}}\le k \), contaminated with perturbation \( {\mathbf {E}}\) to create the signal \( {\mathbf {Y}}= {\mathbf {X}}+ {\mathbf {E}}\), such that \( \Vert {\mathbf {E}}\Vert _{2} \le \epsilon \). Suppose further that \( {\mathcal {O}}_{{\mathcal {B}}}^* > 0 \) and denote by \({\hat{{\varvec{\Upgamma }}}}\) the solution of the \( {\text {P}_{0,\infty }^{{\varvec{{\mathcal {E}}}}}}\) problem. Assuming that \( \delta _{2k} < 1 - \left( \frac{2{\left\| {\mathbf {w}}\right\| _2}\epsilon }{{\mathcal {O}}_{{\mathcal {B}}}^*}\right) ^2, \) then \( sign(f({\mathbf {X}})) = sign(f({\mathbf {Y}}))\).
Considering the more conservative bound that relies on \( \mu ({\mathbf {D}}) \), and assuming that
$$\begin{aligned} k < \frac{1}{2} \left( 1 + \frac{1}{\mu ({\mathbf {D}})} \left( 1 - \left( \frac{2\Vert {\mathbf {w}}\Vert _2\epsilon }{{\mathcal {O}}_{{\mathcal {B}}}^*}\right) ^2 \right) \right) , \end{aligned}$$
then \( sign(f({\mathbf {X}})) = sign(f({\mathbf {Y}}))\).
Proof
Without loss of generality, consider the case where \( {\mathbf {w}}^T{\varvec{\Upgamma }}+ \omega > 0 \), i.e., the original signal \( {\mathbf {X}}\) is assigned to class \( y = 1 \). Our goal is to show that \( {\mathbf {w}}^T{\hat{{\varvec{\Upgamma }}}} + \omega > 0 \). We start by manipulating the latter expression as follows:
where the first inequality relies on the relation \( a + b \ge a - |b| \) for \( a > 0 \), and the last step follows from the Cauchy–Schwarz inequality. Using the SRIP [22] and the fact that both \( \Vert {\mathbf {Y}}- {\mathbf {D}}{\varvec{\Upgamma }}\Vert _2 \le \epsilon \) and \( \Vert {\mathbf {Y}}- {\mathbf {D}}{\hat{{\varvec{\Upgamma }}}}\Vert _2 \le \epsilon \), we get
$$\begin{aligned} (1-\delta _{2k}) \Vert {\varvec{\Upgamma }}- {\hat{{\varvec{\Upgamma }}}}\Vert _2^2 \le \Vert {\mathbf {D}}{\varvec{\Upgamma }}- {\mathbf {D}}{\hat{{\varvec{\Upgamma }}}}\Vert _2^2 \le 4\epsilon ^2. \end{aligned}$$
Thus,
$$\begin{aligned} \Vert {\varvec{\Upgamma }}- {\hat{{\varvec{\Upgamma }}}}\Vert _2 \le \frac{2\epsilon }{\sqrt{1-\delta _{2k}}}. \end{aligned}$$
Combining the above with Eq. (4) leads to (recall that y = 1):
Using the definition of the score of our classifier, satisfying
we get
We are now after the condition for \( {\mathcal {O}}_{{\mathcal {B}}}({\mathbf {Y}},y) > 0\), and so we require:
where we relied on the fact that \( {\mathcal {O}}_{{\mathcal {B}}}({\mathbf {X}},y) \ge {\mathcal {O}}_{{\mathcal {B}}}^* \). The above inequality leads to
$$\begin{aligned} \delta _{2k} < 1 - \left( \frac{2\Vert {\mathbf {w}}\Vert _2\epsilon }{{\mathcal {O}}_{{\mathcal {B}}}^*}\right) ^2. \end{aligned}$$
Next we turn to develop the condition that relies on \( \mu ({\mathbf {D}}) \). We shall use the relation between the SRIP and the mutual coherence [22], given by \( \delta _{2k} \le (2k-1)\mu ({\mathbf {D}}) \) for all \( k < \frac{1}{2} \left( 1 + \frac{1}{\mu ({\mathbf {D}})}\right) \). Plugging this bound into Eq. (5) results in
$$\begin{aligned} k < \frac{1}{2} \left( 1 + \frac{1}{\mu ({\mathbf {D}})} \left( 1 - \left( \frac{2\Vert {\mathbf {w}}\Vert _2\epsilon }{{\mathcal {O}}_{{\mathcal {B}}}^*}\right) ^2 \right) \right) , \end{aligned}$$
which completes our proof. \(\square \)
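A quick numerical sanity check of this result, under strong simplifying assumptions: a unitary dictionary, for which \( \delta _{2k} = 0 \) and the pursuit is exact, so the condition reduces to \( 2\Vert {\mathbf {w}}\Vert _2\epsilon < {\mathcal {O}}_{{\mathcal {B}}}^* \). Variable names are illustrative, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
# Unitary dictionary: delta_2k = 0, and the representation of any signal is
# recovered exactly by D^T, so ||Gamma - Gamma_hat||_2 = ||E||_2 <= eps.
D, _ = np.linalg.qr(rng.standard_normal((n, n)))
gamma = np.zeros(n)
support = rng.choice(n, size=5, replace=False)
gamma[support] = 3.0 + rng.standard_normal(5)
x = D @ gamma

w = rng.standard_normal(n)
omega = 0.0
margin = abs(w @ gamma + omega)   # plays the role of O_B^* for this one signal

# Choose eps so that the reduced condition 2*||w||_2*eps < margin holds.
eps = 0.9 * margin / (2 * np.linalg.norm(w))
e = rng.standard_normal(n)
e *= eps / np.linalg.norm(e)      # perturbation of norm exactly eps
y = x + e

gamma_hat = D.T @ y               # exact pursuit in the unitary case
assert np.sign(w @ gamma_hat + omega) == np.sign(w @ gamma + omega)
```

Here \( |{\mathbf {w}}^T({\hat{{\varvec{\Upgamma }}}}-{\varvec{\Upgamma }})| \le \Vert {\mathbf {w}}\Vert _2\epsilon < {\mathcal {O}}_{{\mathcal {B}}}^*/2 \), so the score cannot change sign, matching the theorem.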
Appendix B: Proof of Theorem 9: Stable Multi-class Classification of the CSC Model
Theorem 7
(Stable multi-class classification of the CSC model) Suppose we are given a CSC signal \( {\mathbf {X}}\), \( \Vert {\varvec{\Upgamma }}\Vert _{0,\infty }^{\scriptscriptstyle {{\mathbf {S}}}}\le k \), contaminated with perturbation \( {\mathbf {E}}\) to create the signal \( {\mathbf {Y}}= {\mathbf {X}}+ {\mathbf {E}}\), such that \( \Vert {\mathbf {E}}\Vert _{2} \le \epsilon \). Suppose further that \( f_u({\mathbf {X}}) = {\mathbf {w}}_u^T{\varvec{\Upgamma }}+ \omega _u \) correctly assigns \( {\mathbf {X}}\) to class \( y = u \). Suppose further that \( {\mathcal {O}}_{{\mathcal {M}}}^* > 0 \), and denote by \({\hat{{\varvec{\Upgamma }}}}\) the solution of the \( {\text {P}_0^{{\varvec{{\mathcal {E}}}}}}\) problem. Assuming that \( \delta _{2k} < 1 - \left( \frac{2\phi ({\mathbf {W}})\epsilon }{{\mathcal {O}}_{{\mathcal {M}}}^*}\right) ^2, \) then \( {\mathbf {Y}}\) will be assigned to the correct class.
Considering the more conservative bound that relies on \( \mu ({\mathbf {D}}) \) and assuming that
$$\begin{aligned} k < \frac{1}{2} \left( 1 + \frac{1}{\mu ({\mathbf {D}})} \left( 1 - \left( \frac{2\phi ({\mathbf {W}})\epsilon }{{\mathcal {O}}_{{\mathcal {M}}}^*}\right) ^2 \right) \right) , \end{aligned}$$
then \( {\mathbf {Y}}\) will be assigned to the correct class.
Proof
Given that \( f_u({\varvec{\Upgamma }}) = {\mathbf {w}}_u^T{\varvec{\Upgamma }}+ \omega _u > f_v({\varvec{\Upgamma }}) = {\mathbf {w}}_v^T{\varvec{\Upgamma }}+ \omega _v \) for all \( v \ne u \), i.e., \( {\mathbf {X}}\) belongs to class \( y = u \), we shall prove that \( f_u({\hat{{\varvec{\Upgamma }}}}) > f_v({\hat{{\varvec{\Upgamma }}}}) \) for all \( v \ne u \). Denoting \( \varDelta = {\hat{{\varvec{\Upgamma }}}} - {\varvec{\Upgamma }}\), we bound from below the difference \( f_u({\hat{{\varvec{\Upgamma }}}}) - f_v({\hat{{\varvec{\Upgamma }}}})\) as follows:
Similarly to the proof of Theorem 7, the first inequality holds since \( a + b \ge a - |b| \) for \( a = f_u({\varvec{\Upgamma }}) - f_v({\varvec{\Upgamma }}) > 0 \), and the last inequality relies on the Cauchy–Schwarz inequality. Relying on \( \phi ({\mathbf {W}}) \), which satisfies \( \Vert {\mathbf {w}}_u - {\mathbf {w}}_v\Vert _2 \le \phi ({\mathbf {W}}) \) for all \( v \ne u \),
and plugging \(\Vert \varDelta \Vert _2^2 \le \frac{4\epsilon ^2}{1-\delta _{2k}} \) into Eq. (6) we get
where the second to last inequality holds since \( f_u({\varvec{\Upgamma }}) - f_v({\varvec{\Upgamma }}) \ge {\mathcal {O}}_{{\mathcal {M}}}({{\mathbf {X}}},y)\), and the last inequality follows from the definition of \( {\mathcal {O}}_{{\mathcal {M}}}^* \). As such, we require the following inequality to hold:
$$\begin{aligned} {\mathcal {O}}_{{\mathcal {M}}}^* - \frac{2\phi ({\mathbf {W}})\epsilon }{\sqrt{1-\delta _{2k}}} > 0, \end{aligned}$$
which is equivalent to the condition \( \delta _{2k} < 1 - \left( \frac{2\phi ({\mathbf {W}})\epsilon }{{\mathcal {O}}_{{\mathcal {M}}}^*}\right) ^2 \) stated in the theorem.
Similarly to the binary setting, one can readily write the above in terms of \( \mu ({\mathbf {D}}) \). \(\square \)
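As a toy numerical illustration of the multi-class argument (not the paper's experiment; names are illustrative, and \( \phi ({\mathbf {W}}) \) is taken here as the largest pairwise row-difference norm, one quantity obeying the Cauchy–Schwarz step above): any code perturbation of norm below the margin divided by \( \phi ({\mathbf {W}}) \) leaves the argmax unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
m, c = 32, 4                          # code length, number of classes
W = rng.standard_normal((c, m))       # rows are the per-class weight vectors w_v
b = np.zeros(c)
gamma = rng.standard_normal(m)

scores = W @ gamma + b
u = int(np.argmax(scores))
margin = scores[u] - np.max(np.delete(scores, u))   # plays the role of O_M(X, y)

# phi(W): largest pairwise row-difference norm, bounding (w_u - w_v)^T delta
# via Cauchy-Schwarz.
phi = max(np.linalg.norm(W[i] - W[j])
          for i in range(c) for j in range(c) if i != j)

# Any code perturbation with ||delta||_2 < margin / phi keeps the argmax.
delta = rng.standard_normal(m)
delta *= 0.9 * margin / (phi * np.linalg.norm(delta))
assert int(np.argmax(W @ (gamma + delta) + b)) == u
```

For every \( v \ne u \), the score gap shrinks by at most \( \phi ({\mathbf {W}})\Vert \varDelta \Vert _2 < \) margin, so the winning class survives the perturbation.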
Appendix C: Proof of Theorem 12: Stable Binary Classification of the L-THR
Theorem 10
(Stable binary classification of the L-THR) Suppose we are given an ML-CSC signal \( {\mathbf {X}}\) contaminated with perturbation \( {\mathbf {E}}\) to create the signal \( {\mathbf {Y}}= {\mathbf {X}}+ {\mathbf {E}}\), such that \(\Vert {\mathbf {E}}\Vert _{2,\infty }^{\scriptscriptstyle {{\mathbf {P}}}}\le \epsilon _0\). Denote by \(|\varGamma _i^{\text {min}}|\) and \(|\varGamma _i^{\text {max}}|\) the lowest and highest entries in absolute value in the vector \({\varvec{\Upgamma }}_i\), respectively. Suppose further that \( {\mathcal {O}}_{{\mathcal {B}}}^* > 0 \) and let \(\{{\hat{{\varvec{\Upgamma }}}}_i\}_{i=1}^{K}\) be the set of solutions obtained by running the layered soft thresholding algorithm with thresholds \(\{\beta _i\}_{i=1}^{K}\), i.e., \({\hat{{\varvec{\Upgamma }}}}_i={\mathcal {S}}_{\beta _i}({\mathbf {D}}_i^T{\hat{{\varvec{\Upgamma }}}}_{i-1})\), where \( {\mathcal {S}}_{\beta _i} \) is the soft thresholding operator and \({\hat{{\varvec{\Upgamma }}}}_{0}={\mathbf {Y}}\). Assuming that \(\forall \ 1 \le i \le K\)
- a.
\(\Vert {\varvec{\Upgamma }}_i \Vert _{0,\infty }^{\scriptscriptstyle {{\mathbf {S}}}}< \frac{1}{2} \left( 1 + \frac{1}{\mu ({\mathbf {D}}_i)} \frac{ |\varGamma _i^{\text {min}}| }{ |\varGamma _i^{\text {max}}| } \right) - \frac{1}{\mu ({\mathbf {D}}_i)}\frac{ \epsilon _{i-1} }{|\varGamma _i^{\text {max}}|}\);
- b.
The threshold \(\beta _i\) is chosen according to
$$\begin{aligned} |{\varvec{\Upgamma }}_i^{\text {min}}| - C_i - \epsilon _{i-1}> \beta _i > K_i + \epsilon _{i-1}, \end{aligned}$$where
$$\begin{aligned} \begin{aligned} C_i= & {} ( \Vert {\varvec{\Upgamma }}_i \Vert _{0,\infty }^{\scriptscriptstyle {{\mathbf {S}}}}- 1 ) \mu ({\mathbf {D}}_i) |{\varvec{\Upgamma }}_i^{\text {max}}|, \\ K_i= & {} \Vert {\varvec{\Upgamma }}_i \Vert _{0,\infty }^{\scriptscriptstyle {{\mathbf {S}}}}\mu ({\mathbf {D}}_i) |{\varvec{\Upgamma }}_i^{\text {max}}|, \\ \epsilon _i= & {} \sqrt{ \Vert {\varvec{\Upgamma }}_{i} \Vert _{0,\infty }^{\scriptscriptstyle {{\mathbf {P}}}}} \ \Big ( \epsilon _{i-1} + C_i + \beta _i \Big ); \end{aligned} \end{aligned}$$and
- c.
\({\mathcal {O}}_{{\mathcal {B}}}^* > \Vert {\mathbf {w}}\Vert _2\sqrt{\Vert {\varvec{\Upgamma }}_K\Vert _0} \Big (\epsilon _{K-1} +C_K + \beta _K\Big )\),
then \( sign(f({\mathbf {Y}})) = sign(f({\mathbf {X}}))\).
Proof
Following Theorem 10 in [22], if assumptions (a)–(c) above hold \(\forall \ 1 \le i \le K\) then
- 1.
The support of the solution \({\hat{{\varvec{\Upgamma }}}}_i\) is equal to that of \({\varvec{\Upgamma }}_i\); and
- 2.
\(\Vert {\varvec{\Upgamma }}_i - {\hat{{\varvec{\Upgamma }}}}_i \Vert _{2,\infty }^{\scriptscriptstyle {{\mathbf {P}}}}\le \epsilon _i\), where \(\epsilon _i\) is defined above.
In particular, the last layer satisfies
$$\begin{aligned} \Vert {\varvec{\Upgamma }}_K - {\hat{{\varvec{\Upgamma }}}}_K \Vert _{\infty } \le \epsilon _{K-1} + C_K + \beta _K. \end{aligned}$$
Defining \( \varDelta = {\hat{{\varvec{\Upgamma }}}}_K - {\varvec{\Upgamma }}_K \), we get
$$\begin{aligned} \Vert \varDelta \Vert _2 \le \sqrt{\Vert \varDelta \Vert _0} \, \Vert \varDelta \Vert _\infty = \sqrt{\Vert {\varvec{\Upgamma }}_K\Vert _0} \, \Vert \varDelta \Vert _\infty \le \sqrt{\Vert {\varvec{\Upgamma }}_K\Vert _0} \ \Big ( \epsilon _{K-1} + C_K + \beta _K \Big ), \end{aligned}$$
where the equality relies on the successful recovery of the support. Having this upper bound on \( \Vert \varDelta \Vert _2 \), one can follow the transition from Eqs. (4) to (5) (see the proof of Theorem 7), leading to the following requirement for accurate classification:
Plugging Eq. (7) into the above expression results in the additional condition that ties the propagated error throughout the layers to the output margin, given by
$$\begin{aligned} {\mathcal {O}}_{{\mathcal {B}}}^* > \Vert {\mathbf {w}}\Vert _2\sqrt{\Vert {\varvec{\Upgamma }}_K\Vert _0} \Big (\epsilon _{K-1} + C_K + \beta _K\Big ). \end{aligned}$$
\(\square \)
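For reference, the L-THR forward pass analyzed in this appendix, \( {\hat{{\varvec{\Upgamma }}}}_i = {\mathcal {S}}_{\beta _i}({\mathbf {D}}_i^T {\hat{{\varvec{\Upgamma }}}}_{i-1}) \), admits a compact sketch. This is illustrative code, not the authors' implementation:

```python
import numpy as np

def soft_threshold(v, beta):
    """Soft thresholding operator: S_beta(v) = sign(v) * max(|v| - beta, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - beta, 0.0)

def layered_thresholding(y, dictionaries, betas):
    """Layered soft thresholding (L-THR): Gamma_i = S_{beta_i}(D_i^T Gamma_{i-1}),
    starting from Gamma_0 = y. Returns the estimates of all layers."""
    gamma = y
    estimates = []
    for D, beta in zip(dictionaries, betas):
        gamma = soft_threshold(D.T @ gamma, beta)
        estimates.append(gamma)
    return estimates
```

Each layer is a linear operator followed by a shrinkage nonlinearity, which is the structural analogy to a CNN forward pass exploited throughout the paper.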
Cite this article
Romano, Y., Aberdam, A., Sulam, J. et al. Adversarial Noise Attacks of Deep Learning Architectures: Stability Analysis via Sparse-Modeled Signals. J Math Imaging Vis 62, 313–327 (2020). https://doi.org/10.1007/s10851-019-00913-z