Abstract
Capsule networks (CapsNets) are neural networks that classify images based on the spatial relationships of their features. By analyzing the poses of features and their relative positions, CapsNets are better able to recognize images after affine transformations. The stacked capsule autoencoder (SCAE) is a state-of-the-art CapsNet and the first CapsNet to achieve unsupervised classification. However, the security vulnerabilities and the robustness of the SCAE have rarely been explored. In this paper, we propose an evasion attack against the SCAE, in which the attacker generates adversarial perturbations by reducing the contribution of the object capsules associated with the original category of the image. The perturbations are then added to the original images, and the perturbed images are misclassified with high probability. Against this evasion attack, we further propose a defense method called hybrid adversarial training (HAT), which combines adversarial training and adversarial distillation to improve the robustness of the SCAE. Experimental results show that an SCAE trained with HAT maintains relatively high classification accuracy under the evasion attack while achieving classification accuracy on clean samples similar to that of the original SCAE. The source code is available at https://github.com/FrostbiteXSW/SCAE_Defense.
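To make the attack idea concrete, the following is a minimal PyTorch sketch of the perturbation search described above. The model call `scae_presence` and the capsule-to-class mapping `class_capsules` are hypothetical placeholders, not the authors' actual API; the sketch only illustrates the objective of suppressing the object capsules tied to the image's true class while keeping the perturbation small.

```python
import torch

def evasion_perturbation(scae_presence, x, class_capsules,
                         steps=200, lr=0.01, c=1.0):
    """Search for a perturbation that suppresses the true-class capsules.

    scae_presence: callable mapping an image to object-capsule presence
        probabilities (hypothetical stand-in for the SCAE encoder).
    class_capsules: indices of the capsules that contribute to the
        image's original category.
    """
    delta = torch.zeros_like(x, requires_grad=True)  # perturbation to optimize
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        presence = scae_presence(torch.clamp(x + delta, 0.0, 1.0))
        # attack term: total contribution of the original-class capsules
        contribution = presence[class_capsules].sum()
        loss = contribution + c * delta.norm(p=2)    # keep perturbation small
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach()
```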
Notes
During the experiments, we use \(\textrm{arctanh} \left( \left( 2x-1 \right) \cdot \epsilon \right) \) to avoid division by zero.
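As an illustration of this change of variables, the following NumPy sketch maps pixel values in \([0,1]\) to an unconstrained space and back. The concrete value of \(\epsilon\) (a constant slightly below 1) is our assumption; the note only requires a factor that keeps the argument of \(\textrm{arctanh}\) strictly inside \((-1,1)\).

```python
import numpy as np

EPS = 0.999999  # assumed value: any constant slightly below 1 works

def to_w(x, eps=EPS):
    # arctanh((2x - 1) * eps): shrinking the argument by eps < 1 keeps it
    # away from +/-1, where arctanh = 0.5 * log((1 + z) / (1 - z)) would
    # otherwise divide by zero for pixels at exactly 0 or 1.
    return np.arctanh((2.0 * x - 1.0) * eps)

def to_x(w, eps=EPS):
    # inverse map back to pixel space [0, 1]
    return (np.tanh(w) / eps + 1.0) / 2.0
```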
Acknowledgements
The research of this paper was supported by the Natural Science Foundation of Shanghai Municipality (Grant No. 22ZR1422600).
Ethics declarations
Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix A: Generating adversarial samples for training
The generation algorithm of adversarial samples for training is based on the evasion attack algorithm proposed in Section 4. To accelerate the generation of adversarial samples, we replace the optimizer in the original algorithm with the following update rule:

\[ p_{N+1} = p_{N} - \beta \cdot \nabla f \left( x + p_{N} \right) \]

where \(p_{N}\) represents the perturbation obtained in the \(N^{th}\) iteration, \(f \left( x+p_{N} \right) \) is the target function of Formula (3), and \(\beta >0\) is a hyperparameter that defines the step size of the update in each iteration. The whole algorithm is shown in Algorithm 5.
The update strategy for \(\alpha \) is the same as that described in Section 4, so we do not repeat the details here.
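The following is a minimal sketch of this accelerated generation loop, assuming the plain gradient step given above and a placeholder `target_fn` for the objective \(f\) of Formula (3); the \(\alpha\)-update of Section 4 and the other bookkeeping of Algorithm 5 are omitted.

```python
import torch

def fast_adversarial_sample(target_fn, x, beta=0.05, iters=50):
    """Accelerated adversarial-sample generation (Appendix A sketch).

    target_fn: placeholder for the attack objective f of Formula (3).
    beta: step size of the update in each iteration.
    """
    p = torch.zeros_like(x, requires_grad=True)  # perturbation p_0 = 0
    for _ in range(iters):
        loss = target_fn(x + p)
        grad, = torch.autograd.grad(loss, p)     # gradient of f at x + p_N
        with torch.no_grad():
            p -= beta * grad                     # p_{N+1} = p_N - beta * grad
    return (x + p).detach()
```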
About this article
Cite this article
Dai, J., Xiong, S. Towards robust stacked capsule autoencoder with hybrid adversarial training. Appl Intell 53, 28153–28168 (2023). https://doi.org/10.1007/s10489-023-05002-8