
Towards robust stacked capsule autoencoder with hybrid adversarial training


Abstract

Capsule networks (CapsNets) are neural networks that classify images based on the spatial relationships of their features. By analyzing the poses of features and their relative positions, CapsNets are better able to recognize images after affine transformations. The stacked capsule autoencoder (SCAE) is a state-of-the-art CapsNet and the first to achieve unsupervised classification with capsules. However, the security vulnerabilities and robustness of the SCAE have rarely been explored. In this paper, we propose an evasion attack against the SCAE, in which the attacker generates adversarial perturbations by reducing the contribution of the object capsules associated with the original category of the image. The perturbations are then applied to the original images, and the perturbed images are misclassified with high probability. Against this evasion attack, we further propose a defense method called hybrid adversarial training (HAT), which combines adversarial training and adversarial distillation to improve the robustness of the SCAE. Experimental results show that an SCAE trained with HAT maintains relatively high classification accuracy under the evasion attack while achieving classification accuracy on clean samples similar to that of the original SCAE. The source code is available at https://github.com/FrostbiteXSW/SCAE_Defense.
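As a rough illustration of the attack objective described above (and consistent with Formula (A1) in Appendix A, which minimizes \(\left\| p \right\|_2 + \alpha \cdot f(x+p)\)), the following NumPy sketch shows one plausible form of such an objective. The names capsule_presence and target_capsules, and the exact form of the contribution term, are illustrative assumptions rather than the paper's Formula (3).

```python
import numpy as np

def evasion_objective(perturbation, capsule_presence, target_capsules, alpha):
    """Hedged sketch of the attack objective described in the abstract:
    keep the perturbation small (L2 norm) while driving down the presence
    of the object capsules associated with the image's original category.
    The exact form of f in the paper's Formula (3) may differ."""
    # Contribution of the capsules tied to the true class (assumption).
    contribution = float(np.sum(capsule_presence[target_capsules]))
    return float(np.linalg.norm(perturbation)) + alpha * contribution
```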


Notes

  1. During the experiments, we use \(\mathrm{arctanh}\left(\left(2x-1\right) \cdot \epsilon \right)\) to avoid dividing by zero.
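The change of variables behind this note can be sketched as follows; the scaling constant and helper names are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

EPSILON = 1.0 - 1e-6  # illustrative value; keeps |(2x - 1) * epsilon| strictly below 1

def to_arctanh_space(x):
    """Map an image x in [0, 1] to an unbounded variable.
    arctanh(y) = 0.5 * log((1 + y) / (1 - y)) divides by zero as y -> 1,
    so the argument is scaled by epsilon first, as in the note above."""
    return np.arctanh((2.0 * x - 1.0) * EPSILON)

def from_arctanh_space(w):
    """Inverse map: any real-valued w returns to a valid image in [0, 1]."""
    return (np.tanh(w) / EPSILON + 1.0) / 2.0
```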


Acknowledgements

This research was supported by the Natural Science Foundation of Shanghai Municipality (Grant No. 22ZR1422600).

Author information

Corresponding author

Correspondence to Jiazhu Dai.

Ethics declarations

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Generating adversarial samples for training

The algorithm for generating adversarial samples for training is based on the evasion attack algorithm proposed in Section 4. To accelerate the generation of adversarial samples, we replace the optimizer in the original algorithm with the following update rule:

$$\begin{aligned} p_{N+1}^{'} = p_{N}^{'} - \beta \cdot \mathrm{sign} \left( \nabla_{p_{N}^{'}} \left( \left\| p_{N} \right\|_2 + \alpha \cdot f \left( x+p_{N} \right) \right) \right) \end{aligned}$$
(A1)

where \(p_{N}\) denotes the perturbation obtained in the \(N^{th}\) iteration, \(f \left( x+p_{N} \right) \) is the target function of Formula (3), and \(\beta >0\) is a hyperparameter that defines the update step size in each iteration. The whole algorithm is shown in Algorithm 5.
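A minimal sketch of one update step of Formula (A1) is given below; the gradient callback, default step size, and function names are illustrative assumptions, not the paper's Algorithm 5.

```python
import numpy as np

def formula_a1_step(p_prime, loss_grad, beta):
    """One sign-gradient step of Formula (A1).

    loss_grad(p_prime) is assumed to return the gradient, with respect to
    the variable p', of  ||p||_2 + alpha * f(x + p).  beta > 0 is the
    step size; the interface here is illustrative."""
    return p_prime - beta * np.sign(loss_grad(p_prime))

# Illustrative use: iterate from a zero perturbation for a fixed number of steps.
# p_prime = np.zeros_like(x)
# for _ in range(num_iterations):
#     p_prime = formula_a1_step(p_prime, loss_grad, beta=0.05)
```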

The update strategy for \(\alpha \) is the same as that described in Section 4 and is not repeated here.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Dai, J., Xiong, S. Towards robust stacked capsule autoencoder with hybrid adversarial training. Appl Intell 53, 28153–28168 (2023). https://doi.org/10.1007/s10489-023-05002-8

