Abstract
Deep neural network (DNN) models are known for their robustness to random perturbations of the input. However, researchers have found that these models are extremely vulnerable to deliberately crafted, seemingly imperceptible perturbations of the input, referred to as adversarial examples. Adversarial attacks can substantially compromise the security of DNN-powered systems and pose high risks, especially in areas where security is a top priority. Numerous studies have been conducted in recent years to defend against these attacks and to develop architectures more robust to adversarial threats. In this study, we propose a new architecture and enhance a recently proposed technique that restores adversarial samples to their original class manifold. We leverage several uncertainty metrics obtained from Monte Carlo dropout (MC Dropout) estimates of the model, together with the model's own loss function, and combine them with the defensive distillation technique to defend against these attacks. We experimentally evaluated and verified the efficacy of our approach on the MNIST (Digit), MNIST (Fashion) and CIFAR-10 datasets. Our experiments show that the proposed method reduces the attack success rate to below 5% without compromising clean accuracy.
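To make the MC Dropout component of the abstract concrete, the following is a minimal sketch, not the paper's implementation, of how per-sample uncertainty metrics can be obtained from stochastic forward passes in PyTorch (the framework used in our experiments). The helper names and the number of passes (T = 50) are illustrative assumptions; predictive entropy and mutual information are two standard metrics derived from MC Dropout sampling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def enable_mc_dropout(model: nn.Module) -> None:
    # Put the model in eval mode but keep dropout layers stochastic,
    # which is what turns repeated forward passes into MC Dropout samples.
    model.eval()
    for module in model.modules():
        if isinstance(module, (nn.Dropout, nn.Dropout2d)):
            module.train()

@torch.no_grad()
def mc_dropout_uncertainty(model: nn.Module, x: torch.Tensor, T: int = 50):
    # Collect T stochastic softmax outputs: shape (T, batch, classes).
    enable_mc_dropout(model)
    probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(T)])
    mean_probs = probs.mean(dim=0)  # MC estimate of the predictive distribution

    eps = 1e-12  # numerical stability for the logarithms
    # Predictive entropy: total uncertainty of the averaged prediction.
    predictive_entropy = -(mean_probs * (mean_probs + eps).log()).sum(dim=-1)
    # Expected entropy of the individual passes; subtracting it from the
    # predictive entropy isolates the epistemic part (mutual information).
    expected_entropy = -(probs * (probs + eps).log()).sum(dim=-1).mean(dim=0)
    mutual_information = predictive_entropy - expected_entropy
    return predictive_entropy, mutual_information
```

In a defense of this kind, unusually high values of such metrics are the signal that an input may have been pushed off its original class manifold and is a candidate for restoration.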
Data availability
Datasets used in the manuscript can be found at: http://yann.lecun.com/exdb/mnist/, https://github.com/zalandoresearch/fashion-mnist, https://www.cs.toronto.edu/~kriz/cifar.html.
Notes
We used PyTorch 1.13.0 to implement the attacks and the proposed defense method on a computer with an Intel Core i5-1145G7 2.6 GHz processor running Windows 10.
References
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015). arXiv:1512.03385
Goodfellow, I.J., Bulatov, Y., Ibarz, J., Arnoud, S., Shet, V.: Multi-digit number recognition from street view imagery using deep convolutional neural networks (2014). arXiv:1312.6082
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks (2014). arXiv:1312.6199
Sato, M., Suzuki, J., Shindo, H., Matsumoto, Y.: Interpretable adversarial perturbation in input embedding space for text (2018). arXiv:1805.02917
Carlini, N., Wagner, D.: Audio adversarial examples: Targeted attacks on speech-to-text (2018). arXiv:1801.01944
Finlayson, S.G., Chung, H.W., Kohane, I.S., Beam, A.L.: Adversarial attacks against medical deep learning systems (2019). arXiv:1804.05296
Sitawarin, C., Bhagoji, A.N., Mosenia, A., Chiang, M., Mittal, P.: Darts: Deceiving autonomous cars with toxic signs (2018). arXiv:1802.06430
Morgulis, N., Kreines, A., Mendelowitz, S., Weisglass, Y.: Fooling a real car with adversarial traffic signs (2019). arXiv:1907.00374
Tuna, O.F., Catak, F.O., Eskil, M.T.: Uncertainty as a Swiss army knife: new adversarial attack and defense ideas based on epistemic uncertainty. Complex Intell. Syst. (2022). https://doi.org/10.1007/s40747-022-00701-0
Huang, X., Kroening, D., Ruan, W., Sharp, J., Sun, Y., Thamo, E., Wu, M., Yi, X.: A survey of safety and trustworthiness of deep neural networks: verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev. 37, 100270 (2020). https://doi.org/10.1016/j.cosrev.2020.100270
Catak, F.O., Sivaslioglu, S., Sahinbas, K.: A generative model based adversarial security of deep learning and linear classifier models (2020). arXiv:2010.08546
Qayyum, A., Usama, M., Qadir, J., Al-Fuqaha, A.: Securing connected autonomous vehicles: challenges posed by adversarial machine learning and the way forward. IEEE Commun. Surv. Tutor. 22(2), 998–1026 (2020). https://doi.org/10.1109/COMST.2020.2975048
Sadeghi, K., Banerjee, A., Gupta, S.K.S.: A system-driven taxonomy of attacks and defenses in adversarial machine learning. IEEE Trans. Emerg. Top. Comput. Intell. 4(4), 450–467 (2020). https://doi.org/10.1109/TETCI.2020.2968933
Zheng, Z., Hong, P.: Robust detection of adversarial attacks by modeling the intrinsic properties of deep neural networks. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31, Curran Associates, Inc. (2018). https://proceedings.neurips.cc/paper/2018/file/e7a425c6ece20cbc9056f98699b53c6f-Paper.pdf
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples (2015). arXiv:1412.6572
Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world (2017). arXiv:1607.02533
Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial machine learning at scale (2016). arXiv:1611.01236
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks (2019). arXiv:1706.06083
Moosavi-Dezfooli, S.-M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks (2016). arXiv:1511.04599
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks (2017). arXiv:1608.04644
Andriushchenko, M., Croce, F., Flammarion, N., Hein, M.: Square attack: a query-efficient black-box adversarial attack via random search (2020). arXiv:1912.00049
Chen, J., Jordan, M.I., Wainwright, M.J.: HopSkipJumpAttack: a query-efficient decision-based attack. In: 2020 IEEE Symposium on Security and Privacy (SP), pp. 1277–1294 (2020). https://doi.org/10.1109/SP40000.2020.00045
Ilyas, A., Engstrom, L., Madry, A.: Prior convictions: Black-box adversarial attacks with bandits and priors (2019). arXiv:1807.07978
Tuna, O.F., Catak, F.O., Eskil, M.T.: Exploiting epistemic uncertainty of the deep learning models to generate adversarial samples. Multimedia Tools Appl. 81(8), 11479–11500 (2022). https://doi.org/10.1007/s11042-022-12132-7
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network (2015). arXiv:1503.02531
Papernot, N., McDaniel, P., Wu, X., Jha, S., Swami, A.: Distillation as a defense to adversarial perturbations against deep neural networks (2016). arXiv:1511.04508
Liao, F., Liang, M., Dong, Y., Pang, T., Hu, X., Zhu, J.: Defense against adversarial attacks using high-level representation guided denoiser (2018). arXiv:1712.02976
Shen, S., Jin, G., Gao, K., Zhang, Y.: APE-GAN: adversarial perturbation elimination with GAN (2017). arXiv:1707.05474
Raghunathan, A., Steinhardt, J., Liang, P.: Certified defenses against adversarial examples (2020). arXiv:1801.09344
Tuna, O.F., Catak, F.O., Eskil, M.T.: Closeness and uncertainty aware adversarial examples detection in adversarial machine learning (2020). arXiv:2012.06390
Hüllermeier, E., Waegeman, W.: Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods (2020). arXiv:1910.09457
An, D., Liu, J., Zhang, M., Chen, X., Chen, M., Sun, H.: Uncertainty modeling and runtime verification for autonomous vehicles driving control: a machine learning-based approach. J. Syst. Softw. 167, 110617 (2020)
Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in Bayesian deep label distribution learning. Appl. Soft Comput. 101, 107046 (2021). https://doi.org/10.1016/j.asoc.2020.107046
Antonelli, F., Cortellessa, V., Gribaudo, M., Pinciroli, R., Trivedi, K.S., Trubiani, C.: Analytical modeling of performance indices under epistemic uncertainty applied to cloud computing systems. Future Gen. Comput. Syst. 102, 746–761 (2020). https://doi.org/10.1016/j.future.2019.09.006
Zhou, D.-X.: Universality of deep convolutional neural networks (2018). arXiv:1805.10769
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. (MCSS) 2(4), 303–314 (1989). https://doi.org/10.1007/BF02551274
Loquercio, A., Segu, M., Scaramuzza, D.: A general framework for uncertainty estimation in deep learning. IEEE Robot. Autom. Lett. 5(2), 3153–3160 (2020). https://doi.org/10.1109/LRA.2020.2974682
Gurevich, P., Stuke, H.: Pairing an arbitrary regressor with an artificial neural network estimating aleatoric uncertainty. Neurocomputing 350, 291–306 (2019). https://doi.org/10.1016/j.neucom.2019.03.031
Senge, R., Bösner, S., Dembczyński, K., Haasenritter, J., Hirsch, O., Donner-Banzhoff, N., Hüllermeier, E.: Reliable classification: learning classifiers that distinguish aleatoric and epistemic uncertainty. Inf. Sci. 255, 16–29 (2014). https://doi.org/10.1016/j.ins.2013.07.030
Reinhold, J.C., He, Y., Han, S., Chen, Y., Gao, D., Lee, J., Prince, J.L., Carass, A.: Finding novelty with uncertainty (2020). arXiv:2002.04626
Graves, A.: Practical variational inference for neural networks. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 24, pp. 2348–2356. Curran Associates Inc, London (2011)
Paisley, J., Blei, D., Jordan, M.: Variational Bayesian inference with stochastic search (2012). arXiv:1206.6430
Hoffman, M., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference (2013). arXiv:1206.7051
Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural networks (2015). arXiv:1505.05424
Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles (2017). arXiv:1612.01474
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning (2016). arXiv:1506.02142
Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? (2017). arXiv:1703.04977
Kwon, Y., Won, J.-H., Kim, B.J., Paik, M.C.: Uncertainty quantification using Bayesian neural networks in classification: application to biomedical image segmentation. Comput. Stat. Data Anal. 142, 106816 (2020). https://doi.org/10.1016/j.csda.2019.106816
Aladag, M., Catak, F.O., Gul, E.: Preventing data poisoning attacks by using generative models. In: 2019 1st International Informatics and Software Engineering Conference (UBMYK), pp. 1–5 (2019). https://doi.org/10.1109/UBMYK48245.2019.8965459
Carlini, N., Athalye, A., Papernot, N., Brendel, W., Rauber, J., Tsipras, D., Goodfellow, I., Madry, A., Kurakin, A.: On evaluating adversarial robustness (2019). arXiv:1902.06705
LeCun, Y., Cortes, C.: MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms (2017). arXiv:1708.07747
Krizhevsky, A., Nair, V., Hinton, G.: CIFAR-10 (Canadian Institute for Advanced Research). http://www.cs.toronto.edu/~kriz/cifar.html
Rauber, J., Brendel, W., Bethge, M.: Foolbox: a Python toolbox to benchmark the robustness of machine learning models (2018). arXiv:1707.04131
Ding, G.W., Wang, L., Jin, X.: AdverTorch v0.1: an adversarial robustness toolbox based on PyTorch (2019). arXiv:1902.07623
Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples (2018). arXiv:1802.00420
Katzir, Z., Elovici, Y.: Why blocking targeted adversarial perturbations impairs the ability to learn (2019). arXiv:1907.05718
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2015). arXiv:1409.1556
Acknowledgements
This work was supported by The Scientific and Technological Research Council of Turkey (TUBITAK) through the 1515 Frontier Research and Development Laboratories Support Program under Project 5169902, and was partly funded by the European Union's Horizon Europe research and innovation programme and the Smart Networks and Services Joint Undertaking (SNS JU) under Grant Agreement No. 101096034 (VERGE Project).
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare. All co-authors have seen and agreed with the contents of the manuscript. We certify that the submission is original work and is not under review at any other publication.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tuna, O.F., Catak, F.O. & Eskil, M.T. TENET: a new hybrid network architecture for adversarial defense. Int. J. Inf. Secur. 22, 987–1004 (2023). https://doi.org/10.1007/s10207-023-00675-1