Abstract
Object detection plays a vital role in numerous fields, including unmanned vehicles, security, and industry. However, recent studies have shown that deep-learning-based object detection models are vulnerable to adversarial example attacks. This poses significant challenges to model robustness and substantially limits the applicability of object detection in security-critical scenarios. Many existing adversarial attack methods for object detection focus primarily on specific types of detection models, and weak transferability across different models remains a widespread issue; there is also considerable room for improving attack success rates. To tackle these challenges, this paper proposes a data augmentation framework that leverages the similarity between adversarial example generation and neural network training. Instead of using the current gradient directly at each iteration, the proposed method employs a weighted average gradient obtained by randomly combining multiple data augmentation methods. It further combines the data-augmented gradients with momentum to stabilize the update direction and avoid overfitting to the white-box model. The performance of the proposed method in attacking object detection models is evaluated on the MS COCO dataset using Faster R-CNN, YOLOv3, and YOLOv8. The experimental results demonstrate that, compared to the Projected Gradient Descent (PGD) method and the Iterative Fast Gradient Sign Method (I-FGSM), the proposed method performs better in both white-box and black-box settings, achieving a transfer success rate of up to 81.9% on RetinaNet.
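The iterative procedure the abstract describes (averaging gradients over randomly chosen augmentations, then accumulating them with momentum before a sign step, as in MI-FGSM-style attacks) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name, the toy loss, and the augmentation list are all illustrative assumptions.

```python
import numpy as np

def augmented_momentum_attack(x, grad_fn, augmentations, steps=10,
                              eps=0.1, alpha=0.01, mu=1.0, n_aug=3,
                              weights=None, rng=None):
    """Sketch of an augmented-gradient momentum attack.

    x             -- input image as a numpy array
    grad_fn(z)    -- returns the gradient of the detection loss w.r.t. z
    augmentations -- list of callables, each a random input transform
    """
    rng = rng or np.random.default_rng(0)
    x_adv = x.copy()
    g = np.zeros_like(x)  # momentum accumulator
    for _ in range(steps):
        # Randomly pick several augmentations and weight-average their gradients.
        picks = rng.choice(len(augmentations), size=n_aug, replace=True)
        grads = [grad_fn(augmentations[i](x_adv)) for i in picks]
        w = weights if weights is not None else np.full(n_aug, 1.0 / n_aug)
        g_avg = sum(wi * gi for wi, gi in zip(w, grads))
        # Momentum update: L1-normalize the averaged gradient, then accumulate.
        g = mu * g + g_avg / (np.abs(g_avg).sum() + 1e-12)
        # Sign step, projected back into the eps-ball around the clean input.
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
    return x_adv
```

As a stand-in for a detector's loss, a quadratic `0.5 * ||z - t||^2` (gradient `z - t`) suffices to exercise the loop; ascending it with sign steps drives `x_adv` away from `t` while staying inside the `eps`-ball.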
Data availability
The data that support the findings of this study are available on request from the corresponding author.
References
Fang, W., Shen, L., Chen, Y.: Survey on image object detection algorithms based on deep learning. In: Artificial Intelligence and Security: 7th International Conference, ICAIS 2021, Dublin, Ireland, July 19–23, 2021, Proceedings, Part I 7, pp. 468–480. Springer (2021)
Arnold, E., et al.: A survey on 3d object detection methods for autonomous driving applications. IEEE Trans. Intell. Transp. Syst. 20(10), 3782–3795 (2019)
Shen, M., et al.: Effective and robust physical-world attacks on deep learning face recognition systems. IEEE Trans. Inf. Forensics Secur. 16, 4063–4077 (2021)
Mishra, P.K., Saroha, G.P.: A study on video surveillance system for object detection and tracking. In: 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 221–226. IEEE (2016)
Kim, I.S., et al.: Intelligent visual surveillance—a survey. Int. J. Control. Autom. Syst. 8, 926–939 (2010)
Kim, H., et al.: Autonomous exploration in a cluttered environment for a mobile robot with 2d-map segmentation and object detection. IEEE Robot. Autom. Lett. 7(3), 6343–6350 (2022)
Li, Z., et al.: A mobile robotic arm grasping system with autonomous navigation and object detection. In: 2021 International Conference on Control, Automation and Information Sciences (ICCAIS), pp. 543–548. IEEE (2021)
Qiu, S., et al.: Review of artificial intelligence adversarial attack and defense technologies. Appl. Sci. 9(5), 909 (2019)
Szegedy, C., et al.: Intriguing properties of neural networks. arXiv:1312.6199 (2013)
Akhtar, N., et al.: Threat of adversarial attacks on deep learning in computer vision: survey II. https://doi.org/10.48550/arXiv.2108.00401 (2021)
Goodfellow, I. J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv:1412.6572 (2014)
Madry, A., et al.: Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083 (2017)
Dong, Y., et al.: Boosting adversarial attacks with momentum. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9185–9193 (2018)
Mane, S., Mangale, S.: Moving object detection and tracking using convolutional neural networks. In: 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 1809–1813. IEEE (2018)
Papernot, N., McDaniel, P., Goodfellow, I.: Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv:1605.07277 (2016)
Chang, X., Zhang, W., Qian, Y., et al.: End-to-end multi-speaker speech recognition with transformer. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6134–6138. IEEE (2020)
Ali, A.M., Ghaleb, F.A., Mohammed, M.S., et al.: Web-informed-augmented fake news detection model using stacked layers of convolutional neural network and deep autoencoder. Mathematics 11(9), 1992 (2023)
Hafeezallah, A., Al-Dhamari, A., Abu-Bakar, S.A.R.: Visual motion segmentation in crowd videos based on spatial-angular stacked sparse autoencoders. Comput. Syst. Sci. Eng. 47(1), 593–611 (2023)
Mohammed, M.S., Al-Dhamari, A., Saeed, W., et al.: Motion pattern-based scene classification using adaptive synthetic oversampling and fully connected deep neural network. IEEE Access 11, 119659–119675 (2023)
Zou, Z., et al.: Object detection in 20 years: a survey. Proceedings of the IEEE (2023)
Girshick, R., et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Ren, S., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
Liu, W., et al.: SSD: single shot multibox detector. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pp. 21–37. Springer (2016)
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv:1804.02767 (2018)
Redmon, J., et al.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Lin, T.Y., et al.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision (2017). https://doi.org/10.1109/ICCV.2017.324
Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. arXiv:1607.02533 (2016)
Xie, C., et al.: Improving transferability of adversarial examples with input diversity. arXiv:1803.06978 (2018)
Dong, Y., et al.: Evading defenses to transferable adversarial examples by translation-invariant attacks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4312–4321 (2019)
Lin, J., et al.: Nesterov accelerated gradient and scale invariance for adversarial attacks. arXiv:1908.06281 (2019)
Lu, J., Sibai, H., Fabry, E.: Adversarial examples that fool detectors. arXiv:1712.02494 (2017)
Li, Y., et al.: Robust adversarial perturbation on deep proposal-based models. arXiv:1809.05962 (2018)
Acknowledgements
The authors thank the peer reviewers for their careful reading and constructive comments.
Author information
Contributions
D. contributed to the writing of the main manuscript text, and S. provided insightful analysis and interpretation of the research findings. M.D.X. played a key role in preparing Figures 1–5, ensuring their accuracy and visual clarity. All authors actively participated in the review process, offering valuable feedback and suggestions to enhance the overall quality of the manuscript.
Ethics declarations
Conflict of interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ding, Z., Sun, L., Mao, X. et al. Adversarial example generation for object detection using a data augmentation framework and momentum. SIViP 18, 2485–2497 (2024). https://doi.org/10.1007/s11760-023-02924-1