Abstract
Adversarial attacks on video recognition models have recently received growing attention. However, most existing works treat all video frames equally and ignore their temporal interactions. To overcome this drawback, a few methods first select key frames and then perform attacks based on them. Unfortunately, their selection strategy is independent of the attacking step, so the resulting performance is limited. We argue instead that the frame selection phase is closely coupled with the attacking phase: the key frames should be adjusted according to the attacking results. To this end, we formulate black-box video attacks as a Reinforcement Learning (RL) problem. Specifically, the environment in RL is the recognition model, and the agent in RL plays the role of frame selection. By continuously querying the recognition model and receiving attack feedback, the agent gradually adjusts its frame selection strategy, and the adversarial perturbations become smaller and smaller. We conduct a series of experiments with two mainstream video recognition models, C3D and LRCN, on the public UCF-101 and HMDB-51 datasets. The results demonstrate that the proposed method significantly reduces the adversarial perturbations while requiring few queries.
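The agent-environment loop the abstract describes can be illustrated with a minimal sketch: a REINFORCE-style agent learns which frames to select by repeatedly querying a black-box model and using the returned score as reward. This is not the authors' implementation; the toy `black_box_score` model, the hard-coded `key_frames`, and all hyperparameters are illustrative assumptions.

```python
# Toy sketch (hypothetical, not the paper's method): learn a per-frame
# selection policy from black-box query feedback via REINFORCE.
import numpy as np

rng = np.random.default_rng(0)
T = 16  # number of video frames

def black_box_score(mask):
    # Stand-in for the black-box recognition model's feedback: attacking
    # frames 3..6 is assumed most effective, while selecting more frames
    # (a larger perturbation) is penalized.
    key_frames = {3, 4, 5, 6}
    hit = sum(mask[t] for t in key_frames)
    return hit - 0.2 * mask.sum()

# Agent: independent Bernoulli selection probability per frame.
logits = np.zeros(T)
baseline = 0.0  # running reward baseline to reduce gradient variance
lr = 0.3

for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-logits))
    mask = (rng.random(T) < p).astype(float)  # sample a frame selection
    reward = black_box_score(mask)            # one query to the model
    advantage = reward - baseline
    baseline = 0.9 * baseline + 0.1 * reward
    # REINFORCE gradient for a Bernoulli policy: (mask - p) * advantage
    logits += lr * (mask - p) * advantage

p = 1.0 / (1.0 + np.exp(-logits))
print(p.round(2))  # selection probabilities should peak on the key frames
```

Over the iterations, frames whose selection raises the reward are sampled ever more often, mirroring how the agent in the paper concentrates the perturbation on a few key frames as query feedback accumulates.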
References
Akhtar, N., & Mian, A. (2018). Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 6, 14410–14430.
Bose, J. A., & Aarabi, P. (2018). Adversarial attacks on face detectors using neural net based constrained optimization. pp. 1–6.
Cheng, M., Le, T., Chen, P.-Y., Zhang, H., Yi, J., & Hsieh, C.-J. (2019). Query-efficient hard-label black-box attack: An optimization-based approach. In International conference on learning representations.
Cheng, M., Singh, S., Chen, P.-Y., Liu, S., & Hsieh, C.-J. (2020). Sign-opt: A query-efficient hard-label adversarial attack. In International conference on learning representations.
Croce, F., Rauber, J., & Hein, M. (2020). Scaling up the randomized gradient-free adversarial attack reveals overestimation of robustness using established attacks. International Journal of Computer Vision, 128(4), 1028–1046.
Das, N., Shanbhogue, M., Chen, S., Hohman, F., Li, S., Chen, L., Kounavis, M. E., & Chau, D. H. (2018). Shield: Fast, practical defense and vaccination for deep learning using jpeg compression. In Knowledge discovery and data mining, pp. 196–204.
Deng, L., Chen, J., Sun, Q., He, X., Tang, S., Ming, Z., Zhang, Y., & Chua, T.-S. (2019). Mixed-dish recognition with contextual relation networks. In Proceedings of the 27th ACM International conference on multimedia.
Donahue, J., Hendricks, L. A., Rohrbach, M., Venugopalan, S., Guadarrama, S., Saenko, K., & Darrell, T. (2017). Long-term recurrent convolutional networks for visual recognition and description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 677–691.
Dong, W., Zhang, Z., & Tan, T. (2019). Attention-aware sampling via deep reinforcement learning for action recognition. In National Conference on Artificial Intelligence, 33, 8247–8254.
Dong, Y., Su, H., Wu, B., Li, Z., Liu, W., Zhang, T., & Zhu, J. (2019). Efficient decision-based black-box adversarial attacks on face recognition. In Computer vision and pattern recognition, pp. 7714–7722.
Esteva, A., Kuprel, B., Novoa, R. A., Ko, J. M., Swetter, S. M., Blau, H. M., & Thrun, S. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115–118.
Goodfellow, I. J., Jonathon, S., & Christian, S. (2015). Explaining and harnessing adversarial examples. In International conference on learning representations.
Goswami, G., Agarwal, A., Ratha, N., Singh, R., & Vatsa, M. (2019). Detecting and mitigating adversarial perturbations for robust face recognition. International Journal of Computer Vision, 127(6), 719–742.
Guo, C., Rana, M., Cisse, M., & Van Der Maaten, L. (2017). Countering adversarial images using input transformations. International conference on learning representations.
Hara, K., Kataoka, H., & Satoh, Y. (2018). Can spatiotemporal 3d cnns retrace the history of 2d cnns and imagenet? In Computer vision and pattern recognition, pp. 6546–6555.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Computer vision and pattern recognition, pp. 770–778.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Ilyas, A., Engstrom, L., Athalye, A., & Lin, J. (2018). Black-box adversarial attacks with limited queries and information. In International conference on machine learning.
Jia, X., Wei, X., & Cao, X. (2019). Identifying and resisting adversarial videos using temporal consistency. arXiv preprint arXiv:1909.04837.
Jia, X., Wei, X., Cao, X., & Foroosh, H. (2019). Comdefend: An efficient image compression model to defend adversarial examples. In 2019 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp. 6084–6092.
Jiang, L., Ma, X., Chen, S., Bailey, J., & Jiang, Y.-G. (2019). Black-box adversarial attacks on video recognition models. In ACM multimedia, pp. 864–872.
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In International conference on learning representations.
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., & Serre, T. (2011). HMDB: A large video database for human motion recognition. In 2011 International conference on computer vision, IEEE, pp. 2556–2563.
Lecun, Y., Bengio, Y., & Hinton, G. E. (2015). Deep learning. Nature, 521(7553), 436–444.
Li, S., Neupane, A., Paul, S., Song, C., Krishnamurthy, S. V., Roy-Chowdhury, A. K., & Swami, A. (2019). Stealthy adversarial perturbations against real-time video classification systems. In Network and distributed system security symposium.
Li, Y. (2017). Deep reinforcement learning: An overview. arXiv: Learning.
Litjens, G. J. S., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., Van Der Laak, J. A., Van Ginneken, B., & Sanchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42, 60–88.
Liu, S., Chen, P. Y., Chen, X., & Hong, M. (2019). Signsgd via zeroth-order oracle. In 7th International conference on learning representations, ICLR 2019.
Lu, J., Sibai, H., Fabry, E., & Forsyth, D. (2017). No need to worry about adversarial examples in object detection in autonomous vehicles. arXiv preprint.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. In International conference on learning representations.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Moosavi-Dezfooli, S., Fawzi, A., & Frossard, P. (2016). Deepfool: A simple and accurate method to fool deep neural networks. In Computer vision and pattern recognition, pp. 2574–2582.
Nezami, O. M., Chaturvedi, A., Dras, M., & Garain, U. (2020). Pick-object-attack: Type-specific adversarial attack for object detection.
Prakash, A., Moran, N., Garber, S., DiLillo, A., & Storer, J. (2018). Deflecting adversarial attacks with pixel deflection. In 2018 IEEE/CVF conference on computer vision and pattern recognition, pp. 8571–8580.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., et al. (2016). Mastering the game of go with deep neural networks and tree search. Nature, 529(7587), 484–489.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T. P., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., & Hassabis, D. (2017). Mastering the game of go without human knowledge. Nature, 550(7676), 354–359.
Soomro, K., Zamir, A. R., & Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. Computer vision and pattern recognition, pp. 2818–2826.
Teng, S., Zhang, S., Huang, Q., & Sebe, N. (2021). Viewpoint and scale consistency reinforcement for UAV vehicle re-identification. International Journal of Computer Vision, 129(3), 719–735.
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I. J., Boneh, D., McDaniel, P. D. (2018). Ensemble adversarial training: Attacks and defenses. In International conference on learning representations.
Wei, X., Liang, S., Chen, N., & Cao, X. (2019). Transferable adversarial attacks for image and video object detection. In International joint conference on artificial intelligence, pp. 954–960.
Wei, X., Zhu, J., Feng, S., & Su, H. (2018). Video-to-video translation with global temporal consistency. In Proceedings of the 26th ACM International conference on multimedia.
Wei, X., Zhu, J., Yuan, S., & Su, H. (2019). Sparse adversarial perturbations for videos. In National Conference on Artificial Intelligence, 33, 8973–8980.
Wei, Z., Chen, J., Wei, X., & Jiang, Y.-G. (2020). Heuristic black-box adversarial attacks on video recognition models. In National conference on artificial intelligence.
Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., & Schmidhuber, J. (2014). Natural evolution strategies. The Journal of Machine Learning Research, 15(1), 949–980.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4), 229–256.
Xie, C., Wang, J., Zhang, Z., Ren, Z., & Yuille, A. L. (2017). Mitigating adversarial effects through randomization. In International conference on learning representations.
Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., & Yuille, A. L. (2017). Adversarial examples for semantic segmentation and object detection. International conference on computer vision, pp. 1378–1387.
Zhang, H., & Wang, J. (2019). Towards adversarially robust object detection. International conference on computer vision, pp. 421–430.
Zhou, K., Qiao, Y., Xiang, T. (2018). Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. In National conference on artificial intelligence.
Acknowledgements
This work is supported by the National Key R&D Program of China (Grant No. 2020AAA0104002) and the National Natural Science Foundation of China (No. 62076018). We also thank the anonymous reviewers for their valuable suggestions.
Additional information
Communicated by Wenjun Kevin Zeng.
Cite this article
Wei, X., Yan, H. & Li, B. Sparse Black-Box Video Attack with Reinforcement Learning. Int J Comput Vis 130, 1459–1473 (2022). https://doi.org/10.1007/s11263-022-01604-w