
Sparse Black-Box Video Attack with Reinforcement Learning

Published in: International Journal of Computer Vision

Abstract

Adversarial attacks on video recognition models have recently been explored. However, most existing works treat each video frame equally and ignore the temporal interactions between frames. To overcome this drawback, a few methods try to select key frames and perform attacks on them alone. Unfortunately, their selection strategy is independent of the attacking step, so the resulting performance is limited. We argue instead that the frame selection phase is closely coupled with the attacking phase: the key frames should be adjusted according to the attacking results. To that end, we formulate black-box video attacks as a Reinforcement Learning (RL) problem. Specifically, the environment in RL is the recognition model, and the agent plays the role of frame selection. By continuously querying the recognition model and receiving attack feedback, the agent gradually adjusts its frame selection strategy, and the adversarial perturbations become progressively smaller. We conduct a series of experiments with two mainstream video recognition models, C3D and LRCN, on the public UCF-101 and HMDB-51 datasets. The results demonstrate that the proposed method can significantly reduce the adversarial perturbations with an efficient number of queries.
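The RL formulation above can be sketched as a toy REINFORCE loop. Everything here is illustrative, not the paper's actual method: `recognition_model` is a hypothetical stand-in for C3D/LRCN, and the perturbation rule and reward shaping are simplified assumptions. Only the overall structure mirrors the abstract's idea: sample a frame-selection mask from the agent's policy, query the black-box model once per sample, reward masks that fool the model while penalizing how many frames they perturb, and update the policy from that feedback.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C, EPS = 16, 5, 1.5  # frames per video, classes, per-frame perturbation size

def recognition_model(video):
    """Hypothetical black-box classifier over per-frame class features (T, C).
    Later frames carry more weight, so a good agent should learn to perturb
    them preferentially. A real target would be C3D or LRCN."""
    w = np.linspace(0.5, 1.5, T)[:, None]
    return (video * w).sum(axis=0)           # class scores, shape (C,)

video = rng.normal(size=(T, C))
video[:, 0] += 1.0                            # bias the clean video toward class 0
true_label = int(np.argmax(recognition_model(video)))

theta = np.zeros(T)                           # per-frame selection logits (the agent)
lr, sparsity_cost = 0.5, 0.02
queries = 0
delta = np.zeros((T, C))
delta[:, true_label] = -EPS                   # push scores away from the true class

for step in range(300):
    probs = 1.0 / (1.0 + np.exp(-theta))      # Bernoulli frame-selection policy
    mask = (rng.random(T) < probs).astype(float)
    adv = video + mask[:, None] * delta       # perturb only the selected frames
    queries += 1                              # one black-box query per sample
    fooled = int(np.argmax(recognition_model(adv))) != true_label
    reward = float(fooled) - sparsity_cost * mask.sum()
    # REINFORCE: raise the log-probability of masks that earn high reward.
    theta += lr * reward * (mask - probs)

final_mask = (1.0 / (1.0 + np.exp(-theta)) > 0.5).astype(float)
adv = video + final_mask[:, None] * delta
print("queries:", queries,
      "| frames perturbed:", int(final_mask.sum()),
      "| fooled:", int(np.argmax(recognition_model(adv))) != true_label)
```

The sparsity penalty in the reward is what drives the "smaller and smaller perturbations" behavior: a mask that fools the model with fewer perturbed frames earns strictly more reward, so the policy drifts toward sparse selections.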



Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2020AAA0104002) and the National Natural Science Foundation of China (No. 62076018). We also thank the anonymous reviewers for their valuable suggestions.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xingxing Wei.

Additional information

Communicated by Wenjun Kevin Zeng.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wei, X., Yan, H. & Li, B. Sparse Black-Box Video Attack with Reinforcement Learning. Int J Comput Vis 130, 1459–1473 (2022). https://doi.org/10.1007/s11263-022-01604-w

