Abstract
Deep reinforcement learning models are vulnerable to adversarial attacks that can decrease the victim's cumulative expected reward by manipulating its observations. Despite the efficiency of previous optimization-based methods for generating adversarial noise in supervised learning, such methods might not achieve the lowest cumulative reward because they generally do not explore the environmental dynamics. Herein, a framework is provided to better understand the existing methods by reformulating the problem of adversarial attacks on reinforcement learning in the function space. This reformulation shows that the optimal adversary lies in the function space of targeted attacks and can be realized via a generic two-stage framework. In the first stage, a deceptive policy is trained by hacking the environment to discover a set of trajectories leading to the lowest reward or the worst-case performance. In the second stage, the adversary misleads the victim into imitating the deceptive policy by perturbing its observations. Compared with existing approaches, the proposed adversary is theoretically shown to be strong under an appropriate noise level. Extensive experiments demonstrate the superiority of the proposed method in terms of efficiency and effectiveness, achieving state-of-the-art performance in both Atari and MuJoCo environments.
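The two-stage idea can be illustrated with a minimal toy sketch (not the authors' implementation): a deceptive policy supplies a worst-case target action, and an FGSM-style perturbation within an L-infinity budget steers a linear victim toward that action. The victim weights, the fixed deceptive action, and the function names are all illustrative assumptions.

```python
import numpy as np

# Stage 1 (assumed given): a "deceptive policy" mapping states to
# worst-case actions. In the paper it would be trained by RL on the
# environment; here it is a toy rule that always picks action 0.
def deceptive_policy(state):
    return 0

# Toy victim: linear logits over 2 actions (illustrative weights).
W = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def victim_action(state):
    return int(np.argmax(W @ state))

# Stage 2: FGSM-style observation perturbation pushing the victim's
# logits toward the deceptive action, clipped to an eps L-inf ball.
def perturb(state, eps=0.6):
    target = deceptive_policy(state)
    other = 1 - target
    # Gradient of (logit_target - logit_other) w.r.t. the state is
    # W[target] - W[other]; ascend its sign within the budget.
    grad = W[target] - W[other]
    return np.clip(state + eps * np.sign(grad), state - eps, state + eps)

state = np.array([0.0, 1.0])   # clean observation: victim picks action 1
adv = perturb(state)           # perturbed observation: victim picks action 0
```

In a full attack, this perturbation step would run at every environment step, so the victim's induced trajectory imitates the deceptive policy's low-reward trajectory.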
Acknowledgements
This work was supported by National Key Research and Development Program of China (Grant Nos. 2020AAA0104304, 2017YFA0700904), National Natural Science Foundation of China (Grant Nos. 61620106010, 62061136001, 61621136008, 62076147, U19B2034, U1811461, U19A2081), Beijing NSF Project (Grant No. JQ19016), Beijing Academy of Artificial Intelligence (BAAI), Tsinghua-Huawei Joint Research Program, Tsinghua Institute for Guo Qiang, Tsinghua-OPPO Joint Research Center for Future Terminal Technology and Tsinghua-China Mobile Communications Group Co., Ltd. Joint Institute.
Additional information
Supporting information Appendix A. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
Cite this article
Qiaoben, Y., Ying, C., Zhou, X. et al. Understanding adversarial attacks on observations in deep reinforcement learning. Sci. China Inf. Sci. 67, 152104 (2024). https://doi.org/10.1007/s11432-021-3688-y