Abstract
Deep reinforcement learning models are vulnerable to adversarial attacks that can decrease the victim's cumulative expected reward by manipulating its observations. Despite the efficiency of previous optimization-based methods for generating adversarial noise in supervised learning, such methods might not achieve the lowest cumulative reward because they generally do not explore the environmental dynamics. Herein, a framework is provided to better understand the existing methods by reformulating the problem of adversarial attacks on reinforcement learning in the function space. This reformulation shows that the optimal adversary lies in the function space of targeted attacks and can be realized via a generic two-stage framework. In the first stage, a deceptive policy is trained by hacking the environment to discover a set of trajectories leading to the lowest reward or the worst-case performance. In the second stage, the adversary misleads the victim into imitating the deceptive policy by perturbing its observations. Compared with existing approaches, the proposed adversary is theoretically shown to be strong under an appropriate noise level. Extensive experiments demonstrate the superiority of the proposed method in terms of efficiency and effectiveness, achieving state-of-the-art performance in both Atari and MuJoCo environments.
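The two-stage idea can be illustrated with a minimal toy sketch (not the authors' implementation): a deceptive policy supplies a worst-case target action, and an FGSM-style perturbation within an L-infinity budget steers a linear victim toward that action. The victim weights, the fixed deceptive action, and the function names are all illustrative assumptions.

```python
import numpy as np

# Stage 1 (assumed given): a "deceptive policy" mapping states to
# worst-case actions. In the paper it would be trained by RL on the
# environment; here it is a toy rule that always picks action 0.
def deceptive_policy(state):
    return 0

# Toy victim: linear logits over 2 actions (illustrative weights).
W = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def victim_action(state):
    return int(np.argmax(W @ state))

# Stage 2: FGSM-style observation perturbation pushing the victim's
# logits toward the deceptive action, clipped to an eps L-inf ball.
def perturb(state, eps=0.6):
    target = deceptive_policy(state)
    other = 1 - target
    # Gradient of (logit_target - logit_other) w.r.t. the state is
    # W[target] - W[other]; ascend its sign within the budget.
    grad = W[target] - W[other]
    return np.clip(state + eps * np.sign(grad), state - eps, state + eps)

state = np.array([0.0, 1.0])   # clean observation: victim picks action 1
adv = perturb(state)           # perturbed observation: victim picks action 0
```

In a full attack, this perturbation step would run at every environment step, so the victim's induced trajectory imitates the deceptive policy's low-reward trajectory.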
Acknowledgements
This work was supported by National Key Research and Development Program of China (Grant Nos. 2020AAA0104304, 2017YFA0700904), National Natural Science Foundation of China (Grant Nos. 61620106010, 62061136001, 61621136008, 62076147, U19B2034, U1811461, U19A2081), Beijing NSF Project (Grant No. JQ19016), Beijing Academy of Artificial Intelligence (BAAI), Tsinghua-Huawei Joint Research Program, Tsinghua Institute for Guo Qiang, Tsinghua-OPPO Joint Research Center for Future Terminal Technology and Tsinghua-China Mobile Communications Group Co., Ltd. Joint Institute.
Additional information
Supporting information Appendix A. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
Cite this article
Qiaoben, Y., Ying, C., Zhou, X. et al. Understanding adversarial attacks on observations in deep reinforcement learning. Sci. China Inf. Sci. 67, 152104 (2024). https://doi.org/10.1007/s11432-021-3688-y