New challenges in reinforcement learning: a survey of security and privacy

Artificial Intelligence Review

Abstract

Reinforcement learning is one of the most important branches of AI. Owing to its capacity for self-adaptation and decision-making in dynamic environments, it has been widely applied in areas such as healthcare, data markets, autonomous driving, and robotics. However, some of these applications and systems have been shown to be vulnerable to security or privacy attacks, resulting in unreliable or unstable services. Although a large number of studies have addressed these security and privacy problems, few surveys have provided a systematic review and comparison of existing problems and state-of-the-art solutions that keeps pace with emerging threats. Accordingly, we present such a review, which explains and summarizes the challenges associated with security and privacy in reinforcement learning from a new perspective, namely that of the Markov Decision Process (MDP). In this survey, we first introduce the key concepts of this area. Next, we cover the security and privacy issues linked to the state, action, environment, and reward function of the MDP, respectively. We further highlight the special characteristics of security and privacy methodologies for reinforcement learning. Finally, we discuss possible future research directions.
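
To make the MDP perspective concrete, the following is a minimal sketch, not a method taken from the survey. It assumes a hypothetical two-state tabular MDP (the class name TabularMDP, the helper greedy_policy, and all numbers are illustrative) and shows how tampering with a single entry of the reward function can change the policy an agent learns. Attacks on the state, action, or environment would perturb the observation, the executed action, or the transition table in an analogous way.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (state, action)


@dataclass(frozen=True)
class TabularMDP:
    """Hypothetical deterministic MDP (S, A, P, R, gamma) used only for illustration."""
    states: List[int]
    actions: List[int]
    transition: Dict[Key, int]   # environment: next state for each (state, action)
    reward: Dict[Key, float]     # reward function
    gamma: float = 0.9           # discount factor


def q_value(mdp: TabularMDP, v: Dict[int, float], s: int, a: int) -> float:
    # One-step lookahead: immediate reward plus discounted value of the next state.
    return mdp.reward[(s, a)] + mdp.gamma * v[mdp.transition[(s, a)]]


def greedy_policy(mdp: TabularMDP, sweeps: int = 200) -> Dict[int, int]:
    # Plain value iteration followed by greedy action selection.
    v = {s: 0.0 for s in mdp.states}
    for _ in range(sweeps):
        v = {s: max(q_value(mdp, v, s, a) for a in mdp.actions) for s in mdp.states}
    return {s: max(mdp.actions, key=lambda a: q_value(mdp, v, s, a)) for s in mdp.states}


# Clean MDP: action 0 stays in place, action 1 switches state.
clean = TabularMDP(
    states=[0, 1],
    actions=[0, 1],
    transition={(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    reward={(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.0},
)

# Assumed attacker capability: inflate the reward of one (state, action) pair.
poisoned_reward = dict(clean.reward)
poisoned_reward[(0, 0)] += 2.0
poisoned = replace(clean, reward=poisoned_reward)

clean_policy = greedy_policy(clean)
poisoned_policy = greedy_policy(poisoned)
print("clean policy:   ", clean_policy)
print("poisoned policy:", poisoned_policy)
```

In this toy setting the agent trained on the clean rewards switches state from state 0, while the agent trained on the tampered rewards stays there, which is the kind of reward-function manipulation the MDP view is meant to isolate.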

Acknowledgements

This work is supported by ARC Discovery Projects (DP190100981, DP200100946) from the Australian Research Council, Australia.

Author information

Corresponding author

Correspondence to Tianqing Zhu.

About this article

Cite this article

Lei, Y., Ye, D., Shen, S. et al. New challenges in reinforcement learning: a survey of security and privacy. Artif Intell Rev 56, 7195–7236 (2023). https://doi.org/10.1007/s10462-022-10348-5
