Explainable Reinforcement Learning: A Survey

  • Conference paper
Machine Learning and Knowledge Extraction (CD-MAKE 2020)


Explainable Artificial Intelligence (XAI), i.e., the development of more transparent and interpretable AI models, has gained increased traction over the last few years. The reason is that AI models, as they have grown into powerful and ubiquitous tools, exhibit one detrimental characteristic: a performance-transparency trade-off. That is, the more complex a model’s inner workings, the less clear it is how its predictions or decisions were reached. Especially for Machine Learning (ML) methods like Reinforcement Learning (RL), where the system learns autonomously, the need to understand the reasoning behind its decisions becomes apparent. Since, to the best of our knowledge, no single work offers an overview of Explainable Reinforcement Learning (XRL) methods, this survey attempts to address this gap. We give a short summary of the problem, define important terms, and offer a classification and assessment of current XRL methods. We found that a) the majority of XRL methods function by mimicking and simplifying a complex model instead of designing an inherently simple one, and b) XRL (and XAI) methods often neglect the human side of the equation, not taking into account research from related fields like psychology or philosophy. Thus, an interdisciplinary effort is needed to adapt the generated explanations to a (non-expert) human user in order to make effective progress in XRL and in XAI in general.


  1. Please note that, while there is a distinction between Reinforcement Learning and Deep Reinforcement Learning (DRL), for the sake of simplicity, we will refer to both as just Reinforcement Learning going forward.

  2. E.g. the AI Explainability 360 (AIX360) as the currently most comprehensive one [4] (see also for a list of other toolkits).

  3.

  4. With the exception of method C in Sect. 3.3, where we present a Linear Model U-Tree method although another paper with a different, but related method was published slightly later. See the last paragraph of that section for our reasoning for this decision.


  1. Abdul, A., Vermeulen, J., Wang, D., Lim, B.Y., Kankanhalli, M.: Trends and trajectories for explainable, accountable and intelligible systems. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems - CHI 2018. ACM Press (2018)

  2. Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018).

  3. Andreas, J., Klein, D., Levine, S.: Modular multitask reinforcement learning with policy sketches. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, vol. 70, pp. 166–175 (2017)

  4. Arya, V., et al.: One explanation does not fit all: a toolkit and taxonomy of AI explainability techniques (2019). arXiv:1909.03012

  5. Bevan, N., Kirakowski, J., Maissel, J.: What is usability? In: Proceedings of the 4th International Conference on HCI. Citeseer (1991)

  6. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.: OpenAI gym (2016). arXiv:1606.01540

  7. Carvalho, D.V., Pereira, E.M., Cardoso, J.S.: Machine learning interpretability: a survey on methods and metrics. Electronics 8(8), 832 (2019).

  8. Chakraborty, S., et al.: Interpretability of deep learning models: a survey of results. In: 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). IEEE (2017)

  9. Coppens, Y., et al.: Distilling deep reinforcement learning policies in soft decision trees. In: Proceedings of the IJCAI 2019 Workshop on Explainable Artificial Intelligence, pp. 1–6 (2019)

  10. Doran, D., Schulz, S., Besold, T.R.: What does explainable AI really mean? A new conceptualization of perspectives (2017). arXiv:1710.00794

  11. Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning (2017). arXiv:1702.08608

  12. Dosilovic, F.K., Brcic, M., Hlupic, N.: Explainable artificial intelligence: a survey. In: 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE (2018).

  13. Du, M., Liu, N., Hu, X.: Techniques for interpretable machine learning. Commun. ACM 63(1), 68–77 (2019).

  14. European Commission, Parliament: Regulation (EU) 2016/679 of the European parliament and of the council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). OJ L 119, 1–88 (2016)

  15. Fischer, L., Memmen, J.M., Veith, E.M., Tröschel, M.: Adversarial resilience learning–towards systemic vulnerability analysis for large and complex systems. In: The Ninth International Conference on Smart Grids, Green Communications and IT Energy-aware Technologies (ENERGY 2019), vol. 9, pp. 24–32 (2019)

  16. Freitas, A.A.: Comprehensible classification models. ACM SIGKDD Explor. Newsl. 15(1), 1–10 (2014)

  17. Fukuchi, Y., Osawa, M., Yamakawa, H., Imai, M.: Autonomous self-explanation of behavior for interactive reinforcement learning agents. In: Proceedings of the 5th International Conference on Human Agent Interaction - HAI 2017. ACM Press (2017)

  18. Glass, A., McGuinness, D.L., Wolverton, M.: Toward establishing trust in adaptive agents. In: Proceedings of the 13th International Conference on Intelligent User Interfaces - IUI 2008. ACM Press (2008)

  19. Goodman, B., Flaxman, S.: European union regulations on algorithmic decision-making and a “right to explanation”. AI Mag. 38(3), 50–57 (2017)

  20. Halpern, J.Y.: Causes and explanations: a structural-model approach. Part II: explanations. Br. J. Philos. Sci. 56(4), 889–911 (2005)

  21. Hayes, B., Shah, J.A.: Improving robot controller transparency through autonomous policy explanation. In: Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction - HRI 2017. ACM Press (2017)

  22. Hein, D., Hentschel, A., Runkler, T., Udluft, S.: Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies. Eng. Appl. Artif. Intell. 65, 87–98 (2017).

  23. Hein, D., Udluft, S., Runkler, T.A.: Interpretable policies for reinforcement learning by genetic programming. Eng. Appl. Artif. Intell. 76, 158–169 (2018)

  24. Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative filtering recommendations. In: Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work - CSCW 2000. ACM Press (2000)

  25. Holzinger, A., Carrington, A., Müller, H.: Measuring the quality of explanations: the system causability scale (SCS). KI - Künstliche Intelligenz 34(2), 193–198 (2020).

  26. Holzinger, A., Langs, G., Denk, H., Zatloukal, K., Müller, H.: Causability and explainability of artificial intelligence in medicine. WIREs Data Min. Knowl. Disc. 9(4) (2019).

  27. Ikonomovska, E., Gama, J., Džeroski, S.: Learning model trees from evolving data streams. Data Min. Knowl. Disc. 23(1), 128–168 (2010)

  28. Israelsen, B.W., Ahmed, N.R.: “Dave...I can assure you...that it’s going to be all right...” a definition, case for, and survey of algorithmic assurances in human-autonomy trust relationships. ACM Comput. Surv. 51(6), 1–37 (2019)

  29. Juozapaitis, Z., Koul, A., Fern, A., Erwig, M., Doshi-Velez, F.: Explainable reinforcement learning via reward decomposition. In: Proceedings of the IJCAI 2019 Workshop on Explainable Artificial Intelligence, pp. 47–53 (2019)

  30. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey (1996). arXiv:cs/9605103

  31. Kim, B., Khanna, R., Koyejo, O.O.: Examples are not enough, learn to criticize! criticism for interpretability. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 2280–2288. Curran Associates, Inc. (2016).

  32. Lee, J.D., See, K.A.: Trust in automation: designing for appropriate reliance. Hum. Fact. J. Hum. Fact. Ergon. Soc. 46(1), 50–80 (2004).

  33. Lee, J.H.: Complementary reinforcement learning towards explainable agents (2019). arXiv:1901.00188

  34. Li, Y.: Deep reinforcement learning (2018). arXiv:1810.06339

  35. Lipton, Z.C.: The mythos of model interpretability (2016). arXiv:1606.03490

  36. Lipton, Z.C.: The mythos of model interpretability. Commun. ACM 61(10), 36–43 (2018)

  37. Liu, G., Schulte, O., Zhu, W., Li, Q.: Toward interpretable deep reinforcement learning with linear model U-Trees. In: Berlingerio, M., Bonchi, F., Gärtner, T., Hurley, N., Ifrim, G. (eds.) ECML PKDD 2018. LNCS (LNAI), vol. 11052, pp. 414–429. Springer, Cham (2019).

  38. Liu, Y., et al.: Detecting cancer metastases on gigapixel pathology images (2017). arXiv:1703.02442

  39. Loh, W.Y.: Classification and regression trees. WIREs Data Min. Knowl. Disc. 1(1), 14–23 (2011)

  40. Madumal, P., Miller, T., Sonenberg, L., Vetere, F.: Explainable reinforcement learning through a causal lens (2019). arXiv:1905.10958

  41. Martens, D., Vanthienen, J., Verbeke, W., Baesens, B.: Performance of classification models from a user perspective. Decis. Support Syst. 51(4), 782–793 (2011)

  42. Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)

  43. Molnar, C.: Interpretable machine learning (2018). Accessed 31 Mar 2020

  44. Montavon, G., Samek, W., Müller, K.R.: Methods for interpreting and understanding deep neural networks. Digit. Signal Proc. 73, 1–15 (2018)

  45. Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  46. Nguyen, T.T., Hui, P.M., Harper, F.M., Terveen, L., Konstan, J.A.: Exploring the filter bubble. In: Proceedings of the 23rd International Conference on World Wide Web - WWW 2014. ACM Press (2014)

  47. Quinlan, J.R., et al.: Learning with continuous classes. In: 5th Australian Joint Conference on Artificial Intelligence, vol. 92, pp. 343–348. World Scientific (1992)

  48. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?”. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2016. ACM Press (2016)

  49. Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019)

  50. Rusu, A.A., et al.: Policy distillation (2015). arXiv:1511.06295

  51. Schrittwieser, J., et al.: Mastering ATARI, go, chess and shogi by planning with a learned model (2019)

  52. Sequeira, P., Gervasio, M.: Interestingness elements for explainable reinforcement learning: understanding agents’ capabilities and limitations (2019). arXiv:1912.09007

  53. Shu, T., Xiong, C., Socher, R.: Hierarchical and interpretable skill acquisition in multi-task reinforcement learning (2017)

  54. Szegedy, C., et al.: Intriguing properties of neural networks (2013). arXiv:1312.6199

  55. The European Commission: Communication from the Commission to the European Parliament, the European Council, the Council, the European Economic and Social Committee and the Committee of the Regions. The European Commission (2018). Article. Accessed 27 Mar 2020

  56. The European Commission: Independent High-Level Expert Group on Artificial Intelligence set up by the European Commission. The European Commission (2018). Article. Accessed 27 Apr 2020

  57. Tomzcak, K., et al.: Let Tesla park your Tesla: driver trust in a semi-automated car. In: 2019 Systems and Information Engineering Design Symposium (SIEDS). IEEE (2019)

  58. Uther, W.T., Veloso, M.M.: Tree based discretization for continuous state space reinforcement learning. In: AAAI/IAAI, pp. 769–774 (1998)

  59. Veith, E., Fischer, L., Tröschel, M., Nieße, A.: Analyzing cyber-physical systems from the perspective of artificial intelligence. In: Proceedings of the 2019 International Conference on Artificial Intelligence, Robotics and Control. ACM (2019)

  60. Veith, E.M.: Universal Smart Grid Agent for Distributed Power Generation Management. Logos Verlag Berlin GmbH, Berlin (2017)

  61. Verma, A., Murali, V., Singh, R., Kohli, P., Chaudhuri, S.: Programmatically interpretable reinforcement learning. PMLR 80, 5045–5054 (2018). arXiv:1804.02477

  62. van der Waa, J., van Diggelen, J., van den Bosch, K., Neerincx, M.: Contrastive explanations for reinforcement learning in terms of expected consequences. In: IJCAI 2018 Workshop on Explainable AI (XAI), vol. 37 (2018). arXiv:1807.08706

  63. Wymann, B., Espié, E., Guionneau, C., Dimitrakakis, C., Coulom, R., Sumner, A.: TORCS, the open racing car simulator, vol. 4, no. 6, p. 2 (2000). Software

  64. Zahavy, T., Zrihem, N.B., Mannor, S.: Graying the black box: understanding DQNs (2016). arXiv:1602.02658

  65. Zhou, J., Chen, F. (eds.): Human and Machine Learning. HIS. Springer, Cham (2018).

  66. Zhou, J., Chen, F.: Towards trustworthy human-AI teaming under uncertainty. In: IJCAI 2019 Workshop on Explainable AI (XAI) (2019)

  67. Zhou, J., Hu, H., Li, Z., Yu, K., Chen, F.: Physiological indicators for user trust in machine learning with influence enhanced fact-checking. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-MAKE 2019. LNCS, vol. 11713, pp. 94–113. Springer, Cham (2019).


This work was supported by the German Research Foundation under the grant GZ: JI 140/7-1. We thank our colleagues Stephan Balduin, Johannes Gerster, Lasse Hammer, Daniel Lange and Nils Wenninghoff for their helpful comments and contributions.

Author information


Corresponding author

Correspondence to Erika Puiutta.

Copyright information

© 2020 IFIP International Federation for Information Processing

Cite this paper

Puiutta, E., Veith, E.M.S.P. (2020). Explainable Reinforcement Learning: A Survey. In: Holzinger, A., Kieseberg, P., Tjoa, A., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2020. Lecture Notes in Computer Science, vol 12279. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57320-1

  • Online ISBN: 978-3-030-57321-8

  • eBook Packages: Computer Science, Computer Science (R0)
