Policy Entropy for Out-of-Distribution Classification

Sedlmeier, Andreas; Müller, Robert; Illium, Steffen; Linnhoff-Popien, Claudia

doi:10.1007/978-3-030-61616-8_34

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12397)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

One critical prerequisite for the deployment of reinforcement learning systems in the real world is the ability to reliably detect situations on which the agent was not trained. Such situations pose safety risks when wrong predictions lead to the execution of harmful actions. In this work, we propose PEOC, a new policy-entropy-based out-of-distribution classifier that reliably detects unencountered states in deep reinforcement learning. It uses the entropy of an agent’s policy as the classification score of a one-class classifier. We evaluate our approach using a procedural environment generator. Results show that PEOC is highly competitive against state-of-the-art one-class classification algorithms on the evaluated environments. Furthermore, we present a structured process for benchmarking out-of-distribution classification in reinforcement learning.
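
As a rough illustration of the idea described in the abstract, the following Python sketch computes the Shannon entropy of a discrete action distribution and compares it against a threshold. It is not the authors' implementation: the function names, the softmax helper, the threshold value, and the convention that a higher entropy indicates an out-of-distribution state are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw policy logits into action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_entropy(action_probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    # Zero-probability actions contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in action_probs if p > 0.0)


def is_out_of_distribution(action_probs: Sequence[float], threshold: float) -> bool:
    """Assumed decision rule: flag a state when the entropy score exceeds the threshold."""
    return policy_entropy(action_probs) > threshold


if __name__ == "__main__":
    # Hypothetical policy outputs for two states with four available actions.
    confident = softmax([10.0, 0.0, 0.0, 0.0])  # policy strongly prefers one action
    uncertain = softmax([0.0, 0.0, 0.0, 0.0])   # policy is indifferent between actions
    threshold = 1.0  # illustrative value; in practice it would be tuned on held-out data
    for name, probs in [("confident", confident), ("uncertain", uncertain)]:
        print(name, policy_entropy(probs), is_out_of_distribution(probs, threshold))
```

Using the entropy as a one-class score means only the policy's own output is needed at decision time; how the threshold is chosen is left open here and would depend on the evaluation setup.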

Author information

Authors and Affiliations

LMU Munich, Munich, Germany
Andreas Sedlmeier, Robert Müller, Steffen Illium & Claudia Linnhoff-Popien

Corresponding author

Correspondence to Andreas Sedlmeier.

Editor information

Editors and Affiliations

Department of Applied Informatics, Comenius University in Bratislava, Bratislava, Slovakia
Igor Farkaš
Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
Paolo Masulli
Department of Informatics, University of Hamburg, Hamburg, Germany
Stefan Wermter

About this paper

Cite this paper

Sedlmeier, A., Müller, R., Illium, S., Linnhoff-Popien, C. (2020). Policy Entropy for Out-of-Distribution Classification. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_34

DOI: https://doi.org/10.1007/978-3-030-61616-8_34
Published: 14 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61615-1
Online ISBN: 978-3-030-61616-8
eBook Packages: Computer Science, Computer Science (R0)
