Abstract
One critical prerequisite for the deployment of reinforcement learning systems in the real world is the ability to reliably detect situations on which the agent was not trained. Such situations could lead to potential safety risks when wrong predictions lead to the execution of harmful actions. In this work, we propose PEOC, a new policy entropy based out-of-distribution classifier that reliably detects unencountered states in deep reinforcement learning. It is based on using the entropy of an agent’s policy as the classification score of a one-class classifier. We evaluate our approach using a procedural environment generator. Results show that PEOC is highly competitive against state-of-the-art one-class classification algorithms on the evaluated environments. Furthermore, we present a structured process for benchmarking out-of-distribution classification in reinforcement learning.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Aggarwal, C.C.: Outlier analysis. In: Aggarwal, C.C., et al. (eds.) Data Mining, pp. 237–263. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-14142-8_8
Andrychowicz, O.M., Baker, et al.: Learning dexterous in-hand manipulation. Int. J. Robot. Res. 39(1), 3–20 (2020)
Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)
Berner, C., et al.: Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680 (2019)
Brockman, G., et al.: Openai gym (2016)
Cobbe, K., Hesse, C., Hilton, J., Schulman, J.: Leveraging procedural generation to benchmark reinforcement learning. arXiv preprint arXiv:1912.01588 (2019)
Dhariwal, P., et al.: Openai baselines (2017)
Espeholt, L., Soyer, H., Munos, R., et al.: IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. CoRR (2018)
Farebrother, J., Machado, M.C., Bowling, M.: Generalization and regularization in DQN (2018)
Haarnoja, T., Tang, H., Abbeel, P., Levine, S.: Reinforcement learning with deep energy-based policies. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1352–1361. JMLR. org (2017)
Hendrycks, D., Gimpel, K.: A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks. ArXiv e-prints, October 2016
Liang, S., Li, Y., Srikant, R.: Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks. ArXiv e-prints, June 2017
Liu, Y., et al.: Generative adversarial active learning for unsupervised outlier detection. IEEE Trans. Knowl. Data Eng. (2019)
Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. CoRR (2016)
Pimentel, M.A., Clifton, D.A., Clifton, L., Tarassenko, L.: A review of novelty detection. Sig. Process. 99, 215–249 (2014)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken (2014)
Schulman, J., Wolski, F., et al.: Proximal policy optimization algorithms. CoRR (2017)
Sedlmeier, A., Gabor, T., Phan, T., Belzner, L., Linnhoff-Popien, C.: Uncertainty-based out-of-distribution classification in deep reinforcement learning, pp. 522–529 (2020)
Sutton, R.S., Barto, A.G.: Introduction to Reinforcement Learning, vol. 135. MIT Press, Cambridge (1998)
Williams, R.J., Peng, J.: Function optimization using connectionist reinforcement learning algorithms. Connection Sci. 3(3), 241–268 (1991)
Zhang, C., Vinyals, O., Munos, R., Bengio, S.: A study on overfitting in deep reinforcement learning (2018)
Zhao, Y., Nasrullah, Z., Li, Z.: PyOD: a python toolbox for scalable outlier detection. J. Mach. Learn. Res. 20(96), 1–7 (2019)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sedlmeier, A., Müller, R., Illium, S., Linnhoff-Popien, C. (2020). Policy Entropy for Out-of-Distribution Classification. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science(), vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_34
Download citation
DOI: https://doi.org/10.1007/978-3-030-61616-8_34
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61615-1
Online ISBN: 978-3-030-61616-8
eBook Packages: Computer ScienceComputer Science (R0)