Abstract
Given the plethora of Reinforcement Learning algorithms available in the literature, choosing the most appropriate one for a given Reinforcement Learning task can prove challenging. This work presents a benchmark study of the performance of several Reinforcement Learning algorithms in discrete learning environments. The study includes both deep and non-deep learning algorithms, with special focus on the Deep Q-Network algorithm and its variants. Neural Fitted Q-Iteration, the predecessor of Deep Q-Network, as well as Vanilla Policy Gradient and a planner, were also included in the assessment to provide a wider comparison between different approaches and paradigms. Three learning environments were used to carry out the tests: a 2D maze and two OpenAI Gym environments, namely a custom-built Foraging/Tagging environment and the CartPole environment.
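As context for the environments mentioned above, the following minimal sketch shows the agent-environment interaction loop that OpenAI Gym environments such as CartPole expose. It assumes the classic Gym API (pre-0.26, where reset() returns an observation and step() returns a four-tuple) and substitutes a random policy for the learned agents compared in the study.

import gym

# Minimal rollout in the CartPole environment with a random policy.
# Assumes the classic Gym API: reset() returns the observation and
# step() returns (observation, reward, done, info). The random action
# is only a placeholder for the policies benchmarked in the paper.
env = gym.make("CartPole-v1")
observation = env.reset()
episode_return, done = 0.0, False
while not done:
    action = env.action_space.sample()  # stand-in for a learned agent
    observation, reward, done, info = env.step(action)
    episode_return += reward
env.close()
print("episode return:", episode_return)

Each benchmarked agent would replace the random action selection with its own policy; the rest of the loop stays unchanged across the discrete environments studied.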
Acknowledgements
This work was supported by National Funds through the FCT - Foundation for Science and Technology in the context of the project UID/CEC/00127/2019 and also by FCT PhD scholarship SFRH/BD/145723/2019.