Abstract
Whereas most work in reinforcement learning (RL) ignores the structure of, or relationships between, actions, in this paper we show that exploiting structure in the action space can improve sample efficiency during exploration. To show this, we focus on concurrent action spaces, in which the RL agent selects multiple actions per timestep. Concurrent action spaces are challenging to learn in, especially when the number of actions is large, as this can lead to a combinatorial explosion of the action space.
This paper proposes two methods: the first uses implicit structure to perform high-level action elimination based on task-invariant actions; the second looks for more explicit structure in the form of action clusters. Both methods are context-free, relying only on an analysis of the action space, and both yield a significant improvement in policy convergence times.
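The combinatorial explosion described above, and the payoff of eliminating actions before learning, can be illustrated with a small back-of-the-envelope sketch. The action counts below are hypothetical, chosen only for illustration; they are not figures from the paper.

```python
import math

def joint_action_space_size(n_actions: int, k: int) -> int:
    """Number of distinct joint actions when an agent picks an
    unordered set of k primitive actions (out of n) to execute
    concurrently at each timestep: C(n, k)."""
    return math.comb(n_actions, k)

# With 20 primitive actions and 3 executed per step, the agent
# already faces C(20, 3) = 1140 joint actions.
full = joint_action_space_size(20, 3)

# If an analysis of the action space flags, say, 8 actions as
# task-invariant (a hypothetical figure) and eliminates them up
# front, the joint space shrinks to C(12, 3) = 220 -- roughly a
# fivefold reduction before any learning takes place.
pruned = joint_action_space_size(20 - 8, 3)

print(full, pruned)  # 1140 220
```

Because the reduction compounds in the binomial coefficient, even modest pruning of the primitive action set can shrink the joint space dramatically, which is one intuition for why action elimination helps exploration in concurrent settings.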
© 2019 Springer Nature Switzerland AG
Cite this paper
Moodley, P., Rosman, B., Hong, X. (2019). Understanding Structure of Concurrent Actions. In: Bramer, M., Petridis, M. (eds.) Artificial Intelligence XXXVI. SGAI 2019. Lecture Notes in Computer Science, vol. 11927. Springer, Cham. https://doi.org/10.1007/978-3-030-34885-4_6
Print ISBN: 978-3-030-34884-7
Online ISBN: 978-3-030-34885-4