Abstract
Whereas most work in reinforcement learning (RL) ignores the structure of, or relationships between, actions, in this paper we show that exploiting structure in the action space can improve sample efficiency during exploration. To show this, we focus on concurrent action spaces, in which the RL agent selects multiple actions per timestep. Concurrent action spaces are challenging to learn in, especially when the number of actions is large, as this can lead to a combinatorial explosion of the action space.
This paper proposes two methods: the first uses implicit structure to perform high-level action elimination based on task-invariant actions; the second looks for more explicit structure in the form of action clusters. Both methods are context-free, relying only on an analysis of the action space, and both yield a significant improvement in policy convergence times.
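The combinatorial explosion described above, and the payoff of eliminating actions before learning, can be illustrated with a small back-of-the-envelope sketch. The action counts below are hypothetical, chosen only for illustration; they are not figures from the paper.

```python
import math

def joint_action_space_size(n_actions: int, k: int) -> int:
    """Number of distinct joint actions when an agent picks an
    unordered set of k primitive actions (out of n) to execute
    concurrently at each timestep: C(n, k)."""
    return math.comb(n_actions, k)

# With 20 primitive actions and 3 executed per step, the agent
# already faces C(20, 3) = 1140 joint actions.
full = joint_action_space_size(20, 3)

# If an analysis of the action space flags, say, 8 actions as
# task-invariant (a hypothetical figure) and eliminates them up
# front, the joint space shrinks to C(12, 3) = 220 -- roughly a
# fivefold reduction before any learning takes place.
pruned = joint_action_space_size(20 - 8, 3)

print(full, pruned)  # 1140 220
```

Because the reduction compounds in the binomial coefficient, even modest pruning of the primitive action set can shrink the joint space dramatically, which is one intuition for why action elimination helps exploration in concurrent settings.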
© 2019 Springer Nature Switzerland AG
Cite this paper
Moodley, P., Rosman, B., Hong, X. (2019). Understanding Structure of Concurrent Actions. In: Bramer, M., Petridis, M. (eds.) Artificial Intelligence XXXVI. SGAI 2019. Lecture Notes in Computer Science, vol. 11927. Springer, Cham. https://doi.org/10.1007/978-3-030-34885-4_6
Print ISBN: 978-3-030-34884-7
Online ISBN: 978-3-030-34885-4