
Assessing Policy, Loss and Planning Combinations in Reinforcement Learning Using a New Modular Architecture

  • Conference paper

Progress in Artificial Intelligence (EPIA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13566)

Abstract

The model-based reinforcement learning paradigm, which uses planning algorithms and neural network models, has recently achieved unprecedented results in diverse applications, leading to what is now known as deep reinforcement learning. These agents are complex and involve multiple components, which creates challenges for the research and development of new models. In this work, we propose a new modular software architecture suited to this type of agent, together with a set of building blocks that can be easily reused and assembled to construct new model-based reinforcement learning agents. These building blocks include search algorithms, policies, and loss functions (code available at https://github.com/GaspTO/Modular_MBRL).
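
To make the composition concrete, the following minimal Python sketch shows how such building blocks might be assembled into an agent. The interfaces and names (SearchAlgorithm, Policy, Agent, plan, select) are our own illustrative choices, not the API of the linked repository.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class SearchAlgorithm(Protocol):
    """Planning building block: runs search from a state."""
    def plan(self, state) -> dict:
        ...  # returns per-action statistics (e.g., visit counts, values)


class Policy(Protocol):
    """Policy building block: turns search statistics into an action."""
    def select(self, action_stats: dict) -> int:
        ...


@dataclass
class Agent:
    """An agent is just a particular combination of building blocks."""
    search: SearchAlgorithm
    policy: Policy
    loss_fn: Callable  # loss building block: (predictions, targets) -> scalar

    def act(self, state) -> int:
        stats = self.search.plan(state)   # planning step
        return self.policy.select(stats)  # action-selection step
```

Under this reading, testing a new combination amounts to instantiating Agent with a different search, policy, or loss object, with no change to the rest of the training loop.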

We illustrate the use of this architecture by combining several of these building blocks to implement and test agents optimized for three different test environments: Cartpole, Minigrid, and Tictactoe. One search algorithm made available in our implementation, which we call averaged minimax and which had not previously been used in reinforcement learning, achieved good results in all three environments. Experiments performed with our implementation showed that the best combination of search, policy, and loss algorithms is heavily problem-dependent.
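
The abstract does not detail averaged minimax, so the sketch below is only one plausible reading of the name: a depth-limited negamax in which interior nodes back up the average of child values where classical minimax would take the strict max (or min). All names here (model, value_net, step, legal_actions, is_terminal) are hypothetical and are not taken from the linked repository.

```python
def averaged_minimax(node, depth, model, value_net):
    """Depth-limited search; leaves are scored by a value network.

    ASSUMPTION: the backup averages child values instead of taking the
    strict max/min of classical minimax. This is our reading of the
    name, not necessarily the paper's exact formulation.
    """
    if depth == 0 or model.is_terminal(node):
        return value_net(node)  # leaf: fall back on the learned estimate
    child_values = [
        # Negamax convention for alternating two-player games such as
        # Tictactoe: a child's value is seen from the opponent's side,
        # hence the sign flip. For single-agent environments such as
        # Cartpole or Minigrid, the flip would simply be dropped.
        -averaged_minimax(model.step(node, a), depth - 1, model, value_net)
        for a in model.legal_actions(node)
    ]
    # Classical minimax would return max(child_values); averaging
    # instead smooths the backup against value-estimation error.
    return sum(child_values) / len(child_values)
```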



Acknowledgments

This work was supported by the Portuguese Science Foundation, under projects PRELUNA PTDC/CCI-INF/4703/2021 and UIDB/50021/2020.

Author information

Correspondence to Tiago Gaspar Oliveira.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Oliveira, T.G., Oliveira, A.L. (2022). Assessing Policy, Loss and Planning Combinations in Reinforcement Learning Using a New Modular Architecture. In: Marreiros, G., Martins, B., Paiva, A., Ribeiro, B., Sardinha, A. (eds.) Progress in Artificial Intelligence. EPIA 2022. Lecture Notes in Computer Science, vol. 13566. Springer, Cham. https://doi.org/10.1007/978-3-031-16474-3_35

  • DOI: https://doi.org/10.1007/978-3-031-16474-3_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16473-6

  • Online ISBN: 978-3-031-16474-3

  • eBook Packages: Computer Science, Computer Science (R0)
