Abstract
This survey presents an overview of the current model-free deep reinforcement learning landscape. It compares state-of-the-art on-policy and off-policy algorithms across the value-based and policy-based domains. The influences and potential drawbacks of different algorithmic approaches are analyzed, and each is linked to the subsequent improvements introduced to overcome its shortcomings. Finally, the survey presents applications in challenging domains, including the game of Go, StarCraft II, Dota 2, and the Rubik's Cube.
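To make the categories named above concrete, the following minimal sketch shows tabular Q-learning, the prototypical model-free, value-based, off-policy algorithm: it learns action values directly from sampled transitions, with no model of the environment, while the behavior policy (epsilon-greedy) differs from the greedy target policy. The toy chain environment and all hyperparameters are illustrative assumptions, not taken from the chapter.

```python
# A minimal sketch of tabular Q-learning: model-free, value-based, off-policy.
# The 5-state chain environment and hyperparameters below are illustrative
# assumptions for this sketch only.
import numpy as np

n_states, n_actions = 5, 2          # chain of 5 states; actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def step(s, a):
    """Deterministic chain: reaching the rightmost state yields reward 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, float(done), done

rng = np.random.default_rng(0)
s = 0
for _ in range(5000):
    # Behavior policy: epsilon-greedy exploration.
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
    s_next, r, done = step(s, a)
    # Off-policy update: bootstrap from the greedy (max) action in the next state,
    # regardless of which action the behavior policy will actually take.
    Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s_next].max()) - Q[s, a])
    s = 0 if done else s_next

print(np.round(Q, 2))  # learned action values; "right" should dominate in every state
```

An on-policy method such as SARSA would instead bootstrap from the action the epsilon-greedy policy actually selects next, and policy-based methods parameterize and optimize the policy directly rather than deriving it from learned values.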