
Actor vs Critic: Learning the Policy or Learning the Value

Chapter in: Reinforcement Learning Algorithms: Analysis and Applications

Abstract

The widespread use of value-based, policy gradient, and actor-critic methods for solving Reinforcement Learning problems raises the question of whether one of these methods is superior to the others in general, or at least whether a particular one is more appropriate under certain circumstances. In this chapter, we compare these methods and identify their respective advantages and disadvantages. Moreover, we illustrate the resulting insights using REINFORCE, DQN, and DDPG as examples. Finally, we give brief recommendations on which approach to use under which conditions.
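
To make the contrast concrete, the sketch below shows, in highly simplified form, the two kinds of update the abstract alludes to: a value-based (critic-style) bootstrapped target as used in Q-learning/DQN, and a policy-gradient (actor-style) REINFORCE loss. This is a minimal illustration added for orientation, not code from the chapter; the function names, the toy inputs, and the use of NumPy are assumptions.

    # Illustrative sketch (not from the chapter): critic-style vs actor-style updates.
    import numpy as np

    def q_learning_target(reward, next_q_values, gamma=0.99, done=False):
        # Critic view (Q-learning/DQN): bootstrap a value target
        # y = r + gamma * max_a' Q(s', a'), with no bootstrapping at terminal states.
        return reward + (0.0 if done else gamma * float(np.max(next_q_values)))

    def reinforce_loss(log_probs, returns):
        # Actor view (REINFORCE): maximize E[G_t * log pi(a_t|s_t)] over a sampled
        # episode, i.e. minimize the return-weighted negative log-likelihood
        # of the actions actually taken.
        return -float(np.sum(np.asarray(log_probs) * np.asarray(returns)))

    # Toy usage: one transition for the critic, a three-step episode for the actor.
    print(q_learning_target(reward=1.0, next_q_values=np.array([0.2, 0.7])))   # 1.693
    print(reinforce_loss(log_probs=[-0.1, -0.5, -0.3], returns=[3.0, 2.0, 1.0]))  # 1.6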

Author information

Corresponding author: Fabian Scharf.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Scharf, F., Helfenstein, F., Jäger, J. (2021). Actor vs Critic: Learning the Policy or Learning the Value. In: Belousov, B., Abdulsamad, H., Klink, P., Parisi, S., Peters, J. (eds) Reinforcement Learning Algorithms: Analysis and Applications. Studies in Computational Intelligence, vol 883. Springer, Cham. https://doi.org/10.1007/978-3-030-41188-6_11
