
Deep Reinforcement Learning

Chapter in Neural Networks and Deep Learning

Abstract

Human beings do not learn from a concrete notion of training data. Learning in humans is a continuous, experience-driven process in which decisions are made, and the reward or punishment received from the environment is used to guide the learning process for future decisions. In other words, learning in intelligent beings occurs by reward-guided trial and error.
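The reward-guided trial and error described above is, in essence, what a reinforcement learning agent does: try an action, observe the reward or punishment returned by the environment, and use that feedback to bias future decisions. Below is a minimal sketch of this loop on a toy multi-armed bandit; the reward probabilities, the epsilon value, and all names are illustrative assumptions, not material from the chapter.

import random

# A toy 3-armed bandit with hypothetical win probabilities (hidden from the learner).
TRUE_REWARD_PROB = [0.2, 0.5, 0.8]
NUM_ARMS = len(TRUE_REWARD_PROB)

value_estimate = [0.0] * NUM_ARMS   # learned estimate of each action's reward
pull_count = [0] * NUM_ARMS
epsilon = 0.1                       # probability of exploring a random action

for step in range(10_000):
    # Trial: mostly exploit the best-looking action, sometimes explore.
    if random.random() < epsilon:
        action = random.randrange(NUM_ARMS)
    else:
        action = max(range(NUM_ARMS), key=lambda a: value_estimate[a])

    # Feedback: the environment returns a reward (1) or punishment (0).
    reward = 1.0 if random.random() < TRUE_REWARD_PROB[action] else 0.0

    # Use the feedback to guide future decisions (incremental average update).
    pull_count[action] += 1
    value_estimate[action] += (reward - value_estimate[action]) / pull_count[action]

print("Learned value estimates:", [round(v, 2) for v in value_estimate])

After enough trials the estimates approach the hidden probabilities, so the agent comes to prefer the most rewarding action without ever being given labeled training data.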




Copyright information

© 2023 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Aggarwal, C. (2023). Deep Reinforcement Learning. In: Neural Networks and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-031-29642-0_11

