Abstract
Human beings do not learn from a fixed, concrete notion of training data. Learning in humans is a continuous, experience-driven process in which decisions are made, and the reward or punishment received from the environment is used to guide the learning of future decisions. In other words, intelligent beings learn by reward-guided trial and error.
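The reward-guided trial-and-error loop described above can be illustrated with a minimal sketch (not taken from the chapter): an epsilon-greedy agent on a hypothetical two-armed bandit, which tries actions, observes noisy rewards, and gradually prefers the action with the higher estimated reward. The function name and parameters are illustrative assumptions.

```python
import random

def run_bandit(true_means, episodes=5000, epsilon=0.1, seed=0):
    """Reward-guided trial and error on a simple bandit problem:
    no labeled training data, only rewards observed after each decision."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n   # running estimate of each action's expected reward
    counts = [0] * n        # how often each action has been tried
    for _ in range(episodes):
        # Explore a random action with probability epsilon; otherwise
        # exploit the action whose estimated reward is currently highest.
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: estimates[a])
        # The environment returns a noisy reward for the chosen action.
        reward = true_means[action] + rng.gauss(0.0, 1.0)
        # Incremental average update: learning is driven purely by reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

est = run_bandit([0.2, 0.8])
```

After enough episodes, the agent's estimates approach the true expected rewards, so it mostly chooses the better arm; the same explore-exploit structure underlies the deep reinforcement learning methods surveyed in the chapter.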
Copyright information
© 2023 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Aggarwal, C. (2023). Deep Reinforcement Learning. In: Neural Networks and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-031-29642-0_11
Print ISBN: 978-3-031-29641-3
Online ISBN: 978-3-031-29642-0