From Reinforcement Learning to Deep Reinforcement Learning: An Overview

Chapter in: Braverman Readings in Machine Learning. Key Ideas from Inception to Current State

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11100)

Abstract

This article provides a brief overview of reinforcement learning, from its origins to current research trends, including deep reinforcement learning, with an emphasis on first principles.

G. Hocquet—Work performed while visiting the University of California, Irvine.

Acknowledgment

This research was supported in part by National Science Foundation grant IIS-1550705 and a Google Faculty Research Award to PB.

Author information

Corresponding author

Correspondence to Pierre Baldi.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Agostinelli, F., Hocquet, G., Singh, S., Baldi, P. (2018). From Reinforcement Learning to Deep Reinforcement Learning: An Overview. In: Rozonoer, L., Mirkin, B., Muchnik, I. (eds.) Braverman Readings in Machine Learning. Key Ideas from Inception to Current State. Lecture Notes in Computer Science (LNAI), vol. 11100. Springer, Cham. https://doi.org/10.1007/978-3-319-99492-5_13

  • DOI: https://doi.org/10.1007/978-3-319-99492-5_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99491-8

  • Online ISBN: 978-3-319-99492-5
