Abstract
This article provides a brief overview of reinforcement learning, from its origins to current research trends, including deep reinforcement learning, with an emphasis on first principles.
G. Hocquet—Work performed while visiting the University of California, Irvine.
Acknowledgment
This research was in part supported by National Science Foundation grant IIS-1550705 and a Google Faculty Research Award to PB.
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this chapter
Agostinelli, F., Hocquet, G., Singh, S., Baldi, P. (2018). From Reinforcement Learning to Deep Reinforcement Learning: An Overview. In: Rozonoer, L., Mirkin, B., Muchnik, I. (eds) Braverman Readings in Machine Learning. Key Ideas from Inception to Current State. Lecture Notes in Computer Science(), vol 11100. Springer, Cham. https://doi.org/10.1007/978-3-319-99492-5_13
Print ISBN: 978-3-319-99491-8
Online ISBN: 978-3-319-99492-5