
Reinforcement Learning


Abstract

One of the primary goals of AI is to produce fully autonomous agents that learn optimal behaviors through trial and error by interacting with their environments. The reinforcement learning paradigm is essentially learning through interaction; it has its roots in behaviorist psychology. Reinforcement learning is also influenced by optimal control, which is underpinned by the mathematical formalism of dynamic programming. This chapter deals with reinforcement learning.
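As a loose illustration of the trial-and-error interaction the abstract describes, below is a minimal sketch of tabular Q-learning on a hypothetical five-state chain. It is an assumption for exposition, not code or an example from this chapter; all names and hyperparameters (step, n_states, alpha, gamma, epsilon) are invented for the sketch.

```python
# Minimal illustrative sketch (an assumption, not code from this chapter):
# a tabular Q-learning agent learning by trial-and-error interaction
# with a hypothetical five-state chain environment.
import numpy as np

n_states, n_actions = 5, 2               # states 0..4; actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.95, 0.1   # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

def step(state, action):
    """One environment transition: reward 1 only on reaching the right end."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

Q = np.zeros((n_states, n_actions))      # action-value table learned from interaction

for episode in range(500):
    state, done = 0, False
    for t in range(100):                 # cap episode length
        if done:
            break
        # epsilon-greedy exploration: the trial-and-error part
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # temporal-difference update toward the Bellman (dynamic programming) target
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print(np.argmax(Q, axis=1))              # greedy policy; expect action 1 (right) in nonterminal states
```

The update nudges Q[state, action] toward the Bellman target reward + gamma * max_a Q[next_state, a], which is where the dynamic-programming formalism mentioned in the abstract enters the learning rule.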



Author information

Correspondence to Ke-Lin Du.


Copyright information

© 2019 Springer-Verlag London Ltd., part of Springer Nature

About this chapter


Cite this chapter

Du, K.-L., & Swamy, M. N. S. (2019). Reinforcement Learning. In: Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-7452-3_17
