Model-Based Reinforcement Learning

Reference work entry, Encyclopedia of Machine Learning

Synonyms

Indirect reinforcement learning

Definition

Model-based Reinforcement Learning refers to learning optimal behavior indirectly: the agent takes actions, observes the outcomes, namely the next state and the immediate reward, and from these observations learns a model of the environment. The learned models predict the outcomes of actions and are used in lieu of, or in addition to, interaction with the environment to learn optimal policies.
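
To make this concrete, the following is a minimal sketch in the spirit of the Dyna architecture (Sutton, 1990): each real transition both updates a tabular action-value function directly and is recorded in a learned one-step model, which is then sampled to generate extra simulated updates ("planning"). The environment interface env_step, the deterministic tabular model, and all parameter values are illustrative assumptions, not part of the original entry.

```python
import random
from collections import defaultdict

def dyna_q(env_step, actions, start_state, episodes=100,
           planning_steps=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q sketch: direct RL from real experience plus
    planning updates drawn from a learned (here: deterministic) model.

    env_step(s, a) is a hypothetical interface returning (r, s2, done).
    """
    Q = defaultdict(float)   # Q[(state, action)] -> estimated value
    model = {}               # model[(s, a)] -> (reward, next_state, done)
    seen = []                # (state, action) pairs observed so far

    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])

            r, s2, done = env_step(s, a)  # one real interaction

            # Direct reinforcement-learning update (Q-learning).
            target = r if done else r + gamma * max(Q[(s2, x)] for x in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # Model learning: remember the last observed outcome.
            if (s, a) not in model:
                seen.append((s, a))
            model[(s, a)] = (r, s2, done)

            # Planning: replay simulated transitions from the model.
            for _ in range(planning_steps):
                ps, pa = random.choice(seen)
                pr, ps2, pdone = model[(ps, pa)]
                ptarget = pr if pdone else pr + gamma * max(Q[(ps2, x)] for x in actions)
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])

            s = s2
    return Q
```

With planning_steps set to 0 this reduces to ordinary model-free Q-learning; raising it trades extra computation per real step for fewer interactions with the environment.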

Motivation and Background

Reinforcement Learning (RL) refers to learning to behave optimally in a stochastic environment by taking actions and receiving rewards (Sutton & Barto, 1998). The environment is assumed to be Markovian: there is a fixed probability distribution over the next state given only the current state and the agent’s action. The agent also receives an immediate reward that depends on the current state and the action. Models of the next-state distribution and of the immediate rewards are referred to as “action models” and, in general, are not known to the learner. The agent’s goal is to learn a policy, a mapping from states to actions, that maximizes its expected long-term reward.
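
Although the entry is truncated at this point, the setup it describes can be illustrated with a small, hypothetical sketch: estimate the action models from logged transitions by counting (the certainty-equivalence approach), then plan in the estimated model with value iteration. The array shapes and function names below are assumptions made for this illustration.

```python
import numpy as np

def estimate_model(transitions, n_states, n_actions):
    """Empirical action models from logged (s, a, r, s2) tuples:
    P[s, a, s2] ~ Pr(next state | state, action), R[s, a] ~ E[reward]."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s2 in transitions:
        counts[s, a, s2] += 1
        reward_sum[s, a] += r
    visits = np.maximum(counts.sum(axis=2), 1)  # avoid division by zero
    return counts / visits[:, :, None], reward_sum / visits

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Plan in the learned model by iterating the Bellman optimality
    backup V(s) <- max_a [ R(s, a) + gamma * sum_s2 P(s2|s, a) V(s2) ]."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)            # Q[s, a]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values, greedy policy
        V = V_new
```

This certainty-equivalence scheme simply trusts the estimated model; methods such as E^3 (Kearns & Singh, 2002) and R-MAX (Brafman & Tennenholtz, 2002) refine it so that exploration is provably efficient.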

Recommended Reading

  • Abbeel, P., Coates, A., Quigley, M., & Ng, A. Y. (2007). An application of reinforcement learning to aerobatic helicopter flight. In Advances in neural information processing systems (Vol. 19, pp. 1–8). Cambridge, MA: MIT Press.

  • Abbeel, P., Quigley, M., & Ng, A. Y. (2006). Using inaccurate models in reinforcement learning. In Proceedings of the 23rd international conference on machine learning (pp. 1–8). ACM Press, New York, USA.

  • Atkeson, C. G., & Santamaria, J. C. (1997). A comparison of direct and model-based reinforcement learning. In Proceedings of the international conference on robotics and automation (pp. 20–25). IEEE Press.

  • Atkeson, C. G., & Schaal, S. (1997). Robot learning from demonstration. In Proceedings of the fourteenth international conference on machine learning (Vol. 4, pp. 12–20). San Francisco: Morgan Kaufmann.

  • Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1), 81–138.

  • Baxter, J., Tridgell, A., & Weaver, L. (1998). TDLeaf(λ): Combining temporal difference learning with game-tree search. In Proceedings of the ninth Australian conference on neural networks (ACNN’98) (pp. 168–172).

  • Brafman, R. I., & Tennenholtz, M. (2002). R-MAX – a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 2, 213–231.

  • Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.

  • Kearns, M., & Singh, S. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2/3), 209–232.

  • Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 103–130.

  • Peng, J., & Williams, R. J. (1993). Efficient learning and planning within the dyna framework. Adaptive Behavior, 1(4), 437–454.

  • Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.

  • Schaal, S., & Atkeson, C. G. (1994). Robot juggling: Implementation of memory-based learning. IEEE Control Systems Magazine, 14(1), 57–71.

  • Singh, S., Kearns, M., Litman, D., & Walker, M. (1999). Reinforcement learning for spoken dialogue systems. In Advances in neural information processing systems (Vol. 11, pp. 956–962). Cambridge, MA: MIT Press.

  • Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the seventh international conference on machine learning (pp. 216–224). San Francisco: Morgan Kaufmann.

  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

  • Tadepalli, P., & Ok, D. (1998). Model-based average-reward reinforcement learning. Artificial Intelligence, 100, 177–224.

  • Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58–68.

  • Wang, X., & Dietterich, T. G. (2003). Model-based policy gradient reinforcement learning. In Proceedings of the 20th international conference on machine learning (pp. 776–783). AAAI Press.

  • Wilson, A., Fern, A., Ray, S., & Tadepalli, P. (2007). Multi-task reinforcement learning: A hierarchical Bayesian approach. In Proceedings of the 24th international conference on machine learning (pp. 1015–1022). Madison, WI: Omnipress.

  • Zhang, W., & Dietterich, T. G. (1995). A reinforcement learning approach to job-shop scheduling. In Proceedings of the international joint conference on artificial intelligence (pp. 1114–1120). San Francisco: Morgan Kaufmann.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Ray, S., Tadepalli, P. (2011). Model-Based Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_556
