Model-Based Reinforcement Learning

Reference work entry, Encyclopedia of Machine Learning

Synonyms

Indirect reinforcement learning

Definition

Model-based Reinforcement Learning refers to learning optimal behavior indirectly: the agent takes actions, observes the outcomes, namely the next state and the immediate reward, and from these observations learns a model of the environment. The learned models predict the outcomes of actions and are used in lieu of, or in addition to, interaction with the environment to learn optimal policies.
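
To make this concrete, the following is a minimal sketch in the spirit of the Dyna architecture (Sutton, 1990): each real transition both updates a tabular action-value function directly and is recorded in a learned one-step model, which is then sampled to generate extra simulated updates ("planning"). The environment interface env_step, the deterministic tabular model, and all parameter values are illustrative assumptions, not part of the original entry.

```python
import random
from collections import defaultdict

def dyna_q(env_step, actions, start_state, episodes=100,
           planning_steps=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q sketch: direct RL from real experience plus
    planning updates drawn from a learned (here: deterministic) model.

    env_step(s, a) is a hypothetical interface returning (r, s2, done).
    """
    Q = defaultdict(float)   # Q[(state, action)] -> estimated value
    model = {}               # model[(s, a)] -> (reward, next_state, done)
    seen = []                # (state, action) pairs observed so far

    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])

            r, s2, done = env_step(s, a)  # one real interaction

            # Direct reinforcement-learning update (Q-learning).
            target = r if done else r + gamma * max(Q[(s2, x)] for x in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # Model learning: remember the last observed outcome.
            if (s, a) not in model:
                seen.append((s, a))
            model[(s, a)] = (r, s2, done)

            # Planning: replay simulated transitions from the model.
            for _ in range(planning_steps):
                ps, pa = random.choice(seen)
                pr, ps2, pdone = model[(ps, pa)]
                ptarget = pr if pdone else pr + gamma * max(Q[(ps2, x)] for x in actions)
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])

            s = s2
    return Q
```

With planning_steps set to 0 this reduces to ordinary model-free Q-learning; raising it trades extra computation per real step for fewer interactions with the environment.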

Motivation and Background

Reinforcement Learning (RL) refers to learning to behave optimally in a stochastic environment by taking actions and receiving rewards (Sutton & Barto, 1998). The environment is assumed to be Markovian: there is a fixed probability distribution over the next state given only the current state and the agent’s action. The agent also receives an immediate reward that depends on the current state and the action. Models of the next-state distribution and of the immediate rewards are referred to as “action models” and, in general, are not known to the learner. The agent’s goal is to learn a policy, a mapping from states to actions, that maximizes its expected long-term reward.
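
Although the entry is truncated at this point, the setup it describes can be illustrated with a small, hypothetical sketch: estimate the action models from logged transitions by counting (the certainty-equivalence approach), then plan in the estimated model with value iteration. The array shapes and function names below are assumptions made for this illustration.

```python
import numpy as np

def estimate_model(transitions, n_states, n_actions):
    """Empirical action models from logged (s, a, r, s2) tuples:
    P[s, a, s2] ~ Pr(next state | state, action), R[s, a] ~ E[reward]."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s2 in transitions:
        counts[s, a, s2] += 1
        reward_sum[s, a] += r
    visits = np.maximum(counts.sum(axis=2), 1)  # avoid division by zero
    return counts / visits[:, :, None], reward_sum / visits

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Plan in the learned model by iterating the Bellman optimality
    backup V(s) <- max_a [ R(s, a) + gamma * sum_s2 P(s2|s, a) V(s2) ]."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)            # Q[s, a]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values, greedy policy
        V = V_new
```

This certainty-equivalence scheme simply trusts the estimated model; methods such as E^3 (Kearns & Singh, 2002) and R-MAX (Brafman & Tennenholtz, 2002) refine it so that exploration is provably efficient.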

Recommended Reading

  • Abbeel, P., Coates, A., Quigley, M., & Ng, A. Y. (2007). An application of reinforcement learning to aerobatic helicopter flight. In Advances in neural information processing systems (Vol. 19, pp. 1–8). Cambridge, MA: MIT Press.

  • Abbeel, P., Quigley, M., & Ng, A. Y. (2006). Using inaccurate models in reinforcement learning. In Proceedings of the 23rd international conference on machine learning (pp. 1–8). ACM Press, New York, USA.

  • Atkeson, C. G., & Santamaria, J. C. (1997). A comparison of direct and model-based reinforcement learning. In Proceedings of the international conference on robotics and automation (pp. 20–25). IEEE Press.

  • Atkeson, C. G., & Schaal, S. (1997). Robot learning from demonstration. In Proceedings of the fourteenth international conference on machine learning (Vol. 4, pp. 12–20). San Francisco: Morgan Kaufmann.

  • Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1), 81–138.

  • Baxter, J., Tridgell, A., & Weaver, L. (1998). TDLeaf(λ): Combining temporal difference learning with game-tree search. In Proceedings of the ninth Australian conference on neural networks (ACNN’98) (pp. 168–172).

  • Brafman, R. I., & Tennenholtz, M. (2002). R-MAX – a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 2, 213–231.

  • Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.

  • Kearns, M., & Singh, S. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2/3), 209–232.

  • Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 103–130.

  • Peng, J., & Williams, R. J. (1993). Efficient learning and planning within the dyna framework. Adaptive Behavior, 1(4), 437–454.

  • Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.

  • Schaal, S., & Atkeson, C. G. (1994). Robot juggling: Implementation of memory-based learning. IEEE Control Systems Magazine, 14(1), 57–71.

  • Singh, S., Kearns, M., Litman, D., & Walker, M. (1999). Reinforcement learning for spoken dialogue systems. In Advances in neural information processing systems (Vol. 11, pp. 956–962). Cambridge, MA: MIT Press.

  • Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the seventh international conference on machine learning (pp. 216–224). San Francisco: Morgan Kaufmann.

  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

  • Tadepalli, P., & Ok, D. (1998). Model-based average-reward reinforcement learning. Artificial Intelligence, 100, 177–224.

  • Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58–68.

  • Wang, X., & Dietterich, T. G. (2003). Model-based policy gradient reinforcement learning. In Proceedings of the 20th international conference on machine learning (pp. 776–783). AAAI Press.

  • Wilson, A., Fern, A., Ray, S., & Tadepalli, P. (2007). Multi-task reinforcement learning: A hierarchical Bayesian approach. In Proceedings of the 24th international conference on machine learning (pp. 1015–1022). Madison, WI: Omnipress.

  • Zhang, W., & Dietterich, T. G. (1995). A reinforcement learning approach to job-shop scheduling. In Proceedings of the international joint conference on artificial intelligence (pp. 1114–1120). San Francisco: Morgan Kaufmann.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Ray, S., Tadepalli, P. (2011). Model-Based Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_556
