Advice-Based Exploration in Model-Based Reinforcement Learning

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2018)


Convergence to an optimal policy using model-based reinforcement learning can require significant exploration of the environment. In some settings, such as when simulators are unavailable or the state space is prohibitively large, such exploration is costly or even impossible. In this paper we examine the use of advice to guide the search for an optimal policy. To this end, we propose a rich language for providing advice to a reinforcement learning agent. Unlike constraints, which can eliminate optimal policies, advice guides exploration while preserving the guarantee of convergence to an optimal policy. Experimental results on deterministic grid worlds demonstrate the potential for good advice to reduce the amount of exploration required to learn a satisficing or optimal policy, while maintaining robustness in the face of incomplete or misleading advice.





This research was supported by NSERC and CONICYT. A preliminary non-archival version of this work was presented at RLDM (2017).

Author information


Corresponding author

Correspondence to Rodrigo Toro Icarte.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Toro Icarte, R., Klassen, T.Q., Valenzano, R.A., McIlraith, S.A. (2018). Advice-Based Exploration in Model-Based Reinforcement Learning. In: Bagheri, E., Cheung, J. (eds) Advances in Artificial Intelligence. Canadian AI 2018. Lecture Notes in Computer Science, vol 10832. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-89655-7

  • Online ISBN: 978-3-319-89656-4

  • eBook Packages: Computer Science, Computer Science (R0)
