Abstract
Convergence to an optimal policy using model-based reinforcement learning can require significant exploration of the environment. In some settings such exploration is costly or even infeasible, for example when no simulator is available or when the state space is prohibitively large. In this paper we examine the use of advice to guide the search for an optimal policy. To this end we propose a rich language for providing advice to a reinforcement learning agent. Unlike constraints, which can eliminate optimal policies, advice offers guidance for exploration while preserving the guarantee of convergence to an optimal policy. Experimental results on deterministic grid worlds demonstrate the potential for good advice to reduce the amount of exploration required to learn a satisficing or optimal policy, while maintaining robustness in the face of incomplete or misleading advice.
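The core idea of the abstract, advice that biases which actions are tried first without ever ruling any action out, can be illustrated with a minimal sketch. This is not the paper's implementation or its advice language: the grid world, the `advised` heuristic, and the tabular Q-learner below are all hypothetical stand-ins chosen to show why retaining unbiased random exploration preserves convergence while advice speeds up the search.

```python
import random

# Illustrative sketch (not the paper's method): a 5x5 deterministic grid
# world where hand-written "advice" biases which actions a tabular
# Q-learner tries first. Epsilon-greedy exploration is retained, so every
# action is still tried eventually and convergence is not sacrificed.
SIZE, START, GOAL = 5, (0, 0), (4, 4)
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    """Deterministic transition; reward 1 only on reaching the goal."""
    dx, dy = ACTIONS[action]
    nxt = (min(max(state[0] + dx, 0), SIZE - 1),
           min(max(state[1] + dy, 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def advised(state, action):
    """Hypothetical advice: prefer actions that move toward the goal."""
    dx, dy = ACTIONS[action]
    return (dx > 0 and state[0] < GOAL[0]) or (dy > 0 and state[1] < GOAL[1])

def train(episodes=300, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng, Q = random.Random(seed), {}
    for _ in range(episodes):
        s, done, t = START, False, 0
        while not done and t < 100:
            t += 1
            if rng.random() < eps:
                # Unbiased exploration survives, so bad advice cannot
                # permanently hide the optimal policy.
                a = rng.choice(list(ACTIONS))
            else:
                # Advice only narrows the greedy candidate set.
                cand = [a for a in ACTIONS if advised(s, a)] or list(ACTIONS)
                a = max(cand, key=lambda a: Q.get((s, a), 0.0))
            s2, r, done = step(s, a)
            best = max(Q.get((s2, b), 0.0) for b in ACTIONS)
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (r + gamma * best - old)
            s = s2
    return Q

def greedy_rollout(Q, limit=50):
    """Follow the learned greedy policy; return (reached_goal, steps)."""
    s, t = START, 0
    while s != GOAL and t < limit:
        a = max(ACTIONS, key=lambda a: Q.get((s, a), 0.0))
        s, _, _ = step(s, a)
        t += 1
    return s == GOAL, t
```

With good advice the learner concentrates its early episodes along promising paths; with misleading advice the retained random exploration still guarantees that the correct Q-values are eventually learned, only more slowly.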
Acknowledgement
This research was supported by NSERC and CONICYT. A preliminary non-archival version of this work was presented at RLDM (2017).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Toro Icarte, R., Klassen, T.Q., Valenzano, R.A., McIlraith, S.A. (2018). Advice-Based Exploration in Model-Based Reinforcement Learning. In: Bagheri, E., Cheung, J. (eds) Advances in Artificial Intelligence. Canadian AI 2018. Lecture Notes in Computer Science, vol 10832. Springer, Cham. https://doi.org/10.1007/978-3-319-89656-4_6
Print ISBN: 978-3-319-89655-7
Online ISBN: 978-3-319-89656-4
eBook Packages: Computer Science (R0)