Abstract
Convergence to an optimal policy using model-based reinforcement learning can require significant exploration of the environment. In some settings such exploration is costly or even infeasible, for example when no simulator is available or when the state space is prohibitively large. In this paper we examine the use of advice to guide the search for an optimal policy. To this end we propose a rich language for providing advice to a reinforcement learning agent. Unlike constraints, which can eliminate optimal policies, advice offers guidance for exploration while preserving the guarantee of convergence to an optimal policy. Experimental results on deterministic grid worlds demonstrate the potential for good advice to reduce the amount of exploration required to learn a satisficing or optimal policy, while maintaining robustness in the face of incomplete or misleading advice.
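The core idea of the abstract, advice that biases which actions are tried first without ever ruling any action out, can be illustrated with a minimal sketch. This is not the paper's implementation or its advice language: the grid world, the `advised` heuristic, and the tabular Q-learner below are all hypothetical stand-ins chosen to show why retaining unbiased random exploration preserves convergence while advice speeds up the search.

```python
import random

# Illustrative sketch (not the paper's method): a 5x5 deterministic grid
# world where hand-written "advice" biases which actions a tabular
# Q-learner tries first. Epsilon-greedy exploration is retained, so every
# action is still tried eventually and convergence is not sacrificed.
SIZE, START, GOAL = 5, (0, 0), (4, 4)
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    """Deterministic transition; reward 1 only on reaching the goal."""
    dx, dy = ACTIONS[action]
    nxt = (min(max(state[0] + dx, 0), SIZE - 1),
           min(max(state[1] + dy, 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def advised(state, action):
    """Hypothetical advice: prefer actions that move toward the goal."""
    dx, dy = ACTIONS[action]
    return (dx > 0 and state[0] < GOAL[0]) or (dy > 0 and state[1] < GOAL[1])

def train(episodes=300, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng, Q = random.Random(seed), {}
    for _ in range(episodes):
        s, done, t = START, False, 0
        while not done and t < 100:
            t += 1
            if rng.random() < eps:
                # Unbiased exploration survives, so bad advice cannot
                # permanently hide the optimal policy.
                a = rng.choice(list(ACTIONS))
            else:
                # Advice only narrows the greedy candidate set.
                cand = [a for a in ACTIONS if advised(s, a)] or list(ACTIONS)
                a = max(cand, key=lambda a: Q.get((s, a), 0.0))
            s2, r, done = step(s, a)
            best = max(Q.get((s2, b), 0.0) for b in ACTIONS)
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (r + gamma * best - old)
            s = s2
    return Q

def greedy_rollout(Q, limit=50):
    """Follow the learned greedy policy; return (reached_goal, steps)."""
    s, t = START, 0
    while s != GOAL and t < limit:
        a = max(ACTIONS, key=lambda a: Q.get((s, a), 0.0))
        s, _, _ = step(s, a)
        t += 1
    return s == GOAL, t
```

With good advice the learner concentrates its early episodes along promising paths; with misleading advice the retained random exploration still guarantees that the correct Q-values are eventually learned, only more slowly.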
Acknowledgement
This research was supported by NSERC and CONICYT. A preliminary non-archival version of this work was presented at RLDM (2017).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Toro Icarte, R., Klassen, T.Q., Valenzano, R.A., McIlraith, S.A. (2018). Advice-Based Exploration in Model-Based Reinforcement Learning. In: Bagheri, E., Cheung, J. (eds) Advances in Artificial Intelligence. Canadian AI 2018. Lecture Notes in Computer Science, vol 10832. Springer, Cham. https://doi.org/10.1007/978-3-319-89656-4_6
Print ISBN: 978-3-319-89655-7
Online ISBN: 978-3-319-89656-4
eBook Packages: Computer Science (R0)