Abstract
In this paper we improve the learning performance of a risk-aware robot facing navigation tasks by employing transfer learning; that is, we use information from a previously solved task to accelerate learning in a new task. To do so, we transfer risk-aware memoryless stochastic abstract policies into the new task. We show how to incorporate risk awareness into robotic navigation tasks, in particular when tasks are modeled as stochastic shortest path problems. We then show how to use a modified policy iteration algorithm, called AbsProb-PI, to obtain risk-neutral and risk-prone memoryless stochastic abstract policies. Finally, we propose a method that combines abstract policies and show how to use the combined policy in a new navigation task. Experiments validate our proposals and show that one can find effective abstract policies that improve robot behavior in navigation problems.
This research was partly sponsored by FAPESP – Fundação de Amparo à Pesquisa do Estado de São Paulo (Procs. 11/19280-8, 12/02190-9, and 12/19627-0) and CNPq – Conselho Nacional de Desenvolvimento Científico e Tecnológico (Procs. 311058/2011-6 and 305395/2010-6).
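To make the abstract's pipeline concrete, the sketch below illustrates its three ingredients: risk-sensitive evaluation of a stochastic shortest path (SSP) problem, extraction of a memoryless stochastic policy, and combination of two such policies. It is a hypothetical, self-contained toy, not the paper's AbsProb-PI algorithm or its abstract-state machinery: the one-dimensional corridor SSP is made up, the exponential-utility risk criterion is the standard Howard-Matheson formulation, value iteration stands in for the paper's modified policy iteration, a softmax produces the stochastic policy, and a convex mixture is just one simple way to combine policies.

```python
import math

# Illustrative corridor SSP (assumed model, not from the paper):
# states 0..N-1, absorbing zero-cost goal at N-1, unit cost per step,
# intended move succeeds with prob. 1-SLIP, otherwise the agent slips back.
N, SLIP, COST = 8, 0.2, 1.0
GOAL = N - 1
ACTIONS = (-1, +1)  # move left / move right

def transition(s, a):
    """Return [(next_state, prob), ...]: intended move with prob. 1-SLIP,
    opposite move with prob. SLIP, clipped to the corridor."""
    clip = lambda x: min(max(x, 0), N - 1)
    return [(clip(s + a), 1.0 - SLIP), (clip(s - a), SLIP)]

def q_value(V, s, a, lam):
    """Risk-sensitive Q-value under an exponential-utility criterion
    (Howard-Matheson style): lam > 0 is risk-averse, lam < 0 risk-prone,
    and lam -> 0 recovers the risk-neutral expected cost."""
    exp_term = sum(p * math.exp(lam * V[t]) for t, p in transition(s, a))
    return COST + math.log(exp_term) / lam

def solve(lam, sweeps=500):
    """Value iteration on the risk-sensitive Bellman equation
    (stands in for the paper's modified policy iteration)."""
    V = [0.0] * N
    for _ in range(sweeps):
        for s in range(N):
            if s != GOAL:
                V[s] = min(q_value(V, s, a, lam) for a in ACTIONS)
    return V

def stochastic_policy(V, lam, beta=3.0):
    """Memoryless stochastic policy: softmax over negated Q-values,
    so lower-cost actions get higher probability."""
    pi = []
    for s in range(N):
        w = [math.exp(-beta * q_value(V, s, a, lam)) for a in ACTIONS]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi

def mix(pi1, pi2, w=0.5):
    """Combine two stochastic policies by a convex mixture; one simple
    way to blend risk attitudes before reuse in a new task."""
    return [[w * p + (1 - w) * q for p, q in zip(r1, r2)]
            for r1, r2 in zip(pi1, pi2)]

lam_neutral, lam_prone = 1e-6, -0.5   # near-zero lam approximates risk-neutral
pi_neutral = stochastic_policy(solve(lam_neutral), lam_neutral)
pi_prone = stochastic_policy(solve(lam_prone), lam_prone)
combined = mix(pi_neutral, pi_prone, w=0.5)
print("pi(right | s=0):",
      [round(p[1], 3) for p in (pi_neutral[0], pi_prone[0], combined[0])])
```

The mixing weight w interpolates between the two risk attitudes; the paper's actual combination method, its abstract state representation, and the transfer of the combined policy into a new navigation task are not reproduced here.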