Abstract
In this paper we improve the learning performance of a risk-aware robot facing navigation tasks by employing transfer learning; that is, we use information from a previously solved task to accelerate learning in a new task. To do so, we transfer risk-aware memoryless stochastic abstract policies into the new task. We show how to incorporate risk awareness into robotic navigation tasks, in particular when tasks are modeled as stochastic shortest path problems. We then show how to use a modified policy iteration algorithm, called AbsProb-PI, to obtain risk-neutral and risk-prone memoryless stochastic abstract policies. Finally, we propose a method that combines abstract policies and show how to use the combined policy in a new navigation task. Experiments validate our proposals and show that one can find effective abstract policies that improve robot behavior in navigation problems.
This research was partly sponsored by FAPESP – Fundação de Amparo à Pesquisa do Estado de São Paulo (Procs. 11/19280-8, 12/02190-9, and 12/19627-0) and CNPq – Conselho Nacional de Desenvolvimento Científico e Tecnológico (Procs. 311058/2011-6 and 305395/2010-6).
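To make the abstract's pipeline concrete, the sketch below illustrates its three ingredients: risk-sensitive evaluation of a stochastic shortest path (SSP) problem, extraction of a memoryless stochastic policy, and combination of two such policies. It is a hypothetical, self-contained toy, not the paper's AbsProb-PI algorithm or its abstract-state machinery: the one-dimensional corridor SSP is made up, the exponential-utility risk criterion is the standard Howard-Matheson formulation, value iteration stands in for the paper's modified policy iteration, a softmax produces the stochastic policy, and a convex mixture is just one simple way to combine policies.

```python
import math

# Illustrative corridor SSP (assumed model, not from the paper):
# states 0..N-1, absorbing zero-cost goal at N-1, unit cost per step,
# intended move succeeds with prob. 1-SLIP, otherwise the agent slips back.
N, SLIP, COST = 8, 0.2, 1.0
GOAL = N - 1
ACTIONS = (-1, +1)  # move left / move right

def transition(s, a):
    """Return [(next_state, prob), ...]: intended move with prob. 1-SLIP,
    opposite move with prob. SLIP, clipped to the corridor."""
    clip = lambda x: min(max(x, 0), N - 1)
    return [(clip(s + a), 1.0 - SLIP), (clip(s - a), SLIP)]

def q_value(V, s, a, lam):
    """Risk-sensitive Q-value under an exponential-utility criterion
    (Howard-Matheson style): lam > 0 is risk-averse, lam < 0 risk-prone,
    and lam -> 0 recovers the risk-neutral expected cost."""
    exp_term = sum(p * math.exp(lam * V[t]) for t, p in transition(s, a))
    return COST + math.log(exp_term) / lam

def solve(lam, sweeps=500):
    """Value iteration on the risk-sensitive Bellman equation
    (stands in for the paper's modified policy iteration)."""
    V = [0.0] * N
    for _ in range(sweeps):
        for s in range(N):
            if s != GOAL:
                V[s] = min(q_value(V, s, a, lam) for a in ACTIONS)
    return V

def stochastic_policy(V, lam, beta=3.0):
    """Memoryless stochastic policy: softmax over negated Q-values,
    so lower-cost actions get higher probability."""
    pi = []
    for s in range(N):
        w = [math.exp(-beta * q_value(V, s, a, lam)) for a in ACTIONS]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi

def mix(pi1, pi2, w=0.5):
    """Combine two stochastic policies by a convex mixture; one simple
    way to blend risk attitudes before reuse in a new task."""
    return [[w * p + (1 - w) * q for p, q in zip(r1, r2)]
            for r1, r2 in zip(pi1, pi2)]

lam_neutral, lam_prone = 1e-6, -0.5   # near-zero lam approximates risk-neutral
pi_neutral = stochastic_policy(solve(lam_neutral), lam_neutral)
pi_prone = stochastic_policy(solve(lam_prone), lam_prone)
combined = mix(pi_neutral, pi_prone, w=0.5)
print("pi(right | s=0):",
      [round(p[1], 3) for p in (pi_neutral[0], pi_prone[0], combined[0])])
```

The mixing weight w interpolates between the two risk attitudes; the paper's actual combination method, its abstract state representation, and the transfer of the combined policy into a new navigation task are not reproduced here.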