
Goal-Directed Reinforcement Learning Using Variable Learning Rate

  • Conference paper
Advances in Artificial Intelligence (SBIA 1998)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1515)


Abstract

This paper proposes and implements a reinforcement learning algorithm for an agent that learns to navigate an indoor, initially unknown environment. Through interaction with the environment, the agent learns a trajectory between an initial state and a goal state. Environmental knowledge is encoded in two surfaces: a reward surface and a penalty surface. The former is concerned primarily with planning a route to the goal, whilst the latter deals mainly with reacting to avoid obstacles. Temporal-difference learning is the chosen strategy for constructing both surfaces. The proposed algorithm is tested on different environments and types of obstacles. The simulation results suggest that the agent is able to reach the target from any point in the environment while avoiding local minima. Furthermore, by employing a variable learning rate, the agent improves its initial solution over repeated visits to the same spatial positions.
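The two-surface idea sketched in the abstract can be illustrated with a minimal grid-world simulation. The code below is an illustrative reconstruction, not the authors' implementation: the grid size, the obstacle set, the epsilon-greedy action choice, and the 1/(visit count) learning-rate schedule are all assumptions made here to show the general shape of TD(0) updates on separate reward and penalty surfaces with a visit-dependent learning rate.

```python
import random

# Illustrative sketch, loosely inspired by the paper's description: two value
# surfaces learned by TD(0), combined at action-selection time, with a
# learning rate that decays as a state is revisited. All constants below
# (grid size, goal, obstacles, epsilon, gamma) are assumptions.

SIZE = 8
GOAL = (7, 7)
OBSTACLES = {(3, 3), (3, 4), (4, 3)}
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def run(episodes=300, gamma=0.95, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    reward_surface = {}   # drives planning: propagates value back from the goal
    penalty_surface = {}  # drives reaction: propagates cost out from obstacles
    visits = {}           # per-state visit counts for the variable learning rate

    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):
            if state == GOAL:
                break
            # candidate successor cells inside the grid
            moves = [(state[0] + dx, state[1] + dy) for dx, dy in ACTIONS
                     if 0 <= state[0] + dx < SIZE and 0 <= state[1] + dy < SIZE]
            rng.shuffle(moves)  # random tie-breaking for the greedy choice

            # score combines both surfaces: attraction to goal minus obstacle cost
            def score(s):
                return reward_surface.get(s, 0.0) - penalty_surface.get(s, 0.0)
            nxt = rng.choice(moves) if rng.random() < epsilon else max(moves, key=score)

            # variable learning rate: alpha shrinks with repeated visits,
            # so early estimates are refined rather than overwritten
            visits[state] = visits.get(state, 0) + 1
            alpha = 1.0 / visits[state]

            r = 1.0 if nxt == GOAL else 0.0        # reward signal
            p = 1.0 if nxt in OBSTACLES else 0.0   # penalty signal

            # TD(0) update applied independently to each surface
            v = reward_surface.get(state, 0.0)
            reward_surface[state] = v + alpha * (
                r + gamma * reward_surface.get(nxt, 0.0) - v)
            w = penalty_surface.get(state, 0.0)
            penalty_surface[state] = w + alpha * (
                p + gamma * penalty_surface.get(nxt, 0.0) - w)

            if nxt not in OBSTACLES:  # reactive layer: never enter an obstacle
                state = nxt
    return reward_surface, penalty_surface

reward_surface, penalty_surface = run()
```

After training, the reward surface slopes upward toward the goal while the penalty surface forms bumps around the obstacles; steepest-ascent on their difference yields an obstacle-avoiding path, which is the intuition behind combining a planning surface with a reactive one.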




Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

de S. Braga, A.P., Araújo, A.F.R. (1998). Goal-Directed Reinforcement Learning Using Variable Learning Rate. In: de Oliveira, F.M. (ed.) Advances in Artificial Intelligence. SBIA 1998. Lecture Notes in Computer Science (LNAI), vol. 1515. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10692710_14


  • DOI: https://doi.org/10.1007/10692710_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65190-1

  • Online ISBN: 978-3-540-49523-9

  • eBook Packages: Springer Book Archive
