
Goal-Directed Reinforcement Learning Using Variable Learning Rate

  • Conference paper
Advances in Artificial Intelligence (SBIA 1998)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1515)


Abstract

This paper proposes and implements a reinforcement learning algorithm for an agent that learns to navigate an indoor, initially unknown environment. Through interaction with the environment, the agent learns a trajectory between an initial state and a goal state. Environmental knowledge is encoded in two surfaces: a reward surface and a penalty surface. The former is concerned primarily with planning a route to the goal, whilst the latter deals mainly with reacting to avoid obstacles. Temporal-difference learning is the chosen strategy for constructing both surfaces. The proposed algorithm is tested on different environments and types of obstacles. The simulation results suggest that the agent is able to reach the target from any point in the environment while avoiding local minima. Furthermore, by employing a variable learning rate, the agent improves its initial solution over repeated visits to the same spatial positions.
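The two-surface idea sketched in the abstract can be illustrated with a minimal grid-world simulation. The code below is an illustrative reconstruction, not the authors' implementation: the grid size, the obstacle set, the epsilon-greedy action choice, and the 1/(visit count) learning-rate schedule are all assumptions made here to show the general shape of TD(0) updates on separate reward and penalty surfaces with a visit-dependent learning rate.

```python
import random

# Illustrative sketch, loosely inspired by the paper's description: two value
# surfaces learned by TD(0), combined at action-selection time, with a
# learning rate that decays as a state is revisited. All constants below
# (grid size, goal, obstacles, epsilon, gamma) are assumptions.

SIZE = 8
GOAL = (7, 7)
OBSTACLES = {(3, 3), (3, 4), (4, 3)}
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def run(episodes=300, gamma=0.95, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    reward_surface = {}   # drives planning: propagates value back from the goal
    penalty_surface = {}  # drives reaction: propagates cost out from obstacles
    visits = {}           # per-state visit counts for the variable learning rate

    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):
            if state == GOAL:
                break
            # candidate successor cells inside the grid
            moves = [(state[0] + dx, state[1] + dy) for dx, dy in ACTIONS
                     if 0 <= state[0] + dx < SIZE and 0 <= state[1] + dy < SIZE]
            rng.shuffle(moves)  # random tie-breaking for the greedy choice

            # score combines both surfaces: attraction to goal minus obstacle cost
            def score(s):
                return reward_surface.get(s, 0.0) - penalty_surface.get(s, 0.0)
            nxt = rng.choice(moves) if rng.random() < epsilon else max(moves, key=score)

            # variable learning rate: alpha shrinks with repeated visits,
            # so early estimates are refined rather than overwritten
            visits[state] = visits.get(state, 0) + 1
            alpha = 1.0 / visits[state]

            r = 1.0 if nxt == GOAL else 0.0        # reward signal
            p = 1.0 if nxt in OBSTACLES else 0.0   # penalty signal

            # TD(0) update applied independently to each surface
            v = reward_surface.get(state, 0.0)
            reward_surface[state] = v + alpha * (
                r + gamma * reward_surface.get(nxt, 0.0) - v)
            w = penalty_surface.get(state, 0.0)
            penalty_surface[state] = w + alpha * (
                p + gamma * penalty_surface.get(nxt, 0.0) - w)

            if nxt not in OBSTACLES:  # reactive layer: never enter an obstacle
                state = nxt
    return reward_surface, penalty_surface

reward_surface, penalty_surface = run()
```

After training, the reward surface slopes upward toward the goal while the penalty surface forms bumps around the obstacles; steepest-ascent on their difference yields an obstacle-avoiding path, which is the intuition behind combining a planning surface with a reactive one.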




Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

de S. Braga, A.P., Araújo, A.F.R. (1998). Goal-Directed Reinforcement Learning Using Variable Learning Rate. In: de Oliveira, F.M. (ed.) Advances in Artificial Intelligence. SBIA 1998. Lecture Notes in Computer Science (LNAI), vol. 1515. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10692710_14


  • DOI: https://doi.org/10.1007/10692710_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65190-1

  • Online ISBN: 978-3-540-49523-9

  • eBook Packages: Springer Book Archive
