
A topological reinforcement learning agent for navigation

  • Original Article
  • Published in Neural Computing & Applications

Abstract

This article proposes a reinforcement learning procedure for mobile robot navigation based on a latent-like learning schema. Latent learning refers to learning that occurs in the absence of reinforcement signals and only becomes apparent once reinforcement is introduced; the concept implies that part of a task can be learned before the agent receives any indication of how to perform it. In the proposed topological reinforcement learning agent (TRLA), a topological map is used to carry out the latent learning. Propagating the reinforcement signal through the topological neighborhoods of the map yields an estimate of the value function that, on average, requires fewer trials and fewer updates per trial than six of the main temporal-difference reinforcement learning algorithms: Q-learning, SARSA, Q(λ)-learning, SARSA(λ), Dyna-Q and fast Q(λ)-learning. The RL agents were tested in four environments designed with increasing levels of complexity for the navigation task. The tests suggested that the TRLA chooses shorter trajectories (in number of steps) and/or requires fewer value-function updates per trial than the other six reinforcement learning (RL) algorithms.
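The core mechanism summarised above can be pictured with a minimal sketch: a neighbourhood graph of the environment is learned during reward-free exploration, and once the goal is reached the reinforcement signal is spread outwards through the graph so that every node receives a value the agent can climb. The hand-built graph, the discount factor and the breadth-first propagation rule below are illustrative assumptions for this sketch, not the paper's actual update equations.

```python
# Hedged sketch of the idea in the abstract: a topological map built before any
# reward is seen (latent learning); once the goal is found, the reinforcement
# signal is propagated through the map's neighbourhoods to value every node.
from collections import deque

def propagate_values(adjacency, goal, gamma=0.9):
    """Spread the goal reward outwards through topological neighbourhoods.

    adjacency: dict mapping node -> set of neighbouring nodes (the learned map)
    goal:      node where the reinforcement signal is received
    gamma:     discount applied per topological step (assumed value)
    """
    values = {goal: 1.0}
    frontier = deque([goal])
    while frontier:
        node = frontier.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in values:          # assign each node a value once
                values[neighbour] = gamma * values[node]
                frontier.append(neighbour)
    return values

def greedy_path(adjacency, values, start, goal):
    """Follow the value gradient over the topological map from start to goal."""
    path, node = [start], start
    while node != goal:
        node = max(adjacency[node], key=lambda n: values.get(n, 0.0))
        path.append(node)
    return path

if __name__ == "__main__":
    # Tiny hand-built map standing in for one learned during free exploration.
    adjacency = {
        "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D", "E"},
        "D": {"C"}, "E": {"C", "F"}, "F": {"E"},
    }
    values = propagate_values(adjacency, goal="F")
    print(greedy_path(adjacency, values, start="A", goal="F"))  # ['A', 'B', 'C', 'E', 'F']
```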


Notes

  1. Hierarchical solutions and generalisation methods are other approaches for accelerating RL that are not mentioned in detail in this text.

  2. Neuron and node are interchangeable terms used throughout this text.

  3. The Delaunay triangulation is the dual of the Voronoi diagram [23]. The Voronoi diagram of a set W = {w_1, ..., w_N} of vectors w_i ∈ S consists of N Voronoi cells V_i, where V_i is the set of states s ∈ S that are closer to w_i than to any other w_j, j ≠ i. The Delaunay triangulation is the graph obtained by connecting all pairs w_i, w_j ∈ W whose Voronoi cells share an edge (a minimal construction is sketched after these notes).

  4. In this work the stimulus represents the spatial position of the agent in an environment.

  5. One property of a Delaunay triangulation is that a circumscribed circle can be associated with each triangle, and none of the vectors used as vertices of the triangulation [8] lies inside that circle. If a vector does lie inside some circle, a non-Delaunay edge connects to that point and should be deleted.

  6. A trial is described as a sequence of 4-tuples (s_t, a_t, r_{t+1}, s_{t+1}) generated from an initial random state s_0 until the agent reaches the goal state.

  7. If several neighbouring nodes share the highest value, one of them is chosen at random.

  8. The adopted policy is ε-greedy: the action is selected according to Eq. 11 with probability 1−ε and by an exploration strategy with probability ε (a minimal sketch follows these notes).

  9. The maximum number of possible actions is eight. They are represented in Fig. 1.

  10. That is, exploration strategies other than the semi-uniform probability distribution.

  11. As in [3], in the TRLA the neighbourhood of the node in the topological map nearest to the current state provides all the information the policy needs (see Eq. 11).
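Two of the notes above lend themselves to short illustrations. The first sketch relates to note 3: it builds a topological neighbourhood graph from the Delaunay triangulation of a set of weight vectors. SciPy and the random batch of vectors are conveniences for the example only; the paper constructs its map incrementally while the agent explores.

```python
# Hedged sketch for note 3: neighbourhood graph from a Delaunay triangulation.
# The random weight vectors and the use of scipy are assumptions for the example.
import numpy as np
from scipy.spatial import Delaunay

weights = np.random.rand(20, 2)              # stand-ins for the map's weight vectors w_1..w_N
triangulation = Delaunay(weights)

adjacency = {i: set() for i in range(len(weights))}
for simplex in triangulation.simplices:      # each simplex is a triangle (three vertex indices)
    for a in simplex:
        for b in simplex:
            if a != b:
                adjacency[int(a)].add(int(b))  # vertices sharing a triangle are Delaunay neighbours

print(sorted(adjacency[0]))                  # neighbours of the first weight vector
```

The second sketch relates to notes 7 and 8: an ε-greedy choice over the values of the neighbouring nodes, with ties broken at random. Eq. 11 itself is not reproduced; the neighbour-value dictionary and the uniform exploration step are assumptions made here.

```python
# Hedged sketch for notes 7 and 8: ε-greedy selection among neighbouring nodes.
import random

def epsilon_greedy(neighbour_values, epsilon=0.1, rng=random):
    """neighbour_values maps each neighbouring node to its estimated value."""
    if rng.random() < epsilon:
        # Exploration step; the paper uses a semi-uniform distribution (note 10),
        # plain uniform sampling is assumed here for brevity.
        return rng.choice(list(neighbour_values))
    best = max(neighbour_values.values())
    ties = [node for node, value in neighbour_values.items() if value == best]
    return rng.choice(ties)                  # random tie-breaking (note 7)

print(epsilon_greedy({"A": 0.3, "B": 0.3, "C": 0.7}))   # usually "C"
```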

References

  1. Arbib MA, Erdi P and Szentagothai J (1998) Neural organization—structure, function and dynamics. Bradford MIT Press, Cambridge, MA

  2. Althoefer K, Krekelberg B, Husmeier D and Seneviratne L (2001) Reinforcement learning in a rule-based navigator for robotic manipulators. Neurocomputing 37:51–70

  3. Atkeson CG, Schaal S (1995) Memory-based neural networks for robot learning. Neurocomputing 9(13):243–269

  4. Blodgett C (1929) The effect of the introduction of reward upon the maze performance of rats. Univ CA Pub Psychol 4:113–134

  5. Dean T, Kaelbling LP, Kirman J and Nicholson A (1995) Planning under time constraints in stochastic domains. Art Intellig 76:35–74

  6. Fritzke B (1994) Growing cell structures—a self-organizing network for unsupervised and supervised learning. Neur Netwks 7(9):1441–1460

  7. Gaussier P, Revel A, Joulain C and Zrehen S (1997) Living in a partially structured environment: how to bypass the limitations of classical reinforcement techniques. Robot Auton Sys 20:225–250

  8. George PL (1991) Automatic mesh generation—application to finite element methods. Wiley, New York

  9. Haykin S (1999) Neural networks—a comprehensive foundation. Prentice Hall, New York

  10. Jockusch J, Ritter H (1999) An instantaneous topological mapping model for correlated stimuli. In: Proceedings of the IJCNN’99, Washington, DC, 10-16 July 1999

  11. Jockusch J (2000). Exploration based on neural networks with applications in manipulator control. Dissertation, University of Bielefeld

  12. Johannet A, Sarda I (1999) Goal-directed behaviours by reinforcement learning. Neurocomputing 28:107–125

  13. Lin L-J (1993) Reinforcement learning for robots using neural networks. Dissertation, Carnegie Mellon University

  14. Kaelbling LP, Littman ML and Moore AW (1996) Reinforcement learning: a survey. J Art Intellig Res 4:237–285

  15. Khatib O (1986) Real-time obstacle avoidance for manipulators and mobile robots. Int J Rob Res 5(1): 90–98

  16. Koenig S, Simmons RG (1996) The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms. Mach Learn 22:227–250

  17. Kohonen T (1984) Self-organization and associative memory. Springer, Berlin Heidelberg New York

  18. Kohonen T (2001) Self-organizing maps. Springer, Berlin Heidelberg New York

  19. Kortenkamp D, Bonasso RP and Murphy R (1998) Artificial intelligence and mobile robots. AAAI Press / MIT Press, Cambridge, MA

  20. Leonard JJ, Durrant-Whyte HF (1991) Mobile robot localization by tracking geometric beacons. IEEE Trans Robot Automat 7(3):376–382

  21. Linhares A (1998) State-space search strategies gleaned from animal behavior: a traveling salesman experiment. Biol Cybern 78:167–173

  22. Mahadevan S, Connell J (1992) Automatic programming of behavior-based robots using reinforcement learning. Art Intellig 55:311–365

  23. Martinetz T, Schulten K (1994) Topology representing networks. Neur Netwks 7(3):507–522

  24. Moore AW, Atkeson CG (1993) Prioritized sweeping: reinforcement learning with less data and less time. Mach Learn 13:103–130

  25. Muller RU, Stead M and Pach J (1996) The hippocampus as a cognitive graph. J Gen Physiol 107:663–694

  26. Nolfi S (2002) Power and limits of reactive agents. Neurocomputing 42:119–145

  27. O’Keefe J, Dostrovsky J (1971) The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely moving rat. Brain Res 34:171–175

  28. O’Keefe J, Nadel L (1978) The hippocampus as a cognitive map. Clarendon Press, Oxford, UK

  29. O’Keefe J, Burgess N, Donnett JG, Jeffery KJ and Maguire EA (1998) Place cells, navigational accuracy, and the human hippocampus. Phil Trans Roy Soc Lond Series B-Biol Sci 353(1373):1333–1340

  30. Peng J, Williams RJ (1993) Efficient learning and planning within the Dyna framework. Adapt Behav 1(4):437–454

  31. Peng J, Williams RJ (1996) Incremental multi-step Q-learning. Mach Learn 22:283–290

  32. Rummery GA (1995) Problem solving with reinforcement learning. Dissertation, Cambridge University

  33. Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9-44

  34. Sutton RS (1991) Dyna, an integrated architecture for learning, planning, and reacting. SIGART Bullet 2:160–163

  35. Sutton RS, Barto A (1998) Introduction to reinforcement learning. MIT Press / Bradford Books, Cambridge, MA

  36. Smith JNM (1974) Food searching behavior of two European thrushes—adaptiveness of search patterns. Behaviour 49:1-61

  37. Tchernichovski O, Benjamini Y and Golani I (1998) The dynamics of long-term exploration in rat. Part I—a phase-plane analysis of the relationship between location and velocity. Biol Cybern 78:423–432

  38. Tchernichovski O, Benjamini Y (1998) The dynamics of long-term exploration in rat. Part II—an analytical model of the kinematic structure of rat exploratory behavior. Biol Cybern 78:433–440

  39. Tesauro G (1995) Temporal differences learning and TD-Gammon. Comm ACM 38:58–68

  40. Thrun S, Moeller K and Linden A (1991) Planning with an adaptive world model. In: Touretzky D, Lippmann R (eds) Advances in neural information processing systems (NIPS) 3, Morgan Kaufmann, San Mateo, CA

  41. Thrun SB (1992) Efficient exploration in reinforcement learning. Technical Report CMU-CS-92–102, Carnegie Mellon University

  42. Tolman EC (1948) Cognitive maps in rats and men. Psychol Rev 55:189–208

  43. Tolman EC, Honzik CH (1930) Insight in rats. Univ CA Pub Psychol 4:215–232

  44. Touzet C (1997) Neural reinforcement learning for behaviour synthesis. Robot Auton Sys 22(3–4):251–281

  45. Trullier O, Wiener S, Berthoz A and Meyer JA (1997) Biologically-based artificial navigation systems: review and prospects. Prog Neurobiol 51(5):483–544

  46. Trullier O, Meyer J-A (2000) Animate navigation using a cognitive graph. Biol Cybern 83:271–285

  47. Voicu H, Schmajuk N (2002) Latent learning, shortcuts and detours: a computational model. Behav Process 59:67–86

  48. Xiao J, Michalewicz Z, Zhang L and Trojanowski K (1997) Adaptive evolutionary planner/navigator for mobile robots. IEEE Trans Evolut Comput 1(1):18–28

  49. Watkins CJCH (1989) Learning from delayed rewards. Dissertation, Cambridge University

  50. Wiering M, Schmidhuber J (1998) Fast online Q(λ). Mach Learn 33:105–115

  51. Wyatt J (1997) Exploration and inference in learning from reinforcement. Dissertation, University of Edinburgh

  52. Zalama E, Gaudiano P and Coronado JL (1995) A real-time, unsupervised neural network for the low-level control of a mobile robot in a nonstationary environment. Neur Netwks 8(1):103–123

  53. Zeller M, Sharma R and Schulten K (1997) Motion planning of a pneumatic robot using a neural network. IEEE Contr Sys Mag 17:89–98

  54. Zelinsky A (1992) A mobile robot exploration algorithm. IEEE Trans Robot Automat 8(6):707–717

Acknowledgements

This work was supported by FAPESP grant #98/12700-5. The authors also thank the reviewers for their helpful comments.

Author information

Correspondence to Arthur P. S. Braga or Aluízio F. R. Araújo.

About this article

Cite this article

Braga, A.P.S., Araújo, A.F.R. A topological reinforcement learning agent for navigation. Neural Comput & Applic 12, 220–236 (2003). https://doi.org/10.1007/s00521-003-0385-9

Keywords

Navigation