Topological Q-learning with internally guided exploration for mobile robot navigation

  • Original Article
Neural Computing and Applications


Improving the learning convergence of reinforcement learning (RL) in mobile robot navigation has been the focus of many recent works, which have investigated different approaches to obtaining knowledge by exploring the robot’s environment effectively and efficiently. In RL, this knowledge is of great importance for reducing the high number of interactions required to update the value function and, eventually, to find an optimal or nearly optimal policy for the agent. In this paper, we propose a topological Q-learning (TQ-learning) algorithm that makes use of the topological ordering among the observed states of the environment in which the agent acts. This algorithm builds an incremental topological map of the environment using the Instantaneous Topological Map (ITM) model, which we use both to accelerate value function updates and to provide a guided exploration strategy for the agent. We evaluate our algorithm against the original Q-learning and the Influence Zone algorithms in static and dynamic environments.
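For orientation, the baseline named in the abstract, original one-step Q-learning, changes only one state-action value per interaction, which is one reason so many interactions are needed. The following is a minimal tabular sketch of that standard update, not the TQ-learning algorithm itself. The function name `q_update`, the dictionary-based table, and the values of `alpha` and `gamma` are illustrative assumptions that do not come from the article, and the topological map is not modeled here.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update and return the new value.

    Q maps (state, action) pairs to estimated returns.
    alpha (learning rate) and gamma (discount factor) are assumed values.
    """
    # Greedy estimate of the value of the successor state.
    best_next = max(Q[(next_state, a)] for a in range(n_actions))
    # Temporal-difference error for the transition just observed.
    td_error = reward + gamma * best_next - Q[(state, action)]
    # Only this single entry of the table changes per interaction.
    Q[(state, action)] += alpha * td_error
    return Q[(state, action)]


# Hypothetical transition: state 0, action 1, reward 1.0, next state 2.
Q = defaultdict(float)  # unseen entries start at 0.0
new_value = q_update(Q, state=0, action=1, reward=1.0,
                     next_state=2, n_actions=4)
```

Because each call touches a single table entry, reward information spreads backward only one step per visit. This is the slow propagation that the abstract says the topological map is used to accelerate.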

  1. By “subsequent experiences”, we mean subsequent visits by the agent to those proceeding states.

  2. The environment states are represented by nodes in the ITM map as will be discussed later.

  3. In ε-greedy exploration, the agent selects the next action randomly with a probability of ε and based on the learned policy with a probability of 1 − ε.
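The ε-greedy rule described in note 3 can be sketched in a few lines. This is an illustrative sketch only: the function name `epsilon_greedy`, the list-of-Q-values representation, and the random tie-breaking among equally good actions are assumptions, not details taken from the article.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given the Q-values of the current state.

    With probability epsilon a uniformly random action is chosen (explore);
    otherwise an action with the highest Q-value is chosen (exploit).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Assumed design choice: break ties randomly among the greedy actions.
    greedy_actions = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(greedy_actions)


# Hypothetical Q-values for three actions in one state.
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1)
```

Setting `epsilon` to 0 makes the choice purely greedy, and setting it to 1 makes it purely random. Passing a seeded `random.Random` instance as `rng` makes runs reproducible.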


This research is supported by the High Impact Research MoE Grant UM.C/625/1/HIR/MoE/FCSIT/10 from the Ministry of Education Malaysia.

Author information

Corresponding author

Correspondence to Muhammad Burhan Hafez.

About this article

Cite this article

Hafez, M.B., Loo, C.K. Topological Q-learning with internally guided exploration for mobile robot navigation. Neural Comput & Applic 26, 1939–1954 (2015).
