From Exploration to Planning

  • Cornelius Weber
  • Jochen Triesch
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5163)


Learning and behaviour of mobile robots face limitations. In reinforcement learning, for example, an agent learns a strategy to reach only one specific target point within a state space. Humans, however, can grasp a visually localized object at any point in space or navigate to any position in a room. We present a neural network model in which an agent learns a model of the state space that allows it to reach an arbitrarily chosen goal via a short route. By randomly exploring the state space, the agent learns associations between two adjoining states and the action that links them. Given arbitrary starting and goal positions, route-finding is done in two steps. First, an activation gradient spreads around the goal position along the associative connections. Second, the agent uses state-action associations to determine the actions that ascend the gradient toward the goal. All mechanisms are biologically justifiable.
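The exploration phase and the two route-finding steps can be illustrated with a small tabular sketch. This is not the authors' neural network model. The grid world, the dictionary-based associations, the breadth-first spreading with a decay factor, and every function and parameter name below are assumptions chosen only to make the idea concrete.

```python
import random
from collections import deque

# Hypothetical 4-connected grid world; all names and sizes are illustrative.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ACTION_NAMES = sorted(ACTIONS)

def step(state, action, size):
    """Assumed environment dynamics: move one cell, stay put at a wall."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    return (r, c) if 0 <= r < size and 0 <= c < size else state

def explore(size, n_steps, seed=0):
    """Random exploration: record adjoining states and the action linking them."""
    rng = random.Random(seed)
    neighbours = {}   # state -> set of states reached directly from it
    action_for = {}   # (state, next_state) -> action that produced the move
    state = (0, 0)
    for _ in range(n_steps):
        action = rng.choice(ACTION_NAMES)
        nxt = step(state, action, size)
        if nxt != state:
            neighbours.setdefault(state, set()).add(nxt)
            action_for[(state, nxt)] = action
        state = nxt
    return neighbours, action_for

def spread_activation(neighbours, goal, decay=0.9):
    """Step 1: activation spreads from the goal along learned links, decaying per link."""
    predecessors = {}
    for s, successors in neighbours.items():
        for t in successors:
            predecessors.setdefault(t, set()).add(s)
    activation = {goal: 1.0}
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        for p in predecessors.get(s, ()):
            if p not in activation:
                activation[p] = decay * activation[s]
                queue.append(p)
    return activation

def plan(start, goal, neighbours, action_for, activation):
    """Step 2: repeatedly move to the most active adjoining state and emit its action."""
    if start not in activation:
        raise ValueError("goal is not reachable from start along learned links")
    state, actions = start, []
    while state != goal:
        nxt = max(neighbours[state], key=lambda t: activation.get(t, 0.0))
        actions.append(action_for[(state, nxt)])
        state = nxt
    return actions

# Usage: explore a 5x5 grid, then plan between two arbitrarily chosen positions.
links, link_actions = explore(size=5, n_steps=5000)
gradient = spread_activation(links, goal=(4, 4))
print(plan((0, 0), (4, 4), links, link_actions, gradient))
```

Because the learned associations are independent of any particular goal, the same `links` and `link_actions` tables can be reused for a different goal by recomputing only the activation gradient.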


Keywords (added by machine, not by the authors): State Space, Forward Model, Inverse Model, Mirror Neuron, Goal Position





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Cornelius Weber (1)
  • Jochen Triesch (1)
  1. Frankfurt Institute for Advanced Studies, Johann Wolfgang Goethe University, Frankfurt am Main, Germany
