Reinforcement-Driven Shaping of Sequence Learning in Neural Dynamics

  • Matthew Luciw
  • Sohrob Kazerounian
  • Yulia Sandamirskaya
  • Gregor Schöner
  • Jürgen Schmidhuber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8575)


We present a simulated model of a mobile KUKA youBot that uses Dynamic Field Theory for its underlying perceptual and motor control systems while learning behavioral sequences through reinforcement learning. Although dynamic neural fields have previously been used for robust control in robotics, high-level behavior has generally been pre-programmed by hand. In the present work, we extend a recent framework for integrating reinforcement learning and dynamic neural fields by using the principle of shaping to reduce the search space of the learning agent.
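The dynamic neural fields underlying the perceptual and motor systems mentioned above follow the Amari field equation: activation u(x, t) relaxes toward a resting level h plus external input and a lateral-interaction term with local excitation and surround inhibition. The sketch below simulates a minimal 1-D field; all parameter values (kernel widths, resting level, input amplitude) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def simulate_field(steps=400, n=101, tau=10.0, h=-2.0, dt=1.0):
    """Euler-integrate a 1-D Amari neural field driven by a localized input.

    Illustrative parameter values only; the paper's architecture uses
    multiple coupled fields with its own tuning.
    """
    x = np.linspace(-10.0, 10.0, n)
    dx = x[1] - x[0]
    u = np.full(n, h)  # field starts at the resting level h < 0

    # Mexican-hat interaction kernel: local excitation, broader inhibition
    d = x[:, None] - x[None, :]
    w = 3.0 * np.exp(-d**2 / 2.0) - 1.0 * np.exp(-d**2 / 18.0)

    s = 4.0 * np.exp(-x**2 / 2.0)  # localized external input centered at x = 0

    for _ in range(steps):
        f = 1.0 / (1.0 + np.exp(-4.0 * u))       # sigmoidal output nonlinearity
        u += (dt / tau) * (-u + h + s + (w @ f) * dx)  # Amari dynamics
    return x, u

x, u = simulate_field()
# Lateral excitation stabilizes a localized activation peak at the input
# location, while sites far from the input remain below threshold.
print(u.max(), u[0])
```

The self-stabilized peak is the field-theoretic representation of a perceptual or motor decision; the interplay of such peaks across fields is what the reinforcement learner's elementary behaviors are built from.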


Keywords: Neural Dynamics · Elementary Behaviors · Reinforcement Learning · Shaping
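The shaping principle referred to in the abstract rewards successive approximations of the target behavior, so the learner searches a much smaller space than with a single sparse terminal reward. A toy sketch of that idea, using tabular Q-learning on a chain task (my own illustrative setup, not the paper's robot scenario), might look like this:

```python
import random

def q_learn_chain(n=10, episodes=300, shaped=True,
                  alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning on an n-state chain; only state n-1 is the goal.

    With shaped=True, a small intermediate reward marks each step of
    progress toward the goal (the shaping signal); with shaped=False,
    only reaching the goal is rewarded. Toy example for illustration.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n)]  # actions: 0 = move left, 1 = move right
    for _ in range(episodes):
        s = 0
        for _ in range(4 * n):  # step limit per episode
            a = rng.randrange(2) if rng.random() < eps else int(q[s][1] > q[s][0])
            s2 = max(0, s - 1) if a == 0 else min(n - 1, s + 1)
            if s2 == n - 1:
                r = 1.0                    # terminal goal reward
            elif shaped and s2 > s:
                r = 0.01                   # small progress reward: the shaping signal
            else:
                r = 0.0
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
            if s == n - 1:
                break
    return q

q = q_learn_chain()
policy = [int(qi[1] > qi[0]) for qi in q[:-1]]  # greedy action per state (1 = right)
print(policy)
```

The dense progress signal lets every state's value estimate improve as soon as the agent stumbles one step forward, rather than only after a full successful episode, which is the search-space reduction the abstract refers to.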





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Matthew Luciw (1)
  • Sohrob Kazerounian (1)
  • Yulia Sandamirskaya (2)
  • Gregor Schöner (2)
  • Jürgen Schmidhuber (1)
  1. Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Manno-Lugano, Switzerland
  2. Institut für Neuroinformatik, Universitätsstr., Bochum, Germany
