A Reinforcement Learning Based Algorithm for Robot Action Planning

  • Marko Švaco
  • Bojan Jerbić
  • Mateo Polančec
  • Filip Šuligoj
Conference paper
Part of the Mechanisms and Machine Science book series (volume 67)


The learning process that arises in response to visual perception of the environment is the starting point for extensive research in applied and cognitive robotics. In this paper, we propose a reinforcement learning based action planning algorithm for the assembly of spatial structures with an autonomous robot in an unstructured environment. Because of the large number of discrete states the autonomous robot can encounter, we developed an algorithm based on temporal-difference learning that uses linear basis functions to approximate the action-value (Q) function. The aim is to find the optimal sequence of actions the agent (robot) must take to move objects in a 2D environment until they reach a predefined target state. The algorithm has two parts. In the first part, the goal is to learn the parameters that approximate the Q function. In the second part, the learned parameters are used to generate the sequence of actions for a UR3 robot arm. We present a preliminary validation of the algorithm in an experimental laboratory scenario.
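The two-phase idea above can be illustrated with a minimal sketch: temporal-difference (Q-) learning with a linear combination of basis functions, Q(s, a) ≈ w · φ(s, a), followed by a greedy rollout of the learned Q that yields the action sequence. The toy 2D grid, the particular basis functions, and the reward values here are illustrative assumptions, not the authors' actual environment or feature set.

```python
import numpy as np

# Toy 2D grid world: an agent must reach a predefined target cell.
# Environment, features, and rewards are illustrative assumptions.
GRID = 5                                      # 5x5 grid of cells
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # up, down, right, left
GOAL = (4, 4)                                 # predefined target state

def step(state, a_idx):
    """Deterministic transition with walls; +10 at the goal, -1 otherwise."""
    dx, dy = ACTIONS[a_idx]
    nxt = (min(max(state[0] + dx, 0), GRID - 1),
           min(max(state[1] + dy, 0), GRID - 1))
    return nxt, (10.0 if nxt == GOAL else -1.0), nxt == GOAL

def phi(state, a_idx):
    """Basis functions: a bias term and the normalized Manhattan
    distance to the goal after taking the action."""
    nxt, _, _ = step(state, a_idx)
    d = abs(nxt[0] - GOAL[0]) + abs(nxt[1] - GOAL[1])
    return np.array([1.0, d / (2.0 * (GRID - 1))])

def q(w, s, a_idx):
    """Linear action-value approximation Q(s, a) = w . phi(s, a)."""
    return w @ phi(s, a_idx)

def train(episodes=200, alpha=0.05, gamma=0.95, eps=0.1, seed=0):
    """Phase 1: learn the parameter vector w by semi-gradient TD updates."""
    rng = np.random.default_rng(seed)
    w = np.zeros(2)
    for _ in range(episodes):
        s, done = (0, 0), False
        for _ in range(100):                  # cap episode length
            if rng.random() < eps:            # epsilon-greedy exploration
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = int(np.argmax([q(w, s, i) for i in range(len(ACTIONS))]))
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(
                q(w, s2, i) for i in range(len(ACTIONS)))
            w += alpha * (target - q(w, s, a)) * phi(s, a)
            s = s2
            if done:
                break
    return w

def plan(w, start=(0, 0), max_steps=20):
    """Phase 2: greedy rollout of the learned Q gives the action plan."""
    path, s = [start], start
    for _ in range(max_steps):
        a = int(np.argmax([q(w, s, i) for i in range(len(ACTIONS))]))
        s, _, done = step(s, a)
        path.append(s)
        if done:
            break
    return path
```

Separating the learned parameters (`train`) from plan generation (`plan`) mirrors the paper's two-part structure: once `w` is learned, the rollout is deterministic and can be replayed as a motion sequence on a robot arm.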


Keywords: Robotics, Reinforcement learning, Autonomous robot



The authors would like to acknowledge the Croatian Scientific Foundation for support through the research project ACRON - A new concept of Applied Cognitive Robotics in clinical Neuroscience.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Marko Švaco, Bojan Jerbić, Mateo Polančec, Filip Šuligoj
  1. Faculty of Mechanical Engineering and Naval Architecture, Department of Robotics and Production System Automation, University of Zagreb, Zagreb, Croatia
