
Knowledge and Information Systems, Volume 51, Issue 3, pp 911–940

Incremental reinforcement learning for multi-objective robotic tasks

  • Javier García
  • Roberto Iglesias
  • Miguel A. Rodríguez
  • Carlos V. Regueiro
Regular Paper

Abstract

Recently, reinforcement learning has been widely applied to robotic tasks. However, most of these tasks involve more than one objective, and in these cases the construction of a reward function becomes a key and difficult issue. A typical solution is to combine the multiple objectives into a single-objective reward function, but quite often this formulation is far from intuitive, and the learning process may converge to a behaviour far from what we need. Another alternative for facing these multi-objective tasks is transfer learning, where the idea is to reuse the experience gained while learning one objective to learn a new one. Nevertheless, the transfer affects only the learned policy, leaving out other gained information that might be relevant. In this paper, we propose a different approach to learning problems with more than one objective. In particular, we describe a two-stage approach. During the first stage, our algorithm learns a policy compatible with a main goal while it gathers relevant information for a subsequent search process. Once this is done, a second stage starts, consisting of a cyclical process of small perturbations and stabilizations, which tries to avoid degrading the performance of the system while it searches for a new policy that remains valid for the main goal but also optimizes a sub-objective. We have applied our proposal to the learning of biped walking and have tested it on a humanoid robot, both in simulation and on the real robot.
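The two-stage scheme described in the abstract can be sketched in a few lines. This is a minimal illustrative skeleton, not the authors' algorithm: all function names (`learn_main`, `evaluate_main`, `evaluate_sub`, `perturb`) and the acceptance threshold are assumptions introduced here to show the perturbation–stabilization cycle, in which a candidate policy is accepted only if it stays valid for the main goal and improves the sub-objective.

```python
import random


def two_stage_search(policy, learn_main, evaluate_main, evaluate_sub,
                     perturb, n_cycles=50, threshold=0.9):
    """Illustrative sketch of a two-stage multi-objective search.

    Stage 1: learn a policy compatible with the main goal.
    Stage 2: cycle of small perturbations followed by stabilization,
    accepting a candidate only when it remains valid for the main
    goal and improves the sub-objective.
    All callbacks are hypothetical placeholders.
    """
    # Stage 1: obtain a policy for the main goal.
    policy = learn_main(policy)
    baseline = evaluate_main(policy)
    best_sub = evaluate_sub(policy)

    # Stage 2: cyclical perturbation / stabilization.
    for _ in range(n_cycles):
        candidate = perturb(policy)        # small perturbation
        candidate = learn_main(candidate)  # stabilize w.r.t. main goal
        if (evaluate_main(candidate) >= threshold * baseline
                and evaluate_sub(candidate) > best_sub):
            policy = candidate             # accept: still valid, and better
            best_sub = evaluate_sub(candidate)
    return policy
```

The acceptance test is what prevents the search from degrading the main behaviour: a perturbed policy that improves the sub-objective but falls below the main-goal threshold is simply discarded.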

Keywords

Reinforcement learning · Multi-objective optimization · Robotic tasks · Policy search


Acknowledgments

This work was supported by the research grant TIN2012-32262 (FEDER), and by the Galician Government (Xunta de Galicia) under the Consolidation Program of Competitive Reference Groups (GRC2014/030).


Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  1. CITIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
  2. Department of Electronics and Systems, Universidade da Coruña, A Coruña, Spain
