
Contextual Direct Policy Search with Regularized Covariance Matrix Estimation
  • Abbas Abdolmaleki
  • David Simões
  • Nuno Lau
  • Luís Paulo Reis
  • Gerhard Neumann

Abstract

Stochastic search and optimization techniques are used in a vast number of areas, from refining vehicle designs and determining the effectiveness of new drugs to developing efficient game strategies and learning appropriate behaviors in robotics. However, they are specialized to the specific problem they solve, and when the problem's context changes even slightly they cannot adapt. In fact, they require complete re-learning to perform correctly in new, unseen scenarios, regardless of how similar those scenarios are to previously learned environments. Contextual algorithms have recently emerged to address this problem. They learn a policy for a task as a function of a given context, so that widely different contexts belonging to the same task are learned simultaneously. However, state-of-the-art algorithms of this class converge prematurely and cannot compete with algorithms that learn a policy for a single context. We describe the Contextual Relative Entropy Policy Search (CREPS) algorithm, which belongs to this class, and extend it with a covariance matrix adaptation technique that substantially improves its performance. We call the result Contextual Relative Entropy Policy Search with Covariance Matrix Adaptation (CREPS-CMA). We propose two variants and demonstrate their behavior on a set of classic contextual optimization problems and on complex simulated robot tasks.
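To make the premature-convergence issue concrete, below is a minimal Python sketch of a CREPS-CMA-style contextual search loop. It is an illustration under simplifying assumptions, not the authors' implementation: the KL-bounded REPS sample weights are replaced by a simple softmax over returns with temperature eta, the context-dependent mean is a linear map fitted by weighted ridge regression, and reg is a hypothetical coefficient that blends the new weighted covariance estimate with the previous one so that exploration does not collapse prematurely.

    import numpy as np

    def creps_cma_sketch(objective, context_dim, param_dim,
                         n_iters=50, n_samples=50, eta=2.0, reg=0.2):
        # Gaussian search distribution: the mean depends linearly on the
        # context, the covariance is shared across contexts.
        W = np.zeros((param_dim, context_dim + 1))   # context-to-mean map
        Sigma = np.eye(param_dim)                    # exploration covariance

        for _ in range(n_iters):
            # Sample contexts and parameters from the current policy.
            S = np.random.uniform(-1.0, 1.0, (n_samples, context_dim))
            Phi = np.hstack([S, np.ones((n_samples, 1))])  # context features
            thetas = Phi @ W.T + np.random.multivariate_normal(
                np.zeros(param_dim), Sigma, n_samples)
            R = np.array([objective(s, th) for s, th in zip(S, thetas)])

            # Softmax reweighting of samples by return (a stand-in for
            # the KL-constrained weights computed by REPS).
            w = np.exp((R - R.max()) / eta)
            w /= w.sum()

            # Weighted ridge regression for the context-dependent mean.
            A = Phi.T @ (w[:, None] * Phi) + 1e-6 * np.eye(context_dim + 1)
            W = np.linalg.solve(A, Phi.T @ (w[:, None] * thetas)).T

            # Weighted maximum-likelihood covariance, regularized toward
            # the old covariance to slow the shrinkage of exploration.
            diff = thetas - Phi @ W.T
            Sigma = (1 - reg) * (diff.T @ (w[:, None] * diff)) + reg * Sigma
        return W, Sigma

    # Toy usage: the optimal parameters depend linearly on the context.
    target = lambda s: np.array([s[0], -s[1], 0.5])
    W, Sigma = creps_cma_sketch(
        lambda s, th: -np.sum((th - target(s)) ** 2),
        context_dim=2, param_dim=3)

The blend controlled by reg is the essential ingredient of the sketch: with reg = 0 the weighted maximum-likelihood update alone tends to shrink Sigma too quickly, which is the premature convergence described above.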

Keywords

Multi-task learning · Stochastic policy search · Contextual learning · Covariance matrix adaptation



Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. IEETA - Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal
  2. LIACC - Artificial Intelligence and Computer Science Laboratory, University of Porto, Porto, Portugal
  3. CLAS - Computational Learning for Autonomous Systems, Technische Universität Darmstadt, Darmstadt, Germany
  4. University of Lincoln, Lincoln, UK
