
Challenges in High-Dimensional Reinforcement Learning with Evolution Strategies

  • Conference paper

In: Parallel Problem Solving from Nature – PPSN XV (PPSN 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11102)

Abstract

Evolution Strategies (ESs) have recently become popular for training deep neural networks, in particular on reinforcement learning tasks, a special form of controller design. Compared to classic problems in continuous direct search, deep networks pose extremely high-dimensional optimization problems, with many thousands or even millions of variables. In addition, many control problems give rise to a stochastic fitness function. Considering the relevance of the application, we study the suitability of evolution strategies for high-dimensional, stochastic problems. Our results give insights into which algorithmic mechanisms of modern ESs are of value for the class of problems at hand, and they reveal principled limitations of the approach. They are in line with our theoretical understanding of ESs. We show that combining ESs that offer reduced internal algorithm cost with uncertainty handling techniques yields promising methods for this class of problems.
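The recipe the abstract points to — an ES with cheap internal updates (e.g., isotropic mutations, no covariance matrix) combined with uncertainty handling by re-evaluating noisy fitness values — can be sketched as follows. This is an illustrative toy, not the paper's exact algorithm; `noisy_sphere`, `simple_es`, and all parameter values are hypothetical stand-ins.

```python
import numpy as np

def noisy_sphere(x, rng, noise=0.1):
    # Stand-in for a stochastic fitness (e.g., an episodic RL return):
    # sphere function plus additive Gaussian observation noise.
    return float(x @ x) + noise * rng.standard_normal()

def simple_es(f, dim, sigma=1.0, lam=10, reevals=3, iters=300, seed=0):
    """(1, lam)-ES with isotropic Gaussian mutations.

    No covariance matrix is maintained, so the internal cost per iteration
    is O(lam * dim) rather than O(dim^2). Uncertainty handling is the
    simplest possible kind: each candidate's noisy fitness is averaged
    over `reevals` independent evaluations.
    """
    rng = np.random.default_rng(seed)
    mean = rng.standard_normal(dim)  # random initial search point
    for _ in range(iters):
        offspring = mean + sigma * rng.standard_normal((lam, dim))
        # Averaging repeated evaluations reduces noise-induced ranking errors.
        fitness = [np.mean([f(x, rng) for _ in range(reevals)])
                   for x in offspring]
        mean = offspring[int(np.argmin(fitness))]  # comma selection: best offspring
        sigma *= 0.99  # crude geometric decay in place of step-size adaptation
    return mean
```

On a toy problem like this, averaged re-evaluations keep the ranking informative until fitness differences between offspring drop below the residual noise level, after which progress stalls — a simple instance of the kind of limitation the abstract alludes to.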


Notes

  1. https://github.com/NiMlr/High-Dim-ES-RL.


Author information

Correspondence to Tobias Glasmachers.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Müller, N., Glasmachers, T. (2018). Challenges in High-Dimensional Reinforcement Learning with Evolution Strategies. In: Auger, A., Fonseca, C., Lourenço, N., Machado, P., Paquete, L., Whitley, D. (eds) Parallel Problem Solving from Nature – PPSN XV. PPSN 2018. Lecture Notes in Computer Science, vol. 11102. Springer, Cham. https://doi.org/10.1007/978-3-319-99259-4_33


  • DOI: https://doi.org/10.1007/978-3-319-99259-4_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99258-7

  • Online ISBN: 978-3-319-99259-4

  • eBook Packages: Computer Science, Computer Science (R0)
