Volume 64, Issue 4, pp 698–716

Theoretical Foundation for CMA-ES from Information Geometry Perspective

  • Youhei Akimoto
  • Yuichi Nagata
  • Isao Ono
  • Shigenobu Kobayashi


This paper explores the theoretical basis of the covariance matrix adaptation evolution strategy (CMA-ES) from the information geometry viewpoint.

To establish a theoretical foundation for the CMA-ES, we focus on the geometric structure of a Riemannian manifold of probability distributions equipped with the Fisher metric. We define a function on the manifold as the expectation of the fitness under the sampling distribution, and regard the goal of the parameter update in the CMA-ES as maximization of this expected fitness. We investigate steepest ascent learning for expected fitness maximization, where the steepest ascent direction is given by the natural gradient: the product of the inverse of the Fisher information matrix and the conventional gradient of the function.
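To make this concrete, the following sketch estimates the natural gradient of the expected fitness with respect to the mean of a Gaussian sampling distribution. This is a minimal illustration, not the paper's derivation: the fitness function, dimension, and sample size are arbitrary choices. For the Gaussian mean, the corresponding Fisher information block is C⁻¹, so the natural gradient is obtained by multiplying the vanilla gradient by C.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fitness to maximize (an illustrative assumption): peak at the origin.
def fitness(x):
    return -np.sum(x ** 2, axis=-1)

dim, n_samples = 3, 2000
m = np.ones(dim)    # mean of the sampling distribution
C = np.eye(dim)     # covariance of the sampling distribution

# Monte Carlo estimate of the vanilla gradient of E[f] w.r.t. the mean,
# via the log-likelihood trick: grad_m E[f] = E[f(x) C^{-1} (x - m)].
# Subtracting the mean fitness is a standard variance-reduction baseline.
x = rng.multivariate_normal(m, C, size=n_samples)
w = fitness(x) - fitness(x).mean()
grad_m = np.linalg.solve(C, (w[:, None] * (x - m)).mean(axis=0))

# For the Gaussian mean, the Fisher information block is C^{-1}, so the
# natural gradient is simply C @ grad_m: the metric cancels the C^{-1}
# above, and the Fisher matrix never has to be inverted explicitly.
nat_grad_m = C @ grad_m
```

Since the toy fitness peaks at the origin and the current mean is positive in every coordinate, the estimated natural gradient points back toward the origin.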

Our first result is that, under certain parameterizations of the multivariate normal distribution, the natural gradient of the expected fitness can be obtained without inverting the Fisher information matrix. We find that the update of the distribution parameters in the CMA-ES coincides with natural gradient learning for expected fitness maximization. Our second result is the derivation of the range of learning rates for which a step along the exact natural gradient improves the expected fitness. The close relation between the CMA-ES and natural gradient learning suggests that the default learning rates in the CMA-ES are suitable in terms of monotone improvement of the expected fitness. Finally, we discuss the relation to the expectation-maximization framework and give an information-geometric interpretation of the CMA-ES.
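The flavor of such a natural-gradient step with a learning rate can be sketched as follows. This is not the CMA-ES itself: rank-based weights, covariance adaptation, step-size control, and evolution paths are all omitted, and the fitness function, learning rate, and sample size are illustrative assumptions. Only the mean is updated here; in the CMA-ES the covariance matrix is adapted by an analogous natural-gradient step.

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(x):
    return -np.sum(x ** 2, axis=-1)   # toy objective, maximal at the origin

dim, n_samples = 3, 200
eta = 0.1                             # learning rate (illustrative, not the CMA-ES default)
m, C = 5.0 * np.ones(dim), np.eye(dim)

for _ in range(50):
    x = rng.multivariate_normal(m, C, size=n_samples)
    w = fitness(x) - fitness(x).mean()                        # centred fitness as weights
    grad_m = np.linalg.solve(C, (w[:, None] * (x - m)).mean(axis=0))
    m = m + eta * (C @ grad_m)        # natural-gradient ascent step on the mean
```

With a small enough learning rate, each step moves the mean toward higher expected fitness, and the sampling distribution drifts to the optimizer of the toy objective.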


Keywords: CMA-ES · Information geometry · Natural gradient learning · Expectation-maximization algorithms · Expected fitness maximization · Continuous optimization





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Youhei Akimoto (1)
  • Yuichi Nagata (2)
  • Isao Ono (2)
  • Shigenobu Kobayashi (2)

  1. TAO Team, INRIA Saclay, LRI, Paris-Sud University, Orsay Cedex, France
  2. Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama, Japan
