Robust Estimation of Natural Gradient in Optimization by Regularized Linear Regression

  • Luigi Malagò
  • Matteo Matteucci
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8085)


We are interested in optimizing the expected value of a function by following a steepest descent policy over a statistical model. Such an approach appears in many model-based search meta-heuristics for optimization, for instance in the large class of random search methods in stochastic optimization and Evolutionary Computation. We study the case in which the statistical model belongs to the exponential family and the direction of maximum decrement of the expected value is given by the natural gradient, evaluated with respect to the Fisher information metric. When the gradient cannot be computed exactly, a robust estimate makes it possible to minimize the number of function evaluations required to converge to the global optimum. Under the choice of centered sufficient statistics, estimating the natural gradient corresponds to solving a least squares regression problem for the original function to be optimized. This correspondence between natural gradient estimation and linear regression leads to the definition of regularized versions of the natural gradient. We propose a robust estimation of the natural gradient for the exponential family based on regularized least squares.
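The correspondence stated in the abstract can be illustrated with a minimal sketch. It assumes that the sufficient statistics have already been evaluated on a sample drawn from the current model and stacked into a matrix `T`, with the function values in `f`. The function name `natural_gradient_estimate`, its parameters, and the plain ridge penalty are illustrative assumptions, not the authors' implementation. With `ridge=0` the result coincides with the ordinary least squares coefficients of `f` regressed on the centered statistics. A positive `ridge` shrinks the estimate, which is one possible reading of a "regularized natural gradient".

```python
import numpy as np


def natural_gradient_estimate(T, f, ridge=0.0):
    """Estimate the natural gradient of E[f] from a sample (illustrative sketch).

    T     : (N, k) array, sufficient statistics evaluated at N sample points
    f     : (N,) array, values of the function to optimize at the same points
    ridge : non-negative penalty added to the empirical Fisher matrix
            (hypothetical parameter name; ridge=0 gives ordinary least squares)
    """
    T = np.asarray(T, dtype=float)
    f = np.asarray(f, dtype=float)

    # Center the sufficient statistics and the function values.
    Tc = T - T.mean(axis=0)
    fc = f - f.mean()
    n, k = Tc.shape

    # Empirical Fisher information: covariance of the sufficient statistics.
    fisher_hat = Tc.T @ Tc / n
    # Empirical gradient of E[f]: covariance between the statistics and f.
    grad_hat = Tc.T @ fc / n

    # Ridge-regularized normal equations; for ridge=0 this is the least
    # squares solution of regressing f on the centered statistics.
    return np.linalg.solve(fisher_hat + ridge * np.eye(k), grad_hat)
```

For a minimization problem, one step of the search would move the natural parameters against this estimate, for example `theta - step * natural_gradient_estimate(T, f, ridge)`, where `theta` and `step` are again hypothetical names. A lasso penalty, also listed among the keywords, has no closed-form solution and is not covered by this sketch.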


Keywords: information geometry, regularized natural gradient, stochastic gradient descent, regularized least squares, ridge regression, lasso





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Luigi Malagò (1)
  • Matteo Matteucci (2)
  1. Dept. of Computer Science, Università degli Studi di Milano, Milan, Italy
  2. Dept. of Electronics Information and Bioengineering, Politecnico di Milano, Milan, Italy
