
Adaptation of Stepsize Parameter Using Newton’s Method

  • Itsuki Noda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7047)

Abstract

A method is proposed for optimizing stepsize parameters in exponential moving average (EMA) estimation, based on Newton’s method, so as to minimize squared errors. Stepsize parameters used in reinforcement learning must be selected and adjusted carefully for dynamic and non-stationary environments. To find suitable stepsize values through learning itself, a framework has previously been proposed for acquiring higher-order derivatives of learned values with respect to the stepsize parameters. Building on this framework, the authors extend the approach to determine the best stepsize by applying Newton’s method to minimize an EMA of the squared learning error. The method is supported by mathematical analysis and confirmed by experimental results.
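The core idea can be sketched in code. The following Python snippet is a minimal reconstruction from the abstract, not the authors’ exact algorithm: it tracks an EMA estimate together with its first and second derivatives with respect to the stepsize α (obtained recursively from the EMA update rule), smooths the gradient and curvature of the squared prediction error, and takes a Newton step on α. The meta-rate `beta`, the clipping bounds, and all variable names are illustrative assumptions.

```python
# Sketch of stepsize adaptation for an EMA estimator via Newton's method.
# Reconstructed from the abstract; `beta`, the clipping bounds, and the
# variable names are illustrative assumptions, not the paper's notation.

class NewtonStepsizeEMA:
    def __init__(self, alpha=0.1, beta=0.01, alpha_min=1e-3, alpha_max=0.999):
        self.alpha = alpha          # stepsize being adapted
        self.beta = beta            # meta-rate smoothing gradient/curvature (assumed)
        self.alpha_min = alpha_min  # clipping bounds for alpha (assumed)
        self.alpha_max = alpha_max
        self.x = 0.0                # EMA estimate of the observed value
        self.dx = 0.0               # d x / d alpha, tracked recursively
        self.ddx = 0.0              # d^2 x / d alpha^2, tracked recursively
        self.g = 0.0                # EMA of d(error^2) / d alpha
        self.h = 0.0                # EMA of d^2(error^2) / d alpha^2

    def update(self, obs):
        delta = obs - self.x        # prediction error before the update
        # The error depends on alpha only through the previous estimate x, so:
        #   d(e^2)/da   = -2 e x'
        #   d^2(e^2)/da^2 = 2 x'^2 - 2 e x''
        g_t = -2.0 * delta * self.dx
        h_t = 2.0 * self.dx ** 2 - 2.0 * delta * self.ddx
        self.g += self.beta * (g_t - self.g)
        self.h += self.beta * (h_t - self.h)
        # Recursive derivatives of the EMA update x <- (1-a) x + a obs:
        #   x'' <- (1-a) x'' - 2 x'   (uses the old x', so update x'' first)
        #   x'  <- (1-a) x'  + delta
        self.ddx = (1.0 - self.alpha) * self.ddx - 2.0 * self.dx
        self.dx = (1.0 - self.alpha) * self.dx + delta
        self.x += self.alpha * delta
        # Newton step on the smoothed squared error, only where it is convex:
        if self.h > 1e-8:
            self.alpha -= self.g / self.h
            self.alpha = min(max(self.alpha, self.alpha_min), self.alpha_max)
        return self.x
```

Under this scheme, α would be expected to grow after abrupt changes in the observed signal (fast re-tracking) and shrink when the signal is stationary but noisy (strong smoothing), which is the adaptive behavior the abstract describes.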

Keywords

Expected Utility · Learning Agent · Approximate Dynamic Programming · Reinforcement Learning Agent · Multiagent Learning



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Itsuki Noda
  1. AIST, Tsukuba Univ. and Tokyo Inst. of Tech., Tsukuba, Japan
