Adaptive Reinforcement Learning for Dynamic Environment Based on Behavioral Habit

  • Akihiro Mimura
  • Shota Sumino
  • Shohei Kato
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 16)


In our previous work, we proposed ALR-P, a method for adjusting the learning rate of reinforcement learning. ALR-P adapts the learning rate according to learning progress, measured by the simple and general TD-error, and we confirmed through a maze problem serving as a dynamic environment that it realizes adaptive learning. In this paper, we extend the method with an additional ability so that the learning agent takes behavioral habit into consideration. Humans rely on behavioral habits for important decision making in the real world, and we believe a learning agent should likewise hold behavioral habits and act in accordance with them. We applied ALR-P with several behavioral habits (ALR-BH) to a dynamic maze problem. The experimental results show that adaptive adjustment of the learning rate is effective in dynamic environments and that ALR-BH enabled the learning agent to behave appropriately according to its behavioral habit.
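The core idea described above can be illustrated with a small sketch: a Q-learning update whose learning rate is scaled by a running average of the absolute TD-error, so that large errors (learning still in progress, or an environment change) push the rate up, while convergence lets it decay. Note this is a minimal illustrative sketch only; the class name `AdaptiveQLearner`, the smoothing constants, and the mapping from TD-error to learning rate are assumptions, not the exact ALR-P update rule, which is not given in the abstract.

```python
class AdaptiveQLearner:
    """Q-learning with a learning rate driven by |TD-error|.

    Illustrative sketch of the idea behind ALR-P as described in the
    abstract; the concrete adjustment rule here is an assumption.
    """

    def __init__(self, n_states, n_actions,
                 gamma=0.9, alpha_min=0.05, alpha_max=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.gamma = gamma
        self.alpha_min = alpha_min
        self.alpha_max = alpha_max
        # Running mean of |TD-error|, used as a learning-progress signal.
        self.avg_abs_delta = 0.0

    def update(self, s, a, r, s_next):
        # Standard one-step TD-error for Q-learning.
        delta = r + self.gamma * max(self.q[s_next]) - self.q[s][a]
        # Smooth the error magnitude: large values mean learning is
        # still in progress (or the environment has changed).
        self.avg_abs_delta = 0.9 * self.avg_abs_delta + 0.1 * abs(delta)
        # Map the progress signal into [alpha_min, alpha_max]
        # (an illustrative mapping, not the paper's formula).
        alpha = min(self.alpha_max, self.alpha_min + self.avg_abs_delta)
        self.q[s][a] += alpha * delta
        return alpha
```

In a dynamic maze, a sudden change to the goal or walls would re-inflate the TD-error, and hence the learning rate, letting the agent relearn quickly without manual re-tuning of the step size.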


Keywords: Learning Rate, Learning Agent, Persistence Rate, Total Reward, Meta Parameter





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Dept. of Computer Science and Engineering, Graduate School of Engineering, Nagoya Institute of Technology, Nagoya, Japan
