Enhanced Reinforcement Learning by Recursive Updating of Q-values for Reward Propagation

  • Yunsick Sung
  • Eunyoung Ahn
  • Kyungeun Cho
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 215)


In this paper, we propose a method that reduces the learning time of Q-learning by combining two techniques: updating the Q-values of unexecuted actions in addition to those of executed actions, and adding a terminal reward to unvisited Q-values. To verify the method, we compared its performance with that of conventional Q-learning. The proposed approach achieved the same performance as conventional Q-learning while requiring only 27% of the learning episodes. This result indicates that the proposed method reduces learning time by updating more Q-values in the early stage of learning and by distributing the terminal reward to more Q-values.
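For context, the conventional baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of standard tabular Q-learning only; it does not implement the proposed method, whose details are not given here. The one-dimensional chain environment, the function name `q_learning_chain`, and all parameter values are assumptions chosen for illustration and do not come from the paper.

```python
import random


def q_learning_chain(n_states=6, episodes=200, alpha=0.5, gamma=0.9,
                     epsilon=0.1, terminal_reward=1.0, seed=0):
    """Standard tabular Q-learning on a hypothetical 1-D chain.

    The agent starts in state 0 and the last state is terminal.
    Returns the learned table q[state][action], action 0 = left, 1 = right.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # Deterministic transition, clipped to the chain boundaries.
            s_next = min(max(s + moves[a], 0), goal)
            r = terminal_reward if s_next == goal else 0.0

            # The terminal state contributes no future value.
            target = r if s_next == goal else r + gamma * max(q[s_next])

            # Only the Q-value of the executed action is updated.
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning_chain()):
        print(state, [round(v, 3) for v in values])
```

In this sketch each step changes exactly one Q-value, so the terminal reward reaches states far from the goal only after repeated episodes. That slow backward spread is the cost that the abstract's two techniques are intended to reduce.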


Q-learning, Terminal reward, Propagation, Q-value



This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2011-0011266).



Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. Department of Game Engineering, Graduate School, Dongguk University, Seoul, Korea
  2. Department of Multimedia Engineering, Hanbat National University, Daejeon, South Korea
  3. Department of Multimedia Engineering, Dongguk University, Seoul, Korea
