Enhanced Reinforcement Learning by Recursive Updating of Q-values for Reward Propagation
In this paper, we propose a method to reduce the learning time of Q-learning by combining two techniques: updating the Q-values of unexecuted actions, and adding the terminal reward to the Q-values of unvisited states. To verify the method, its performance was compared with that of conventional Q-learning. The proposed approach matched the performance of conventional Q-learning while requiring only 27% of its learning episodes. We therefore conclude that the proposed method shortens learning by updating more Q-values in the early stage of learning and distributing the terminal reward across more Q-values.
Keywords: Q-learning · Terminal reward · Propagation · Q-value
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2011-0011266).
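The core idea of the abstract can be illustrated with a small sketch. Since the paper's exact update rules are not given here, the code below shows one plausible reading under stated assumptions: standard tabular Q-learning on a tiny 1-D chain, plus a hypothetical backward sweep that re-applies the Q-update along the visited trajectory at episode end, so the terminal reward reaches earlier states within a single episode. The environment, constants, and the `propagate` sweep are all illustrative assumptions, not the paper's implementation.

```python
import random

# Illustrative sketch only: a tiny 1-D chain world. Reaching GOAL ends the
# episode and pays the only (terminal) reward.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)          # move left / right
ALPHA, GAMMA = 0.5, 0.9     # assumed learning rate and discount factor

def step(s, a):
    """Deterministic chain dynamics; terminal reward 1.0 on entering GOAL."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def run_episode(Q, epsilon=0.2, propagate=True):
    """One epsilon-greedy episode of tabular Q-learning from state 0.

    With propagate=True, a hypothetical backward sweep re-applies the
    one-step update along the visited trajectory, letting the terminal
    reward propagate to earlier Q-values in the same episode.
    """
    s, trajectory, done = 0, [], False
    while not done:
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
        s2, r, done = step(s, ACTIONS[a])
        # standard one-step Q-learning update
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        trajectory.append((s, a))
        s = s2
    if propagate:
        # backward sweep (assumed mechanism): without it, one episode only
        # updates the last transition's Q-value; with it, the terminal
        # reward is distributed over every state visited on the way.
        for s, a in reversed(trajectory):
            s2, r, _ = step(s, ACTIONS[a])
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
    return Q

if __name__ == "__main__":
    random.seed(0)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    run_episode(Q, epsilon=1.0)   # one fully exploratory episode
    # Q[0][1] > 0: the terminal reward already reached the start state,
    # whereas plain one-step Q-learning would leave Q[0] at zero here.
    print(Q[0][1] > 0)
```

After a single episode, the sweep has already carried value information back to the start state, which matches the abstract's claim that updating more Q-values early in learning reduces the number of episodes needed.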