Accelerating Deep Q Network by Weighting Experiences

  • Kazuhiro Murakami
  • Koichi Moriyama
  • Atsuko Mutoh
  • Tohgoroh Matsui
  • Nobuhiro Inuzuka
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11301)

Abstract

Deep Q Network (DQN) is a reinforcement learning method that uses deep neural networks to approximate the Q-function. Previous work reports that DQN can select better responses than humans. However, DQN takes a long time to learn appropriate actions from the tuples of state, action, reward, and next state, called “experiences”, that it samples from its memory. DQN samples them uniformly at random, but the stored experiences are skewed: frequent experiences are sampled redundantly while infrequent ones are rarely sampled, which slows learning. This work mitigates the problem by weighting experiences according to their frequency and adjusting their sampling probabilities. In a video game environment, the proposed method learned appropriate responses faster than DQN.
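
The abstract does not state how frequency is measured or how the weights are computed, so the following is only a minimal sketch of the general idea under stated assumptions: each stored experience is assumed to carry a hashable key (hypothetical, for example a discretized state-action pair), and its sampling weight is assumed to be inversely proportional to how often that key occurs in memory. The function names `sampling_probabilities` and `sample_minibatch` are illustrative and are not taken from the paper.

```python
import random
from collections import Counter


def sampling_probabilities(keys):
    """Return one sampling probability per experience.

    ASSUMPTION (not from the paper): the weight of an experience is
    1 / (number of stored experiences sharing its key), so every distinct
    key receives the same total probability mass.
    """
    counts = Counter(keys)
    weights = [1.0 / counts[k] for k in keys]
    total = sum(weights)
    return [w / total for w in weights]


def sample_minibatch(memory, keys, batch_size, rng=random):
    """Draw a minibatch with replacement using the frequency-based weights.

    `memory` is a list of experiences and `keys` is a parallel list of
    hashable keys (hypothetical) used to decide which experiences count
    as "the same" when measuring frequency.
    """
    probs = sampling_probabilities(keys)
    return rng.choices(memory, weights=probs, k=batch_size)


if __name__ == "__main__":
    # Three experiences share key "a"; one rare experience has key "b".
    memory = ["e0", "e1", "e2", "e3"]
    keys = ["a", "a", "a", "b"]
    print(sampling_probabilities(keys))
    print(sample_minibatch(memory, keys, batch_size=8, rng=random.Random(0)))
```

Under uniform sampling the rare experience `e3` would be drawn with probability 1/4; with these assumed weights it is drawn with probability 1/2, the same total as the three frequent experiences combined.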

Keywords

Reinforcement learning, Deep learning

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Kazuhiro Murakami (1)
  • Koichi Moriyama (1)
  • Atsuko Mutoh (1)
  • Tohgoroh Matsui (2)
  • Nobuhiro Inuzuka (1)
  1. Department of Computer Science, Graduate School of Engineering, Nagoya Institute of Technology, Nagoya, Japan
  2. Department of Clinical Engineering, College of Life and Health Sciences, Chubu University, Kasugai, Japan
