
A Bayesian Posterior Updating Algorithm in Reinforcement Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10638)

Abstract

Bayesian reinforcement learning (BRL) is an important approach to reinforcement learning (RL) that uses methods from Bayesian inference to incorporate prior information into the learning process, where the agent interacts directly with its environment without relying on exemplary supervision or a complete model of the environment. BRL represents prior information as probability distributions that quantify uncertainty, and updates these distributions as evidence is collected. However, the expected total discounted reward cannot be observed immediately after each transition the agent executes, which makes it difficult to maintain these distributions online. In this paper, we propose slightly adjusting immediate rewards during Bayesian Q-learning updates by introducing a state-pool technique; when the pool is reset appropriately, this improves the total reward accrued over time. We show experimentally on several fundamental BRL problems that the proposed method achieves substantial improvements over traditional strategies.
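The abstract describes the mechanism only at a high level, so the following is a minimal sketch of what Bayesian Q-learning updating with a periodically reset state pool might look like. It assumes a Normal-Gamma posterior over each Q(s, a) with Thompson sampling for action selection, in the spirit of Dearden et al.'s Bayesian Q-learning, which the abstract builds on; the ChainEnv toy environment, the BONUS reward adjustment, and the POOL_RESET_EVERY reset rule are illustrative assumptions, not the paper's exact algorithm.

```python
# A minimal sketch of Bayesian Q-learning with a state-pool reward
# adjustment. The Normal-Gamma posterior and Thompson sampling follow
# standard Bayesian Q-learning; BONUS, POOL_RESET_EVERY, and ChainEnv
# are hypothetical illustrations, not the paper's exact method.
import math
import random
from collections import defaultdict

GAMMA = 0.95           # discount factor
POOL_RESET_EVERY = 50  # hypothetical reset period for the state pool
BONUS = 0.1            # hypothetical adjustment for states not in the pool


class NormalGamma:
    """Conjugate Normal-Gamma posterior over the mean and precision of Q(s, a)."""

    def __init__(self, mu=0.0, lam=1.0, alpha=1.5, beta=1.0):
        self.mu, self.lam, self.alpha, self.beta = mu, lam, alpha, beta

    def update(self, x):
        """Standard one-observation conjugate update."""
        mu0, lam0 = self.mu, self.lam
        self.mu = (lam0 * mu0 + x) / (lam0 + 1.0)
        self.lam = lam0 + 1.0
        self.alpha += 0.5
        self.beta += 0.5 * lam0 * (x - mu0) ** 2 / (lam0 + 1.0)

    def sample_mean(self):
        """Thompson-style draw of the mean Q-value from the posterior."""
        tau = random.gammavariate(self.alpha, 1.0 / self.beta)  # precision
        return random.gauss(self.mu, 1.0 / math.sqrt(self.lam * tau))


class ChainEnv:
    """Tiny 5-state chain MDP, included only to make the sketch runnable."""

    N = 5

    def reset(self):
        self.s, self.t = 0, 0
        return self.s

    def actions(self, s):
        return (0, 1)  # 0 = retreat to start (small reward), 1 = advance

    def step(self, a):
        self.t += 1
        if a == 1:
            self.s = min(self.s + 1, self.N - 1)
            r = 1.0 if self.s == self.N - 1 else 0.0
        else:
            self.s, r = 0, 0.2
        return self.s, r, self.t >= 20


def run_episode(env, posteriors, pool, steps):
    """One episode of Bayesian Q-learning with the hypothetical state pool."""
    s, done = env.reset(), False
    while not done:
        # Select the action whose posterior draw of the mean Q-value is largest.
        a = max(env.actions(s), key=lambda act: posteriors[(s, act)].sample_mean())
        s2, r, done = env.step(a)

        # Hypothetical state-pool adjustment: slightly boost the immediate
        # reward for states not yet in the pool, then remember the state.
        if s2 not in pool:
            r += BONUS
            pool.add(s2)

        # Use the bootstrapped return as one observation of Q(s, a).
        next_best = 0.0 if done else max(
            posteriors[(s2, b)].mu for b in env.actions(s2))
        posteriors[(s, a)].update(r + GAMMA * next_best)

        steps[0] += 1
        if steps[0] % POOL_RESET_EVERY == 0:
            pool.clear()  # periodic reset, as the abstract suggests
        s = s2


env, posteriors = ChainEnv(), defaultdict(NormalGamma)
pool, steps = set(), [0]
for _ in range(200):
    run_episode(env, posteriors, pool, steps)
print({k: round(v.mu, 2) for k, v in sorted(posteriors.items())})
```

Under these assumptions, the reset period controls how long the novelty adjustment keeps perturbing the posterior updates; the paper's contribution concerns when such resets improve the reward accrued over time.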



Acknowledgments

This work was partly supported by NSFC grants 61375005, U1613213, and 61210009, MOST grants 2015BAK35B00 and 2015BAK35B01, and Guangdong Science and Technology Department grant 2016B090910001.

Author information


Corresponding author

Correspondence to Zhiyong Liu.



Copyright information

© 2017 Springer International Publishing AG

About this paper


Cite this paper

Xiong, F., Liu, Z., Yang, X., Sun, B., Chiu, C., Qiao, H. (2017). A Bayesian Posterior Updating Algorithm in Reinforcement Learning. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10638. Springer, Cham. https://doi.org/10.1007/978-3-319-70139-4_42


  • DOI: https://doi.org/10.1007/978-3-319-70139-4_42


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70138-7

  • Online ISBN: 978-3-319-70139-4

  • eBook Packages: Computer Science, Computer Science (R0)
