Abstract
Bayesian reinforcement learning (BRL) is an important approach to reinforcement learning (RL) that uses methods from Bayesian inference to incorporate prior information into the learning process while the agent interacts directly with the environment, without depending on exemplary supervision or a complete model of the environment. BRL expresses prior information as probability distributions that quantify uncertainty, and updates these distributions as evidence is collected. However, the expected total discounted reward cannot be observed immediately after each transition the agent executes, which makes maintaining these distributions difficult. In this paper, we propose a novel method that slightly adjusts immediate rewards during Bayesian Q-learning updates by introducing a state pool technique, which can improve the total reward accrued over time when the pool is reset appropriately. We show experimentally on several fundamental BRL problems that the proposed method achieves substantial improvements over traditional strategies.
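The abstract does not spell out the update equations, but the core ingredients can be illustrated with a minimal sketch: a normal-gamma posterior over each state-action return, as in classic Bayesian Q-learning, combined with a state pool that slightly adjusts immediate rewards and is periodically reset. Everything below is an illustrative assumption rather than the paper's actual algorithm; in particular, the class name BayesianQLearner, the pool_limit parameter, and the 1.05 adjustment factor are hypothetical placeholders.

```python
import random
from collections import defaultdict

class BayesianQLearner:
    """Sketch of Bayesian Q-learning with a normal-gamma posterior over
    the return of each (state, action) pair, plus a hypothetical state
    pool that nudges immediate rewards (the paper's exact rule is not
    given in the abstract)."""

    def __init__(self, actions, gamma=0.95, pool_limit=50):
        self.actions = actions
        self.gamma = gamma
        # Normal-gamma hyperparameters (mu0, lam, alpha, beta) per (s, a).
        self.post = defaultdict(lambda: (0.0, 1.0, 1.0, 1.0))
        self.pool = []              # recently visited states
        self.pool_limit = pool_limit

    def sample_q(self, s, a):
        # Thompson-style draw: sample a mean return from the posterior.
        mu0, lam, alpha, beta = self.post[(s, a)]
        precision = random.gammavariate(alpha, 1.0 / beta)
        return random.gauss(mu0, (1.0 / (lam * precision)) ** 0.5)

    def act(self, s):
        # Pick the action whose sampled Q-value is largest.
        return max(self.actions, key=lambda a: self.sample_q(s, a))

    def update(self, s, a, r, s_next):
        # Hypothetical state-pool adjustment: slightly boost the reward
        # for states revisited while the pool is active, then reset the
        # pool once it grows past its limit ("resets appropriately").
        if s in self.pool:
            r *= 1.05  # illustrative adjustment factor (assumption)
        self.pool.append(s)
        if len(self.pool) > self.pool_limit:
            self.pool.clear()

        # Bootstrapped return target from the posterior means, followed
        # by the standard one-observation normal-gamma conjugate update.
        target = r + self.gamma * max(
            self.post[(s_next, b)][0] for b in self.actions)
        mu0, lam, alpha, beta = self.post[(s, a)]
        mu_new = (lam * mu0 + target) / (lam + 1.0)
        beta_new = beta + lam * (target - mu0) ** 2 / (2.0 * (lam + 1.0))
        self.post[(s, a)] = (mu_new, lam + 1.0, alpha + 0.5, beta_new)
```

Sampling a Q-value from each posterior and acting greedily on the samples is one common way to turn these distributions into exploration: actions with uncertain returns are occasionally sampled high and tried, while well-estimated poor actions are quickly ruled out.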
Acknowledgments
This work is partly supported by NSFC grants 61375005, U1613213, and 61210009, MOST grants 2015BAK35B00 and 2015BAK35B01, and Guangdong Science and Technology Department grant 2016B090910001.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Xiong, F., Liu, Z., Yang, X., Sun, B., Chiu, C., Qiao, H. (2017). A Bayesian Posterior Updating Algorithm in Reinforcement Learning. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10638. Springer, Cham. https://doi.org/10.1007/978-3-319-70139-4_42
DOI: https://doi.org/10.1007/978-3-319-70139-4_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70138-7
Online ISBN: 978-3-319-70139-4