On Stable Profit Sharing Reinforcement Learning with Expected Failure Probability
In this paper, Expected Success Probability (ESP) is defined, and a reinforcement learning method, Stable Profit Sharing with Expected Failure Probability (SPSwithEFP), is proposed. In SPSwithEFP, Expected Failure Probability (EFP) is used in the roulette wheel selection of actions, and ESP is used in the update equation for the weight of a rule. EFP allows risky actions to be discarded, and ESP narrows the distribution of learned results. The effectiveness of the method is shown through simulation experiments in a maze environment with pitfalls.
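The two roles described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract gives no equations, so the threshold `efp_threshold`, the fallback when every action is risky, the geometric `decay`, and the choice to scale each credit by ESP are all assumptions made for illustration, as are the function names `roulette_select` and `reinforce`.

```python
import random


def roulette_select(weights, efp, efp_threshold=0.5, rng=random):
    """Roulette wheel selection over actions that are not considered risky.

    Assumption: an action is discarded when its EFP exceeds efp_threshold.
    """
    safe = [a for a in range(len(weights)) if efp[a] <= efp_threshold]
    if not safe:
        # Assumed fallback: if every action is risky, take the least risky one.
        return min(range(len(weights)), key=lambda a: efp[a])
    total = sum(weights[a] for a in safe)
    if total <= 0:
        return rng.choice(safe)
    # Spin the wheel: each safe action owns a slice proportional to its weight.
    r = rng.uniform(0, total)
    acc = 0.0
    for a in safe:
        acc += weights[a]
        if r <= acc:
            return a
    return safe[-1]


def reinforce(weights, episode, reward, esp, decay=0.5):
    """Profit-sharing style update along an episode of (state, action) pairs.

    Assumption: credit decays geometrically going backward from the reward,
    and each rule's credit is scaled by its ESP.
    """
    credit = reward
    for state, action in reversed(episode):
        weights[state][action] += esp[state][action] * credit
        credit *= decay
```

In this sketch, calling `roulette_select([1.0, 5.0, 2.0], [0.1, 0.9, 0.2])` can only return action 0 or 2, because action 1 has an EFP above the assumed threshold even though its weight is the largest.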
Keywords: Reinforcement learning, XoL, Profit Sharing, EFP
This work was supported by JSPS KAKENHI Grant Number 17K00327.