Proposal of Exploitation-Oriented Learning PS-r#
Exploitation-oriented Learning (XoL) is a novel approach to goal-directed learning from interaction. Whereas reinforcement learning focuses on learning an optimal policy and can guarantee optimality in Markov Decision Process (MDP) environments, XoL aims to learn, very quickly, a rational policy, one whose expected reward per action is greater than zero. PS-r* is one of the XoL methods; it can learn a useful rational policy that is not inferior to a random walk in Partially Observable Markov Decision Process (POMDP) environments with a single type of reward. However, PS-r* requires O(MN^2) memory, where N and M are the numbers of sensory input types and action types, respectively. In this paper, we propose PS-r#, which can learn a useful rational policy in such POMDP environments with O(MN) memory. We confirm the effectiveness of PS-r# in numerical examples.
Keywords: Reinforcement Learning, Sensory Input, Rational Policy, Markov Decision Process, Partially Observable Markov Decision Process
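As a rough illustration of the memory argument in the abstract, the sketch below implements a basic Profit Sharing-style episodic update over a single (observation, action) weight table, which is O(MN) in size for N sensory input types and M actions. It is not the PS-r# algorithm itself, whose details appear in the body of the paper; the class name, the geometric decay of the reinforcement, and the greedy tie-breaking rule are illustrative assumptions.

```python
import random
from collections import defaultdict

# Minimal Profit Sharing-style learner (illustrative sketch, not PS-r#).
# Memory footprint: one weight per (observation, action) pair, i.e. O(MN).
class ProfitSharingAgent:
    def __init__(self, actions, decay=0.5):
        self.actions = list(actions)       # M available actions
        self.decay = decay                 # assumed geometric reinforcement decay
        self.weights = defaultdict(float)  # keyed by (observation, action): O(MN) entries
        self.episode = []                  # (observation, action) trace of the current episode

    def act(self, observation):
        # Greedy on learned weights; fall back to a random action when nothing is learned yet.
        best = max(self.actions, key=lambda a: self.weights[(observation, a)])
        action = best if self.weights[(observation, best)] > 0 else random.choice(self.actions)
        self.episode.append((observation, action))
        return action

    def reinforce(self, reward):
        # Distribute the received reward backwards along the episode trace,
        # geometrically decreasing with distance from the rewarded step.
        credit = reward
        for obs, action in reversed(self.episode):
            self.weights[(obs, action)] += credit
            credit *= self.decay
        self.episode.clear()
```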