Evaluation of the Improved Penalty Avoiding Rational Policy Making Algorithm in Real World Environment
We focus on the potential of Exploitation-oriented Learning (XoL) in non-Markov multi-agent environments. XoL guarantees a degree of rationality in non-Markov environments, and its effectiveness has been confirmed by computer simulations. The Penalty Avoiding Rational Policy Making algorithm (PARP), one of the XoL methods, was designed to learn a penalty-avoiding policy. PARP has since been improved to save memory and to cope with uncertainty; the result is called Improved PARP. Although the effectiveness of Improved PARP has been confirmed in computer simulations, it has not yet been demonstrated in a real-world environment. In this paper, we show the effectiveness of Improved PARP in a real-world environment using the keepaway task, a testbed for multi-agent soccer.
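As a rough illustration of the penalty-avoiding idea described above, the following sketch masks out state-action rules that have been observed to lead directly to a penalty and otherwise follows actions reinforced on rewarded episodes. All class, method, state, and action names here are hypothetical and chosen for illustration; this is a minimal sketch of the general principle, not the authors' implementation of Improved PARP.

```python
# Illustrative sketch (not the authors' code): rules (state, action) that led
# directly to a penalty are marked and never reused; among the remaining safe
# actions, the agent prefers one reinforced on a rewarded episode.
import random


class PenaltyAvoidingAgent:
    def __init__(self, actions):
        self.actions = list(actions)
        self.penalty_rules = set()  # (state, action) pairs known to cause a penalty
        self.preferred = {}         # state -> action reinforced by reward episodes

    def mark_penalty(self, state, action):
        """Record a rule that directly led to a penalty so it is avoided."""
        self.penalty_rules.add((state, action))

    def reinforce(self, state, action):
        """Remember an action that appeared on a rewarded episode."""
        self.preferred[state] = action

    def act(self, state):
        """Choose the reinforced action if it is safe, else a random safe action."""
        safe = [a for a in self.actions if (state, a) not in self.penalty_rules]
        if not safe:  # every action penalized: no safe choice remains
            return random.choice(self.actions)
        a = self.preferred.get(state)
        return a if a in safe else random.choice(safe)


# Toy keepaway-flavored usage (states and actions are invented for this example):
agent = PenaltyAvoidingAgent(["hold", "pass_near", "pass_far"])
agent.mark_penalty("taker_close", "hold")    # holding under pressure lost the ball
agent.reinforce("taker_close", "pass_near")  # a near pass kept possession
print(agent.act("taker_close"))
```

In this toy run, `act("taker_close")` returns `"pass_near"`, since it is both reinforced and not a penalty rule; the masking step is what distinguishes penalty avoidance from plain reward-driven selection.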
Keywords: Reinforcement Learning; Exploitation-oriented Learning; Keepaway Task; Soccer Robot