Evaluation of the Improved Penalty Avoiding Rational Policy Making Algorithm in Real World Environment

  • Kazuteru Miyazaki
  • Masaki Itou
  • Hiroaki Kobayashi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7196)

Abstract

We focus on the potential of Exploitation-oriented Learning (XoL) in non-Markov multi-agent environments. XoL retains some degree of rationality in non-Markov environments, and its effectiveness has been confirmed by computer simulations. The Penalty Avoiding Rational Policy Making algorithm (PARP), one of the XoL methods, was designed to learn a penalty avoiding policy. PARP was later improved to reduce memory usage and to cope with uncertainty; the result is called Improved PARP. Although the effectiveness of Improved PARP has been confirmed in computer simulations, it has not yet been evaluated in a real world environment. In this paper, we show the effectiveness of Improved PARP in a real world environment using the keepaway task, a testbed for multi-agent soccer environments.
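The abstract describes PARP only at a high level. To illustrate what "learning a penalty avoiding policy" can mean, the sketch below marks "penalty rules" (state-action pairs to avoid) in a discrete state space. It assumes a simplified judgment: a rule is a penalty rule if it was directly penalized or has been observed to lead to a penalty state, and a state is a penalty state once every action in it is a penalty rule. It is a minimal sketch under these assumptions, not the authors' Improved PARP. The names `judge_penalty_rules` and `choose_action`, the data layout, and the weight-based choice among non-penalty rules are all hypothetical. Improved PARP's memory reduction, uncertainty handling, and treatment of the continuous keepaway state space are not modeled.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Rule = Tuple[Hashable, Hashable]  # hypothetical encoding: (state, action)


def judge_penalty_rules(
    states: Iterable[Hashable],
    actions: Iterable[Hashable],
    transitions: Dict[Rule, Set[Hashable]],
    penalized: Set[Rule],
) -> Tuple[Set[Rule], Set[Hashable]]:
    """Propagate penalty marks until nothing changes (simplified, assumed scheme).

    transitions maps each observed rule to the set of next states seen after it;
    penalized holds rules that directly received a penalty.
    """
    states = list(states)
    actions = list(actions)
    penalty_rules: Set[Rule] = set(penalized)
    penalty_states: Set[Hashable] = set()
    changed = True
    while changed:  # monotone over finite sets, so this terminates
        changed = False
        # A state is a penalty state when every action in it is a penalty rule.
        for s in states:
            if s not in penalty_states and all((s, a) in penalty_rules for a in actions):
                penalty_states.add(s)
                changed = True
        # A rule becomes a penalty rule if it has been seen to reach a penalty state.
        for rule, next_states in transitions.items():
            if rule not in penalty_rules and next_states & penalty_states:
                penalty_rules.add(rule)
                changed = True
    return penalty_rules, penalty_states


def choose_action(
    state: Hashable,
    actions: List[Hashable],
    penalty_rules: Set[Rule],
    weights: Dict[Rule, float],
) -> Hashable:
    """Prefer non-penalty rules; among them take the highest weight (assumed rule)."""
    safe = [a for a in actions if (state, a) not in penalty_rules]
    candidates = safe or list(actions)  # fall back if every rule is a penalty rule
    return max(candidates, key=lambda a: weights.get((state, a), 0.0))
```

For example, if both actions in a hypothetical state `B` were penalized and `("A", "pass")` has led to `B`, the sketch marks `B` as a penalty state and `("A", "pass")` as a penalty rule, so in `A` it picks `"hold"` even when `"pass"` carries a larger weight. The fixed-point loop keeps the marking independent of the order in which rules are checked.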

Keywords

Reinforcement Learning, Exploitation-oriented Learning, Keepaway Task, Soccer Robot

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Kazuteru Miyazaki, National Institution for Academic Degrees and University Evaluation, Japan
  • Masaki Itou, Toshiba Tec Corporation, Japan
  • Hiroaki Kobayashi, Meiji University, Japan

Personalised recommendations