
Development of self-learning vision-based mobile robots for acquiring soccer robots behaviors

  • Takayuki Nakamura
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1395)

Abstract

The input generalization problem is one of the most important issues in applying reinforcement learning to real robot tasks. To cope with this problem, we propose a self-partitioning state space algorithm that constructs a non-uniform quantization of a multidimensional continuous state space. The method recursively splits the continuous state space into coarse regions called tentative states. Initially, these tentative states are treated as the states for Q-learning. The algorithm collects Q values and statistical evidence about the immediate rewards r and Q values observed within each tentative state. When a statistical test based on the minimum description length criterion indicates that a tentative state is relevant, the algorithm partitions that coarse region into finer ones. This procedure yields a non-uniform quantization of the state space. Because Q-learning is used to find the optimal policy for accomplishing the given task, the method can be applied to non-deterministic domains. To show that the algorithm has generalization capability, we apply it to two tasks in which a soccer robot shoots a ball into the goal and prevents a ball from entering its own goal. Experimental results from both computer simulation and a real robot demonstrate the validity of the method.
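The following is a minimal sketch of the idea described above, assuming a one-dimensional state in [0, 1) and a small discrete action set. The class names, parameters, and the variance-based split test are illustrative assumptions standing in for the paper's MDL-based statistical test, not the authors' exact implementation.

```python
# Sketch: Q-learning over "tentative states" that split themselves when the
# evidence collected inside them is too mixed (a proxy for the MDL test).
from dataclasses import dataclass, field

@dataclass
class TentativeState:
    low: float                                   # lower bound of the covered interval
    high: float                                  # upper bound (half-open interval)
    q: dict = field(default_factory=dict)        # action -> Q value
    rewards: dict = field(default_factory=dict)  # action -> observed immediate rewards

class SelfPartitioningQ:
    def __init__(self, actions, alpha=0.1, gamma=0.9,
                 min_samples=30, var_threshold=0.05):
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.min_samples = min_samples        # evidence required before testing a split
        self.var_threshold = var_threshold    # assumed stand-in for the MDL criterion
        self.states = [TentativeState(0.0, 1.0)]  # start with one coarse tentative state

    def state_of(self, x):
        """Return the tentative state whose interval contains x."""
        return next(s for s in self.states if s.low <= x < s.high)

    def q_value(self, s, a):
        return s.q.get(a, 0.0)

    def update(self, x, a, r, x_next):
        """Standard Q-learning update over the current (coarse) quantization."""
        s, s_next = self.state_of(x), self.state_of(x_next)
        best_next = max(self.q_value(s_next, b) for b in self.actions)
        s.q[a] = self.q_value(s, a) + self.alpha * (
            r + self.gamma * best_next - self.q_value(s, a))
        s.rewards.setdefault(a, []).append(r)
        self.maybe_split(s)

    def maybe_split(self, s):
        """Split a tentative state when its reward statistics look inconsistent."""
        samples = [r for rs in s.rewards.values() for r in rs]
        if len(samples) < self.min_samples:
            return
        mean = sum(samples) / len(samples)
        var = sum((r - mean) ** 2 for r in samples) / len(samples)
        if var > self.var_threshold:
            mid = (s.low + s.high) / 2.0
            # Children inherit the parent's Q values and refine them from here on.
            left = TentativeState(s.low, mid, q=dict(s.q))
            right = TentativeState(mid, s.high, q=dict(s.q))
            self.states.remove(s)
            self.states.extend([left, right])
```

In this sketch, regions of the state space where rewards are consistent stay coarse, while regions where the evidence conflicts are subdivided, which is the non-uniform quantization behavior the abstract describes.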

Key Words

Self-organizing algorithm · Reinforcement learning · Vision-based mobile robots · Soccer robots



Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Takayuki Nakamura¹
  1. Dept. of Information Systems, Nara Inst. of Science and Technology, Ikoma, Nara, Japan
