Machine Learning, Volume 23, Issue 2–3, pp 279–303

Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning

  • Minoru Asada
  • Shoichi Noda
  • Sukoya Tawaratsumida
  • Koh Hosoda


This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. We discuss several issues in applying reinforcement learning to a real robot with a vision sensor, through which the robot obtains information about changes in its environment. First, we construct a state space in terms of the size, position, and orientation of a ball and a goal in an image, and we design an action space in terms of the action commands sent to the left and right motors of a mobile robot. Constructing state and action spaces that reflect the outputs of physical sensors and actuators, respectively, causes a state-action deviation problem. To deal with this issue, the action set is constructed so that one action consists of a series of the same action primitive, executed successively until the current state changes. Next, to shorten the learning time, a mechanism of Learning from Easy Missions (LEM) is implemented. LEM reduces the learning time from exponential to almost linear order in the size of the state space. The results of computer simulations and real robot experiments are given.
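The action construction described in the abstract (one action is the same primitive repeated until the observed state changes) can be illustrated with a minimal sketch. This is not the authors' implementation: the `ToyCorridor` world, the `run_until_state_change` helper, and the `max_repeats` safety cap are all hypothetical names and choices, used only to show the idea in a 1-D setting where the fine-grained position changes on every step but the coarse, observed state changes only occasionally.

```python
from typing import Callable, Hashable, Tuple


def run_until_state_change(
    step: Callable[[int], Hashable],
    current_state: Hashable,
    primitive: int,
    max_repeats: int = 50,
) -> Tuple[Hashable, int]:
    """Repeat one action primitive until the observed state changes.

    Returns the new state and how many times the primitive was executed.
    max_repeats is an assumed safety cap, not something from the paper.
    """
    repeats = 0
    state = current_state
    while state == current_state and repeats < max_repeats:
        state = step(primitive)
        repeats += 1
    return state, repeats


class ToyCorridor:
    """Hypothetical 1-D world with a fine position and a coarse observed state."""

    def __init__(self, cell_width: int = 5) -> None:
        self.position = 0
        self.cell_width = cell_width

    def observe(self) -> int:
        # Coarse state: which cell the fine-grained position falls in.
        return self.position // self.cell_width

    def step(self, primitive: int) -> int:
        # One primitive moves the position by a small amount.
        self.position += primitive
        return self.observe()


env = ToyCorridor()
new_state, repeats = run_until_state_change(env.step, env.observe(), primitive=1)
```

In this toy world a single primitive usually leaves the coarse state unchanged, which is a simplified picture of the state-action deviation the abstract mentions; grouping repeated primitives into one action means every action seen by the learner produces a state transition.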

Keywords: reinforcement learning; vision; learning from easy mission; state-action deviation



Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Minoru Asada (1)
  • Shoichi Noda (1)
  • Sukoya Tawaratsumida (1)
  • Koh Hosoda (1)

  1. Dept. of Mech. Eng. for Computer-Controlled Machinery, Osaka University, Osaka 565, Japan
