Autonomous Robots, Volume 4, Issue 1, pp 73–83

Reinforcement Learning in the Multi-Robot Domain

  • Maja J. Matarić


This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as the complex, concurrent multi-robot learning domain. The methodology minimizes the learning space through the use of behaviors and conditions, and addresses the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task.
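The idea of shaped reinforcement over conditions and behaviors can be illustrated with a minimal Python sketch. It is not the paper's implementation: the behavior names, event names, reward magnitudes, the two-element condition tuple, and the simple summation rule are all assumptions made for illustration. It only shows how event-driven (heterogeneous) rewards and a progress estimator might be combined into one signal that is credited to a condition-behavior pair.

```python
from collections import defaultdict

# Hypothetical behavior set, chosen only for illustration.
BEHAVIORS = ("wander", "home", "rest")


def event_reward(event):
    """Heterogeneous reinforcement: distinct signals for distinct sub-goal events.

    Event names and magnitudes are assumptions, not values from the paper.
    """
    rewards = {
        "grabbed_puck": 1.0,
        "dropped_puck_at_home": 3.0,
        "dropped_puck_away": -3.0,
    }
    return rewards.get(event, 0.0)


def homing_progress(prev_dist, dist, carrying):
    """Progress estimator: intermediate feedback for nearing home with a puck."""
    if not carrying or dist == prev_dist:
        return 0.0
    return 0.5 if dist < prev_dist else -0.5


class ConditionBehaviorLearner:
    """Accumulates shaped reinforcement per (condition, behavior) pair."""

    def __init__(self):
        # (condition, behavior) -> summed reinforcement received so far
        self.value = defaultdict(float)

    def update(self, condition, behavior, event, prev_dist, dist):
        # condition is an assumed tuple: (carrying_puck, at_home)
        carrying = condition[0]
        r = event_reward(event) + homing_progress(prev_dist, dist, carrying)
        self.value[(condition, behavior)] += r
        return r

    def best_behavior(self, condition):
        # Greedy choice over the small, discrete behavior set
        return max(BEHAVIORS, key=lambda b: self.value[(condition, b)])


learner = ConditionBehaviorLearner()
carrying_away_from_home = (True, False)
learner.update(carrying_away_from_home, "home", None, 5.0, 4.0)
learner.update(carrying_away_from_home, "wander", None, 4.0, 6.0)
print(learner.best_behavior(carrying_away_from_home))
```

Because the learner works over a handful of conditions and behaviors rather than raw sensor states and motor commands, the table stays small, and the progress estimator supplies feedback between the sparse goal events.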

Keywords: robotics, robot learning, group behavior, multi-agent systems, reinforcement learning





Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Maja J. Matarić
    • Volen Center for Complex Systems, Computer Science Department, Brandeis University, Waltham
