
A Reinforcement Learning Algorithm in Cooperative Multi-Robot Domains

Published in: Journal of Intelligent and Robotic Systems

Abstract

Reinforcement learning has been widely applied to a diverse set of learning tasks, from board games to robot behaviours. In some of these, results have been very successful, but other tasks present characteristics that make reinforcement learning harder to apply. One such area is multi-robot learning, which raises two important problems. The first is credit assignment: how to define the reinforcement signal delivered to each robot in a cooperative team as a function of the results achieved by the whole team. The second is working in large domains, where the amount of data can be large and can differ at each step of learning. This paper studies both issues in a multi-robot environment, showing that domain knowledge and machine learning algorithms can be combined to achieve successful cooperative behaviours.
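The credit-assignment problem mentioned above can be made concrete with a small sketch. The Python fragment below is illustrative only and is not taken from the paper: the function names, the action set, and the "share the global team reward" scheme are assumptions used to show one common way of reinforcing every robot in a cooperative team with the same team-level signal, using a tabular Q-learning update.

```python
# Illustrative sketch (not the paper's algorithm): each robot keeps its own
# Q-table, but all robots are updated with the same global team reward.

from collections import defaultdict
import random

ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # exploration probability
ACTIONS = ["forward", "turn_left", "turn_right", "stop"]  # hypothetical action set


def make_q_table():
    """One Q-table per robot; unseen (state, action) pairs default to 0.0."""
    return defaultdict(float)


def choose_action(q_table, state):
    """Epsilon-greedy action selection over the hypothetical action set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])


def team_update(q_tables, states, actions, team_reward, next_states):
    """Q-learning update in which every robot receives the same team reward.

    This global-reward scheme is only one possible answer to the credit
    assignment problem; the paper's own reinforcement signal is not
    reproduced here.
    """
    for q, s, a, s_next in zip(q_tables, states, actions, next_states):
        best_next = max(q[(s_next, b)] for b in ACTIONS)
        q[(s, a)] += ALPHA * (team_reward + GAMMA * best_next - q[(s, a)])
```

A per-robot (local) reward is the usual alternative to this global scheme; the choice between them is exactly the credit-assignment question the abstract raises.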


Author information

Corresponding author

Correspondence to Fernando Fernández.

Additional information

Fernando Fernández: This work has been partially funded by a grant from the Spanish Science and Technology Department.

Daniel Borrajo: This work has been partially funded by grants from the Spanish Science and Technology Department, numbers TAP1999-0535-C02-02 and TIC2002-04146-C05-05.

About this article

Cite this article

Fernández, F., Borrajo, D. & Parker, L.E. A Reinforcement Learning Algorithm in Cooperative Multi-Robot Domains. J Intell Robot Syst 43, 161–174 (2005). https://doi.org/10.1007/s10846-005-5137-x
