Abstract
Reinforcement learning has been applied to a diverse set of learning tasks, from board games to robot behaviours. In some of these tasks the results have been very successful, but others have characteristics that make reinforcement learning harder to apply. One such area is multi-robot learning, which poses two important problems. The first is credit assignment: how to define the reinforcement signal for each robot in a cooperative team, given the results achieved by the team as a whole. The second is working with large domains, where the amount of data can be large and can vary from one moment of a learning step to the next. This paper studies both issues in a multi-robot environment, showing that domain knowledge and machine learning algorithms can be combined to achieve successful cooperative behaviours.
Additional information
Fernando Fernández: This work has been partially funded by a grant from the Spanish Science and Technology Department.
Daniel Borrajo: This work has been partially funded by grants from the Spanish Science and Technology Department, numbers TAP1999-0535-C02-02 and TIC2002-04146-C05-05.
Cite this article
Fernández, F., Borrajo, D. & Parker, L.E. A Reinforcement Learning Algorithm in Cooperative Multi-Robot Domains. J Intell Robot Syst 43, 161–174 (2005). https://doi.org/10.1007/s10846-005-5137-x