Autonomous Robots, Volume 40, Issue 8, pp 1441–1457

A robust approach to robot team learning

  • Justin Girard
  • M. Reza Emami


The paper achieves two outcomes. First, it summarizes previous work on concurrent Markov decision processes (CMDPs), which have so far been demonstrated on multi-agent foraging problems. When using CMDPs, each agent models the environment with two Markov decision processes (MDPs): one models a single-agent foraging problem and the other a multi-agent task allocation problem. Together, these two MDPs characterize the multi-agent foraging problem from each agent's perspective. Second, the paper studies the effects of state uncertainty on a heterogeneous robot team that uses this CMDP modelling approach, and it presents a method to maintain performance despite state uncertainty. The resulting robust concurrent individual and social learning (RCISL) mechanism improves team learning behaviour under state uncertainty. The paper analyzes the performance of the concurrent individual and social learning (CISL) mechanism with and without a particle filter in a heterogeneous foraging scenario. The RCISL mechanism yields statistically significant performance improvements over the CISL mechanism.
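
As an illustration only, the following Python sketch shows one way the two ideas in the abstract could fit together. The abstract does not specify the learning algorithm, the state representation or the filter design, so everything here is an assumption. Tabular Q-learning stands in for the solver of each MDP. The robot's state is reduced to a 1-D position with Gaussian motion and sensor noise. `QLearner`, `ConcurrentAgent` and `particle_filter_step` are hypothetical names. Each agent holds two learners: one for task allocation (which item to pursue) and one for single-agent foraging (how to move). A bootstrap particle filter replaces the noisy sensor reading with a filtered estimate before that estimate is discretized into an MDP state.

```python
import math
import random
from collections import defaultdict


class QLearner:
    """Tabular one-step Q-learning (assumed solver; not taken from the paper)."""

    def __init__(self, actions, alpha=0.2, gamma=0.9, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated return

    def act(self, state, rng):
        # Epsilon-greedy action selection over the current Q-values.
        if rng.random() < self.epsilon:
            return rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Move Q(s, a) towards r + gamma * max_b Q(s', b).
        best_next = max(self.q[(next_state, b)] for b in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


class ConcurrentAgent:
    """Hypothetical agent holding two concurrent learners:
    `social` picks which item to pursue (task allocation MDP),
    `individual` picks a move (single-agent foraging MDP)."""

    def __init__(self, n_items, moves):
        self.social = QLearner(range(n_items))
        self.individual = QLearner(moves)


def particle_filter_step(particles, control, measurement, rng,
                         motion_sd=0.1, meas_sd=0.5):
    """One predict/weight/resample cycle for a 1-D position (assumed model)."""
    # Predict: propagate every particle through a noisy motion model.
    moved = [p + control + rng.gauss(0.0, motion_sd) for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_sd) ** 2) for p in moved]
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)  # degenerate case: fall back to uniform
    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    estimate = sum(resampled) / len(resampled)
    return resampled, estimate


# Demo: a robot moves +0.5 per step along a line, and its position sensor is noisy.
rng = random.Random(0)
true_x = 0.0
particles = [rng.uniform(-5.0, 5.0) for _ in range(500)]
for _ in range(20):
    true_x += 0.5
    z = true_x + rng.gauss(0.0, 0.5)  # noisy reading (state uncertainty)
    particles, x_hat = particle_filter_step(particles, 0.5, z, rng)

# The filtered estimate, not the raw reading, is discretized into the MDP state.
agent = ConcurrentAgent(n_items=3, moves=(-1, 1))
state = round(x_hat)
item = agent.social.act(state, rng)
move = agent.individual.act((state, item), rng)
agent.individual.update((state, item), move, reward=1.0,
                        next_state=(state + move, item))
print(f"true={true_x:.2f} estimate={x_hat:.2f} item={item} move={move}")
```

This sketch illustrates one design choice: it feeds the learners the filtered estimate rather than the raw reading `z`, so sensor noise is less likely to push the discretized state into the wrong cell. The abstract does not state which state variables RCISL filters (robot pose, item locations or others).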


Keywords: Robot team learning; Markov decision process; State uncertainty; Particle filters

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Institute for Aerospace Studies, University of Toronto, Toronto, Canada
  2. Division of Space Technology, Luleå University of Technology, Luleå, Sweden
