A robust approach to robot team learning


This paper makes two contributions. First, it summarizes previous work on concurrent Markov decision processes (CMDPs), which have so far been demonstrated on multi-agent foraging problems. With CMDPs, each agent models the environment using two Markov decision processes (MDPs). For each agent, one MDP models a single-agent foraging problem and the other models a multi-agent task allocation problem; together they characterize the multi-agent foraging problem. Second, the paper studies the effects of state uncertainty on a heterogeneous robot team that uses this CMDP modelling approach, and it presents a method to maintain performance despite that uncertainty. The resulting robust concurrent individual and social learning (RCISL) mechanism improves team learning behaviour under state uncertainty. The paper analyzes the performance of the concurrent individual and social learning (CISL) mechanism with and without a particle filter in a heterogeneous foraging scenario. The RCISL mechanism confers statistically significant performance improvements over the CISL mechanism.
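To illustrate how a particle filter can reduce state uncertainty, the following is a minimal sketch of a generic bootstrap particle filter that tracks a robot's position along a line from noisy readings. It is not the paper's RCISL implementation: the one-dimensional state, the Gaussian motion and sensor models, the noise levels, and all function names (`particle_filter_step`, `estimate`, `run_demo`) are assumptions chosen for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used as the measurement likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, control, measurement, rng,
                         motion_std=0.1, sensor_std=0.5):
    """One predict/update/resample cycle of a bootstrap particle filter."""
    # Predict: move every particle by the commanded displacement plus noise.
    predicted = [p + control + rng.gauss(0.0, motion_std) for p in particles]
    # Update: weight each particle by how well it explains the measurement.
    weights = [gaussian_pdf(measurement, p, sensor_std) for p in predicted]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Point estimate of the state: the mean of the particle set."""
    return sum(particles) / len(particles)


def run_demo(steps=30, n_particles=500, seed=0):
    """Track a robot moving along a line from noisy position readings."""
    rng = random.Random(seed)
    true_pos = 2.0
    # Start with no knowledge: particles spread uniformly over the corridor.
    particles = [rng.uniform(0.0, 10.0) for _ in range(n_particles)]
    for _ in range(steps):
        control = 0.2  # assumed commanded displacement per step
        true_pos += control + rng.gauss(0.0, 0.1)
        measurement = true_pos + rng.gauss(0.0, 0.5)
        particles = particle_filter_step(particles, control, measurement, rng)
    return true_pos, estimate(particles)


if __name__ == "__main__":
    true_pos, est = run_demo()
    print(f"true position {true_pos:.2f}, estimate {est:.2f}")
```

In a learning setting such as the one the abstract describes, the filtered estimate rather than the raw measurement would presumably be passed to the learner as the agent's state. How the paper couples the filter to the CMDPs is not specified in the abstract.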



Corresponding author

Correspondence to M. Reza Emami.

About this article

Cite this article

Girard, J., Emami, M.R. A robust approach to robot team learning. Auton Robot 40, 1441–1457 (2016).

Keywords

  • Robot team learning
  • Markov decision process
  • State uncertainty
  • Particle filters

Mathematics Subject Classification

  • 68T40