Journal of Intelligent & Robotic Systems, Volume 74, Issue 1–2, pp 529–544

Distributed Learning for Planning Under Uncertainty Problems with Heterogeneous Teams

Scaling Up the Multiagent Planning with Distributed Learning and Approximate Representations
  • N. Kemal Ure
  • Girish Chowdhary
  • Yu Fan Chen
  • Jonathan P. How
  • John Vian


This paper considers the problem of multiagent sequential decision making under uncertainty and incomplete knowledge of the state transition model. A distributed learning framework is proposed in which each agent learns an individual model and shares the results with the team. The challenges associated with this approach include choosing the model representation for each agent and deciding how to share these representations effectively under limited communication. A decentralized extension of the Incremental Feature Dependency Discovery (iFDD) model learning scheme, Dec-iFDD, is presented to address the distributed learning problem. The representation selection problem is solved by leveraging iFDD’s property of adjusting the model complexity based on the observed data. The model sharing problem is addressed by having each agent rank the features of its representation based on the model reduction error and broadcast the most relevant features to its teammates. The algorithm is tested on the multiagent block building and the persistent search and track missions. The results show that the proposed distributed learning scheme is particularly useful in heterogeneous learning settings, where each agent learns a significantly different model. We show through large-scale planning under uncertainty simulations and flight experiments with state-dependent actuator and fuel-burn-rate uncertainty that our planning approach can outperform planners that do not account for heterogeneity between agents.
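The abstract describes Dec-iFDD in two steps: each agent grows its own representation with iFDD, adding conjunctions of binary features where the observed data call for them, and then ranks its features and broadcasts the most relevant ones to teammates. The Python sketch below illustrates both steps under stated assumptions; it is not the authors' implementation. The names `IFDDModel`, `top_features`, and `receive` are hypothetical. The sketch assumes each agent learns a scalar model quantity (for example, an action's success probability) as a linear function of binary features, adds a conjunction of two co-active features once the absolute prediction error accumulated on it exceeds a threshold, and ranks features by |weight| × activation count as an assumed stand-in for the paper's model reduction error.

```python
from itertools import combinations


class IFDDModel:
    """Hypothetical sketch of an iFDD-style linear model over binary features.

    Each feature is a frozenset of base-feature indices; a conjunction is
    active when all of its base features are active. Simplification: the
    full iFDD deactivates a conjunction's parent features; this sketch does not.
    """

    def __init__(self, num_base, threshold=1.0, step_size=0.1):
        self.features = [frozenset([i]) for i in range(num_base)]
        self.weights = {f: 0.0 for f in self.features}
        self.counts = {f: 0 for f in self.features}  # times each feature fired
        self.relevance = {}  # candidate conjunction -> accumulated |error|
        self.threshold = threshold
        self.step_size = step_size

    def active(self, base_active):
        base = set(base_active)
        return [f for f in self.features if f <= base]

    def predict(self, base_active):
        return sum(self.weights[f] for f in self.active(base_active))

    def add_feature(self, feature):
        if feature not in self.weights:
            self.features.append(feature)
            self.weights[feature] = 0.0
            self.counts[feature] = 0
            self.relevance.pop(feature, None)

    def update(self, base_active, target):
        """One gradient step on squared error, then iFDD-style discovery."""
        phi = self.active(base_active)
        error = target - sum(self.weights[f] for f in phi)
        for f in phi:
            self.weights[f] += self.step_size * error
            self.counts[f] += 1
        # Credit |error| to each unseen conjunction of two co-active features
        # and add it once its accumulated relevance exceeds the threshold.
        for f, g in combinations(phi, 2):
            candidate = f | g
            if candidate in self.weights:
                continue
            self.relevance[candidate] = (
                self.relevance.get(candidate, 0.0) + abs(error))
            if self.relevance[candidate] > self.threshold:
                self.add_feature(candidate)
        return error


def top_features(model, k):
    """Pick k discovered conjunctions to broadcast (hypothetical scoring).

    Score = |weight| * activation count, a rough proxy for how much the
    model's predictions would change if the feature were dropped.
    """
    learned = [f for f in model.features if len(f) > 1]
    return sorted(learned,
                  key=lambda f: abs(model.weights[f]) * model.counts[f],
                  reverse=True)[:k]


def receive(model, shared_features):
    """Merge a teammate's broadcast features into the local representation."""
    for feature in shared_features:
        model.add_feature(feature)


# Agent A observes an outcome that depends on base features 0 AND 1 jointly,
# which no linear model over the base features alone can represent.
data = [([0, 1], 1.0), ([0], 0.0), ([1], 0.0), ([2], 0.0)]
agent_a = IFDDModel(num_base=3)
for _ in range(200):
    for base_active, target in data:
        agent_a.update(base_active, target)

# Agent B has not seen this data; it adopts A's top-ranked feature.
agent_b = IFDDModel(num_base=3)
receive(agent_b, top_features(agent_a, k=1))
print(agent_a.predict([0, 1]), agent_b.features)
```

In this sketch only the discovered conjunctions are broadcast, not the weights, which keeps each message small; the receiving agent starts the shared feature at weight zero and fits it to its own data. That is one plausible way heterogeneous agents could reuse a teammate's structure without adopting its parameters.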


Keywords: Distributed learning · Planning under uncertainty · Unmanned aerial systems





Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • N. Kemal Ure (1)
  • Girish Chowdhary (2)
  • Yu Fan Chen (1)
  • Jonathan P. How (1)
  • John Vian (3)

  1. Massachusetts Institute of Technology, Cambridge, USA
  2. Oklahoma State University, Stillwater, USA
  3. Boeing Research and Technology, Seattle, USA
