A Cordial Sync: Going Beyond Marginal Policies for Multi-agent Embodied Tasks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12350)


Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task’s difficulty outpaces a single agent’s abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove, in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies, and, in tasks requiring close coordination, the number of failed actions dominates the number of successful actions. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at
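To make the coherent-sampling idea concrete, here is a minimal sketch of SYNC-style action sampling under stated assumptions (this is not the paper's implementation; the function name `sync_sample`, the mixture shapes, and the seeding scheme are illustrative): each agent holds its own per-component marginal action distributions, while all agents hold identical mixture weights and a shared per-timestep random seed, so every agent commits to the same mixture component yet samples its own action independently. This yields joint policies that are mixtures of products of marginals, which are strictly more expressive than a single product of marginals.

```python
import numpy as np

def sync_sample(mixture_weights, marginal, shared_seed, private_seed):
    """Sample one agent's action under a SYNC-style policy (illustrative sketch).

    mixture_weights: shape (K,), assumed identical across agents (e.g. agreed
                     upon via communication).
    marginal:        shape (K, A), this agent's per-component action distributions.
    """
    # All agents seed with the same value, so they draw the same component index.
    shared_rng = np.random.default_rng(shared_seed)
    k = shared_rng.choice(len(mixture_weights), p=mixture_weights)
    # Each agent then samples its own action from its k-th marginal independently.
    private_rng = np.random.default_rng(private_seed)
    action = private_rng.choice(marginal.shape[1], p=marginal[k])
    return k, action

# Two agents share the mixture weights and the per-timestep seed, but hold
# different marginals and use different private randomness.
weights = np.array([0.7, 0.3])
marg_a = np.array([[0.9, 0.1], [0.1, 0.9]])
marg_b = np.array([[0.2, 0.8], [0.8, 0.2]])

k_a, act_a = sync_sample(weights, marg_a, shared_seed=42, private_seed=0)
k_b, act_b = sync_sample(weights, marg_b, shared_seed=42, private_seed=1)
assert k_a == k_b  # shared randomness: both agents commit to the same component
```

A coordination loss in the spirit of CORDIAL would additionally penalize probability mass that such a policy places on joint actions known to be incompatible (e.g., the two agents pushing the furniture in opposite directions).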


Keywords: Embodied agents · Multi-agent reinforcement learning · Collaboration · Emergent communication · AI2-THOR



This material is based upon work supported in part by the National Science Foundation under Grant Nos. 1563727, 1718221, 1637479, 165205, and 1703166; by Samsung, 3M, a Sloan Fellowship, the NVIDIA Artificial Intelligence Lab, the Allen Institute for AI, Amazon, and AWS Research Awards; and by a Siebel Scholars Award. We thank M. Wortsman and K.-H. Zeng for their insightful comments.

Supplementary material

Supplementary material 1: 504441_1_En_28_MOESM1_ESM.pdf (PDF, 4.1 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Illinois at Urbana-Champaign, Champaign, USA
  2. Allen Institute for AI, Seattle, USA
  3. University of Washington, Seattle, USA
