
Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments

  • Conference paper in Computer Vision – ECCV 2022
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13699)

Abstract

Recent work in Vision-and-Language Navigation (VLN) has presented two environmental paradigms with differing realism – the standard VLN setting built on topological environments where navigation is abstracted away [3], and the VLN-CE setting where agents must navigate continuous 3D environments using low-level actions [21]. Despite sharing the high-level task and even the underlying instruction-path data, performance on VLN-CE lags behind VLN significantly. In this work, we explore this gap by transferring an agent from the abstract environment of VLN to the continuous environment of VLN-CE. We find that this sim-2-sim transfer is highly effective, improving over the prior state of the art in VLN-CE by +12% success rate. While this demonstrates the potential for this direction, the transfer does not fully retain the original performance of the agent in the abstract setting. We present a sequence of experiments to identify what differences result in performance degradation, providing clear directions for further improvement.
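To make the abstraction gap concrete: in the topological VLN setting the agent "teleports" between adjacent graph nodes, while in VLN-CE each hop must be realized as a sequence of low-level actions. The following is a minimal illustrative sketch only; the helper function, the 0.25 m step size, and the 15° turn angle are assumptions modeled on common VLN-CE configurations, not this paper's exact interface.

```python
import math

def waypoint_to_actions(agent_pos, agent_heading, waypoint,
                        step=0.25, turn=math.radians(15)):
    """Greedily convert a 2D waypoint into discrete low-level actions.

    Hypothetical helper: rotates toward the waypoint in fixed turn
    increments, then moves forward in fixed-length steps. Step and
    turn sizes are assumed defaults, not values from the paper.
    """
    dx = waypoint[0] - agent_pos[0]
    dy = waypoint[1] - agent_pos[1]
    target = math.atan2(dy, dx)
    # Signed smallest angle from the current heading to the target.
    delta = math.atan2(math.sin(target - agent_heading),
                       math.cos(target - agent_heading))
    actions = []
    n_turns = round(delta / turn)
    actions += ["TURN_LEFT" if n_turns > 0 else "TURN_RIGHT"] * abs(n_turns)
    dist = math.hypot(dx, dy)
    actions += ["MOVE_FORWARD"] * round(dist / step)
    return actions

# A 1 m straight-ahead waypoint becomes four forward steps.
print(waypoint_to_actions((0.0, 0.0), 0.0, (1.0, 0.0)))
```

Even this toy conversion shows why transfer is lossy: quantized turns and steps cannot land exactly on a graph node, and real continuous environments add collisions and drift on top.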


Notes

  1. github.com/jacobkrantz/Sim2Sim-VLNCE.

  2. As defined by the Matterport3D Simulator used in VLN.

  3. eval.ai/web/challenges/challenge-page/97.

  4. eval.ai/web/challenges/challenge-page/719.

References

  1. Anderson, P., et al.: On evaluation of embodied navigation agents. arXiv preprint arXiv:1807.06757 (2018)

  2. Anderson, P., et al.: Sim-to-real transfer for vision-and-language navigation. In: CoRL (2020)

  3. Anderson, P., et al.: Vision-and-language navigation: interpreting visually-grounded navigation instructions in real environments. In: CVPR (2018)

  4. Blukis, V., Terme, Y., Niklasson, E., Knepper, R.A., Artzi, Y.: Learning to map natural language instructions to physical quadcopter control using simulated flight. In: CoRL (2020)

  5. Chang, A., et al.: Matterport3D: learning from RGB-D data in indoor environments. In: 3DV (2017). Matterport3D dataset license: http://kaldir.vc.in.tum.de/matterport/MP_TOS.pdf

  6. Chaplot, D.S., Gandhi, D., Gupta, S., Gupta, A., Salakhutdinov, R.: Learning to explore using active neural SLAM. In: ICLR (2020)

  7. Chen, K., Chen, J.K., Chuang, J., Vázquez, M., Savarese, S.: Topological planning with transformers for vision-and-language navigation. In: CVPR (2021)

  8. Chen, S., Guhur, P.L., Schmid, C., Laptev, I.: History aware multimodal transformer for vision-and-language navigation. In: NeurIPS (2021)

  9. Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: NeurIPS (2013)

  10. Deitke, M., et al.: RoboTHOR: an open simulation-to-real embodied AI platform. In: CVPR (2020)

  11. Fried, D., et al.: Speaker-follower models for vision-and-language navigation. In: NeurIPS (2018)

  12. Gordon, D., Kadian, A., Parikh, D., Hoffman, J., Batra, D.: SplitNet: sim2sim and task2task transfer for embodied visual navigation. In: CVPR (2019)

  13. Hahn, M., Chaplot, D.S., Tulsiani, S., Mukadam, M., Rehg, J.M., Gupta, A.: No RL, no simulation: learning to navigate without navigating. In: NeurIPS (2021)

  14. Hao, W., Li, C., Li, X., Carin, L., Gao, J.: Towards learning a generic agent for vision-and-language navigation via pre-training. In: CVPR (2020)

  15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

  16. Hong, Y., Wu, Q., Qi, Y., Rodriguez-Opazo, C., Gould, S.: VLN BERT: a recurrent vision-and-language BERT for navigation. In: CVPR (2021)

  17. Irshad, M.Z., Ma, C.Y., Kira, Z.: Hierarchical cross-modal agent for robotics vision-and-language navigation. In: ICRA (2021)

  18. Irshad, M.Z., Mithun, N.C., Seymour, Z., Chiu, H.P., Samarasekera, S., Kumar, R.: SASRA: semantically-aware spatio-temporal reasoning agent for vision-and-language navigation in continuous environments. In: ICPR (2022)

  19. Kadian, A., et al.: Are we making real progress in simulated environments? Measuring the sim2real gap in embodied visual navigation. In: IROS (2020)

  20. Krantz, J., Gokaslan, A., Batra, D., Lee, S., Maksymets, O.: Waypoint models for instruction-guided navigation in continuous environments. In: ICCV (2021)

  21. Krantz, J., Wijmans, E., Majumdar, A., Batra, D., Lee, S.: Beyond the nav-graph: vision-and-language navigation in continuous environments. In: ECCV (2020)

  22. Majumdar, A., Shrivastava, A., Lee, S., Anderson, P., Parikh, D., Batra, D.: Improving vision-and-language navigation with image-text pairs from the web. In: ECCV (2020)

  23. Savva, M., et al.: Habitat: a platform for embodied AI research. In: ICCV (2019)

  24. Quigley, M., et al.: ROS: an open-source robot operating system. In: ICRA Workshop on Open Source Software (2009)

  25. Raychaudhuri, S., Wani, S., Patel, S., Jain, U., Chang, A.X.: Language-aligned waypoint (LAW) supervision for vision-and-language navigation in continuous environments. In: EMNLP (2021)

  26. Tan, H., Yu, L., Bansal, M.: Learning to navigate unseen environments: back translation with environmental dropout. In: NAACL-HLT (2019)

  27. Wang, X., et al.: Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation. In: CVPR (2019)

  28. Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: a 10 million image database for scene recognition. TPAMI (2017)


Acknowledgements

This work was supported in part by the DARPA Machine Common Sense program. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government, or any sponsor.

Author information

Corresponding author: Jacob Krantz


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Krantz, J., Lee, S. (2022). Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13699. Springer, Cham. https://doi.org/10.1007/978-3-031-19842-7_34


  • DOI: https://doi.org/10.1007/978-3-031-19842-7_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19841-0

  • Online ISBN: 978-3-031-19842-7

  • eBook Packages: Computer Science, Computer Science (R0)
