
Learning to Plan with Uncertain Topological Maps

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12348)

Abstract

We train an agent to navigate in 3D environments using a hierarchical strategy that combines a high-level graph-based planner with a local policy. Our main contribution is a data-driven, learning-based approach to planning under uncertainty in topological maps, which requires estimating shortest paths in valued graphs with a probabilistic structure. Whereas classical symbolic algorithms achieve optimal results on noiseless topologies, or optimal results in a probabilistic sense on graphs with probabilistic structure, we aim to show that machine learning can overcome missing information in the graph by exploiting rich high-dimensional node features, for instance the visual information available at each location of the map. Compared to purely learned neural white-box algorithms, we structure our neural model with an inductive bias for dynamic-programming-based shortest-path algorithms, and we show that a particular parameterization of our neural model corresponds to the Bellman-Ford algorithm. Through an empirical analysis of our method in simulated photo-realistic 3D environments, we demonstrate that including visual features in the learned neural planner outperforms classical symbolic solutions for graph-based planning.
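The classical symbolic baseline mentioned above can be made concrete. Bellman-Ford computes shortest paths by repeated edge relaxation, the dynamic-programming recurrence that a particular parameterization of the paper's neural model recovers. A minimal sketch of the classical algorithm (the toy graph below is illustrative, not taken from the paper):

```python
# Classical Bellman-Ford: the symbolic baseline compared against the
# learned planner. Computes shortest-path distances from a source by
# relaxing every edge up to |V|-1 times.

def bellman_ford(num_nodes, edges, source):
    """edges: list of (u, v, weight) tuples; returns a distance list."""
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0.0
    for _ in range(num_nodes - 1):      # at most |V|-1 relaxation rounds
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:   # relax edge (u, v)
                dist[v] = dist[u] + w
                updated = True
        if not updated:                 # early exit once converged
            break
    return dist

# Toy topological map: 4 nodes, symmetric weighted connections.
edges = [(0, 1, 1.0), (1, 0, 1.0),
         (1, 2, 2.0), (2, 1, 2.0),
         (0, 3, 5.0), (3, 0, 5.0),
         (2, 3, 1.0), (3, 2, 1.0)]
print(bellman_ford(4, edges, 0))  # [0.0, 1.0, 3.0, 4.0]
```

The per-edge relaxation step is what gives the algorithm its message-passing structure: each round, every node updates its distance estimate from its neighbours, which is why a graph-neural-network parameterization can express it.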

Keywords

Visual navigation · Topological maps · Graph neural networks

Notes

Acknowledgements

This work was funded by grant Deepvision (ANR-15-CE23-0029, STPGP479356-15), a joint French/Canadian call by ANR and NSERC. Compute was provided by the CNRS/IN2P3 Computing Center (Lyon, France) and by GENCI-IDRIS (Grant 2019-100964).

Supplementary material

504435_1_En_28_MOESM1_ESM.pdf (1.1 MB)
Supplementary material 1 (PDF, 1167 KB)

Supplementary material 2 (MP4, 34819 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. INRIA Chroma team, CITI Lab., INSA Lyon, Villeurbanne, France
  2. Université de Lyon, INSA-Lyon, LIRIS, CNRS, Lyon, France
