Abstract
The goal of object navigation is to steer an agent to a target object using only visual input. Without GPS or a map, one key challenge of this task is locating the target object in an unseen environment, especially when it is not in the field of view. Previous works use relation graphs to encode the co-occurrence relationships among object categories, but these graphs are usually too flat for the agent to locate the target object efficiently. In this paper, a Hierarchical Graph Convolutional Neural Network (HGCNN) is proposed to encode object relationships in a hierarchical manner. Specifically, the HGCNN consists of two graph convolution blocks and a graph pooling block, which construct the hierarchical relation graph by learning an area-level graph from the object-level graph. The HGCNN-based framework thus enables the agent to locate the target object efficiently in unseen environments. The proposed model is evaluated in the AI2-THOR environment and yields a significant improvement in object navigation performance.
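The abstract's pooling idea, learning an area-level graph from an object-level graph via a soft assignment, can be illustrated with a minimal DiffPool-style sketch (cf. Ying et al., NeurIPS 2018). This is an assumption-laden illustration in plain NumPy, not the authors' implementation; all function names, weight shapes, and the choice of softmax assignment are hypothetical.

```python
import numpy as np

def normalize_adj(A):
    # Symmetrically normalize adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, X, W):
    # One graph convolution: aggregate neighbor features, then ReLU
    return np.maximum(A_norm @ X @ W, 0.0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_gcn(A, X, W1, W_pool, W2):
    """Object-level GCN -> learned pooling to an area-level graph -> area-level GCN."""
    A_norm = normalize_adj(A)
    H = gcn_layer(A_norm, X, W1)          # object-level embeddings (N, F)
    S = softmax(A_norm @ X @ W_pool)      # soft assignment of N objects to K areas (N, K)
    X_area = S.T @ H                      # area-level features (K, F)
    A_area = S.T @ A @ S                  # area-level adjacency (K, K)
    H_area = gcn_layer(normalize_adj(A_area), X_area, W2)
    return H, S, H_area

# Toy example: 8 object categories, 16-dim features, pooled into 3 areas
rng = np.random.default_rng(0)
N, F, K = 8, 16, 3
A = (rng.random((N, N)) > 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T            # symmetric co-occurrence graph, no self-loops
X = rng.standard_normal((N, F))
H, S, H_area = hierarchical_gcn(
    A, X,
    rng.standard_normal((F, F)),          # W1: object-level GCN weights
    rng.standard_normal((F, K)),          # W_pool: pooling assignment weights
    rng.standard_normal((F, F)),          # W2: area-level GCN weights
)
print(H_area.shape)  # (3, 16)
```

The key point the sketch captures is that the assignment matrix `S` is itself learned from the graph, so the area-level structure emerges from the object-level co-occurrence statistics rather than being fixed in advance.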
Acknowledgement
This work was supported in part by the National Key Research and Development Program of China under Grant 2020AAA0105900, in part by the National Natural Science Foundation of China (NSFC) under Grants 91948303, 61973301, and 61972020, in part by the Youth Innovation Promotion Association CAS, and in part by the Beijing Science and Technology Plan Project under Grant Z201100008320029.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Xu, T., Yang, X., Zheng, S. (2022). Learning Hierarchical Graph Convolutional Neural Network for Object Navigation. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13530. Springer, Cham. https://doi.org/10.1007/978-3-031-15931-2_45
Print ISBN: 978-3-031-15930-5
Online ISBN: 978-3-031-15931-2