Abstract
Visual place classification from a first-person-view monocular RGB image is a fundamental problem in long-term robot navigation. A key difficulty is that RGB image classifiers are often vulnerable to spatial and appearance changes and degrade under domain shifts such as seasonal, weather, and lighting differences. To address this issue, multi-sensor fusion approaches that combine RGB with depth (D) measurements (e.g., LIDAR, radar, stereo) have gained popularity in recent years. Inspired by these efforts, we revisit single-modal RGB visual place classification without requiring additional sensing devices, exploring the use of pseudo-depth measurements from recently developed “domain-invariant” monocular depth estimation techniques as an additional pseudo-depth modality. To this end, we develop a novel multimodal neural network for fully self-supervised training and classification of RGB and pseudo-D data. Experimental results on challenging cross-domain scenarios with public NCLT datasets demonstrate the effectiveness of the proposed approach.
Supported by JSPS KAKENHI Grant Numbers 23K11270, 20K12008.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Iwasaki, T., Tanaka, K., Tsukahara, K. (2023). A Multimodal Approach to Single-Modal Visual Place Classification. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14406. Springer, Cham. https://doi.org/10.1007/978-3-031-47634-1_18
DOI: https://doi.org/10.1007/978-3-031-47634-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47633-4
Online ISBN: 978-3-031-47634-1
eBook Packages: Computer Science, Computer Science (R0)