A Multimodal Approach to Single-Modal Visual Place Classification

Iwasaki, Tomoya; Tanaka, Kanji; Tsukahara, Kenta

doi:10.1007/978-3-031-47634-1_18

Tomoya Iwasaki¹³,
Kanji Tanaka¹³ &
Kenta Tsukahara¹³

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14406))

Included in the following conference series:

Asian Conference on Pattern Recognition

267 Accesses

Abstract

Visual place classification from a first-person-view monocular RGB image is a fundamental problem in long-term robot navigation. A difficulty arises from the fact that RGB image classifiers are often vulnerable to spatial and appearance changes and degrade due to domain shifts, such as seasonal, weather, and lighting differences. To address this issue, multi-sensor fusion approaches combining RGB and depth (D) (e.g., LIDAR, radar, stereo) have gained popularity in recent years. Inspired by these efforts, we revisit the single-modal RGB visual place classification without requiring additional sensing devices, by exploring the use of pseudo-depth measurements from recently-developed techniques of “domain-invariant” monocular estimation as an additional pseudo depth modality. To this end, we develop a novel multimodal neural network for fully self-supervised training/classifying RGB and pseudo-D data. The results of experiments on challenging cross-domain scenarios with public NCLT datasets are presented to demonstrate effectiveness of the proposed approach.

Supported by JSPS KAKENHI Grant Numbers 23K11270, 20K12008.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Softcover Book: USD 79.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Burnett, K., Wu, Y., Yoon, D.J., Schoellig, A.P., Barfoot, T.D.: Are we ready for radar to replace lidar in all-weather mapping and localization? IEEE Robot. Autom. Lett. 7(4), 10328–10335 (2022)
Article Google Scholar
Carlevaris-Bianco, N., Ushani, A.K., Eustice, R.M.: University of michigan north campus long-term vision and lidar dataset. Int. J. Robot. Res. 35(9), 1023–1035 (2016)
Article Google Scholar
Chaplot, D.S., Gandhi, D.P., Gupta, A., Salakhutdinov, R.R.: Object goal navigation using goal-oriented semantic exploration. Adv. Neural. Inf. Process. Syst. 33, 4247–4258 (2020)
Google Scholar
Cummins, M., Newman, P.: Appearance-only slam at large scale with fab-map 2.0. Int. J. Robot. Res. 30(9), 1100–1123 (2011)
Google Scholar
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems, vol. 27 (2014)
Google Scholar
Garcia-Fidalgo, E., Ortiz, A.: iBoW-LCD: an appearance-based loop-closure detection approach using incremental bags of binary words. IEEE Robot. Autom. Lett. 3(4), 3051–3057 (2018)
Article Google Scholar
Garg, R., B.G., V.K., Carneiro, G., Reid, I.: Unsupervised CNN for single view depth estimation: geometry to the rescue. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 740–756. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_45
Chapter Google Scholar
Gou, J., Yu, B., Maybank, S.J., Tao, D.: Knowledge distillation: a survey. Int. J. Comput. Vision 129, 1789–1819 (2021)
Article Google Scholar
Gupta, S., Girshick, R., Arbeláez, P., Malik, J.: Learning rich features from RGB-D images for object detection and segmentation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 345–360. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10584-0_23
Chapter Google Scholar
Hiroki, T., Tanaka, K.: Long-term knowledge distillation of visual place classifiers. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pp. 541–546. IEEE (2019)
Google Scholar
Kim, G., Park, B., Kim, A.: 1-day learning, 1-year localization: long-term lidar localization using scan context image. IEEE Robot. Autom. Lett. 4(2), 1948–1955 (2019)
Article Google Scholar
Kurauchi, K., Tanaka, K., Yamamoto, R., Yoshida, M.: Active domain-invariant self-localization using ego-centric and world-centric maps. In: Tistarelli, M., Dubey, S.R., Singh, S.K., Jiang, X. (eds.) Computer Vision and Machine Intelligence, pp. 475–487. Springer Nature Singapore, Singapore (2023)
Chapter Google Scholar
Lázaro, M.T., Capobianco, R., Grisetti, G.: Efficient long-term mapping in dynamic environments. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 153–160. IEEE (2018)
Google Scholar
Mo, N., Gan, W., Yokoya, N., Chen, S.: Es6d: a computation efficient and symmetry-aware 6d pose regression framework. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6718–6727 (2022)
Google Scholar
Ohta, T., Tanaka, K., Yamamoto, R.: Scene graph descriptors for visual place classification from noisy scene data. In: ICT Express (2023)
Google Scholar
Pham, Q.H., et al.: A 3d dataset: towards autonomous driving in challenging environments. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 2267–2273. IEEE (2020)
Google Scholar
Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., Koltun, V.: Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 44(3), 1623–1637 (2020)
Article Google Scholar
Saxena, A., Sun, M., Ng, A.Y.: Make3d: learning 3d scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 824–840 (2008)
Article Google Scholar
Schönberger, J.L., Pollefeys, M., Geiger, A., Sattler, T.: Semantic visual localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6896–6906 (2018)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Toft, C., Olsson, C., Kahl, F.: Long-term 3d localization and pose from semantic labellings. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 650–659 (2017)
Google Scholar
Wang, H., Wang, W., Liang, W., Xiong, C., Shen, J.: Structured scene memory for vision-language navigation. In: Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, pp. 8455–8464 (2021)
Google Scholar
Wang, M., Deng, W.: Deep visual domain adaptation: a survey. Neurocomputing 312, 135–153 (2018)
Article Google Scholar
Weyand, T., Kostrikov, I., Philbin, J.: PlaNet - photo geolocation with convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 37–55. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_3
Chapter Google Scholar
Yang, N., Tanaka, K., Fang, Y., Fei, X., Inagami, K., Ishikawa, Y.: Long-term vehicle localization using compressed visual experiences. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 2203–2208. IEEE (2018)
Google Scholar
Ye, J., Batra, D., Wijmans, E., Das, A.: Auxiliary tasks speed up learning point goal navigation. In: Conference on Robot Learning, pp. 498–516. PMLR (2021)
Google Scholar

Download references

Author information

Authors and Affiliations

University of Fukui, 3-9-1, Bunkyo, Fukui, Japan
Tomoya Iwasaki, Kanji Tanaka & Kenta Tsukahara

Authors

Tomoya Iwasaki
View author publications
You can also search for this author in PubMed Google Scholar
Kanji Tanaka
View author publications
You can also search for this author in PubMed Google Scholar
Kenta Tsukahara
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kanji Tanaka .

Editor information

Editors and Affiliations

Kyushu Institute of Technology, Kitakyushu, Fukuoka, Japan
Huimin Lu
The University of Sydney, Sydney, NSW, Australia
Michael Blumenstein
Yonsei University, Seoul, Korea (Republic of)
Sung-Bae Cho
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
Osaka University, Osaka, Ibaraki, Japan
Yasushi Yagi
Kyushu Institute of Technology, Kitakyushu, Japan
Tohru Kamiya

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Iwasaki, T., Tanaka, K., Tsukahara, K. (2023). A Multimodal Approach to Single-Modal Visual Place Classification. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14406. Springer, Cham. https://doi.org/10.1007/978-3-031-47634-1_18

Download citation

DOI: https://doi.org/10.1007/978-3-031-47634-1_18
Published: 05 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47633-4
Online ISBN: 978-3-031-47634-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

A Multimodal Approach to Single-Modal Visual Place Classification