SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding

Blum, Hermann; Müller, Marcus G.; Gawel, Abel; Siegwart, Roland; Cadena, Cesar

doi:10.1007/978-3-031-25555-7_9

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 27)

Included in the following conference series: The International Symposium of Robotics Research

Abstract

In order to operate in human environments, a robot’s semantic perception has to overcome open-world challenges such as novel objects and domain gaps. Autonomous deployment to such environments therefore requires robots to update their knowledge and learn without supervision. We investigate how a robot can autonomously discover novel semantic classes and improve accuracy on known classes when exploring an unknown environment. To this end, we develop a general framework for mapping and clustering that we then use to generate a self-supervised learning signal to update a semantic segmentation model. In particular, we show how clustering parameters can be optimized during deployment and that fusion of multiple observation modalities improves novel object discovery compared to prior work. Models, data, and implementations can be found at github.com/hermannsblum/scim.
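The idea of tuning clustering parameters with the classes that are already known, while fusing more than one observation modality, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its repository: the two scalar "modalities", the weighted-sum fusion, the connected-components clustering (a simple stand-in for a density-based method), the Rand-index agreement score, and the small grid search (a stand-in for a proper optimizer).

```python
from itertools import combinations, product

# Hypothetical toy observations: (modality-a feature, modality-b feature, label).
# Label None marks points of a class the model was never trained on.
POINTS = [
    (0.0, 0.0, "A"), (0.1, 0.1, "A"), (0.0, 0.1, "A"),
    (1.0, 0.0, "B"), (1.1, 0.1, "B"), (1.0, 0.1, "B"),
    (0.0, 1.0, "D"), (0.1, 1.1, "D"), (0.1, 1.0, "D"),
    (1.0, 1.0, None), (1.1, 1.1, None), (1.0, 1.1, None),
]

def fused_distance(p, q, w):
    """Weighted sum of per-modality distances (assumed fusion rule)."""
    return w * abs(p[0] - q[0]) + (1.0 - w) * abs(p[1] - q[1])

def cluster(points, w, eps):
    """Connected components of the graph linking points closer than eps."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if fused_distance(points[i], points[j], w) < eps:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

def known_class_agreement(points, assignment):
    """Rand index restricted to pairs of points that carry a known label."""
    known = [i for i, p in enumerate(points) if p[2] is not None]
    pairs = list(combinations(known, 2))
    agree = sum(
        (points[i][2] == points[j][2]) == (assignment[i] == assignment[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def select_parameters(points, weights, thresholds):
    """Grid search that keeps the first setting with the highest agreement."""
    best = None
    for w, eps in product(weights, thresholds):
        score = known_class_agreement(points, cluster(points, w, eps))
        if best is None or score > best[0]:
            best = (score, w, eps)
    return best

score, w, eps = select_parameters(POINTS, [0.0, 0.5, 1.0], [0.02, 0.3, 2.0])
assignment = cluster(POINTS, w, eps)
novel_clusters = {c for c, p in zip(assignment, POINTS) if p[2] is None}
known_clusters = {c for c, p in zip(assignment, POINTS) if p[2] is not None}
print(score, w, eps, len(set(assignment)), novel_clusters.isdisjoint(known_clusters))
```

On this toy data, only the setting that weights both modalities equally separates all three known classes, and under that setting the unlabeled points fall into a cluster of their own. With either modality alone, two known classes merge and the unlabeled points are absorbed into a known cluster.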

This paper was financially supported by the HILTI Group.

Notes

1. ‘Unsupervised’ refers to novel classes. All tested methods are supervised on the known classes.

Author information

Authors and Affiliations

Autonomous Systems Lab, ETH Zürich, Zürich, Switzerland
Hermann Blum, Marcus G. Müller, Roland Siegwart & Cesar Cadena
German Aerospace Center (DLR), Munich, Germany
Marcus G. Müller
Huawei Research, Zürich, Switzerland
Abel Gawel

Corresponding author

Correspondence to Hermann Blum.

Editor information

Editors and Affiliations

EPFL STI SMT-GE, Ecole Polytechnique Federale de Lausanne, Lausanne, Vaud, Switzerland
Aude Billard
Institute for Anthropomatics and Robotic, KIT, Karlsruhe, Baden-Württemberg, Germany
Tamim Asfour
Artificial Intelligence Laboratory, Department of Computer Science, Stanford University, Stanford, CA, USA
Oussama Khatib

1 Electronic supplementary material

Supplementary material 1 (mp4 9014 KB)

Cite this paper

Blum, H., Müller, M.G., Gawel, A., Siegwart, R., Cadena, C. (2023). SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding. In: Billard, A., Asfour, T., Khatib, O. (eds) Robotics Research. ISRR 2022. Springer Proceedings in Advanced Robotics, vol 27. Springer, Cham. https://doi.org/10.1007/978-3-031-25555-7_9

DOI: https://doi.org/10.1007/978-3-031-25555-7_9
Published: 08 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25554-0
Online ISBN: 978-3-031-25555-7
eBook Packages: Intelligent Technologies and Robotics (R0)
