SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding


Part of the Springer Proceedings in Advanced Robotics book series (SPAR, volume 27)


In order to operate in human environments, a robot’s semantic perception has to overcome open-world challenges such as novel objects and domain gaps. Autonomous deployment to such environments therefore requires robots to update their knowledge and learn without supervision. We investigate how a robot can autonomously discover novel semantic classes and improve accuracy on known classes when exploring an unknown environment. To this end, we develop a general framework for mapping and clustering that we then use to generate a self-supervised learning signal to update a semantic segmentation model. In particular, we show how clustering parameters can be optimized during deployment and that fusion of multiple observation modalities improves novel object discovery compared to prior work. Models, data, and implementations can be found at
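As a rough illustration of the ideas named in the abstract (fusing several observation modalities for clustering, and tuning clustering parameters during deployment), the following minimal sketch may help. It is not the authors' implementation. Everything in it is an assumption made for illustration: the two one-dimensional feature modalities `feats_a` and `feats_b`, the weighted-sum fusion, the simple distance-threshold clustering, the pairwise-agreement score on known classes, and the grid search.

```python
import math
from itertools import product

def fused_distance(feats_a, feats_b, w):
    """Weighted sum of the per-modality Euclidean distance matrices."""
    n = len(feats_a)
    return [[w * math.dist(feats_a[i], feats_a[j])
             + (1 - w) * math.dist(feats_b[i], feats_b[j])
             for j in range(n)] for i in range(n)]

def threshold_clusters(dist, eps):
    """Connected components of the graph that links points closer than eps."""
    n = len(dist)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] < eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def pair_agreement(labels, known):
    """Fraction of known-label pairs on which clustering and labels agree."""
    idx = [i for i, k in enumerate(known) if k is not None]
    pairs = [(i, j) for a, i in enumerate(idx) for j in idx[a + 1:]]
    if not pairs:
        return 0.0
    agree = sum((labels[i] == labels[j]) == (known[i] == known[j])
                for i, j in pairs)
    return agree / len(pairs)

def tune(feats_a, feats_b, known, weights, eps_values):
    """Grid search over fusion weight and threshold, scored on known classes only."""
    best = None
    for w, eps in product(weights, eps_values):
        labels = threshold_clusters(fused_distance(feats_a, feats_b, w), eps)
        score = pair_agreement(labels, known)
        if best is None or score > best[0]:
            best = (score, w, eps, labels)
    return best

# Toy data (hypothetical): three known classes and two unlabelled novel points.
# Modality A alone confuses Y with Z; modality B alone confuses X with Y.
feats_a = [[0.0], [0.1], [1.0], [1.1], [1.0], [1.1], [0.0], [0.1]]
feats_b = [[0.0], [0.1], [0.0], [0.1], [1.0], [1.1], [1.0], [1.1]]
known = ["X", "X", "Y", "Y", "Z", "Z", None, None]

score, w, eps, labels = tune(feats_a, feats_b, known,
                             weights=[0.0, 0.5, 1.0], eps_values=[0.3, 0.8])
```

In this toy setting only the fused distance separates all known classes, and the two unlabelled points end up in a cluster of their own. Such cluster assignments could then serve as pseudo-labels for retraining a segmentation model, which is one plausible reading of the self-supervised learning signal mentioned above.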


  • Self-supervised learning
  • Semantic segmentation
  • Self-improving perception
  • Semantic scene understanding

This paper was financially supported by the HILTI Group.

  • DOI: 10.1007/978-3-031-25555-7_9
  • Chapter length: 17 pages
  • ISBN: 978-3-031-25555-7


  1. ‘unsupervised’ refers to novel classes. All tested methods are supervised on the known classes.



Author information


Corresponding author

Correspondence to Hermann Blum.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 9014 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Blum, H., Müller, M.G., Gawel, A., Siegwart, R., Cadena, C. (2023). SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding. In: Billard, A., Asfour, T., Khatib, O. (eds) Robotics Research. ISRR 2022. Springer Proceedings in Advanced Robotics, vol 27. Springer, Cham.
