
SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding

  • Conference paper
  • Robotics Research (ISRR 2022)

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 27)

Abstract

In order to operate in human environments, a robot’s semantic perception has to overcome open-world challenges such as novel objects and domain gaps. Autonomous deployment to such environments therefore requires robots to update their knowledge and learn without supervision. We investigate how a robot can autonomously discover novel semantic classes and improve accuracy on known classes when exploring an unknown environment. To this end, we develop a general framework for mapping and clustering that we then use to generate a self-supervised learning signal to update a semantic segmentation model. In particular, we show how clustering parameters can be optimized during deployment and that fusion of multiple observation modalities improves novel object discovery compared to prior work. Models, data, and implementations can be found at github.com/hermannsblum/scim.
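The abstract does not spell out the clustering step, so the following is only a minimal sketch of one way to read it, not the authors' implementation (that lives in the linked repository). Assumptions, none of them stated in the abstract: each map segment carries a feature vector from two hypothetical modalities, which are L2-normalized and concatenated; DBSCAN stands in for the clustering algorithm; and the parameter "optimized during deployment" is DBSCAN's radius `eps`, chosen by grid search to maximize V-measure agreement with the segmentation model's predictions on segments it assigns to known classes. The helpers `fuse`, `dbscan`, `v_measure`, `select_eps` and the toy data are illustrative only.

```python
import math
from collections import Counter


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def fuse(features_a, features_b, weight_b=1.0):
    """Hypothetical fusion: L2-normalize each modality, then concatenate per segment."""
    return [l2_normalize(a) + [weight_b * x for x in l2_normalize(b)]
            for a, b in zip(features_a, features_b)]


def dbscan(points, eps, min_samples):
    """Plain O(n^2) DBSCAN; returns one cluster id per point, -1 for noise."""
    def neighbours(i):
        # Includes the point itself, as in the common min_samples convention.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:  # core point: keep expanding
                queue.extend(j_neighbours)
    return labels


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(A | B) for two aligned label lists."""
    n = len(a)
    marg_b = Counter(b)
    return -sum(c / n * math.log(c / marg_b[y]) for (_, y), c in Counter(zip(a, b)).items())


def v_measure(classes, clusters):
    h_c, h_k = entropy(classes), entropy(clusters)
    homogeneity = 1.0 if h_c == 0 else 1 - conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def select_eps(features, known_labels, candidates, min_samples=2):
    """Pick the eps whose clustering best agrees with known-class predictions.

    known_labels[i] is None when segment i has no known-class prediction;
    such segments are clustered but left out of the score.
    """
    best = None
    for eps in candidates:
        clusters = dbscan(features, eps, min_samples)
        pairs = [(k, c) for k, c in zip(known_labels, clusters) if k is not None]
        score = v_measure([k for k, _ in pairs], [c for _, c in pairs])
        if best is None or score > best[0]:
            best = (score, eps, clusters)
    return best


# Toy per-segment features (made-up numbers); None = no known-class prediction.
semantic = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [1, 1], [0.9, 1.1]]
geometric = [[1, 0], [1, 0.1], [0, 1], [0.1, 1], [0.7, 0.7], [0.7, 0.6]]
known = ["chair", "chair", "table", "table", None, None]

score, eps, clusters = select_eps(fuse(semantic, geometric), known,
                                  candidates=[0.05, 0.3, 1.0, 3.0])
print(eps, score, clusters)  # 0.3 1.0 [0, 0, 1, 1, 2, 2]
```

In this toy run the two segments without a known-class prediction form a cluster of their own, which is the kind of group that could be treated as a candidate novel class and turned into pseudo-labels for retraining the segmentation model; that retraining step is not shown.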

This paper was financially supported by the HILTI Group.

Notes

  1. ‘Unsupervised’ refers to novel classes. All tested methods are supervised on the known classes.

Author information

Corresponding author

Correspondence to Hermann Blum.

Electronic supplementary material

Supplementary material 1 (MP4 video, 9014 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Blum, H., Müller, M.G., Gawel, A., Siegwart, R., Cadena, C. (2023). SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding. In: Billard, A., Asfour, T., Khatib, O. (eds) Robotics Research. ISRR 2022. Springer Proceedings in Advanced Robotics, vol 27. Springer, Cham. https://doi.org/10.1007/978-3-031-25555-7_9
