Semi-supervised Semantic Mapping Through Label Propagation with Semantic Texture Meshes

  • Radu Alexandru Rosu
  • Jan Quenzel
  • Sven Behnke
Article
Part of the following topical collections:
  1. Special Issue: Deep Learning for Robotic Vision

Abstract

Scene understanding is an important capability for robots acting in unstructured environments. While most SLAM approaches provide a geometrical representation of the scene, a semantic map is necessary for more complex interactions with the surroundings. Current methods treat the semantic map as part of the geometry, which limits scalability and accuracy. We propose to represent the semantic map as a geometrical mesh and a semantic texture coupled at independent resolutions. The key idea is that in many environments the geometry can be greatly simplified without losing fidelity, while semantic information can be stored at a higher resolution, independent of the mesh. We construct a mesh from depth sensors to represent the scene geometry and fuse information into the semantic texture from segmentations of individual RGB views of the scene. Making the semantics persistent in a global mesh enables us to enforce temporal and spatial consistency of the individual view predictions. For this, we propose an efficient method of establishing consensus between individual segmentations: we iteratively retrain the semantic segmentation with the information stored within the map and use the retrained segmentation to re-fuse the semantics. We demonstrate the accuracy and scalability of our approach by reconstructing semantic maps of scenes from NYUv2 and a scene spanning large buildings.
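The idea of fusing per-view segmentations into a semantic texture can be illustrated with a minimal sketch. It is not the authors' implementation: the class `SemanticTexture`, its methods, and the fusion rule (summing per-view class probabilities per texel and taking the arg-max) are assumptions made for illustration, and the projection from image pixels to texel coordinates is taken as given.

```python
from typing import List, Sequence


class SemanticTexture:
    """Hypothetical per-texel class scores, stored independently of the mesh."""

    def __init__(self, width: int, height: int, num_classes: int) -> None:
        self.width = width
        self.height = height
        self.num_classes = num_classes
        # One accumulator of summed class probabilities per texel.
        self.scores: List[List[List[float]]] = [
            [[0.0] * num_classes for _ in range(width)] for _ in range(height)
        ]
        # Number of views that have observed each texel.
        self.counts: List[List[int]] = [[0] * width for _ in range(height)]

    def fuse(self, u: int, v: int, probs: Sequence[float]) -> None:
        """Add one view's class probabilities to texel (u, v)."""
        if len(probs) != self.num_classes:
            raise ValueError("probability vector has the wrong length")
        acc = self.scores[v][u]
        for c, p in enumerate(probs):
            acc[c] += p
        self.counts[v][u] += 1

    def label(self, u: int, v: int) -> int:
        """Return the consensus class of texel (u, v), or -1 if unobserved."""
        if self.counts[v][u] == 0:
            return -1
        acc = self.scores[v][u]
        return max(range(self.num_classes), key=acc.__getitem__)


# Three views observe the same texel and disagree on its class.
texture = SemanticTexture(width=4, height=4, num_classes=3)
texture.fuse(1, 2, [0.6, 0.3, 0.1])
texture.fuse(1, 2, [0.2, 0.7, 0.1])
texture.fuse(1, 2, [0.3, 0.6, 0.1])
fused_label = texture.label(1, 2)   # consensus over the three views
unseen_label = texture.label(0, 0)  # texel never observed
```

Because the texture resolution is chosen separately from the mesh, the same simplified geometry could carry a coarse or a fine label grid. In a label-propagation loop of the kind the abstract describes, the fused labels would be projected back into the views as training targets for the next retraining round.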

Keywords

  • Semantic mapping
  • Label propagation
  • Semantic textured mesh

Notes

Acknowledgements

We would like to thank David Droeschel for his effort in providing accurate poses for the courtyard dataset. This work was supported by Grant BE 2556/7 of the German Research Foundation (DFG).

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Autonomous Intelligent Systems Group, University of Bonn, Bonn, Germany
