
3D Semantic Scene Completion: A Survey


Abstract

Semantic scene completion (SSC) aims to jointly estimate the complete geometry and semantics of a scene from partial, sparse input. In recent years, following the proliferation of large-scale 3D datasets, SSC has gained significant momentum in the research community because it holds unresolved challenges: specifically, the ambiguous completion of large unobserved areas and the weak supervision signal of the ground truth. This has led to a substantially increasing number of papers on the matter. This survey identifies, compares, and analyzes existing techniques, providing a critical analysis of the SSC literature on both methods and datasets. Throughout the paper, we provide an in-depth analysis of existing works, covering all the choices made by the authors while highlighting the remaining avenues of research. The SSC performance of the state of the art on the most popular datasets is also evaluated and analyzed.
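To fix ideas, the SSC task surveyed here can be stated as a dense voxel-labeling problem: given a voxelized partial observation of a scene, predict a semantic class (including free space) for every voxel, observed or not. Below is a minimal, hypothetical sketch of that input/output contract, assuming PyTorch; the toy 3D CNN and all layer sizes are our illustration, not any of the surveyed architectures.

```python
import torch
import torch.nn as nn

class ToySSC(nn.Module):
    """Maps a partial occupancy grid (B, 1, D, H, W) to per-voxel
    semantic logits (B, C, D, H, W); class 0 conventionally = empty."""
    def __init__(self, num_classes: int = 12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            # dilation enlarges the receptive field, which completing
            # large unobserved areas requires
            nn.Conv3d(16, 16, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv3d(16, num_classes, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

grid = torch.zeros(1, 1, 32, 32, 32)   # mostly unobserved voxels
grid[0, 0, 10:20, 5, 10:20] = 1.0      # a visible surface patch
logits = ToySSC()(grid)                # labels predicted for *every* voxel
print(logits.shape)                    # torch.Size([1, 12, 32, 32, 32])
```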




Notes

  1. Authors of SemanticKITTI report that semantically labeling a hectare of 3D data takes approximately 4.5 h (Behley et al. 2019).

  2. https://futurism.com/tech-suing-facebook-princeton-data.

  3. https://competitions.codalab.org/competitions/22037.

  4. In their seminal work, for memory reasons, Song et al. (2017) evaluated SSC only at the 1:4 scale. Subsequently, to provide fair comparisons between indoor datasets and methods, most other indoor SSC works have used the same resolution, despite the fact that higher-resolution ground truth is available. Recent experiments in Chen et al. (2020a) advocate that using a higher input/output resolution boosts SSC performance significantly (see the sketch after these notes for what the 1:4 scale means in voxel counts).
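For intuition on what the 1:4 scale in note 4 trades away: downscaling each axis by 4 divides the number of voxels (and hence the output memory) by 4³ = 64. A minimal sketch, assuming SSCNet-style grid dimensions of 240 × 144 × 240 (illustrative numbers, not quoted from this survey):

```python
full = 240 * 144 * 240      # illustrative full-resolution grid
quarter = 60 * 36 * 60      # the same scene at 1:4 scale
print(full, quarter, full // quarter)   # 8294400 129600 64
```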

References

  • Abbasi, A., Kalkan, S., & Sahillioglu, Y. (2018). Deep 3D semantic scene extrapolation. The Visual Computer, 35(2), 271–279.


  • Ahmed, E., Saint, A., Shabayek, A. E. R., Cherenkova, K., Das, R., Gusev, G., Aouada, D., & Ottersten, B. (2018). A survey on deep learning advances on different 3D data representations. arXiv:1808.01462.

  • Armeni, I., Sax, S., Zamir, A., & Savarese, S. (2017). Joint 2D-3D-semantic data for indoor scene understanding. arXiv:1702.01105.

  • Avetisyan, A., Dahnert, M., Dai, A., Savva, M., Chang, A. X., & Nießner, M. (2019). Scan2CAD: Learning CAD model alignment in RGB-D scans. In Cvpr (pp. 2614–2623).

  • Avetisyan, A., Khanova, T., Choy, C., Dash, D., Dai, A., & Nießner, M. (2020). SceneCAD: Predicting object alignments and layouts in RGB-D scans. In Eccv.

  • Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke, S., Stachniss, C., & Gall, J. (2019). SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences. In Iccv (pp. 9296–9306).

  • Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9), 509–517.


  • Boulch, A., Guerry, J., Saux, B. L., & Audebert, N. (2018). SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks. Computers & Graphics, 71, 189–198.

  • Boulch, A., Saux, B. L., & Audebert, N. (2017). Unstructured point cloud semantic labeling using deep segmentation networks. In 3dor@eurographics.

  • Caesar, H., Bankiti, V., Lang, A. H., Vora, S., Liong, V. E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., & Beijbom, O. (2020). nuScenes: A multimodal dataset for autonomous driving. In Cvpr (pp. 11618–11628).

  • Cai, Y., Chen, X., Zhang, C., Lin, K.-Y., Wang, X., & Li, H. (2021). Semantic scene completion via integrating instances and scene in-the-loop. In Cvpr (pp. 324–333).

  • Canny, J. (1986). A computational approach to edge detection. PAMI, 8(6), 679–698.


  • Chang, A. X., Dai, A., Funkhouser, T. A., Halber, M., Nießner, M., Savva, M., Song, S., Zeng, A., & Zhang, Y. (2017). Matterport3D: Learning from RGB-D data in indoor environments. In 3dv (pp. 667–676).

  • Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2018). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. PAMI, 40(4), 834–848.


  • Chen, R., Huang, Z., & Yu, Y. (2019). AM\(^2\)FNet: Attention-based multiscale & multi-modality fused network. In Robio (pp. 1192–1197).

  • Chen, X. [X.], Lin, K.-Y., Qian, C., Zeng, G., & Li, H. (2020a). 3D sketch-aware semantic scene completion via semi-supervised structure prior. In Cvpr (pp. 4192–4201).

  • Chen, X. [Xiaokang], Xing, Y., & Zeng, G. (2020b). Real-time semantic scene completion via feature aggregation and conditioned prediction. In Icip (pp. 2830–2834).

  • Chen, X. [Xiaozhi], Ma, H., Wan, J., Li, B., & Xia, T. (2017). Multi-view 3D object detection network for autonomous driving. In Cvpr (pp. 6526–6534).

  • Chen, Y. [Y.], Garbade, M., & Gall, J. (2019). 3D semantic scene completion from a single depth image using adversarial training. In Icip (pp. 1835–1839).

  • Cheng, R., Agia, C., Ren, Y., Li, X., & Bingbing, L. (2020). S3CNet: A sparse semantic scene completion network for LiDAR point clouds. In Corl.

  • Cherabier, I., Schönberger, J. L., Oswald, M., Pollefeys, M., & Geiger, A. (2018). Learning priors for semantic 3D reconstruction. In Eccv (Vol. 11216, pp. 325–341).

  • Choy, C., Gwak, J., & Savarese, S. (2019). 4D spatio-temporal ConvNets: Minkowski convolutional neural networks. In Cvpr (pp. 3075–3084).

  • Dai, A., Chang, A. X., Savva, M., Halber, M., Funkhouser, T. A., & Nießner, M. (2017a). ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Cvpr (pp. 2432–2443).

  • Dai, A., Diller, C., & Nießner, M. (2020). SG-NN: Sparse generative neural networks for self-supervised scene completion of RGB-D scans. In Cvpr (pp. 846–855).

  • Dai, A., & Nießner, M. (2018). 3DMV: Joint 3D-multi-view prediction for 3D semantic scene segmentation. In Eccv (Vol. 11214, pp. 458–474).

  • Dai, A., Qi, C. R., & Nießner, M. (2017b). Shape completion using 3D-encoder-predictor CNNs and shape synthesis. In Cvpr (pp. 6545–6554).

  • Dai, A., Ritchie, D., Bokeloh, M., Reed, S., Sturm, J., & Nießner, M. (2018). ScanComplete: Large-scale scene completion and semantic segmentation for 3D scans. In Cvpr (pp. 4578–4587).

  • Davis, J., Marschner, S., Garr, M., & Levoy, M. (2002). Filling holes in complex surfaces using volumetric diffusion. In Proceedings First International Symposium on 3D Data Processing Visualization and Transmission (pp. 428–438).

  • de Charette, R., & Manitsaris, S. (2019). 3D reconstruction of deformable revolving object under heavy hand interaction. arXiv:1908.01523.

  • Denninger, M., & Triebel, R. (2020). 3D scene reconstruction from a single viewport. In Eccv (Vol. 12367, pp. 51–67). Springer.

  • Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., & Koltun, V. (2017). CARLA: An open urban driving simulator. In Corl (Vol. 78, pp. 1–16).

  • Dourado, A., de Campos, T. E., Kim, H. S., & Hilton, A. (2020a). EdgeNet: Semantic scene completion from RGB-D images. arXiv:1908.02893.

  • Dourado, A., Kim, H., de Campos, T. E., & Hilton, A. (2020b). Semantic scene completion from a single 360-Degree image and depth map. In Visigrapp (pp. 36–46).

  • Engelmann, F., Rematas, K., Leibe, B., & Ferrari, V. (2021). From points to multi-object 3D reconstruction. In Cvpr (pp. 4588–4597).

  • Everingham, M., Eslami, S., Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2014). The Pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.


  • Fan, H., Su, H., & Guibas, L. (2017). A point set generation network for 3D object reconstruction from a single image. In Cvpr (pp. 2463–2471).

  • Firman, M. [M.]. (2016). RGBD datasets: Past, present and future. In Cvprw (pp. 661–673).

  • Firman, M. [Michael], Aodha, O. M., Julier, S. J., & Brostow, G. J. (2016). Structured prediction of unobserved voxels from a single depth image. In Cvpr (pp. 5431–5440).

  • Fu, H., Cai, B., Gao, L., Zhang, L.-X., Li, C., Xun, Z., & Zhang, H. (2020). 3D-FRONT: 3D furnished rooms with layouts and semantics. arXiv:2011.09127.

  • Fuentes-Pacheco, J., Ascencio, J. R., & Rendón-Mancha, J. M. (2012). Visual simultaneous localization and mapping: A survey. Artificial Intelligence Review, 43(1), 55–81.


  • Gaidon, A., Wang, Q., Cabon, Y., & Vig, E. (2016). Virtual-Worlds as proxy for multi-object tracking analysis. In Cvpr (pp. 4340–4349).

  • Gao, B., Pan, Y., Li, C., Geng, S., & Zhao, H. (2021). Are we hungry for 3D LiDAR data for semantic segmentation? A survey of datasets and methods. T-ITS.

  • Garbade, M., Sawatzky, J., Richard, A., & Gall, J. (2019). Two stream 3D semantic scene completion. In Cvpr workshops (pp. 416–425).

  • Garg, S., Sünderhauf, N., Dayoub, F., Morrison, D., Cosgun, A., Carneiro, G., Wu, Q., Chin, T. J., Reid, I., Gould, S., & Milford, M. (2020). Semantics for robotic mapping, perception and interaction: A survey. Foundations and Trends in Robotics. arXiv:2101.00443.

  • Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. IJRR, 32(11), 1231–1237.


  • Geiger, A., & Wang, C. (2015). Joint 3D object and layout inference from a single RGB-D image. In Gcpr (Vol. 9358, pp. 183–195).

  • Gkioxari, G., Malik, J., & Johnson, J. J. (2019). Mesh RCNN. In Iccv (pp. 9784–9794).

  • Graham, B., Engelcke, M., & van der Maaten, L. (2018). 3D semantic segmentation with submanifold sparse convolutional networks. In Cvpr (pp. 9224–9232).

  • Griffiths, D., & Boehm, J. (2019). Synthcity: A large scale synthetic point cloud. arXiv:1907.04758.

  • Groueix, T., Fisher, M., Kim, V. G., Russell, B. C., & Aubry, M. (2018). AtlasNet: A papier-mâché approach to learning 3D surface generation. In Cvpr (pp. 216–224).

  • Guedes, A. B. S., de Campos, T. E., & Hilton, A. (2018). Semantic scene completion combining colour and depth: Preliminary experiments. arXiv:1802.04735.

  • Guo, R., & Hoiem, D. (2013). Support surface prediction in indoor scenes. In Iccv (pp. 2144–2151).

  • Guo, Y.-X., & Tong, X. (2018). View-volume network for semantic scene completion from a single depth image. In Ijcai (pp. 726–732).

  • Guo, Y., Wang, H., Hu, Q., Liu, H., Liu, L., & Bennamoun, M. (2020). Deep learning for 3D point clouds: A survey. PAMI.

  • Gupta, S., Girshick, R. B., Arbeláez, P., & Malik, J. (2014). Learning rich features from RGB-D images for object detection and segmentation. In Eccv (Vol. 8695, pp. 345–360).

  • Hackel, T., Savinov, N., Ladicky, L., Wegner, J. D., Schindler, K., & Pollefeys, M. (2017). Semantic3D.net: A new large-scale point cloud classification benchmark. ISPRS Annals, IV-1-W1, 91–98.

  • Han, X. [X.], Laga, H., & Bennamoun, M. (2019). Image-based 3D object reconstruction: State-of-the-art and trends in the deep learning era. PAMI, 43(5), 1578–1604.

  • Han, X. [Xiaoguang], Li, Z., Huang, H., Kalogerakis, E., & Yu, Y. (2017). High-resolution shape completion using deep neural networks for global structure and local geometry inference. In Iccv (pp. 85–93).

  • Han, X. [Xiaoguang], Zhang, Z., Du, D., Yang, M., Yu, J., Pan, P., Yang, X., Liu, L., Xiong, Z., & Cui, S. (2019). Deep reinforcement learning of volume-guided progressive view inpainting for 3D point scene completion from a single depth image. In Cvpr (pp. 234–243).

  • Handa, A., Patraucean, V., Badrinarayanan, V., Stent, S., & Cipolla, R. (2016). SceneNet: Understanding real world indoor scenes with synthetic data. arXiv:1511.07041.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Cvpr (pp. 770–778).

  • Hou, J., Dai, A., & Nießner, M. (2019). 3D-SIS: 3D semantic instance segmentation of RGB-D scans. In Cvpr (pp. 4421–4430).

  • Hou, J., Dai, A., & Nießner, M. (2020). RevealNet: Seeing behind objects RGB-D scans. In Cvpr (pp. 2095–2104).

  • Hua, B.-S., Pham, Q.-H., Nguyen, D., Tran, M., Yu, L.-F., & Yeung, S. (2016). SceneNN: A scene meshes dataset with annotations. In 3dv (pp. 92–101).

  • Huang, H., Chen, H., & Li, J. (2019). Deep neural network for 3D point cloud completion with multistage loss function. In Chinese control and decision conference (CCDC) (pp. 4604–4609).

  • Huang, Z., Yu, Y., Xu, J., Ni, F., & Le, X. (2020). PF-Net: Point fractal network for 3D point cloud completion. In Cvpr (pp. 7659–7667).

  • Izadinia, H., Shan, Q., & Seitz, S. M. (2017). IM2CAD. In Cvpr (pp. 2422–2431).

  • Jiao, L., Zhang, F., Liu, F., Yang, S., Li, L., Feng, Z., & Qu, R. (2019). A survey of deep learning-based object detection. Access, 7, 128837–128868.


  • Kazhdan, M. M., Bolitho, M., & Hoppe, H. (2006). Poisson surface reconstruction. In Sgp (Vol. 256, pp. 61–70).

  • Kim, G., & Kim, A. (2020). Remove, then revert: Static point cloud map construction using multiresolution range images. In Iros (pp. 10758–10765). IEEE.

  • Klokov, R., & Lempitsky, V. (2017). Escape from cells: Deep Kd-networks for the recognition of 3D point cloud models. In Iccv (pp. 863–872).

  • Kundu, A., Li, Y., & Rehg, J. M. (2018). 3D-RCNN: Instance-level 3D object reconstruction via render-and-compare. In Cvpr (pp. 3559–3568).

  • Kurenkov, A., Ji, J., Garg, A., Mehta, V., Gwak, J., Choy, C. B., & Savarese, S. (2018). DeformNet: Free-form deformation network for 3D shape reconstruction from a single image. In Wacv (pp. 858–866).

  • Landrieu, L., & Simonovsky, M. (2018). Large-scale point cloud semantic segmentation with superpoint graphs. In Cvpr (pp. 4558–4567).

  • Li, D., Shao, T., Wu, H., & Zhou, K. (2017). Shape completion from a single RGBD image. IEEE Transactions on Visualization and Computer Graphics, 23(7), 1809–1822.


  • Li, J., Han, K., Wang, P., Liu, Y., & Yuan, X. (2020a). Anisotropic convolutional networks for 3D semantic scene completion. In Cvpr (pp. 3348–3356).

  • Li, J., Liu, Y. W., Yuan, X., Zhao, C., Siegwart, R., Reid, I., & Cadena, C. (2020b). Depth based semantic scene completion with position importance aware loss. Robotics and Automation Letters (RA-L), 5(1), 219–226.

  • Li, J., Liu, Y., Gong, D., Shi, Q., Yuan, X., Zhao, C., & Reid, I. D. (2019). RGBD based dimensional decomposition residual network for 3D semantic scene completion. In Cvpr (pp. 7693–7702).

  • Li, S., Zou, C., Li, Y., Zhao, X., & Gao, Y. (2020c). Attention-based multi-modal fusion network for semantic scene completion. In Aaai (pp. 11402–11409).

  • Li, Y. [Y.], Ma, L., Zhong, Z., Liu, F., Chapman, M. A., Cao, D., & Li, J. (2020d). Deep learning for LiDAR point clouds in autonomous driving: A review. IEEE Transactions on Neural Networks and Learning Systems, 32(8), 3412–3432.

  • Li, Y. [Yangyan], Bu, R., Sun, M., Wu, W., Di, X., & Chen, B. (2018). PointCNN: Convolution on X-transformed points. In Nips (pp. 828–838).

  • Li, Y. [Yangyan], Dai, A., Guibas, L., & Nießner, M. (2015). Database-assisted object retrieval for real-time 3D reconstruction. Computer Graphics Forum, 34(2), 435–446.

  • Liao, Y., Donné, S., & Geiger, A. (2018). Deep marching cubes: Learning explicit surface representations. In Cvpr (pp. 2916–2925).

  • Lin, D., Fidler, S., & Urtasun, R. (2013). Holistic scene understanding for 3D object detection with RGBD cameras. In Iccv (pp. 1417–1424).

  • Lin, G., Milan, A., Shen, C., & Reid, I. (2017). RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In Cvpr (pp. 5168–5177).

  • Liu, S. [Shice], Hu, Y., Zeng, Y., Tang, Q., Jin, B., Han, Y., & Li, X. (2018). See and think: Disentangling semantic scene completion. In Neurips (pp. 261–272).

  • Liu, S. [Sifei], Mello, S. D., Gu, J., Zhong, G., Yang, M.-H., & Kautz, J. (2017). Learning affinity via spatial propagation networks. In Nips (pp. 1520–1530).

  • Liu, W., Sun, J., Li, W., Hu, T., & Wang, P. (2019). Deep learning on point clouds and its application: A survey. Sensors, 19(19), 4188.


  • Liu, Y. W., Li, J., Yan, Q., Yuan, X., Zhao, C.-X., Reid, I., & Cadena, C. (2020). 3D gated recurrent fusion for semantic scene completion. arXiv:2002.07269.

  • Lorensen, W., & Cline, H. (1987). Marching cubes: A high resolution 3D surface construction algorithm. In Siggraph (pp. 163–169).

  • Lu, H., & Shi, H. (2020). Deep learning for 3D point cloud understanding: A survey. arXiv:2009.08920.

  • Ma, N., Zhang, X., Zheng, H.-T., & Sun, J. (2018). Shufflenet v2: Practical guidelines for efficient CNN architecture design. In Eccv (Vol. 11218, pp. 122–138).

  • Maturana, D., & Scherer, S. A. (2015). VoxNet: A 3D convolutional neural network for real-time object recognition. In Iros (pp. 922–928).

  • Meagher, D. (1982). Geometric modeling using octree encoding. Computer Graphics and Image Processing, 19(1), 85.


  • Meng, H.-Y., Gao, L., Lai, Y., & Manocha, D. (2019). VV-Net: Voxel VAE net with group convolutions for point cloud segmentation. In Iccv (pp. 8499–8507).

  • Mitra, N., Pauly, M., Wand, M., & Ceylan, D. (2013). Symmetry in 3D geometry: Extraction and applications. Computer Graphics Forum, 32(6), 1–23.


  • Müller, N., Wong, Y.-S., Mitra, N., Dai, A., & Nießner, M. (2021). Seeing behind objects for 3D multi-object tracking in RGB-D sequences. In Cvpr (pp. 6071–6080).

  • Nair, R., Lenzen, F., Meister, S., Schäfer, H., Garbe, C., & Kondermann, D. (2012). High accuracy TOF and stereo sensor fusion at interactive rates. In Eccv workshops (Vol. 7584, pp. 1–11).

  • Nan, L., Xie, K., & Sharf, A. (2012). A search-classify approach for cluttered indoor scene understanding. ACM Transactions on Graphics, 31(6), 137:1-137:10.


  • Nealen, A., Igarashi, T., Sorkine-Hornung, O., & Alexa, M. (2006). Laplacian mesh optimization. In Graphite ’06 (pp. 381–389).

  • Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A. J., Kohi, P., Shotton, J., Hodges, S., & Fitzgibbon, A. W. (2011). KinectFusion: Real-time dense surface mapping and tracking. In International symposium on mixed and augmented reality (pp. 127–136).

  • Nguyen, A., & Le, H. (2013). 3D point cloud segmentation: A survey. In Conference on robotics, automation and mechatronics (RAM) (pp. 225–230).

  • Nie, Y. [Y.], Han, X.-G., Guo, S., Zheng, Y., Chang, J., & Zhang, J. (2020). Total3DUnderstanding: Joint layout, object pose and mesh reconstruction for indoor scenes from a single image. In Cvpr (pp. 52–61).

  • Nie, Y. [Yinyu], Hou, J., Han, X., & Nießner, M. (2021). RfDNet: Point scene understanding by semantic instance reconstruction. In Cvpr (pp. 4608–4618).

  • Pan, Y., Gao, B., Mei, J., Geng, S., Li, C., & Zhao, H. (2020). SemanticPOSS: A point cloud dataset with large quantity of dynamic instances. In Iv (pp. 687–693).

  • Park, J. J., Florence, P., Straub, J., Newcombe, R. A., & Lovegrove, S. (2019). DeepSDF: Learning continuous signed distance functions for shape representation. In Cvpr (pp. 165–174).

  • Pauly, M., Mitra, N., Giesen, J., Groß, M., & Guibas, L. (2005). Example-based 3D scan completion. In Sgp.

  • Pauly, M., Mitra, N., Wallner, J., Pottmann, H., & Guibas, L. (2008). Discovering structural regularity in 3D geometry. In Siggraph 2008 (Vol. 27, No. 3, p. 43).

  • Pintore, G., Mura, C., Ganovelli, F., Perez, L. J. F., Pajarola, R., & Gobbetti, E. (2020). State-of-the-art in automatic 3D reconstruction of structured indoor environments. CGF, 39(2), 667–699.


  • Pock, T., & Chambolle, A. (2011). Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In Iccv (pp. 1762–1769). IEEE.

  • Pomerleau, F., Colas, F., & Siegwart, R. (2014). A survey of rigid 3D pointcloud registration algorithms. In International conference on ambient computing, applications, services and technologies.

  • Pomerleau, F., Colas, F., & Siegwart, R. (2015). A review of point cloud registration algorithms for mobile robotics. Foundations and Trends Robotics, 4(1), 1–104.


  • Popov, S., Bauszat, P., & Ferrari, V. (2020). CoReNet: Coherent 3D scene reconstruction from a single RGB image. In Eccv (Vol. 12347, pp. 366–383).

  • Qi, C. R. [C. R.], Su, H., Mo, K., & Guibas, L. (2017). PointNet: Deep learning on point sets for 3D classification and segmentation. In Cvpr (pp. 77–85).

  • Qi, C., Litany, O., He, K., & Guibas, L. (2019). Deep Hough voting for 3D object detection in point clouds. In Iccv (pp. 9276–9285).

  • Qi, C. R. [Charles Ruizhongtai], Yi, L., Su, H., & Guibas, L. J. (2017). PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Neurips (pp. 5099–5108).

  • Reed, S., Oord, A., Kalchbrenner, N., Colmenarejo, S. G., Wang, Z., Chen, Y., Belov, D., & Freitas, N. D. (2017). Parallel multiscale autoregressive density estimation. In Icml (Vol. 70, pp. 2912–2921).

  • Rezende, D. J., Eslami, S., Mohamed, S., Battaglia, P., Jaderberg, M., & Heess, N. (2016). Unsupervised learning of 3D structure from images. In Nips (pp. 4997–5005).

  • Riegler, G., Ulusoy, A. O., Bischof, H., & Geiger, A. (2017a). OctNetFusion: Learning depth fusion from data. In 3dv (pp. 57–66).

  • Riegler, G., Ulusoy, A. O., & Geiger, A. (2017b). OctNet: Learning deep 3D representations at high resolutions. In Cvpr (pp. 6620–6629).

  • Rist, C. B., Emmerichs, D., Enzweiler, M., & Gavrila, D. M. (2021). Semantic scene completion using local deep implicit functions on LiDAR data. PAMI.

  • Rist, C. B., Schmidt, D., Enzweiler, M., & Gavrila, D. M. (2020). SCSSnet: Learning spatially-conditioned scene segmentation on LiDAR point clouds. In Iv (pp. 1086–1093).

  • Rock, J., Gupta, T., Thorsen, J., Gwak, J., Shin, D., & Hoiem, D. (2015). Completing 3D object shape from one depth image. In Cvpr (pp. 2484–2493).

  • Roldão, L., de Charette, R., & Verroust-Blondet, A. (2020). LMSCNet: Lightweight multiscale 3D semantic completion. In 3dv (pp. 111–119).

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Miccai (Vol. 9351, pp. 234–241).

  • Ros, G., Sellart, L., Materzynska, J., Vázquez, D., & López, A. M. (2016). The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Cvpr (pp. 3234–3243).

  • Roynard, X., Deschaud, J.-E., & Goulette, F. (2018). Paris-Lille-3D: A large and high-quality ground-truth urban point cloud dataset for automatic segmentation and classification. The International Journal of Robotics Research, 37(6), 545–557.


  • Saputra, M. R. U., Markham, A., & Trigoni, A. (2018). Visual SLAM and structure from motion in dynamic environments. ACM Computing Surveys (CSUR), 51(2), 37:1-37:36.


  • Shao, T., Xu, W., Zhou, K., Wang, J., Li, D., & Guo, B. (2012). An interactive approach to semantic modeling of indoor scenes with an RGBD camera. ACM Transactions on Graphics, 255, 23–32.


  • Sharma, A., Grau, O., & Fritz, M. (2016). VConv-DAE: Deep volumetric shape learning without object labels. In Eccv workshops (Vol. 9915, pp. 236–250).

  • Shen, C.-H., Fu, H., Chen, K., & Hu, S. (2012). Structure recovery by part assembly. ACM Transactions on Graphics, 31(6), 180:1-180:11.


  • Shi, S., Wang, Z., Shi, J., Wang, X., & Li, H. (2020). From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network. PAMI, 43(8), 2647–2664.


  • Silberman, N., Hoiem, D., Kohli, P., & Fergus, R. (2012). Indoor segmentation and support inference from RGBD images. In Eccv (Vol. 7576, pp. 746–760).

  • Sipiran, I., Gregor, R., & Schreck, T. (2014). Approximate symmetry detection in partial 3D meshes. Computer Graphics Forum, 33(7), 131–140.


  • Smith, E., & Meger, D. (2017). Improved adversarial systems for 3D object generation and reconstruction. In Corl (Vol. 78, pp. 87–96).

  • Song, S., & Xiao, J. (2016). Deep sliding shapes for amodal 3D object detection in RGB-D images. In Cvpr (pp. 808–816).

  • Song, S., Yu, F., Zeng, A., Chang, A. X., Savva, M., & Funkhouser, T. A. (2017). Semantic scene completion from a single depth image. In Cvpr (pp. 190–198).

  • Song, S., Zeng, A., Chang, A. X., Savva, M., Savarese, S., & Funkhouser, T. (2018). Im2Pano3D: Extrapolating 360° structure and semantics beyond the field of view. In Cvpr (pp. 3847–3856).

  • Sorkine-Hornung, O., & Cohen-Or, D. (2004). Least-squares meshes. In Proceedings of Shape Modeling Applications 2004 (pp. 191–199).


  • Straub, J., Whelan, T., Ma, L., Chen, Y., Wijmans, E., Green, S., Engel, J.J., Mur-Artal, R., Ren, C., Verma, S., & Newcombe, R. A. (2019). The Replica dataset: A digital replica of indoor spaces. arXiv:1906.05797.

  • Stutz, D., & Geiger, A. (2018). Learning 3D shape completion from laser scan data with weak supervision. In Cvpr (pp. 1955–1964).

  • Su, H., Maji, S., Kalogerakis, E., & Learned-Miller, E. (2015). Multi-view convolutional neural networks for 3D shape recognition. In Iccv (pp. 945–953).

  • Sung, M., Kim, V. G., Angst, R., & Guibas, L. (2015). Data-driven structural priors for shape completion. ACM Transactions on Graphics, 34(6), 175:1-175:11.


  • Tan, W., Qin, N., Ma, L., Li, Y., Du, J., Cai, G., Yang, K., & Li, J. (2020). Toronto-3D: A large-scale mobile LiDAR dataset for semantic segmentation of urban roadways. In Cvpr workshops (pp. 797–806).

  • Tchapmi, L. P., Choy, C., Armeni, I., Gwak, J., & Savarese, S. (2017). SEGCloud: Semantic segmentation of 3D point clouds. In 3dv (pp. 537–547).

  • Tchapmi, L. P., Kosaraju, V., Rezatofighi, H., Reid, I., & Savarese, S. (2019). TopNet: Structural point cloud decoder. In Cvpr (pp. 383–392).

  • Thomas, H., Qi, C. R., Deschaud, J.-E., Marcotegui, B., Goulette, F., & Guibas, L. (2019). KPConv: Flexible and deformable convolution for point clouds. In Iccv (pp. 6410–6419).

  • Thrun, S., & Wegbreit, B. (2005). Shape from symmetry. In Iccv (pp. 1824–1831).

  • Vallet, B., Brédif, M., Serna, A., Marcotegui, B., & Paparoditis, N. (2015). TerraMobilita/iQmulus urban point cloud analysis benchmark. Computers & Graphics, 49, 126–133.

  • Varley, J., DeChant, C., Richardson, A., Ruales, J., & Allen, P. (2017). Shape completion enabled robotic grasping. In Iros (pp. 2442–2447).

  • Wang, P., Liu, Y., Guo, Y., Sun, C., & Tong, X. (2017). OCNN: Octree-based convolutional neural networks for 3d shape analysis. ACM Transactions on Graphics, 36(4), 72:1-72:11.


  • Wang, P.-S., Liu, Y., & Tong, X. (2020a). Deep Octree-based CNNs with output-guided skip connections for 3D shape and scene completion. In Cvpr workshops (pp. 1074–1081).

  • Wang, P.-S., Sun, C., Liu, Y., & Tong, X. (2018). Adaptive OCNN: A patch-based deep representation of 3D shapes. TOG, 37(6), 217:1-217:11.


  • Wang, W., Yu, R., Huang, Q., & Neumann, U. (2018). SGPN: Similarity group proposal network for 3D point cloud instance segmentation. In Cvpr (pp. 2569–2578).

  • Wang, X., Ang, M., & Lee, G. H. (2020b). Cascaded refinement network for point cloud completion. In Cvpr (pp. 787–796).

  • Wang, X., Ang, M., & Lee, G. H. (2020c). Point cloud completion by learning shape priors. In Iros (pp. 10719–10726).

  • Wang, X., Oswald, M., Cherabier, I., & Pollefeys, M. (2019a). Learning 3D semantic reconstruction on octrees. In German conference on pattern recognition (Vol. 11824, pp. 581–594).

  • Wang, Y. [Yida], Tan, D. J., Navab, N., & Tombari, F. (2020). SoftPoolNet: Shape descriptor for point cloud completion and classification. In Eccv (Vol. 12348, pp. 70–85).

  • Wang, Y. [Yida], Tan, D. J., Navab, N., & Tombari, F. (2018). Adversarial semantic scene completion from a single depth image. In 3dv (pp. 426–434).

  • Wang, Y. [Yida], Tan, D. J., Navab, N., & Tombari, F. (2019b). ForkNet: Multi-branch volumetric semantic completion from a single depth image. In Iccv (pp. 8607–8616).

  • Wang, Y. [Yifan], Wu, S., Huang, H., Cohen-Or, D., & Sorkine-Hornung, O. (2019c). Patch-based progressive 3D point set upsampling. In Cvpr (pp. 5958–5967).

  • Wang, Y. [Yue], Sun, Y., Liu, Z., Sarma, S., Bronstein, M., & Solomon, J. (2019). Dynamic graph CNN for learning on point clouds. TOG, 38(5), 146:1–146:12.

  • Wen, X., Li, T., Han, Z., & Liu, Y.-S. (2020). Point cloud completion by skip-attention network with hierarchical folding. In Cvpr (pp. 1936–1945).

  • Wu, S.-C., Tateno, K., Navab, N., & Tombari, F. (2020). SCFusion: Real-time incremental scene reconstruction with semantic completion. In 3dv (pp. 801–810).

  • Xiao, J., Owens, A., & Torralba, A. (2013). SUN3D: A database of big spaces reconstructed using SfM and object labels. In Iccv (pp. 1625–1632).

  • Xie, H., Yao, H., Zhou, S., Mao, J., Zhang, S., & Sun, W. (2020). GRNet: Gridding residual network for dense point cloud completion. In Eccv (Vol. 12354, pp. 365–381).

  • Xie, Y., Tian, J., & Zhu, X. (2020a). Linking points with labels in 3D: A review of point cloud semantic segmentation. Geoscience and Remote Sensing Magazine, 8, 38–59.

  • Xie, Y., Tian, J., & Zhu, X. X. (2020b). Linking points with labels in 3D: A review of point cloud semantic segmentation. Geoscience and Remote Sensing Magazine, 8, 38–59.

  • Yan, X., Gao, J., Li, J., Zhang, R., Li, Z., Huang, R., & Cui, S. (2021). Sparse single sweep LiDAR point cloud segmentation via learning contextual shape priors from scene completion. In Aaai (pp. 3101–3109).

  • Yang, B., Rosa, S., Markham, A., Trigoni, N., & Wen, H. (2019). Dense 3D object reconstruction from a single depth view. PAMI, 41(12), 2820–2834.


  • Yu, F., & Koltun, V. (2016). Multi-scale context aggregation by dilated convolutions. In Iclr.

  • Yuan, W., Khot, T., Held, D., Mertz, C., & Hebert, M. (2018). PCN: Point completion network. In 3dv (pp. 728–737).

  • Zhang, G., & Chen, Y. [YangQuan]. (2021). A metric for evaluating 3D reconstruction and mapping performance with no ground truthing. In Icip.

  • Zhang, J. [J.], Zhao, H., Yao, A., Chen, Y., Zhang, L., & Liao, H. (2018a). Efficient semantic scene completion network with spatial group convolution. In Eccv (Vol. 11216, pp. 749–765).

  • Zhang, J. [Jiaying], Zhao, X., Chen, Z., & Lu, Z. (2019a). A review of deep learning-based semantic segmentation for point cloud. IEEE Access, 7, 179118–179133.

  • Zhang, L., Wang, L., Zhang, X., Shen, P., Bennamoun, M., Zhu, G., et al. (2018b). Semantic scene completion with dense CRF from a single depth image. Neurocomputing, 318, 182–195.

  • Zhang, P., Liu, W., Lei, Y., Lu, H., & Yang, X. (2019b). Cascaded context pyramid for full-resolution 3D semantic scene completion. In Iccv (pp. 7800–7809).

  • Zhang, W., Yan, Q., & Xiao, C. (2020). Detail preserved point cloud completion via separated feature aggregation. In Eccv (Vol. 12370, pp. 512–528).

  • Zheng, B., Zhao, Y., Yu, J. C., Ikeuchi, K., & Zhu, S.-C. (2013). Beyond point clouds: Scene understanding by reasoning geometry and physics. In Cvpr (pp. 3127–3134).

  • Zhong, M., & Zeng, G. (2020). Semantic point completion network for 3D semantic scene completion. In Ecai (Vol. 325, pp. 2824–2831).

  • Zimmermann, K., Petrícek, T., Salanský, V., & Svoboda, T. (2017). Learning for active 3D mapping. In Iccv (pp. 1548–1556).

  • Zollhöfer, M., Stotko, P., Görlitz, A., Theobalt, C., Nießner, M., Klein, R., & Kolb, A. (2018). State of the art on 3D reconstruction with RGB-D cameras. CGF, 37(2), 625–652.



Author information


Corresponding author

Correspondence to Luis Roldão.

Additional information

Communicated by Stephen Lin.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Roldão, L., de Charette, R. & Verroust-Blondet, A. 3D Semantic Scene Completion: A Survey. Int J Comput Vis 130, 1978–2005 (2022). https://doi.org/10.1007/s11263-021-01504-5

