Abstract
Semantic scene completion (SSC) aims to jointly estimate the complete geometry and semantics of a scene from sparse, partial input. Following the proliferation of large-scale 3D datasets in recent years, SSC has gained significant momentum in the research community because it holds unresolved challenges: specifically, the ambiguous completion of large unobserved areas and the weak supervision signal of the ground truth. This has led to a substantially increasing number of papers on the matter. This survey identifies, compares, and critically analyzes the SSC literature, covering both methods and datasets. Throughout the paper, we provide an in-depth analysis of existing works, covering all choices made by the authors while highlighting the remaining avenues of research. The SSC performance of the state of the art on the most popular datasets is also evaluated and analyzed.
Notes
The authors of SemanticKITTI report that semantically labeling a hectare of 3D data takes approximately 4.5 h (Behley et al. 2019).
In their seminal work, Song et al. (2017) evaluated SSC only at the 1:4 scale for memory reasons. Subsequently, to allow fair comparisons between indoor datasets and methods, most other indoor SSC works have used the same resolution even though higher-resolution ground truth is available. Recent experiments in Chen et al. (2020a) suggest that using higher input/output resolution boosts SSC performance significantly.
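To make the 1:4 evaluation scale concrete, downscaled ground truth is commonly obtained by pooling each 4×4×4 block of the full-resolution label grid into a single voxel. Below is a minimal sketch of such a downscaling step using majority voting per block; this is an illustrative assumption, as the exact pooling scheme (majority vote, priority rules for rare classes, etc.) varies across datasets and papers.

```python
import numpy as np

def downscale_labels(grid, factor=4):
    """Downscale a 3D semantic voxel grid by majority vote within each
    factor^3 block (a sketch of how 1:4-scale ground truth can be built).

    grid: integer label array of shape (D, H, W), divisible by factor.
    """
    d, h, w = grid.shape
    assert d % factor == 0 and h % factor == 0 and w % factor == 0
    # Split each axis into (blocks, within-block) and gather the
    # factor^3 voxels of every block into the last dimension.
    blocks = grid.reshape(d // factor, factor,
                          h // factor, factor,
                          w // factor, factor)
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5).reshape(
        d // factor, h // factor, w // factor, -1)
    # Majority vote per block via bincount over the flattened labels.
    out = np.zeros(blocks.shape[:3], dtype=grid.dtype)
    for idx in np.ndindex(out.shape):
        out[idx] = np.bincount(blocks[idx]).argmax()
    return out
```

For example, downscaling an 8×8×8 grid with `factor=4` yields a 2×2×2 grid where each output voxel carries the most frequent label of its source block.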
References
Abbasi, A., Kalkan, S., & Sahillioglu, Y. (2018). Deep 3D semantic scene extrapolation. The Visual Computer, 35(2), 271–279.
Ahmed, E., Saint, A., Shabayek, A. E. R., Cherenkova, K., Das, R., Gusev, G., Aouada, D., & Ottersten, B. (2018). A survey on deep learning advances on different 3D data representations. arXiv:1808.01462.
Armeni, I., Sax, S., Zamir, A., & Savarese, S. (2017). Joint 2D-3D-semantic data for indoor scene understanding. arXiv:1702.01105.
Avetisyan, A., Dahnert, M., Dai, A., Savva, M., Chang, A. X., & Nießner, M. (2019). Scan2CAD: Learning CAD model alignment in RGB-D scans. In Cvpr (pp. 2614–2623).
Avetisyan, A., Khanova, T., Choy, C., Dash, D., Dai, A., & Nießner, M. (2020). SceneCAD: Predicting object alignments and layouts in RGB-D scans. In Eccv.
Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke, S., Stachniss, C., & Gall, J. (2019). SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences. In Iccv (pp. 9296–9306).
Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9), 509–517.
Boulch, A., Guerry, J., Saux, B. L., & Audebert, N. (2018). SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks. Computers & Graphics, 71, 189–198.
Boulch, A., Saux, B. L., & Audebert, N. (2017). Unstructured point cloud semantic labeling using deep segmentation networks. In 3dor@eurographics.
Caesar, H., Bankiti, V., Lang, A. H., Vora, S., Liong, V. E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., & Beijbom, O. (2020). nuScenes: A multimodal dataset for autonomous driving. In Cvpr (pp. 11618–11628).
Cai, Y., Chen, X., Zhang, C., Lin, K.-Y., Wang, X., & Li, H. (2021). Semantic scene completion via integrating instances and scene in-the-loop. In Cvpr (pp. 324–333).
Canny, J. (1986). A computational approach to edge detection. PAMI, 8(6), 679–698.
Chang, A. X., Dai, A., Funkhouser, T. A., Halber, M., Nießner, M., Savva, M., Song, S., Zeng, A., & Zhang, Y. (2017). Matterport3D: Learning from RGB-D data in indoor environments. In 3dv (pp. 667–676).
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2018). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. PAMI, 40(4), 834–848.
Chen, R., Huang, Z., & Yu, Y. (2019). AM2FNet: Attention-based multiscale & multi-modality fused network. In ROBIO (pp. 1192–1197).
Chen, X. [X.], Lin, K.-Y., Qian, C., Zeng, G., & Li, H. (2020a). 3D sketch-aware semantic scene completion via semisupervised structure prior. In Cvpr (pp. 4192–4201).
Chen, X. [Xiaokang], Xing, Y., & Zeng, G. (2020b). Real-time semantic scene completion via feature aggregation and conditioned prediction. In Icip (pp. 2830–2834).
Chen, X. [Xiaozhi], Ma, H., Wan, J., Li, B., & Xia, T. (2017). Multi-view 3D object detection network for autonomous driving. In Cvpr (pp. 6526–6534).
Chen, Y. [Y.], Garbade, M., & Gall, J. (2019). 3D semantic scene completion from a single depth image using adversarial training. In Icip (pp. 1835–1839).
Cheng, R., Agia, C., Ren, Y., Li, X., & Bingbing, L. (2020). S3CNet: A sparse semantic scene completion network for LiDAR point clouds. In Corl.
Cherabier, I., Schönberger, J. L., Oswald, M., Pollefeys, M., & Geiger, A. (2018). Learning priors for semantic 3D reconstruction. In Eccv (Vol. 11216, pp. 325–341).
Choy, C., Gwak, J., & Savarese, S. (2019). 4D spatio-temporal ConvNets: Minkowski convolutional neural networks. In Cvpr (pp. 3075–3084).
Dai, A., Chang, A. X., Savva, M., Halber, M., Funkhouser, T. A., & Nießner, M. (2017a). ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Cvpr (pp. 2432–2443).
Dai, A., Diller, C., & Nießner, M. (2020). SG-NN: Sparse generative neural networks for self-supervised scene completion of RGB-D scans. In Cvpr (pp. 846–855).
Dai, A., & Nießner, M. (2018). 3DMV: Joint 3D-multi-view prediction for 3D semantic scene segmentation. In Eccv (Vol. 11214, pp. 458–474).
Dai, A., Qi, C. R., & Nießner, M. (2017b). Shape completion using 3D-encoder-predictor CNNs and shape synthesis. In Cvpr (pp. 6545–6554).
Dai, A., Ritchie, D., Bokeloh, M., Reed, S., Sturm, J., & Nießner, M. (2018). ScanComplete: Large-scale scene completion and semantic segmentation for 3D scans. In Cvpr (pp. 4578–4587).
Davis, J., Marschner, S., Garr, M., & Levoy, M. (2002). Filling holes in complex surfaces using volumetric diffusion. In Proceedings First International Symposium on 3D Data Processing Visualization and Transmission (pp. 428–438).
de Charette, R., & Manitsaris, S. (2019). 3D reconstruction of deformable revolving object under heavy hand interaction. arXiv:1908.01523.
Denninger, M., & Triebel, R. (2020). 3D scene reconstruction from a single viewport. In Eccv (Vol. 12367, pp. 51–67). Springer.
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., & Koltun, V. (2017). CARLA: An open urban driving simulator. In Corl (Vol. 78, pp. 1–16).
Dourado, A., de Campos, T. E., Kim, H. S., & Hilton, A. (2020a). EdgeNet: Semantic scene completion from RGB-D images. arXiv:1908.02893.
Dourado, A., Kim, H., de Campos, T. E., & Hilton, A. (2020b). Semantic scene completion from a single 360-Degree image and depth map. In Visigrapp (pp. 36–46).
Engelmann, F., Rematas, K., Leibe, B., & Ferrari, V. (2021). From points to multi-object 3D reconstruction. In Cvpr (pp. 4588–4597).
Everingham, M., Eslami, S., Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2014). The Pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.
Fan, H., Su, H., & Guibas, L. (2017). A point set generation network for 3D object reconstruction from a single image. In Cvpr (pp. 2463–2471).
Firman, M. [M.]. (2016). RGBD datasets: Past, present and future. In Cvprw (pp. 661–673).
Firman, M. [Michael], Aodha, O. M., Julier, S. J., & Brostow, G. J. (2016). Structured prediction of unobserved voxels from a single depth image. In Cvpr (pp. 5431–5440).
Fu, H., Cai, B., Gao, L., Zhang, L.-X., Li, C., Xun, Z., & Zhang, H. (2020). 3D-FRONT: 3D furnished rooms with layouts and semantics. arXiv:2011.09127.
Fuentes-Pacheco, J., Ascencio, J. R., & Rendón-Mancha, J. M. (2012). Visual simultaneous localization and mapping: A survey. Artificial Intelligence Review, 43(1), 55–81.
Gaidon, A., Wang, Q., Cabon, Y., & Vig, E. (2016). Virtual-Worlds as proxy for multi-object tracking analysis. In Cvpr (pp. 4340–4349).
Gao, B., Pan, Y., Li, C., Geng, S., & Zhao, H. (2021). Are we hungry for 3D LiDAR data for semantic segmentation? a survey of datasets and methods. T-ITS.
Garbade, M., Sawatzky, J., Richard, A., & Gall, J. (2019). Two stream 3D semantic scene completion. In Cvpr workshops (pp. 416–425).
Garg, S., Sünderhauf, N., Dayoub, F., Morrison, D., Cosgun, A., Carneiro, G., Wu, Q., Chin, T. J., Reid, I., Gould, S., & Milford, M. (2020). Semantics for robotic mapping, perception and interaction: A survey. Foundations and Trends in Robotics. arXiv:2101.00443.
Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. IJRR, 32(11), 1231–1237.
Geiger, A., & Wang, C. (2015). Joint 3D object and layout inference from a single RGB-D image. In Gcpr (Vol. 9358, pp. 183–195).
Gkioxari, G., Malik, J., & Johnson, J. J. (2019). Mesh RCNN. In Iccv (pp. 9784–9794).
Graham, B., Engelcke, M., & van der Maaten, L. (2018). 3D semantic segmentation with submanifold sparse convolutional networks. In Cvpr (pp. 9224–9232).
Griffiths, D., & Boehm, J. (2019). Synthcity: A large scale synthetic point cloud. arXiv:1907.04758.
Groueix, T., Fisher, M., Kim, V. G., Russell, B. C., & Aubry, M. (2018). AtlasNet: A papier-mâché approach to learning 3D surface generation. In Cvpr (pp. 216–224).
Guedes, A. B. S., de Campos, T. E., & Hilton, A. (2018). Semantic scene completion combining colour and depth: Preliminary experiments. arXiv:1802.04735.
Guo, R., & Hoiem, D. (2013). Support surface prediction in indoor scenes. ICCV (pp. 2144–2151).
Guo, Y.-X., & Tong, X. (2018). View-volume network for semantic scene completion from a single depth image. In Ijcai (pp. 726–732).
Guo, Y., Wang, H., Hu, Q., Liu, H., Liu, L., & Bennamoun, M. (2020). Deep learning for 3D point clouds: A survey. PAMI.
Gupta, S., Girshick, R. B., Arbeláez, P., & Malik, J. (2014). Learning rich features from RGB-D images for object detection and segmentation. In Eccv (Vol. 8695, pp. 345–360).
Hackel, T., Savinov, N., Ladicky, L., Wegner, J. D., Schindler, K., & Pollefeys, M. (2017). Semantic3D.net: A new large-scale point cloud classification benchmark. ISPRS Annals, IV-1-W1, 91–98.
Han, X. [X.], Laga, H., & Bennamoun, M. (2019). Image-based 3D object reconstruction: State-of-the-art and trends in the deep learning era. PAMI, 43(5), 1578–1604.
Han, X. [Xiaoguang], Li, Z., Huang, H., Kalogerakis, E., & Yu, Y. (2017). High-resolution shape completion using deep neural networks for global structure and local geometry inference. In Iccv (pp. 85–93).
Han, X. [Xiaoguang], Zhang, Z., Du, D., Yang, M., Yu, J., Pan, P., Yang, X., Liu, L., Xiong, Z., & Cui, S. (2019). Deep reinforcement learning of volume-guided progressive view inpainting for 3D point scene completion from a single depth image. In Cvpr (pp. 234–243).
Handa, A., Patraucean, V., Badrinarayanan, V., Stent, S., & Cipolla, R. (2016). SceneNet: Understanding real world indoor scenes with synthetic data. arXiv:1511.07041.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Cvpr (pp. 770–778).
Hou, J., Dai, A., & Nießner, M. (2019). 3D-SIS: 3D semantic instance segmentation of RGB-D scans. In Cvpr (pp. 4421–4430).
Hou, J., Dai, A., & Nießner, M. (2020). RevealNet: Seeing behind objects RGB-D scans. In Cvpr (pp. 2095–2104).
Hua, B.-S., Pham, Q.-H., Nguyen, D., Tran, M., Yu, L.-F., & Yeung, S. (2016). SceneNN: A scene meshes dataset with annotations. In 3dv (pp. 92–101).
Huang, H., Chen, H., & Li, J. (2019). Deep neural network for 3D point cloud completion with multistage loss function. Chinese control and decision conference (CCDC) (pp. 4604–4609).
Huang, Z., Yu, Y., Xu, J., Ni, F., & Le, X. (2020). PF-Net: Point fractal network for 3D point cloud completion. In Cvpr (pp. 7659–7667).
Izadinia, H., Shan, Q., & Seitz, S. M. (2017). IM2CAD. In Cvpr (pp. 2422–2431).
Jiao, L., Zhang, F., Liu, F., Yang, S., Li, L., Feng, Z., & Qu, R. (2019). A survey of deep learning-based object detection. Access, 7, 128837–128868.
Kazhdan, M. M., Bolitho, M., & Hoppe, H. (2006). Poisson surface reconstruction. In Sgp (Vol. 256, pp. 61–70).
Kim, G., & Kim, A. (2020). Remove, then revert: Static point cloud map construction using multiresolution range images. In Iros (pp. 10758–10765). IEEE.
Klokov, R., & Lempitsky, V. (2017). Escape from cells: Deep Kd-networks for the recognition of 3D point cloud models. In Iccv (pp. 863–872).
Kundu, A., Li, Y., & Rehg, J. M. (2018). 3D-RCNN: Instance-level 3D object reconstruction via render-and-compare. In Cvpr (pp. 3559–3568).
Kurenkov, A., Ji, J., Garg, A., Mehta, V., Gwak, J., Choy, C. B., & Savarese, S. (2018). DeformNet: Free-form deformation network for 3D shape reconstruction from a single image. In Wacv (pp. 858–866).
Landrieu, L., & Simonovsky, M. (2018). Large-scale point cloud semantic segmentation with superpoint graphs. In Cvpr (pp. 4558–4567).
Li, D., Shao, T., Wu, H., & Zhou, K. (2017). Shape completion from a single RGBD image. IEEE Transactions on Visualization and Computer Graphics, 23(7), 1809–1822.
Li, J., Han, K., Wang, P., Liu, Y., & Yuan, X. (2020a). Anisotropic convolutional networks for 3D semantic scene completion. In Cvpr (pp. 3348–3356).
Li, J., Liu, Y. W., Yuan, X., Zhao, C., Siegwart, R., Reid, I., & Cadena, C. (2020b). Depth based semantic scene completion with position importance aware loss. Robotics and Automation Letters (RA-L), 5(1), 219–226.
Li, J., Liu, Y., Gong, D., Shi, Q., Yuan, X., Zhao, C., & Reid, I. D. (2019). RGBD based dimensional decomposition residual network for 3D semantic scene completion. In Cvpr (pp. 7693–7702).
Li, S., Zou, C., Li, Y., Zhao, X., & Gao, Y. (2020c). Attention-based multi-modal fusion network for semantic scene completion. In Aaai (pp. 11402–11409).
Li, Y. [Y.], Ma, L., Zhong, Z., Liu, F., Chapman, M. A., Cao, D., & Li, J. (2020d). Deep learning for LiDAR point clouds in autonomous driving: A review. IEEE Transactions on Neural Networks and Learning Systems, 32(8), 3412–3432.
Li, Y. [Yangyan], Bu, R., Sun, M., Wu, W., Di, X., & Chen, B. (2018). PointCNN: Convolution on X-transformed points. In Nips (pp. 828–838).
Li, Y. [Yangyan], Dai, A., Guibas, L., & Nießner, M. (2015). Database-assisted object retrieval for real-time 3D reconstruction. Computer Graphics Forum, 34(2), 435–446.
Liao, Y., Donné, S., & Geiger, A. (2018). Deep marching cubes: Learning explicit surface representations. In Cvpr (pp. 2916–2925).
Lin, D., Fidler, S., & Urtasun, R. (2013). Holistic scene understanding for 3D object detection with RGBD cameras. In Iccv (pp. 1417–1424).
Lin, G., Milan, A., Shen, C., & Reid, I. (2017). RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In Cvpr (pp. 5168–5177).
Liu, S. [Shice], Hu, Y., Zeng, Y., Tang, Q., Jin, B., Han, Y., & Li, X. (2018). See and think: Disentangling semantic scene completion. In Neurips (pp. 261–272).
Liu, S. [Sifei], Mello, S. D., Gu, J., Zhong, G., Yang, M.-H., & Kautz, J. (2017). Learning affinity via spatial propagation networks. In Nips (pp. 1520–1530).
Liu, W., Sun, J., Li, W., Hu, T., & Wang, P. (2019). Deep learning on point clouds and its application: A survey. Sensors, 19(19), 4188.
Liu, Y. W., Li, J., Yan, Q., Yuan, X., Zhao, C.-X., Reid, I., & Cadena, C. (2020). 3D gated recurrent fusion for semantic scene completion. arXiv:2002.07269.
Lorensen, W., & Cline, H. (1987). Marching cubes: A high resolution 3D surface construction algorithm. In Siggraph (pp. 163–169).
Lu, H., & Shi, H. (2020). Deep learning for 3D point cloud understanding: A survey. arXiv:2009.08920.
Ma, N., Zhang, X., Zheng, H.-T., & Sun, J. (2018). Shufflenet v2: Practical guidelines for efficient CNN architecture design. In Eccv (Vol. 11218, pp. 122–138).
Maturana, D., & Scherer, S. A. (2015). VoxNet: A 3D convolutional neural network for real-time object recognition. In Iros (pp. 922–928).
Meagher, D. (1982). Geometric modeling using octree encoding. Computer Graphics and Image Processing, 19(1), 85.
Meng, H.-Y., Gao, L., Lai, Y., & Manocha, D. (2019). VVNet: Voxel VAE net with group convolutions for point cloud segmentation. In Iccv (pp. 8499–8507).
Mitra, N., Pauly, M., Wand, M., & Ceylan, D. (2013). Symmetry in 3D geometry: Extraction and applications. Computer Graphics Forum, 32(6), 1–23.
Müller, N., Wong, Y.-S., Mitra, N., Dai, A., & Nießner, M. (2021). Seeing behind objects for 3D multi-object tracking in RGB-D sequences. In Cvpr (pp. 6071–6080).
Nair, R., Lenzen, F., Meister, S., Schäfer, H., Garbe, C., & Kondermann, D. (2012). High accuracy TOF and stereo sensor fusion at interactive rates. In Eccv workshops (Vol. 7584, pp. 1–11).
Nan, L., Xie, K., & Sharf, A. (2012). A search-classify approach for cluttered indoor scene understanding. ACM Transactions on Graphics, 31(6), 137:1-137:10.
Nealen, A., Igarashi, T., Sorkine-Hornung, O., & Alexa, M. (2006). Laplacian mesh optimization. In Graphite ’06 (pp. 381–389).
Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A. J., Kohi, P., Shotton, J., Hodges, S., & Fitzgibbon, A. W. (2011). KinectFusion: Real-time dense surface mapping and tracking. In International symposium on mixed and augmented reality (pp. 127–136).
Nguyen, A., & Le, H. (2013). 3D point cloud segmentation: A survey. In Conference on robotics, automation and mechatronics (RAM) (pp. 225–230).
Nie, Y. [Y.], Han, X.-G., Guo, S., Zheng, Y., Chang, J., & Zhang, J. (2020). Total3DUnderstanding: Joint layout, object pose and mesh reconstruction for indoor scenes from a single image. In Cvpr (pp. 52–61).
Nie, Y. [Yinyu], Hou, J., Han, X., & Nießner, M. (2021). RfDNet: Point scene understanding by semantic instance reconstruction. In Cvpr (pp. 4608–4618).
Pan, Y., Gao, B., Mei, J., Geng, S., Li, C., & Zhao, H. (2020). SemanticPOSS: A point cloud dataset with large quantity of dynamic instances. In Iv (pp. 687–693).
Park, J. J., Florence, P., Straub, J., Newcombe, R. A., & Lovegrove, S. (2019). DeepSDF: Learning continuous signed distance functions for shape representation. In Cvpr (pp. 165–174).
Pauly, M., Mitra, N., Giesen, J., Groß, M., & Guibas, L. (2005). Example-based 3D scan completion. In Sgp.
Pauly, M., Mitra, N., Wallner, J., Pottmann, H., & Guibas, L. (2008). Discovering structural regularity in 3D geometry. In Siggraph 2008 (Vol. 27, 3, p. 43).
Pintore, G., Mura, C., Ganovelli, F., Perez, L. J. F., Pajarola, R., & Gobbetti, E. (2020). State-of-the-art in automatic 3D reconstruction of structured indoor environments. CGF, 39(2), 667–699.
Pock, T., & Chambolle, A. (2011). Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In Iccv (pp. 1762–1769). IEEE.
Pomerleau, F., Colas, F., & Siegwart, R. (2014). A survey of rigid 3D pointcloud registration algorithms. In International conference on ambient computing, applications, services and technologies.
Pomerleau, F., Colas, F., & Siegwart, R. (2015). A review of point cloud registration algorithms for mobile robotics. Foundations and Trends Robotics, 4(1), 1–104.
Popov, S., Bauszat, P., & Ferrari, V. (2020). CoReNet: Coherent 3D scene reconstruction from a single RGB image. In Eccv (Vol. 12347, pp. 366–383).
Qi, C. R. [C. R.], Su, H., Mo, K., & Guibas, L. (2017). PointNet: Deep learning on point sets for 3D classification and segmentation. In Cvpr (pp. 77–85).
Qi, C., Litany, O., He, K., & Guibas, L. (2019). Deep Hough voting for 3D object detection in point clouds. In Iccv (pp. 9276–9285).
Qi, C. R. [Charles Ruizhongtai], Yi, L., Su, H., & Guibas, L. J. (2017). PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Neurips (pp. 5099–5108).
Reed, S., Oord, A., Kalchbrenner, N., Colmenarejo, S. G., Wang, Z., Chen, Y., Belov, D., & Freitas, N. D. (2017). Parallel multiscale autoregressive density estimation. In Icml (Vol. 70, pp. 2912–2921).
Rezende, D. J., Eslami, S., Mohamed, S., Battaglia, P., Jaderberg, M., & Heess, N. (2016). Unsupervised learning of 3D structure from images. In Nips (pp. 4997–5005).
Riegler, G., Ulusoy, A. O., Bischof, H., & Geiger, A. (2017a). OctNetFusion: Learning depth fusion from data. In 3dv (pp. 57–66).
Riegler, G., Ulusoy, A. O., & Geiger, A. (2017b). OctNet: Learning deep 3D representations at high resolutions. In Cvpr (pp. 6620–6629).
Rist, C. B., Emmerichs, D., Enzweiler, M., & Gavrila, D. M. (2021). Semantic scene completion using local deep implicit functions on LiDAR data. PAMI.
Rist, C. B., Schmidt, D., Enzweiler, M., & Gavrila, D. M. (2020). SCSSnet: Learning spatially-conditioned scene segmentation on LiDAR point clouds. In Iv (pp. 1086–1093).
Rock, J., Gupta, T., Thorsen, J., Gwak, J., Shin, D., & Hoiem, D. (2015). Completing 3D object shape from one depth image. In Cvpr (pp. 2484–2493).
Roldão, L., de Charette, R., & Verroust-Blondet, A. (2020). LMSCNet: Lightweight multiscale 3D semantic completion. In 3dv (pp. 111–119).
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Miccai (Vol. 9351, pp. 234–241).
Ros, G., Sellart, L., Materzynska, J., Vázquez, D., & López, A. M. (2016). The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Cvpr (pp. 3234–3243).
Roynard, X., Deschaud, J.-E., & Goulette, F. (2018). Paris-Lille-3D: A large and high-quality ground-truth urban point cloud dataset for automatic segmentation and classification. The International Journal of Robotics Research, 37(6), 545–557.
Saputra, M. R. U., Markham, A., & Trigoni, A. (2018). Visual SLAM and structure from motion in dynamic environments. ACM Computing Surveys (CSUR), 51(2), 37:1-37:36.
Shao, T., Xu, W., Zhou, K., Wang, J., Li, D., & Guo, B. (2012). An interactive approach to semantic modeling of indoor scenes with an RGBD camera. ACM Transactions on Graphics, 255, 23–32.
Sharma, A., Grau, O., & Fritz, M. (2016). VConv-DAE: Deep volumetric shape learning without object labels. In Eccv workshops (Vol. 9915, pp. 236–250).
Shen, C.-H., Fu, H., Chen, K., & Hu, S. (2012). Structure recovery by part assembly. ACM Transactions on Graphics, 31(6), 180:1-180:11.
Shi, S., Wang, Z., Shi, J., Wang, X., & Li, H. (2020). From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network. PAMI, 43(8), 2647–2664.
Silberman, N., Hoiem, D., Kohli, P., & Fergus, R. (2012). Indoor segmentation and support inference from RGBD images. In Eccv (Vol. 7576, pp. 746–760).
Sipiran, I., Gregor, R., & Schreck, T. (2014). Approximate symmetry detection in partial 3D meshes. Computer Graphics Forum, 33(7), 131–140.
Smith, E., & Meger, D. (2017). Improved adversarial systems for 3D object generation and reconstruction. In Corl (Vol. 78, pp. 87–96).
Song, S., & Xiao, J. (2016). Deep sliding shapes for amodal 3D object detection in RGB-D images. In Cvpr (pp. 808–816).
Song, S., Yu, F., Zeng, A., Chang, A. X., Savva, M., & Funkhouser, T. A. (2017). Semantic scene completion from a single depth image. In Cvpr (pp. 190–198).
Song, S., Zeng, A., Chang, A. X., Savva, M., Savarese, S., & Funkhouser, T. (2018). Im2Pano3D: Extrapolating 360° structure and semantics beyond the field of view. In Cvpr (pp. 3847–3856).
Sorkine-Hornung, O., & Cohen-Or, D. (2004). Least-squares meshes. Proceedings Shape Modeling Applications, 2004, 191–199.
Straub, J., Whelan, T., Ma, L., Chen, Y., Wijmans, E., Green, S., Engel, J.J., Mur-Artal, R., Ren, C., Verma, S., & Newcombe, R. A. (2019). The Replica dataset: A digital replica of indoor spaces. arXiv:1906.05797.
Stutz, D., & Geiger, A. (2018). Learning 3D shape completion from laser scan data with weak supervision. In Cvpr (pp. 1955–1964).
Su, H., Maji, S., Kalogerakis, E., & Learned-Miller, E. (2015). Multi-view convolutional neural networks for 3D shape recognition. In Iccv (pp. 945–953).
Sung, M., Kim, V. G., Angst, R., & Guibas, L. (2015). Data-driven structural priors for shape completion. ACM Transactions on Graphics, 34(6), 175:1-175:11.
Tan, W., Qin, N., Ma, L., Li, Y., Du, J., Cai, G., Yang, K., & Li, J. (2020). Toronto-3D: A large-scale mobile LiDAR dataset for semantic segmentation of urban roadways. In Cvpr workshops (pp. 797–806).
Tchapmi, L. P., Choy, C., Armeni, I., Gwak, J., & Savarese, S. (2017). SEGCloud: Semantic segmentation of 3D point clouds. In 3dv (pp. 537–547).
Tchapmi, L. P., Kosaraju, V., Rezatofighi, H., Reid, I., & Savarese, S. (2019). TopNet: Structural point cloud decoder. In Cvpr (pp. 383–392).
Thomas, H., Qi, C. R., Deschaud, J.-E., Marcotegui, B., Goulette, F., & Guibas, L. (2019). KPConv: Flexible and deformable convolution for point clouds. In Iccv (pp. 6410–6419).
Thrun, S., & Wegbreit, B. (2005). Shape from symmetry. In Iccv (pp. 1824–1831).
Vallet, B., Brédif, M., Serna, A., Marcotegui, B., & Paparoditis, N. (2015). TerraMobilita/iQmulus urban point cloud analysis benchmark. Computers & Graphics, 49, 126–133.
Varley, J., DeChant, C., Richardson, A., Ruales, J., & Allen, P. (2017). Shape completion enabled robotic grasping. In Iros (pp. 2442–2447).
Wang, P., Liu, Y., Guo, Y., Sun, C., & Tong, X. (2017). O-CNN: Octree-based convolutional neural networks for 3D shape analysis. ACM Transactions on Graphics, 36(4), 72:1-72:11.
Wang, P.-S., Liu, Y., & Tong, X. (2020a). Deep Octree-based CNNs with output-guided skip connections for 3D shape and scene completion. In Cvpr workshops (pp. 1074–1081).
Wang, P.-S., Sun, C., Liu, Y., & Tong, X. (2018). Adaptive OCNN: A patch-based deep representation of 3D shapes. TOG, 37(6), 217:1-217:11.
Wang, W., Yu, R., Huang, Q., & Neumann, U. (2018). SGPN: Similarity group proposal network for 3D point cloud instance segmentation. In Cvpr (pp. 2569–2578).
Wang, X., Ang, M., & Lee, G. H. (2020b). Cascaded refinement network for point cloud completion. In Cvpr (pp. 787–796).
Wang, X., Ang, M., & Lee, G. H. (2020c). Point cloud completion by learning shape priors. In Iros (pp. 10719–10726).
Wang, X., Oswald, M., Cherabier, I., & Pollefeys, M. (2019a). Learning 3D semantic reconstruction on octrees. In German conference on pattern recognition (Vol. 11824, pp. 581–594).
Wang, Y. [Yida], Tan, D. J., Navab, N., & Tombari, F. (2020). SoftPoolNet: Shape descriptor for point cloud completion and classification. In Eccv (Vol. 12348, pp. 70–85).
Wang, Y. [Yida], Tan, D. J., Navab, N., & Tombari, F. (2018). Adversarial semantic scene completion from a single depth image. In 3dv (pp. 426–434).
Wang, Y. [Yida], Tan, D. J., Navab, N., & Tombari, F. (2019b). ForkNet: Multi-branch volumetric semantic completion from a single depth image. In Iccv (pp. 8607–8616).
Wang, Y. [Yifan], Wu, S., Huang, H., Cohen-Or, D., & Sorkine-Hornung, O. (2019c). Patch-based progressive 3D point set upsampling. In Cvpr (pp. 5958–5967).
Wang, Y. [Yue], Sun, Y., Liu, Z., Sarma, S., Bronstein, M., & Solomon, J. (2019). Dynamic graph CNN for learning on point clouds. TOG, 38(5), 146:1–146:12.
Wen, X., Li, T., Han, Z., & Liu, Y.-S. (2020). Point cloud completion by skip-attention network with hierarchical folding. In Cvpr (pp. 1936–1945).
Wu, S.-C., Tateno, K., Navab, N., & Tombari, F. (2020). SCFusion: Real-time incremental scene reconstruction with semantic completion. In 3dv (pp. 801–810).
Xiao, J., Owens, A., & Torralba, A. (2013). SUN3D: A database of big spaces reconstructed using SfM and object labels. In Iccv (pp. 1625–1632).
Xie, H., Yao, H., Zhou, S., Mao, J., Zhang, S., & Sun, W. (2020). GRNet: Gridding residual network for dense point cloud completion. In Eccv (Vol. 12354, pp. 365–381).
Xie, Y., Tian, J., & Zhu, X. (2020a). Linking points with labels in 3D: A review of point cloud semantic segmentation. Geoscience and Remote Sensing Magazine, 8, 38–59.
Xie, Y., Tian, J., & Zhu, X. X. (2020b). Linking points with labels in 3d: A review of point cloud semantic segmentation. Geoscience and Remote Sensing Magazine, 8, 38–59.
Yan, X., Gao, J., Li, J., Zhang, R., Li, Z., Huang, R., & Cui, S. (2021). Sparse single sweep LiDAR point cloud segmentation via learning contextual shape priors from scene completion. In Aaai (pp. 3101–3109).
Yang, B., Rosa, S., Markham, A., Trigoni, N., & Wen, H. (2019). Dense 3D object reconstruction from a single depth view. PAMI, 41(12), 2820–2834.
Yu, F., & Koltun, V. (2016). Multi-scale context aggregation by dilated convolutions. In Iclr.
Yuan, W., Khot, T., Held, D., Mertz, C., & Hebert, M. (2018). PCN: Point completion network. In 3dv (pp. 728–737).
Zhang, G., & Chen, Y. [YangQuan]. (2021). A metric for evaluating 3D reconstruction and mapping performance with no ground truthing. In Icip.
Zhang, J. [J.], Zhao, H., Yao, A., Chen, Y., Zhang, L., & Liao, H. (2018a). Efficient semantic scene completion network with spatial group convolution. In Eccv (Vol. 11216, pp. 749–765).
Zhang, J. [Jiaying], Zhao, X., Chen, Z., & Lu, Z. (2019a). A review of deep learning-based semantic segmentation for point cloud. IEEE Access, 7, 179118–179133.
Zhang, L., Wang, L., Zhang, X., Shen, P., Bennamoun, M., Zhu, G., et al. (2018b). Semantic scene completion with dense CRF from a single depth image. Neurocomputing, 318, 182–195.
Zhang, P., Liu, W., Lei, Y., Lu, H., & Yang, X. (2019b). Cascaded context pyramid for full-resolution 3D semantic scene completion. In Iccv (pp. 7800–7809).
Zhang, W., Yan, Q., & Xiao, C. (2020). Detail preserved point cloud completion via separated feature aggregation. In Eccv (Vol. 12370, pp. 512–528).
Zheng, B., Zhao, Y., Yu, J. C., Ikeuchi, K., & Zhu, S.-C. (2013). Beyond point clouds: Scene understanding by reasoning geometry and physics. In Cvpr (pp. 3127–3134).
Zhong, M., & Zeng, G. (2020). Semantic point completion network for 3D semantic scene completion. In Ecai (Vol. 325, pp. 2824–2831).
Zimmermann, K., Petrícek, T., Salanský, V., & Svoboda, T. (2017). Learning for active 3D mapping. In Iccv (pp. 1548–1556).
Zollhöfer, M., Stotko, P., Görlitz, A., Theobalt, C., Nießner, M., Klein, R., & Kolb, A. (2018). State of the art on 3D reconstruction with RGB-D cameras. CGF, 37(2), 625–652.
Communicated by Stephen Lin.
Roldão, L., de Charette, R. & Verroust-Blondet, A. 3D Semantic Scene Completion: A Survey. Int J Comput Vis 130, 1978–2005 (2022). https://doi.org/10.1007/s11263-021-01504-5