Abstract
Recognizing 3D part instances from a 3D point cloud is crucial for 3D structure and scene understanding. Several learning-based approaches use semantic segmentation and instance center prediction as training tasks, but they do not further exploit the inherent relationship between shape semantics and part instances. In this paper, we present a new method for 3D part instance segmentation. Our method exploits semantic segmentation to fuse nonlocal instance features, such as center prediction, and further enhances the fusion scheme in a multi- and cross-level way. We also propose a semantic region center prediction task for training and leverage its prediction results to improve the clustering of instance points. Our method outperforms existing methods by a large margin on the PartNet benchmark. We also demonstrate that our feature fusion scheme can be applied to other existing methods to improve their performance in indoor scene instance segmentation tasks.
![](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs41095-022-0300-x/MediaObjects/41095_2022_300_Fig1_HTML.jpg)
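The idea of using semantic segmentation to fuse nonlocal instance features can be illustrated with a minimal sketch. The abstract does not specify the fusion operator, so the version below assumes a simple one: each point's instance feature is concatenated with the mean instance feature of all points that share its predicted semantic label. The function name `fuse_by_semantics` and its arguments are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np

def fuse_by_semantics(inst_feat, sem_label):
    """Append to each point the mean instance feature of its semantic region.

    inst_feat: (N, C) per-point instance features (assumed input).
    sem_label: (N,) predicted semantic label per point (assumed input).
    Mean pooling and concatenation are illustrative assumptions.
    """
    inst_feat = np.asarray(inst_feat, dtype=np.float64)
    sem_label = np.asarray(sem_label).reshape(-1)
    # Map arbitrary label values to contiguous region indices 0..R-1.
    _, region = np.unique(sem_label, return_inverse=True)
    region = region.reshape(-1)
    n_regions = int(region.max()) + 1
    # Sum the features of each region, then divide by its point count.
    sums = np.zeros((n_regions, inst_feat.shape[1]))
    np.add.at(sums, region, inst_feat)
    counts = np.bincount(region, minlength=n_regions)[:, None]
    region_mean = sums / counts
    # Broadcast the region-level (nonlocal) feature back to every point.
    return np.concatenate([inst_feat, region_mean[region]], axis=1)

# Three points: the first two share semantic label 0, the third has label 1.
fused = fuse_by_semantics([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]], [0, 0, 1])
print(fused.shape)
```

In this sketch every point receives information pooled over its whole semantic region, which is one way a point far from an instance boundary could still contribute to that instance's features.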
Availability of data and materials
PartNet, ScanNet, and S3DIS are all publicly released datasets.
Author information
Contributions
Chun-Yu Sun proposed and implemented the key idea, conducted the main experiments, and contributed to paper writing. Xin Tong supervised the findings of this work and verified the key concept. Yang Liu led the project and contributed to the key concept, experimental design, and paper writing.
Ethics declarations
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Chun-Yu Sun received his bachelor's degree in computer science and technology from Xidian University in 2015. He is currently a Ph.D. student at the Institute for Advanced Study, Tsinghua University. His research interests include computer graphics and 3D vision.
Xin Tong is a principal researcher manager with Microsoft Research Asia, where he leads the Internet Graphics Group. He received his Ph.D. degree from Tsinghua University in 1999. His research interests include computer graphics and computer vision, including texture synthesis, appearance modeling, light transport simulation and acquisition, 3D facial animation, and data-driven geometric processing. He was on the editorial boards of IEEE Transactions on Visualization and Computer Graphics, ACM Transactions on Graphics, and Computer Graphics Forum.
Yang Liu is a principal researcher at Microsoft Research Asia. He received his Ph.D. degree from The University of Hong Kong in 2008, and his master's and bachelor's degrees in computational mathematics from the University of Science and Technology of China in 2003 and 2000, respectively. His recent research focuses on geometry processing and 3D learning. He is on the editorial boards of IEEE Transactions on Visualization and Computer Graphics and ACM Transactions on Graphics.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sun, CY., Tong, X. & Liu, Y. Semantic segmentation-assisted instance feature fusion for multi-level 3D part instance segmentation. Comp. Visual Media 9, 699–715 (2023). https://doi.org/10.1007/s41095-022-0300-x
DOI: https://doi.org/10.1007/s41095-022-0300-x