Abstract
Large-scale point cloud segmentation is one of the important research directions in the field of computer vision, aiming at segmenting 3D point cloud data into parts with semantic meaning, which is widely used in the fields of robot perception, automated driving, and virtual reality. In practical applications, intelligences often face various uncertainties such as sensor noise, missing data, and uncertain model parameter estimation. However, many current research works do not consider the effects of these uncertainties, which can cause the model to overfit the noisy data and thus affect the model performance. In this paper, we propose a point cloud segmentation method with domain uncertainty that can greatly improve the robustness of the model to noise. Specifically, we first compute the neighborhood uncertainty, which is more reflective of the semantics of a local region than the prediction of a single point, which will reduce the impact of noise. Next, we fuse the uncertainty into the objective function, which allows the model to focus more on relatively deterministic data. Finally, we validate on the large-scale datasets S3DIS and Toronto3D, and the segmentation performance is substantially improved in both cases.
Similar content being viewed by others
Data Availability
The S3DIS dataset used in the article can be obtained at http://buildingparser.stanford.edu/dataset.html. The Toronto3D dataset used in the article can be obtained at https://opendatalab.org.cn/Toronto-3D/download. The Semantic3D dataset used in the article can be obtained at http://www.semantic3d.net/view_dbase.php?chl=1. The Scannet-v2 dataset used in the article can be obtained at https://github.com/ScanNet/ScanNet.
References
Chen X, Ma H, Wan J, Li B, Xia T (2016) Multi-view 3d object detection network for autonomous driving. 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 6526–6534
Blanc T, Beheiry ME, Caporal C, Masson J-B, Hajj B (2020) Genuage: visualize and analyze multidimensional single-molecule point cloud data in virtual reality. Nature Methods 1–3
Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R (2020) Nerf: representing scenes as neural radiance fields for view synthesis. arXiv:2003.08934
Qi C, Su H, Mo K, Guibas LJ (2016) Pointnet: deep learning on point sets for 3d classification and segmentation. 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 77–85
Qi C, Yi L, Su H, Guibas LJ (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: NIPS
Hu Q, Yang B, Xie L, Rosa S, Guo Y, Wang Z, Trigoni A, Markham A (2019) Randla-net: efficient semantic segmentation of large-scale point clouds. 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 11105–11114
Wang Y, Sun Y, Liu Z, Sarma SE, Bronstein MM, Solomon JM (2018) Dynamic graph cnn for learning on point clouds. ACM Trans Graph (TOG) 38:1–12
Lin Y, Yan Z, Huang H, Du D, Liu L, Cui S, Han X (2020) Fpconv: learning local flattening for point convolution. 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4292–4301
Thomas H, Qi C, Deschaud J-E, Marcotegui B, Goulette F, Guibas LJ (2019) Kpconv: flexible and deformable convolution for point clouds. 2019 IEEE/CVF international conference on computer vision (ICCV), pp 6410–6419
Choy CB, Gwak J, Savarese S (2019) 4d spatio-temporal convnets: Minkowski convolutional neural networks. 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 3070–3079
Graham B, Engelcke M, Maaten L (2017) 3d semantic segmentation with submanifold sparse convolutional networks. 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 9224–9232
Huang S-S, Ma Z-Y, Mu T-J, Fu H, Hu S (2021) Supervoxel convolution for online 3d semantic segmentation. ACM Trans Graph (TOG) 40:1–15
Zhao H, Jiang L, Jia J, Torr PHS, Koltun V (2020) Point transformer. 2021 IEEE/CVF international conference on computer vision (ICCV), pp 16239–16248
Wu X, Lao Y, Jiang L, Liu X, Zhao H (2022) Point transformer v2: grouped vector attention and partition-based pooling. arXiv:2210.05666
Sirohi K, Marvi S, Büscher D, Burgard W (2022) Uncertainty-aware lidar panoptic segmentation. 2023 IEEE international conference on robotics and automation (ICRA), pp 8277–8283
Liu D, Cui Y, Tan W, Chen Y (2021) Sg-net: spatial granularity network for one-stage video instance segmentation. 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 9811–9820
Wang W, Liang J, Liu D (2022) Learning equivariant segmentation with instance-unique querying. arXiv:2210.00911
Qin Z, Lu X, Nie X, Liu D, Yin Y, Wang W (2023) Coarse-to-fine video instance segmentation with factorized conditional appearance flows. IEEE/CAA J Automatica Sinica 10:1192–1208
Cui Y, Yan L, Cao Z, Liu D (2021) Tf-blender: temporal feature blender for video object detection. 2021 IEEE/CVF international conference on computer vision (ICCV), pp 8118–8127
Liang J, Zhou T, Liu D, Wang, W (2023) Clustseg: clustering for universal segmentation. arXiv:2305.02187
Cui Y, Liu X, Liu H, Zhang J, Zare A, Fan B (2021) Geometric attentional dynamic graph convolutional neural networks for point cloud analysis. Neurocomputing 432:300–310
Xu C, Wu B, Wang Z, Zhan W, Vajda P, Keutzer K, Tomizuka M (2020) Squeezesegv3: spatially-adaptive convolution for efficient point-cloud segmentation. In: European conference on computer vision. https://api.semanticscholar.org/CorpusID:214802232
Cui Y, Ruan L, Dong H, Li Q, Wu Z, Zeng T, Fan F (2023) Cloud-rain: point cloud analysis with reflectional invariance. arXiv:2305.07814
Uy MA, Lee GH (2018) Pointnetvlad: deep point cloud based retrieval for large-scale place recognition. 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 4470–4479
Liu D, Cui Y, Yan L, Mousas C, Yang B, Chen Y (2020) Densernet: weakly supervised visual localization using multi-scale feature aggregation. In: AAAI conference on artificial intelligence. https://api.semanticscholar.org/CorpusID:227305257
Yan L, Cui Y, Chen Y, Liu D (2021) Hierarchical attention fusion for geo-localization. ICASSP 2021 - 2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 2220–2224
Cheng Z, Choi H, Liang J, Feng S, Tao G, Liu D, Zuzak M, Zhang X (2023) Fusion is not enough: single-modal attacks to compromise fusion models in autonomous driving. arXiv:2304.14614
Armeni I, Sax S, Zamir AR, Savarese S (2017) Joint 2d-3d-semantic data for indoor scene understanding
Tan W, Qin N, Ma L, Li Y, Du J, Cai G, Yang K, Li J (2020) Toronto-3d: a large-scale mobile lidar dataset for semantic segmentation of urban roadways. In: 2020 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 797–806. https://doi.org/10.1109/CVPRW50498.2020.00109
Li Y, Bu R, Sun M, Chen B (2018) Pointcnn. arXiv:1801.07791
Landrieu L, Simonovsky M (2017) Large-scale point cloud semantic segmentation with superpoint graphs. 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 4558–4567
Lei H, Akhtar N, Mian AS (2019) Spherical kernel for efficient graph convolution on 3d point clouds. IEEE Trans Patt Anal Mach Intell 43:3664–3680
Zhao H, Jiang L, Fu C-W, Jia J (2019) Pointweb: enhancing local neighborhood features for point cloud processing. 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 5560–5568
Hackel T, Savinov N, Ladicky L, Wegner JD, Schindler K, Pollefeys M (2017) Semantic3d.net: a new large-scale point cloud classification benchmark. ISPRS annals of photogrammetry, remote sensing and spatial information sciences
Dai A, Chang AX, Savva M, Halber M, Funkhouser T, Nießner M (2017) Scannet: richly-annotated 3d reconstructions of indoor scenes. In: Proc. computer vision and pattern recognition (CVPR). IEEE
Acknowledgements
This research was funded by the Natural Science Project of China for Young and middle-aged(No.2022KY1786).
Author information
Authors and Affiliations
Contributions
Yong Bao: Conceptualisation, Methodology, Software, Validation, Formal analysis, Investigation, Data Curation, Writing original draft. Wen Haibiao: Conceptualisation, Methodology. Zhang Bao Qing: Validation, Software.
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest.
Ethical standard
Data used in the present study are publicly available, and ethical approval and informed consent were obtained in each original study.
Competing interests
The authors have no relevant financial or nonfinancial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bao, Y., Wen, H. & Zhang, B. Semantic segmentation of large-scale point clouds with neighborhood uncertainty. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-17814-4
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11042-023-17814-4