DCARN: Deep Context Aware Recurrent Neural Network for Semantic Segmentation of Large Scale Unstructured 3D Point Cloud

Mehmood, Saba; Shahzad, Muhammad; Fraz, Muhammad Moazam

doi:10.1007/s11063-020-10368-8

Published: 17 October 2020

Volume 55, pages 881–904 (2023)
Neural Processing Letters

Abstract

Semantic segmentation of large unstructured 3D point clouds is an important problem for 3D object recognition, which in turn is essential to solving more complex tasks such as scene understanding. The problem is highly challenging owing to the large scale of the data, varying point density, and localization errors of the 3D points. Nevertheless, following the recent success of deep neural network architectures on complex 2D perceptual problems, several researchers have sought to transfer 2D networks to 3D point cloud segmentation by adding a prior voxelization step that provides an explicit neighborhood representation. However, such a 3D grid representation loses fine details and inherent structure due to quantization artifacts. To address this, this paper proposes an approach to semantic segmentation of 3D point clouds that exploits super-point based graph construction. The proposed architecture is composed of two cascaded modules: a light-weight representation learning module, which uses unsupervised geometric grouping to partition the large-scale unstructured 3D point cloud, and a deep context aware sequential network based on long short-term memory units and graph convolutions with embedding residual learning for semantic segmentation. The proposed model is evaluated on two standard benchmark datasets and achieves performance competitive with existing state-of-the-art methods. The code and the obtained results have been made public at https://github.com/saba155/DCARN.
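
To make the idea of super-point based graph construction more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the geometric grouping has already assigned a super-point id to every point (the `labels` array), and it links two super-points whenever a point in one has a k-nearest neighbor in the other. The function name `superpoint_graph`, the parameter `k`, and the brute-force neighbor search are illustrative assumptions; the learning stages described in the abstract (LSTM units, graph convolutions, residual learning) are not shown.

```python
import numpy as np


def superpoint_graph(points, labels, k=3):
    """Build a toy super-point graph (hypothetical helper, for illustration only).

    points: (N, 3) array of 3D coordinates.
    labels: (N,) integer array of super-point ids in 0..S-1 (every id used at least once).
    Returns (centroids, edges): an (S, 3) array and a sorted list of (a, b) pairs with a < b.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_sp = int(labels.max()) + 1

    # Centroid of each super-point: sum the member coordinates, then divide by the count.
    centroids = np.zeros((n_sp, 3))
    np.add.at(centroids, labels, points)
    counts = np.bincount(labels, minlength=n_sp)
    centroids /= counts[:, None]

    # Brute-force pairwise squared distances; only suitable for small demos.
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    knn = np.argsort(d2, axis=1)[:, :k]

    # Two super-points are adjacent if any point has a neighbor in the other one.
    edges = set()
    for i in range(len(points)):
        for j in knn[i]:
            a, b = int(labels[i]), int(labels[j])
            if a != b:
                edges.add((min(a, b), max(a, b)))
    return centroids, sorted(edges)


# Three small groups of points along the x axis, two points each.
pts = np.array([[0.0, 0, 0], [0.1, 0, 0],
                [1.0, 0, 0], [1.1, 0, 0],
                [5.0, 0, 0], [5.1, 0, 0]])
sp = np.array([0, 0, 1, 1, 2, 2])
centroids, edges = superpoint_graph(pts, sp, k=2)
```

In a real pipeline the brute-force distance matrix would be replaced by a spatial index, and each super-point and edge would carry learned or hand-crafted features rather than only a centroid.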

Author information

Authors and Affiliations

National University of Sciences and Technology (NUST), Islamabad, Pakistan
Saba Mehmood, Muhammad Shahzad & Muhammad Moazam Fraz
Deep Learning Laboratory, National Center of Artificial Intelligence, Islamabad, 44000, Pakistan
Muhammad Shahzad
The Alan Turing Institute, British Library, London, NW1 2DB, UK
Muhammad Moazam Fraz

Authors

Saba Mehmood
Muhammad Shahzad
Muhammad Moazam Fraz

Corresponding author

Correspondence to Muhammad Moazam Fraz.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

We present further visualizations of segmentation results on the S3DIS dataset in Figs. 7 and 8, along with the input point clouds, geometric grouping, and ground truth. Ceilings are hidden for a clearer view of the point clouds. Points are color coded by class: floor in blue, column in blue-green, wall in brown, window in light green, beam in salmon, table in dark green, chair in dark blue, sofa in purple, board in grey, bookcase in red, and clutter in light grey.

About this article

Cite this article

Mehmood, S., Shahzad, M. & Fraz, M.M. DCARN: Deep Context Aware Recurrent Neural Network for Semantic Segmentation of Large Scale Unstructured 3D Point Cloud. Neural Process Lett 55, 881–904 (2023). https://doi.org/10.1007/s11063-020-10368-8

Accepted: 04 October 2020
Published: 17 October 2020
Issue Date: April 2023
DOI: https://doi.org/10.1007/s11063-020-10368-8