Abstract
Point cloud representation must extract sufficient semantic information while preserving the spatial structure of sparse point clouds. Benefiting from Transformer networks, recent studies have advanced point cloud representation by extracting refined attention features from global context. However, undesired semantic information is still lost in the feature extraction stage. Hence, this paper proposes a novel architecture for 3D point cloud representation, the Relation-Shape Transformer Network (RS-TNet), which addresses this problem while retaining the merits of the relation-shape embedding mechanism, thereby generating rich and robust local semantic features. Specifically, RS-TNet achieves coarse-to-fine-grained semantic coverage by integrating global multi-head self-attention with a local Relation-Feature extraction module. Moreover, theoretical analysis demonstrates that RS-TNet explicitly introduces the spatial relations of points by learning underlying shapes; the extracted features are therefore more shape-aware and robust. As a result, the proposed RS-TNet achieves 90.9% class accuracy on ModelNet40 and 85.6% Intersection-over-Union on ShapeNet. Further, ablation experiments verify the effectiveness of RS-TNet in point cloud classification and part segmentation tasks.
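To make the global-attention component concrete, below is a minimal NumPy sketch of generic multi-head self-attention applied to per-point features. This is an illustrative assumption, not the paper's actual RS-TNet implementation: the projection matrices `wq`, `wk`, `wv` are random stand-ins for learned weights, and the local Relation-Feature module is omitted.

```python
import numpy as np

def multi_head_self_attention(points, num_heads=4, seed=0):
    """Global multi-head self-attention over per-point features.

    points: (N, D) array of point features; D must divide evenly by num_heads.
    Projection weights are random here purely for illustration.
    """
    n, d = points.shape
    assert d % num_heads == 0, "feature dim must be divisible by num_heads"
    dh = d // num_heads
    rng = np.random.default_rng(seed)
    # Stand-ins for learned query/key/value projections.
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    # Project, then split the feature dim into heads: (H, N, dh).
    q = (points @ wq).reshape(n, num_heads, dh).transpose(1, 0, 2)
    k = (points @ wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (points @ wv).reshape(n, num_heads, dh).transpose(1, 0, 2)

    # Scaled dot-product attention per head: (H, N, N).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)

    # Weighted sum of values, then merge heads back to (N, D).
    out = attn @ v
    return out.transpose(1, 0, 2).reshape(n, d)

# Toy input: 128 points with 64-dim features.
feats = np.random.default_rng(1).standard_normal((128, 64))
out = multi_head_self_attention(feats, num_heads=4)
```

Because every point attends to every other point, the attention map is `(N, N)` per head; this global receptive field is what the local relation-feature branch complements with neighborhood-level shape cues.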
Data availability
Enquiries about data availability should be directed to the authors.
Funding
This work was supported by the National Natural Science Foundation of China under Grant No. 61972030.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by XW, YJ, YZ, YC, BL and SW. The first draft of the manuscript was written by XW and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, X., Zeng, Y., Jin, Y. et al. RS-TNet: point cloud transformer with relation-shape awareness for fine-grained 3D visual processing. Soft Comput 27, 1005–1013 (2023). https://doi.org/10.1007/s00500-022-07543-5