The Applications of 3D Input Data and Scalability Element by Transformer Based Methods: A Review

  • Review article
Archives of Computational Methods in Engineering

Abstract

The outstanding effectiveness of transformers in visual tasks has led to their rapid growth and adoption in three-dimensional (3D) vision tasks. Vision transformers have shown numerous advantages over earlier convolutional neural network (CNN) architectures, including broad modelling abilities, more substantial modelling capabilities, complementarity with convolution, scalability to model and data size, and better connection for improving the performance of many visual tasks. We present a thorough review that classifies and summarizes popular transformer-based approaches according to key features of transformer integration: the input data, the scalability element that enables transformer processing, the architectural design, and the context level at which the transformer operates. We also highlight the primary contributions of each approach. Furthermore, we compare the results of these techniques with commonly employed non-transformer techniques in 3D object classification, segmentation, and object detection on standard 3D datasets, including ModelNet, SUN RGB-D, ScanNet, nuScenes, Waymo, ShapeNet, S3DIS, and KITTI. The study also discusses the limitations of 3D vision transformers and potential future directions.

Acknowledgements

This work is supported by Fujian Province University Key Lab for the Analysis and Application of Industry Big Data, Fujian Key Lab of Agriculture IOT Application, and IOT Application Engineering Research Center of Fujian Province Colleges and Universities.

Author information

Contributions

Conceptualization, ASG, HC, NURJ and CL; validation, HC and CL; investigation, ASG, HC and CL; writing (original draft preparation), ASG and NURJ; writing (review and editing), ASG; visualization, ASG, HC, NURJ and CL; supervision, HC and CL. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Chibiao Liu or Haruna Chiroma.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gezawa, A.S., Liu, C., Junejo, N.U.R. et al. The Applications of 3D Input Data and Scalability Element by Transformer Based Methods: A Review. Arch Computat Methods Eng (2024). https://doi.org/10.1007/s11831-024-10108-4

  • DOI: https://doi.org/10.1007/s11831-024-10108-4
