Abstract
The outstanding effectiveness of transformers in visual tasks has led to their rapid growth and adoption in three-dimensional (3D) vision tasks. Vision transformers have shown numerous advantages over earlier convolutional neural network (CNN) architectures, including broader and stronger modelling capabilities, complementarity with convolution, scalability with model and data size, and better connectivity, which have helped set new performance records on many visual tasks. We present a thorough review that classifies and summarizes popular transformer-based approaches according to key features of transformer integration: the input data, the scalability element that enables transformer processing, the architectural design, and the context level at which the transformer operates. We also highlight the primary contributions of each transformer approach. Furthermore, we compare the results of these techniques with those of commonly employed non-transformer techniques for 3D object classification, segmentation, and object detection on standard 3D datasets, including ModelNet, SUN RGB-D, ScanNet, nuScenes, Waymo, ShapeNet, S3DIS, and KITTI. Finally, we discuss the limitations of 3D vision transformers and several potential directions for future research.
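To make the idea of a "scalability element that enables transformer processing" more concrete, the following is a minimal sketch under stated assumptions; it is not the method of any particular surveyed model. It assumes NumPy, uses farthest point sampling (FPS) as one common way to reduce a point cloud to a manageable number of tokens, embeds the xyz coordinates with a hypothetical random linear projection, and applies single-head scaled dot-product self-attention in the form introduced by Vaswani et al. (2017). The function names `farthest_point_sampling` and `self_attention` and all sizes are illustrative only.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, m: int) -> np.ndarray:
    """Greedily pick m point indices that are spread far apart (illustrative token reduction)."""
    n = points.shape[0]
    selected = np.zeros(m, dtype=int)  # start from point 0
    dist = np.full(n, np.inf)          # squared distance to nearest selected point
    for i in range(1, m):
        # Update each point's distance to the most recently selected point.
        d = np.sum((points - points[selected[i - 1]]) ** 2, axis=1)
        dist = np.minimum(dist, d)
        # The next token is the point farthest from everything selected so far.
        selected[i] = int(np.argmax(dist))
    return selected


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over token features x of shape (N, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


rng = np.random.default_rng(0)
points = rng.standard_normal((1024, 3))     # toy point cloud of (x, y, z) coordinates
idx = farthest_point_sampling(points, 128)  # keep 128 tokens instead of 1024
tokens = points[idx]

d_model = 16
w_embed = rng.standard_normal((3, d_model))  # hypothetical linear point embedding
x = tokens @ w_embed
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(3))

out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (128, 16) (128, 128)

# Shuffling the input tokens shuffles the output rows the same way.
perm = rng.permutation(x.shape[0])
out_perm, _ = self_attention(x[perm], w_q, w_k, w_v)
print(np.allclose(out_perm, out[perm]))  # True
```

Because the cost of global attention grows quadratically with the number of tokens, sampling 128 of the 1024 points shrinks the attention matrix by a factor of 64 (1024² / 128²). The final check illustrates why self-attention suits unordered point sets: permuting the input tokens permutes the output rows in the same way, so no canonical point ordering is required.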
Acknowledgements
This work is supported by Fujian Province University Key Lab for the Analysis and Application of Industry Big Data, Fujian Key Lab of Agriculture IOT Application, and IOT Application Engineering Research Center of Fujian Province Colleges and Universities.
Author information
Contributions
Conceptualization, ASG, HC, NURJ and CL; validation, HC and CL; investigation, ASG, HC and CL; writing (original draft preparation), ASG and NURJ; writing (review and editing), ASG; visualization, ASG, HC, NURJ and CL; supervision, HC and CL. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gezawa, A.S., Liu, C., Junejo, N.U.R. et al. The Applications of 3D Input Data and Scalability Element by Transformer Based Methods: A Review. Arch Computat Methods Eng (2024). https://doi.org/10.1007/s11831-024-10108-4