
Convolutional Point Transformer

  • Conference paper in: Computer Vision – ACCV 2022 Workshops (ACCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13848)


Abstract

We present CpT: Convolutional point Transformer, a novel neural network layer for dealing with the unstructured nature of 3D point cloud data. CpT improves over existing MLP and convolution layers for point cloud processing, as well as over existing transformer layers for 3D point clouds. It does so by building a robust attention-based point set embedding through a convolutional projection layer crafted for processing dynamically computed local point set neighbourhoods. The resulting point set embedding is robust to permutations of the input points. Our layer operates over local neighbourhoods of points obtained via a dynamic graph computation at each layer of the network, is fully differentiable, and can be stacked just like convolutional layers to learn the intrinsic properties of the points. Further, we propose a novel Adaptive Global Feature layer that learns to aggregate features from different representations into a better global representation of the point cloud. We evaluate our models on the standard ModelNet40 classification and ShapeNet part segmentation benchmarks, showing that our layer is an effective addition for various point cloud processing tasks and integrates readily into existing architectures to provide significant performance boosts.
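To make the two components described in the abstract concrete, the sketches below give one plausible reading of them in PyTorch. These are illustrative assumptions, not the authors' implementation: the module names (CpTLayer, AdaptiveGlobalFeature), the DGCNN-style edge features, the single-head attention, and the max/mean pooling gates are all choices made here for the sake of a runnable example.

import torch
import torch.nn as nn


def knn_graph(x, k):
    # x: (B, N, C) point features; returns indices of the k nearest
    # neighbours of every point (the point itself is included).
    dist = torch.cdist(x, x)                                # (B, N, N)
    return dist.topk(k, dim=-1, largest=False).indices      # (B, N, k)


def gather_neighbours(x, idx):
    # x: (B, N, C), idx: (B, N, k) -> neighbour features (B, N, k, C)
    B, N, k = idx.shape
    batch = torch.arange(B, device=x.device).view(B, 1, 1).expand(B, N, k)
    return x[batch, idx]


class CpTLayer(nn.Module):
    # Hypothetical sketch of a CpT-style layer: a convolutional
    # projection over a dynamically recomputed local neighbourhood,
    # followed by attention pooled over the neighbours. Stacking these
    # layers recomputes the graph in feature space each time.
    def __init__(self, in_dim, out_dim, k=20):
        super().__init__()
        self.k, self.out_dim = k, out_dim
        # A 1x1 Conv2d over the (point, neighbour) grid plays the role
        # of the convolutional projection producing queries/keys/values.
        self.to_qkv = nn.Conv2d(2 * in_dim, 3 * out_dim, kernel_size=1)

    def forward(self, x):
        # x: (B, N, C)
        idx = knn_graph(x, self.k)
        nbrs = gather_neighbours(x, idx)                    # (B, N, k, C)
        centre = x.unsqueeze(2).expand_as(nbrs)
        # DGCNN-style edge features: [centre, neighbour - centre]
        edge = torch.cat([centre, nbrs - centre], dim=-1)   # (B, N, k, 2C)
        qkv = self.to_qkv(edge.permute(0, 3, 1, 2))         # (B, 3D, N, k)
        q, key, v = qkv.chunk(3, dim=1)
        score = (q * key).sum(dim=1, keepdim=True) / self.out_dim ** 0.5
        attn = score.softmax(dim=-1)                        # over neighbours
        # Sum-pooling over the neighbourhood is a symmetric function, so
        # the output is unchanged by permutations of the input points.
        return (attn * v).sum(dim=-1).transpose(1, 2)       # (B, N, D)

A layer built this way takes raw coordinates or intermediate features as input, e.g. CpTLayer(3, 64)(torch.rand(8, 1024, 3)) yields an (8, 1024, 64) tensor, and stacks like a convolution.

The Adaptive Global Feature layer is described above only at the level of "aggregating features from different representations"; one simple guess at such a layer gates max- and mean-pooled global descriptors:

class AdaptiveGlobalFeature(nn.Module):
    # Hypothetical sketch: learn convex weights that mix two pooled
    # global descriptors. The number and kind of representations the
    # paper actually aggregates is assumed here, not taken from it.
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))

    def forward(self, x):
        # x: (B, N, D) per-point features
        g_max = x.max(dim=1).values
        g_mean = x.mean(dim=1)
        w = self.gate(torch.cat([g_max, g_mean], dim=-1))   # (B, 2)
        return w[:, :1] * g_max + w[:, 1:] * g_mean         # (B, D)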



Acknowledgements

Chaitanya Kaul and Roderick Murray-Smith acknowledge funding from the QuantIC project funded by the EPSRC Quantum Technology Programme (grant EP/M01326X/1) and the iCAIRD project, funded by Innovate UK (project number 104690). Joshua Mitton is supported by a University of Glasgow Lord Kelvin Adam Smith Studentship. Roderick Murray-Smith acknowledges funding support from EPSRC grant EP/R018634/1, Closed-loop Data Science.

Author information

Corresponding author

Correspondence to Chaitanya Kaul.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Kaul, C., Mitton, J., Dai, H., Murray-Smith, R. (2023). Convolutional Point Transformer. In: Zheng, Y., Keleş, H.Y., Koniusz, P. (eds) Computer Vision – ACCV 2022 Workshops. ACCV 2022. Lecture Notes in Computer Science, vol 13848. Springer, Cham. https://doi.org/10.1007/978-3-031-27066-6_22

  • DOI: https://doi.org/10.1007/978-3-031-27066-6_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27065-9

  • Online ISBN: 978-3-031-27066-6

  • eBook Packages: Computer Science, Computer Science (R0)
