
A point cloud self-learning network based on contrastive learning for classification and segmentation

  • Original article, published in The Visual Computer

Abstract

In the field of point cloud representation learning, many self-supervised methods aim to overcome the heavy dependence of conventional supervised learning on labeled data. In recent years, contrastive learning-based methods in particular have gained increasing popularity. However, most current contrastive methods rely solely on conventional random augmentation, which limits the effectiveness of representation learning. Moreover, to prevent model collapse, they must construct positive and negative sample pairs or explicit clustering centers, adding complexity to data preprocessing. To address these challenges and achieve accurate point cloud classification and segmentation, we propose PointSL, a self-learning network for point clouds based on contrastive learning. PointSL incorporates a learnable point cloud augmentation (LPA) module that transforms samples with high precision, significantly improving the augmentation effect. To further enhance feature discrimination, PointSL introduces a self-learning process together with a refined feature predictor (FFP). This approach leverages the attention mechanism to let pairs of point clouds predict each other's features, continuously improving discriminative performance. In addition, the network employs a simple yet effective self-adaptive loss function that optimizes the entire network through gradient feedback. Pretraining in this way yields encoders with better generalization and higher accuracy. We evaluate PointSL on the ModelNet40, Sydney Urban Objects and ShapeNetPart benchmark datasets. Experimental results demonstrate that PointSL outperforms state-of-the-art self-supervised methods and supervised counterparts, achieving excellent performance in classification and segmentation tasks. Notably, PointSL reaches an OA/AA of 80.6%/69.9% on Sydney Urban Objects and 94.2%/91.4% on ModelNet40, and an Inst. mIoU/Cls. mIoU of 86.3%/85.1% on ShapeNetPart.
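The abstract's pipeline, learned augmentation of two views, mutual feature prediction through a predictor head, and a loss that back-propagates into the augmenter, can be illustrated with a minimal PyTorch sketch. All names here (LearnablePointAugment, mutual_prediction_loss, the placeholder encoder) are hypothetical illustrations, not the authors' released code, and the attention-based FFP is simplified to a small MLP predictor:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnablePointAugment(nn.Module):
    """Hypothetical stand-in for the LPA module: predicts a per-sample
    3x3 transform from a global summary of the cloud and applies it."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 9))

    def forward(self, pts):                      # pts: (B, N, 3)
        g = pts.mean(dim=1)                      # global summary: (B, 3)
        T = self.mlp(g).view(-1, 3, 3)           # learned transform: (B, 3, 3)
        T = T + torch.eye(3, device=pts.device)  # bias toward the identity
        return pts @ T                           # transformed cloud: (B, N, 3)

def mutual_prediction_loss(z1, z2, predictor):
    """Each view's feature predicts the other's (targets stop-gradient),
    in the spirit of the paper's self-learning with a feature predictor;
    no explicit negative pairs or clustering centers are needed."""
    p1, p2 = predictor(z1), predictor(z2)
    return 0.5 * ((1 - F.cosine_similarity(p1, z2.detach(), dim=-1)).mean()
                  + (1 - F.cosine_similarity(p2, z1.detach(), dim=-1)).mean())

# Usage sketch: `encoder` stands in for any point-cloud backbone returning
# (B, D) features; the real network would use a proper encoder, not this.
encoder = nn.Sequential(nn.Flatten(1), nn.LazyLinear(256))
predictor = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 256))
augment = LearnablePointAugment()

pts = torch.randn(8, 1024, 3)                              # batch of clouds
z1 = encoder(augment(pts))                                 # view 1
z2 = encoder(augment(pts + 0.01 * torch.randn_like(pts)))  # view 2, jittered
mutual_prediction_loss(z1, z2, predictor).backward()       # also trains LPA
```

Because the loss is differentiable with respect to the augmentation module, the gradient feedback mentioned in the abstract can optimize the augmenter, the encoder, and the predictor jointly.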




Availability of data and materials

The ShapeNetCore, ShapeNetPart, ModelNet40 and Sydney Urban Objects datasets used in this study are publicly available online at https://shapenet.org, https://www.shapenet.org/download/parts, https://modelnet.cs.princeton.edu and https://www.acfr.usyd.edu.au/papers/SydneyUrbanObjectsDataset.shtml, respectively (accessed 20 Feb 2023).


Acknowledgements

The authors would like to thank the relevant researchers from Princeton University, Stanford University and TTIC for providing ShapeNetCore, ShapeNetPart and ModelNet40 datasets, and the relevant researchers from the University of Sydney for providing the Sydney Urban Objects dataset.

Funding

This work was sponsored by the Natural Science Foundation of Shanghai under Grant No. 19ZR1435900.

Author information


Contributions

HZ contributed to conceptualization, methodology, resources and software; HZ, GC and XW performed data curation; WW carried out formal analysis and funding acquisition; WW, GC and XW performed supervision and validation; GC and XW contributed to visualization; HZ performed writing—original draft; HZ and WW performed writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Wenju Wang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhou, H., Wang, W., Chen, G. et al. A point cloud self-learning network based on contrastive learning for classification and segmentation. Vis Comput (2024). https://doi.org/10.1007/s00371-023-03248-4

