
Unsupervised contrastive learning with simple transformation for 3D point cloud data

Original article · The Visual Computer (2023)

Abstract

Though a number of point cloud learning methods have been proposed to handle unordered points, most of them are supervised and require labels for training. By contrast, unsupervised learning of point cloud data has received much less attention to date. In this paper, we propose a simple yet effective approach for unsupervised point cloud learning. In particular, we identify a very useful transformation which generates a good contrastive version of an original point cloud; the two together form a contrastive pair. After passing both through a shared encoder and a shared head network, we maximize the consistency between the output representations by introducing two variants of contrastive losses that respectively facilitate downstream classification and segmentation. To demonstrate the efficacy of our method, we conduct experiments on three downstream tasks: 3D object classification (on ModelNet40 and ModelNet10), shape part segmentation (on the ShapeNet Part dataset), and scene segmentation (on S3DIS). Comprehensive results show that our unsupervised contrastive representation learning enables impressive outcomes in object classification and semantic segmentation. It generally outperforms current unsupervised methods and even achieves performance comparable to supervised methods.
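For concreteness, here is a minimal PyTorch-style sketch of such a contrastive setup, assuming an NT-Xent-style loss of the kind popularized by SimCLR; `encoder`, `head`, and `transform` are hypothetical placeholders for the shared networks and the simple transformation, and this is our illustration rather than the authors' released code.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    # z1, z2: (B, D) global representations of the original and
    # transformed point clouds. Row i of z1 and row i of z2 form a
    # positive pair; all other rows in the minibatch act as negatives.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # (2B, D)
    sim = z @ z.t() / temperature             # (2B, 2B) cosine similarities
    sim.fill_diagonal_(float('-inf'))         # exclude self-similarity
    B = z1.size(0)
    # The positive of sample i is sample i + B (and vice versa).
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)]).to(sim.device)
    return F.cross_entropy(sim, targets)

# Hypothetical usage with the shared networks and the transformation:
#   z1 = head(encoder(points))                # points: (B, N, 3)
#   z2 = head(encoder(transform(points)))
#   loss = nt_xent(z1, z2)
```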



Author information


Corresponding author

Correspondence to Meili Wang.

Ethics declarations

Conflict of interest

The authors declare that the work is original and has not been submitted elsewhere.


Appendices

Appendix A Overview of segmentation

We also address point cloud semantic segmentation, covering both shape part segmentation and scene segmentation. Unlike the 3D object classification task, segmentation hinges on obtaining point-wise features for every point in the cloud. For our unsupervised contrastive learning, as shown in Fig. 6, we still treat the original point cloud and its transformed version as a contrastive pair. However, to ensure that the feature of each point is learned, we evaluate point cloud similarity with the mean of the point-wise cross entropy, and maximize the similarity of the positive pair (all other pairs of point clouds in the minibatch are treated as negative pairs). In this unsupervised manner, our framework learns a feature for every point in the point cloud.
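As a hedged sketch of how this point-wise objective could be implemented (our illustration, with assumed names and tensor layout, not the paper's code), the snippet below averages a per-point contrastive cross entropy over all points, treating corresponding clouds in the two views as positives and the other clouds in the minibatch as negatives:

```python
import torch
import torch.nn.functional as F

def pointwise_contrastive_loss(f1, f2, temperature=0.1):
    # f1, f2: (B, N, D) per-point features of the original and the
    # transformed point clouds. For each point index n, cloud b in f1
    # and cloud b in f2 are a positive pair; the remaining B - 1
    # clouds in the minibatch serve as negatives.
    f1 = F.normalize(f1, dim=2)
    f2 = F.normalize(f2, dim=2)
    # sim[n, i, j] = <f1[i, n], f2[j, n]>: similarity of clouds i and j
    # at point index n.
    sim = torch.einsum('ind,jnd->nij', f1, f2) / temperature  # (N, B, B)
    B, N = f1.size(0), f1.size(1)
    targets = torch.arange(B, device=f1.device).expand(N, B)  # (N, B)
    # Cross entropy per point, averaged over all points and clouds.
    return F.cross_entropy(sim.reshape(-1, B), targets.reshape(-1))
```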

Appendix B Additional visual results on scene segmentation

In this section, we show more visual results on scene segmentation. As with the other downstream tasks, we utilize the Linear Classifier setting. Figure 7 shows the visual results for several scenes. We can observe from the figure that our method produces segmentation results close to the ground truth, which demonstrates the capability of our unsupervised representation learning method.

Fig. 7: Visual results of scene segmentation

Fig. 8: Examples from all 16 categories in the ShapeNet Part dataset

Appendix C Additional visual results on shape part segmentation

In this section, we present more visual results of our method on downstream shape part segmentation. We again employ the Linear Classifier setting for this task. Figure 8 shows visual results for 32 models across the 16 categories, with two models per category. As we can see from the figure, with our unsupervised learned representations, a simple linear classifier for the downstream task generates visual results very similar to the ground-truth segmentation. This further confirms the effectiveness of our unsupervised method in learning distinguishable representations.


Cite this article

Jiang, J., Lu, X., Ouyang, W. et al. Unsupervised contrastive learning with simple transformation for 3D point cloud data. Vis Comput (2023). https://doi.org/10.1007/s00371-023-02921-y
