Info3D: Representation Learning on 3D Objects Using Mutual Information Maximization and Contrastive Learning

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12374)

Abstract

A major endeavor of computer vision is to represent, understand, and extract structure from 3D data. Towards this goal, unsupervised learning is a powerful and necessary tool. Most current unsupervised methods for 3D shape analysis use datasets that are aligned, require objects to be reconstructed, and suffer from deteriorated performance on downstream tasks. To solve these issues, we propose to extend the InfoMax and contrastive learning principles to 3D shapes. We show that we can maximize the mutual information between 3D objects and their “chunks” to improve the representations in aligned datasets. Furthermore, we can achieve rotation invariance in the SO(3) group by maximizing the mutual information between the 3D objects and their geometrically transformed versions. Finally, we conduct several experiments, such as clustering, transfer learning, and shape retrieval, and achieve state-of-the-art results.
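
The two ingredients named in the abstract, a contrastive objective and views related by an SO(3) transformation, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `random_rotation`, `rotate`, and `info_nce` are hypothetical, the encoder that maps a point cloud to an embedding is omitted, and the InfoNCE loss (from the contrastive predictive coding literature) is used here as one common contrastive bound on mutual information, on the assumption that it is representative of the approach.

```python
import math
import random


def random_rotation(rng):
    """Sample a 3x3 rotation matrix uniformly from SO(3).

    Four Gaussian samples, normalized, give a uniform unit quaternion,
    which is then converted to a rotation matrix.
    """
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(v * v for v in q))
    w, x, y, z = (v / norm for v in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate(points, R):
    """Apply rotation R to every 3D point in `points` (a list of [x, y, z])."""
    return [[sum(R[i][k] * p[k] for k in range(3)) for i in range(3)]
            for p in points]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    Row i of `positives` is the positive for row i of `anchors`
    (e.g. the embedding of a rotated copy or a chunk of the same object);
    all other rows in the batch act as negatives.
    """
    def normalize(v):
        n = math.sqrt(sum(a * a for a in v)) or 1.0
        return [a / n for a in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i with every candidate, scaled.
        logits = [sum(s * t for s, t in zip(a_i, p_j)) / temperature
                  for p_j in p]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(a)
```

In a training loop under these assumptions, each object would be encoded twice, once as is and once after `rotate(points, random_rotation(rng))`, and `info_nce` would be minimized over the two batches of embeddings. For a batch of N pairs, log N minus the loss is the usual lower bound on the mutual information between the two views, so lowering the loss tightens that bound.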

Author information

Corresponding author

Correspondence to Aditya Sanghi.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sanghi, A. (2020). Info3D: Representation Learning on 3D Objects Using Mutual Information Maximization and Contrastive Learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12374. Springer, Cham. https://doi.org/10.1007/978-3-030-58526-6_37

  • DOI: https://doi.org/10.1007/978-3-030-58526-6_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58525-9

  • Online ISBN: 978-3-030-58526-6

  • eBook Packages: Computer Science, Computer Science (R0)
