Info3D: Representation Learning on 3D Objects Using Mutual Information Maximization and Contrastive Learning

Sanghi, Aditya

doi:10.1007/978-3-030-58526-6_37

Aditya Sanghi¹²

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12374))

Included in the following conference series:

European Conference on Computer Vision

4349 Accesses
31 Citations

Abstract

A major endeavor of computer vision is to represent, understand and extract structure from 3D data. Towards this goal, unsupervised learning is a powerful and necessary tool. Most current unsupervised methods for 3D shape analysis use datasets that are aligned, require objects to be reconstructed and suffer from deteriorated performance on downstream tasks. To solve these issues, we propose to extend the InfoMax and contrastive learning principles on 3D shapes. We show that we can maximize the mutual information between 3D objects and their “chunks” to improve the representations in aligned datasets. Furthermore, we can achieve rotation invariance in SO(3) group by maximizing the mutual information between the 3D objects and their geometric transformed versions. Finally, we conduct several experiments such as clustering, transfer learning, shape retrieval, and achieve state of art results.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Achlioptas, P., Diamanti, O., Mitliagkas, I., Guibas, L.: Learning representations and generative models for 3D point clouds (2017). arXiv preprint arXiv:1707.02392
Anand, A., Racah, E., Ozair, S., Bengio, Y., Côté, M.A., Hjelm, R.D.: Unsupervised state representation learning in atari (2019). arXiv preprint arXiv:1906.08226
Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views (2019). arXiv preprint arXiv:1906.00910
Becker, S.: An information-theoretic unsupervised learning algorithm for neural networks. University of Toronto (1992)
Google Scholar
Becker, S.: Mutual information maximization: models of cortical self-organization. Netw. Comput. Neural Syst. 7(1), 7–31 (1996)
MATH Google Scholar
Bell, A.J., Sejnowski, T.J.: An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7(6), 1129–1159 (1995)
Article Google Scholar
Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
Article Google Scholar
Chang, A.X., et al.: Shapenet: an information-rich 3D model repository (2015). arXiv preprint arXiv:1512.03012
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations (2020). arXiv preprint arXiv:2002.05709
Chen, Z., Zhang, H.: Learning implicit fields for generative shape modeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5939–5948 (2019)
Google Scholar
Cheng, S., Bronstein, M., Zhou, Y., Kotsia, I., Pantic, M., Zafeiriou, S.: Meshgan: non-linear 3D morphable models of faces (2019). arXiv preprint arXiv:1903.10384
Cohen, T., Welling, M.: Group equivariant convolutional networks. In: International Conference on Machine Learning, pp. 2990–2999 (2016)
Google Scholar
Deng, H., Birdal, T., Ilic, S.: Ppf-foldnet: unsupervised learning of rotation invariant 3D local descriptors. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 602–618 (2018)
Google Scholar
Deng, H., Birdal, T., Ilic, S.: 3D local features for direct pairwise registration (2019). arXiv preprint arXiv:1904.04281
Esteves, C., Xu, Y., Allen-Blanchette, C., Daniilidis, K.: Equivariant multi-view networks (2019). arXiv preprint arXiv:1904.00993
Groueix, T., Fisher, M., Kim, V.G., Russell, B.C., Aubry, M.: Atlasnet: a papier-m\(\backslash \hat{\,}\) ach\(\backslash \)’e approach to learning 3D surface generation (2018). arXiv preprint arXiv:1802.05384
Gutmann, M., Hyvärinen, A.: Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 297–304 (2010)
Google Scholar
Hassani, K., Haley, M.: Unsupervised multi-task feature learning on point clouds. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 8160–8171 (2019)
Google Scholar
Hénaff, O.J., Razavi, A., Doersch, C., Eslami, S., Oord, A.V.d.: Data-efficient image recognition with contrastive predictive coding (2019). arXiv preprint arXiv:1905.09272
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization (2014). arXiv preprint arXiv:1412.6980
Koch, S., et al.: Abc: a big cad model dataset for geometric deep learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9601–9611 (2019)
Google Scholar
Lazebnik, S., Schmid, C., Ponce, J.: Semi-local affine parts for object recognition (2004)
Google Scholar
Li, J., Bi, Y., Lee, G.H.: Discrete rotation equivariance for point cloud recognition (2019). arXiv preprint arXiv:1904.00319
Linsker, R.: An application of the principle of maximum information preservation to linear systems. In: Advances in Neural Information Processing Systems, pp. 186–194 (1989)
Google Scholar
Liu, Y., Fan, B., Xiang, S., Pan, C.: Relation-shape convolutional neural network for point cloud analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8895–8904 (2019)
Google Scholar
Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: Learning 3D struction in function space. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4460–4470 (2019)
Google Scholar
Michalkiewicz, M., Pontes, J.K., Jack, D., Baktashmotlagh, M., Eriksson, A.: Deep level sets: Implicit surface representations for 3D shape inference (2019). arXiv preprint arXiv:1901.06802
Misra, I., van der Maaten, L.: Self-supervised learning of pretext-invariant representations (2019). arXiv preprint arXiv:1912.01991
Oord, A.V.d., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding (2018). arXiv preprint arXiv:1807.03748
Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: Deepsdf: learning continuous signed distance functions for shape representation (2019). arXiv preprint arXiv:1901.05103
Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011)
MathSciNet MATH Google Scholar
Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017)
Google Scholar
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)
Google Scholar
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.R.: Orb: an efficient alternative to sift or surf. In: ICCV, vol. 11, p. 2. Citeseer (2011)
Google Scholar
Sanghi, A., Danielyan, A.: Towards 3d rotation invariant embeddings
Google Scholar
Sauder, J., Sievers, B.: Context prediction for unsupervised deep learning on point clouds (2019). arXiv preprint arXiv:1901.08396
Steder, B., Rusu, R.B., Konolige, K., Burgard, W.: Narf: 3D range image features for object recognition. In: Workshop on Defining and Solving Realistic Perception Problems in Personal Robotics at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 44 (2010)
Google Scholar
Su, H., Maji, S., Kalogerakis, E., Learned-Miller, E.: Multi-view convolutional neural networks for 3d shape recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 945–953 (2015)
Google Scholar
Sun, F.Y., Hoffmann, J., Tang, J.: Infograph: unsupervised and semi-supervised graph-level representation learning via mutual information maximization (2019). arXiv preprint arXiv:1908.01000
Tan, Q., Gao, L., Lai, Y.K., Xia, S.: Variational autoencoders for deforming 3D mesh models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5841–5850 (2018)
Google Scholar
Thomas, N., et al.: Tensor field networks: rotation-and translation-equivariant neural networks for 3D point clouds (2018). arXiv preprint arXiv:1802.08219
Tian, Y., Krishnan, D., Isola, P.: Contrastive multiview coding (2019). arXiv preprint arXiv:1906.05849
Veličković, P., Fedus, W., Hamilton, W.L., Liò, P., Bengio, Y., Hjelm, R.D.: Deep graph infomax (2018). arXiv preprint arXiv:1809.10341
Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. (TOG) 38(5), 146 (2019)
Article Google Scholar
Worrall, D.E., Garbin, S.J., Turmukhambetov, D., Brostow, G.J.: Harmonic networks: deep translation and rotation equivariance. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5028–5037 (2017)
Google Scholar
Wu, J., Zhang, C., Xue, T., Freeman, B., Tenenbaum, J.: Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: Advances in Neural Information Processing Systems, pp. 82–90 (2016)
Google Scholar
Wu, Z., et al.: 3D shapenets: A deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1912–1920 (2015)
Google Scholar
Wu, Z., Xiong, Y., Yu, S., Lin, D.: Unsupervised feature learning via non-parametric instance-level discrimination (2018). arXiv preprint arXiv:1805.01978
Yang, Y., Feng, C., Shen, Y., Tian, D.: Foldingnet: point cloud auto-encoder via deep grid deformation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 206–215 (2018)
Google Scholar
Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R.R., Smola, A.J.: Deep sets. In: Advances in Neural Information Processing Systems, pp. 3391–3401 (2017)
Google Scholar
Zhang, L., Zhu, Z.: Unsupervised feature learning for point cloud understanding by contrasting and clustering using graph convolutional neural networks. In: 2019 International Conference on 3D Vision (3DV), pp. 395–404. IEEE (2019)
Google Scholar
Zhao, Y., Birdal, T., Deng, H., Tombari, F.: 3D point capsule networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1009–1018 (2019)
Google Scholar
Zhuang, C., Zhai, A.L., Yamins, D.: Local aggregation for unsupervised learning of visual embeddings. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6002–6012 (2019)
Google Scholar

Download references

Author information

Authors and Affiliations

Autodesk AI Lab, Toronto, Canada
Aditya Sanghi

Authors

Aditya Sanghi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Aditya Sanghi .

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1695 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Sanghi, A. (2020). Info3D: Representation Learning on 3D Objects Using Mutual Information Maximization and Contrastive Learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12374. Springer, Cham. https://doi.org/10.1007/978-3-030-58526-6_37

Download citation

DOI: https://doi.org/10.1007/978-3-030-58526-6_37
Published: 07 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58525-9
Online ISBN: 978-3-030-58526-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics