Learning SO(3) Equivariant Representations with Spherical CNNs

Published: 06 September 2019

International Journal of Computer Vision
Volume 128, pages 588–600 (2020)

Carlos Esteves ORCID: orcid.org/0000-0001-9413-1201¹,
Christine Allen-Blanchette¹,
Ameesh Makadia² &
…
Kostas Daniilidis¹


Abstract

We address the problem of 3D rotation equivariance in convolutional neural networks. 3D rotations have been a challenging nuisance in 3D classification tasks, requiring higher capacity and extended data augmentation to handle them. We model 3D data with multi-valued spherical functions and propose a novel spherical convolutional network that implements exact convolutions on the sphere by realizing them in the spherical harmonic domain. The resulting filters have local symmetry and are localized by enforcing smooth spectra. We apply a novel pooling in the spectral domain, and our operations are independent of the underlying spherical resolution throughout the network. We show that networks with much lower capacity, and without data augmentation, can achieve performance comparable to the state of the art on standard 3D shape retrieval and classification benchmarks.
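To make "realizing convolutions in the spherical harmonic domain" concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes zonal (axially symmetric) filters and the convolution theorem of Driscoll and Healy (1994), under which each output coefficient of degree l is the input coefficient scaled by the filter's order-0 coefficient and a constant that depends on the normalization convention. The coefficient layout, the function names, the anchor-interpolation scheme for a smooth spectrum, and truncation as the form of spectral pooling are all illustrative assumptions.

```python
import numpy as np


def zonal_filter_spectrum(anchors, bandwidth):
    """Expand a few anchor values into one filter coefficient per degree l.

    Linear interpolation keeps the spectrum smooth, which is one way
    (assumed here) to encourage spatially localized filters.
    """
    anchor_degrees = np.linspace(0, bandwidth - 1, num=len(anchors))
    return np.interp(np.arange(bandwidth), anchor_degrees, anchors)


def spherical_conv_spectral(f_hat, h_hat):
    """Spectral spherical convolution with a zonal filter.

    f_hat: list indexed by degree l; entry l holds the 2l+1 coefficients
           of orders m = -l..l (hypothetical layout).
    h_hat: array where h_hat[l] is the filter's order-0 coefficient.
    """
    out = []
    for l, coeffs in enumerate(f_hat):
        # Constant from the Driscoll-Healy convolution theorem; its exact
        # value depends on the harmonic normalization in use.
        scale = 2.0 * np.pi * np.sqrt(4.0 * np.pi / (2 * l + 1))
        out.append(scale * h_hat[l] * coeffs)
    return out


def spectral_pool(f_hat, new_bandwidth):
    """Low-pass the signal by dropping all degrees >= new_bandwidth."""
    return f_hat[:new_bandwidth]


# Usage with random coefficients standing in for a transformed feature map.
rng = np.random.default_rng(0)
bandwidth = 8
f_hat = [
    rng.normal(size=2 * l + 1) + 1j * rng.normal(size=2 * l + 1)
    for l in range(bandwidth)
]
h_hat = zonal_filter_spectrum(np.array([1.0, 0.5, 0.0]), bandwidth)
g_hat = spherical_conv_spectral(f_hat, h_hat)
pooled = spectral_pool(g_hat, bandwidth // 2)
```

Because every operation acts on harmonic coefficients rather than on grid samples, nothing in the sketch depends on how the sphere is discretized, which is consistent with the resolution independence claimed in the abstract. A full layer would also need the forward and inverse spherical harmonic transforms, which are omitted here.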

Notes

The first version of this work was submitted to CVPR on 11/15/2017, shortly after we became aware, on 10/27/2017, of the ICLR submission by Cohen et al. (2018).
In a CNN setting, f represents inputs/feature maps, and h the learned filters.
For the experiments in Table 6, one epoch for the WAP model in the first row takes 234 s, versus 132 s for the SP model in the third row, both on an Nvidia 1080 Ti.

References

Arfken, G. (1966). Mathematical methods for physicists. No. v. 2 in Mathematical methods for physicists. New York: Academic Press.
Bai, S., Bai, X., Zhou, Z., Zhang, Z., & Jan Latecki, L. (2016). Gift: A real-time and scalable 3d shape search engine. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5023–5032).
Boscaini, D., Masci, J., Rodolà, E., & Bronstein, M. (2016). Learning shape correspondence with anisotropic convolutional neural networks. In Advances in neural information processing systems (pp. 3189–3197).
Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., & Vandergheynst, P. (2017). Geometric deep learning: Going beyond euclidean data. IEEE Signal Processing Magazine, 34(4), 18–42.
Bruna, J., Szlam, A., & LeCun, Y. (2013a). Learning stable group invariant representations with convolutional networks. arXiv preprint arXiv:1301.3537.
Bruna, J., Zaremba, W., Szlam, A., & LeCun, Y. (2013b). Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
Bruna, J., Zaremba, W., Szlam, A., & LeCun, Y. (2013c). Spectral networks and locally connected networks on graphs. CoRR arXiv:1312.6203v3.
Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., et al. (2015). Shapenet: An information-rich 3d model repository. CoRR arXiv:1512.03012v1.
Cohen, T. S., & Welling, M. (2016). Group equivariant convolutional networks. arXiv preprint arXiv:1602.07576.
Cohen, T. S., Geiger, M., Köhler, J., & Welling, M. (2018). Spherical CNNs. In International conference on learning representations.
Defferrard, M., Bresson, X., & Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems (pp. 3844–3852).
Dieleman, S., Willett, K. W., & Dambre, J. (2015). Rotation-invariant convolutional neural networks for galaxy morphology prediction. Monthly Notices of the Royal Astronomical Society, 450(2), 1441–1459.
Driscoll, J. R., & Healy, D. M. (1994). Computing fourier transforms and convolutions on the 2-sphere. Advances in Applied Mathematics, 15(2), 202–250.
Frome, A., Huber, D., Kolluri, R., Bülow, T., & Malik, J. (2004). Recognizing objects in range data using regional point descriptors. In T. Pajdla & J. Matas (Eds.), Computer Vision - ECCV 2004. Lecture notes in computer science (vol. 3023). Berlin, Heidelberg: Springer.
Furuya, T., & Ohbuchi, R. (2016). Deep aggregation of local 3d geometric features for 3d model retrieval. In BMVC (p. 121).
Gens, R., & Domingos, P. M. (2014). Deep symmetry networks. In Advances in neural information processing systems (pp. 2537–2545).
Górski, K. M., Hivon, E., Banday, A. J., Wandelt, B. D., Hansen, F. K., Reinecke, M., et al. (2005). HEALPix: A framework for high-resolution discretization and fast analysis of data distributed on the sphere. The Astrophysical Journal, 622, 759–771. https://doi.org/10.1086/427976.
Healy, D. M., Rockmore, D. N., Kostelec, P. J., & Moore, S. (2003). Ffts for the 2-sphere-improvements and variations. Journal of Fourier Analysis and Applications, 9(4), 341–385.
Hel-Or, Y., & Teo, P. C. (1996). Canonical decomposition of steerable functions. In Computer vision and pattern recognition, 1996. Proceedings CVPR’96, 1996 IEEE computer society conference on (pp. 809–816). IEEE.
Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015). Spatial transformer networks. In Advances in neural information processing systems (pp. 2017–2025).
Kanezaki, A., Matsushita, Y., & Nishida, Y. (2018). Rotationnet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints. In Proceedings of IEEE international conference on computer vision and pattern recognition (CVPR).
Kazhdan, M., & Funkhouser, T. (2002). Harmonic 3d shape matching. In ACM SIGGRAPH 2002 conference abstracts and applications (pp. 191–191). New York: ACM.
Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
Klokov, R., & Lempitsky, V. (2017). Escape from cells: Deep kd-networks for the recognition of 3d point cloud models. In International conference on computer vision (ICCV) (pp. 863–872).
Lebedev, N., & Silverman, R. (1972). Special functions and their applications. Dover Books on Mathematics, Dover Publications.
Lenc, K., & Vedaldi, A. (2015). Understanding image representations by measuring their equivariance and equivalence. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 991–999).
Li, J., Chen, B. M., & Lee, G. H. (2018). SO-net: Self-organizing network for point cloud analysis. CoRR arXiv:1803.04249v4.
Makadia, A., & Daniilidis, K. (2010). Spherical correlation of visual representations for 3d model retrieval. International Journal of Computer Vision, 89(2), 193–210.
Marcos, D., Volpi, M., Komodakis, N., & Tuia, D. (2016). Rotation equivariant vector field networks. CoRR arXiv:1612.09346.
Masci, J., Boscaini, D., Bronstein, M., & Vandergheynst, P. (2015). Geodesic convolutional neural networks on Riemannian manifolds. In Proceedings of the IEEE international conference on computer vision workshops (pp. 37–45).
Maturana, D., & Scherer, S. (2015). Voxnet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ international conference on intelligent robots and systems, IROS 2015, Hamburg, Germany, September 28–October 2, 2015 (pp. 922–928). https://doi.org/10.1109/IROS.2015.7353481.
Monti, F., Boscaini, D., Masci, J., Rodolà, E., Svoboda, J., & Bronstein, M. M. (2016). Geometric deep learning on graphs and manifolds using mixture model CNNs. arXiv preprint arXiv:1611.08402.
Qi, C. R., Su, H., Nießner, M., Dai, A., Yan, M., & Guibas, L. J. (2016). Volumetric and multi-view CNNs for object classification on 3d data. In 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016 (pp. 5648–5656). https://doi.org/10.1109/CVPR.2016.609.
Qi, C. R., Su, H., Mo, K., & Guibas, L. J. (2017a). Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of computer vision and pattern recognition (CVPR) (Vol. 1(2), p. 4). IEEE.
Qi, C. R., Yi, L., Su, H., & Guibas, L. J. (2017b). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems (pp. 5105–5114).
Rippel, O., Snoek, J., & Adams, R. P. (2015). Spectral representations for convolutional neural networks. CoRR arXiv:1506.03767.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252. https://doi.org/10.1007/s11263-015-0816-y.
Savva, M., Yu, F., Su, H., Kanezaki, A., Furuya, T., et al. (2017). Shrec’17 track: Large-scale 3d shape retrieval from shapenet core55. In 10th Eurographics workshop on 3D object retrieval (pp. 1–11).
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 815–823).
Segman, J., Rubinstein, J., & Zeevi, Y. Y. (1992). The canonical coordinates method for pattern deformation: Theoretical and computational considerations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(12), 1171–1183.
Su, H., Maji, S., Kalogerakis, E., & Learned-Miller, E. (2015). Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE international conference on computer vision (pp. 945–953).
Tatsuma, A., & Aono, M. (2009). Multi-fourier spectra descriptor and augmentation with spectral clustering for 3d shape retrieval. The Visual Computer, 25(8), 785–804.
Thurston, W. P. (1997). Three-dimensional geometry and topology (Vol. 1). Princeton, NJ: Princeton University Press.
Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M., & Solomon, J. M. (2018). Dynamic graph CNN for learning on point clouds. arXiv preprint arXiv:1801.07829.
Worrall, D. E., Garbin, S. J., Turmukhambetov, D., & Brostow, G. J. (2016). Harmonic networks: Deep translation and rotation equivariance. arXiv preprint arXiv:1612.04642.
Worrall, D. E., Garbin, S. J., Turmukhambetov, D., & Brostow, G. J. (2017). Harmonic networks: deep translation and rotation equivariance. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 5028–5037).
Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3d shapenets: A deep representation for volumetric shapes. In IEEE conference on computer vision and pattern recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015 (pp. 1912–1920). https://doi.org/10.1109/CVPR.2015.7298801.
Yi, L., Su, H., Guo, X., & Guibas, L. (2016). SyncSpecCNN: Synchronized spectral CNN for 3d shape segmentation. arXiv preprint arXiv:1612.00606.
Zhang, R. (2019). Making convolutional networks shift-invariant again. In International conference on machine learning (ICML).
Zhou, Y., Ye, Q., Qiu, Q., & Jiao, J. (2017). Oriented response networks. In The IEEE conference on computer vision and pattern recognition (CVPR).


Author information

Authors and Affiliations

University of Pennsylvania, Philadelphia, USA
Carlos Esteves, Christine Allen-Blanchette & Kostas Daniilidis
Google Research, New York, USA
Ameesh Makadia


Corresponding author

Correspondence to Carlos Esteves.

Additional information

Communicated by Cristian Sminchisescu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We are grateful for support through the following grants: NSF-DGE-0966142 (IGERT), NSF-IIP-1439681 (I/UCRC), NSF-IIS-1426840, NSF-IIS-1703319, NSF MRI 1626008, ARL RCTA W911NF-10-2-0016, ONR N00014-17-1-2093, and by Honda Research Institute. Our code is available at http://github.com/daniilidis-group/spherical-cnn.


About this article

Cite this article

Esteves, C., Allen-Blanchette, C., Makadia, A. et al. Learning SO(3) Equivariant Representations with Spherical CNNs. Int J Comput Vis 128, 588–600 (2020). https://doi.org/10.1007/s11263-019-01220-1


Received: 01 February 2019
Accepted: 26 August 2019
Published: 06 September 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11263-019-01220-1
