Representation Learning on Unit Ball with 3D Roto-translational Equivariance

Abstract

Convolution is an integral operation that defines how the shape of one function is modified by another function. This powerful concept forms the basis of hierarchical feature learning in deep neural networks. Although performing convolution in Euclidean geometries is fairly straightforward, its extension to other topological spaces, such as a sphere (\(\mathbb{S}^2\)) or a unit ball (\(\mathbb{B}^3\)), entails unique challenges. In this work, we propose a novel ‘volumetric convolution’ operation that can effectively model and convolve arbitrary functions in \(\mathbb{B}^3\). We develop a theoretical framework for volumetric convolution based on Zernike polynomials and efficiently implement it as a differentiable, easily pluggable layer in deep networks. By construction, our formulation yields a novel formula for measuring the symmetry of a function in \(\mathbb{B}^3\) around an arbitrary axis, which is useful in function analysis tasks. We demonstrate the efficacy of the proposed volumetric convolution operation on one viable use case, i.e., 3D object recognition.
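The framework above is built on Zernike polynomials in \(\mathbb{B}^3\). As an illustrative sketch only (it is not the paper's volumetric convolution layer, and the names `zernike_radial`, `radial_inner` and `radial_basis` are hypothetical), the Python below evaluates the radial part of the standard 3D Zernike functions \(Z_{nlm}(r,\theta,\phi) = R_{nl}(r)\,Y_l^m(\theta,\phi)\), assuming the Jacobi-polynomial form \(R_{nl}(r) \propto r^l P_k^{(0,\,l+1/2)}(2r^2-1)\) with \(k=(n-l)/2\). It then checks numerically that, for fixed \(l\), these functions are orthonormal under the ball's radial weight \(r^2\,dr\). The angular spherical harmonics are omitted, and the 64-point Gauss–Legendre quadrature is an arbitrary choice.

```python
import math

import numpy as np

# 64-point Gauss-Legendre rule on [-1, 1], mapped to radii r in [0, 1].
_X, _W = np.polynomial.legendre.leggauss(64)
R_NODES = 0.5 * (_X + 1.0)
R_WEIGHTS = 0.5 * _W


def _gen_binom(a: float, b: int) -> float:
    """Generalized binomial coefficient C(a, b) for real a and integer 0 <= b <= a."""
    return math.gamma(a + 1.0) / (math.gamma(b + 1.0) * math.gamma(a - b + 1.0))


def zernike_radial(n: int, l: int, r: np.ndarray) -> np.ndarray:
    """Unnormalized radial polynomial R_nl(r) = r^l * P_k^(0, l + 1/2)(2 r^2 - 1), k = (n - l) / 2."""
    if l < 0 or n < l or (n - l) % 2:
        raise ValueError("require n >= l >= 0 with n - l even")
    k = (n - l) // 2
    beta = l + 0.5
    r2 = r * r
    # Explicit Jacobi sum with alpha = 0, using (x - 1)/2 = r^2 - 1 and (x + 1)/2 = r^2.
    jacobi = sum(
        _gen_binom(k, k - s) * _gen_binom(k + beta, s) * (r2 - 1.0) ** s * r2 ** (k - s)
        for s in range(k + 1)
    )
    return r**l * jacobi


def radial_inner(f: np.ndarray, g: np.ndarray) -> float:
    """Quadrature approximation of the radial inner product on the ball: int_0^1 f(r) g(r) r^2 dr."""
    return float(np.sum(R_WEIGHTS * f * g * R_NODES**2))


def radial_basis(l: int, n_max: int) -> list:
    """Orthonormalized R_nl for n = l, l + 2, ..., n_max, sampled at R_NODES."""
    basis = []
    for n in range(l, n_max + 1, 2):
        values = zernike_radial(n, l, R_NODES)
        basis.append(values / math.sqrt(radial_inner(values, values)))
    return basis


if __name__ == "__main__":
    basis = radial_basis(l=0, n_max=16)
    gram = np.array([[radial_inner(a, b) for b in basis] for a in basis])
    print("max |G - I|:", np.abs(gram - np.eye(len(basis))).max())

    # Expand the radial profile f(r) = exp(-4 r^2) and reconstruct it at the nodes.
    f = np.exp(-4.0 * R_NODES**2)
    coeffs = [radial_inner(f, b) for b in basis]
    recon = sum(c * b for c, b in zip(coeffs, basis))
    print("max reconstruction error:", np.abs(recon - f).max())
```

Running the script should print a Gram-matrix deviation on the order of floating-point round-off and a small reconstruction error for the test profile \(f(r)=e^{-4r^2}\). Orthogonality is what allows a function on the ball to be summarized by its projection coefficients (its Zernike moments) without solving a linear system.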

Figs. 1–14 (images not reproduced)

Notes

  1. We refer the reader to Cohen et al. (2018a) for an excellent review of group equivariant CNNs.

References

  1. Agathos, A., Pratikakis, I., Papadakis, P., Perantonis, S. J., Azariadis, P. N., & Sapidis, N. S. (2009). Retrieval of 3D articulated objects using a graph-based representation. In 3DOR 2009 (pp. 29–36).

  2. Ankerst, M., Kastenmüller, G., Kriegel, H. P., & Seidl, T. (1999). 3D shape histograms for similarity search and classification in spatial databases. In International symposium on spatial databases (pp. 207–226). Berlin: Springer.

  3. Arbter, K., Snyder, W. E., Burkhardt, H., & Hirzinger, G. (1990). Application of affine-invariant Fourier descriptors to recognition of 3-D objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7), 640–647.

  4. Bai, S., Bai, X., Zhou, Z., Zhang, Z., & Latecki, L. J. (2016). GIFT: A real-time and scalable 3D shape search engine. In 2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5023–5032). IEEE.

  5. Boomsma, W., & Frellsen, J. (2017). Spherical convolutions and their application in molecular modelling. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 30, pp. 3433–3443). Curran Associates, Inc. http://papers.nips.cc/paper/6935-spherical-convolutions-and-their-application-in-molecular-modelling.pdf.

  6. Boscaini, D., Masci, J., Melzi, S., Bronstein, M. M., Castellani, U., & Vandergheynst, P. (2015). Learning class-specific descriptors for deformable shapes using localized spectral convolutional networks. Computer Graphics Forum, 34, 13–23.

  7. Boscaini, D., Masci, J., Rodolà, E., & Bronstein, M. (2016). Learning shape correspondence with anisotropic convolutional neural networks. In Advances in neural information processing systems (pp. 3189–3197).

  8. Brock, A., Lim, T., Ritchie, J. M., & Weston, N. (2016). Generative and discriminative Voxel modeling with convolutional neural networks. arXiv preprint arXiv:1608.04236.

  9. Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., & Vandergheynst, P. (2017). Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4), 18–42.

  10. Bruna, J., Zaremba, W., Szlam, A., & LeCun, Y. (2013). Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.

  11. Canterakis, N. (1996). Complete moment invariants and pose determination for orthogonal transformations of 3D objects. In Mustererkennung 1996 (pp. 339–350). Berlin: Springer.

  12. Canterakis, N. (1999). 3D Zernike moments and Zernike affine invariants for 3D image analysis and recognition. In 11th Scandinavian conference on image analysis. Citeseer.

  13. Carrière, M., Oudot, S. Y., & Ovsjanikov, M. (2015). Stable topological signatures for points on 3D shapes. Computer Graphics Forum, 34, 1–12.

  14. Cohen, T., Geiger, M., & Weiler, M. (2018a). A general theory of equivariant CNNs on homogeneous spaces. arXiv preprint arXiv:1811.02017.

  15. Cohen, T. S., Geiger, M., Koehler, J., & Welling, M. (2018b). Spherical CNNs. In International conference on learning representations (ICLR).

  16. Defferrard, M., Bresson, X., & Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 29, pp. 3844–3852). Curran Associates, Inc. http://papers.nips.cc/paper/6081-convolutional-neural-networks-on-graphs-with-fast-localized-spectral-filtering.pdf.

  17. El Mallahi, M., Zouhri, A., El Affar, A., Tahiri, A., & Qjidaa, H. (2017). Radial Hahn moment invariants for 2D and 3D image recognition. International Journal of Automation and Computing, 15(3), 277–289.

  18. Ester, M., Kriegel, H. P., Sander, J., Xu, X., et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. KDD, 96, 226–231.

  19. Esteves, C., Allen-Blanchette, C., Makadia, A., & Daniilidis, K. (2018). Learning SO(3) equivariant representations with spherical CNNs. In The European conference on computer vision (ECCV).

  20. Flusser, J., Boldys, J., & Zitová, B. (2003). Moment forms invariant to rotation and blur in arbitrary number of dimensions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2), 234–246.

  21. Fotenos, A. F., Snyder, A. Z., Girton, L. E., Morris, J. C., & Buckner, R. L. (2005). Normative estimates of cross-sectional and longitudinal brain volume decline in aging and AD. Neurology, 64(6), 1032–1039.

  22. Frome, A., Huber, D., Kolluri, R., Bülow, T., & Malik, J. (2004). Recognizing objects in range data using regional point descriptors. In European conference on computer vision (pp. 224–237). Berlin: Springer.

  23. Furuya, T., & Ohbuchi, R. (2016). Deep aggregation of local 3D geometric features for 3D model retrieval. In BMVC.

  24. Garcia-Garcia, A., Gomez-Donoso, F., Garcia-Rodriguez, J., Orts-Escolano, S., Cazorla, M., & Azorin-Lopez, J. (2016). PointNet: A 3D convolutional neural network for real-time object class recognition. In 2016 international joint conference on neural networks (IJCNN) (pp. 1578–1584). IEEE.

  25. Guo, X. (1993). Three dimensional moment invariants under rigid transformation. In International conference on computer analysis of images and patterns (pp. 518–522). Berlin: Springer.

  26. Guo, Y., Bennamoun, M., Sohel, F., Lu, M., Wan, J., & Kwok, N. M. (2016). A comprehensive performance evaluation of 3D local feature descriptors. International Journal of Computer Vision, 116(1), 66–89.

  27. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).

  28. Henaff, M., Bruna, J., & LeCun, Y. (2015). Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163.

  29. Hu, M. K. (1962). Visual pattern recognition by moment invariants. IRE Transactions on Information Theory, 8(2), 179–187.

  30. Ilse, M., Tomczak, J. M., & Welling, M. (2018). Attention-based deep multiple instance learning. arXiv preprint arXiv:1802.04712.

  31. Janssen, M. H., Janssen, A. J., Bekkers, E. J., Bescós, J. O., & Duits, R. (2018). Design and processing of invertible orientation scores of 3D images. Journal of Mathematical Imaging and Vision, 60(9), 1427–1458.

  32. Johns, E., Leutenegger, S., & Davison, A. J. (2016). Pairwise decomposition of image sequences for active multi-view recognition. In 2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3813–3822). IEEE.

  33. Kanezaki, A., Matsushita, Y., & Nishida, Y. (2016). RotationNet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints. arXiv preprint arXiv:1603.06208.

  34. Khalil, M. I., & Bayoumi, M. M. (2001). A dyadic wavelet affine invariant function for 2D shape recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10), 1152–1164.

  35. Khan, S. H., Hayat, M., & Barnes, N. (2018). Adversarial training of variational auto-encoders for high fidelity image generation. In 2018 IEEE winter conference on applications of computer vision (WACV) (pp. 1312–1320). IEEE.

  36. Klokov, R., & Lempitsky, V. (2017). Escape from cells: Deep KD-networks for the recognition of 3D point cloud models. In 2017 IEEE international conference on computer vision (ICCV) (pp. 863–872). IEEE.

  37. Kondor, R. (2018). N-body networks: A covariant hierarchical neural network architecture for learning atomic potentials. arXiv preprint arXiv:1803.01588.

  38. Kondor, R., Lin, Z., & Trivedi, S. (2018). Clebsch–Gordan nets: A fully Fourier space spherical convolutional neural network. arXiv preprint arXiv:1806.09231.

  39. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (Vol. 25, pp. 1097–1105). Curran Associates, Inc. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.

  40. Kurtek, S., Klassen, E., Ding, Z., & Srivastava, A. (2010). A novel Riemannian framework for shape analysis of 3D objects. In 2010 IEEE computer society conference on computer vision and pattern recognition (pp. 1625–1632). IEEE.

  41. Lavoué, G. (2012). Combination of bag-of-words descriptors for robust partial shape retrieval. The Visual Computer, 28(9), 931–942.

  42. Li, H. B., Huang, T. Z., Zhang, Y., Liu, X. P., & Gu, T. X. (2011). Chebyshev-type methods and preconditioning techniques. Applied Mathematics and Computation, 218(2), 260–270.

  43. Li, J., Chen, B. M., & Lee, G. H. (2018). SO-Net: Self-organizing network for point cloud analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9397–9406).

  44. Li, Y., Pirk, S., Su, H., Qi, C. R., & Guibas, L. J. (2016). FPNN: Field probing neural networks for 3D data. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 29, pp. 307–315). Curran Associates, Inc. http://papers.nips.cc/paper/6416-fpnn-fieldprobing-neural-networks-for-3d-data.pdf.

  45. Lin, C., & Chellappa, R. (1987). Classification of partial 2-D shapes using Fourier descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5, 686–690.

  46. Liu, W., Zhang, Y.-M., Li, X., Yu, Z., Dai, B., Zhao, T., & Song, L. (2017). Deep hyperspherical learning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 30, pp. 3950–3960). Curran Associates, Inc. http://papers.nips.cc/paper/6984-deep-hyperspherical-learning.pdf.

  47. Maron, H., Ben-Hamu, H., Shamir, N., & Lipman, Y. (2018). Invariant and equivariant graph networks. arXiv preprint arXiv:1812.09902.

  48. Masci, J., Boscaini, D., Bronstein, M., & Vandergheynst, P. (2015). Geodesic convolutional neural networks on Riemannian manifolds. In Proceedings of the IEEE international conference on computer vision workshops (pp. 37–45).

  49. Maturana, D., & Scherer, S. (2015). VoxNet: A 3D convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 922–928). IEEE.

  50. Monti, F., Boscaini, D., Masci, J., Rodolà, E., Svoboda, J., & Bronstein, M. M. (2017). Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5115–5124).

  51. Osada, R., Funkhouser, T., Chazelle, B., & Dobkin, D. (2002). Shape distributions. ACM Transactions on Graphics (TOG), 21(4), 807–832.

  52. Papadakis, P., Pratikakis, I., Theoharis, T., Passalis, G., & Perantonis, S. (2008). 3D object retrieval using an efficient and compact hybrid shape descriptor. In Eurographics workshop on 3D object retrieval.

  53. Qi, C. R., Su, H., Mo, K., & Guibas, L. J. (2017a). PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of computer vision and pattern recognition (CVPR) (Vol. 1(2), p. 4). IEEE.

  54. Qi, C. R., Su, H., Nießner, M., Dai, A., Yan, M., & Guibas, L. J. (2016). Volumetric and multi-view CNNs for object classification on 3D data. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5648–5656).

  55. Qi, C. R., Yi, L., Su, H., & Guibas, L. J. (2017b). PointNet++: Deep hierarchical feature learning on point sets in a metric space. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 30, pp. 5099–5108). Curran Associates, Inc. http://papers.nips.cc/paper/7095-pointnet-deep-hierarchical-feature-learning-on-point-sets-in-a-metric-space.pdf.

  56. Ramasinghe, S., Khan, S., & Barnes, N. (2019a). Volumetric convolution: Automatic representation learning in unit ball. arXiv preprint arXiv:1901.00616.

  57. Ramasinghe, S., Khan, S., Barnes, N., & Gould, S. (2019b). Blended convolution and synthesis for efficient discrimination of 3D shapes. arXiv preprint arXiv:1908.10209.

  58. Reininghaus, J., Huber, S., Bauer, U., & Kwitt, R. (2015). A stable multi-scale kernel for topological machine learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4741–4748).

  59. Reiss, T. (1992). Features invariant to linear transformations in 2D and 3D. In 11th IAPR international conference on pattern recognition. Vol. III. Conference C: Image, speech and signal analysis (pp. 493–496). IEEE.

  60. Ronchi, C., Iacono, R., & Paolucci, P. S. (1996). The “cubed sphere”: A new method for the solution of partial differential equations in spherical geometry. Journal of Computational Physics, 124(1), 93–114.

  61. Sedaghat, N., Zolfaghari, M., Amiri, E., & Brox, T. (2016). Orientation-boosted voxel nets for 3D object recognition. arXiv preprint arXiv:1604.03351.

  62. Shi, B., Bai, S., Zhou, Z., & Bai, X. (2015). DeepPano: Deep panoramic representation for 3-D shape recognition. IEEE Signal Processing Letters, 22(12), 2339–2343.

  63. Simonovsky, M., & Komodakis, N. (2017). Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of CVPR.

  64. Su, H., Jampani, V., Sun, D., Maji, S., Kalogerakis, E., Yang, M. H., & Kautz, J. (2018). SPLATNet: Sparse lattice networks for point cloud processing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2530–2539).

  65. Su, H., Maji, S., Kalogerakis, E., & Learned-Miller, E. (2015). Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE international conference on computer vision (pp. 945–953).

  66. Suk, T., & Flusser, J. (1996). Vertex-based features for recognition of projectively deformed polygons. Pattern Recognition, 29(3), 361–367.

  67. Tabia, H., Laga, H., Picard, D., & Gosselin, P. H. (2014). Covariance descriptors for 3D shape matching and retrieval. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4185–4192).

  68. Tabia, H., Picard, D., Laga, H., & Gosselin, P. H. (2013). Compact vectors of locally aggregated tensors for 3D shape retrieval. In Eurographics workshop on 3D object retrieval.

  69. Tatsuma, A., & Aono, M. (2009). Multi-Fourier spectra descriptor and augmentation with spectral clustering for 3D shape retrieval. The Visual Computer, 25(8), 785–804.

  70. Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, K., & Riley, P. (2018). Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219.

  71. Tieng, Q. M., & Boles, W. W. (1995). An application of wavelet-based affine-invariant representation. Pattern Recognition Letters, 16(12), 1287–1296.

  72. Tombari, F., Salti, S., & Di Stefano, L. (2010). Unique signatures of histograms for local surface description. In European conference on computer vision (pp. 356–369). Berlin: Springer.

  73. Vranic, D. V., & Saupe, D. (2002). Description of 3D-shape using a complex function on the sphere. In 2002 IEEE international conference on multimedia and expo, 2002. ICME’02. Proceedings (Vol. 1, pp. 177–180). IEEE.

  74. Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M., & Solomon, J. M. (2018). Dynamic graph CNN for learning on point clouds. arXiv preprint arXiv:1801.07829.

  75. Weiler, M., Geiger, M., Welling, M., Boomsma, W., & Cohen, T. (2018). 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. arXiv preprint arXiv:1807.02547.

  76. Worrall, D. E., & Brostow, G. J. (2018). CubeNet: Equivariance to 3D rotation and translation. In European conference on computer vision.

  77. Worrall, D. E., Garbin, S. J., Turmukhambetov, D., & Brostow, G. J. (2017). Harmonic networks: Deep translation and rotation equivariance. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 7168–7177). IEEE.

  78. Wu, J., Zhang, C., Xue, T., Freeman, B., & Tenenbaum, J. (2016). Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 29, pp. 82–90). Curran Associates, Inc. http://papers.nips.cc/paper/6096-learning-a-probabilistic-latent-space-of-object-shapes-via-3d-generative-adversarial-modeling.pdf.

  79. Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1912–1920).

  80. Xie, J., Fang, Y., Zhu, F., & Wong, E. (2015). DeepShape: Deep learned shape descriptor for 3D shape matching and retrieval. In 2015 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1275–1283). IEEE.

  81. Yang, B., Flusser, J., & Suk, T. (2015). 3D rotation invariants of Gaussian–Hermite moments. Pattern Recognition Letters, 54, 18–26.

Author information

Corresponding author

Correspondence to Sameera Ramasinghe.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Xavier Pennec.

About this article

Cite this article

Ramasinghe, S., Khan, S., Barnes, N. et al. Representation Learning on Unit Ball with 3D Roto-translational Equivariance. Int J Comput Vis 128, 1612–1634 (2020). https://doi.org/10.1007/s11263-019-01278-x

Keywords

  • Convolutional neural networks
  • 3D moments
  • Volumetric convolution
  • Zernike polynomials
  • Deep learning