Large-Scale Shape Retrieval with Sparse 3D Convolutional Neural Networks

  • Alexandr Notchenko
  • Yermek Kapushev
  • Evgeny Burnaev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10716)


In this paper we evaluate the performance of S3DCNN, a Sparse 3D Convolutional Neural Network, on the large-scale 3D shape benchmark ModelNet40 and measure how the voxel resolution of the input shape affects it. We demonstrate classification and retrieval performance comparable to state-of-the-art models, but at a much lower computational cost in both the training and inference phases. We also observe that the benefits of higher input resolution can be limited by the ability of a neural network to generalize high-level features.
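The link between voxel resolution and sparsity can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a synthetic sphere surface in place of a ModelNet40 shape, uses only NumPy, and the function names (`sphere_surface`, `voxelize`, `occupancy`) are hypothetical. It shows that the fraction of occupied voxels shrinks as resolution grows, which is the property a sparse convolutional network can exploit.

```python
import numpy as np


def sphere_surface(n_points, seed=0):
    """Sample points uniformly on the unit sphere (a stand-in for a 3D shape surface)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(n_points, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def voxelize(points, resolution):
    """Return the unique integer indices of voxels occupied by points in [-1, 1]^3."""
    # Map coordinates from [-1, 1] to [0, resolution] and take the containing cell.
    idx = np.floor((points + 1.0) / 2.0 * resolution).astype(np.int64)
    # Points lying exactly on the upper boundary fall into the last cell.
    idx = np.clip(idx, 0, resolution - 1)
    # Each unique row is one active (non-empty) voxel.
    return np.unique(idx, axis=0)


def occupancy(points, resolution):
    """Fraction of the resolution^3 grid that is occupied."""
    return len(voxelize(points, resolution)) / resolution ** 3


if __name__ == "__main__":
    pts = sphere_surface(50_000)
    for r in (16, 32, 64):
        print(f"resolution {r}: occupancy {occupancy(pts, r):.4f}")
```

For a surface, the number of occupied voxels grows roughly with the square of the resolution while the grid grows with its cube, so a dense 3D convolution spends an increasing share of its work on empty space.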


Deep Learning, Sparse 3D Convolutional Neural Network, Voxel resolution



We are very grateful to Dmitry Yarotsky for his contribution to this research project. Many thanks to Benjamin Graham for useful comments and ideas, and to Rasim Akhunzyanov for his help in debugging the PySparseConvNet code.

The research was partially supported by the Russian Science Foundation grant (project 14-50-00150). E. Burnaev was partially supported by the Next Generation Skoltech-MIT Program.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Alexandr Notchenko (1, 2)
  • Yermek Kapushev (1, 2)
  • Evgeny Burnaev (1)
  1. Skolkovo Institute of Science and Technology, Skolkovo Innovation Centre, Moscow, Russia
  2. Institute for Information Transmission Problems RAS, Moscow, Russia
