Learning 3D Shapes as Multi-layered Height-Maps Using 2D Convolutional Networks

  • Kripasindhu Sarkar
  • Basavaraj Hampiholi
  • Kiran Varanasi
  • Didier Stricker
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11220)


We present a novel global representation of 3D shapes, suitable for processing with 2D CNNs. We represent 3D shapes as multi-layered height-maps (MLH), where at each grid location we store multiple height values, thereby capturing 3D shape detail that is hidden behind several layers of occlusion. We also provide a novel view-merging method for combining view-dependent information (e.g., MLH descriptors) from multiple views. Because it enables the use of 2D CNNs, our method is far more memory efficient than voxel-based input at the same input resolution. Together, the MLH descriptors and our multi-view merging achieve state-of-the-art classification results on the ModelNet dataset.
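To make the data layout concrete: an MLH descriptor stores, for each cell of a 2D grid seen from a chosen view direction, up to k height values rather than a single one, so surfaces occluded behind the first visible layer are still recorded. The minimal sketch below illustrates this layout for a point cloud normalized to the unit cube. The layer-selection rule here (evenly spaced samples of the sorted per-cell heights) and the `empty` sentinel are stand-in assumptions, not the paper's exact construction.

```python
import numpy as np

def mlh_descriptor(points, grid=32, k=3, empty=-1.0):
    """Sketch of a multi-layered height-map (MLH) descriptor.

    points: (N, 3) array, assumed normalized to [0, 1]^3, viewed along +z.
    Returns a (k, grid, grid) array: for each (x, y) cell, up to k
    representative height values, with `empty` marking unoccupied layers.
    """
    desc = np.full((k, grid, grid), empty, dtype=np.float32)
    # Assign each point to an (x, y) grid cell.
    ij = np.clip((points[:, :2] * grid).astype(int), 0, grid - 1)
    for gx in range(grid):
        for gy in range(grid):
            zs = np.sort(points[(ij[:, 0] == gx) & (ij[:, 1] == gy), 2])
            if zs.size == 0:
                continue  # no geometry under this cell
            # Pick k representative heights from nearest to farthest layer
            # (a stand-in for the paper's layer-selection rule).
            picks = np.linspace(0, zs.size - 1, k).round().astype(int)
            desc[:, gx, gy] = zs[picks]
    return desc
```

The resulting (k, grid, grid) tensor can be fed to a standard 2D CNN as a k-channel image, which is what makes the representation memory efficient relative to a dense voxel grid.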


Keywords: CNN on 3D shapes · 3D shape representation · ModelNet · Shape classification · Shape generation



This work was partially funded by the BMBF project DYNAMICS (01IW15003).

Supplementary material

Supplementary material 1: 474218_1_En_5_MOESM1_ESM.pdf (PDF, 921 KB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Kripasindhu Sarkar (1, 2), corresponding author
  • Basavaraj Hampiholi (2)
  • Kiran Varanasi (1)
  • Didier Stricker (1, 2)
  1. DFKI Kaiserslautern, Kaiserslautern, Germany
  2. Technische Universität Kaiserslautern, Kaiserslautern, Germany
