VConv-DAE: Deep Volumetric Shape Learning Without Object Labels

  • Abhishek Sharma
  • Oliver Grau
  • Mario Fritz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9915)


With the advent of affordable depth sensors, 3D capture is becoming increasingly ubiquitous and has already made its way into commercial products. Yet capturing the geometry or complete shapes of everyday objects with scanning devices (e.g. Kinect) still poses several challenges that result in noisy or even incomplete shapes.

Recent successes in deep learning have shown how to learn complex shape distributions in a data-driven way from large-scale 3D CAD model collections and how to use them for 3D processing on volumetric representations, thereby circumventing problems of topology and tessellation. Prior work has shown encouraging results on problems ranging from shape completion to recognition. We provide an analysis of such approaches and find that both training and the resulting representation are strongly and unnecessarily tied to the notion of object labels. We therefore propose a fully convolutional volumetric auto-encoder that learns a volumetric representation from noisy data by estimating voxel occupancy grids. The proposed method outperforms prior work on challenging tasks such as denoising and shape completion. We also show that the obtained deep embedding gives competitive performance when used for classification and promising results for shape interpolation.
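
To make the denoising setup concrete, the following is a minimal sketch of how a training pair (noisy input, clean target) for voxel occupancy grids could be built. It is an illustration only, not the network or the data pipeline of the paper: the grid size, the sphere test shape, the independent voxel-flip noise model, and the helper names `make_sphere_grid`, `corrupt` and `voxel_error` are all assumptions introduced here.

```python
import random


def make_sphere_grid(size=16, radius=5.0):
    """Binary occupancy grid (nested lists) holding a solid sphere at the center.

    The shape and the grid size are assumptions chosen for illustration.
    """
    c = (size - 1) / 2.0
    return [[[1 if (x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2 <= radius ** 2 else 0
              for z in range(size)]
             for y in range(size)]
            for x in range(size)]


def corrupt(grid, flip_prob, rng):
    """Flip each voxel independently with probability flip_prob.

    This noise model is an assumption, not taken from the paper.
    """
    return [[[1 - v if rng.random() < flip_prob else v for v in row]
             for row in plane]
            for plane in grid]


def voxel_error(pred, target):
    """Fraction of voxels on which two occupancy grids disagree."""
    total = 0
    wrong = 0
    for p_plane, t_plane in zip(pred, target):
        for p_row, t_row in zip(p_plane, t_plane):
            for p, t in zip(p_row, t_row):
                total += 1
                wrong += int(p != t)
    return wrong / total


rng = random.Random(0)
clean = make_sphere_grid()
noisy = corrupt(clean, flip_prob=0.1, rng=rng)

# A denoising auto-encoder would be trained to map `noisy` back to `clean`.
# This is the error of an identity baseline that returns its input unchanged.
baseline_error = voxel_error(noisy, clean)
```

Any learned denoiser would need to score below `baseline_error` on held-out shapes to be useful; note that no object label appears anywhere in this setup, since the clean grid itself serves as the training target.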


Keywords: Denoising auto-encoder, 3D deep learning, Shape completion, Shape blending



This work was supported by funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 642841.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Intel Visual Computing Institute, Saarbrücken, Germany
  2. Intel, Saarbrücken, Germany
  3. Max Planck Institute for Informatics, Saarbrücken, Germany
