Structure from Motion by Artificial Neural Networks

  • Julius Schöning
  • Thea Behrens
  • Patrick Faion
  • Peyman Kheiri
  • Gunther Heidemann
  • Ulf Krumnack
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10269)


Retrieving the 3D shape of an object from a collection of images or a video is currently realized with multiple view geometry algorithms, most commonly Structure from Motion (SfM) methods. With the aim of introducing artificial neural networks (ANNs) into the domain of image-based 3D reconstruction of unknown object categories, we developed a scalable voxel-based dataset from which different training and testing subsets can be chosen. We show that image-based 3D shape reconstruction by ANNs is possible, and we evaluate scalability by examining the correlation between the complexity of the reconstructed object and the required number of training samples. Along with our dataset, we introduce in this paper a first baseline, achieved by an ANN with only five layers. To capture life’s complexity, the ANNs trained on our dataset can be used as a pre-trained starting point and adapted for further investigation. Finally, we conclude with a discussion of open issues and further work toward 3D reconstruction from real-world images or video sequences, enabled by a CAD-model-based ANN training dataset.
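To make the setting concrete, the following is a minimal sketch of how a small fully connected network could map a stack of input views to a voxel occupancy grid, together with a voxel intersection-over-union score. It is not the authors' architecture: the abstract only states that the baseline has five layers, so the layer widths, the number of views, the image and grid resolutions, the activations, and the names `init_mlp`, `forward`, and `voxel_iou` are all assumptions chosen for illustration, and "five layers" is read here as five weight layers.

```python
import numpy as np

# Hypothetical sizes, not taken from the paper.
N_VIEWS, IMG, GRID = 4, 16, 8
# Six sizes give five weight layers (one possible reading of "five-layer").
LAYER_SIZES = [N_VIEWS * IMG * IMG, 256, 128, 128, 256, GRID ** 3]


def init_mlp(layer_sizes, seed=0):
    """Create (weight, bias) pairs with He-style random initialisation."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((w, np.zeros(n_out)))
    return params


def forward(params, views):
    """Map views of shape (batch, N_VIEWS, IMG, IMG) to occupancy
    probabilities of shape (batch, GRID, GRID, GRID)."""
    h = views.reshape(views.shape[0], -1)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)        # ReLU on hidden layers (assumed)
    w, b = params[-1]
    probs = 1.0 / (1.0 + np.exp(-(h @ w + b)))  # sigmoid: per-voxel occupancy
    return probs.reshape(-1, GRID, GRID, GRID)


def voxel_iou(pred, target, threshold=0.5):
    """Intersection over union between a thresholded prediction and a
    binary ground-truth voxel grid."""
    p = pred >= threshold
    t = target.astype(bool)
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 1.0  # both grids empty: treat as perfect agreement
    return float(np.logical_and(p, t).sum() / union)


params = init_mlp(LAYER_SIZES)
views = np.random.default_rng(1).random((2, N_VIEWS, IMG, IMG))
occupancy = forward(params, views)  # untrained, so the values carry no meaning
```

The network above is untrained; in practice the weights would be fitted against ground-truth voxel grids, for example with a per-voxel binary cross-entropy loss, which is again an assumption rather than something the abstract specifies.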



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Julius Schöning (1)
  • Thea Behrens (1)
  • Patrick Faion (1)
  • Peyman Kheiri (1)
  • Gunther Heidemann (1)
  • Ulf Krumnack (1)

  1. Institute of Cognitive Science, Osnabrück University, Osnabrück, Germany
