Bio-Inspired Architecture for Deriving 3D Models from Video Sequences

  • Julius Schöning
  • Gunther Heidemann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10117)


In an everyday context, automatic or interactive 3D reconstruction of objects from one or several videos is not yet possible. Humans, in contrast, are capable of recognizing the 3D shape of objects even in complex video sequences. To enable machines to do the same, we propose a bio-inspired processing architecture, which is motivated by the human visual system and converts video data into 3D representations. Similar to the hierarchy of the ventral stream, our process uses object recognition to reduce the influence of position information in the video sequences and represents the object of interest as multiple pictorial representations. These pictorial representations show 2D projections of the object of interest from different perspectives, so a 3D point cloud can be obtained by multiple view geometry algorithms. In the course of a detailed presentation of this architecture, we additionally highlight existing analogies to the view-combination scheme. The potential of our architecture is shown by reconstructing a car from two video sequences. If the automatic processing cannot complete the task, the user is put in the loop to solve the problem interactively. This human-machine interaction facilitates a prototype implementation of the architecture, which can reconstruct 3D objects from one or several videos. In conclusion, the strengths and limitations of our approach are discussed, followed by an outlook on future work to improve the architecture.
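The abstract states only that a 3D point cloud is obtained by "multiple view geometry algorithms" and does not name one. As a rough illustration of that final step, the sketch below recovers a single 3D point from two viewing rays using midpoint triangulation. This is a simple stand-in technique chosen for brevity, not the authors' method; the function name `triangulate_midpoint` and the example camera positions are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(
    c1: Sequence[float], d1: Sequence[float],
    c2: Sequence[float], d2: Sequence[float],
) -> Vec3:
    """Estimate a 3D point from two viewing rays (hypothetical helper).

    Each ray starts at a camera center c and points along direction d
    towards the same object point. The result is the midpoint of the
    shortest segment connecting the two rays.
    """
    w = [a - b for a, b in zip(c1, c2)]
    a = _dot(d1, d1)
    b = _dot(d1, d2)
    c = _dot(d2, d2)
    d = _dot(d1, w)
    e = _dot(d2, w)

    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays carry no depth information.
        raise ValueError("rays are (nearly) parallel")

    # Ray parameters of the mutually closest points.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom

    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    x, y, z = ((u + v) / 2.0 for u, v in zip(p1, p2))
    return (x, y, z)


# Two assumed camera centers observing the same object point.
camera_a = (0.0, 0.0, 0.0)
camera_b = (1.0, 0.0, 0.0)
ray_a = (1.0, 2.0, 5.0)  # direction from camera_a to the point
ray_b = (0.0, 2.0, 5.0)  # direction from camera_b to the point

point = triangulate_midpoint(camera_a, ray_a, camera_b, ray_b)
print(point)
```

Repeating this for many matched image features across the pictorial representations would yield a point cloud. In practice the rays come from calibrated camera poses and noisy feature matches, so they rarely intersect exactly, which is why the midpoint of the closest approach is used here.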


Point Cloud, Video Sequence, Pictorial Representation, Structure From Motion, Inferior Temporal
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Institute of Cognitive Science, Osnabrück University, Osnabrück, Germany
