3D Layout Propagation to Improve Object Recognition in Egocentric Videos

  • Alejandro Rituerto
  • Ana C. Murillo
  • José J. Guerrero
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8927)


Intelligent systems need complex and detailed models of their environment to achieve sophisticated tasks, such as assistance to the user. Vision sensors provide rich information and are broadly used to obtain these models; in particular, indoor scene modeling from monocular images has been widely studied. A common initial step in those settings is the estimation of the 3D layout of the scene. While most previous approaches obtain the scene layout from a single image, this work presents a novel approach that estimates the initial layout and then addresses the problem of propagating it through a video. We propose a particle filter framework for this propagation process and describe how to generate and sample new layout hypotheses for the scene in each subsequent frame. We present different ways to evaluate and rank these hypotheses. The experimental validation is run on two recent, publicly available datasets and shows promising results for the estimation of a basic 3D layout. Our experiments demonstrate how this layout information can be used to improve detection tasks useful to a human user, in particular sign detection, by easily rejecting false positives.
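The propagation step described above follows the standard particle-filter loop: resample the layout hypotheses in proportion to their scores, perturb each one to account for camera motion between frames, and re-score against the new frame. The sketch below illustrates that loop in a deliberately simplified setting, where each hypothesis is a single scalar layout parameter (e.g., the distance to a wall) rather than a full 3D room layout, and `score_fn` stands in for the paper's hypothesis-evaluation measures; all names and the 1-D simplification are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def propagate(particles, weights, motion_sigma, score_fn, rng):
    """One particle-filter step over layout hypotheses (illustrative sketch).

    particles    -- list of scalar layout hypotheses (1-D stand-in for a 3D layout)
    weights      -- normalized scores from the previous frame
    motion_sigma -- spread of the perturbation covering inter-frame camera motion
    score_fn     -- evaluates a hypothesis against the current frame's evidence
    """
    # Resample: hypotheses with higher scores survive more often.
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    # Diffuse: perturb each hypothesis to generate new candidates for this frame.
    new_particles = [p + rng.gauss(0.0, motion_sigma) for p in resampled]
    # Evaluate and rank: score every candidate, then normalize the weights.
    new_weights = [score_fn(p) for p in new_particles]
    total = sum(new_weights)
    new_weights = [w / total for w in new_weights]
    return new_particles, new_weights


# Usage: track a (hypothetical) wall distance of 2.0 m over 20 frames,
# with a Gaussian-likelihood score standing in for image evidence.
rng = random.Random(0)
true_depth = 2.0
score = lambda p: math.exp(-(p - true_depth) ** 2 / 0.1)
particles = [rng.uniform(0.0, 5.0) for _ in range(500)]
weights = [1.0 / 500] * 500
for _ in range(20):
    particles, weights = propagate(particles, weights, 0.05, score, rng)
estimate = sum(p * w for p, w in zip(particles, weights))
```

After a few frames the weighted estimate concentrates near the true layout parameter; in the full method the same survive-perturb-score cycle runs over whole 3D layout hypotheses instead of a scalar.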


Keywords: Scene understanding · Egocentric vision · Object detection



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Alejandro Rituerto
  • Ana C. Murillo
  • José J. Guerrero

Instituto de Investigación en Ingeniería de Aragón, University of Zaragoza, Zaragoza, Spain
