Abstract
Intelligent systems need complex and detailed models of their environment to perform more sophisticated tasks, such as assisting the user. Vision sensors provide rich information and are broadly used to obtain these models; for example, indoor scene modeling from monocular images has been widely studied. A common initial step in those settings is the estimation of the 3D layout of the scene. While most previous approaches obtain the scene layout from a single image, this work presents a novel approach to estimate the initial layout and addresses the problem of how to propagate it through a video. We propose to use a particle filter framework for this propagation process and describe how to generate and sample new layout hypotheses for the scene in each subsequent frame. We present different ways to evaluate and rank these hypotheses. The experimental validation is run on two recent, publicly available datasets and shows promising results on the estimation of a basic 3D layout. Our experiments demonstrate how this layout information can be used to improve detection tasks useful for a human user, in particular sign detection, by easily rejecting false positives.
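The propagation idea in the abstract (generate new layout hypotheses in each frame, evaluate and rank them, keep the good ones) follows the generic particle filter loop. The following is a minimal Python sketch of that loop under stated assumptions, not the authors' implementation: it assumes a layout hypothesis is a flat list of real-valued parameters, that new hypotheses are sampled by a Gaussian random walk around the surviving ones, and that a caller-supplied, non-negative `score(layout, frame)` function stands in for the paper's hypothesis evaluation. The names `perturb`, `resample`, `propagate`, `gaussian_score` and the parameters `n_particles` and `sigma` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical parametrization: a layout hypothesis is a flat list of floats
# (for example, orientation angles and wall offsets). This is an assumption.
Layout = List[float]
ScoreFn = Callable[[Layout, object], float]


def perturb(layout: Layout, sigma: float, rng: random.Random) -> Layout:
    """Sample a new hypothesis near an existing one (Gaussian random walk)."""
    return [p + rng.gauss(0.0, sigma) for p in layout]


def resample(particles: List[Layout], weights: List[float],
             rng: random.Random) -> List[Layout]:
    """Multinomial resampling: draw particles with probability proportional to weight."""
    drawn = rng.choices(particles, weights=weights, k=len(particles))
    return [list(p) for p in drawn]


def propagate(initial: Layout, frames: Sequence[object], score: ScoreFn,
              n_particles: int = 100, sigma: float = 0.05,
              seed: int = 0) -> List[Layout]:
    """Track a layout through a sequence of frames; return the top-ranked hypothesis per frame."""
    rng = random.Random(seed)
    # Spread the initial single-image estimate into a set of hypotheses.
    particles = [perturb(initial, sigma, rng) for _ in range(n_particles)]
    estimates: List[Layout] = []
    for frame in frames:
        # 1) Generate new hypotheses for this frame.
        particles = [perturb(p, sigma, rng) for p in particles]
        # 2) Evaluate each hypothesis against the current frame (score must be >= 0).
        weights = [score(p, frame) for p in particles]
        if sum(weights) <= 0.0:
            weights = [1.0] * len(particles)  # no evidence: fall back to uniform weights
        # 3) Rank hypotheses and report the best one for this frame.
        best = max(range(len(particles)), key=lambda i: weights[i])
        estimates.append(list(particles[best]))
        # 4) Resample so that well-scored hypotheses survive to the next frame.
        particles = resample(particles, weights, rng)
    return estimates


if __name__ == "__main__":
    # Toy demo: a single layout parameter drifts slowly; each "frame" is the true value.
    truth = [0.01 * t for t in range(30)]

    def gaussian_score(layout: Layout, frame: object) -> float:
        return math.exp(-((layout[0] - float(frame)) ** 2) / (2 * 0.1 ** 2))

    est = propagate([0.0], truth, gaussian_score, n_particles=200, seed=1)
    print(f"final estimate {est[-1][0]:.3f} vs truth {truth[-1]:.3f}")
```

Reporting the highest-weighted particle mirrors the idea of ranking hypotheses; a weighted mean of the particles is a common alternative when the distribution over layouts is unimodal.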
We would like to thank Prof. Roberto Manduchi for his comments and suggestions, which helped us improve the present work. This work was supported by the Spanish FPI grant BES-2010-030299 and Spanish projects DPI2012-31781, DGA-T04-FSE and TAMA.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Rituerto, A., Murillo, A.C., Guerrero, J.J. (2015). 3D Layout Propagation to Improve Object Recognition in Egocentric Videos. In: Agapito, L., Bronstein, M., Rother, C. (eds) Computer Vision - ECCV 2014 Workshops. ECCV 2014. Lecture Notes in Computer Science, vol 8927. Springer, Cham. https://doi.org/10.1007/978-3-319-16199-0_58
DOI: https://doi.org/10.1007/978-3-319-16199-0_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16198-3
Online ISBN: 978-3-319-16199-0
eBook Packages: Computer Science, Computer Science (R0)