Co-inference for Multi-modal Scene Analysis

Munoz, Daniel; Bagnell, James Andrew; Hebert, Martial

doi:10.1007/978-3-642-33783-3_48

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7577)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We address the problem of understanding scenes from multiple sources of sensor data (e.g., a camera and a laser scanner) in the case where there is no one-to-one correspondence across modalities (e.g., pixels and 3-D points). This is an important scenario that frequently arises in practice not only when two different types of sensors are used, but also when the sensors are not co-located and have different sampling rates. Previous work has addressed this problem by restricting interpretation to a single representation in one of the domains, with augmented features that attempt to encode the information from the other modalities. Instead, we propose to analyze all modalities simultaneously while propagating information across domains during the inference procedure. In addition to the immediate benefit of generating a complete interpretation in all of the modalities, we demonstrate that this co-inference approach also improves performance over the canonical approach.

Author information

Authors and Affiliations

The Robotics Institute, Carnegie Mellon University, USA
Daniel Munoz, James Andrew Bagnell & Martial Hebert

Editor information

Editors and Affiliations

Microsoft Research Ltd., CB3 0FB, Cambridge, UK
Andrew Fitzgibbon
Dept. of Computer Science, University of North Carolina, 27599, Chapel Hill, NC, USA
Svetlana Lazebnik
California Institute of Technology, 91125, Pasadena, CA, USA
Pietro Perona
Institute of Industrial Science, The University of Tokyo, 153-8505, Tokyo, Japan
Yoichi Sato
INRIA, 38330, Montbonnot, France
Cordelia Schmid

About this paper

Cite this paper

Munoz, D., Bagnell, J.A., Hebert, M. (2012). Co-inference for Multi-modal Scene Analysis. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33783-3_48

DOI: https://doi.org/10.1007/978-3-642-33783-3_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33782-6
Online ISBN: 978-3-642-33783-3
eBook Packages: Computer Science, Computer Science (R0)
