Fusing Inertial Data with Vision for Enhanced Image Understanding

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 598)


In this paper we show that combining knowledge of a camera's orientation with visual information can improve the performance of semantic image segmentation. This is based on the assumption that the direction in which a camera is facing acts as a prior on the content of the images it creates. We gathered egocentric video with a camera attached to a head-mounted display, and recorded its orientation using an inertial sensor. By combining orientation information with typical image descriptors, we show that segmentation of individual images improves in accuracy compared with vision alone, from 61% to 71% over six classes. We also show that this method can be applied to both point-based and line-based image features, and that these can be combined for further benefit. Our resulting system would have applications in autonomous robot locomotion and in guiding visually impaired humans.
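The fusion idea in the abstract can be illustrated with a minimal sketch: append the camera pitch reported by the inertial sensor to each patch's visual descriptor, so that an otherwise ambiguous appearance is disambiguated by viewing direction. This is not the authors' implementation; the toy descriptors, the nearest-centroid classifier, and all names here are hypothetical.

```python
# Minimal sketch (hypothetical, not the paper's method): fuse camera
# orientation with a visual descriptor by feature concatenation, then
# classify with a nearest-centroid rule.
from math import sqrt

def fuse(descriptor, pitch_deg, weight=1.0):
    """Append the (scaled) camera pitch to a visual feature vector."""
    return list(descriptor) + [weight * pitch_deg / 90.0]

def nearest_centroid(x, centroids):
    """Return the class label whose centroid lies closest to x."""
    def dist(a, b):
        return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    return min(centroids, key=lambda label: dist(x, centroids[label]))

# Toy example: "floor" patches tend to co-occur with a downward pitch,
# "wall" patches with a level camera.
centroids = {
    "floor": fuse([0.2, 0.7], -45.0),
    "wall":  fuse([0.3, 0.6],   0.0),
}

# A patch whose appearance alone slightly favours "wall", seen while
# the camera points 40 degrees downward: the orientation term flips
# the decision to "floor".
patch = fuse([0.29, 0.61], -40.0)
print(nearest_centroid(patch, centroids))  # -> floor
```

The same concatenation trick carries over to any per-patch classifier (the paper's 61% to 71% improvement uses richer descriptors and classes); the `weight` parameter, a hypothetical knob here, controls how strongly orientation is allowed to override appearance.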


Vision-guided locomotion · Segmentation · Image interpretation · Scene understanding · Inertial sensors · Oculus Rift · Mobile robotics



This work was funded by the UK Engineering and Physical Sciences Research Council (EP/J012025/1). The authors would like to thank Austin Gregg-Smith and Geoffrey Daniels for help with hardware and data, and Adeline Paiement for all the enlightening discussions.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. University of Bristol, Bristol, UK
