Jointly Learning Visual Motion and Confidence from Local Patches in Event Cameras

Conference paper · Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12351)

Abstract

We propose the first network to jointly learn visual motion and confidence from events in spatially local patches. Event-based sensors deliver high temporal resolution motion information in a sparse, non-redundant format. This creates the potential for low-computation, low-latency motion recognition. Neural networks that extract global motion information, however, are generally computationally expensive. Here, we introduce a novel shallow and compact neural architecture and learning approach to capture reliable visual motion information along with the corresponding confidence of inference. Our network makes a prediction of the visual motion at each spatial location using only local events. Our confidence network then identifies which of these predictions will be accurate. In the task of recovering pan-tilt ego velocities from events, we show that each individual confident local prediction of our network can be expected to be as accurate as state-of-the-art optimization approaches that utilize the full image. Furthermore, on a publicly available dataset, we find our local predictions generalize to scenes with camera motions and the presence of independently moving objects. This makes the output of our network well suited for motion-based tasks, such as the segmentation of independently moving objects. We demonstrate on a publicly available motion segmentation dataset that restricting predictions to confident regions is sufficient to achieve results that exceed state-of-the-art methods.
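
The abstract describes the architecture only at a high level: a shallow network predicts motion from each local event patch, and a companion confidence network gates which local estimates are trusted. As a rough sketch of that idea, the PyTorch snippet below pairs a compact per-patch trunk with a motion head and a sigmoid confidence head, then keeps only confident predictions; all layer sizes, the two-channel event-patch representation, and the 0.5 threshold are illustrative assumptions, not the authors' published design.

    # Illustrative sketch only: a shallow per-patch motion predictor with a
    # confidence head, loosely following the abstract. Layer sizes, the
    # two-channel patch representation, and the threshold are assumptions.
    import torch
    import torch.nn as nn

    class LocalMotionNet(nn.Module):
        def __init__(self, in_channels: int = 2):
            super().__init__()
            # Shallow, compact trunk over one local event patch
            # (e.g. a small two-channel time surface).
            self.trunk = nn.Sequential(
                nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
            )
            self.motion_head = nn.Linear(32, 2)        # (vx, vy) per patch
            self.confidence_head = nn.Sequential(      # p(prediction is accurate)
                nn.Linear(32, 1),
                nn.Sigmoid(),
            )

        def forward(self, patches: torch.Tensor):
            feat = self.trunk(patches)                 # (N, 32)
            return self.motion_head(feat), self.confidence_head(feat)

    # Keep only confident local predictions, mirroring how the paper
    # restricts motion-segmentation output to confident regions.
    net = LocalMotionNet()
    patches = torch.randn(64, 2, 16, 16)               # 64 local event patches
    motion, conf = net(patches)
    confident_motion = motion[conf.squeeze(1) > 0.5]   # hypothetical threshold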


Author information

Corresponding author: Daniel R. Kepple


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4802 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Kepple, D.R., Lee, D., Prepsius, C., Isler, V., Park, I.M., Lee, D.D. (2020). Jointly Learning Visual Motion and Confidence from Local Patches in Event Cameras. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12351. Springer, Cham. https://doi.org/10.1007/978-3-030-58539-6_30

  • DOI: https://doi.org/10.1007/978-3-030-58539-6_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58538-9

  • Online ISBN: 978-3-030-58539-6

  • eBook Packages: Computer Science (R0)
