Jointly Learning Visual Motion and Confidence from Local Patches in Event Cameras

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12351)


We propose the first network to jointly learn visual motion and confidence from events in spatially local patches. Event-based sensors deliver motion information at high temporal resolution in a sparse, non-redundant format, which creates the potential for low-computation, low-latency motion recognition. Neural networks that extract global motion information, however, are generally computationally expensive. Here, we introduce a novel shallow and compact neural architecture and learning approach that captures reliable visual motion information together with the confidence of each inference. Our network predicts the visual motion at each spatial location using only local events. Our confidence network then identifies which of these predictions will be accurate. In the task of recovering pan-tilt ego velocities from events, we show that each individual confident local prediction of our network can be expected to be as accurate as state-of-the-art optimization approaches that use the full image. Furthermore, on a publicly available dataset, we find that our local predictions generalize to scenes with camera motion and independently moving objects. This makes the output of our network well suited to motion-based tasks, such as the segmentation of independently moving objects. We demonstrate on a publicly available motion segmentation dataset that restricting predictions to confident regions is sufficient to achieve results that exceed state-of-the-art methods.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Samsung AI Center, New York, USA
  2. Department of Neurobiology and Behavior, Stony Brook University, New York, USA
