
Event-Based Asynchronous Sparse Convolutional Networks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)

Abstract

Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse “events”. Recently, pattern recognition algorithms, such as learning-based methods, have made significant progress with event cameras by converting events into synchronous, dense, image-like representations and applying traditional machine learning methods developed for standard cameras. However, these approaches discard the spatial and temporal sparsity inherent in event data, at the cost of higher computational complexity and latency. In this work, we present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output, thus directly leveraging the intrinsic asynchronous and sparse nature of the event data. We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks without sacrificing accuracy. In addition, our framework has several desirable characteristics: (i) it exploits the spatio-temporal sparsity of events explicitly, (ii) it is agnostic to the event representation, network architecture, and task, and (iii) it does not require any change at training time, since it is compatible with the standard training process of neural networks. We thoroughly validate the proposed framework on two computer vision tasks: object detection and object recognition. On these tasks, we reduce the computational complexity by up to 20 times with respect to high-latency neural networks. At the same time, we outperform state-of-the-art asynchronous approaches by up to 24% in prediction accuracy.
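To make the two ingredients mentioned above concrete, the following is a minimal NumPy sketch, not the authors' implementation: it builds a dense, image-like event histogram (the kind of synchronous representation the abstract refers to), and identifies the small set of convolution outputs that can change when only a few new events arrive, which is where the computational savings of an asynchronous sparse update come from. Function names such as events_to_histogram and affected_sites, and the histogram layout, are illustrative assumptions.

```python
import numpy as np


def events_to_histogram(events, height, width):
    """Accumulate (x, y, t, polarity) events into a dense, two-channel
    event histogram (one channel per polarity). This is one generic
    image-like representation; the framework is agnostic to the exact choice."""
    hist = np.zeros((2, height, width), dtype=np.float32)
    for x, y, _, p in events:
        hist[0 if p > 0 else 1, y, x] += 1.0
    return hist


def affected_sites(new_events, height, width, kernel_size=3):
    """Return the set of output pixels that a convolution with the given
    kernel size must recompute when only `new_events` arrive. An asynchronous
    sparse network updates only these sites instead of the whole feature map."""
    r = kernel_size // 2
    sites = set()
    for x, y, _, _ in new_events:
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy < height and 0 <= xx < width:
                    sites.add((yy, xx))
    return sites


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W = 180, 240  # illustrative sensor resolution
    events = [(int(rng.integers(W)), int(rng.integers(H)), float(t),
               int(rng.choice([-1, 1]))) for t in range(50)]
    hist = events_to_histogram(events, H, W)
    sites = affected_sites(events[-5:], H, W)  # only the 5 newest events
    print(hist.shape, f"{len(sites)} of {H * W} output sites need updating")
```

Because only a handful of pixels change per incoming event batch, the fraction of output sites that must be recomputed is tiny compared to the full feature map, which is the intuition behind the reported reduction in computation without any change to the trained weights.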

Keywords

Deep learning: applications, methodology, and theory · Low-level vision

Notes

Acknowledgements

This work was supported by the Swiss National Center of Competence Research Robotics (NCCR), through the Swiss National Science Foundation, and the SNSF-ERC starting grant.

Supplementary material

Supplementary material 1 (PDF 687 KB)

Supplementary material 2 (MP4 87,544 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Informatics, University of Zurich, Zürich, Switzerland
  2. Department of Neuroinformatics, University of Zurich and ETH Zurich, Zürich, Switzerland
