
Data Association Between Event Streams and Intensity Frames Under Diverse Baselines

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13667)


Abstract

This paper proposes a learning-based framework to associate event streams and intensity frames under diverse camera baselines, to simultaneously benefit camera pose estimation under large baselines and depth estimation under small baselines. Based on the observation that event streams are globally sparse (a small percentage of pixels in global frames are triggered with events) and locally dense (a large percentage of pixels in local patches are triggered with events) in the spatial domain, we put forward a two-stage architecture for matching feature maps. LSparse-Net uses a large receptive field to find sparse matches while SDense-Net uses a small receptive field to find dense matches. Both stages apply Transformer modules with self-attention layers and cross-attention layers to effectively process multi-resolution features from the feature pyramid network backbone. Experimental results on public datasets show a systematic performance improvement for both tasks compared to state-of-the-art methods.
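The Transformer modules described above apply self-attention (queries, keys, and values drawn from one feature stream) and cross-attention (queries from one modality, keys/values from the other) to relate event features to intensity-frame features. The following is a minimal pure-Python sketch of the underlying scaled dot-product attention; the variable names (`event_feats`, `frame_feats`) and the tiny 2-D features are illustrative assumptions, not the paper's actual implementation.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over small Python lists (illustrative sketch)."""
    d = len(queries[0])  # feature dimension, used to scale the dot products
    out = []
    for q in queries:
        # Similarity of this query against every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Numerically stable softmax over the scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output is the weight-averaged value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Self-attention: queries, keys, and values all come from the same stream.
# Cross-attention: queries from one modality (here, hypothetical event
# features), keys/values from the other (intensity-frame features).
event_feats = [[1.0, 0.0], [0.0, 1.0]]
frame_feats = [[1.0, 0.0], [0.5, 0.5]]
fused = attention(event_feats, frame_feats, frame_feats)
```

Each output row is a convex combination of the frame features, weighted by how well the corresponding event feature matches them; stacking such layers (with learned projections) is what lets the two stages propagate matching evidence across modalities.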



Acknowledgements

This work was supported by the National Key R&D Program of China (2021ZD0109803) and the National Natural Science Foundation of China under Grants No. 62136001 and 62088102.

Author information

Correspondence to Boxin Shi.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1033 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, D., Ding, Q., Duan, P., Zhou, C., Shi, B. (2022). Data Association Between Event Streams and Intensity Frames Under Diverse Baselines. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13667. Springer, Cham. https://doi.org/10.1007/978-3-031-20071-7_5

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-20071-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20070-0

  • Online ISBN: 978-3-031-20071-7

  • eBook Packages: Computer Science (R0)
