Abstract
This paper proposes a learning-based framework for associating event streams with intensity frames under diverse camera baselines, simultaneously benefiting camera pose estimation under large baselines and depth estimation under small baselines. Based on the observation that event streams are spatially sparse at the global scale (only a small percentage of pixels across the full frame trigger events) yet dense at the local scale (a large percentage of pixels within local patches trigger events), we put forward a two-stage architecture for matching feature maps: LSparse-Net uses a large receptive field to find sparse matches, while SDense-Net uses a small receptive field to find dense matches. Both stages apply Transformer modules with self-attention and cross-attention layers to effectively process multi-resolution features from a feature pyramid network backbone. Experimental results on public datasets show systematic performance improvements on both tasks compared with state-of-the-art methods.
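To make the attention scheme concrete, below is a minimal PyTorch sketch of one such Transformer block: self-attention aggregates context within one modality's feature map, and cross-attention then queries it with the other modality's features. This is an illustrative reconstruction under our own assumptions (the module name `AttentionBlock`, vanilla softmax attention, and all dimensions are our choices), not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): one Transformer layer combining
# self-attention within a view and cross-attention between the event-stream
# features and the intensity-frame features. Tokens follow the usual
# (batch, tokens, channels) convention after flattening an H x W feature map.
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Self-attention: aggregate context within one view (x attends to x).
        x = self.norm1(x + self.self_attn(x, x, x)[0])
        # Cross-attention: x attends to the other view y, which is what
        # associates event features with intensity-frame features.
        x = self.norm2(x + self.cross_attn(x, y, y)[0])
        return x + self.ffn(x)

# Usage: flatten two feature maps (e.g. from one FPN level) into token sequences.
b, c, h, w = 1, 256, 60, 80
feat_event = torch.randn(b, c, h, w).flatten(2).transpose(1, 2)  # (b, h*w, c)
feat_frame = torch.randn(b, c, h, w).flatten(2).transpose(1, 2)
block = AttentionBlock(d_model=c)
updated_event = block(feat_event, feat_frame)  # (b, h*w, c)
```

In a two-stage design such as the one described above, blocks like this would run first on coarse (large-receptive-field) features to propose sparse matches and then on fine (small-receptive-field) features to densify them; the sketch only shows the shared attention primitive.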
Acknowledgements
This work was supported by the National Key R&D Program of China (2021ZD0109803) and the National Natural Science Foundation of China under Grants No. 62136001 and 62088102.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, D., Ding, Q., Duan, P., Zhou, C., Shi, B. (2022). Data Association Between Event Streams and Intensity Frames Under Diverse Baselines. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13667. Springer, Cham. https://doi.org/10.1007/978-3-031-20071-7_5
DOI: https://doi.org/10.1007/978-3-031-20071-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20070-0
Online ISBN: 978-3-031-20071-7
eBook Packages: Computer Science, Computer Science (R0)