Abstract
Standard frame-based cameras suffer from low dynamic range and motion blur in real-world applications. Event cameras, by contrast, are bio-inspired sensors that asynchronously report the polarity of pixel-level log-intensity changes, producing a continuous data stream with high dynamic range even under fast motion. Event cameras are therefore effective for stereo depth estimation under challenging illumination and/or fast motion. To estimate a disparity map from events, existing state-of-the-art event-based stereo models use the image together with the past events accumulated up to the current image acquisition time. However, not all events contribute equally to the disparity estimation of the current frame, since past events occur at different times, under different motions, and with different disparity values. Events therefore need to be carefully selected for accurate event-guided disparity estimation. In this paper, we aim to effectively handle events that occur continuously with different disparity values in the scene as the camera moves. To this end, we first propose a differentiable event selection network that selects the events most relevant to the current depth estimation. Furthermore, we exploit the feature-like events triggered around object boundaries, which serve as ideal guides for disparity estimation. For this, we propose a neighbor cross similarity feature (NCSF) that measures the similarity between the two modalities. Finally, experiments on various datasets demonstrate the superiority of our method in estimating depth from images and event data together. Our project code is available at: https://github.com/Chohoonhee/SCSNet.
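The abstract describes the neighbor cross similarity feature (NCSF) as a similarity measure between image and event modalities. As a rough illustration of that idea only (not the paper's exact formulation), the sketch below computes, for every pixel, the cosine similarity between an image feature and the event features in its local neighborhood; the function name, tensor shapes, and neighborhood size are illustrative assumptions.

import torch
import torch.nn.functional as F

def neighbor_cross_similarity(img_feat, evt_feat, k=3):
    # img_feat, evt_feat: (B, C, H, W) feature maps from the two modalities.
    # Returns a (B, k*k, H, W) volume of cosine similarities between the image
    # feature at each pixel and the event features in its k x k neighborhood.
    b, c, h, w = img_feat.shape
    img_n = F.normalize(img_feat, dim=1)   # unit-length image features
    evt_n = F.normalize(evt_feat, dim=1)   # unit-length event features

    # Gather the k*k event-feature neighbors of every pixel.
    pad = k // 2
    evt_patches = F.unfold(evt_n, kernel_size=k, padding=pad)   # (B, C*k*k, H*W)
    evt_patches = evt_patches.view(b, c, k * k, h, w)

    # Cosine similarity = dot product of normalized features.
    sim = (img_n.unsqueeze(2) * evt_patches).sum(dim=1)          # (B, k*k, H, W)
    return sim

# Example usage with random features (illustrative shapes only).
img_feat = torch.randn(2, 32, 64, 64)
evt_feat = torch.randn(2, 32, 64, 64)
sim = neighbor_cross_similarity(img_feat, evt_feat, k=3)  # (2, 9, 64, 64)

Such a cross-modal similarity volume could, for example, be concatenated with the per-modality features before cost-volume construction; the actual network design of the paper should be taken from the full text and the released code linked above.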
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2022R1A2B5B03002636).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cho, H., Yoon, K.J. (2022). Selection and Cross Similarity for Event-Image Deep Stereo. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13692. Springer, Cham. https://doi.org/10.1007/978-3-031-19824-3_28
DOI: https://doi.org/10.1007/978-3-031-19824-3_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19823-6
Online ISBN: 978-3-031-19824-3
eBook Packages: Computer Science (R0)