
Learning to Reconstruct HDR Images from Events, with Applications to Depth and Flow Prediction

  • Published in: International Journal of Computer Vision

Abstract

Event cameras have numerous advantages over traditional cameras, such as low latency, high temporal resolution, and high dynamic range (HDR). We first investigate the potential of creating intensity images and videos from an adjustable portion of the event data stream via event-based conditional generative adversarial networks (cGANs). Using the proposed framework, we further show the versatility of our method in directly handling related supervised tasks, such as optical flow and depth prediction. Stacks of space-time event coordinates serve as the inputs, and the framework is trained to predict intensity images, optical flow, or depth, depending on the target task. We further demonstrate the unique capability of our approach to generate HDR images even under extreme illumination conditions, to create non-blurred images under rapid motion, and to generate very high frame rate videos up to the temporal resolution of event cameras. The proposed framework is evaluated on a publicly available real-world dataset and on a synthetic dataset we prepared using an event camera simulator.
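
As an illustration of the input encoding described above, the sketch below shows one plausible way to stack a raw event stream (x, y, t, polarity) into a fixed number of frame-like channels that a network such as the proposed cGAN could consume. This is a minimal sketch for illustration only; the function name `stack_events`, the bin count, and the signed-accumulation scheme are assumptions, not the authors' exact encoding.

```python
import numpy as np

def stack_events(events, num_bins, height, width):
    """Accumulate an event stream into a stack of frame-like channels.

    events: (N, 4) array with columns (x, y, t, polarity), polarity in {-1, +1}.
    Illustrative only -- not the authors' exact input representation.
    """
    stack = np.zeros((num_bins, height, width), dtype=np.float32)
    t0, t1 = events[:, 2].min(), events[:, 2].max()
    # Assign each event to a temporal bin according to its timestamp.
    bins = np.clip(((events[:, 2] - t0) / (t1 - t0 + 1e-9) * num_bins).astype(int),
                   0, num_bins - 1)
    for (x, y, _, p), b in zip(events, bins):
        stack[b, int(y), int(x)] += p  # signed accumulation of polarities
    return stack

# Example: 10,000 synthetic events on a 180x240 sensor, stacked into 8 channels.
rng = np.random.default_rng(0)
ev = np.column_stack([rng.integers(0, 240, 10000),  # x
                      rng.integers(0, 180, 10000),  # y
                      np.sort(rng.random(10000)),   # t (normalised)
                      rng.choice([-1, 1], 10000)])  # polarity
frames = stack_events(ev, num_bins=8, height=180, width=240)
print(frames.shape)  # (8, 180, 240)
```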

Notes

  1. Our dataset is publicly available at https://github.com/wl082013/ESIM_dataset.

  2. Imported from OpenCV: cv::quality::QualityBRISQUE; a usage sketch follows below.
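
For reference, here is a minimal sketch of computing a BRISQUE score with the OpenCV quality module (requires opencv-contrib-python). The image path and the locations of the pre-trained model/range files are placeholders for illustration, not values from the paper.

```python
import cv2

# Pre-trained BRISQUE model and range files shipped with opencv_contrib;
# the file names below are assumed placeholders, not fixed paths.
model_path = "brisque_model_live.yml"
range_path = "brisque_range_live.yml"

# Hypothetical reconstructed intensity image to be scored.
img = cv2.imread("reconstructed_frame.png")

# Lower BRISQUE scores indicate better perceived quality.
# OpenCV returns a Scalar; its first component is the score.
score = cv2.quality.QualityBRISQUE_compute(img, model_path, range_path)
print("BRISQUE score:", score)
```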

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2018R1A2B3008640), by the Next Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017M3C4A7069369), and by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00440, Development of Artificial Intelligence Technology that Continuously Improves Itself as the Situation Changes in the Real World).

Author information

Corresponding author

Correspondence to Kuk-Jin Yoon.

Additional information

Communicated by Takayuki Okatani.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 6233 KB)

About this article

Cite this article

Mostafavi, M., Wang, L. & Yoon, KJ. Learning to Reconstruct HDR Images from Events, with Applications to Depth and Flow Prediction. Int J Comput Vis 129, 900–920 (2021). https://doi.org/10.1007/s11263-020-01410-2
