
Learning to Reconstruct HDR Images from Events, with Applications to Depth and Flow Prediction

  • Published in: International Journal of Computer Vision

Abstract

Event cameras have numerous advantages over traditional cameras, such as low latency, high temporal resolution, and high dynamic range (HDR). We first investigate the potential of creating intensity images and videos from an adjustable portion of the event data stream via event-based conditional generative adversarial networks (cGANs). Using the proposed framework, we further show the versatility of our method in directly handling related supervised tasks, such as optical flow and depth prediction. Stacks of space-time event coordinates serve as the inputs, and the framework is trained to predict intensity images, optical flow, or depth, depending on the target task. We further demonstrate the unique capability of our approach to generate HDR images even under extreme illumination conditions, to create non-blurred images under rapid motion, and to generate very high frame rate videos up to the temporal resolution of event cameras. The proposed framework is evaluated on a publicly available real-world dataset and on a synthetic dataset we prepared using an event camera simulator.
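
As an illustration of the input encoding described above, the sketch below shows one plausible way to stack a raw event stream (x, y, t, polarity) into a fixed number of frame-like channels that a network such as the proposed cGAN could consume. This is a minimal sketch for illustration only; the function name `stack_events`, the bin count, and the signed-accumulation scheme are assumptions, not the authors' exact encoding.

```python
import numpy as np

def stack_events(events, num_bins, height, width):
    """Accumulate an event stream into a stack of frame-like channels.

    events: (N, 4) array with columns (x, y, t, polarity), polarity in {-1, +1}.
    Illustrative only -- not the authors' exact input representation.
    """
    stack = np.zeros((num_bins, height, width), dtype=np.float32)
    t0, t1 = events[:, 2].min(), events[:, 2].max()
    # Assign each event to a temporal bin according to its timestamp.
    bins = np.clip(((events[:, 2] - t0) / (t1 - t0 + 1e-9) * num_bins).astype(int),
                   0, num_bins - 1)
    for (x, y, _, p), b in zip(events, bins):
        stack[b, int(y), int(x)] += p  # signed accumulation of polarities
    return stack

# Example: 10,000 synthetic events on a 180x240 sensor, stacked into 8 channels.
rng = np.random.default_rng(0)
ev = np.column_stack([rng.integers(0, 240, 10000),  # x
                      rng.integers(0, 180, 10000),  # y
                      np.sort(rng.random(10000)),   # t (normalised)
                      rng.choice([-1, 1], 10000)])  # polarity
frames = stack_events(ev, num_bins=8, height=180, width=240)
print(frames.shape)  # (8, 180, 240)
```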

Notes

  1. Our dataset is publicly available at https://github.com/wl082013/ESIM_dataset.

  2. Imported from OpenCV: cv::quality::QualityBRISQUE; a usage sketch follows below.
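
For reference, here is a minimal sketch of computing a BRISQUE score with the OpenCV quality module (requires opencv-contrib-python). The image path and the locations of the pre-trained model/range files are placeholders for illustration, not values from the paper.

```python
import cv2

# Pre-trained BRISQUE model and range files shipped with opencv_contrib;
# the file names below are assumed placeholders, not fixed paths.
model_path = "brisque_model_live.yml"
range_path = "brisque_range_live.yml"

# Hypothetical reconstructed intensity image to be scored.
img = cv2.imread("reconstructed_frame.png")

# Lower BRISQUE scores indicate better perceived quality.
# OpenCV returns a Scalar; its first component is the score.
score = cv2.quality.QualityBRISQUE_compute(img, model_path, range_path)
print("BRISQUE score:", score)
```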

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2018R1A2B3008640), by the Next Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017M3C4A7069369), and by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00440, Development of Artificial Intelligence Technology that Continuously Improves Itself as the Situation Changes in the Real World).

Author information

Corresponding author

Correspondence to Kuk-Jin Yoon.

Additional information

Communicated by Takayuki Okatani.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 6233 KB)

About this article

Cite this article

Mostafavi, M., Wang, L. & Yoon, KJ. Learning to Reconstruct HDR Images from Events, with Applications to Depth and Flow Prediction. Int J Comput Vis 129, 900–920 (2021). https://doi.org/10.1007/s11263-020-01410-2
