
Across Scales and Across Dimensions: Temporal Super-Resolution Using Deep Internal Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12352)

Abstract

When a very fast dynamic event is recorded with a low-framerate camera, the resulting video suffers from severe motion blur (due to exposure time) and motion aliasing (due to low sampling rate in time). True Temporal Super-Resolution (TSR) is more than just Temporal-Interpolation (increasing framerate). It can also recover new high temporal frequencies beyond the temporal Nyquist limit of the input video, thus resolving both motion-blur and motion-aliasing – effects that temporal frame interpolation (as sophisticated as it may be) cannot undo. In this paper we propose a “Deep Internal Learning” approach for true TSR. We train a video-specific CNN on examples extracted directly from the low-framerate input video. Our method exploits the strong recurrence of small space-time patches inside a single video sequence, both within and across different spatio-temporal scales of the video. We further observe (for the first time) that small space-time patches recur also across-dimensions of the video sequence – i.e., by swapping the spatial and temporal dimensions. In particular, the higher spatial resolution of video frames provides strong examples as to how to increase the temporal resolution of that video. Such internal video-specific examples give rise to strong self-supervision, requiring no data but the input video itself. This results in Zero-Shot Temporal-SR of complex videos, which removes both motion blur and motion aliasing, outperforming previous supervised methods trained on external video datasets.
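To make the across-dimensions idea concrete, below is a minimal NumPy sketch (not the authors' code; the function names, the frame-averaging exposure model, and the coarsening factor of 2 are illustrative assumptions) of how swapping the spatial and temporal axes turns a video's high spatial resolution into self-supervised training pairs for temporal super-resolution:

    import numpy as np

    def temporal_coarsen(video, factor=2):
        # Simulate a low-framerate camera along the leading axis:
        # average `factor` consecutive frames (exposure-time blur),
        # then keep one frame per group (temporal subsampling).
        t = (video.shape[0] // factor) * factor
        grouped = video[:t].reshape(-1, factor, *video.shape[1:])
        return grouped.mean(axis=1)

    def swap_space_and_time(video):
        # Swap the temporal and spatial axes: (t, y, x) -> (x, y, t).
        # Coarsening the result along its new leading axis degrades the
        # original spatial x-axis, whose resolution is high. This is the
        # "across dimensions" source of examples described in the abstract.
        return np.transpose(video, (2, 1, 0))

    def make_training_pair(video, factor=2):
        # A self-supervised (input, target) pair taken from the video itself:
        # the axis-swapped video serves as a "high-framerate" target, and its
        # temporally coarsened version as the "low-framerate" network input.
        target = swap_space_and_time(video)
        return temporal_coarsen(target, factor), target

    video = np.random.rand(16, 64, 64)   # stand-in (t, y, x) input clip
    lo, hi = make_training_pair(video)
    print(lo.shape, hi.shape)            # (32, 64, 16) (64, 64, 16)

A video-specific CNN trained on many such pairs, together with pairs drawn from coarser spatio-temporal scales of the same video as the abstract describes, can then be applied to the original (t, y, x) sequence to recover temporal frequencies beyond its Nyquist limit.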


Acknowledgments

Thanks to Ben Feinstein for his invaluable help in getting the GPUs to run smoothly and efficiently. This project received funding from the European Research Council (ERC) under the Horizon 2020 research and innovation programme (grant No. 788535) and from the Carolito Stiftung. Dr. Bagon is a Robin Chemers Neustein AI Fellow.

Supplementary material

Supplementary material 1 (mp4 88973 KB)

Supplementary material 2 (pdf 10480 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Computer Science and Applied Math, The Weizmann Institute of Science, Rehovot, Israel
  2. Weizmann Artificial Intelligence Center (WAIC), Rehovot, Israel
  3. Technion, Israel Institute of Technology, Haifa, Israel
