
Learning Cross-Video Neural Representations for High-Quality Frame Interpolation

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13675)


Abstract

This paper considers the problem of temporal video interpolation, where the goal is to synthesize a new video frame given its two neighbors. We propose Cross-Video Neural Representation (CURE) as the first video interpolation method based on neural fields (NF). NF refers to the recent class of methods for the neural representation of complex 3D scenes that has seen widespread success and application across computer vision. CURE represents the video as a continuous function parameterized by a coordinate-based neural network, whose inputs are spatiotemporal coordinates and whose outputs are the corresponding RGB values. CURE introduces a new architecture that conditions the neural network on the input frames to impose space-time consistency on the synthesized video. This not only improves the final interpolation quality but also enables CURE to learn a prior across multiple videos. Experimental evaluations show that CURE achieves state-of-the-art performance on video interpolation on several benchmark datasets. (This work was supported by CCF-2043134.)

W. Shangguan and Y. Sun contributed equally and are listed in alphabetical order.
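To make the abstract's formulation concrete, the following is a minimal PyTorch sketch, not the authors' released implementation, of a coordinate-based network that maps a spatiotemporal coordinate (x, y, t) to an RGB value while conditioning on the two neighboring frames. The frame encoder, the positional encoding, and all layer sizes here are illustrative assumptions.

    # Minimal sketch of a conditioned coordinate-based network (neural field).
    # The encoder, positional encoding, and dimensions are illustrative
    # assumptions, not the architecture described in the paper.
    import math
    import torch
    import torch.nn as nn

    def positional_encoding(coords, num_freqs=6):
        # Expand (x, y, t) into sin/cos features, as is common for neural fields.
        feats = [coords]
        for k in range(num_freqs):
            feats.append(torch.sin((2.0 ** k) * math.pi * coords))
            feats.append(torch.cos((2.0 ** k) * math.pi * coords))
        return torch.cat(feats, dim=-1)

    class ConditionedField(nn.Module):
        def __init__(self, coord_dim=3, num_freqs=6, cond_dim=64, hidden=256):
            super().__init__()
            in_dim = coord_dim * (2 * num_freqs + 1)
            # Hypothetical encoder that summarizes the two neighboring frames
            # into a single conditioning vector.
            self.encoder = nn.Sequential(
                nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, cond_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.mlp = nn.Sequential(
                nn.Linear(in_dim + cond_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
            )

        def forward(self, coords, frame0, frame1):
            # coords: (N, 3) points (x, y, t); frame0/frame1: (1, 3, H, W).
            cond = self.encoder(torch.cat([frame0, frame1], dim=1))
            cond = cond.expand(coords.shape[0], -1)
            return self.mlp(torch.cat([positional_encoding(coords), cond], dim=-1))

    # Query RGB values of the intermediate frame at time t = 0.5.
    model = ConditionedField()
    f0, f1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    xy = torch.rand(128, 2)         # normalized spatial coordinates in [0, 1]
    t = torch.full((128, 1), 0.5)   # interpolation time between the two frames
    rgb = model(torch.cat([xy, t], dim=-1), f0, f1)  # -> (128, 3)

Conditioning the field on the neighboring frames, rather than fitting a separate network per video, is what allows a prior to be shared across videos, as the abstract describes.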




Author information


Corresponding author

Correspondence to Ulugbek S. Kamilov.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Shangguan, W., Sun, Y., Gan, W., Kamilov, U.S. (2022). Learning Cross-Video Neural Representations for High-Quality Frame Interpolation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13675. Springer, Cham. https://doi.org/10.1007/978-3-031-19784-0_30


  • DOI: https://doi.org/10.1007/978-3-031-19784-0_30


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19783-3

  • Online ISBN: 978-3-031-19784-0

