Real-Time Intermediate Flow Estimation for Video Frame Interpolation

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13674)

Abstract

Real-time video frame interpolation (VFI) is very useful in video processing, media players, and display devices. We propose RIFE, a Real-time Intermediate Flow Estimation algorithm for VFI. To realize a high-quality flow-based VFI method, RIFE uses a neural network named IFNet that estimates the intermediate flows end-to-end at much higher speed. A privileged distillation scheme is designed to stabilize IFNet training and improve overall performance. RIFE does not rely on pre-trained optical flow models and supports arbitrary-timestep frame interpolation by taking a temporal encoding as input. Experiments demonstrate that RIFE achieves state-of-the-art performance on several public benchmarks. Compared with the popular Super SloMo and DAIN methods, RIFE is 4–27 times faster and produces better results. Furthermore, RIFE can be extended to wider applications thanks to temporal encoding. Code is available at https://github.com/megvii-research/ECCV2022-RIFE
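
To make the flow-based synthesis step concrete, here is a minimal NumPy sketch of how a target frame can be formed once intermediate flows are available. It is not the authors' implementation: the abstract does not give the synthesis formula, so the mask-weighted blend of two backward-warped frames is a common flow-based VFI formulation assumed here, and the names `backward_warp`, `temporal_encoding`, and `interpolate` are hypothetical. In RIFE the flows and the fusion mask would come from IFNet; in this sketch they are simply passed in as arrays.

```python
import numpy as np


def backward_warp(img, flow):
    """Bilinearly sample img at (x + dx, y + dy) for every output pixel.

    img:  (H, W, C) float array.
    flow: (H, W, 2) float array, flow[..., 0] = dx, flow[..., 1] = dy.
    Sampling positions are clamped to the image border (an assumption).
    """
    h, w = img.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom


def temporal_encoding(t, h, w):
    """Timestep t in [0, 1] broadcast to one extra input channel (assumed layout)."""
    return np.full((h, w, 1), t, dtype=np.float32)


def interpolate(img0, img1, flow_t0, flow_t1, mask):
    """Blend the two warped frames with a per-pixel fusion mask in [0, 1]."""
    warped0 = backward_warp(img0, flow_t0)
    warped1 = backward_warp(img1, flow_t1)
    return mask * warped0 + (1 - mask) * warped1
```

With zero flows and a constant mask of 0.5, `interpolate` reduces to a plain average of the two input frames, which is a convenient sanity check before plugging in learned flows.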

Acknowledgement

This work is supported by the National Key R&D Program of China (2021ZD0109803) and the National Natural Science Foundation of China under Grant Nos. 62136001 and 62088102.

Author information

Corresponding authors

Correspondence to Boxin Shi or Shuchang Zhou.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 5519 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Huang, Z., Zhang, T., Heng, W., Shi, B., Zhou, S. (2022). Real-Time Intermediate Flow Estimation for Video Frame Interpolation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13674. Springer, Cham. https://doi.org/10.1007/978-3-031-19781-9_36

  • DOI: https://doi.org/10.1007/978-3-031-19781-9_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19780-2

  • Online ISBN: 978-3-031-19781-9

  • eBook Packages: Computer Science, Computer Science (R0)
