
Video Frame Interpolation via Cyclic Fine-Tuning and Asymmetric Reverse Flow

  • Morten Hannemose
  • Janus Nørtoft Jensen
  • Gudmundur Einarsson
  • Jakob Wilm
  • Anders Bjorholm Dahl
  • Jeppe Revall Frisvad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11482)

Abstract

The objective in video frame interpolation is to predict additional in-between frames in a video while retaining natural motion and good visual quality. In this work, we use a convolutional neural network (CNN) that takes two frames as input and predicts two optical flows with pixelwise weights. The flows are from an unknown in-between frame to the input frames. The input frames are warped with the predicted flows, multiplied by the predicted weights, and added to form the in-between frame. We also propose a new strategy to improve the performance of video frame interpolation models: we reconstruct the original frames using the learned model by reusing the predicted frames as input for the model. This is used during inference to fine-tune the model so that it predicts the best possible frames. Our model outperforms the publicly available state-of-the-art methods on multiple datasets.
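The synthesis step described above reduces to a backward warp of each input frame with its predicted flow, followed by a pixelwise weighted sum; the cyclic strategy then interpolates twice and checks that re-interpolating the two predictions reproduces the known middle frame. The sketch below illustrates both ideas in PyTorch. It is a minimal illustration, not the authors' implementation: the network interface (a net returning two flows and one weight map), the complementary weight w1 = 1 - w0, and the L1 reconstruction loss are our assumptions.

    import torch
    import torch.nn.functional as F

    def backward_warp(img, flow):
        # img: (B, C, H, W); flow: (B, 2, H, W) gives, for every pixel of the
        # in-between frame, the offset (dx, dy) to its source location in img.
        b, _, h, w = img.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        base = torch.stack((xs, ys)).float().to(img.device)   # (2, H, W)
        coords = base.unsqueeze(0) + flow                     # sample positions
        # grid_sample expects (x, y) coordinates normalized to [-1, 1].
        gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
        gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)                  # (B, H, W, 2)
        return F.grid_sample(img, grid, align_corners=True)

    def synthesize(net, frame0, frame1):
        # Assumed interface: net returns two flows (from the unknown in-between
        # frame to each input) and a pixelwise blending weight w0 in [0, 1].
        flow_t0, flow_t1, w0 = net(frame0, frame1)
        warped0 = backward_warp(frame0, flow_t0)
        warped1 = backward_warp(frame1, flow_t1)
        return w0 * warped0 + (1.0 - w0) * warped1

    def cyclic_fine_tune_step(net, optimizer, f0, f1, f2):
        # Interpolate (f0, f1) and (f1, f2), then interpolate between the two
        # predictions; the result should reconstruct the known frame f1.
        mid_a = synthesize(net, f0, f1)
        mid_b = synthesize(net, f1, f2)
        recon = synthesize(net, mid_a, mid_b)
        loss = torch.mean(torch.abs(recon - f1))  # L1 loss is our assumption
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Under this reading, a handful of such gradient steps on frame triplets from the test video itself would adapt the model to the sequence at hand before the final in-between frames are predicted.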

Keywords

Slow motion · Video frame interpolation · Convolutional neural networks

Notes

Acknowledgements

We would like to thank Joel Janai for providing us with the SlowFlow data [5].

References

  1. Baker, S., Scharstein, D., Lewis, J.P., Roth, S., Black, M.J., Szeliski, R.: A database and evaluation methodology for optical flow. Int. J. Comput. Vis. 92(1), 1–31 (2011)
  2. Castagno, R., Haavisto, P., Ramponi, G.: A method for motion adaptive frame rate up-conversion. IEEE Trans. Circuits Syst. Video Technol. 6(5), 436–446 (1996)
  3. Catmull, E.: The problems of computer-assisted animation. Comput. Graph. 12(3), 348–353 (1978). SIGGRAPH 1978
  4. Herbst, E., Seitz, S., Baker, S.: Occlusion reasoning for temporal interpolation using optical flow. Technical report, Microsoft Research, August 2009
  5. Janai, J., Güney, F., Wulff, J., Black, M., Geiger, A.: Slow flow: exploiting high-speed cameras for accurate and diverse optical flow reference data. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1406–1416 (2017)
  6. Jiang, H., Sun, D., Jampani, V., Yang, M.H., Learned-Miller, E., Kautz, J.: Super SloMo: high quality estimation of multiple intermediate frames for video interpolation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9000–9008 (2018)
  7. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_43
  8. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2014)
  9. Lasseter, J.: Principles of traditional animation applied to 3D computer animation. Comput. Graph. 21(4), 35–44 (1987). SIGGRAPH 1987
  10. Liu, Y.L., Liao, Y.T., Lin, Y.Y., Chuang, Y.Y.: Deep video frame interpolation using cyclic frame generation. In: AAAI Conference on Artificial Intelligence (2019)
  11. Liu, Z., Yeh, R.A., Tang, X., Liu, Y., Agarwala, A.: Video frame synthesis using deep voxel flow. In: International Conference on Computer Vision (ICCV), pp. 4463–4471 (2017)
  12. Long, G., Kneip, L., Alvarez, J.M., Li, H., Zhang, X., Yu, Q.: Learning image matching by simply watching video. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 434–450. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_26
  13. Mahajan, D., Huang, F.C., Matusik, W., Ramamoorthi, R., Belhumeur, P.: Moving gradients: a path-based method for plausible image interpolation. ACM Trans. Graph. 28(3), 42:1–42:11 (2009)
  14. Meyer, S., Djelouah, A., McWilliams, B., Sorkine-Hornung, A., Gross, M., Schroers, C.: PhaseNet for video frame interpolation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 498–507 (2018)
  15. Meyer, S., Wang, O., Zimmer, H., Grosse, M., Sorkine-Hornung, A.: Phase-based frame interpolation for video. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1410–1418 (2015)
  16. Niklaus, S., Liu, F.: Context-aware synthesis for video frame interpolation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1701–1710 (2018)
  17. Niklaus, S., Mai, L., Liu, F.: Video frame interpolation via adaptive convolution. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 670–679 (2017)
  18. Niklaus, S., Mai, L., Liu, F.: Video frame interpolation via adaptive separable convolution. In: International Conference on Computer Vision (ICCV), pp. 261–270 (2017)
  19. Reeves, W.T.: Inbetweening for computer animation utilizing moving point constraints. Comput. Graph. 15(3), 263–269 (1981). SIGGRAPH 1981
  20. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
  21. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014)
  22. Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. CoRR abs/1212.0402 (2012)
  23. Szeliski, R.: Prediction error as a quality metric for motion and stereo. In: IEEE International Conference on Computer Vision (ICCV), vol. 2, pp. 781–788 (1999)
  24. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
  25. Werlberger, M., Pock, T., Unger, M., Bischof, H.: Optical flow guided TV-L1 video interpolation and restoration. In: Boykov, Y., Kahl, F., Lempitsky, V., Schmidt, F.R. (eds.) EMMCVPR 2011. LNCS, vol. 6819, pp. 273–286. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23094-3_20

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Morten Hannemose (1), corresponding author
  • Janus Nørtoft Jensen (1)
  • Gudmundur Einarsson (1)
  • Jakob Wilm (2)
  • Anders Bjorholm Dahl (1)
  • Jeppe Revall Frisvad (1)

  1. DTU Compute, Technical University of Denmark, Kongens Lyngby, Denmark
  2. SDU Robotics, University of Southern Denmark, Odense, Denmark
