
Video Enhancement with Task-Oriented Flow

  • Tianfan Xue
  • Baian Chen
  • Jiajun Wu
  • Donglai Wei
  • William T. Freeman

Abstract

Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is, however, intractable, and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
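The core operation shared by the flow-based pipelines described above is backward warping: each neighboring frame is resampled along the estimated flow so that it aligns with the reference frame before the processing network combines them. The sketch below illustrates this registration step with bilinear interpolation in NumPy; it is a minimal, illustrative implementation, not the paper's own code, and the function name and conventions are assumptions.

```python
import numpy as np

def backward_warp(frame, flow):
    """Resample `frame` along a dense flow field to align it with a reference.

    frame: (H, W) grayscale image.
    flow:  (H, W, 2) per-pixel displacement (dx, dy); output pixel (y, x)
           samples frame at (y + dy, x + dx) with bilinear interpolation.
    Sample coordinates are clamped to the image border.
    """
    H, W = frame.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = xs + flow[..., 0]
    y = ys + flow[..., 1]
    # Integer corners of each sampling cell, clamped so x0+1, y0+1 stay valid.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    # Bilinear blend of the four neighboring pixels.
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x0 + 1]
    bot = (1 - wx) * frame[y0 + 1, x0] + wx * frame[y0 + 1, x0 + 1]
    return (1 - wy) * top + wy * bot
```

Because bilinear sampling is differentiable with respect to the flow (as in spatial transformer networks, reference 16), gradients from the downstream processing loss can flow back into the motion estimator, which is what allows the flow to be trained jointly for the task rather than fixed in advance.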

Keywords

Video processing · Optical flow · Neural network · Video dataset

Acknowledgements

This work is supported by NSF RI #1212849, NSF BIGDATA #1447476, Facebook, and Shell Research. This work was done when Tianfan Xue and Donglai Wei were graduate students at MIT CSAIL.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Google Research, Mountain View, USA
  2. Massachusetts Institute of Technology, Cambridge, USA
  3. Harvard University, Cambridge, USA
  4. Google Research, Cambridge, USA
