Spatio-Temporal Transformer Network for Video Restoration

  • Tae Hyun Kim
  • Mehdi S. M. Sajjadi
  • Michael Hirsch
  • Bernhard Schölkopf
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)

Abstract

State-of-the-art video restoration methods integrate optical flow estimation networks to exploit temporal information. However, these networks typically consider only a pair of consecutive frames, so they cannot capture long-range temporal dependencies or establish correspondences across several timesteps. To alleviate these problems, we propose a novel Spatio-temporal Transformer Network (STTN) that handles multiple frames at once and thereby mitigates the common nuisance of occlusions in optical flow estimation. Our STTN comprises a module that estimates optical flow in both space and time and a resampling layer that selectively warps target frames using the estimated flow. In our experiments, we demonstrate the efficiency of the proposed network and show state-of-the-art restoration results for video super-resolution and video deblurring.
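To make the second component concrete, the sketch below shows how a spatio-temporal resampling layer can warp a target frame with an estimated (x, y, t) flow field via trilinear interpolation over the input frame volume. This is a minimal PyTorch-style sketch under our own assumptions (the tensor layouts, the use of normalized flow coordinates, and anchoring the sampling grid at the last time slice are all illustrative), not the authors' implementation; the flow-estimation module itself is omitted.

```python
import torch
import torch.nn.functional as F

def spatio_temporal_sample(frames, flow):
    """Trilinearly resample a frame volume with a spatio-temporal flow.

    frames: (N, C, T, H, W) stack of T input frames.
    flow:   (N, 3, H, W) per-pixel offsets (dx, dy, dt), assumed to be
            expressed in normalized [-1, 1] grid coordinates.
    Returns one warped frame of shape (N, C, H, W).
    """
    n, c, t, h, w = frames.shape
    # Base sampling grid over the output frame in normalized coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=frames.device),
        torch.linspace(-1.0, 1.0, w, device=frames.device),
        indexing="ij",
    )
    # Anchor the grid at the last (target) time slice, t = +1;
    # the flow then shifts each pixel in x, y and t.
    ts = torch.ones_like(xs)
    base = torch.stack((xs, ys, ts), dim=-1)        # (H, W, 3), order (x, y, t)
    base = base.unsqueeze(0).expand(n, -1, -1, -1)  # (N, H, W, 3)
    grid = base + flow.permute(0, 2, 3, 1)          # add (dx, dy, dt)
    grid = grid.unsqueeze(1)                        # (N, 1, H, W, 3)
    # grid_sample on a 5D volume performs the trilinear interpolation:
    # each output pixel is pulled from one (x, y, t) location in the stack.
    out = F.grid_sample(frames, grid, align_corners=True)  # (N, C, 1, H, W)
    return out.squeeze(2)

# Example: with zero flow the layer reproduces the anchor (last) frame;
# a learned flow selects, per pixel, which frame to sample from and where.
frames = torch.randn(2, 3, 5, 64, 64)  # batch of two 5-frame RGB stacks
flow = torch.zeros(2, 3, 64, 64)
warped = spatio_temporal_sample(frames, flow)  # (2, 3, 64, 64)
```

Because the time offset is continuous, the sampler can blend between neighboring frames rather than committing to a single source frame, which is what lets a network route around occlusions in any one frame pair.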

Keywords

Spatio-temporal transformer network · Spatio-temporal flow · Spatio-temporal sampler · Video super-resolution · Video deblurring

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Tae Hyun Kim (1, 2)
  • Mehdi S. M. Sajjadi (1, 3)
  • Michael Hirsch (1, 4)
  • Bernhard Schölkopf (1)

  1. Max Planck Institute for Intelligent Systems, Tübingen, Germany
  2. Hanyang University, Seoul, Republic of Korea
  3. Max Planck ETH Center for Learning Systems, Tübingen, Germany
  4. Amazon Research, Tübingen, Germany