
DeepWarp: Photorealistic Image Resynthesis for Gaze Manipulation

  • Yaroslav Ganin
  • Daniil Kononenko
  • Diana Sungatullina
  • Victor Lempitsky
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9906)

Abstract

In this work, we consider the task of generating highly realistic images of a given face with a redirected gaze. We treat this problem as a specific instance of conditional image generation and propose a new deep architecture that handles it well, as shown by a numerical comparison with prior art and by a user study. Our architecture performs coarse-to-fine warping with an additional intensity correction of individual pixels. All of these operations are performed in a feed-forward manner, and the parameters associated with the different operations are learned jointly in an end-to-end fashion. After learning, the resulting neural network can synthesize images with manipulated gaze, where the redirection angle can be selected arbitrarily within a certain range and is provided as an input to the network.
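To make the architectural description above more concrete, the following is a minimal PyTorch sketch of the warping-plus-intensity-correction idea. It is not the authors' implementation (their code was based on Torch7 spatial-transformer modules); the single-scale layout, layer sizes, and names such as GazeWarpSketch, flow_head, and correction_head are illustrative assumptions. A small CNN, conditioned on the desired redirection angle, predicts per-pixel sampling offsets used for bilinear warping, plus a per-pixel intensity correction applied to the warped image.

```python
# Minimal sketch (assumed names, not the paper's code): a CNN conditioned on
# the redirection angle predicts a per-pixel flow field; the eye image is
# resampled with bilinear (spatial-transformer-style) interpolation, and a
# second head predicts a per-pixel intensity correction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeWarpSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Inputs: 3 image channels + 2 channels tiling the (dx, dy) redirection angle.
        self.features = nn.Sequential(
            nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.flow_head = nn.Conv2d(32, 2, 3, padding=1)        # per-pixel sampling offsets
        self.correction_head = nn.Conv2d(32, 1, 3, padding=1)  # per-pixel brightness change

    def forward(self, image, angle):
        # image: (B, 3, H, W); angle: (B, 2) horizontal/vertical redirection.
        b, _, h, w = image.shape
        angle_map = angle.view(b, 2, 1, 1).expand(b, 2, h, w)
        feats = self.features(torch.cat([image, angle_map], dim=1))

        # Identity sampling grid in [-1, 1] coordinates, as expected by grid_sample.
        theta = torch.eye(2, 3, device=image.device).unsqueeze(0).expand(b, -1, -1)
        grid = F.affine_grid(theta, image.size(), align_corners=False)

        # Add the predicted flow to the identity grid and resample bilinearly.
        flow = self.flow_head(feats).permute(0, 2, 3, 1)  # (B, H, W, 2)
        warped = F.grid_sample(image, grid + flow, align_corners=False)

        # Additive per-pixel intensity correction on top of the warped image.
        return warped + self.correction_head(feats)
```

In this hypothetical form, the model would be trained end-to-end with a pixel-wise loss against ground-truth redirected images, e.g. GazeWarpSketch()(eye_crop, torch.tensor([[15.0, 0.0]])) for a 15-degree horizontal redirection; the paper's actual network additionally operates in a coarse-to-fine fashion over multiple scales.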

Keywords

Gaze correction · Warping · Spatial transformers · Supervised learning · Deep learning

Notes

Acknowledgements

We would like to thank Leonid Ekimov for sharing the results of his work on applying auto-encoders for gaze correction. We are also grateful to all the Skoltech students and employees who agreed to participate in the dataset collection and in the user study. This research is supported by the Skoltech Translational Research and Innovation Program.

Supplementary material

Supplementary material 1 (PDF, 236 KB)


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Yaroslav Ganin (1)
  • Daniil Kononenko (1)
  • Diana Sungatullina (1)
  • Victor Lempitsky (1)
  1. Skolkovo Institute of Science and Technology, Skolkovo, Moscow Region, Russia
