
Perceptual Losses for Real-Time Style Transfer and Super-Resolution

  • Justin Johnson
  • Alexandre Alahi
  • Li Fei-Fei
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9906)

Abstract

We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real-time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
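To make the abstract's loss construction concrete, the sketch below shows one plausible way to build a feature reconstruction loss and a Gram-matrix style reconstruction loss on top of a pretrained VGG-16 used as a frozen loss network. It is a minimal PyTorch sketch, not the authors' Torch7/Lua implementation; the class name, layer indices, and helper functions are illustrative assumptions rather than the paper's exact configuration.

# Minimal sketch of perceptual losses over a pretrained VGG-16 loss network.
# Assumptions: PyTorch/torchvision available; layer indices 3, 8, 15, 22
# (relu1_2, relu2_2, relu3_3, relu4_3) chosen for illustration.
import torch
import torch.nn.functional as F
from torchvision import models

class VGGFeatures(torch.nn.Module):
    """Extracts activations from a few VGG-16 layers; the loss network stays frozen."""
    def __init__(self, layer_ids=(3, 8, 15, 22)):
        super().__init__()
        vgg = models.vgg16(pretrained=True).features.eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg
        self.layer_ids = set(layer_ids)

    def forward(self, x):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats

def feature_reconstruction_loss(output_feats, content_feats):
    """Squared error between activations of the output and content target images."""
    return sum(F.mse_loss(a, b) for a, b in zip(output_feats, content_feats))

def gram_matrix(f):
    """Channel-wise Gram matrix of a feature map, normalized by its size."""
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f.bmm(f.transpose(1, 2)) / (c * h * w)

def style_reconstruction_loss(output_feats, style_feats):
    """Squared Frobenius distance between Gram matrices of output and style targets."""
    return sum(F.mse_loss(gram_matrix(a), gram_matrix(b))
               for a, b in zip(output_feats, style_feats))

A feed-forward transformation network would then be trained by minimizing a weighted sum of such terms (the paper also adds a total variation regularizer), with the loss network kept fixed throughout training.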

Keywords

Style transfer · Super-resolution · Deep learning

Notes

Acknowledgments

Our work is supported by an ONR MURI grant, Yahoo! Labs, and a hardware donation from NVIDIA.

Supplementary material

Supplementary material 1: 419974_1_En_43_MOESM1_ESM.pdf (PDF, 7,091 KB)

References

  1. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE TPAMI 38(2), 295–307 (2016)
  2. Cheng, Z., Yang, Q., Sheng, B.: Deep colorization. In: ICCV (2015)
  3. Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. In: ECCV (2016)
  4. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)
  5. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: NIPS (2014)
  6. Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: ICCV (2015)
  7. Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. In: CVPR (2015)
  8. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. In: ICLR Workshop (2014)
  9. Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., Lipson, H.: Understanding neural networks through deep visualization. In: ICML Deep Learning Workshop (2015)
  10. Gatys, L.A., Ecker, A.S., Bethge, M.: Texture synthesis using convolutional neural networks. In: NIPS (2015)
  11. Gatys, L.A., Ecker, A.S., Bethge, M.: A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576 (2015)
  12. Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural networks. In: CVPR (2016)
  13. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part IV. LNCS, vol. 8692, pp. 184–199. Springer, Heidelberg (2014)
  14. Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. IEEE TPAMI 35(8), 1915–1929 (2013)
  15. Pinheiro, P.H., Collobert, R.: Recurrent convolutional neural networks for scene labeling. In: ICML (2014)
  16. Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: ICCV (2015)
  17. Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.H.: Conditional random fields as recurrent neural networks. In: ICCV (2015)
  18. Liu, F., Shen, C., Lin, G.: Deep convolutional neural fields for depth estimation from a single image. In: CVPR (2015)
  19. Wang, X., Fouhey, D., Gupta, A.: Designing deep networks for surface normal estimation. In: CVPR (2015)
  20. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: ICLR (2014)
  21. Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: CVPR (2015)
  22. d'Angelo, E., Alahi, A., Vandergheynst, P.: Beyond bits: reconstructing images from local binary descriptors. In: ICPR (2012)
  23. d'Angelo, E., Jacques, L., Alahi, A., Vandergheynst, P.: From bits to images: inversion of local binary descriptors. IEEE TPAMI 36(5), 874–887 (2014)
  24. Vondrick, C., Khosla, A., Malisiewicz, T., Torralba, A.: HOGgles: visualizing object detection features. In: ICCV (2013)
  25. Dosovitskiy, A., Brox, T.: Inverting visual representations with convolutional networks. In: CVPR (2016)
  26. Ulyanov, D., Lebedev, V., Vedaldi, A., Lempitsky, V.: Texture networks: feed-forward synthesis of textures and stylized images. In: ICML (2016)
  27. Li, C., Wand, M.: Precomputed real-time texture synthesis with Markovian generative adversarial networks. In: ECCV (2016)
  28. Yang, C.-Y., Ma, C., Yang, M.-H.: Single-image super-resolution: a benchmark. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part IV. LNCS, vol. 8692, pp. 372–386. Springer, Heidelberg (2014)
  29. Irani, M., Peleg, S.: Improving resolution by image registration. CVGIP: Graph. Models Image Process. 53(3), 231–239 (1991)
  30. Freedman, G., Fattal, R.: Image and video upscaling from local self-examples. ACM Trans. Graph. (TOG) 30(2), 12 (2011)
  31. Sun, J., Sun, J., Xu, Z., Shum, H.Y.: Image super-resolution using gradient profile prior. In: CVPR (2008)
  32. Shan, Q., Li, Z., Jia, J., Tang, C.K.: Fast image/video upsampling. ACM Trans. Graph. (TOG) 27, 153 (2008)
  33. Kim, K.I., Kwon, Y.: Single-image super-resolution using sparse regression and natural image prior. IEEE TPAMI 32(6), 1127–1133 (2010)
  34. Xiong, Z., Sun, X., Wu, F.: Robust web image/video super-resolution. IEEE Trans. Image Process. 19(8), 2017–2028 (2010)
  35. Freeman, W.T., Jones, T.R., Pasztor, E.C.: Example-based super-resolution. IEEE Comput. Graph. Appl. 22(2), 56–65 (2002)
  36. Chang, H., Yeung, D.Y., Xiong, Y.: Super-resolution through neighbor embedding. In: CVPR (2004)
  37. Glasner, D., Bagon, S., Irani, M.: Super-resolution from a single image. In: ICCV (2009)
  38. Yang, J., Lin, Z., Cohen, S.: Fast image super-resolution based on in-place example regression. In: CVPR (2013)
  39. Sun, J., Zheng, N.N., Tao, H., Shum, H.Y.: Image hallucination with primal sketch priors. In: CVPR (2003)
  40. Ni, K.S., Nguyen, T.Q.: Image superresolution using support vector regression. IEEE Trans. Image Process. 16(6), 1596–1610 (2007)
  41. He, L., Qi, H., Zaretzki, R.: Beta process joint dictionary learning for coupled feature spaces with application to single image super-resolution. In: CVPR (2013)
  42. Yang, J., Wright, J., Huang, T., Ma, Y.: Image super-resolution as sparse representation of raw image patches. In: CVPR (2008)
  43. Yang, J., Wright, J., Huang, T.S., Ma, Y.: Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)
  44. Timofte, R., De Smet, V., Van Gool, L.: A+: adjusted anchored neighborhood regression for fast super-resolution. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds.) ACCV 2014. LNCS, vol. 9006, pp. 111–126. Springer, Heidelberg (2015)
  45. Schulter, S., Leistner, C., Bischof, H.: Fast and accurate image upscaling with super-resolution forests. In: CVPR (2015)
  46. Huang, J.B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: CVPR (2015)
  47. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016)
  48. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  49. Gross, S., Wilber, M.: Training and investigating residual nets (2016). http://torch.ch/blog/2016/02/04/resnets.html
  50. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML (2015)
  51. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
  52. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015)
  53. Aly, H.A., Dubois, E.: Image up-sampling using total-variation regularization with a new observation model. IEEE Trans. Image Process. 14(10), 1647–1659 (2005)
  54. Zhang, H., Yang, J., Zhang, Y., Huang, T.S.: Non-local kernel regression for image and video restoration. In: Maragos, P., Paragios, N., Daniilidis, K. (eds.) ECCV 2010, Part III. LNCS, vol. 6313, pp. 566–579. Springer, Heidelberg (2010)
  55. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part V. LNCS, vol. 8693, pp. 740–755. Springer, Heidelberg (2014)
  56. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
  57. Collobert, R., Kavukcuoglu, K., Farabet, C.: Torch7: a Matlab-like environment for machine learning. In: NIPS BigLearn Workshop (2011)
  58. Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., Shelhamer, E.: cuDNN: efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)
  59. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
  60. Hanhart, P., Korshunov, P., Ebrahimi, T.: Benchmarking of quality metrics on ultra-high definition video sequences. In: 18th International Conference on Digital Signal Processing (DSP), pp. 1–8. IEEE (2013)
  61. Huynh-Thu, Q., Ghanbari, M.: Scope of validity of PSNR in image/video quality assessment. Electron. Lett. 44(13), 800–801 (2008)
  62. Kundu, D., Evans, B.L.: Full-reference visual quality assessment for synthetic images: a subjective study. In: IEEE International Conference on Image Processing (2015)
  63. Zhang, L., Zhang, L., Mou, X., Zhang, D.: FSIM: a feature similarity index for image quality assessment. IEEE Trans. Image Process. 20(8), 2378–2386 (2011)
  64. Sheikh, H.R., Bovik, A.C.: Image information and visual quality. IEEE Trans. Image Process. 15(2), 430–444 (2006)
  65. Bevilacqua, M., Roumy, A., Guillemot, C., Alberi-Morel, M.L.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: BMVC (2012)
  66. Zeyde, R., Elad, M., Protter, M.: On single image scale-up using sparse-representations. In: Boissonnat, J.-D., Chenin, P., Cohen, A., Gout, C., Lyche, T., Mazure, M.-L., Schumaker, L. (eds.) Curves and Surfaces 2011. LNCS, vol. 6920, pp. 711–730. Springer, Heidelberg (2012)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Department of Computer Science, Stanford University, Stanford, USA
