FSS: algorithm and neural network accelerator for style transfer

  • Research Paper
  • Published in: Science China Information Sciences

Abstract

Neural networks (NNs), owing to their impressive performance, have gradually come to dominate multimedia processing. For resource-constrained, energy-sensitive mobile devices, an efficient NN accelerator is essential. Style transfer is an important multimedia application, but existing arbitrary style transfer networks are complex and poorly supported by current NN accelerators, which limits their use on mobile devices; the quality of style transfer also needs improvement. We therefore design the FastStyle system (FSS), comprising a novel algorithm and an NN accelerator for style transfer. In FSS, we first propose a novel arbitrary style transfer algorithm, FastStyle, which pairs a lightweight network that delivers high quality at low computational complexity with a prior mechanism that avoids retraining when the style changes. We then redesign an NN accelerator for FastStyle by applying two improvements to the basic NVIDIA deep learning accelerator (NVDLA) architecture. First, a flexible data FSM (dat FSM) and weight FSM (wt FSM) are redesigned so that the original data path can perform additional operations, including the Gram-matrix operation, through software programming. Second, statistics and judgment logic are designed to exploit the continuity of a video stream and remove the data dependency in instance normalization, improving accelerator performance by 18.6%. Experimental results demonstrate that the proposed FastStyle achieves higher quality at lower computational cost, making it well suited to mobile devices. The proposed NN accelerator is implemented on a Xilinx VCU118 FPGA under a 180-MHz clock. It stylizes 512×512-pixel video at 20 frames per second (FPS), and its measured performance reaches up to 306.07 GOPS. An ASIC implementation in TSMC 28 nm achieves about 22 FPS on 720p video.
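The abstract highlights two operations: the Gram-matrix computation used in style transfer and instance normalization whose per-channel statistics can be borrowed from the previous video frame to break the data dependency. The abstract gives no implementation details, so the following NumPy sketch is purely illustrative — the function names, shapes, and noise model are our own assumptions, not from the paper:

```python
import numpy as np

def channel_stats(feat):
    """Per-channel mean and variance of a (C, H, W) feature map."""
    return feat.mean(axis=(1, 2)), feat.var(axis=(1, 2))

def instance_norm(feat, mean, var, eps=1e-5):
    """Instance normalization with externally supplied statistics."""
    return (feat - mean[:, None, None]) / np.sqrt(var[:, None, None] + eps)

def gram(feat):
    """Gram matrix of a (C, H, W) feature map: channel-wise correlations."""
    flat = feat.reshape(feat.shape[0], -1)   # (C, H*W)
    return flat @ flat.T / flat.shape[1]     # (C, C)

# Hypothetical toy data: two adjacent, nearly identical video-frame feature maps
rng = np.random.default_rng(0)
prev_frame = rng.normal(size=(4, 8, 8))                      # features of frame t-1
curr_frame = prev_frame + 0.01 * rng.normal(size=(4, 8, 8))  # frame t

# Exact IN: must see the whole current feature map first (serializes the pipeline)
m, v = channel_stats(curr_frame)
exact = instance_norm(curr_frame, m, v)

# Approximate IN: reuse frame t-1 statistics, so frame t can stream straight through
m_prev, v_prev = channel_stats(prev_frame)
approx = instance_norm(curr_frame, m_prev, v_prev)

# For temporally coherent video the two results stay very close
print(np.abs(exact - approx).max())
```

This illustrates why the continuity of a video stream makes the trick viable: when consecutive frames are similar, their per-channel statistics are too, so normalizing with stale statistics changes the output only marginally while removing the need to buffer the whole feature map.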



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant No. 62074041), the State Key Laboratory of ASIC and System (Grant No. 2021KF009), the Zhuhai Fudan Innovation Institute, and the Key R&D Program of Shandong Province (Grant No. 2022CXGC010504).

Author information


Corresponding author

Correspondence to Mingyu Wang.


About this article


Cite this article

Ling, Y., Huang, Y., Cai, Y. et al. FSS: algorithm and neural network accelerator for style transfer. Sci. China Inf. Sci. 67, 122401 (2024). https://doi.org/10.1007/s11432-022-3676-2

