Abstract
Neural networks (NNs), owing to their impressive performance, have gradually come to dominate multimedia processing. For resource-constrained, energy-sensitive mobile devices, an efficient NN accelerator is necessary. Style transfer is an important multimedia application, but existing arbitrary style transfer networks are complex and poorly supported by current NN accelerators, which limits their use on mobile devices; the quality of style transfer also needs improvement. We therefore design the FastStyle system (FSS), which pairs a novel algorithm with an NN accelerator for style transfer. In FSS, we first propose FastStyle, a novel arbitrary style transfer algorithm built on a lightweight network that delivers high quality at low computational complexity, together with a prior mechanism that avoids retraining when the style changes. We then redesign an NN accelerator for FastStyle by applying two improvements to the basic NVIDIA deep learning accelerator (NVDLA) architecture. First, the data FSM (dat FSM) and weight FSM (wt FSM) are redesigned to be flexible, enabling the original data path to perform additional operations (including the Gram matrix operation) through software programming. Second, statistics and judgment logic exploit the continuity of a video stream to remove the data dependency in instance normalization, improving accelerator performance by 18.6%. Experimental results demonstrate that FastStyle achieves higher quality at a lower computational cost, making it more suitable for mobile devices. The proposed NN accelerator is implemented on a Xilinx VCU118 FPGA with a 180-MHz clock; it stylizes 512×512-pixel video at 20 FPS, and measured performance reaches 306.07 GOPS. The ASIC implementation in TSMC 28 nm achieves about 22 FPS on 720p video.
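For reference, the two operations the abstract singles out — the Gram matrix used in style transfer and instance normalization — can be sketched in NumPy as follows. This is an illustrative sketch of the standard definitions only, not the paper's hardware implementation; the function names are ours.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalization: each (sample, channel) feature map is
    normalized by its own spatial mean and variance (NCHW layout).
    On a streaming accelerator, these per-frame statistics create the
    data dependency the paper's judgment logic works around."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gram_matrix(feat):
    """Gram matrix of one feature map: pairwise channel inner
    products, normalized by the number of spatial positions."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (h * w)
```

The Gram matrix is a matrix product over a reshaped feature map, which is why a convolution data path can be reprogrammed to compute it.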
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. 62074041), the State Key Laboratory of ASIC and System (Grant No. 2021KF009), the Zhuhai Fudan Innovation Institute, and the Key R&D Program of Shandong Province (Grant No. 2022CXGC010504).
Cite this article
Ling, Y., Huang, Y., Cai, Y. et al. FSS: algorithm and neural network accelerator for style transfer. Sci. China Inf. Sci. 67, 122401 (2024). https://doi.org/10.1007/s11432-022-3676-2