Abstract
Neural networks (NNs), owing to their impressive performance, have gradually come to dominate multimedia processing. For resource-constrained, energy-sensitive mobile devices, an efficient NN accelerator is necessary. Style transfer is an important multimedia application, but existing arbitrary style transfer networks are complex and poorly supported by current NN accelerators, which limits their use on mobile devices; the quality of style transfer also needs improvement. We therefore design the FastStyle system (FSS), which pairs a novel algorithm with an NN accelerator for style transfer. In FSS, we first propose FastStyle, a novel arbitrary style transfer algorithm built on a lightweight network that delivers high quality at low computational complexity, together with a prior mechanism that avoids retraining when the style changes. We then redesign an NN accelerator for FastStyle by applying two improvements to the basic NVIDIA deep learning accelerator (NVDLA) architecture. First, the data FSM (dat FSM) and weight FSM (wt FSM) are redesigned to be flexible, enabling the original data path to perform additional operations (including the Gram matrix operation) through software programming. Second, statistics and judgment logic exploit the continuity of a video stream to remove the data dependency in instance normalization, improving accelerator performance by 18.6%. Experimental results demonstrate that FastStyle achieves higher quality at a lower computational cost, making it more suitable for mobile devices. The proposed NN accelerator is implemented on a Xilinx VCU118 FPGA with a 180-MHz clock; it stylizes 512×512-pixel video at 20 FPS, and measured performance reaches 306.07 GOPS. The ASIC implementation in TSMC 28 nm achieves about 22 FPS on 720p video.
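For reference, the two operations the abstract singles out — the Gram matrix used in style transfer and instance normalization — can be sketched in NumPy as follows. This is an illustrative sketch of the standard definitions only, not the paper's hardware implementation; the function names are ours.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalization: each (sample, channel) feature map is
    normalized by its own spatial mean and variance (NCHW layout).
    On a streaming accelerator, these per-frame statistics create the
    data dependency the paper's judgment logic works around."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gram_matrix(feat):
    """Gram matrix of one feature map: pairwise channel inner
    products, normalized by the number of spatial positions."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (h * w)
```

The Gram matrix is a matrix product over a reshaped feature map, which is why a convolution data path can be reprogrammed to compute it.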
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. 62074041), the State Key Laboratory of ASIC and System (Grant No. 2021KF009), the Zhuhai Fudan Innovation Institute, and the Key R&D Program of Shandong Province (Grant No. 2022CXGC010504).
Cite this article
Ling, Y., Huang, Y., Cai, Y. et al. FSS: algorithm and neural network accelerator for style transfer. Sci. China Inf. Sci. 67, 122401 (2024). https://doi.org/10.1007/s11432-022-3676-2