
An efficient lightweight CNN acceleration architecture for edge computing based-on FPGA

Published in: Applied Intelligence

Abstract

Given the performance, volume, and power constraints of edge computing, a single chip based on a Field Programmable Gate Array (FPGA), with its parallel execution, flexible configuration, and power efficiency, is well suited to Convolutional Neural Network (CNN) acceleration. However, implementing a lightweight CNN with limited on-chip resources while maintaining high computing efficiency and utilization remains a challenging task. To achieve efficient acceleration on a single chip, we implement a Network-on-Chip (NoC) built from Processing Elements (PEs), each consisting of multiple node arrays. Moreover, the computing and memory efficiency of each PE is optimized with a sharing function and hybrid memory. To maximize resource utilization, a theoretical model is constructed to explore the parallel parameters and running cycles of each PE. In experiments on LeNet and MobileNet, resource utilization reaches 83.61% and 95.28%, with throughput of 53.3 and 41.9 Giga Operations Per Second (GOPS), respectively. Power measurements show that power efficiency reaches 77.25 GOPS/W and 85.51 GOPS/W on our platform, which is sufficient for efficient inference in edge computing.
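The abstract's utilization and power-efficiency figures follow from simple bookkeeping over peak and achieved throughput. The sketch below illustrates that arithmetic; the parameter names, the 2-ops-per-MAC peak model, and the example 320-MAC array at 100 MHz are our assumptions for illustration, not the paper's actual design parameters or equations.

```python
# Hedged sketch of throughput/utilization/power-efficiency bookkeeping.
# The 320-MAC / 100 MHz figures below are hypothetical, chosen only to
# show how a utilization value near the reported 83.61% could arise.

def peak_gops(num_macs: int, freq_mhz: float) -> float:
    """Peak throughput: each MAC does one multiply + one add per cycle."""
    return 2 * num_macs * freq_mhz / 1e3

def utilization(achieved_gops: float, num_macs: int, freq_mhz: float) -> float:
    """Fraction of peak throughput actually sustained."""
    return achieved_gops / peak_gops(num_macs, freq_mhz)

def power_efficiency(achieved_gops: float, watts: float) -> float:
    """GOPS delivered per watt of measured power."""
    return achieved_gops / watts

# Reported LeNet throughput (53.3 GOPS) against a hypothetical
# 320-MAC array at 100 MHz (64 GOPS peak):
print(round(utilization(53.3, 320, 100.0), 3))  # → 0.833
```

At the reported efficiencies, the implied measured power is modest: 53.3 GOPS / 77.25 GOPS/W ≈ 0.69 W for LeNet and 41.9 GOPS / 85.51 GOPS/W ≈ 0.49 W for MobileNet, consistent with an edge-computing power budget.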



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62171156.


Corresponding author

Correspondence to Bing Liu.



Cite this article

Wu, R., Liu, B., Fu, P. et al. An efficient lightweight CNN acceleration architecture for edge computing based-on FPGA. Appl Intell 53, 13867–13881 (2023). https://doi.org/10.1007/s10489-022-04251-3
