CNNX: A Low Cost, CNN Accelerator for Embedded System in Vision at Edge

  • Research Article - Computer Engineering and Computer Science
  • Published in: Arabian Journal for Science and Engineering

Abstract

Convolutional neural networks (CNNs) are widely deployed in artificial intelligence applications such as computer vision and pattern recognition, and in these applications the CNN is the most computationally intensive part. Recently, many researchers have adopted depthwise convolution to reduce the computational load of CNN execution; at the same time, CNNs keep growing larger and therefore demand an ever larger computational budget. The problem is more severe when such applications run on embedded systems, especially on edge devices, whose processors can hardly handle these heavy computational loads. This paper proposes a lightweight, low-power, and efficient CNN hardware accelerator for edge computing devices, designed explicitly for depthwise CNNs. The proposed accelerator can be configured and programmed to run any lightweight CNN from a wide range of AI networks, such as MobileNet, Xception, and ShuffleNet. Our experimental results show that the accelerator can run MobileNet 70 times per second in a remote sensing AI application on a \(224\times 224\) pixel image from the ImageNet dataset.
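The cost reduction that the abstract attributes to depthwise convolution can be made concrete with a simple multiply-accumulate (MAC) count. The sketch below is illustrative only and is not taken from the paper: the layer shape is a hypothetical MobileNet-style example, and the counts assume unit stride with an unchanged output resolution.

```python
# Illustrative sketch (not from the paper): compare the MAC count of a
# standard k x k convolution with a depthwise-separable one, the layer
# style used by MobileNet, Xception, and ShuffleNet.

def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w output."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes channels
    return depthwise + pointwise

if __name__ == "__main__":
    # Hypothetical layer shape loosely matching an early MobileNet stage
    # for a 224 x 224 input (values chosen for illustration only).
    h, w, c_in, c_out, k = 112, 112, 32, 64, 3
    std = standard_conv_macs(h, w, c_in, c_out, k)
    dws = depthwise_separable_macs(h, w, c_in, c_out, k)
    print(f"standard:            {std:,} MACs")
    print(f"depthwise separable: {dws:,} MACs ({dws / std:.1%} of standard)")
```

For this shape the separable form needs roughly \(1/c_{out} + 1/k^2 \approx 13\%\) of the standard MACs, which is the kind of arithmetic saving that lightweight depthwise CNNs rely on and that the proposed accelerator targets.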



Author information


Corresponding author

Correspondence to Hakem Beithollahi.


About this article


Cite this article

Farahani, A., Beithollahi, H., Fathi, M. et al. CNNX: A Low Cost, CNN Accelerator for Embedded System in Vision at Edge. Arab J Sci Eng 48, 1537–1545 (2023). https://doi.org/10.1007/s13369-022-06931-1
