Abstract
Convolutional neural networks (CNNs) are widely deployed in artificial intelligence applications such as computer vision and pattern recognition, where the CNN is the most computationally intensive component. Many researchers have recently adopted depthwise convolution to reduce the computational load of CNN execution; at the same time, CNNs keep growing larger and therefore demand an ever greater computational budget. The problem is most severe when the application runs on an embedded system, especially an edge device, because the embedded processor can hardly handle such heavy computational loads. This paper proposes a lightweight, low-power, and efficient CNN hardware accelerator for edge computing devices. The accelerator is designed explicitly for depthwise CNNs and can be configured and programmed to run lightweight CNNs from a wide range of AI networks, such as MobileNet, Xception, and ShuffleNet. Our experimental results show that the accelerator can run MobileNet 70 times per second in a remote sensing AI application on \(224\times 224\) pixel images from the ImageNet dataset.
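To make the saving from depthwise convolution concrete, the following is a minimal sketch that counts multiply-accumulate operations (MACs) for a standard convolution and for a depthwise separable one, using the usual textbook cost formulas. The layer dimensions and function names are hypothetical and chosen only for illustration; they are not taken from the accelerator described here. The last line converts the reported 70 runs per second into a per-frame time budget.

```python
def standard_conv_macs(k, c_in, c_out, h_out, w_out):
    """MACs for a standard k x k convolution producing an h_out x w_out map."""
    return k * k * c_in * c_out * h_out * w_out


def depthwise_separable_macs(k, c_in, c_out, h_out, w_out):
    """MACs for a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in * h_out * w_out   # one k x k filter per input channel
    pointwise = c_in * c_out * h_out * w_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical layer shape (an assumption, not a layer from the paper).
k, c_in, c_out, h_out, w_out = 3, 64, 128, 56, 56

std_macs = standard_conv_macs(k, c_in, c_out, h_out, w_out)
sep_macs = depthwise_separable_macs(k, c_in, c_out, h_out, w_out)
ratio = sep_macs / std_macs  # analytically equal to 1/c_out + 1/k**2

# Per-frame time budget implied by 70 inferences per second.
frame_budget_ms = 1000 / 70

print(f"standard: {std_macs} MACs, separable: {sep_macs} MACs, ratio: {ratio:.3f}")
print(f"time budget per frame at 70 runs/s: {frame_budget_ms:.1f} ms")
```

For this assumed layer the separable form needs roughly 12% of the MACs of the standard form, and a rate of 70 runs per second leaves about 14.3 ms per image.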
Cite this article
Farahani, A., Beithollahi, H., Fathi, M. et al. CNNX: A Low Cost, CNN Accelerator for Embedded System in Vision at Edge. Arab J Sci Eng 48, 1537–1545 (2023). https://doi.org/10.1007/s13369-022-06931-1
DOI: https://doi.org/10.1007/s13369-022-06931-1