Abstract
As neural networks grow in complexity, the energy required for training and inference has driven a noticeable shift toward specialized accelerators that meet the strict latency and energy constraints of both edge and cloud deployments. These accelerators achieve high performance through parallelism across hundreds of processing elements, and energy efficiency by reducing data movement and maximizing resource utilization through data reuse. After briefly summarizing the problems that neural networks solve in the domains of Computer Vision, Natural Language Processing, Recommendation Systems, and Graph Processing, we discuss how individual layers from each of these networks can be accelerated in an energy-efficient manner. In particular, we focus on design considerations and trade-offs for mapping CNNs, Transformers, and GNNs onto AI accelerators that maximize compute efficiency and minimize energy consumption by reducing the number of memory accesses through efficient data reuse.
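To make the data-reuse idea concrete, the sketch below (a minimal illustration under our own assumptions, not code from the chapter) contrasts two loop orderings for a 1-D convolution. In the naive output-major ordering every filter weight is re-fetched for every output position; in a weight-stationary ordering each weight is fetched once, held in a local register, and reused across all outputs, which is the kind of reuse accelerators exploit to cut memory traffic.

```python
import numpy as np

# Illustrative sketch only: two orderings of the same 1-D convolution,
# counting how many times filter weights are fetched from "memory".

def conv1d_naive(x, w):
    """Output-major loop: all K weights are re-read for every output."""
    K, N = len(w), len(x) - len(w) + 1
    y = np.zeros(N)
    weight_fetches = 0
    for n in range(N):          # for each output position ...
        for k in range(K):      # ... re-fetch every weight
            y[n] += w[k] * x[n + k]
            weight_fetches += 1
    return y, weight_fetches    # K * N weight fetches

def conv1d_weight_stationary(x, w):
    """Weight-major loop: each weight is fetched once and fully reused."""
    K, N = len(w), len(x) - len(w) + 1
    y = np.zeros(N)
    weight_fetches = 0
    for k in range(K):          # fetch weight w[k] once ...
        wk = w[k]               # ... hold it "stationary" locally
        weight_fetches += 1
        for n in range(N):      # reuse it across all N outputs
            y[n] += wk * x[n + k]
    return y, weight_fetches    # only K weight fetches

x = np.random.rand(1024)
w = np.random.rand(7)
y1, f1 = conv1d_naive(x, w)
y2, f2 = conv1d_weight_stationary(x, w)
assert np.allclose(y1, y2)      # same result, very different traffic
print(f"naive weight fetches: {f1}, weight-stationary: {f2}")
```

Both orderings compute identical outputs; only the memory-access pattern differs, which is why dataflow choice (weight-, output-, or input-stationary) is a central design consideration for the accelerators discussed in this chapter.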
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Raha, A. et al. (2024). Efficient Hardware Acceleration of Emerging Neural Networks for Embedded Machine Learning: An Industry Perspective. In: Pasricha, S., Shafique, M. (eds) Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing. Springer, Cham. https://doi.org/10.1007/978-3-031-19568-6_5
DOI: https://doi.org/10.1007/978-3-031-19568-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19567-9
Online ISBN: 978-3-031-19568-6
eBook Packages: Engineering, Engineering (R0)