Abstract
As AI techniques are increasingly adopted across industry sectors, reducing the energy consumption of neural network (NN) applications has become a research priority. One promising solution is analog ReRAM processing, which outperforms GPU-based approaches in both performance and energy consumption. However, scaling ReRAM-based architectures to large NN applications with billions of parameters remains a major challenge. To address this issue, this paper proposes a novel GPU-ReRAM architecture that uses a heuristic to identify the NN layers best suited to ReRAM acceleration, making ReRAM scalable to complex NNs while significantly reducing energy consumption. The approach was evaluated on real-world models, achieving a 6x reduction in energy consumption without sacrificing inference accuracy.
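To make the layer-selection idea concrete, the sketch below shows a minimal, hypothetical greedy partitioning heuristic in the spirit of the abstract. The `Layer` fields, the per-MB energy-savings ranking, and the ReRAM capacity constraint are all illustrative assumptions, not the authors' actual algorithm or cost model.

```python
# Hypothetical sketch: greedily map each NN layer to GPU or ReRAM based on
# an assumed per-layer energy estimate, subject to ReRAM weight capacity.
# All numbers, names, and the cost model are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    gpu_energy_mj: float    # estimated inference energy on the GPU (mJ)
    reram_energy_mj: float  # estimated inference energy on ReRAM crossbars (mJ)
    weight_mb: float        # weight footprint that must fit in ReRAM (MB)

def partition(layers: list[Layer], reram_capacity_mb: float) -> dict[str, str]:
    """Assign each layer to the device with lower estimated energy,
    favoring layers that save the most energy per MB of ReRAM used."""
    ranked = sorted(
        layers,
        key=lambda l: (l.gpu_energy_mj - l.reram_energy_mj) / l.weight_mb,
        reverse=True,
    )
    mapping, used_mb = {}, 0.0
    for layer in ranked:
        saves_energy = layer.reram_energy_mj < layer.gpu_energy_mj
        fits = used_mb + layer.weight_mb <= reram_capacity_mb
        if saves_energy and fits:
            mapping[layer.name] = "reram"
            used_mb += layer.weight_mb
        else:
            mapping[layer.name] = "gpu"
    return mapping

# Example: only the layers that both save energy and fit go to ReRAM.
layers = [
    Layer("conv1", gpu_energy_mj=12.0, reram_energy_mj=2.5, weight_mb=1.0),
    Layer("conv2", gpu_energy_mj=30.0, reram_energy_mj=5.0, weight_mb=8.0),
    Layer("fc",    gpu_energy_mj=8.0,  reram_energy_mj=9.5, weight_mb=50.0),
]
print(partition(layers, reram_capacity_mb=16.0))
# -> {'conv1': 'reram', 'conv2': 'reram', 'fc': 'gpu'}
```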