Abstract
As AI techniques are increasingly adopted across industry sectors, reducing the energy consumption of neural network (NN) applications has become a research priority. One promising solution is analog ReRAM processing, which outperforms GPU-based approaches in both performance and energy consumption. However, scaling ReRAM-based architectures to large NN applications with billions of parameters remains a major challenge. To address this issue, this paper proposes a novel GPU-ReRAM architecture that uses a heuristic to identify the NN layers best suited to ReRAM acceleration, making ReRAM scalable to complex NNs while significantly reducing energy consumption. The approach was evaluated on real-world models, achieving a 6x reduction in energy consumption without sacrificing inference accuracy.
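To make the layer-selection idea concrete, the sketch below shows a minimal, hypothetical greedy partitioning heuristic in the spirit of the abstract. The `Layer` fields, the per-MB energy-savings ranking, and the ReRAM capacity constraint are all illustrative assumptions, not the authors' actual algorithm or cost model.

```python
# Hypothetical sketch: greedily map each NN layer to GPU or ReRAM based on
# an assumed per-layer energy estimate, subject to ReRAM weight capacity.
# All numbers, names, and the cost model are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    gpu_energy_mj: float    # estimated inference energy on the GPU (mJ)
    reram_energy_mj: float  # estimated inference energy on ReRAM crossbars (mJ)
    weight_mb: float        # weight footprint that must fit in ReRAM (MB)

def partition(layers: list[Layer], reram_capacity_mb: float) -> dict[str, str]:
    """Assign each layer to the device with lower estimated energy,
    favoring layers that save the most energy per MB of ReRAM used."""
    ranked = sorted(
        layers,
        key=lambda l: (l.gpu_energy_mj - l.reram_energy_mj) / l.weight_mb,
        reverse=True,
    )
    mapping, used_mb = {}, 0.0
    for layer in ranked:
        saves_energy = layer.reram_energy_mj < layer.gpu_energy_mj
        fits = used_mb + layer.weight_mb <= reram_capacity_mb
        if saves_energy and fits:
            mapping[layer.name] = "reram"
            used_mb += layer.weight_mb
        else:
            mapping[layer.name] = "gpu"
    return mapping

# Example: only the layers that both save energy and fit go to ReRAM.
layers = [
    Layer("conv1", gpu_energy_mj=12.0, reram_energy_mj=2.5, weight_mb=1.0),
    Layer("conv2", gpu_energy_mj=30.0, reram_energy_mj=5.0, weight_mb=8.0),
    Layer("fc",    gpu_energy_mj=8.0,  reram_energy_mj=9.5, weight_mb=50.0),
]
print(partition(layers, reram_capacity_mb=16.0))
# -> {'conv1': 'reram', 'conv2': 'reram', 'fc': 'gpu'}
```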