Scalable and Energy-Efficient NN Acceleration with GPU-ReRAM Architecture

  • Conference paper

Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14251)

Abstract

As AI techniques are increasingly adopted across industry sectors, reducing the energy consumption of Neural Network (NN) applications has become a priority for researchers. One potential solution is analog ReRAM processing, which outperforms GPU-based approaches in both performance and energy consumption. However, scaling ReRAM-based architectures to large NN applications with billions of parameters remains a major challenge. To address this issue, this paper proposes a GPU-ReRAM architecture that uses a heuristic to identify the NN layers best suited for ReRAM acceleration, making ReRAM scalable to complex NNs while significantly reducing energy consumption. Evaluated on real-world models, the approach achieves a 6x reduction in energy consumption without sacrificing inference accuracy.
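The abstract does not reproduce the heuristic itself, but the general idea of splitting a network between a GPU and ReRAM crossbars can be illustrated with a small sketch. The Python snippet below is a minimal, hypothetical greedy partitioner: it ranks layers by estimated energy savings per occupied crossbar cell and maps the top-ranked layers to ReRAM until a cell budget is exhausted. The energy constants, cell budget, layer names, and layer figures are invented for illustration only and are not taken from the paper.

```python
# Illustrative sketch only: this is NOT the paper's heuristic.
# All constants, layer shapes, and energy numbers are hypothetical placeholders.

from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    macs: int    # multiply-accumulate operations per inference
    params: int  # weight count (determines crossbar cells needed)


# Hypothetical per-MAC energy costs (picojoules); real values depend on the
# GPU, the ReRAM crossbar design, ADC/DAC overheads, and the weight mapping.
GPU_PJ_PER_MAC = 4.6
RERAM_PJ_PER_MAC = 0.4
RERAM_CELL_BUDGET = 8_000_000  # total programmable crossbar cells available


def partition(layers):
    """Greedy heuristic: send the layers with the largest energy savings per
    occupied crossbar cell to ReRAM until the cell budget runs out;
    everything else stays on the GPU."""
    ranked = sorted(
        layers,
        key=lambda l: (GPU_PJ_PER_MAC - RERAM_PJ_PER_MAC) * l.macs / max(l.params, 1),
        reverse=True,
    )
    budget, assignment = RERAM_CELL_BUDGET, {}
    for layer in ranked:
        if layer.params <= budget:
            assignment[layer.name] = "ReRAM"
            budget -= layer.params
        else:
            assignment[layer.name] = "GPU"
    return assignment


if __name__ == "__main__":
    # Hypothetical layer statistics, loosely in the range of a small CNN.
    model = [
        Layer("conv1", macs=118_013_952, params=23_232),
        Layer("conv5", macs=74_760_192, params=884_736),
        Layer("fc6",   macs=37_748_736, params=37_748_736),
    ]
    for name, device in partition(model).items():
        print(f"{name}: {device}")
```

Under these assumed numbers, convolutional layers (many MACs per stored weight) tend to be mapped to ReRAM first, while large fully connected layers stay on the GPU, which is one plausible way a layer-selection heuristic could keep crossbar usage bounded for large models.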

Author information

Corresponding author

Correspondence to Rafael Fão de Moura.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Moura, R.F.d., Carro, L. (2023). Scalable and Energy-Efficient NN Acceleration with GPU-ReRAM Architecture. In: Palumbo, F., Keramidas, G., Voros, N., Diniz, P.C. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2023. Lecture Notes in Computer Science, vol 14251. Springer, Cham. https://doi.org/10.1007/978-3-031-42921-7_16

  • DOI: https://doi.org/10.1007/978-3-031-42921-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42920-0

  • Online ISBN: 978-3-031-42921-7

  • eBook Packages: Computer Science, Computer Science (R0)
