
In-Memory Computing for AI Accelerators: Challenges and Solutions

Chapter in Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing

Abstract

In-memory computing (IMC)-based hardware reduces both latency and energy consumption for compute-intensive machine learning (ML) applications. To date, several SRAM- and ReRAM-based IMC hardware architectures for accelerating ML applications have been proposed in the literature. However, crossbar-based IMC hardware poses several design challenges. In this chapter, we first describe the machine learning algorithms recently adopted in the literature. Then, we elucidate the need for IMC-based hardware accelerators and survey various IMC techniques for compute-intensive ML applications. Next, we discuss the challenges associated with IMC architectures. We identify that designing an energy-efficient interconnect is particularly challenging for IMC hardware. Thereafter, we discuss different interconnect techniques for IMC architectures proposed in the literature. Finally, we describe different performance evaluation techniques for IMC architectures. We conclude the chapter with a summary and future avenues for IMC architectures for ML acceleration.
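As a concrete illustration of the crossbar-based IMC idea discussed in this chapter, the sketch below models an analog matrix-vector multiply on a resistive crossbar: weights are quantized onto a small number of conductance levels, inputs are applied as word-line voltages, and each bit-line current accumulates one dot product. This is a minimal sketch for intuition only; the function names, conductance range, quantization levels, and Gaussian read-noise model are illustrative assumptions, not the devices or models evaluated in the chapter.

```python
# Minimal sketch (assumption, not the chapter's implementation) of an analog
# crossbar matrix-vector multiply: weights -> conductances, inputs -> word-line
# voltages, bit-line currents -> dot products (Kirchhoff's current law).
import numpy as np

def quantize_to_conductance(weights, g_min=1e-6, g_max=1e-4, levels=16):
    """Map real-valued weights onto a limited set of discrete conductance levels."""
    norm = (weights - weights.min()) / (weights.max() - weights.min() + 1e-12)
    steps = np.round(norm * (levels - 1)) / (levels - 1)   # quantize to `levels` steps
    return g_min + steps * (g_max - g_min)                 # scale to device range (siemens)

def crossbar_mvm(conductances, voltages, read_noise_sigma=0.0, rng=None):
    """Ideal analog MVM: bit-line current I_j = sum_i V_i * G_ij.
    Optional multiplicative Gaussian noise mimics device read variation."""
    g = conductances
    if read_noise_sigma > 0.0:
        rng = rng or np.random.default_rng(0)
        g = g * (1.0 + rng.normal(0.0, read_noise_sigma, size=g.shape))
    return voltages @ g  # one current per bit line (column)

# Example: map a 64x32 weight tile onto one crossbar, compare ideal vs. noisy readout
rng = np.random.default_rng(42)
weights = rng.standard_normal((64, 32))
voltages = rng.uniform(0.0, 0.2, size=64)   # read voltages on the 64 word lines

g = quantize_to_conductance(weights)
ideal = crossbar_mvm(g, voltages)
noisy = crossbar_mvm(g, voltages, read_noise_sigma=0.05)
print("max |noisy - ideal| bit-line current:", np.abs(noisy - ideal).max())
```

Even this idealized model makes the core trade-off visible: quantization to a few conductance levels and read noise both perturb the computed currents, which is why accuracy under device non-idealities is a central concern for IMC accelerators.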



Author information


Corresponding author

Correspondence to Yu Cao.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Krishnan, G., Mandal, S.K., Chakrabarti, C., Seo, Js., Ogras, U.Y., Cao, Y. (2024). In-Memory Computing for AI Accelerators: Challenges and Solutions. In: Pasricha, S., Shafique, M. (eds) Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing. Springer, Cham. https://doi.org/10.1007/978-3-031-19568-6_7


  • DOI: https://doi.org/10.1007/978-3-031-19568-6_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19567-9

  • Online ISBN: 978-3-031-19568-6

  • eBook Packages: Engineering, Engineering (R0)
