
In-Memory Computing for AI Accelerators: Challenges and Solutions

Chapter in Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing

Abstract

In-memory computing (IMC)-based hardware reduces both latency and energy consumption for compute-intensive machine learning (ML) applications. To date, several SRAM- and ReRAM-based IMC hardware architectures for accelerating ML applications have been proposed in the literature. However, crossbar-based IMC hardware poses several design challenges. In this chapter, we first describe the machine learning algorithms recently adopted in the literature. Then, we elucidate the need for IMC-based hardware accelerators and survey various IMC techniques for compute-intensive ML applications. Next, we discuss the challenges associated with IMC architectures. We identify that designing an energy-efficient interconnect is particularly challenging for IMC hardware. Thereafter, we discuss different interconnect techniques for IMC architectures proposed in the literature. Finally, we describe different performance evaluation techniques for IMC architectures. We conclude the chapter with a summary and future avenues for IMC architectures for ML acceleration.
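As a concrete illustration of the crossbar-based IMC idea discussed in this chapter, the sketch below models an analog matrix-vector multiply on a resistive crossbar: weights are quantized onto a small number of conductance levels, inputs are applied as word-line voltages, and each bit-line current accumulates one dot product. This is a minimal sketch for intuition only; the function names, conductance range, quantization levels, and Gaussian read-noise model are illustrative assumptions, not the devices or models evaluated in the chapter.

```python
# Minimal sketch (assumption, not the chapter's implementation) of an analog
# crossbar matrix-vector multiply: weights -> conductances, inputs -> word-line
# voltages, bit-line currents -> dot products (Kirchhoff's current law).
import numpy as np

def quantize_to_conductance(weights, g_min=1e-6, g_max=1e-4, levels=16):
    """Map real-valued weights onto a limited set of discrete conductance levels."""
    norm = (weights - weights.min()) / (weights.max() - weights.min() + 1e-12)
    steps = np.round(norm * (levels - 1)) / (levels - 1)   # quantize to `levels` steps
    return g_min + steps * (g_max - g_min)                 # scale to device range (siemens)

def crossbar_mvm(conductances, voltages, read_noise_sigma=0.0, rng=None):
    """Ideal analog MVM: bit-line current I_j = sum_i V_i * G_ij.
    Optional multiplicative Gaussian noise mimics device read variation."""
    g = conductances
    if read_noise_sigma > 0.0:
        rng = rng or np.random.default_rng(0)
        g = g * (1.0 + rng.normal(0.0, read_noise_sigma, size=g.shape))
    return voltages @ g  # one current per bit line (column)

# Example: map a 64x32 weight tile onto one crossbar, compare ideal vs. noisy readout
rng = np.random.default_rng(42)
weights = rng.standard_normal((64, 32))
voltages = rng.uniform(0.0, 0.2, size=64)   # read voltages on the 64 word lines

g = quantize_to_conductance(weights)
ideal = crossbar_mvm(g, voltages)
noisy = crossbar_mvm(g, voltages, read_noise_sigma=0.05)
print("max |noisy - ideal| bit-line current:", np.abs(noisy - ideal).max())
```

Even this idealized model makes the core trade-off visible: quantization to a few conductance levels and read noise both perturb the computed currents, which is why accuracy under device non-idealities is a central concern for IMC accelerators.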



Author information


Corresponding author

Correspondence to Yu Cao.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Krishnan, G., Mandal, S.K., Chakrabarti, C., Seo, Js., Ogras, U.Y., Cao, Y. (2024). In-Memory Computing for AI Accelerators: Challenges and Solutions. In: Pasricha, S., Shafique, M. (eds) Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing. Springer, Cham. https://doi.org/10.1007/978-3-031-19568-6_7


  • DOI: https://doi.org/10.1007/978-3-031-19568-6_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19567-9

  • Online ISBN: 978-3-031-19568-6

  • eBook Packages: Engineering, Engineering (R0)
