The Deep In-Memory Architecture (DIMA)

  • Mingu Kang
  • Sujan Gonugondla
  • Naresh R. Shanbhag


This chapter describes the Deep In-Memory Architecture (DIMA) by first showing how the algorithmic data-flow of commonly used ML algorithms is well matched to DIMA’s intrinsic architectural data-flow. This match is unsurprising, since DIMA implements a highly efficient matrix-vector multiply (MVM) operation. Although DIMA is intrinsically analog, it is surprisingly robust to process, voltage, and temperature (PVT) variations; this robustness is explained through the lens of a Shannon-inspired model of computation. The various stages of analog computation in DIMA are described along with practical circuit and architectural design guidelines. Finally, circuit-aware system models of DIMA’s energy, delay, and functionality are presented and employed to evaluate the trade-offs among DIMA’s decision-making accuracy, energy, and latency.


Keywords: Modeling, Data-flow, Robustness, Matrix-vector multiplication, Energy and accuracy trade-offs


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Mingu Kang (1)
  • Sujan Gonugondla (2)
  • Naresh R. Shanbhag (2)

  1. IBM T. J. Watson Research Center, Old Tappan, USA
  2. University of Illinois at Urbana-Champaign, Urbana, USA
