PROMISE: A DIMA-Based Accelerator

  • Mingu Kang
  • Sujan Gonugondla
  • Naresh R. Shanbhag


DIMA’s benefits have been demonstrated in the previous chapters for only a limited set of functions, raising the question: can DIMA be made programmable without losing much of its energy and throughput benefits over its digital counterparts? This chapter presents a DIMA-based accelerator called PROMISE, which achieves a high level of programmability across diverse ML algorithms without noticeably sacrificing the efficiency of mixed-signal accelerators designed for specific ML algorithms. PROMISE exposes instruction set mechanisms that give software control over energy-vs-accuracy trade-offs, and supports compilation of high-level languages down to the hardware.
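One way software can exercise an energy-vs-accuracy trade-off is by lowering operand bit precision, which reduces mixed-signal compute energy at the cost of quantization error. The NumPy sketch below is a hypothetical illustration of that effect on a dot product; the `quantize` helper, the random vectors, and the chosen bit-widths are illustrative assumptions, not PROMISE’s actual ISA or circuits.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize values in [-1, 1) to a signed fixed-point grid
    with the given bit-width (round-to-nearest, saturating at the edges)."""
    scale = 2 ** (bits - 1)
    return np.clip(np.round(x * scale), -scale, scale - 1) / scale

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, 256)   # hypothetical weight vector
x = rng.uniform(-1, 1, 256)   # hypothetical input vector

exact = np.dot(w, x)
for bits in (8, 4, 2):
    # Lower precision -> coarser grid -> larger dot-product error,
    # but (on precision-scalable hardware) lower energy per operation.
    approx = np.dot(quantize(w, bits), quantize(x, bits))
    print(f"{bits}-bit dot-product error: {abs(approx - exact):.4f}")
```

In a precision-programmable accelerator, the bit-width would be an instruction operand rather than a Python argument, letting the compiler pick the cheapest precision that still meets the application’s accuracy target.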


Keywords: Analog instruction set architecture (ISA) · Programmable accelerator · Compiler · Low-level virtual machine (LLVM)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Mingu Kang (1)
  • Sujan Gonugondla (2)
  • Naresh R. Shanbhag (2)

  1. IBM T. J. Watson Research Center, Old Tappan, USA
  2. University of Illinois at Urbana-Champaign, Urbana, USA
