Architectures for Machine Learning

Living reference work entry in: Handbook of Computer Architecture

Abstract

The term “artificial intelligence (AI)” was coined in 1956, and the field has since swung between periods of extreme hype and periods of deep disillusionment. Today, AI receives tremendous attention from both academia and industry, and it will remain one of the hottest topics for the foreseeable future. A subset of AI named machine learning (ML) has achieved great success across a wide variety of fields, such as computer vision, natural language processing, and computer gaming. ML was first proposed to endow machines with the ability to imitate the learning process of the human brain using neuromorphic models. However, modeling complexity and the limited computing capabilities of machines hindered the development of ML in its early days. Benefiting from ever-growing computing power and the availability of digital data, ML has advanced along two lines: the bio-inspired spiking neural network (SNN), also known as neuromorphic computing, and the practical artificial neural network (ANN), which have become two of the top trending methods with outstanding results.
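To make the SNN/ANN distinction concrete, the minimal sketch below contrasts a conventional ANN neuron (a multiply-accumulate followed by an activation) with a leaky integrate-and-fire spiking neuron that processes binary spikes over time. It is only an illustrative sketch: the weights, threshold, and leak factor are assumed values, not parameters taken from this chapter.

```python
import numpy as np

def ann_neuron(x, w, b):
    """Classic ANN neuron: one multiply-accumulate pass over real-valued
    inputs, followed by a ReLU activation."""
    return max(0.0, float(np.dot(w, x) + b))

def lif_neuron(spike_train, w, v_th=1.0, leak=0.9):
    """Leaky integrate-and-fire (LIF) neuron: the membrane potential
    integrates weighted binary spikes, decays each time step, and the
    neuron fires (and resets) when the potential crosses the threshold."""
    v = 0.0
    out = []
    for spikes_t in spike_train:            # one binary input vector per time step
        v = leak * v + float(np.dot(w, spikes_t))
        if v >= v_th:
            out.append(1)
            v = 0.0                          # reset after firing
        else:
            out.append(0)
    return out

w = np.array([0.4, 0.3, 0.5])
print(ann_neuron(np.array([1.0, 0.0, 1.0]), w, b=0.1))            # single real-valued output
print(lif_neuron([np.array([1, 0, 1]), np.array([0, 1, 1])], w))  # output spike train [0, 1]
```

The contrast motivates the hardware split discussed below: the ANN neuron maps naturally onto dense multiply-accumulate arrays, while the LIF neuron's event-driven, stateful update favors asynchronous, spike-routed neuromorphic fabrics.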

This chapter gives a brief overview of state-of-the-art architectures and circuits for ML. On the one hand, neuromorphic computing architectures and accelerators are investigated, covering bio-inspired computational models and learning methods, microarchitecture, circuit-level design considerations, and prominent neuromorphic chips. On the other hand, architectures for ANNs are outlined, including the essential design metrics for ANN accelerators and various state-of-the-art ANN architectures and circuits.
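As a flavor of how such accelerator design metrics interact, the sketch below applies a roofline-style analysis, one common way to judge whether a workload on an accelerator is compute-bound or memory-bound. All numbers (peak throughput, bandwidth, operation and traffic counts) are illustrative assumptions, not figures from this chapter.

```python
def roofline_attainable(peak_ops_per_s, mem_bw_bytes_per_s, ops, bytes_moved):
    """Roofline model: attainable throughput is the smaller of the compute
    peak and what the memory system can feed (arithmetic intensity x bandwidth)."""
    intensity = ops / bytes_moved                      # ops per byte of off-chip traffic
    return min(peak_ops_per_s, intensity * mem_bw_bytes_per_s)

# Illustrative (assumed) numbers: a 4 TOPS accelerator with 25 GB/s of DRAM
# bandwidth, running a layer that performs 4e9 ops over 1e8 bytes of traffic.
attainable = roofline_attainable(4e12, 25e9, ops=4e9, bytes_moved=1e8)
print(f"arithmetic intensity: {4e9 / 1e8:.0f} ops/byte")
print(f"attainable throughput: {attainable / 1e12:.1f} TOPS (memory-bound)")
```

In this assumed configuration the layer reaches only 1 TOPS of the 4 TOPS peak, which is why dataflow and memory-hierarchy design figure so prominently in the accelerator architectures surveyed here.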


Author information

Corresponding author

Correspondence to Zheng Wang.


Copyright information

© 2022 Springer Nature Singapore Pte Ltd.

About this entry

Cite this entry

Yang, Y., Chen, C., Wang, Z. (2022). Architectures for Machine Learning. In: Chattopadhyay, A. (ed) Handbook of Computer Architecture. Springer, Singapore. https://doi.org/10.1007/978-981-15-6401-7_12-1

  • DOI: https://doi.org/10.1007/978-981-15-6401-7_12-1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6401-7

  • Online ISBN: 978-981-15-6401-7

  • eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering
