Abstract
The term “artificial intelligence” (AI) was coined in 1956, and its development has since alternated between periods of extreme hype and periods of deep disillusionment. Today, AI receives tremendous attention from both academia and industry, and it will remain one of the hottest topics for the foreseeable future. A subset of AI named machine learning (ML) has achieved great success across a wide variety of fields, such as computer vision, natural language processing, and computer gaming. ML was first proposed to endow machines with the ability to imitate the learning process of the human brain using neuromorphic models. However, modeling complexity and the limited computing capabilities of machines hindered the development of ML in its early days. Benefiting from ever-growing computing power and the availability of digital data, ML has adopted both bio-inspired spiking neural networks (SNNs), also known as neuromorphic computing, and practical artificial neural networks (ANNs), which have become two of its most prominent approaches, with outstanding results.
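The bio-inspired spiking model mentioned above can be made concrete with a minimal leaky integrate-and-fire (LIF) neuron, the simplest widely used spiking neuron model. This is only an illustrative sketch: the function name and all parameter values are assumptions for the example, not taken from any particular chip or model in the chapter.

```python
def simulate_lif(input_current, v_rest=0.0, v_thresh=1.0, leak=0.9, dt_gain=0.5):
    """Simulate a leaky integrate-and-fire neuron over discrete time steps.

    At each step the membrane potential leaks toward its resting value,
    integrates the input current, and emits a spike (then resets) when it
    crosses the threshold. Returns the time steps at which spikes occurred.
    """
    v = v_rest
    spikes = []
    for t, i_in in enumerate(input_current):
        v = v_rest + leak * (v - v_rest) + dt_gain * i_in  # leak + integrate
        if v >= v_thresh:                                  # fire
            spikes.append(t)
            v = v_rest                                     # reset
    return spikes

# A constant sub-threshold input eventually accumulates to a spike:
spike_times = simulate_lif([0.3] * 12)
```

Unlike an ANN neuron, which outputs a continuous activation each layer pass, the LIF neuron communicates only through sparse spike events, which is what event-driven neuromorphic hardware exploits.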
This chapter gives a brief overview of state-of-the-art architectures and circuits for ML. On the one hand, neuromorphic computing architectures and accelerators are investigated, including bio-inspired computational models and learning methods, microarchitecture, circuit-level design considerations, and prominent neuromorphic chips. On the other hand, architectures for ANNs are outlined, including essential design metrics for ANN accelerators and various state-of-the-art ANN architectures and circuits.
© 2022 Springer Nature Singapore Pte Ltd.
Yang, Y., Chen, C., Wang, Z. (2022). Architectures for Machine Learning. In: Chattopadhyay, A. (eds) Handbook of Computer Architecture. Springer, Singapore. https://doi.org/10.1007/978-981-15-6401-7_12-1