Abstract
The term “artificial intelligence” (AI) was coined in 1956, and its development has since alternated between periods of extreme hype and periods of deep disillusionment. Today, AI receives tremendous attention from both academia and industry, and it will remain one of the hottest topics for the foreseeable future. A subset of AI named machine learning (ML) has achieved great success across a wide variety of fields, such as computer vision, natural language processing, and computer gaming. ML was first proposed to endow machines with the ability to imitate the learning process of the human brain using neuromorphic models. However, modeling complexity and the limited computing capabilities of machines hindered the development of ML in its early days. Benefiting from ever-growing computing power and the availability of digital data, ML has adopted both bio-inspired spiking neural networks (SNNs), also known as neuromorphic computing, and practical artificial neural networks (ANNs), which have become two of its most prominent approaches, with outstanding results.
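The bio-inspired spiking model mentioned above can be made concrete with a minimal leaky integrate-and-fire (LIF) neuron, the simplest widely used spiking neuron model. This is only an illustrative sketch: the function name and all parameter values are assumptions for the example, not taken from any particular chip or model in the chapter.

```python
def simulate_lif(input_current, v_rest=0.0, v_thresh=1.0, leak=0.9, dt_gain=0.5):
    """Simulate a leaky integrate-and-fire neuron over discrete time steps.

    At each step the membrane potential leaks toward its resting value,
    integrates the input current, and emits a spike (then resets) when it
    crosses the threshold. Returns the time steps at which spikes occurred.
    """
    v = v_rest
    spikes = []
    for t, i_in in enumerate(input_current):
        v = v_rest + leak * (v - v_rest) + dt_gain * i_in  # leak + integrate
        if v >= v_thresh:                                  # fire
            spikes.append(t)
            v = v_rest                                     # reset
    return spikes

# A constant sub-threshold input eventually accumulates to a spike:
spike_times = simulate_lif([0.3] * 12)
```

Unlike an ANN neuron, which outputs a continuous activation each layer pass, the LIF neuron communicates only through sparse spike events, which is what event-driven neuromorphic hardware exploits.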
This chapter gives a brief overview of state-of-the-art architectures and circuits for ML. On the one hand, neuromorphic computing architectures and accelerators are investigated, including bio-inspired computational models and learning methods, microarchitecture, circuit-level design considerations, and prominent neuromorphic chips. On the other hand, architectures for ANNs are outlined, including essential design metrics for ANN accelerators and various state-of-the-art ANN architectures and circuits.
© 2022 Springer Nature Singapore Pte Ltd.
Yang, Y., Chen, C., Wang, Z. (2022). Architectures for Machine Learning. In: Chattopadhyay, A. (eds) Handbook of Computer Architecture. Springer, Singapore. https://doi.org/10.1007/978-981-15-6401-7_12-1