Abstract
Machine learning algorithms have been successfully deployed in cloud-centric applications and on computationally powerful digital platforms such as high-end FPGAs and GPUs. As the adoption of machine learning applications continues to rise, some of these algorithms must move “closer to the sensor,” thereby eliminating the latency of cloud access and providing a scalable solution that avoids the large energy cost per bit transmitted through the network. This chapter reviews state-of-the-art approaches and trends for low-energy machine learning inference and training at the edge. It covers dataflow, architecture, and circuit design aspects of fully digital processors and mixed analog/digital implementations targeting the microwatt-to-milliwatt power range.
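A recurring technique in the low-energy inference literature the chapter surveys is reduced-precision arithmetic: replacing floating-point multiply-accumulates with narrow integer ones cuts both compute and memory energy. As a purely illustrative sketch (not taken from the chapter; all names and values are this editor's assumptions), a symmetric 8-bit quantized dot product can be written as:

```python
# Minimal sketch of symmetric 8-bit quantization for a dot product,
# the kind of low-precision MAC arithmetic used by edge inference engines.
# Illustrative only; names and values are not from the chapter.

def quantize(values, num_bits=8):
    """Map floats to signed integers sharing one scale factor."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) for v in values], scale

def quantized_dot(w, x, num_bits=8):
    """Dot product computed entirely with integer MACs, rescaled once."""
    wq, ws = quantize(w, num_bits)
    xq, xs = quantize(x, num_bits)
    acc = sum(wi * xi for wi, xi in zip(wq, xq))  # integer accumulate
    return acc * ws * xs                          # single final rescale

w = [0.12, -0.53, 0.87, 0.05]                     # toy weights
x = [1.0, 0.25, -0.6, 0.9]                        # toy activations
exact = sum(wi * xi for wi, xi in zip(w, x))
approx = quantized_dot(w, x)
print(exact, approx)  # nearly identical, despite 8-bit operands
```

The point of the sketch is that the inner loop touches only small integers; the floating-point scales appear just once per output, which is what makes integer-only accelerator datapaths attractive at the edge.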
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this chapter
Verhelst, M., Murmann, B. (2020). Machine Learning at the Edge. In: Murmann, B., Hoefflinger, B. (eds) NANO-CHIPS 2030. The Frontiers Collection. Springer, Cham. https://doi.org/10.1007/978-3-030-18338-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18337-0
Online ISBN: 978-3-030-18338-7
eBook Packages: Physics and Astronomy