Abstract
Deep neural networks are often computationally expensive, during both training and inference. Training is inherently expensive, because back-propagation requires high-precision floating-point multiplication and addition; inference, however, admits various mathematical optimizations that reduce its computational cost. Optimized inference is important for reducing power consumption and latency and for increasing throughput. This chapter introduces the central approaches to optimizing deep neural network inference: pruning "unnecessary" weights, quantizing weights and inputs, sharing weights between layer units, compressing weights before transfer from main memory, distilling large high-performance models into smaller models, and decomposing convolutional filters to reduce multiply-and-accumulate operations. Using a unified notation, we provide a mathematical and algorithmic description of each of these optimization methods.
Notes
- 1. Also called input feature maps (ifmaps) and output feature maps (ofmaps) in the literature.
- 2. Note that we consider \(\mathcal {W}\) and \(\mathcal {I}\) to be flattened.
- 3. Multiplication by α is still necessary when using the weight-binarization technique of XNOR-Net.
- 4. Not to be confused with XNOR-Net [2]; here we are referring to the exclusive-NOR operation.
- 5. The Hamming weight of a vector is the number of 1s it contains.
- 6. The softmax layer sits at the output and has no trainable weights. It can therefore be replaced in the larger network by a softmax with a separate temperature, with no need for retraining.
- 7. First-order estimates of power costs can be calculated using Table 1.
- 8. Our calculations assume there is no pooling layer after convolution, as is now commonly the case.
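Notes 4 and 5 together describe the arithmetic trick behind binarized networks: when weights and activations are constrained to {−1, +1} and bit-packed, a dot product reduces to an exclusive-NOR followed by a Hamming-weight (popcount) computation. A minimal sketch, assuming the common encoding of −1 as bit 0 and +1 as bit 1 (function and variable names are illustrative):

```python
def binary_dot(a: int, b: int, n: int) -> int:
    """Dot product of two {-1, +1} vectors packed into n-bit integers.

    With -1 encoded as 0 and +1 as 1:
        dot(a, b) = 2 * popcount(XNOR(a, b)) - n
    since each agreeing position contributes +1 and each disagreeing
    position contributes -1 to the true dot product.
    """
    mask = (1 << n) - 1
    matches = ~(a ^ b) & mask            # XNOR: 1 wherever signs agree
    hamming_weight = bin(matches).count("1")  # number of agreeing positions
    return 2 * hamming_weight - n

# Example, with element i stored in bit i:
#   a = (+1, -1, +1, +1) -> 0b1101,  b = (+1, +1, -1, +1) -> 0b1011
#   true dot = 1 - 1 - 1 + 1 = 0
print(binary_dot(0b1101, 0b1011, 4))  # -> 0
```

This replaces n floating-point multiply-accumulates with one XNOR and one popcount per machine word, which is the source of the large inference speedups reported for XNOR-Net-style models [2].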
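Note 6 concerns the softened softmax used in knowledge distillation [11]: dividing the logits by a temperature T > 1 spreads probability mass onto the non-target classes, and because the softmax has no trainable weights, the temperature can be applied to an already-trained teacher without retraining. A minimal sketch (names are illustrative, not from the chapter):

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Softmax over logits / T; T > 1 yields a softer distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 2.0, 0.5]                  # teacher's output logits
hard = softmax_with_temperature(logits, T=1.0)  # sharp, near one-hot
soft = softmax_with_temperature(logits, T=4.0)  # soft targets for the student
```

Raising T flattens the distribution while preserving the class ranking, so the student sees the teacher's relative similarities between classes rather than a near one-hot target.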
References
1. W.J. Dally, High-performance hardware for machine learning, in Conference on Neural Information Processing Systems (2015)
2. M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi, XNOR-Net: ImageNet classification using binary convolutional neural networks, in European Conference on Computer Vision (2016)
3. S. Han, H. Mao, W.J. Dally, Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding, in International Conference on Learning Representations (2016)
4. S. Marcel, Y. Rodriguez, Torchvision: the machine-vision package of Torch, in International Conference on Multimedia (ACM, New York, 2010), pp. 1485–1488
5. A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. Emer, S.W. Keckler, W.J. Dally, SCNN: an accelerator for compressed-sparse convolutional neural networks, in International Symposium on Computer Architecture (ACM, New York, 2017), pp. 27–40
6. N.P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-l. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T.V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C.R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, D.H. Yoon, In-datacenter performance analysis of a tensor processing unit, in International Symposium on Computer Architecture (ACM, New York, 2017), pp. 1–12
7. M. Courbariaux, Y. Bengio, J.-P. David, BinaryConnect: training deep neural networks with binary weights during propagations, in Conference on Neural Information Processing Systems (2015)
8. G. Hinton, Neural networks for machine learning. https://www.coursera.org/learn/neural-networks (2012). Accessed 14 Mar 2018
9. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, Y. Bengio, Binarized neural networks, in Conference on Neural Information Processing Systems (2016), pp. 4107–4115
10. C. Buciluǒ, R. Caruana, A. Niculescu-Mizil, Model compression, in International Conference on Knowledge Discovery and Data Mining (ACM, New York, 2006), pp. 535–541
11. G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network (2015). Preprint. arXiv:1503.02531
12. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (2012), pp. 1097–1105
13. R. Rigamonti, A. Sironi, V. Lepetit, P. Fua, Learning separable filters, in Conference on Computer Vision and Pattern Recognition (IEEE, New York, 2013), pp. 2754–2761
14. M. Lin, Q. Chen, S. Yan, Network in network, in International Conference on Learning Representations (2014)
15. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in Conference on Computer Vision and Pattern Recognition (IEEE, New York, 2015)
16. D. Strukov, ECE 594BB neuromorphic engineering, University of California, Santa Barbara (March 2018)
Acknowledgements
Sandia National Laboratories is a multimission laboratory managed and operated by the National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the US Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Green, S., Vineyard, C.M., Koç, Ç.K. (2018). Mathematical Optimizations for Deep Learning. In: Koç, Ç.K. (ed.) Cyber-Physical Systems Security. Springer, Cham. https://doi.org/10.1007/978-3-319-98935-8_4
Print ISBN: 978-3-319-98934-1
Online ISBN: 978-3-319-98935-8