Abstract
Deep neural networks are often computationally expensive, during both training and inference. Training is inherently expensive, because back-propagation requires high-precision floating-point multiplication and addition; inference, however, admits various mathematical optimizations that reduce its computational cost. Optimized inference is important for reducing power consumption and latency and for increasing throughput. This chapter introduces the central approaches to optimizing deep neural network inference: pruning "unnecessary" weights, quantizing weights and inputs, sharing weights between layer units, compressing weights before transfer from main memory, distilling large high-performance models into smaller models, and decomposing convolutional filters to reduce multiply-and-accumulate operations. Using a unified notation, we provide a mathematical and algorithmic description of each of these optimization methods.
Notes
- 1. Also called input feature maps (ifmaps) and output feature maps (ofmaps) in the literature.
- 2. Note that we consider \(\mathcal {W}\) and \(\mathcal {I}\) to be flattened.
- 3. Multiplication by α is still necessary when using the weight-binarization technique of XNOR-Net.
- 4. Not to be confused with XNOR-Net [2]; here we are referring to the exclusive-NOR operation.
- 5. The Hamming weight of a vector is the number of 1s it contains.
- 6. The softmax layer sits at the output and has no trainable weights. It can therefore be replaced in the larger network by a softmax with a separate temperature, with no need for retraining.
- 7. First-order estimates of power costs can be calculated using Table 1.
- 8. Our calculations assume there is no pooling layer after convolution, as is now commonly the case.
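Notes 4 and 5 together describe the arithmetic trick behind binarized networks: when weights and activations are constrained to {−1, +1} and bit-packed, a dot product reduces to an exclusive-NOR followed by a Hamming-weight (popcount) computation. A minimal sketch, assuming the common encoding of −1 as bit 0 and +1 as bit 1 (function and variable names are illustrative):

```python
def binary_dot(a: int, b: int, n: int) -> int:
    """Dot product of two {-1, +1} vectors packed into n-bit integers.

    With -1 encoded as 0 and +1 as 1:
        dot(a, b) = 2 * popcount(XNOR(a, b)) - n
    since each agreeing position contributes +1 and each disagreeing
    position contributes -1 to the true dot product.
    """
    mask = (1 << n) - 1
    matches = ~(a ^ b) & mask            # XNOR: 1 wherever signs agree
    hamming_weight = bin(matches).count("1")  # number of agreeing positions
    return 2 * hamming_weight - n

# Example, with element i stored in bit i:
#   a = (+1, -1, +1, +1) -> 0b1101,  b = (+1, +1, -1, +1) -> 0b1011
#   true dot = 1 - 1 - 1 + 1 = 0
print(binary_dot(0b1101, 0b1011, 4))  # -> 0
```

This replaces n floating-point multiply-accumulates with one XNOR and one popcount per machine word, which is the source of the large inference speedups reported for XNOR-Net-style models [2].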
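Note 6 concerns the softened softmax used in knowledge distillation [11]: dividing the logits by a temperature T > 1 spreads probability mass onto the non-target classes, and because the softmax has no trainable weights, the temperature can be applied to an already-trained teacher without retraining. A minimal sketch (names are illustrative, not from the chapter):

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Softmax over logits / T; T > 1 yields a softer distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 2.0, 0.5]                  # teacher's output logits
hard = softmax_with_temperature(logits, T=1.0)  # sharp, near one-hot
soft = softmax_with_temperature(logits, T=4.0)  # soft targets for the student
```

Raising T flattens the distribution while preserving the class ranking, so the student sees the teacher's relative similarities between classes rather than a near one-hot target.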
References
1. W.J. Dally, High-performance hardware for machine learning, in Conference on Neural Information Processing Systems (2015)
2. M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi, XNOR-Net: ImageNet classification using binary convolutional neural networks, in European Conference on Computer Vision (2016)
3. S. Han, H. Mao, W.J. Dally, Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding, in International Conference on Learning Representations (2016)
4. S. Marcel, Y. Rodriguez, Torchvision: the machine-vision package of Torch, in International Conference on Multimedia (ACM, New York, 2010), pp. 1485–1488
5. A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. Emer, S.W. Keckler, W.J. Dally, SCNN: an accelerator for compressed-sparse convolutional neural networks, in International Symposium on Computer Architecture (ACM, New York, 2017), pp. 27–40
6. N.P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-l. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T.V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C.R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, D.H. Yoon, In-datacenter performance analysis of a tensor processing unit, in International Symposium on Computer Architecture (ACM, New York, 2017), pp. 1–12
7. M. Courbariaux, Y. Bengio, J.-P. David, BinaryConnect: training deep neural networks with binary weights during propagations, in Conference on Neural Information Processing Systems (2015)
8. G. Hinton, Neural networks for machine learning. https://www.coursera.org/learn/neural-networks (2012). Accessed 14 Mar 2018
9. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, Y. Bengio, Binarized neural networks, in Conference on Neural Information Processing Systems (2016), pp. 4107–4115
10. C. Buciluǒ, R. Caruana, A. Niculescu-Mizil, Model compression, in International Conference on Knowledge Discovery and Data Mining (ACM, New York, 2006), pp. 535–541
11. G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network (2015). Preprint. arXiv:1503.02531
12. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (2012), pp. 1097–1105
13. R. Rigamonti, A. Sironi, V. Lepetit, P. Fua, Learning separable filters, in Conference on Computer Vision and Pattern Recognition (IEEE, New York, 2013), pp. 2754–2761
14. M. Lin, Q. Chen, S. Yan, Network in network, in International Conference on Learning Representations (2014)
15. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in Conference on Computer Vision and Pattern Recognition (IEEE, New York, 2015)
16. D. Strukov, ECE 594BB neuromorphic engineering, University of California, Santa Barbara (March 2018)
Acknowledgements
Sandia National Laboratories is a multimission laboratory managed and operated by the National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the US Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Green, S., Vineyard, C.M., Koç, Ç.K. (2018). Mathematical Optimizations for Deep Learning. In: Koç, Ç.K. (ed.) Cyber-Physical Systems Security. Springer, Cham. https://doi.org/10.1007/978-3-319-98935-8_4
Print ISBN: 978-3-319-98934-1
Online ISBN: 978-3-319-98935-8