Cyber-Physical Systems Security, pp. 69–92

# Mathematical Optimizations for Deep Learning

## Abstract

Deep neural networks are often computationally expensive, during both training and inference. Training is inherently expensive because back-propagation requires high-precision floating-point multiplication and addition. However, various mathematical optimizations may be employed to reduce the computational cost of inference. Optimized inference is important for reducing power consumption and latency and for increasing throughput. This chapter introduces the central approaches for optimizing deep neural network inference: pruning "unnecessary" weights, quantizing weights and inputs, sharing weights between layer units, compressing weights before transferring them from main memory, distilling large, high-performance models into smaller models, and decomposing convolutional filters to reduce multiply-and-accumulate operations. Using a unified notation, we provide a mathematical and algorithmic description of each of these inference optimization methods.
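Two of the techniques named above, magnitude pruning and uniform quantization, can be illustrated in a few lines. The sketch below is not from the chapter; the function names, the 0.05 threshold, and the symmetric 8-bit scheme are illustrative assumptions, chosen only to show how zeroing small weights and mapping the rest to low-precision integers reduces inference cost:

```python
def prune(weights, threshold=0.05):
    # Magnitude pruning (illustrative): zero any weight whose
    # absolute value falls below the chosen threshold.
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize(weights, num_bits=8):
    # Uniform symmetric quantization (illustrative): map weights to
    # signed num_bits integers; the per-tensor scale restores magnitude.
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / (2 ** (num_bits - 1) - 1)
    q = [round(w / scale) for w in weights]
    return q, scale

weights = [0.31, -0.02, 0.17, 0.004, -0.45]
sparse = prune(weights)            # small weights become exact zeros
q, scale = quantize(sparse)        # integers plus one float scale
restored = [qi * scale for qi in q]  # dequantized approximation
```

In a real deployment the zeros would be skipped entirely by a sparse kernel, and the integer weights would feed low-precision multiply-and-accumulate units rather than being dequantized back to floats.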

## Notes

### Acknowledgements

Sandia National Laboratories is a multimission laboratory managed and operated by the National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the US Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
