Mathematical Optimizations for Deep Learning

  • Sam Green
  • Craig M. Vineyard
  • Çetin Kaya Koç


Deep neural networks are computationally expensive during both training and inference. Training is always expensive, because back-propagation requires high-precision floating-point multiplication and addition. Inference, however, can be made substantially cheaper through various mathematical optimizations, which reduce power consumption and latency and increase throughput. This chapter introduces the central approaches for optimizing deep neural network inference: pruning "unnecessary" weights, quantizing weights and inputs, sharing weights between layer units, compressing weights before transferring them from main memory, distilling large high-performance models into smaller models, and decomposing convolutional filters to reduce multiply-and-accumulate operations. Using a unified notation, the chapter provides a mathematical and algorithmic description of each of these inference optimization methods.
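To make two of these techniques concrete, the NumPy sketch below applies magnitude-based pruning followed by uniform 8-bit quantization to a random weight matrix. The layer shape, 50% sparsity target, and symmetric quantization scheme are illustrative assumptions for this sketch, not the specific algorithms developed in the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)  # hypothetical dense-layer weights

# Magnitude pruning: zero the weights with the smallest absolute values.
sparsity = 0.5
threshold = np.quantile(np.abs(W), sparsity)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0).astype(np.float32)

# Uniform symmetric 8-bit quantization of the surviving weights.
scale = np.abs(W_pruned).max() / 127.0
W_q = np.round(W_pruned / scale).astype(np.int8)  # integer weights stored on device
W_deq = W_q.astype(np.float32) * scale            # dequantized values used at inference

print("fraction of zero weights:", (W_pruned == 0).mean())
print("max quantization error:", np.abs(W_deq - W_pruned).max())
```

The rounding error of this scheme is bounded by half the quantization step (`scale / 2`), and pruned weights remain exactly zero after quantization, which is what makes sparse-storage formats effective downstream.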



Sandia National Laboratories is a multimission laboratory managed and operated by the National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the US Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Sam Green (1)
  • Craig M. Vineyard (2)
  • Çetin Kaya Koç (3, 4, 1)

  1. University of California, Santa Barbara, Santa Barbara, USA
  2. Sandia National Laboratories, Albuquerque, USA
  3. İstinye University, Istanbul, Turkey
  4. Nanjing University of Aeronautics and Astronautics, Jiangsu, China
