Mathematical Optimizations for Deep Learning

Abstract

Deep neural networks are often computationally expensive during both the training and inference stages. Training is always expensive because back-propagation requires high-precision floating-point multiplication and addition. However, various mathematical optimizations may be employed to reduce the computational cost of inference. Optimized inference is important for reducing power consumption and latency and for increasing throughput. This chapter introduces the central approaches for optimizing deep neural network inference: pruning “unnecessary” weights, quantizing weights and inputs, sharing weights between layer units, compressing weights before transfer from main memory, distilling large high-performance models into smaller models, and decomposing convolutional filters to reduce multiply-and-accumulate operations. Using a unified notation, we provide a mathematical and algorithmic description of each of these inference optimization methods.
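
As a concrete illustration of the first of these approaches, the following is a minimal sketch of magnitude-based pruning, one common way to identify “unnecessary” weights; the function name, sparsity level, and layer shape are illustrative assumptions rather than the chapter's notation.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the given fraction of weights with the smallest magnitudes.

    Weights whose absolute value falls below a data-dependent threshold are
    treated as "unnecessary" and set to zero, so the corresponding
    multiply-and-accumulate operations can be skipped at inference time.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) > threshold, weights, 0.0)

# Illustrative usage: prune 90% of a fully connected layer's weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))
W_pruned = prune_by_magnitude(W, sparsity=0.9)
print(f"Nonzero fraction after pruning: {np.count_nonzero(W_pruned) / W_pruned.size:.2f}")
```

In practice, pruning is typically followed by fine-tuning to recover accuracy, and the surviving weights are stored in a sparse format so that the skipped multiplications translate into real savings.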

Notes

  1. Also called input feature maps (ifmaps) and output feature maps (ofmaps) in the literature.

  2. Note that we consider \(\mathcal{W}\) and \(\mathcal{I}\) to be flattened.

  3. Multiplication by α is still necessary when using the weight binarization technique in XNOR-Net (see the binarization sketch following these notes).

  4. Not to be confused with XNOR-Net [2]. Here we are referring to the exclusive-NOR operation.

  5. Hamming weight is defined as the number of 1s in a vector.

  6. The softmax layer is at the output and has no trainable weights. It can therefore be replaced in the larger network by a softmax with a separate temperature, with no need for retraining (see the temperature-softmax sketch following these notes).

  7. First-order estimates of power costs can be calculated using Table 1.

  8. Our calculations assume there is no pooling layer after convolution, which is now commonly the case.
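
To make notes 3–5 concrete, the sketch below follows the commonly cited XNOR-Net-style formulation, in which a weight tensor is approximated as W ≈ αB with B = sign(W) and α the mean absolute weight, and a binary dot product is recovered from the Hamming weight of an exclusive-NOR; the bit packing, function names, and example values are illustrative assumptions, not code from the chapter.

```python
import numpy as np

def binarize_weights(W: np.ndarray):
    """XNOR-Net-style binarization: approximate W by alpha * B.

    B holds only +1/-1 entries and alpha is the mean absolute weight, so a
    single real multiplication by alpha remains per output (note 3).
    """
    alpha = np.mean(np.abs(W))
    B = np.where(W >= 0, 1, -1).astype(np.int8)
    return alpha, B

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two {+1, -1} vectors packed as n-bit integers.

    Encoding +1 as bit 1 and -1 as bit 0, matching positions are counted by
    the Hamming weight (note 5) of the exclusive-NOR (note 4):
        dot = 2 * popcount(xnor(a, b)) - n
    """
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # exclusive-NOR, masked to n bits
    return 2 * bin(xnor).count("1") - n

# Illustrative usage: (+1, -1, +1, +1) . (+1, +1, -1, +1) = 0
a = 0b1011  # +1, -1, +1, +1 (bit order chosen for the example)
b = 0b1101  # +1, +1, -1, +1
print(binary_dot(a, b, 4))  # -> 0
```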
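
Note 6 refers to the temperature-scaled softmax used when distilling a large model into a smaller one [11]; a minimal sketch follows, with the logits and temperature values chosen only for illustration.

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Softmax over logits divided by a temperature T.

    T = 1 recovers the ordinary softmax; T > 1 softens the distribution,
    exposing the relative probabilities of the non-target classes. Because
    softmax has no trainable weights (note 6), the larger model's output
    layer can be swapped for this one without retraining.
    """
    z = logits / T
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

logits = np.array([8.0, 2.0, 1.0])
print(softmax_with_temperature(logits, T=1.0))  # sharp: ~[0.997, 0.002, 0.001]
print(softmax_with_temperature(logits, T=4.0))  # soft targets for the student
```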

References

  1. W.J. Dally, High-performance hardware for machine learning, in Conference on Neural Information Processing Systems (2015)

  2. M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi, XNOR-Net: ImageNet classification using binary convolutional neural networks, in European Conference on Computer Vision (2016)

  3. S. Han, H. Mao, W.J. Dally, Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding, in International Conference on Learning Representations (2016)

  4. S. Marcel, Y. Rodriguez, Torchvision: the machine-vision package of Torch, in International Conference on Multimedia (ACM, New York, 2010), pp. 1485–1488

  5. A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. Emer, S.W. Keckler, W.J. Dally, SCNN: an accelerator for compressed-sparse convolutional neural networks, in International Symposium on Computer Architecture (ACM, New York, 2017), pp. 27–40

  6. N.P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-l. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T.V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C.R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, D.H. Yoon, In-datacenter performance analysis of a tensor processing unit, in International Symposium on Computer Architecture (ACM, New York, 2017), pp. 1–12

  7. M. Courbariaux, Y. Bengio, J.-P. David, BinaryConnect: training deep neural networks with binary weights during propagations, in Conference on Neural Information Processing Systems (2015)

  8. G. Hinton, Neural networks for machine learning. https://www.coursera.org/learn/neural-networks (2012). Accessed 14 Mar 2018

  9. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, Y. Bengio, Binarized neural networks, in Conference on Neural Information Processing Systems (2016), pp. 4107–4115

  10. C. Buciluǎ, R. Caruana, A. Niculescu-Mizil, Model compression, in International Conference on Knowledge Discovery and Data Mining (ACM, New York, 2006), pp. 535–541

  11. G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network (2015). Preprint, arXiv:1503.02531

  12. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (2012), pp. 1097–1105

  13. R. Rigamonti, A. Sironi, V. Lepetit, P. Fua, Learning separable filters, in Conference on Computer Vision and Pattern Recognition (IEEE, New York, 2013), pp. 2754–2761

  14. M. Lin, Q. Chen, S. Yan, Network in network, in International Conference on Learning Representations (2014)

  15. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in Conference on Computer Vision and Pattern Recognition (IEEE, New York, 2015)

  16. D. Strukov, ECE594BB Neuromorphic Engineering, University of California, Santa Barbara, March 2018

Acknowledgements

Sandia National Laboratories is a multimission laboratory managed and operated by the National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the US Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

Author information

Corresponding author

Correspondence to Sam Green.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Green, S., Vineyard, C.M., Koç, Ç.K. (2018). Mathematical Optimizations for Deep Learning. In: Koç, Ç.K. (eds) Cyber-Physical Systems Security. Springer, Cham. https://doi.org/10.1007/978-3-319-98935-8_4

  • DOI: https://doi.org/10.1007/978-3-319-98935-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98934-1

  • Online ISBN: 978-3-319-98935-8
