Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks

  • Julian Faraone
  • Nicholas Fraser
  • Giulio Gambardella
  • Michaela Blott
  • Philip H. W. Leong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10635)

Abstract

A low precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hardware implementation costs during training to achieve significant model compression for inference. Training involves three stages: network training using L2 regularization and a quantization threshold regularizer, quantization pruning, and finally retraining. The resulting networks achieve improved accuracy, reduced memory footprint, and reduced computational complexity compared with conventional methods on the MNIST and CIFAR-10 datasets. Our networks are up to 98% sparse and 5 and 11 times smaller than equivalent binary and ternary models respectively, translating to significant resource and speed benefits for hardware implementations.
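The abstract summarizes the training pipeline only at a high level. The following minimal NumPy sketch illustrates the core idea of threshold-based ternarization and how it yields sparsity; the threshold value and the form of the sparsity-inducing penalty (`ternary_regularizer`) are illustrative assumptions, not the regularizer defined in the paper.

```python
import numpy as np

def ternarize(w, threshold):
    """Quantize full-precision weights to {-1, 0, +1}.
    Weights whose magnitude falls below `threshold` are pruned to zero,
    which is where the sparsity reported in the abstract comes from."""
    q = np.sign(w)
    q[np.abs(w) < threshold] = 0.0
    return q

def sparsity(q):
    """Fraction of zero-valued entries in a ternarized tensor."""
    return float(np.mean(q == 0.0))

def ternary_regularizer(w):
    """Illustrative sparsity-inducing penalty (an assumed form, not the
    paper's exact regularizer): pulls each weight towards its nearest
    ternary level in {-1, 0, +1}, reducing post-quantization error."""
    dist_to_zero = np.abs(w)
    dist_to_unit = np.abs(np.abs(w) - 1.0)
    return float(np.sum(np.minimum(dist_to_zero, dist_to_unit) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.3, size=(256, 256))  # stand-in weight matrix
    threshold = 0.05                            # assumed pruning threshold
    q = ternarize(w, threshold)
    print(f"sparsity after ternarization: {sparsity(q):.2%}")
    print(f"regularizer value: {ternary_regularizer(w):.2f}")
```

In a full training loop, such a penalty would be added to the task loss alongside L2 regularization, with pruning and retraining applied afterwards as the three-stage procedure above describes.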

Keywords

Deep Neural Networks · Ternary Neural Network · Low-precision · Pruning · Sparsity · Compression

Acknowledgements

This research was partly supported under the Australian Research Council's Linkage Projects funding scheme (project number LP130101034) and by Zomojo Pty Ltd.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Julian Faraone (1, 2)
  • Nicholas Fraser (1, 2)
  • Giulio Gambardella (2)
  • Michaela Blott (2)
  • Philip H. W. Leong (1)
  1. School of Electrical and Information Engineering, The University of Sydney, Sydney, Australia
  2. Xilinx Research Labs, Dublin, Ireland
