
Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

  • Regular Paper
  • Published in Journal of Computer Science and Technology

Abstract

Quantized neural networks (QNNs), which use low-bitwidth numbers to represent parameters and perform computations, have been proposed to reduce computational complexity, storage size, and memory usage. In QNNs, parameters and activations are uniformly quantized so that multiplications and additions can be accelerated by bitwise operations. However, the distributions of parameters in neural networks are often imbalanced, and uniform quantization determined from extremal values may underutilize the available bitwidth. In this paper, we propose a novel quantization method that ensures a balanced distribution of quantized values. Our method first recursively partitions the parameters by percentiles into balanced bins and then applies uniform quantization. We also introduce computationally cheaper approximations of percentiles to reduce the overhead of this partitioning. Overall, our method improves the prediction accuracy of QNNs without introducing extra computation during inference, has negligible impact on training speed, and is applicable to both convolutional and recurrent neural networks. Experiments on standard datasets, including ImageNet and Penn Treebank, confirm the effectiveness of our method. On ImageNet, the top-5 error rate of our 4-bit quantized GoogLeNet model is 12.7%, surpassing the state of the art for QNNs.
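The percentile-based partitioning described in the abstract is simple enough to sketch in a few lines of NumPy. The snippet below is a minimal illustration under stated assumptions, not the authors' released implementation: the helper name balanced_quantize, the use of exact np.percentile calls (the paper advocates cheaper approximations of percentiles), and the mapping of bins to midpoint levels in [0, 1] are choices made here purely for exposition.

```python
import numpy as np

def balanced_quantize(x, k):
    """Sketch of balanced k-bit quantization (hypothetical helper).

    1. Split the values into 2**k equally populated bins using percentile
       boundaries (computed exactly here for clarity; the paper proposes
       cheaper approximations to reduce this overhead).
    2. Map each bin to one of 2**k uniformly spaced levels in [0, 1].
    """
    n_bins = 2 ** k
    flat = x.ravel()
    # Interior percentile boundaries (e.g., 25/50/75 for k = 2) give bins
    # that each hold the same fraction of the values.
    edges = np.percentile(flat, np.linspace(0, 100, n_bins + 1)[1:-1])
    # Index of the bin each value falls into (0 .. n_bins - 1).
    bin_idx = np.digitize(flat, edges)
    # Uniformly spaced quantized levels at the bin midpoints in [0, 1].
    levels = (np.arange(n_bins) + 0.5) / n_bins
    return levels[bin_idx].reshape(x.shape)

# Example: even a heavily peaked weight distribution occupies all levels.
w = np.random.randn(1024) ** 3            # imbalanced distribution
wq = balanced_quantize(w, k=2)            # 4 quantization levels
print(np.unique(wq, return_counts=True))  # roughly 256 values per level
```

Because every bin holds the same fraction of the values, all 2^k levels are used equally often even when the original parameter distribution is strongly concentrated around zero, which is the underutilization problem that extremal-value uniform quantization suffers from.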



Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Corresponding author

Correspondence to Shu-Chang Zhou.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1 (PDF 1258 kb)


Cite this article

Zhou, SC., Wang, YZ., Wen, H. et al. Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks. J. Comput. Sci. Technol. 32, 667–682 (2017). https://doi.org/10.1007/s11390-017-1750-y

