
Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

  • Regular Paper
  • Published in Journal of Computer Science and Technology

Abstract

Quantized neural networks (QNNs), which use low-bitwidth numbers to represent parameters and perform computations, have been proposed to reduce computational complexity, storage size, and memory usage. In QNNs, parameters and activations are uniformly quantized so that multiplications and additions can be accelerated by bitwise operations. However, the distributions of parameters in neural networks are often imbalanced, and uniform quantization determined from extremal values may underutilize the available bitwidth. In this paper, we propose a novel quantization method that ensures a balanced distribution of quantized values. Our method first recursively partitions the parameters by percentiles into balanced bins and then applies uniform quantization. We also introduce computationally cheaper approximations of percentiles to reduce the overhead of this partitioning. Overall, our method improves the prediction accuracy of QNNs without introducing extra computation during inference, has negligible impact on training speed, and is applicable to both convolutional and recurrent neural networks. Experiments on standard datasets, including ImageNet and Penn Treebank, confirm the effectiveness of our method. On ImageNet, the top-5 error rate of our 4-bit quantized GoogLeNet model is 12.7%, surpassing the state of the art for QNNs.
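The percentile-based partitioning described in the abstract is simple enough to sketch in a few lines of NumPy. The snippet below is a minimal illustration under stated assumptions, not the authors' released implementation: the helper name balanced_quantize, the use of exact np.percentile calls (the paper advocates cheaper approximations of percentiles), and the mapping of bins to midpoint levels in [0, 1] are choices made here purely for exposition.

```python
import numpy as np

def balanced_quantize(x, k):
    """Sketch of balanced k-bit quantization (hypothetical helper).

    1. Split the values into 2**k equally populated bins using percentile
       boundaries (computed exactly here for clarity; the paper proposes
       cheaper approximations to reduce this overhead).
    2. Map each bin to one of 2**k uniformly spaced levels in [0, 1].
    """
    n_bins = 2 ** k
    flat = x.ravel()
    # Interior percentile boundaries (e.g., 25/50/75 for k = 2) give bins
    # that each hold the same fraction of the values.
    edges = np.percentile(flat, np.linspace(0, 100, n_bins + 1)[1:-1])
    # Index of the bin each value falls into (0 .. n_bins - 1).
    bin_idx = np.digitize(flat, edges)
    # Uniformly spaced quantized levels at the bin midpoints in [0, 1].
    levels = (np.arange(n_bins) + 0.5) / n_bins
    return levels[bin_idx].reshape(x.shape)

# Example: even a heavily peaked weight distribution occupies all levels.
w = np.random.randn(1024) ** 3            # imbalanced distribution
wq = balanced_quantize(w, k=2)            # 4 quantization levels
print(np.unique(wq, return_counts=True))  # roughly 256 values per level
```

Because every bin holds the same fraction of the values, all 2^k levels are used equally often even when the original parameter distribution is strongly concentrated around zero, which is the underutilization problem that extremal-value uniform quantization suffers from.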



Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Corresponding author

Correspondence to Shu-Chang Zhou.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1 (PDF 1258 kb)


Cite this article

Zhou, SC., Wang, YZ., Wen, H. et al. Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks. J. Comput. Sci. Technol. 32, 667–682 (2017). https://doi.org/10.1007/s11390-017-1750-y

