Three-Means Ternary Quantization

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10635)

Abstract

Deep Convolutional Neural Networks (DCNNs) have achieved state-of-the-art results in a wide range of tasks, especially image recognition and object detection. However, their millions of parameters make them difficult to deploy on embedded devices with limited storage and computational capability. In this paper, we propose a new method called Three-Means Ternary Quantization (TMTQ), which quantizes the weights to ternary values \( \{-\alpha_{1}, 0, +\alpha_{2}\} \) during the forward and backward propagations. Scaling factors \( \{\alpha_{1}, \alpha_{2}\} \) are used to reduce the quantization loss. We evaluate this method on the MNIST, CIFAR-10, and ImageNet datasets with different network architectures. The results show that the ternary models obtained from TMTQ perform only slightly worse than full-precision models and better than recently proposed binary and ternary models. Meanwhile, TMTQ achieves up to about a 16\( \times \) model compression rate compared with the 32-bit full-precision counterparts, since inference uses only ternary weights (2 bits) and fixed scaling factors.
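The quantization described above maps each real-valued weight to one of \( \{-\alpha_{1}, 0, +\alpha_{2}\} \). As a minimal illustrative sketch (not the paper's exact three-means procedure, whose details are not given in this abstract), one can zero out small-magnitude weights with a threshold and set each scaling factor to the mean magnitude of the remaining weights of that sign; the threshold ratio `delta_ratio` below is an assumed hyperparameter:

```python
import numpy as np

def ternary_quantize(w, delta_ratio=0.7):
    """Quantize a weight array to ternary values {-a1, 0, +a2}.

    Weights with |w| <= delta are set to 0; positive weights above the
    threshold are replaced by their mean a2, negative ones by -a1, where
    a1 is the mean magnitude of the retained negative weights. This is a
    threshold-based sketch, not TMTQ's actual clustering step.
    """
    delta = delta_ratio * np.mean(np.abs(w))  # zero-region threshold
    pos = w > delta
    neg = w < -delta
    # Scaling factors: mean magnitude of each nonzero cluster.
    a2 = float(np.mean(w[pos])) if pos.any() else 0.0
    a1 = float(np.mean(np.abs(w[neg]))) if neg.any() else 0.0
    wq = np.zeros_like(w)
    wq[pos] = a2
    wq[neg] = -a1
    return wq, a1, a2

wq, a1, a2 = ternary_quantize(np.array([-1.0, -0.8, 0.1, 0.9, 1.1]))
```

During training, such a quantizer would be applied in the forward and backward passes while full-precision weights are kept for the update step; at inference only the 2-bit codes and the two fixed scaling factors per layer need to be stored, which is the source of the roughly 16× compression versus 32-bit weights.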

Keywords

Deep learning · Model compression · Neural network quantization · Ternary neural network


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Suzhou Institute for Advanced Study, University of Science and Technology of China, Suzhou, China
  2. Department of Computer Science and Technology, University of Science and Technology of China, Suzhou, China
