Post-training Piecewise Linear Quantization for Deep Neural Networks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)

Abstract

Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices. Post-training quantization is highly desirable since it does not require retraining or access to the full training dataset. The well-established uniform scheme for post-training quantization achieves satisfactory results by converting neural networks from full-precision to 8-bit fixed-point integers. However, it suffers from significant performance degradation when quantizing to lower bit-widths. In this paper, we propose a piecewise linear quantization (PWLQ) scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails. Our approach breaks the entire quantization range into non-overlapping regions for each tensor, with each region being assigned an equal number of quantization levels. Optimal breakpoints that divide the entire range are found by minimizing the quantization error. Compared to state-of-the-art post-training quantization methods, experimental results show that our proposed method achieves superior performance on image classification, semantic segmentation, and object detection with minor overhead. Code will be made available at https://github.com/jun-fang/PWLQ.
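The idea of splitting the range at a breakpoint and giving each region the same number of levels can be illustrated with a small sketch. This is not the authors' implementation: it assumes a symmetric range [-m, m] with a single breakpoint p, and it swaps in a plain grid search over candidate breakpoints that minimizes the empirical mean squared error. The function names (`uniform_quantize`, `pwlq_quantize`, `find_breakpoint`) and the parameter `candidates` are hypothetical and chosen only for illustration.

```python
import random


def uniform_quantize(x, lo, hi, levels):
    """Clamp x to [lo, hi] and round it to the nearest of `levels` evenly spaced points."""
    if hi == lo:
        return lo
    x = min(max(x, lo), hi)
    step = (hi - lo) / (levels - 1)
    return lo + round((x - lo) / step) * step


def pwlq_quantize(x, p, m, levels):
    """Quantize x on [-m, m] with one breakpoint p, where 0 < p < m.

    Assumption of this sketch: the dense centre [-p, p] and the tails
    [-m, -p) U (p, m] each receive `levels` points, with the tail points
    split evenly between the negative and the positive side.
    """
    if abs(x) <= p:
        return uniform_quantize(x, -p, p, levels)
    sign = 1.0 if x > 0 else -1.0
    return sign * uniform_quantize(abs(x), p, m, levels // 2)


def mse(values, quantizer):
    """Mean squared quantization error of `quantizer` over `values`."""
    return sum((v - quantizer(v)) ** 2 for v in values) / len(values)


def find_breakpoint(values, levels, candidates=64):
    """Grid-search the breakpoint p in (0, m) that minimizes the empirical MSE."""
    m = max(abs(v) for v in values)
    best_p, best_err = None, float("inf")
    for i in range(1, candidates):
        p = m * i / candidates
        err = mse(values, lambda v: pwlq_quantize(v, p, m, levels))
        if err < best_err:
            best_p, best_err = p, err
    return best_p, best_err


# Usage: bell-shaped synthetic "weights" with 16 levels per region.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(2000)]
p_star, err_star = find_breakpoint(weights, levels=16)
print(f"breakpoint: {p_star:.3f}, MSE: {err_star:.6f}")
```

Because most values of a bell-shaped tensor fall near zero, a breakpoint well inside the range gives the dense centre a finer step than a single uniform grid over [-m, m] would, while the tails keep coarser but still bounded resolution.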

Keywords

Deep neural networks · Post-training quantization · Piecewise linear quantization

Supplementary material

Supplementary material 1 (PDF, 1165 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Samsung Semiconductor, Inc., San Jose, USA
  2. Microsoft, Redmond, USA
