
PROFIT: A Novel Training Method for sub-4-bit MobileNet Models

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12351)

Abstract

4-bit and lower-precision mobile models are required due to the ever-increasing demand for better energy efficiency in mobile devices. In this work, we report that the activation instability induced by weight quantization (AIWQ) is the key obstacle to sub-4-bit quantization of mobile networks. To alleviate the AIWQ problem, we propose a novel training method called PROgressive-Freezing Iterative Training (PROFIT), which progressively freezes the layers whose weights are most affected by the instability problem. We also propose a differentiable and unified quantization method (DuQ) and a negative padding idea to support asymmetric activation functions such as h-swish. We evaluate the proposed methods by quantizing MobileNet-v1, v2, and v3 on ImageNet and report that 4-bit quantization achieves accuracy comparable to the full-precision baseline (within 1.48% top-1 accuracy). In an ablation study on 3-bit quantization of MobileNet-v3, the proposed method outperforms the state-of-the-art method by a large margin of 12.86% top-1 accuracy. The quantized models and source code are available at https://github.com/EunhyeokPark/PROFIT.
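The abstract describes PROFIT at a high level: estimate which layers suffer most from activation instability induced by weight quantization (AIWQ), freeze those layers first, and iteratively fine-tune the remainder. The PyTorch-style sketch below illustrates that idea only; the output-shift proxy in instability_scores, the uniform fake_quantize stand-in for DuQ, the freeze_ratio schedule, and the profit_step/train_one_epoch names are all illustrative assumptions, not the authors' implementation (which, together with DuQ and negative padding, is in the repository linked above).

import torch
import torch.nn as nn

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Uniform symmetric fake-quantization (a simple stand-in for the paper's DuQ)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

@torch.no_grad()
def instability_scores(model: nn.Module, calib_x: torch.Tensor, bits: int = 4):
    """Per-layer proxy for AIWQ: record each conv layer's input with a forward hook,
    then measure how far its output moves when only its weights are quantized."""
    inputs, hooks = {}, []
    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    for m in convs:
        hooks.append(m.register_forward_hook(
            lambda mod, inp, out: inputs.__setitem__(mod, inp[0])))
    model(calib_x)                               # one calibration pass records layer inputs
    for h in hooks:
        h.remove()
    scores = {}
    for m in convs:
        w_fp = m.weight.data.clone()
        y_fp = m(inputs[m])                      # output with full-precision weights
        m.weight.data = fake_quantize(w_fp, bits)
        y_q = m(inputs[m])                       # output with quantized weights
        m.weight.data = w_fp                     # restore full-precision weights
        scores[m] = (y_fp - y_q).pow(2).mean().item()
    return scores

def profit_step(model, calib_x, optimizer, train_one_epoch, freeze_ratio=0.25):
    """One progressive-freezing iteration: freeze the most unstable still-trainable
    layers, then fine-tune the remaining layers."""
    scores = instability_scores(model, calib_x)
    trainable = [m for m in scores if m.weight.requires_grad]
    trainable.sort(key=lambda m: scores[m], reverse=True)      # most affected first
    for m in trainable[: max(1, int(len(trainable) * freeze_ratio))]:
        m.weight.requires_grad_(False)                          # freeze this layer's weights
    train_one_epoch(model, optimizer)

In this sketch, repeatedly calling profit_step shrinks the set of trainable layers each iteration, so the layers that destabilize the others' activation statistics the most stop moving first while the rest continue to adapt.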

Keywords

Mobile network · Quantization · Activation distribution · h-swish activation

Notes

Acknowledgement

This work was supported by Samsung Electronics and by National Research Foundation of Korea grants NRF-2016M3A7B4909604 and NRF-2016M3C4A7952587, funded by the Ministry of Science, ICT & Future Planning (PE Class Heterogeneous High Performance Computer Development). We appreciate valuable comments from Dr. Andrew G. Howard and Dr. Jaeyoun Kim at Google.

Supplementary material

Supplementary material 1 (PDF, 726 KB): 504443_1_En_26_MOESM1_ESM.pdf


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Inter-university Semiconductor Research Center (ISRC), Seoul National University, Seoul, Korea
  2. Department of Computer Science and Engineering, Neural Processing Research Center (NPRC), Seoul National University, Seoul, Korea
