Search What You Want: Barrier Penalty NAS for Mixed Precision Quantization

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12354)


Emerging hardware can support inference with mixed precision CNN models that assign different bitwidths to different layers. Learning to find an optimal mixed precision model that preserves accuracy and satisfies specific constraints on model size and computation is extremely challenging, owing to the difficulty of training a mixed precision model and the huge space of all possible bit quantizations.

In this paper, we propose a novel soft Barrier Penalty based NAS (BP-NAS) for mixed precision quantization, which ensures that all searched models lie inside the valid domain defined by the complexity constraint, and can therefore return an optimal model under the given constraint with a single search. The proposed soft Barrier Penalty is differentiable; it imposes very large losses on models outside the valid domain and almost no penalty on models inside it, thus restricting the search to the feasible domain. In addition, a differentiable Prob-1 regularizer is proposed to ensure learning with NAS is reasonable. A distribution reshaping training strategy is also used to make training more stable. BP-NAS sets a new state of the art on both classification (CIFAR-10, ImageNet) and detection (COCO), surpassing all the efficient mixed precision methods designed manually and automatically. In particular, BP-NAS achieves higher mAP (up to 2.7% mAP improvement) together with lower bit computation cost than the existing best mixed precision model on COCO detection.
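To make the idea of a soft barrier concrete, the following is a minimal sketch and not the formulation used in the paper, whose exact loss is not given in this abstract. It assumes a relaxed log-barrier in the style of interior point methods: a logarithmic penalty on the normalized slack inside the budget, extended linearly (with matching value and slope) once the slack drops below a small threshold. The names `soft_barrier_penalty`, `prob1_regularizer`, `mu`, and `delta` are hypothetical, and the product form used for the Prob-1 style regularizer is likewise an illustrative assumption, chosen because it vanishes exactly when one probability equals 1.

```python
import math


def soft_barrier_penalty(cost, budget, mu=0.01, delta=1e-3):
    """Relaxed log-barrier on a complexity budget (illustrative assumption).

    cost   : non-negative complexity of a candidate model (e.g. bit operations)
    budget : maximum allowed complexity, must be positive
    mu     : scale of the penalty inside the feasible region
    delta  : normalized slack below which the barrier switches to a linear branch
    """
    slack = (budget - cost) / budget  # 1 at zero cost, 0 on the boundary, < 0 outside
    if slack >= delta:
        # Classic log barrier: close to zero while the model is well inside the budget.
        return -mu * math.log(slack)
    # Linear extension with the same value and slope at slack == delta, so the
    # penalty stays finite and differentiable but grows steeply outside the budget.
    return -mu * math.log(delta) + mu * (delta - slack) / delta


def prob1_regularizer(probs):
    """Product of (1 - p) over a probability vector (assumed form).

    The value is 0 exactly when some entry equals 1, i.e. when the choice among
    candidate bitwidths has become one-hot.
    """
    return math.prod(1.0 - p for p in probs)


if __name__ == "__main__":
    budget = 100.0
    for cost in (10.0, 50.0, 99.0, 110.0, 150.0):
        print(cost, soft_barrier_penalty(cost, budget))
    print(prob1_regularizer([1.0, 0.0, 0.0]))
    print(prob1_regularizer([1 / 3, 1 / 3, 1 / 3]))
```

With these assumed defaults the penalty stays well below 0.1 for models inside the budget and rises to several units once the budget is exceeded by a wide margin, which mirrors the qualitative behaviour described above. In a differentiable search, both terms would be added to the task loss and evaluated on the expected cost and the architecture probabilities.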


Keywords: Mixed precision quantization, NAS, Optimization problem with constraint, Soft barrier penalty



We thank Ligeng Zhu for the supportive feedback, and Kuntao Xiao for the meaningful discussion on solving constrained optimization problems. This work is supported by the National Natural Science Foundation of China (61876180), the Beijing Natural Science Foundation (4202073), and the Young Elite Scientists Sponsorship Program by CAST (2018QNRC001).




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. SenseTime Research, Beijing, China
  2. Peking University, Beijing, China
  3. University of Science and Technology Beijing, Beijing, China
