Deep Transferring Quantization

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)

Abstract

Network quantization is an effective method for network compression. Existing methods train a low-precision network by fine-tuning from a pre-trained model. However, training a low-precision network often requires large-scale labeled data to achieve superior performance. In many real-world scenarios, only limited labeled data are available due to expensive labeling costs or privacy constraints. With limited training data, fine-tuning methods may suffer from overfitting and substantial accuracy loss. To alleviate these issues, we introduce transfer learning into network quantization to obtain an accurate low-precision model. Specifically, we propose a method named deep transferring quantization (DTQ) to effectively exploit the knowledge in a pre-trained full-precision model. To this end, we propose a learnable attentive transfer module to identify the informative channels for alignment. In addition, we introduce the Kullback-Leibler (KL) divergence to further facilitate the training of the low-precision model. Extensive experiments on both image classification and face recognition demonstrate the effectiveness of DTQ.
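The abstract describes two auxiliary signals for training the low-precision network: a learnable attentive transfer term that weights channels when aligning features with the full-precision model, and a KL-divergence distillation term. The following is a minimal PyTorch sketch of how such losses could be realized; the names AttentiveTransfer and kl_distillation_loss, the attention parameterization, and the weights alpha/beta are illustrative assumptions based only on this abstract, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveTransfer(nn.Module):
    """Channel-attentive feature alignment (hypothetical sketch).

    Aligns features of a low-precision student with those of a
    pre-trained full-precision teacher, weighting channels by a
    learnable attention vector so informative channels dominate.
    """

    def __init__(self, num_channels: int):
        super().__init__()
        # Learnable per-channel logits; softmax turns them into weights.
        self.logits = nn.Parameter(torch.zeros(num_channels))

    def forward(self, feat_q: torch.Tensor, feat_fp: torch.Tensor) -> torch.Tensor:
        # feat_q, feat_fp: (N, C, H, W) feature maps at the same layer of the
        # quantized and full-precision networks; the teacher is not updated.
        attn = torch.softmax(self.logits, dim=0).view(1, -1, 1, 1)
        return (attn * (feat_q - feat_fp.detach()).pow(2)).mean()


def kl_distillation_loss(student_logits, teacher_logits, T: float = 4.0):
    """Standard softened-KL distillation term (Hinton et al.)."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits.detach() / T, dim=1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)


# Hypothetical combined objective for one training step:
#   loss = task_loss + alpha * transfer_loss + beta * kl_loss
```

In a setup like this, the attention logits would be trained jointly with the quantized network, so the alignment can concentrate on channels where mimicking the full-precision model helps most, which is useful when labeled data are too scarce for plain fine-tuning.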

Keywords

Quantization · Deep transfer · Knowledge distillation

Notes

Acknowledgements

This work was partially supported by the Key-Area Research and Development Program of Guangdong Province (2019B010155002), the National Natural Science Foundation of China (NSFC) key project (61836003), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2017ZT07X183), and the Fundamental Research Funds for the Central Universities (D2191240).

Supplementary material

Supplementary material 1: 504445_1_En_37_MOESM1_ESM.pdf (PDF, 1.7 MB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. South China University of Technology, Guangzhou, China
  2. PengCheng Laboratory, Shenzhen, China
  3. HuNan Gmax Intelligent Technology, Changsha, China