
Positive-Unlabeled Learning for Knowledge Distillation

Neural Processing Letters

Abstract

Convolutional neural networks (CNNs) have greatly promoted the development of artificial intelligence. In general, high-performance CNNs are over-parameterized and require massive computation to process and predict data, which prevents them from being deployed on existing resource-limited intelligent devices. In this paper, we propose an efficient model compression framework based on knowledge distillation that trains a compact student network under the guidance of a large teacher network. Our key idea is to introduce a positive-unlabeled (PU) classifier that encourages the compressed student network to learn the features of the prominent teacher network as closely as possible. During training, the PU classifier discriminates the features of the teacher network as high-quality and those of the student network as low-quality. Simultaneously, the student network learns knowledge from the teacher network through soft targets and attention features. Extensive experimental evaluations on four benchmark image classification datasets show that our method outperforms prior works by a large margin at the same parameter and computation cost. When VGGNet19 is selected as the teacher network on the CIFAR datasets, the student network VGGNet13 achieves 94.47% and 75.73% accuracy on CIFAR-10 and CIFAR-100, improvements of 1.02% and 2.44%, respectively.
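To make the training signal concrete, the following is a minimal PyTorch-style sketch of the three loss terms described above: a non-negative PU risk (in the style of Kiryo et al.) for a classifier that separates teacher features (positive) from student features (unlabeled), a soft-target distillation loss, and an attention-transfer loss. All function names, the class prior, the temperature, and the reduction choices are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: names and hyper-parameters are assumptions,
# not the paper's released code.

def nn_pu_loss(d_teacher, d_student, prior=0.5):
    """Non-negative PU risk for a feature classifier D.

    d_teacher: raw scores of D on teacher features (treated as positive).
    d_student: raw scores of D on student features (treated as unlabeled).
    prior: assumed class prior of positives among the unlabeled samples.
    """
    risk_pos = F.softplus(-d_teacher).mean()        # positives labelled positive
    risk_pos_as_neg = F.softplus(d_teacher).mean()  # positives labelled negative
    risk_unl_as_neg = F.softplus(d_student).mean()  # unlabeled labelled negative
    neg_risk = risk_unl_as_neg - prior * risk_pos_as_neg
    # Clamp the negative-class risk at zero, as in the nnPU estimator.
    return prior * risk_pos + torch.clamp(neg_risk, min=0.0)

def kd_soft_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss with temperature T."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def attention_loss(f_s, f_t):
    """Attention-transfer loss for one student/teacher feature-map pair."""
    a_s = F.normalize(f_s.pow(2).mean(dim=1).flatten(1), dim=1)
    a_t = F.normalize(f_t.pow(2).mean(dim=1).flatten(1), dim=1)
    return (a_s - a_t).pow(2).mean()
```

In a full training loop one would alternate between updating the PU classifier with nn_pu_loss and updating the student with a weighted sum of the cross-entropy loss, kd_soft_loss, attention_loss, and an adversarial term that rewards the student when its features are scored as high-quality; the specific weighting used in the paper is not reproduced here.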



Acknowledgements

This research is supported by the Sichuan Science and Technology Program under Grant No. 2022YFG0324 and the SWUST Doctoral Foundation under Grant 19zx7102.

Author information


Corresponding author

Correspondence to Ning Jiang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jiang, N., Tang, J. & Yu, W. Positive-Unlabeled Learning for Knowledge Distillation. Neural Process Lett 55, 2613–2631 (2023). https://doi.org/10.1007/s11063-022-11038-7

