Abstract
Convolutional neural networks (CNNs) have greatly advanced artificial intelligence. In general, high-performance CNNs are over-parameterized and require massive computation to process and predict data, which prevents them from being deployed on existing resource-limited intelligent devices. In this paper, we propose an efficient model compression framework based on knowledge distillation that trains a compact student network under the guidance of a large teacher network. Our key idea is to introduce a positive-unlabeled (PU) classifier that encourages the compressed student network to learn the features of the prominent teacher network as closely as possible. During training, the PU classifier learns to discriminate the teacher network's features as high-quality and the student network's features as low-quality. Simultaneously, the student network learns knowledge from the teacher network through soft targets and attention features. Extensive experiments on four benchmark image classification datasets show that our method outperforms prior works by a large margin at the same parameter and computation cost. With VGGNet19 as the teacher network on the CIFAR datasets, the student network VGGNet13 achieves 94.47% and 75.73% accuracy on CIFAR-10 and CIFAR-100, improvements of 1.02% and 2.44%, respectively.
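To make the training objective concrete, the following PyTorch-style sketch combines the three ingredients named in the abstract: the soft-target distillation loss, the attention-feature loss, and a non-negative PU risk on features (in the spirit of Kiryo et al.'s 2017 estimator), where the teacher's features serve as positive examples and the student's as unlabeled. All function names, the temperature T, the class prior pi, and the loss weights are illustrative assumptions, not the authors' exact implementation.

    # Minimal sketch of the three losses; hyperparameters are assumptions.
    import torch
    import torch.nn.functional as F

    def soft_target_loss(student_logits, teacher_logits, T=4.0):
        """Hinton-style soft-target distillation: KL between softened outputs."""
        p_t = F.softmax(teacher_logits / T, dim=1)
        log_p_s = F.log_softmax(student_logits / T, dim=1)
        return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

    def attention_loss(student_fmap, teacher_fmap):
        """Attention transfer: match normalized spatial attention maps
        derived from intermediate feature maps (B x C x H x W)."""
        def attn(fmap):
            a = fmap.pow(2).mean(dim=1).flatten(1)  # B x (H*W)
            return F.normalize(a, dim=1)
        return (attn(student_fmap) - attn(teacher_fmap)).pow(2).sum(dim=1).mean()

    def nn_pu_risk(scores_teacher, scores_student, pi=0.5):
        """Non-negative PU risk with a sigmoid loss: teacher features are
        positives, student features are unlabeled; pi is the class prior."""
        loss_pos = torch.sigmoid(-scores_teacher).mean()        # positives as +1
        loss_pos_as_neg = torch.sigmoid(scores_teacher).mean()  # positives as -1
        loss_unl_as_neg = torch.sigmoid(scores_student).mean()  # unlabeled as -1
        neg_risk = loss_unl_as_neg - pi * loss_pos_as_neg
        return pi * loss_pos + torch.clamp(neg_risk, min=0.0)

In a full training loop, the student would minimize cross-entropy plus weighted soft-target and attention terms, while the PU classifier is trained on this risk and the student is simultaneously updated so that its features are scored as high-quality; the loss weights and the prior pi would be tuned per dataset.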
Acknowledgements
This research is supported by the Sichuan Science and Technology Program under Grant No. 2022YFG0324 and the SWUST Doctoral Foundation under Grant 19zx7102.
Cite this article
Jiang, N., Tang, J. & Yu, W. Positive-Unlabeled Learning for Knowledge Distillation. Neural Process Lett 55, 2613–2631 (2023). https://doi.org/10.1007/s11063-022-11038-7