
Self-Regulated Feature Learning via Teacher-free Feature Distillation

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13686)


Abstract

Knowledge distillation conditioned on intermediate feature representations typically leads to significant performance improvements. However, conventional feature distillation frameworks demand extra selection/training budgets for teachers and complex transformations to align features between teacher and student models. To address this problem, we analyze the role of the teacher in feature distillation and make an intriguing observation: an additional teacher architecture is not always necessary. We therefore propose Tf-FD, a simple yet effective Teacher-free Feature Distillation framework that reuses channel-wise and layer-wise meaningful features within the student to provide teacher-like knowledge without an additional model. In particular, our framework is subdivided into intra-layer and inter-layer distillation. Intra-layer Tf-FD performs feature salience ranking and transfers knowledge from salient features to redundant features within the same layer. Inter-layer Tf-FD distills the high-level semantic knowledge embedded in deeper-layer representations to guide the training of shallow layers. Benefiting from the small gap between these self-features, Tf-FD only needs to optimize extra feature-mimicking losses without complex transformations. Furthermore, we provide insightful discussions that shed light on Tf-FD from a feature regularization perspective. Our experiments on classification and object detection tasks demonstrate that our technique achieves state-of-the-art results on different models with fast training speeds. Code is available at https://lilujunai.github.io/Teacher-free-Distillation/.
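To make the two self-distillation terms described above concrete, the sketch below shows how intra-layer and inter-layer feature-mimicking losses could be written in PyTorch. It is a minimal illustration under stated assumptions, not the authors' released implementation: the salience metric (mean absolute activation), the 50/50 channel split, the mean-of-salient-channels target, the pooling-based spatial alignment, and equal channel counts across layers are all assumptions made here; see the linked code for the exact formulation.

```python
import torch
import torch.nn.functional as F


def intra_layer_tf_fd(feat: torch.Tensor, top_ratio: float = 0.5) -> torch.Tensor:
    """Intra-layer sketch: rank channels by salience (assumed proxy: mean
    absolute activation) and pull the less salient channels toward the
    detached mean of the most salient ones."""
    c = feat.shape[1]
    salience = feat.abs().mean(dim=(0, 2, 3))            # (C,) per-channel salience
    k = max(1, int(c * top_ratio))
    top_idx = salience.topk(k).indices                   # salient "teacher" channels
    low_idx = (-salience).topk(c - k).indices            # redundant "student" channels
    teacher = feat[:, top_idx].mean(dim=1, keepdim=True).detach()  # (B, 1, H, W), stop-grad
    student = feat[:, low_idx]                           # (B, C-k, H, W)
    return F.mse_loss(student, teacher.expand_as(student))


def inter_layer_tf_fd(shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
    """Inter-layer sketch: the shallower layer mimics the detached deeper-layer
    features; spatial sizes are matched by pooling the shallow map down, and
    equal channel counts are assumed."""
    pooled = F.adaptive_avg_pool2d(shallow, deep.shape[-2:])
    return F.mse_loss(pooled, deep.detach())


if __name__ == "__main__":
    f_shallow = torch.randn(8, 64, 16, 16, requires_grad=True)  # shallow-layer features
    f_deep = torch.randn(8, 64, 8, 8)                           # deeper-layer features
    loss = intra_layer_tf_fd(f_shallow) + inter_layer_tf_fd(f_shallow, f_deep)
    loss.backward()  # gradients flow only into the shallow/student-side features
```

In use, such losses would simply be added to the task loss with small weights during the student's normal training, so no teacher network or feature projector is introduced.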



Author information

Corresponding author

Correspondence to Lujun Li.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 144 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, L. (2022). Self-Regulated Feature Learning via Teacher-free Feature Distillation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13686. Springer, Cham. https://doi.org/10.1007/978-3-031-19809-0_20


  • DOI: https://doi.org/10.1007/978-3-031-19809-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19808-3

  • Online ISBN: 978-3-031-19809-0

  • eBook Packages: Computer Science; Computer Science (R0)
