Abstract
Extracting sufficient knowledge from the teacher network is critical in knowledge distillation for improving the performance of the student network. Existing methods mainly focus on the consistency of instance-level features and their relationships, but neglect local features and their correlations, which also contain many details and discriminative patterns. In this paper, we propose a local correlation exploration framework for knowledge distillation. It models three kinds of local knowledge: the intra-instance local relationship, the inter-instance relationship at the same local position, and the inter-instance relationship across different local positions. Moreover, to make the student focus on the informative local regions of the teacher's feature maps, we propose a novel class-aware attention module that highlights class-relevant regions and removes confusing class-irrelevant ones, making the local correlation knowledge more accurate and valuable. We conduct extensive experiments and ablation studies on the challenging CIFAR100 and ImageNet datasets to show our superiority over state-of-the-art methods.
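The paper defines the exact correlation and attention formulations; as a rough, hypothetical illustration of the three kinds of local knowledge named in the abstract, the PyTorch-style sketch below builds one correlation matrix per kind from a feature map of shape (B, C, H, W) and matches the student's matrices to the teacher's. The cosine-similarity measure, the MSE matching loss, and all function names here are assumptions for illustration, not the authors' published formulation.

import torch
import torch.nn.functional as F

def local_correlations(feat):
    # feat: (B, C, H, W) feature map from one network layer.
    B, C, H, W = feat.shape
    # Flatten the spatial grid and L2-normalize every local C-dim
    # descriptor, so the inner products below are cosine similarities.
    local = F.normalize(feat.flatten(2), dim=1)           # (B, C, HW)
    # 1) Intra-instance: similarity between local positions within
    #    each instance.
    intra = torch.bmm(local.transpose(1, 2), local)       # (B, HW, HW)
    # 2) Inter-instance at the same position: similarity between
    #    instances, computed independently at every spatial location.
    per_pos = local.permute(2, 0, 1)                      # (HW, B, C)
    same = torch.bmm(per_pos, per_pos.transpose(1, 2))    # (HW, B, B)
    # 3) Inter-instance across positions: correlate every
    #    (instance, position) descriptor with every other one.
    flat = local.transpose(1, 2).reshape(B * H * W, C)    # (B*HW, C)
    cross = flat @ flat.t()                               # (B*HW, B*HW)
    return intra, same, cross

def local_correlation_loss(feat_s, feat_t):
    # Match each student correlation matrix to the teacher's; the
    # teacher branch is detached so no gradient flows into it.
    return sum(F.mse_loss(s, t)
               for s, t in zip(local_correlations(feat_s),
                               local_correlations(feat_t.detach())))

All three matrices contract over the channel dimension, so under this sketch the student and teacher need not share channel width, only spatial resolution; note also that the (B·HW)×(B·HW) cross matrix grows quickly, so sampling a subset of positions would be a natural refinement. The class-aware attention module would, on this reading, reweight the feature map before the correlations are computed, suppressing class-irrelevant positions; it is omitted here.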
Acknowledgment
Jianlong Wu is the corresponding author. He is supported by the Fundamental Research Funds and the Future Talents Research Funds of Shandong University.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, X., Wu, J., Fang, H., Liao, Y., Wang, F., Qian, C. (2020). Local Correlation Consistency for Knowledge Distillation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12357. Springer, Cham. https://doi.org/10.1007/978-3-030-58610-2_2
DOI: https://doi.org/10.1007/978-3-030-58610-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58609-6
Online ISBN: 978-3-030-58610-2
eBook Packages: Computer Science (R0)