Abstract
Extracting sufficient knowledge from the teacher network is critical in knowledge distillation for improving the performance of the student network. Existing methods mainly focus on the consistency of instance-level features and their relationships, but neglect local features and their correlations, which also contain many details and discriminative patterns. In this paper, we propose a local correlation exploration framework for knowledge distillation. It models three kinds of local knowledge: the intra-instance local relationship, the inter-instance relationship at the same local position, and the inter-instance relationship across different local positions. Moreover, to make the student focus on the informative local regions of the teacher's feature maps, we propose a novel class-aware attention module that highlights class-relevant regions and removes confusing class-irrelevant ones, making the local correlation knowledge more accurate and valuable. We conduct extensive experiments and ablation studies on the challenging CIFAR100 and ImageNet datasets to show our superiority over state-of-the-art methods.
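The paper defines the exact correlation and attention formulations; as a rough, hypothetical illustration of the three kinds of local knowledge named in the abstract, the PyTorch-style sketch below builds one correlation matrix per kind from a feature map of shape (B, C, H, W) and matches the student's matrices to the teacher's. The cosine-similarity measure, the MSE matching loss, and all function names here are assumptions for illustration, not the authors' published formulation.

import torch
import torch.nn.functional as F

def local_correlations(feat):
    # feat: (B, C, H, W) feature map from one network layer.
    B, C, H, W = feat.shape
    # Flatten the spatial grid and L2-normalize every local C-dim
    # descriptor, so the inner products below are cosine similarities.
    local = F.normalize(feat.flatten(2), dim=1)           # (B, C, HW)
    # 1) Intra-instance: similarity between local positions within
    #    each instance.
    intra = torch.bmm(local.transpose(1, 2), local)       # (B, HW, HW)
    # 2) Inter-instance at the same position: similarity between
    #    instances, computed independently at every spatial location.
    per_pos = local.permute(2, 0, 1)                      # (HW, B, C)
    same = torch.bmm(per_pos, per_pos.transpose(1, 2))    # (HW, B, B)
    # 3) Inter-instance across positions: correlate every
    #    (instance, position) descriptor with every other one.
    flat = local.transpose(1, 2).reshape(B * H * W, C)    # (B*HW, C)
    cross = flat @ flat.t()                               # (B*HW, B*HW)
    return intra, same, cross

def local_correlation_loss(feat_s, feat_t):
    # Match each student correlation matrix to the teacher's; the
    # teacher branch is detached so no gradient flows into it.
    return sum(F.mse_loss(s, t)
               for s, t in zip(local_correlations(feat_s),
                               local_correlations(feat_t.detach())))

All three matrices contract over the channel dimension, so under this sketch the student and teacher need not share channel width, only spatial resolution; note also that the (B·HW)×(B·HW) cross matrix grows quickly, so sampling a subset of positions would be a natural refinement. The class-aware attention module would, on this reading, reweight the feature map before the correlations are computed, suppressing class-irrelevant positions; it is omitted here.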
Acknowledgment
Jianlong Wu is the corresponding author. He is supported by the Fundamental Research Funds and the Future Talents Research Funds of Shandong University.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, X., Wu, J., Fang, H., Liao, Y., Wang, F., Qian, C. (2020). Local Correlation Consistency for Knowledge Distillation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12357. Springer, Cham. https://doi.org/10.1007/978-3-030-58610-2_2
DOI: https://doi.org/10.1007/978-3-030-58610-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58609-6
Online ISBN: 978-3-030-58610-2
eBook Packages: Computer Science (R0)