Intra-class Feature Variation Distillation for Semantic Segmentation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12352)

Abstract

Current state-of-the-art semantic segmentation methods usually require high computational resources for accurate segmentation. One promising way to achieve a good trade-off between segmentation accuracy and efficiency is knowledge distillation. In this paper, different from previous methods that distill densely pairwise relations, we propose a novel intra-class feature variation distillation (IFVD) to transfer the intra-class feature variation (IFV) of the cumbersome model (teacher) to the compact model (student). Concretely, we compute the feature center (regarded as the prototype) of each class and characterize the IFV as the set of similarities between the feature at each pixel and its corresponding class-wise prototype. The teacher model usually learns a more robust intra-class feature representation than the student model, so the two models have different IFVs. Transferring the teacher's IFV to the student makes the student mimic the teacher better in terms of feature distribution, and thus improves segmentation accuracy. We evaluate the proposed approach on three widely adopted benchmarks: Cityscapes, CamVid and Pascal VOC 2012, consistently improving over state-of-the-art methods. The code is available at https://github.com/YukangWang/IFVD.
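To make the computation described above concrete, the following is a minimal PyTorch-style sketch of how a class-wise prototype, the resulting IFV map, and a distillation loss over it could be computed. It assumes cosine similarity between each pixel's feature and its class prototype and a mean-squared error between the teacher's and student's IFV maps; the function names and these particular choices are illustrative assumptions, and the released implementation at the repository above may differ in detail (e.g., in the similarity measure, the loss, or additional adversarial terms).

```python
import torch
import torch.nn.functional as F

def intra_class_feature_variation(feat, label, num_classes):
    """Sketch of an IFV map: cosine similarity between each pixel's feature
    and the prototype (masked average feature) of its ground-truth class.

    feat:  (B, C, H, W) feature map
    label: (B, H, W) class indices, resized to the feature resolution
    Returns a (B, H, W) tensor of similarities (0 where no class matches,
    e.g. ignored pixels).
    """
    B, C, H, W = feat.shape
    sim = feat.new_zeros(B, H, W)
    for c in range(num_classes):
        mask = (label == c).unsqueeze(1).float()                      # (B, 1, H, W)
        count = mask.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
        proto = (feat * mask).sum(dim=(2, 3), keepdim=True) / count   # (B, C, 1, 1)
        cos = F.cosine_similarity(feat, proto.expand_as(feat), dim=1) # (B, H, W)
        sim = torch.where(label == c, cos, sim)
    return sim

def ifv_distillation_loss(feat_s, feat_t, label, num_classes):
    """Hypothetical IFV distillation loss: the student's IFV map is pushed
    toward the teacher's. The exact distance used in the paper may differ."""
    ifv_s = intra_class_feature_variation(feat_s, label, num_classes)
    with torch.no_grad():
        ifv_t = intra_class_feature_variation(feat_t, label, num_classes)
    return F.mse_loss(ifv_s, ifv_t)
```

In this sketch, feat_s and feat_t are the student's and teacher's feature maps at the same spatial resolution as the downsampled label map, and the IFV loss would be added to the usual segmentation and distillation objectives during training.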

Keywords

Semantic segmentation · Knowledge distillation · Intra-class feature variation

Acknowledgement

This work was supported in part by the Major Project for New Generation of AI under Grant no. 2018AAA0100400, NSFC 61703171, and NSF of Hubei Province of China under Grant 2018CFB199. Dr. Yongchao Xu was supported by the Young Elite Scientists Sponsorship Program by CAST.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Computer Science, Wuhan University, Wuhan, China
  2. School of EIC, Huazhong University of Science and Technology, Wuhan, China