
Deep Ensemble Learning by Diverse Knowledge Distillation for Fine-Grained Object Classification

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

An ensemble of networks trained with bidirectional knowledge distillation does not significantly outperform an ensemble of networks trained without it. We attribute this to a relationship between the knowledge transferred during distillation and the individuality of the networks in the ensemble. In this paper, we propose a knowledge distillation method for ensembles that treats the elements of knowledge distillation as hyperparameters to be optimized. The proposed method represents diverse knowledge distillations as graphs and automatically designs the knowledge distillation best suited to the ensemble by optimizing the graph structure to maximize ensemble accuracy. Graph optimization and evaluation experiments on Stanford Dogs, Stanford Cars, CUB-200-2011, CIFAR-10, and CIFAR-100 show that the proposed method achieves higher ensemble accuracy than conventional ensembles.
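
To make the approach concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of graph-structured ensemble distillation in PyTorch: each ensemble member is a node, each directed edge carries a temperature-scaled KL distillation term, and the per-edge gate weights and temperature stand in for the elements of knowledge distillation that an outer hyperparameter search (e.g., random search or Optuna) would tune to maximize ensemble accuracy. All names and settings here (NUM_MODELS, EDGE_WEIGHT, TEMPERATURE, the toy linear models) are illustrative assumptions.

import itertools

import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_MODELS = 3        # ensemble size (illustrative)
NUM_CLASSES = 10      # e.g. CIFAR-10

# Hypothetical hyperparameters an outer search would tune: one gate weight per
# directed edge of the knowledge-transfer graph, plus a softmax temperature.
EDGE_WEIGHT = {edge: 1.0 for edge in itertools.permutations(range(NUM_MODELS), 2)}
TEMPERATURE = 4.0

# Toy ensemble members; the paper uses deep CNNs, but a linear model keeps this runnable.
models = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, NUM_CLASSES))
          for _ in range(NUM_MODELS)]


def graph_kd_loss(logits, labels):
    """Per-node cross-entropy plus a KL distillation term along every active edge."""
    loss = sum(F.cross_entropy(z, labels) for z in logits)
    for (src, dst), weight in EDGE_WEIGHT.items():
        if weight == 0.0:                      # a zero weight removes the edge
            continue
        teacher = F.softmax(logits[src].detach() / TEMPERATURE, dim=1)
        student = F.log_softmax(logits[dst] / TEMPERATURE, dim=1)
        kd = F.kl_div(student, teacher, reduction="batchmean") * TEMPERATURE ** 2
        loss = loss + weight * kd
    return loss


# One toy training step on random data.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, NUM_CLASSES, (8,))
optimizer = torch.optim.SGD([p for m in models for p in m.parameters()], lr=0.1)
optimizer.zero_grad()
graph_kd_loss([m(x) for m in models], y).backward()
optimizer.step()

# Ensemble prediction: average the members' softmax outputs.
with torch.no_grad():
    ensemble_probs = torch.stack([F.softmax(m(x), dim=1) for m in models]).mean(dim=0)

Setting an edge weight to zero removes that edge from the graph, so searching over the edge weights amounts to searching over graph structures as well as over the loss settings, which is the spirit of the optimization described in the abstract.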



Acknowledgements

This paper is based on results obtained from a project, JPNP18002, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

Author information

Corresponding author

Correspondence to Naoki Okamoto.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2834 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Okamoto, N., Hirakawa, T., Yamashita, T., Fujiyoshi, H. (2022). Deep Ensemble Learning by Diverse Knowledge Distillation for Fine-Grained Object Classification. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13671. Springer, Cham. https://doi.org/10.1007/978-3-031-20083-0_30


  • DOI: https://doi.org/10.1007/978-3-031-20083-0_30


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20082-3

  • Online ISBN: 978-3-031-20083-0

  • eBook Packages: Computer Science, Computer Science (R0)
