
Black-Box Few-Shot Knowledge Distillation

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13681)

Included in the following conference series: European Conference on Computer Vision (ECCV)

Abstract

Knowledge distillation (KD) is an efficient approach to transfer the knowledge from a large “teacher” network to a smaller “student” network. Traditional KD methods require a large number of labeled training samples and a white-box teacher (i.e., one whose parameters are accessible) to train a good student. However, these resources are not always available in real-world applications: the distillation process often takes place at an external party's side, where little data is available and the teacher does not disclose its parameters due to security and privacy concerns. To overcome these challenges, we propose a black-box few-shot KD method that trains the student with few unlabeled training samples and a black-box teacher. Our main idea is to expand the training set by generating a diverse set of out-of-distribution synthetic images using MixUp and a conditional variational auto-encoder. These synthetic images, along with their labels obtained from the teacher, are used to train the student. We conduct extensive experiments to show that our method significantly outperforms recent SOTA few/zero-shot KD methods on image classification tasks. The code and models are available at: https://github.com/nphdang/FS-BBT.
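To make the described pipeline concrete, below is a minimal sketch of the MixUp-based training-set expansion and black-box labeling step, assuming NumPy and TensorFlow/Keras. The names `teacher_predict` (a hypothetical black-box API returning class probabilities), the Keras `student` model, and all hyperparameter values are illustrative assumptions, not the authors' implementation; the paper's conditional VAE component is omitted here.

```python
# Minimal sketch of the idea in the abstract (not the authors' code).
# Assumptions: `teacher_predict` is a hypothetical black-box API returning
# class probabilities for a batch of images; `student` is any Keras
# classifier with a softmax output. The paper additionally synthesizes
# images with a conditional VAE, which is not shown.
import numpy as np
import tensorflow as tf


def mixup_expand(x_few, n_synthetic, alpha=1.0, seed=0):
    """Expand a small unlabeled image set by convexly mixing random pairs
    of real images (MixUp), yielding diverse out-of-distribution samples."""
    rng = np.random.default_rng(seed)
    idx_a = rng.integers(0, len(x_few), size=n_synthetic)
    idx_b = rng.integers(0, len(x_few), size=n_synthetic)
    lam = rng.beta(alpha, alpha, size=(n_synthetic, 1, 1, 1)).astype(np.float32)
    return lam * x_few[idx_a] + (1.0 - lam) * x_few[idx_b]


def distill_student(student, x_few, teacher_predict,
                    n_synthetic=10_000, batch_size=128, epochs=30):
    """Label real and synthetic images with the black-box teacher, then train
    the student to match those soft labels with a cross-entropy loss."""
    x_few = x_few.astype(np.float32)
    x_train = np.concatenate([x_few, mixup_expand(x_few, n_synthetic)], axis=0)
    y_train = teacher_predict(x_train)  # teacher outputs only; no access to weights
    student.compile(optimizer="adam",
                    loss=tf.keras.losses.CategoricalCrossentropy())
    student.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)
    return student
```

Because the student is fitted only to the teacher's outputs on real and synthetic images, the sketch needs no access to the teacher's parameters or to labeled data, which is the black-box few-shot setting the abstract describes.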


Notes

  1. This is possible because we use benchmark datasets, and the training and test splits are fixed.


Acknowledgment

This research was fully supported by the Australian Government through the Australian Research Council’s Discovery Projects funding scheme (project DP210102798). The views expressed herein are those of the authors and are not necessarily those of the Australian Government or Australian Research Council.

Author information


Corresponding author

Correspondence to Dang Nguyen.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Nguyen, D., Gupta, S., Do, K., Venkatesh, S. (2022). Black-Box Few-Shot Knowledge Distillation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13681. Springer, Cham. https://doi.org/10.1007/978-3-031-19803-8_12


  • DOI: https://doi.org/10.1007/978-3-031-19803-8_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19802-1

  • Online ISBN: 978-3-031-19803-8

  • eBook Packages: Computer Science, Computer Science (R0)
