Abstract
Knowledge distillation (KD) is an efficient approach to transfer knowledge from a large “teacher” network to a smaller “student” network. Traditional KD methods require a large number of labeled training samples and a white-box teacher (whose parameters are accessible) to train a good student. However, these resources are not always available in real-world applications. The distillation process often happens at an external party's side, where we do not have access to much data and the teacher does not disclose its parameters due to security and privacy concerns. To overcome these challenges, we propose a black-box few-shot KD method that trains the student with few unlabeled training samples and a black-box teacher. Our main idea is to expand the training set by generating a diverse set of out-of-distribution synthetic images using MixUp and a conditional variational auto-encoder (CVAE). These synthetic images, along with the labels obtained for them from the teacher, are used to train the student. We conduct extensive experiments showing that our method significantly outperforms recent state-of-the-art few-/zero-shot KD methods on image classification tasks. The code and models are available at: https://github.com/nphdang/FS-BBT.
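To make the pipeline concrete, the sketch below shows one plausible form of the two core steps: expanding the few available images with MixUp and distilling the black-box teacher's predictions into the student. It is a simplified illustration rather than the released FS-BBT implementation; the CVAE-based generation stage is omitted, and `teacher_predict` (any function returning class probabilities), `student` (a small PyTorch classifier), and `few_shot_images` are hypothetical placeholders.

```python
# Hedged sketch of black-box few-shot KD: MixUp-expanded transfer set + soft-label matching.
# Assumptions (not the authors' released code): `teacher_predict` returns class probabilities
# only (black-box), `student` is any torch.nn.Module classifier, and `few_shot_images` is a
# small (N, C, H, W) tensor of the few available unlabeled images.
import torch
import torch.nn.functional as F


def mixup_expand(few_shot_images: torch.Tensor, n_synthetic: int, alpha: float = 1.0) -> torch.Tensor:
    """Create synthetic (often out-of-distribution) images by mixing random pairs of real images."""
    n = few_shot_images.shape[0]
    idx_a = torch.randint(0, n, (n_synthetic,))
    idx_b = torch.randint(0, n, (n_synthetic,))
    lam = torch.distributions.Beta(alpha, alpha).sample((n_synthetic,)).view(-1, 1, 1, 1)
    return lam * few_shot_images[idx_a] + (1.0 - lam) * few_shot_images[idx_b]


def distill_black_box(student, teacher_predict, few_shot_images,
                      n_synthetic=10000, epochs=50, batch_size=128, lr=1e-3):
    """Train the student to match the black-box teacher's soft labels on the expanded set."""
    synthetic = mixup_expand(few_shot_images, n_synthetic)
    with torch.no_grad():
        soft_labels = teacher_predict(synthetic)  # only predicted probabilities are needed
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        perm = torch.randperm(n_synthetic)
        for start in range(0, n_synthetic, batch_size):
            batch = perm[start:start + batch_size]
            log_probs = F.log_softmax(student(synthetic[batch]), dim=1)
            # KL divergence between teacher probabilities and student predictions (soft-label loss).
            loss = F.kl_div(log_probs, soft_labels[batch], reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

In the full method described by the abstract, a CVAE provides an additional source of diverse synthetic images; the distillation step itself requires nothing more from the teacher than its predicted probabilities, which is what makes the black-box setting workable.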
Notes
1. This is possible because we use benchmark datasets, and the training and test splits are fixed.
Acknowledgment
This research was fully supported by the Australian Government through the Australian Research Council’s Discovery Projects funding scheme (project DP210102798). The views expressed herein are those of the authors and are not necessarily those of the Australian Government or Australian Research Council.