Rethinking Few-Shot Image Classification: A Good Embedding is All You Need?

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12359)


The focus of recent meta-learning research has been on developing learning algorithms that can quickly adapt to test-time tasks with limited data and low computational cost. Few-shot learning is widely used as one of the standard benchmarks in meta-learning. In this work, we show that a simple baseline (learning a supervised or self-supervised representation on the meta-training set, then training a linear classifier on top of this representation) outperforms state-of-the-art few-shot learning methods. An additional boost can be achieved through self-distillation. This demonstrates that a good learned embedding model can be more effective than sophisticated meta-learning algorithms. We believe that our findings motivate a rethinking of few-shot image classification benchmarks and of the role that meta-learning algorithms play in them. Code:
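The baseline summarized in the abstract (freeze the embedding learned on the meta-training set, then fit only a linear classifier on each test episode's support set) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' released code: the backbone is replaced by synthetic, already-computed feature vectors; the linear classifier is multinomial logistic regression fit by plain gradient descent; L2-normalizing the features, the function names, and the learning rate, step count and regularization strength are all illustrative choices. The hypothetical `self_distillation_loss` helper sketches a born-again-style objective: cross-entropy on the labels plus a temperature-softened KL term that pulls the student toward a frozen teacher. Its weights `alpha`, `beta` and temperature `T` are assumptions.

```python
import numpy as np


def l2_normalize(z, eps=1e-12):
    """Scale each row (one embedding per row) to unit L2 norm."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def softmax(logits, T=1.0):
    """Row-wise softmax with temperature T, shifted by the row max for stability."""
    s = logits / T
    s = s - s.max(axis=1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)


def fit_linear_classifier(feats, labels, n_way, lr=0.5, steps=300, l2=1e-3):
    """Multinomial logistic regression on frozen features (full-batch gradient descent).

    feats:  (n, d) support-set embeddings; the backbone itself is never updated.
    labels: (n,) integer labels in [0, n_way).
    Minimizes mean cross-entropy + (l2 / 2) * ||W||^2.
    """
    n, d = feats.shape
    W = np.zeros((d, n_way))
    b = np.zeros(n_way)
    onehot = np.eye(n_way)[labels]
    for _ in range(steps):
        probs = softmax(feats @ W + b)
        grad = (probs - onehot) / n            # d(mean CE) / d(logits)
        W -= lr * (feats.T @ grad + l2 * W)    # chain rule through logits = feats @ W + b
        b -= lr * grad.sum(axis=0)
    return W, b


def evaluate_episode(support_x, support_y, query_x, query_y, n_way):
    """Fit on the support set only, then return accuracy on the query set."""
    W, b = fit_linear_classifier(l2_normalize(support_x), support_y, n_way)
    preds = np.argmax(l2_normalize(query_x) @ W + b, axis=1)
    return float(np.mean(preds == query_y))


def self_distillation_loss(student_logits, teacher_logits, labels,
                           alpha=0.5, beta=0.5, T=4.0):
    """alpha * CE(labels, student) + beta * KL(softmax(teacher/T) || softmax(student/T))."""
    n = len(labels)
    p_student = softmax(student_logits)
    ce = -np.mean(np.log(p_student[np.arange(n), labels] + 1e-12))
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1))
    return alpha * ce + beta * kl


# Usage: one synthetic 5-way 5-shot episode; the random vectors stand in for
# embeddings produced by a frozen, pretrained backbone (an assumption).
rng = np.random.default_rng(0)
n_way, k_shot, n_query, dim = 5, 5, 15, 64
centers = 3.0 * rng.normal(size=(n_way, dim))
support_y = np.repeat(np.arange(n_way), k_shot)
query_y = np.repeat(np.arange(n_way), n_query)
support_x = centers[support_y] + 0.5 * rng.normal(size=(len(support_y), dim))
query_x = centers[query_y] + 0.5 * rng.normal(size=(len(query_y), dim))
acc = evaluate_episode(support_x, support_y, query_x, query_y, n_way)
print(f"query accuracy: {acc:.3f}")
```

The point of the design is that only `W` and `b` are fit per episode, so test-time adaptation is a small convex problem rather than fine-tuning or an inner meta-learning loop. Many distillation implementations also multiply the KL term by `T**2` to keep gradient magnitudes comparable across temperatures; that factor is omitted here for simplicity.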



The authors thank Hugo Larochelle and Justin Solomon for helpful discussions and feedback on this manuscript. This research was supported in part by iFlytek. This material was also based in part upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-19-C-1001.

Supplementary material

Supplementary material 1 (pdf 131 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. MIT, Cambridge, USA
  2. Google Research, Mountain View, USA
