When Does Self-supervision Improve Few-Shot Learning?

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12352)


We investigate the role of self-supervised learning (SSL) in the context of few-shot learning. Although recent research has shown the benefits of SSL on large unlabeled datasets, its utility on small datasets is relatively unexplored. We find that SSL reduces the relative error rate of few-shot meta-learners by 4%–27%, even when the datasets are small and SSL uses only images from within the datasets. The improvements are greater when the training set is smaller or the task is more challenging. Although the benefits of SSL may increase with larger training sets, we observe that SSL can hurt performance when the distributions of images used for meta-learning and for SSL differ. We conduct a systematic study by varying the degree of domain shift and analyzing the performance of several meta-learners on a multitude of domains. Based on this analysis, we present a technique that, for a given dataset, automatically selects images for SSL from a large, generic pool of unlabeled images and provides further improvements.
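The abstract reports gains as a relative reduction in error rate and describes selecting SSL images from a generic pool. The Python sketch below illustrates both ideas under stated assumptions. `relative_error_reduction` computes (e_base - e_ssl) / e_base from accuracies: raising accuracy from 80% to 85% cuts the error from 20% to 15%, a 25% relative reduction. For selection, the sketch assumes that every image is already represented by a feature vector (for example, from a fixed pretrained network), trains a logistic-regression domain classifier to separate in-domain images from pool images, and keeps the k pool images it scores as most in-domain. The helpers `train_domain_classifier` and `select_ssl_images` are hypothetical names, and this is one simple way to do domain-aware selection, not necessarily the exact procedure used in the paper.

```python
import numpy as np


def relative_error_reduction(acc_baseline: float, acc_with_ssl: float) -> float:
    """Relative reduction in error rate; accuracies are fractions in [0, 1]."""
    err_base = 1.0 - acc_baseline
    err_ssl = 1.0 - acc_with_ssl
    return (err_base - err_ssl) / err_base


def _sigmoid(z: np.ndarray) -> np.ndarray:
    # Numerically stable logistic function (tanh form avoids exp overflow).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def train_domain_classifier(feats_in, feats_pool, lr=0.1, steps=500, l2=1e-3):
    """Hypothetical helper: logistic regression separating in-domain (label 1)
    from generic-pool (label 0) feature vectors, fit by full-batch gradient descent."""
    X = np.vstack([feats_in, feats_pool])
    y = np.concatenate([np.ones(len(feats_in)), np.zeros(len(feats_pool))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = _sigmoid(X @ w + b)
        # Gradient of the mean binary cross-entropy plus an L2 penalty on w.
        grad_w = X.T @ (p - y) / len(y) + l2 * w
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def select_ssl_images(feats_in, feats_pool, k):
    """Hypothetical helper: indices of the k pool images the classifier
    considers most in-domain, to be added to the SSL training set."""
    w, b = train_domain_classifier(feats_in, feats_pool)
    scores = _sigmoid(feats_pool @ w + b)
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 2-D "features": in-domain images cluster near (2, 0); the pool
    # mixes similar images (near (2, 0)) with dissimilar ones (near (-2, 0)).
    feats_in = rng.normal([2.0, 0.0], 0.5, size=(50, 2))
    pool = np.vstack([rng.normal([2.0, 0.0], 0.5, size=(100, 2)),
                      rng.normal([-2.0, 0.0], 0.5, size=(100, 2))])
    chosen = select_ssl_images(feats_in, pool, k=50)
    print("fraction chosen from the similar half:", np.mean(chosen < 100))
    print("relative error reduction:", relative_error_reduction(0.80, 0.85))
```

Ranking by classifier score, rather than thresholding it, turns the number of added SSL images into an explicit budget k. Because SSL can hurt when its images come from a different distribution than the meta-learning images, the ranking is meant to keep only pool images that resemble the target domain.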



This project is supported in part by NSF #1749833 and a DARPA LwLL grant. Our experiments were performed on the University of Massachusetts Amherst GPU cluster obtained under the Collaborative Fund managed by the Massachusetts Technology Collaborative.

Supplementary material

Supplementary material 1 (pdf 3544 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Massachusetts Amherst, Amherst, USA
  2. Cornell University, Ithaca, USA
