
Adaptive Task Sampling for Meta-learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12363)

Abstract

Meta-learning methods have been extensively studied and applied in computer vision, especially for few-shot classification tasks. The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time by randomly sampling classes in the meta-training data to construct few-shot tasks for episodic training. While a rich line of work focuses solely on how to extract meta-knowledge across tasks, we exploit the complementary problem of how to generate informative tasks. We argue that randomly sampled tasks can be sub-optimal and uninformative to the meta-learner (e.g., the task of classifying “dog” versus “laptop” is often trivial). In this paper, we propose an adaptive task sampling method to improve generalization performance. Unlike instance-based sampling, task-based sampling is much more challenging due to the implicit definition of the task in each episode. We therefore propose a greedy class-pair-based sampling method, which selects difficult tasks according to class-pair potentials. We evaluate our adaptive task sampling method on two few-shot classification benchmarks, and it achieves consistent improvements across different feature backbones, meta-learning algorithms, and datasets.
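To make the greedy class-pair-based sampling concrete, the following is a minimal sketch, assuming a precomputed non-negative class-pair potential matrix C, where C[i, j] is large when classes i and j are hard to distinguish (e.g., estimated from confusion statistics of earlier episodes). The function name, the initial-pair draw, and the greedy update rule are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np

def sample_task_classes(C, n_way, rng=None):
    """Greedily sample `n_way` classes for one episode from potentials C.

    The first class pair is drawn with probability proportional to its
    pairwise potential; each subsequent class is drawn with probability
    proportional to the sum of its potentials with the classes chosen so far.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_classes = C.shape[0]

    # Draw the initial class pair proportional to its potential
    # (upper triangle only, so each unordered pair is counted once).
    pair_pot = np.triu(C, k=1).ravel()
    pair_probs = pair_pot / pair_pot.sum()
    idx = int(rng.choice(n_classes * n_classes, p=pair_probs))
    chosen = [idx // n_classes, idx % n_classes]

    # Greedily add classes that are hard to separate from those already chosen.
    while len(chosen) < n_way:
        scores = C[:, chosen].sum(axis=1).astype(float)
        scores[chosen] = 0.0  # never re-pick an already chosen class
        probs = scores / scores.sum()
        chosen.append(int(rng.choice(n_classes, p=probs)))
    return chosen

# Example: a 5-way task over 64 meta-training classes with random potentials.
C = np.random.rand(64, 64)
C = (C + C.T) / 2
print(sample_task_classes(C, n_way=5))
```

In practice the potential matrix would be updated online from episode feedback, for instance by increasing C[i, j] whenever the meta-learner confuses classes i and j, so that harder class combinations are sampled more often as training proceeds.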

Notes

Acknowledgment

This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG-RP-2018-001). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.

Supplementary material

Supplementary material 1: 504473_1_En_44_MOESM1_ESM.pdf (PDF, 362 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Singapore Management University, Singapore, Singapore
  2. South China University of Technology, Guangzhou, China
  3. Salesforce Research Asia, San Francisco, USA
  4. Carnegie Mellon University, Pittsburgh, USA
