Incremental Few-Shot Meta-learning via Indirect Discriminant Alignment

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12352)

Abstract

We propose a method to train a model so it can learn new classification tasks while improving with each task solved. This amounts to combining meta-learning with incremental learning. Different tasks can have disjoint classes, so one cannot directly align different classifiers as done in model distillation. On the other hand, simply aligning features shared by all classes does not allow the base model sufficient flexibility to evolve to solve new tasks. We therefore indirectly align features relative to a minimal set of “anchor classes”. Such indirect discriminant alignment (IDA) adapts a new model to old classes without the need to re-process old data, while leaving maximum flexibility for the model to adapt to new tasks. This process enables incrementally improving the model by processing multiple learning episodes, each representing a different learning task, even with few training examples. Experiments on few-shot learning benchmarks show that this incremental approach performs favorably compared to training the model with the entire dataset at once.
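Below is a minimal sketch of how an indirect discriminant alignment term could be computed, written in PyTorch. It is an illustration of the idea described above, not the paper's exact formulation: the choice of a softmax over negative Euclidean distances to anchor-class prototypes, the temperature, the KL divergence, and names such as `ida_loss` and `prototypes` are assumptions made here for concreteness.

```python
# Sketch of indirect discriminant alignment (IDA): the new model is aligned to
# the old one only through class posteriors over a small set of "anchor"
# classes, so no old data or old classifier head needs to be revisited.
# Assumed design choices (not from the paper): Euclidean-distance prototypes,
# softmax discriminant, KL alignment loss.
import torch
import torch.nn.functional as F


def anchor_discriminant(features: torch.Tensor, prototypes: torch.Tensor,
                        temperature: float = 1.0) -> torch.Tensor:
    """Log-posterior over anchor classes from distances to their prototypes.

    features:   (batch, dim) embeddings of the current episode's inputs
    prototypes: (num_anchors, dim) anchor-class centroids
    returns:    (batch, num_anchors) log-probabilities
    """
    dists = torch.cdist(features, prototypes)            # (batch, num_anchors)
    return F.log_softmax(-dists / temperature, dim=-1)


def ida_loss(new_features: torch.Tensor, old_features: torch.Tensor,
             prototypes: torch.Tensor) -> torch.Tensor:
    """Align the new embedding to the old one indirectly, via the anchors.

    Only the discriminants with respect to the shared anchor classes are
    matched, which leaves the new embedding free to change in directions the
    anchors do not constrain.
    """
    with torch.no_grad():                                 # old model is the frozen teacher
        target = anchor_discriminant(old_features, prototypes).exp()
    pred = anchor_discriminant(new_features, prototypes)
    return F.kl_div(pred, target, reduction="batchmean")
```

In such a setup, the total loss on a new episode would be the few-shot classification loss on that episode's (possibly disjoint) classes plus a weighted `ida_loss` computed on the episode's own images, so the old data never needs to be re-processed.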

Supplementary material

Supplementary material 1 (PDF, 226 KB): 504444_1_En_40_MOESM1_ESM.pdf


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Johns Hopkins University, Baltimore, USA
  2. Amazon Web Services, Seattle, USA