Large-Scale Few-Shot Learning via Multi-modal Knowledge Discovery

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12355)


Large-scale few-shot learning aims to identify hundreds of novel object categories from only a few samples per category. The problem is challenging because (1) the recognition process is prone to over-fitting when each category has so few samples, and (2) the sample imbalance between the base (known) categories and the novel categories easily biases the recognition results. To address these problems, we propose a method based on multi-modal knowledge discovery. First, we use visual knowledge to help the feature extractors focus on different visual parts. Second, we design a classifier that learns the distribution over all categories. For this classifier, we develop three schemes to minimize the prediction error and balance the training procedure: (1) hard labels provide precise supervision; (2) semantic textual knowledge serves as weak supervision to uncover potential relations between the novel and the base categories; and (3) an imbalance control derived from the data distribution alleviates the recognition bias towards the base categories. We evaluate our method on three benchmark datasets, and it achieves state-of-the-art performance in all experiments.
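The three schemes above can be illustrated as terms of a single training objective. The following is a minimal NumPy sketch, not the paper's exact formulation: the KL-to-semantic-soft-targets term for weak supervision and the inverse-frequency weighting for imbalance control are assumptions chosen to mirror the abstract's description, and all names (`few_shot_loss`, `semantic_sim`, `class_counts`) are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def few_shot_loss(logits, hard_labels, semantic_sim, class_counts, alpha=0.5):
    """Illustrative combined objective for large-scale few-shot learning.

    logits:       (N, C) classifier scores over all base + novel categories
    hard_labels:  (N,)   integer class ids -- precise supervision, scheme (1)
    semantic_sim: (C, C) row-stochastic similarity between textual label
                  embeddings -- weak supervision, scheme (2)
    class_counts: (C,)   per-class training-sample counts -- used for
                  imbalance control, scheme (3)
    """
    n = logits.shape[0]
    p = softmax(logits)
    # (1) hard-label cross-entropy on the ground-truth class.
    ce = -np.log(p[np.arange(n), hard_labels] + 1e-12)
    # (2) weak supervision: pull predictions towards the semantic
    #     distribution of the ground-truth class (KL divergence),
    #     exposing relations between novel and base categories.
    q = semantic_sim[hard_labels]                      # (N, C) soft targets
    kl = np.sum(q * (np.log(q + 1e-12) - np.log(p + 1e-12)), axis=1)
    # (3) imbalance control: down-weight samples from over-represented
    #     base classes via inverse class frequency.
    w = (1.0 / class_counts)[hard_labels]
    w = w / w.mean()
    return float(np.mean(w * (ce + alpha * kl)))
```

As a sanity check, confidently correct predictions should yield a lower loss than confidently wrong ones, and rare (novel) classes receive larger per-sample weights than abundant base classes.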


Keywords: Large-scale few-shot learning · Multi-modal knowledge discovery



This work is supported by the National Key Research and Development Program of China under grant 2018YFB0804205, and the National Natural Science Foundation of China (NSFC) under grants 61732008 and 61725203.

Supplementary material

Supplementary material 1: 504449_1_En_42_MOESM1_ESM.pdf (PDF, 7.5 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
  2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
  3. Noah’s Ark Lab, Huawei Technologies, Shenzhen, China
  4. Huawei Cloud BU, Shenzhen, China
