Dynamic Conditional Networks for Few-Shot Learning

  • Fang Zhao
  • Jian Zhao
  • Shuicheng Yan
  • Jiashi Feng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11219)

Abstract

This paper proposes a novel Dynamic Conditional Convolutional Network (DCCN) to handle conditional few-shot learning, i.e., settings where only a few training samples are available for each condition. DCCN consists of dual subnets: DyConvNet contains a dynamic convolutional layer with a bank of basis filters; CondiNet predicts a set of adaptive weights from conditional inputs to linearly combine the basis filters. In this manner, a specific convolutional kernel can be dynamically obtained for each conditional input. The filter bank is shared across all conditions, so only a low-dimensional weight vector needs to be learned per condition. This significantly facilitates parameter learning across different conditions when training data are limited. We evaluate DCCN on four tasks that can be formulated as conditional model learning: specific-object counting, multi-modal image classification, phrase grounding, and identity-based face generation. Extensive experiments demonstrate the superiority of the proposed model in the conditional few-shot learning setting.
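To make the mechanism concrete, the following is a minimal sketch of the dynamic convolution described above: a shared bank of basis filters is mixed into a single condition-specific kernel by a low-dimensional weight vector, which in DCCN would be predicted by CondiNet from the conditional input. The module name, tensor shapes, and the softmax-normalized stand-in weights are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Convolution whose kernel is a linear combination of shared basis filters."""

    def __init__(self, in_ch, out_ch, kernel_size, num_bases):
        super().__init__()
        # Shared filter bank: num_bases basis kernels, reused for every condition.
        self.bases = nn.Parameter(
            torch.randn(num_bases, out_ch, in_ch, kernel_size, kernel_size) * 0.01)

    def forward(self, x, mix_weights):
        # mix_weights: (num_bases,) vector; in DCCN this comes from CondiNet.
        # Linearly combine the basis filters into one condition-specific kernel.
        kernel = torch.einsum('k,koihw->oihw', mix_weights, self.bases)
        return F.conv2d(x, kernel, padding=kernel.shape[-1] // 2)

# Hypothetical usage: random normalized weights stand in for CondiNet's output.
layer = DynamicConv2d(in_ch=64, out_ch=64, kernel_size=3, num_bases=8)
w = torch.softmax(torch.randn(8), dim=0)
y = layer(torch.randn(1, 64, 32, 32), w)  # -> (1, 64, 32, 32)
```

The parameter saving is what makes few-shot learning feasible here: a full condition-specific kernel would require O(out_ch × in_ch × k²) parameters per condition, whereas mixing a shared bank requires only the K-dimensional weight vector per condition.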

Keywords

Conditional model · Few-shot learning · Deep learning · Dynamic convolution · Filter bank

Acknowledgments

Jian Zhao was partially supported by China Scholarship Council (CSC) grant 201503170248. Jiashi Feng was partially supported by NUS IDS R-263-000-C67-646, ECRA R-263-000-C87-133 and MOE Tier-II R-263-000-D17-112.

References

  1. Amodei, D., et al.: Deep speech 2: end-to-end speech recognition in English and Mandarin. In: International Conference on Machine Learning, pp. 173–182 (2016)
  2. Berthelot, D., Schumm, T., Metz, L.: BEGAN: boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717 (2017)
  3. Bertinetto, L., Henriques, J.F., Valmadre, J., Torr, P., Vedaldi, A.: Learning feed-forward one-shot learners. In: Advances in Neural Information Processing Systems, pp. 523–531 (2016)
  4. Bloom, P.: How Children Learn the Meanings of Words. The MIT Press (2000)
  5. Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a “siamese” time delay neural network. In: Advances in Neural Information Processing Systems, pp. 737–744 (1994)
  6. Chapelle, O., Schölkopf, B., Zien, A. (eds.): Semi-Supervised Learning. MIT Press, Cambridge (2006)
  7. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. arXiv preprint arXiv:1405.3531 (2014)
  8. Dosovitskiy, A., Springenberg, J.T., Riedmiller, M., Brox, T.: Discriminative unsupervised feature learning with convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 766–774 (2014)
  9. Fan, H., Cao, Z., Jiang, Y., Yin, Q., Doudou, C.: Learning deep face representation. arXiv preprint arXiv:1403.2802 (2014)
  10. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
  11. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
  12. Guo, Y., Zhang, L., Hu, Y., He, X., Gao, J.: MS-Celeb-1M: a dataset and benchmark for large-scale face recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 87–102. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_6
  13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
  14. Hertz, T., Hillel, A.B., Weinshall, D.: Learning a kernel function for classification with small training samples. In: Proceedings of the International Conference on Machine Learning, pp. 401–408. ACM (2006)
  15. Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected convolutional networks. arXiv preprint arXiv:1608.06993 (2016)
  16. Huiskes, M.J., Lew, M.S.: The MIR Flickr retrieval evaluation. In: Proceedings of the ACM International Conference on Multimedia Information Retrieval (2008)
  17. Krause, E.A., Zillich, M., Williams, T.E., Scheutz, M.: Learning to recognize novel objects in one shot through human-robot interactions in natural language dialogues (2014)
  18. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
  19. Kumar, A., et al.: Ask me anything: dynamic memory networks for natural language processing. In: International Conference on Machine Learning, pp. 1378–1387 (2016)
  20. Lake, B.M., Salakhutdinov, R.R., Tenenbaum, J.: One-shot learning by inverting a compositional causal process. In: Advances in Neural Information Processing Systems, pp. 2526–2534 (2013)
  21. Lim, J.J., Salakhutdinov, R.R., Torralba, A.: Transfer learning by borrowing examples for multiclass object detection. In: Advances in Neural Information Processing Systems, pp. 118–126 (2011)
  22. Lin, T.Y., RoyChowdhury, A., Maji, S.: Bilinear CNN models for fine-grained visual recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1449–1457 (2015)
  23. Mehrotra, A., Dukkipati, A.: Generative adversarial residual pairwise networks for one shot learning. arXiv preprint arXiv:1703.08033 (2017)
  24. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
  25. Movshovitz-Attias, Y., Yu, Q., Stumpe, M.C., Shet, V., Arnoud, S., Yatziv, L.: Ontological supervision for fine grained classification of street view storefronts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1693–1702 (2015)
  26. van den Oord, A., et al.: WaveNet: a generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016)
  27. Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition. In: Proceedings of the British Machine Vision Conference (2015)
  28. Plummer, B.A., Wang, L., Cervantes, C.M., Caicedo, J.C., Hockenmaier, J., Lazebnik, S.: Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models. Int. J. Comput. Vis. 123(1), 74–93 (2017)
  29. Plummer, B.A., Mallya, A., Cervantes, C.M., Hockenmaier, J., Lazebnik, S.: Phrase localization and visual relationship detection with comprehensive image-language cues. In: Proceedings of the IEEE International Conference on Computer Vision (2017)
  30. Plummer, B.A., Wang, L., Cervantes, C.M., Caicedo, J.C.: Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models. In: Proceedings of the IEEE International Conference on Computer Vision (2015)
  31. Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: International Conference on Learning Representations (2017)
  32. Rohrbach, A., Rohrbach, M., Hu, R., Darrell, T., Schiele, B.: Grounding of textual phrases in images by reconstruction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 817–834. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_49
  33. Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. In: Advances in Neural Information Processing Systems, pp. 3630–3638 (2016)
  34. Wang, L., Li, Y., Lazebnik, S.: Learning deep structure-preserving image-text embeddings. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
  35. Wang, M., Azab, M., Kojima, N., Mihalcea, R., Deng, J.: Structured matching for phrase localization. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 696–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_42
  36. Wang, Y.-X., Hebert, M.: Learning to learn: model regression networks for easy small sample learning. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 616–634. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_37
  37. Wu, Y., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)
  38. Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguist. 2, 67–78 (2014)
  39. Zhang, Z., Song, Y., Qi, H.: Age progression/regression by conditional adversarial autoencoder. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
  40. Zhao, J., Xiong, L., Jayashree, K., et al.: Dual-agent GANs for photorealistic and identity preserving profile face synthesis. In: Advances in Neural Information Processing Systems, pp. 66–76 (2017)
  41. Zhu, X., Vondrick, C., Fowlkes, C.C., Ramanan, D.: Do we need more training data? Int. J. Comput. Vis. 119(1), 76–92 (2016)
  42. Zhu, X.: Semi-supervised learning literature survey (2005)
  43. Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_26

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. National University of Singapore, Singapore
  2. National University of Defense Technology, Hunan, China
  3. Qihoo 360 AI Institute, Beijing, China
