Abstract
Metric learning is one of the feasible approaches to few-shot learning. However, most metric learning methods encode images directly with a CNN, without considering image content. Such generic CNN features can make distinct classes hard to discriminate. Based on the observation that feature maps correspond to image regions, we assume that image regions relevant to target objects should be salient in image features. To this end, we propose an effective framework, called Spatial Attention Network (SAN), to exploit the spatial context of images. SAN produces attention weights on clustered regional features, indicating the contribution of each region to classification, and takes the weighted sum of the regional features as the discriminative feature. Thus, SAN highlights important content by assigning it large weights. Once trained, SAN compares unlabeled data with class prototypes of a few labeled examples in a nearest-neighbor manner and identifies the classes of the unlabeled data. We evaluate our approach on three disparate datasets: miniImageNet, Caltech-UCSD Birds and miniDogsNet. Experimental results show that, compared with state-of-the-art models, SAN achieves competitive accuracy on miniImageNet and Caltech-UCSD Birds, and improves 5-shot accuracy on miniDogsNet by a large margin.
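The pipeline described in the abstract (attention weights over regional features, a weighted sum as the image feature, and nearest-neighbor comparison with class prototypes) can be illustrated with a minimal sketch. Several details here are assumptions and are not taken from the paper: the linear scoring vector `w`, the softmax normalization, the mean-based prototype, the squared Euclidean distance, and all function names are hypothetical, and the clustering of regional features is omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(regions, w):
    """Return (image feature, attention weights) for one image.

    regions: list of regional feature vectors, one per image region.
    w: hypothetical linear scoring vector (an assumption, not SAN's scorer).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, r)) for r in regions]
    weights = softmax(scores)
    # Weighted sum of regional features, one output value per dimension.
    feature = [sum(a * r[d] for a, r in zip(weights, regions))
               for d in range(len(regions[0]))]
    return feature, weights

def prototype(features):
    """Class prototype, assumed to be the element-wise mean of features."""
    return [sum(col) / len(features) for col in zip(*features)]

def sq_dist(a, b):
    """Squared Euclidean distance (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_class(query, prototypes):
    """Name of the class whose prototype is closest to the query feature."""
    return min(prototypes, key=lambda name: sq_dist(query, prototypes[name]))

# Toy 1-shot episode: each image has two regions with 2-D features.
w = [1.0, 1.0]
support = {
    "bird": [[[2.0, 0.0], [0.0, 0.0]]],
    "dog": [[[0.0, 2.0], [0.0, 0.0]]],
}
prototypes = {
    name: prototype([attend(image, w)[0] for image in images])
    for name, images in support.items()
}
query_feature, weights = attend([[1.5, 0.1], [0.0, 0.0]], w)
predicted = nearest_class(query_feature, prototypes)
print(predicted, weights)
```

In this toy episode the first query region receives the larger attention weight, so the pooled feature stays close to the "bird" prototype. In a trained model the scoring function would be learned rather than fixed by hand.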
Acknowledgements
This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1003405) and the National Natural Science Foundation of China (Grant No. 61732018).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
He, X., Qiao, P., Dou, Y., Niu, X. (2019). Spatial Attention Network for Few-Shot Learning. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science, vol 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_46
DOI: https://doi.org/10.1007/978-3-030-30484-3_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30483-6
Online ISBN: 978-3-030-30484-3
eBook Packages: Computer Science, Computer Science (R0)