Abstract
Few-shot learning aims to classify novel images from only a small number of labeled examples. While recent work using local descriptors has shown promise, existing methods generally classify local descriptors independently, which can lose spatial and other information essential to new tasks. Moreover, such works ignore that the semantics expressed by some local descriptors may be irrelevant to the image-level semantics. In this paper, we propose two methods to address these challenges. First, we design a novel Spatial Cross Attention Module that generates a spatial cross attention map between a query and a class representation, enhancing the local descriptors most relevant to each task. Second, we employ a dense classification loss, which supervises the learning of all local descriptors, to constrain their semantic consistency. Furthermore, we show that the feature extractor trained with our method can be plugged into several baseline methods to improve their performance. Extensive experiments on three widely used few-shot learning benchmarks show that the proposed method achieves competitive results.
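To make the two components described above concrete, the sketch below shows one plausible reading in PyTorch: a cross attention map computed between a query's local descriptors and a class representation, and a dense classification loss that applies the image-level label to every spatial position. The tensor shapes, scaling factor, pooling choice, and function names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' released code): a spatial cross
# attention map between a query's local descriptors and a class
# representation, plus a dense classification loss that supervises every
# local descriptor. Shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def spatial_cross_attention(query_feat: torch.Tensor,
                            class_feat: torch.Tensor) -> torch.Tensor:
    """query_feat: (C, H, W) local descriptors of one query image.
    class_feat:  (C, H, W) class representation, e.g. mean of support features.
    Returns the query descriptors reweighted by a spatial attention map."""
    C, H, W = query_feat.shape
    q = query_feat.reshape(C, H * W)              # HW query local descriptors
    k = class_feat.reshape(C, H * W)              # HW class local descriptors
    affinity = (q.t() @ k) / C ** 0.5             # (HW, HW) pairwise affinity
    # Score each query position by its best-matching class position, then
    # normalize into a spatial attention map over the query feature map.
    attn = F.softmax(affinity.max(dim=1).values, dim=0)      # (HW,)
    return (q * attn.unsqueeze(0)).reshape(C, H, W)

def dense_classification_loss(feat_map: torch.Tensor,
                              classifier_w: torch.Tensor,
                              label: int) -> torch.Tensor:
    """Supervise all H*W local descriptors with the image-level label.
    feat_map: (C, H, W); classifier_w: (num_classes, C)."""
    C, H, W = feat_map.shape
    logits = classifier_w @ feat_map.reshape(C, H * W)       # (num_classes, HW)
    labels = torch.full((H * W,), label, dtype=torch.long)   # same label per cell
    return F.cross_entropy(logits.t(), labels)

# Toy usage with random tensors (64-dim descriptors on a 5x5 grid, 10 classes):
attended = spatial_cross_attention(torch.randn(64, 5, 5), torch.randn(64, 5, 5))
loss = dense_classification_loss(attended, torch.randn(10, 64), label=3)
```

In a real episode, `class_feat` would typically be pooled from each class's support set, and the attended descriptors would then be fed to a local-descriptor-based metric for classification.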
Data availability
We confirm that the data supporting the findings of this study are available within the article.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 92367111.
About this article
Cite this article
Huang, J., Zhao, L. & Yang, H. Local descriptor-based spatial cross attention network for few-shot learning. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02189-1