
Local descriptor-based spatial cross attention network for few-shot learning

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Few-shot learning aims to classify novel images from only a small number of labeled examples. Recent work using local descriptors has shown promise, but existing methods generally classify local descriptors independently, which can lose spatial and other information essential to new tasks. Moreover, such works ignore that the semantics expressed by a local descriptor may be irrelevant to the semantics of the image as a whole. In this paper, we propose two methods to address these challenges. First, we design a novel Spatial Cross Attention Module that generates a spatial cross attention map between a query and a class representation, enhancing the local descriptors most relevant to each task. Second, we employ a dense classification loss, which supervises the learning of all local descriptors, to constrain their semantic consistency. Furthermore, we show that a feature extractor trained with our method can be plugged into several baseline methods to improve their performance. Extensive experiments on three widely used few-shot learning benchmark datasets show that our proposed method achieves competitive results.
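To make the two components concrete, below is a minimal, illustrative PyTorch sketch of cross attention over local descriptors and a dense classification loss. This is not the authors' implementation: the tensor shapes, the cosine-similarity attention, and the descriptor re-weighting scheme are assumptions inferred from the abstract's description.

```python
# Illustrative sketch only, not the paper's code. Assumes local descriptors
# are the spatial positions of a backbone feature map, and that attention
# re-weights each query descriptor by how well it matches the class.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialCrossAttention(nn.Module):
    """Enhance a query's local descriptors using a class representation."""
    def forward(self, query: torch.Tensor, class_rep: torch.Tensor) -> torch.Tensor:
        # query:     (HW_q, C) local descriptors of one query image
        # class_rep: (HW_s, C) class descriptors (e.g., mean of support maps)
        q = F.normalize(query, dim=-1)
        s = F.normalize(class_rep, dim=-1)
        affinity = q @ s.t()                   # (HW_q, HW_s) cosine similarities
        # One attention weight per query position: its best match anywhere
        # in the class representation, normalized over query positions.
        attn = torch.softmax(affinity.max(dim=-1).values, dim=0)   # (HW_q,)
        # Residual re-weighting keeps all descriptors but boosts relevant ones.
        return query * (1.0 + attn.unsqueeze(-1))

def dense_classification_loss(feat_map: torch.Tensor,
                              labels: torch.Tensor,
                              classifier: nn.Module) -> torch.Tensor:
    # feat_map: (B, C, H, W) backbone features; labels: (B,) class indices.
    B, C, H, W = feat_map.shape
    desc = feat_map.permute(0, 2, 3, 1).reshape(B * H * W, C)
    logits = classifier(desc)                  # classify every local descriptor
    dense_labels = labels.repeat_interleave(H * W)
    return F.cross_entropy(logits, dense_labels)
```

For example, with a hypothetical `classifier = nn.Linear(C, num_train_classes)`, the dense loss can be added to the standard training objective so that every spatial position, rather than only a pooled global feature, receives supervision.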


Data availability

We confirm that the data supporting the findings of this study are available within the article.


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 92367111.

Author information

Corresponding author: Lina Zhao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Huang, J., Zhao, L. & Yang, H. Local descriptor-based spatial cross attention network for few-shot learning. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02189-1

