
Sparse spatial transformers for few-shot learning

  • Research Paper
  • Published in Science China Information Sciences

Abstract

Learning from limited data is challenging because data scarcity leads to poor generalization of the trained model. A classical globally pooled representation is likely to lose useful local information. Many recent few-shot learning methods address this challenge by using deep descriptors and learning a pixel-level metric. However, using deep descriptors as feature representations may lose image contextual information. Moreover, most of these methods handle each class in the support set independently, so they cannot sufficiently exploit discriminative information and task-specific embeddings. In this paper, we propose a novel transformer-based neural network architecture called sparse spatial transformers (SSFormers), which finds task-relevant features and suppresses task-irrelevant ones. Specifically, we first divide each input image into several image patches of different sizes to obtain dense local features. These features retain contextual information while expressing local information. Then, a sparse spatial transformer layer finds spatial correspondences between the query image and the full support set, selecting task-relevant image patches and suppressing task-irrelevant ones. Finally, an image patch-matching module calculates the distance between these dense local representations, determining which support-set category the query image belongs to. Extensive experiments on popular few-shot learning benchmarks demonstrate the superiority of our method over state-of-the-art methods.
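The abstract describes three stages: multi-size patch embedding, sparse cross-attention from the query to the whole support set, and a patch-level distance. The following PyTorch sketch is a minimal illustration of one plausible reading of that pipeline, not the authors' released implementation; the grid sizes, the top-k sparsification rule, and all function names are assumptions made for this example.

    # Illustrative sketch only (not the SSFormers code release). Assumptions:
    # grid sizes (1, 2, 4), top-k attention sparsification, cosine matching.
    import torch
    import torch.nn.functional as F

    def dense_local_features(feat_map, grid_sizes=(1, 2, 4)):
        """Pool a CNN feature map (B, C, H, W) into patches of several sizes,
        flattened into one sequence of local descriptors per image: (B, P, C)."""
        patches = []
        for g in grid_sizes:
            pooled = F.adaptive_avg_pool2d(feat_map, g)        # (B, C, g, g)
            patches.append(pooled.flatten(2).transpose(1, 2))  # (B, g*g, C)
        return torch.cat(patches, dim=1)                       # (B, P, C)

    def sparse_spatial_transformer(query, support, top_k=8):
        """Cross-attend query patches (Pq, C) to all support patches of the
        task (Ps, C); keep only the top-k attention weights per query patch so
        that task-irrelevant patches are suppressed."""
        attn = query @ support.t() / query.size(-1) ** 0.5     # (Pq, Ps)
        topk = attn.topk(top_k, dim=-1)
        mask = torch.full_like(attn, float('-inf'))            # drop non-top-k
        mask.scatter_(-1, topk.indices, topk.values)
        attn = mask.softmax(dim=-1)                            # sparse weights
        return attn @ support                                  # (Pq, C)

    def patch_matching_distance(query, class_patches):
        """Mean cosine similarity of each query patch to its best-matching
        class patch (higher = closer)."""
        q = F.normalize(query, dim=-1)
        s = F.normalize(class_patches, dim=-1)
        return (q @ s.t()).max(dim=-1).values.mean()

    # Toy 5-way, 1-shot episode: classify one query against 5 support classes.
    torch.manual_seed(0)
    C = 64
    support_maps = [torch.randn(1, C, 8, 8) for _ in range(5)]
    query_map = torch.randn(1, C, 8, 8)

    support_sets = [dense_local_features(m)[0] for m in support_maps]
    task_pool = torch.cat(support_sets, dim=0)      # patches of the whole task
    q_patches = dense_local_features(query_map)[0]
    q_aligned = sparse_spatial_transformer(q_patches, task_pool)

    scores = torch.stack([patch_matching_distance(q_aligned, s)
                          for s in support_sets])
    print("predicted class:", scores.argmax().item())

The detail this sketch tries to mirror is that task_pool concatenates patches from every support class, so each query patch is judged for relevance against the task as a whole rather than against one class at a time.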



Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 62176116, 62073160, 62276136) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 20KJA520-006).

Author information

Correspondence to Huaxiong Li.

About this article

Cite this article

Chen, H., Li, H., Li, Y. et al. Sparse spatial transformers for few-shot learning. Sci. China Inf. Sci. 66, 210102 (2023). https://doi.org/10.1007/s11432-022-3700-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11432-022-3700-8

Keywords

Navigation