
Learning Transferable Feature Representation with Swin Transformer for Object Recognition

Published in: Neural Processing Letters

Abstract

Recent substantial advances in deep learning have driven the flourishing of computer vision. However, heavy dependence on the scale of training data limits deep learning applications, because such large amounts of data are hard to obtain in many practical scenarios. Moreover, when sufficient training data are lacking, deep learning offers no significant advantage over traditional machine learning methods. The approach proposed in this paper overcomes the problem of insufficient training data by taking the Swin Transformer as the backbone for feature extraction and applying fine-tuning strategies on the target dataset to learn transferable feature representations. Experimental results demonstrate that the proposed method performs well on object recognition with small-scale datasets.
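
The sketch below is a minimal illustration of the transfer-learning recipe the abstract describes: load a Swin Transformer pretrained on ImageNet, replace its classification head, and fine-tune on a small target dataset. It assumes PyTorch and the timm library; the swin_tiny variant, CIFAR-10 as the target dataset, and all hyperparameters are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of fine-tuning a pretrained Swin Transformer backbone.
    # Assumptions (not from the paper): PyTorch + timm, the swin_tiny variant,
    # CIFAR-10 as the small target dataset, and the chosen hyperparameters.
    import timm
    import torch
    from torch import nn, optim
    from torchvision import datasets, transforms

    # Load a Swin Transformer pretrained on ImageNet; passing num_classes
    # makes timm replace the classification head for the target task.
    model = timm.create_model("swin_tiny_patch4_window7_224",
                              pretrained=True, num_classes=10)

    # Resize small-scale images to the backbone's expected input size and
    # normalize with ImageNet statistics to match the pretraining domain.
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    train_set = datasets.CIFAR10("data", train=True, download=True,
                                 transform=preprocess)
    loader = torch.utils.data.DataLoader(train_set, batch_size=64,
                                         shuffle=True)

    # Fine-tune with a small learning rate so the pretrained features are
    # adapted gently to the target dataset rather than overwritten.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    optimizer = optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(3):  # a few epochs often suffice for a small dataset
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

Keeping the pretrained weights and using a small learning rate preserves the transferable features learned on ImageNet, which is the premise of the fine-tuning strategy the abstract refers to.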



Funding

This work was supported by the National Natural Science Foundation of China (62006150), the Science and Technology Commission of Shanghai Municipality (21DZ2203100), and the Shanghai Young Science and Technology Talents Sailing Program (19YF1418400).

Author information

Corresponding author

Correspondence to Yu-Jie Xiong.

Ethics declarations

Conflict of Interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ren, JX., Xiong, YJ., Xie, XJ. et al. Learning Transferable Feature Representation with Swin Transformer for Object Recognition. Neural Process Lett 55, 2211–2223 (2023). https://doi.org/10.1007/s11063-022-11004-3

