Abstract
Recent substantial advances in deep learning have driven the flourishing of computer vision. However, deep learning's heavy dependence on large-scale training data limits its applicability, because collecting data at such a scale is difficult in many practical scenarios, and with insufficient training data deep learning offers no significant advantage over traditional machine learning methods. The approach proposed in this paper addresses the problem of insufficient training data by taking the Swin Transformer as the backbone for feature extraction and applying fine-tuning strategies on the target dataset to learn transferable feature representations. Our experimental results demonstrate that the proposed method performs well for object recognition on small-scale datasets.
Funding
Supported by: National Natural Science Foundation of China (62006150); Science and Technology Commission of Shanghai Municipality (21DZ2203100); Shanghai Young Science and Technology Talents Sailing Program (19YF1418400).
Ethics declarations
Conflict of Interest
The authors declare no conflicts of interest.
Cite this article
Ren, JX., Xiong, YJ., Xie, XJ. et al. Learning Transferable Feature Representation with Swin Transformer for Object Recognition. Neural Process Lett 55, 2211–2223 (2023). https://doi.org/10.1007/s11063-022-11004-3