Abstract
Event cameras, a new generation of bio-inspired sensors characterized by high dynamic range and high temporal resolution, provide a competitive new modality for multi-modal tracking. However, recent work on RGB-Event (RGBE) tracking focuses heavily on exploiting complementary information while neglecting to enhance modality-shared information and the global relations within and across modalities. In this paper, we propose an end-to-end fully attentional tracker, the Swin Transformer Event Frame Tracker (SwinEFT), to fully exploit both modality-specific and modality-shared information. Specifically, we first adopt a simple but effective event representation that narrows the domain gap and yields a clearer tracking target. By deploying a shifted-window attention mechanism, our tracker better leverages global relations and thus locates more accurate bounding boxes. In addition, to enhance modality-shared information, we design a Swin Decoder that introduces shifted-window cross-attention for information interaction between modalities. Extensive experiments on two realistic RGBE tracking datasets demonstrate the outstanding performance and robustness of SwinEFT against state-of-the-art methods under various challenging scenarios.
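The abstract mentions converting the asynchronous event stream into a frame-like representation so it can be processed alongside RGB frames. As a minimal sketch only (the paper's exact representation is not specified here), one common approach is to accumulate positive- and negative-polarity events into separate channels and normalize the counts; the function name and event layout below are illustrative assumptions, not the authors' API:

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate a stream of events into a 2-channel frame.

    `events` is an (N, 4) array of (x, y, timestamp, polarity),
    with polarity in {-1, +1}. Positive and negative events are
    counted into separate channels, then the counts are scaled
    to [0, 1] so the frame can be fed to an image backbone.
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    for x, y, _, p in events:
        channel = 0 if p > 0 else 1  # channel 0: positive, 1: negative
        frame[channel, int(y), int(x)] += 1.0
    peak = frame.max()
    if peak > 0:
        frame /= peak  # normalize event counts to [0, 1]
    return frame
```

Such a dense representation lets a shared Swin-style backbone consume the RGB frame and the event frame with the same patch-embedding pipeline, which is one way to narrow the domain gap the abstract refers to.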
Funding
This work was funded by Open and Innovation Fund of Hubei Three Gorges Laboratory, grant number SK215002.
Ethics declarations
Conflict of interest
All authors declare that there are no conflicts of interest.
Cite this article
Zeng, Z., Li, X., Fan, C. et al. SwinEFT: a robust and powerful Swin Transformer based Event Frame Tracker. Appl Intell 53, 23564–23581 (2023). https://doi.org/10.1007/s10489-023-04763-6