
TGLC: Visual object tracking by fusion of global-local information and channel information

  • Published in: Multimedia Tools and Applications (2024)

Abstract

Visual object tracking aims to locate a target continuously in every frame of a video, given only its initial location, and remains an important yet challenging task in computer vision. Recent approaches fuse global information from the template and the search region and achieve promising tracking performance. However, fusing global information discards some local details, and local information is essential for distinguishing the target from background regions. To address this problem, this work presents TGLC, a novel tracking algorithm that integrates a channel-aware convolution block with Transformer attention to aggregate global and local representations and to model channel information. The method accurately estimates the bounding box of the target. Extensive experiments are conducted on five widely recognized datasets, i.e., GOT-10k, TrackingNet, LaSOT, OTB100, and UAV123. The results show that the proposed tracker achieves competitive performance compared with state-of-the-art trackers while still running in real time. Visualizations of tracking results on LaSOT further demonstrate the method's ability to cope with tracking challenges, e.g., illumination variation, deformation of the target, and background clutter.
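The abstract gives the design at a high level only, but the described combination (a channel-aware convolution block supplying local detail and channel cues, plus Transformer attention fusing template and search-region features globally) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption-laden illustration, not the authors' implementation: the module names `ChannelAwareConv` and `GlobalLocalFusion`, the squeeze-and-excitation-style channel gate, and all tensor sizes are hypothetical.

```python
# Hypothetical sketch (not the paper's code): channel-aware local branch
# combined with Transformer cross-attention over template/search features.
import torch
import torch.nn as nn


class ChannelAwareConv(nn.Module):
    """Local conv features reweighted per channel (SE-style gate, assumed design)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.gate = nn.Sequential(          # squeeze-and-excitation style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.conv(x)                 # local detail via convolution
        return local * self.gate(local)      # channel-wise reweighting


class GlobalLocalFusion(nn.Module):
    """Cross-attention from search tokens to template tokens, plus a local branch."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.local = ChannelAwareConv(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
        b, c, h, w = search.shape
        local = self.local(search)                  # local/channel-aware cues
        q = search.flatten(2).transpose(1, 2)       # (B, HW, C) search tokens
        kv = template.flatten(2).transpose(1, 2)    # (B, hw, C) template tokens
        glob, _ = self.attn(q, kv, kv)              # global template-search fusion
        glob = self.norm(glob).transpose(1, 2).reshape(b, c, h, w)
        return local + glob                         # aggregate both cues


if __name__ == "__main__":
    fusion = GlobalLocalFusion(channels=256)
    search = torch.randn(1, 256, 20, 20)     # search-region feature map
    template = torch.randn(1, 256, 8, 8)     # template feature map
    print(fusion(search, template).shape)    # torch.Size([1, 256, 20, 20])
```

The additive combination of the two branches is one plausible aggregation choice; the fused map would then feed a bounding-box estimation head, whose details the abstract does not specify.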

Data availability

All training and test datasets used in our experiments are publicly available and can be downloaded from their official websites.

Acknowledgments

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and by the York Research Chairs (YRC) program.

Author information

Corresponding author

Correspondence to Dan Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, S., Zhang, D. & Zou, Q. TGLC: Visual object tracking by fusion of global-local information and channel information. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19002-4
