
Siamada: visual tracking based on Siamese adaptive learning network

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Recently, Siamese trackers based on region proposal networks (RPN) have gained great popularity. However, the design of an RPN requires manual tuning of parameters such as the object–anchor intersection over union (IoU) threshold and the relative weights of different tasks, which makes model training difficult and expensive. To address this issue, we propose a novel Siamese adaptive learning network (SiamAda) for visual tracking that allows the model to be trained in a flexible way. Rather than IoU-based anchor assignment, the proposed network uses spatial alignment and the model's learning status as criteria for anchor quality evaluation, and a Gaussian mixture distribution for adaptive assignment. Moreover, to address the inconsistency between classification confidence and localization accuracy, a localization branch is designed to predict the IoU of each candidate anchor box, serving as a localization quality assessment. Furthermore, to avoid tricky tuning of the relative weight of each task's loss, multi-task learning with homoscedastic uncertainty is employed to adaptively weigh these losses. Extensive experiments on challenging benchmarks, namely OTB2015, VOT2018, DTB70, UAV20L, GOT-10k, and LaSOT, validate the superiority of our tracker. Ablation studies further illustrate the advantage of each strategy presented in this paper.
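Two of the ideas in the abstract can be sketched compactly. The first is homoscedastic-uncertainty loss weighting (in the style of Kendall et al.), where each task loss is scaled by a learnable log-variance term rather than a hand-tuned weight; the second is fusing classification confidence with the localization branch's predicted IoU so that box ranking reflects both cues. This is a minimal illustrative sketch, not the paper's implementation: the function names, the exact fusion rule (a simple product here), and the use of plain floats instead of learnable tensors are all assumptions.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Adaptively weigh per-task losses via homoscedastic uncertainty:
    total = sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2)
    would be a learnable scalar per task in a real training setup."""
    return sum(math.exp(-s) * loss + s
               for loss, s in zip(task_losses, log_vars))

def fused_box_score(cls_conf, pred_iou):
    """Fuse classification confidence with the localization branch's
    predicted IoU, so a confidently classified but poorly localized
    box is down-ranked (illustrative fusion; the paper's rule may differ)."""
    return cls_conf * pred_iou
```

With all log-variances at zero the weighted loss reduces to a plain sum, so `uncertainty_weighted_loss([1.0, 2.0], [0.0, 0.0])` gives `3.0`; during training, a task whose loss stays noisy drives its `s_i` up, automatically shrinking that task's effective weight.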


Data availability and access

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62075028).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Xin Lu and Wanqi Yang. The first draft of the manuscript was written by Xin Lu and Fusheng Li. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Fusheng Li.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

The data used in this study are public datasets published on official websites and do not involve human participants and/or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Lu, X., Li, F. & Yang, W. Siamada: visual tracking based on Siamese adaptive learning network. Neural Comput & Applic 36, 7639–7656 (2024). https://doi.org/10.1007/s00521-024-09481-9

