Abstract
Siamese trackers based on region proposal networks (RPNs) have recently gained great popularity. However, the RPN design requires manual tuning of hyperparameters, such as the object–anchor intersection-over-union (IoU) thresholds and the relative weights of the different tasks, which makes model training difficult and expensive. To address this issue, we propose a novel Siamese adaptive learning network (SiamAda) for visual tracking that allows the model to be trained in a flexible way. Instead of IoU-based anchor assignment, the proposed network uses spatial alignment and the model's learning status as criteria for evaluating anchor quality, and a Gaussian mixture distribution for adaptive assignment. Moreover, to address the inconsistency between classification confidence and localization accuracy, a localization branch is designed to predict the IoU of each candidate anchor box and thereby assess localization quality. Furthermore, to avoid the tricky tuning of the relative weight of each task's loss, multi-task learning with homoscedastic uncertainty is employed to weigh the multiple losses adaptively. Extensive experiments on the challenging OTB2015, VOT2018, DTB70, UAV20L, GOT-10k and LaSOT benchmarks validate the superiority of our tracker, and ablation studies illustrate the advantage of each strategy presented in this paper.
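The homoscedastic-uncertainty weighting mentioned in the abstract can be sketched in a few lines. This is a minimal illustration of the general scheme of Kendall et al. (multi-task learning using uncertainty to weigh losses), not the paper's exact implementation: each task i keeps a learnable scalar s_i = log(sigma_i^2), and the combined objective sums exp(-s_i) * L_i + s_i, so noisier tasks are automatically down-weighted while the + s_i term keeps the s_i from growing without bound. The function name and the example loss values below are illustrative.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses with homoscedastic-uncertainty weights.

    Each task i contributes exp(-s_i) * L_i + s_i, where
    s_i = log(sigma_i^2) would be a learnable scalar in training.
    With s_i = 0 (sigma_i = 1) this reduces to a plain sum of losses.
    """
    return sum(
        math.exp(-s) * loss + s
        for loss, s in zip(task_losses, log_variances)
    )


# Illustrative values: classification, regression, and localization
# (IoU-prediction) losses combined with different learned uncertainties.
losses = [0.8, 1.5, 0.4]
log_vars = [0.0, math.log(2.0), 0.0]  # task 2 is treated as noisier
total = uncertainty_weighted_loss(losses, log_vars)
```

In practice the s_i would be registered as trainable parameters and optimized jointly with the network, so the relative weights need no manual grid search.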
Data availability and access
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62075028).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Xin Lu and Wanqi Yang. The first draft of the manuscript was written by Xin Lu and Fusheng Li. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
The data used in this study are public datasets published on official websites and do not involve human participants and/or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lu, X., Li, F. & Yang, W. Siamada: visual tracking based on Siamese adaptive learning network. Neural Comput & Applic 36, 7639–7656 (2024). https://doi.org/10.1007/s00521-024-09481-9