Abstract
Because AlexNet is too shallow to form a strong feature representation, trackers based on the Siamese network lag behind state-of-the-art algorithms in accuracy. Both deep features and shallow appearance features benefit tracking accuracy. To combine these two kinds of features, a modified pre-trained VGG16 network is first fine-tuned as one branch of the backbone network. Secondly, an AlexNet branch is attached after the third convolutional layer of VGG16, and the response maps from both branches are merged to form a preliminary strong feature representation that contains both deep features and shallow appearance features. Thirdly, a new mean Peak-to-side ratio (mPSR) loss is designed to help the network learn target features adaptively. A channel attention block and the Average Peak-to-Correlation Energy (APCE) are introduced to help select contributing features and suppress distractors. The proposed tracker, SiamPF, is trained only on ILSVRC2015-VID, yet it achieves excellent performance on OTB-2013, OTB-2015, VOT2015, VOT2016, and VOT2017 while running in real time at 41 FPS on a GTX 1080Ti.
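The two response-map quality measures named in the abstract can be illustrated with a short NumPy sketch. It uses the commonly cited definitions: APCE as the squared peak-to-minimum range divided by the mean squared deviation from the minimum, and the classic peak-to-sidelobe ratio as the peak minus the sidelobe mean, divided by the sidelobe standard deviation. The abstract does not specify how the mPSR loss is built from such a ratio, so this is not the authors' loss. The function names, the `exclude` window half-width, and the synthetic Gaussian maps are assumptions made for illustration only.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero on flat maps


def apce(response):
    """Average Peak-to-Correlation Energy of a 2-D response map."""
    r = np.asarray(response, dtype=np.float64)
    f_max, f_min = r.max(), r.min()
    energy = np.mean((r - f_min) ** 2)
    return (f_max - f_min) ** 2 / (energy + EPS)


def psr(response, exclude=5):
    """Classic peak-to-sidelobe ratio.

    The sidelobe is everything outside a (2*exclude+1)-wide square
    window centred on the peak (window size is an assumption here).
    """
    r = np.asarray(response, dtype=np.float64)
    py, px = np.unravel_index(np.argmax(r), r.shape)
    mask = np.ones(r.shape, dtype=bool)
    mask[max(py - exclude, 0):py + exclude + 1,
         max(px - exclude, 0):px + exclude + 1] = False
    side = r[mask]
    return (r[py, px] - side.mean()) / (side.std() + EPS)


def gaussian_map(size, center, sigma):
    """Synthetic response map: a single Gaussian bump (illustrative only)."""
    ys, xs = np.mgrid[:size, :size]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


# A clean single-peak map versus the same map with a strong distractor.
clean = gaussian_map(33, (16, 16), 2.0)
distracted = clean + 0.8 * gaussian_map(33, (5, 27), 2.0)

if __name__ == "__main__":
    print("APCE clean / distracted:", apce(clean), apce(distracted))
    print("PSR  clean / distracted:", psr(clean), psr(distracted))
```

Both measures drop when a second strong peak appears in the response map, which is why scores of this kind are used to flag frames where a distractor competes with the target.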
Acknowledgements
This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61671423 and Grant No. 61271403.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zhou, Z., Zhang, R. & Yin, D. A strong feature representation for siamese network tracker. Multimed Tools Appl 79, 25873–25887 (2020). https://doi.org/10.1007/s11042-020-09164-2
DOI: https://doi.org/10.1007/s11042-020-09164-2