Abstract
As a fundamental vision task, single object tracking has improved remarkably. However, many factors must still be addressed to achieve accurate tracking. Among them, visual representation is one of the most important, and it suffers from complex appearance changes. In this work, we propose a rich appearance representation learning strategy for tracking. First, by embedding a saliency feature extractor module, we improve the visual representation by fusing saliency information learned from different convolutional layers. Using the lightweight convolutional neural network VGG-M as the feature extraction backbone, we obtain a robust appearance model from deep features with rich semantic information. Second, because the classifier provides significant complementary guidance for location prediction, we generate diverse feature instances of the target by introducing an adversarial learning strategy. These generated instances can effectively simulate many complex situations that arise during tracking, especially occlusion, which follows a long-tail distribution. Third, to optimize bounding box refinement, we employ a precise pooling strategy to obtain high-resolution feature maps, so that our approach can effectively capture subtle appearance changes over a long time range. Finally, extensive experiments were conducted on several benchmark datasets, and the results demonstrate that the proposed approach performs favorably against many state-of-the-art algorithms.
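To make the pooling idea concrete, the following is a minimal sketch of pooling over a box with continuous (non-rounded) coordinates, which is the property that lets a refinement stage react to sub-pixel shifts of the bounding box. It is an illustration under stated assumptions, not the operator used in the paper: the abstract does not specify the implementation, the function names `bilinear` and `pool_continuous_box` are hypothetical, and the bin average is approximated here by dense sampling of a bilinearly interpolated map.

```python
import numpy as np


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D feature map at a continuous (y, x)."""
    h, w = feat.shape
    # Clamp the query point to the valid coordinate range of the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feat[y0, x0] + dx * feat[y0, x1]
    bottom = (1 - dx) * feat[y1, x0] + dx * feat[y1, x1]
    return (1 - dy) * top + dy * bottom


def pool_continuous_box(feat, box, out_size=2, samples=8):
    """Average the interpolated map over each bin of a continuous box.

    box = (x1, y1, x2, y2) in feature-map coordinates. The coordinates
    are never rounded, so a small shift of the box changes the output.
    Assumption: a samples x samples midpoint grid per bin approximates
    the bin average.
    """
    x1, y1, x2, y2 = box
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    offsets = (np.arange(samples) + 0.5) / samples  # midpoints in [0, 1]
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            ys = y1 + (i + offsets) * bin_h
            xs = x1 + (j + offsets) * bin_w
            out[i, j] = np.mean([bilinear(feat, y, x) for y in ys for x in xs])
    return out


# Toy feature map whose value equals the column index.
ramp = np.tile(np.arange(6, dtype=float), (6, 1))
pooled = pool_continuous_box(ramp, (0.5, 1.0, 4.5, 3.0))
shifted = pool_continuous_box(ramp, (0.75, 1.0, 4.75, 3.0))
```

On this toy ramp, shifting the box by a quarter of a cell shifts every pooled value by the same amount, whereas a pooling scheme that first rounds the box to integer cells would return identical outputs for both boxes.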
Additional information
This work is the result of a research project funded by the Provincial Natural Science Foundation of Anhui (No. 1908085MF217) and the Anhui Provincial Education Department Fund (No. KJ2019A0022).
Cite this article
Bao, H., Shu, P. & Wang, Q. Accurate visual representation learning for single object tracking. Multimed Tools Appl 81, 24059–24079 (2022). https://doi.org/10.1007/s11042-021-11736-9
DOI: https://doi.org/10.1007/s11042-021-11736-9