Accurate visual representation learning for single object tracking

Bao, Hua; Shu, Ping; Wang, Qijun

doi:10.1007/s11042-021-11736-9

Published: 19 March 2022

Volume 81, pages 24059–24079 (2022)

Multimedia Tools and Applications

Hua Bao^1,2,
Ping Shu^1,2 &
Qijun Wang^2,3

Abstract

As a fundamental visual task, single object tracking has seen remarkable improvements. However, many factors still need to be addressed to achieve accurate tracking performance. Among them, the visual representation is one of the most important, because it suffers from complex appearance changes. In this work, we propose a rich appearance representation learning strategy for tracking. First, by embedding a saliency feature extractor module, we improve the visual representation ability by fusing saliency information learned from different convolutional layers. Using the lightweight convolutional neural network VGG-M as the feature extraction backbone, we obtain a robust appearance model from deep features with rich semantic information. Second, since the classifier provides significant complementary guidance for location prediction, we propose to generate diverse feature instances of the target by introducing an adversarial learning strategy. With these diverse instances, many complex situations in the tracking process can be effectively simulated, especially occlusion, which follows a long-tail distribution. Third, to optimize bounding box refinement, we employ a precise pooling strategy to obtain high-resolution feature maps. As a result, our approach can effectively capture subtle appearance changes over a long time range. Finally, extensive experiments on several benchmark datasets demonstrate that the proposed approach performs favorably against many state-of-the-art algorithms.
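The abstract does not say how the saliency feature extractor fuses information from different convolutional layers. The Python sketch below shows one plausible reading, purely as an assumption: each layer's saliency map is the channel-wise mean absolute activation, min-max normalized, resized to a common resolution, and averaged with per-layer weights. The function names, the saliency definition, and the weighting rule are hypothetical illustrations, not the authors' module.

```python
from typing import List

Map2D = List[List[float]]
FeatureMap = List[Map2D]  # channels x height x width


def layer_saliency(features: FeatureMap) -> Map2D:
    """Hypothetical saliency: channel-wise mean |activation|, min-max normalized to [0, 1]."""
    channels = len(features)
    h, w = len(features[0]), len(features[0][0])
    sal = [[sum(abs(features[c][i][j]) for c in range(channels)) / channels
            for j in range(w)] for i in range(h)]
    lo = min(min(row) for row in sal)
    hi = max(max(row) for row in sal)
    if hi == lo:  # flat response: no location is more salient than another
        return [[0.0] * w for _ in range(h)]
    return [[(v - lo) / (hi - lo) for v in row] for row in sal]


def resize_nearest(m: Map2D, out_h: int, out_w: int) -> Map2D:
    """Nearest-neighbour resize (floor index mapping) so layers share one grid."""
    h, w = len(m), len(m[0])
    return [[m[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def fuse_saliency(layers: List[FeatureMap], weights: List[float],
                  out_h: int, out_w: int) -> Map2D:
    """Weighted average of per-layer saliency maps at a common resolution."""
    total = sum(weights)
    fused = [[0.0] * out_w for _ in range(out_h)]
    for feats, wt in zip(layers, weights):
        sal = resize_nearest(layer_saliency(feats), out_h, out_w)
        for i in range(out_h):
            for j in range(out_w):
                fused[i][j] += wt * sal[i][j] / total
    return fused
```

For example, fusing a fine 4x4 shallow layer with a coarse 2x2 deep layer yields a 4x4 map in which locations highlighted by both layers score highest; the shallow layer contributes spatial precision and the deep layer contributes coarser, more semantic evidence.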
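The abstract states only that adversarial learning is used to generate diverse target feature instances that simulate occlusion. A common scheme in the spirit of VITAL (Song et al., 2018) is to drop out one spatial cell of the target's feature map and keep the mask that increases the classifier's loss the most, so training focuses on the hardest occlusion. The sketch below assumes that scheme, a toy linear classifier, and a 3x3 grid of candidate masks; all names are hypothetical and it is not the paper's implementation.

```python
import math
from typing import List, Tuple

Map2D = List[List[float]]


def positive_loss(features: Map2D, weights: Map2D, bias: float = 0.0) -> float:
    """Binary cross-entropy of a toy linear classifier on a positive (target) sample."""
    score = bias + sum(f * w for frow, wrow in zip(features, weights)
                       for f, w in zip(frow, wrow))
    prob = 1.0 / (1.0 + math.exp(-score))
    return -math.log(max(prob, 1e-12))


def grid_masks(h: int, w: int, grid: int = 3) -> List[Tuple[int, int, Map2D]]:
    """One mask per grid cell; each mask zeroes (occludes) exactly that cell."""
    masks = []
    for gi in range(grid):
        for gj in range(grid):
            mask = [[0.0 if (i * grid // h == gi and j * grid // w == gj) else 1.0
                     for j in range(w)] for i in range(h)]
            masks.append((gi, gj, mask))
    return masks


def hardest_occlusion(features: Map2D, weights: Map2D,
                      grid: int = 3) -> Tuple[float, Tuple[int, int], Map2D]:
    """Return (loss, cell, masked features) for the mask that maximizes the loss."""
    best = None
    for gi, gj, mask in grid_masks(len(features), len(features[0]), grid):
        masked = [[f * m for f, m in zip(frow, mrow)]
                  for frow, mrow in zip(features, mask)]
        loss = positive_loss(masked, weights)
        if best is None or loss > best[0]:  # keep the most damaging occlusion
            best = (loss, (gi, gj), masked)
    return best
```

In a full adversarial setup, a generator network would be trained to predict such masks and the classifier would then be trained on the masked features, which is how occluded appearances, rare in the training data, get over-represented.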
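The "precise pooling strategy" is not named in the abstract; it is most likely Precise RoI Pooling (PrRoI Pooling, Jiang et al., 2018), which ATOM also uses for box refinement, but that identification is an assumption. PrRoI Pooling treats the feature map as a continuous function through bilinear interpolation, f(x, y) = Σ w_ij · k(x − j) · k(y − i) with k(t) = max(0, 1 − |t|), and averages it over each output bin by exact integration instead of sampling a few points. Because k is separable, each grid value's contribution factorizes into two one-dimensional integrals of k. The function names below are hypothetical, and values outside the feature grid are treated as zero.

```python
from typing import List, Tuple

Map2D = List[List[float]]


def hat_integral(a: float, b: float, c: float) -> float:
    """Exact integral of the triangle kernel max(0, 1 - |t - c|) over [a, b]."""
    total = 0.0
    lo, hi = max(a, c - 1.0), min(b, c)  # rising half on [c - 1, c]: t - c + 1
    if hi > lo:
        total += ((hi - c + 1.0) ** 2 - (lo - c + 1.0) ** 2) / 2.0
    lo, hi = max(a, c), min(b, c + 1.0)  # falling half on [c, c + 1]: c + 1 - t
    if hi > lo:
        total += ((c + 1.0 - lo) ** 2 - (c + 1.0 - hi) ** 2) / 2.0
    return total


def precise_bin_average(feat: Map2D, x1: float, y1: float,
                        x2: float, y2: float) -> float:
    """Average of the bilinearly interpolated map over [x1, x2] x [y1, y2]."""
    area = (x2 - x1) * (y2 - y1)
    if area <= 0:
        return 0.0
    total = 0.0
    for i, row in enumerate(feat):
        wy = hat_integral(y1, y2, float(i))
        if wy == 0.0:
            continue
        for j, value in enumerate(row):
            total += value * wy * hat_integral(x1, x2, float(j))
    return total / area


def prroi_pool(feat: Map2D, box: Tuple[float, float, float, float],
               out_size: int = 2) -> Map2D:
    """Split box (x1, y1, x2, y2) into out_size x out_size bins and integrate each."""
    x1, y1, x2, y2 = box
    bw, bh = (x2 - x1) / out_size, (y2 - y1) / out_size
    return [[precise_bin_average(feat, x1 + q * bw, y1 + p * bh,
                                 x1 + (q + 1) * bw, y1 + (p + 1) * bh)
             for q in range(out_size)] for p in range(out_size)]
```

On a horizontal ramp map (value j in column j), pooling the box (1, 0, 3, 2) into 2x2 bins gives [[1.5, 2.5], [1.5, 2.5]], the exact mean of x over each bin. Since every bin value is an exact integral, the output varies continuously with the box coordinates, which is what allows trackers such as ATOM to refine boxes by gradient ascent on a predicted IoU.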

Author information

Authors and Affiliations

School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
Hua Bao & Ping Shu
Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Hefei, 230601, China
Hua Bao, Ping Shu & Qijun Wang
School of Computer Science and Technology, Anhui University, Hefei, 230601, China
Qijun Wang

Corresponding author

Correspondence to Qijun Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper presents results of research funded by the Provincial Natural Science Foundation of Anhui (No. 1908085MF217) and the Anhui Provincial Education Department Fund (No. KJ2019A0022).

About this article

Cite this article

Bao, H., Shu, P. & Wang, Q. Accurate visual representation learning for single object tracking. Multimed Tools Appl 81, 24059–24079 (2022). https://doi.org/10.1007/s11042-021-11736-9

Received: 06 May 2021
Revised: 17 September 2021
Accepted: 08 November 2021
Published: 19 March 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s11042-021-11736-9