
Accurate visual representation learning for single object tracking

Published in: Multimedia Tools and Applications

Abstract

As a fundamental visual task, single object tracking has witnessed astonishing improvements. However, many factors must still be addressed to achieve accurate tracking performance. Among them, visual representation is one of the most important, as it suffers from complex appearance changes. In this work, we propose a rich appearance representation learning strategy for tracking. First, by embedding a saliency feature extractor module, we improve the visual representation ability by fusing saliency information learned from different convolutional layers. Leveraging the lightweight convolutional neural network VGG-M as the feature extraction backbone, we obtain a robust appearance model from deep features rich in semantic information. Second, since the classifier provides significant complementary guidance for location prediction, we propose to generate diverse feature instances of the target by introducing an adversarial learning strategy. Given the generated diverse instances, many complex situations in the tracking process can be effectively simulated, especially occlusions, which follow a long-tailed distribution. Third, to optimize bounding box refinement, we employ a precise pooling strategy to obtain high-resolution feature maps. Our approach can thus capture subtle appearance changes effectively over a long time range. Finally, extensive experiments were conducted on several benchmark datasets, and the results demonstrate that the proposed approach performs favorably against many state-of-the-art algorithms.
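The paper itself contains no code. As a rough illustrative sketch of the adversarial instance-generation idea described above (all names are hypothetical, and a plain linear scorer stands in for the tracker's learned classifier), one can enumerate occlusion-style masks over a feature map and keep the masked instance that most degrades the classifier score, i.e. the hardest simulated occlusion:

```python
import numpy as np

def generate_occlusion_candidates(h, w, patch=3):
    """Enumerate binary masks that each zero out one patch of the
    feature map, mimicking partial occlusion of the target."""
    masks = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            m = np.ones((h, w))
            m[i:i + patch, j:j + patch] = 0.0
            masks.append(m)
    return masks

def hardest_occluded_instance(feat, weights):
    """Among all masked copies of `feat` (shape C x H x W), return the
    one with the lowest score under a linear classifier `weights`
    -- the 'hardest' positive used to robustify training."""
    c, h, w = feat.shape
    best_score, best_feat = np.inf, None
    for m in generate_occlusion_candidates(h, w):
        masked = feat * m              # mask broadcasts over channels
        score = float((masked * weights).sum())
        if score < best_score:
            best_score, best_feat = score, masked
    return best_feat, best_score
```

In the actual method, a trained generator network would propose the masks and the classifier would be updated against the selected hard instances; the exhaustive enumeration and linear scorer here are simplifications for illustration only.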



Author information

Corresponding author: Qijun Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is the result of a research project funded by the Provincial Natural Science Foundation of Anhui (No. 1908085MF217) and the Anhui Provincial Education Department Fund (No. KJ2019A0022).


About this article


Cite this article

Bao, H., Shu, P. & Wang, Q. Accurate visual representation learning for single object tracking. Multimed Tools Appl 81, 24059–24079 (2022). https://doi.org/10.1007/s11042-021-11736-9
