A robust visual tracking method via local feature extraction and saliency detection


Visual object tracking is a fundamental problem in computer vision, and it relies heavily on the features used to describe the object's appearance. In this paper, we present a robust tracking algorithm that exploits the locally adaptive regression kernel (LARK) feature. The proposed approach incorporates the LARK feature into a tracking-by-detection framework. In addition, we compute a target-specific saliency map as a LARK feature under the guidance of the tracking framework. The tracking problem is solved by maximizing an object location likelihood function, and we adopt the fast Fourier transform (FFT) for fast learning and detection. Extensive experiments on challenging videos show that the proposed algorithm performs favorably against state-of-the-art methods in terms of accuracy and robustness.
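To make the FFT-based learning and detection step concrete, the following is a minimal sketch of a generic single-channel correlation filter trained by ridge regression in the Fourier domain. It is not the method of this paper: the LARK feature and the saliency map are not computed here, a plain 2-D array stands in for the feature map, and the function names (`gaussian_response`, `train_filter`, `detect`) and the parameters `sigma` and `lam` are assumptions made for illustration. The peak of the response map plays the role of the maximizer of the location likelihood.

```python
import numpy as np


def gaussian_response(shape, sigma=2.0):
    """Desired filter output: a Gaussian whose peak sits at index (0, 0)."""
    h, w = shape
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    g = np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma ** 2))
    # Move the peak from the array centre to (0, 0), i.e. "zero displacement".
    return np.fft.ifftshift(g)


def train_filter(feature, response, lam=1e-2):
    """Closed-form ridge regression, solved element-wise in the Fourier domain."""
    X = np.fft.fft2(feature)
    Y = np.fft.fft2(response)
    # lam (hypothetical value) regularises frequencies with little energy.
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)


def detect(filter_hat, feature):
    """Apply the learned filter and return the displacement of the response peak."""
    Z = np.fft.fft2(feature)
    resp = np.real(np.fft.ifft2(filter_hat * Z))
    dy, dx = np.unravel_index(np.argmax(resp), resp.shape)
    h, w = resp.shape
    # The response is circular: indices past the midpoint are negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx), resp
```

As a usage check, training on a random patch and calling `detect` on a copy circularly shifted with `np.roll(patch, (3, -5), axis=(0, 1))` should recover that displacement. Because both training and detection reduce to element-wise operations on FFTs, the cost per frame is dominated by the transforms rather than by an exhaustive search over candidate locations.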




Author information



Corresponding author

Correspondence to Xian Wei.

Ethics declarations

Conflicts of interest

We thank the anonymous editor and reviewers for their careful reading and many insightful comments and suggestions. All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was jointly supported by the CAS Pioneer Hundred Talents Program (Type C) under Grant No. 2017-122, the National Science Fund for Young Scholars under Grant No. 61806186, and the National Natural Science Foundation of China (Nos. 61503173 and 61873246).

About this article

Cite this article

Wang, Y., Wei, X., Ding, L. et al. A robust visual tracking method via local feature extraction and saliency detection. Vis Comput 36, 683–700 (2020). https://doi.org/10.1007/s00371-019-01646-1


  • Visual object tracking
  • Locally adaptive regression kernel
  • Correlation filter tracking
  • Saliency detection