A robust visual tracking method via local feature extraction and saliency detection

  • Yong Wang
  • Xian Wei (corresponding author)
  • Lu Ding
  • Xiaoliang Tang
  • Huanlong Zhang
Original Article


Visual object tracking is a fundamental problem in computer vision, and its performance relies heavily on how the object's appearance is described. In this paper, we present a robust tracking algorithm that exploits the locally adaptive regression kernel (LARK) feature. The proposed approach formulates the LARK feature in a tracking-by-detection framework. In addition, we compute a target-specific saliency map as a LARK feature, guided by the tracking framework. The tracking problem is solved by maximizing an object location likelihood function, and we adopt the fast Fourier transform (FFT) for fast learning and detection. Extensive experimental results on challenging videos show that the proposed algorithm performs favorably against state-of-the-art methods in terms of accuracy and robustness.
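To make the FFT-based learning and detection step concrete, the following is a minimal sketch of a generic single-channel correlation filter in the spirit of MOSSE-style trackers. It is not the authors' implementation: it uses raw intensities where the paper uses LARK and saliency features, and the function names (`gaussian_response`, `train_filter`, `detect`) and the parameters `sigma` and `lam` are illustrative assumptions. The filter is learned in closed form in the Fourier domain, and the target is located at the maximum of the response map.

```python
import numpy as np


def gaussian_response(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


def train_filter(patch, target, lam=1e-2):
    """Closed-form ridge regression in the Fourier domain (one channel).

    Returns the conjugate filter H* = G . conj(F) / (F . conj(F) + lam),
    where all products and the division are element-wise.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)


def detect(filter_conj, patch):
    """Apply the filter to a new patch; return the peak location and the map."""
    response = np.real(np.fft.ifft2(filter_conj * np.fft.fft2(patch)))
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return (int(row), int(col)), response


# Hypothetical usage: learn on one patch, then locate a circularly shifted copy.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))            # stand-in for a feature map
filt = train_filter(patch, gaussian_response(patch.shape))
shifted = np.roll(patch, shift=(3, -5), axis=(0, 1))
peak, _ = detect(filt, shifted)
# The offset of `peak` from the patch centre estimates the target displacement.
```

Because training and detection reduce to element-wise operations on FFTs, the cost per frame is dominated by the transforms themselves, which is what makes this family of trackers fast. The shift here is circular, so the sketch ignores the boundary effects that real image patches introduce.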


  • Visual object tracking
  • Locally adaptive regression kernel
  • Correlation filter tracking
  • Saliency detection


Compliance with ethical standards

Conflicts of interest

We thank the anonymous editor and reviewers for their careful reading and many insightful comments and suggestions. All authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
  2. Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences (CAS), Fuzhou, China
  3. School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai, China
  4. College of Electric and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China
