Cross-Modal Ranking with Soft Consistency and Noisy Labels for Robust RGB-T Tracking

  • Chenglong Li
  • Chengli Zhu
  • Yan Huang
  • Jin Tang
  • Liang Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11217)

Abstract

Due to the complementary benefits of visible (RGB) and thermal infrared (T) data, RGB-T object tracking has recently attracted increasing attention for boosting performance under adverse illumination conditions. Existing RGB-T tracking methods usually localize a target object with a bounding box, in which the trackers or detectors are often affected by the inclusion of background clutter. To address this problem, this paper presents a novel approach to suppressing background effects for RGB-T tracking. Our approach relies on a novel cross-modal manifold ranking algorithm. First, we integrate soft cross-modality consistency into the ranking model, which allows sparse inconsistency to account for the different properties of the two modalities. Second, we propose an optimal query learning method to handle label noise in the queries. In particular, we introduce an intermediate variable to represent the optimal labels, and formulate its estimation as an \(\ell_1\)-optimization based sparse learning problem. Moreover, we propose a single unified optimization algorithm to solve the proposed model with stable and efficient convergence behavior. Finally, the ranking results are incorporated into patch-based object features to mitigate background effects, and a structured SVM is then adopted to perform RGB-T tracking. Extensive experiments on large-scale benchmark datasets suggest that the proposed approach performs well against state-of-the-art methods.
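To make the kind of model described above concrete, below is a minimal sketch of a two-graph manifold ranking with a soft cross-modality consistency slack. The objective form, the parameter names (mu, lam, gamma), and the alternating closed-form/soft-thresholding updates are illustrative assumptions, not the paper's exact formulation; in particular, the paper additionally learns optimal (denoised) queries and solves everything with a single unified algorithm, both of which this sketch omits.

```python
import numpy as np

def normalized_laplacian(W):
    """Symmetric normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated nodes
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.eye(W.shape[0]) - S

def soft_threshold(x, tau):
    """Proximal operator of the l1 norm: shrinks x toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def cross_modal_ranking(W_rgb, W_t, y, mu=1.0, lam=0.5, gamma=0.1, n_iter=50):
    """Rank graph nodes jointly in two modalities with *soft* consistency.

    Assumed illustrative objective (not the paper's exact model):
        sum_m f_m' L_m f_m + mu * ||f_m - y||^2
        + lam * ||f_rgb - f_t - e||^2 + gamma * ||e||_1
    The sparse slack e lets the two rankings disagree where the
    modalities genuinely differ, instead of forcing hard agreement.
    """
    n = len(y)
    L_rgb, L_t = normalized_laplacian(W_rgb), normalized_laplacian(W_t)
    f_rgb = y.astype(float)
    f_t = f_rgb.copy()
    e = np.zeros(n)
    I = np.eye(n)
    for _ in range(n_iter):
        # Closed-form update of each ranking vector (gradient set to zero).
        f_rgb = np.linalg.solve(L_rgb + (mu + lam) * I,
                                mu * y + lam * (f_t + e))
        f_t = np.linalg.solve(L_t + (mu + lam) * I,
                              mu * y + lam * (f_rgb - e))
        # Sparse inconsistency via an l1 proximal step (soft-thresholding).
        e = soft_threshold(f_rgb - f_t, gamma / (2.0 * lam))
    return f_rgb, f_t, e

# Toy usage: 5 patches per modality, with the query on node 0.
rng = np.random.default_rng(0)
W1 = rng.random((5, 5)); W1 = (W1 + W1.T) / 2; np.fill_diagonal(W1, 0)
W2 = rng.random((5, 5)); W2 = (W2 + W2.T) / 2; np.fill_diagonal(W2, 0)
y = np.array([1.0, 0, 0, 0, 0])
f_rgb, f_t, e = cross_modal_ranking(W1, W2, y)
```

Here the slack e plays the role of the abstract's "sparse inconsistency": on most nodes it is shrunk to zero, so the two rankings agree, while a few nonzero entries absorb modality-specific disagreement.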

Keywords

Visual tracking · Information fusion · Manifold ranking · Soft cross-modality consistency · Label optimization

Notes

Acknowledgment

This work is jointly supported by National Key Research and Development Program of China (2016YFB1001000), National Natural Science Foundation of China (61702002, 61472002, 61525306, 61633021, 61721004, 61420102015), Beijing Natural Science Foundation (4162058), Capital Science and Technology Leading Talent Training Project (Z181100006318030), China Postdoctoral Science Foundation, Natural Science Foundation of Anhui Province (1808085QF187), Natural Science Foundation of Anhui Higher Education Institution of China (KJ2017A017), and Co-Innovation Center for Information Supply & Assurance Technology, Anhui University.

Supplementary material

Supplementary material 1 (PDF, 17.5 MB): 474201_1_En_49_MOESM1_ESM.pdf


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China
  2. School of Computer Science and Technology, Anhui University, Hefei, China
