Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking

Abstract

Developing a real-time and robust RGB-T tracker is extremely challenging because the tracked object may suffer from both shared and modality-specific challenges in the RGB and thermal (T) modalities. In this work, we observe that implicit attribute information can boost model discriminability, and we propose a novel attribute-driven representation network to improve RGB-T tracking performance. First, according to the appearance changes in RGB-T tracking scenarios, we divide the shared and modality-specific challenges into four typical attributes: extreme illumination, occlusion, motion blur, and thermal crossover. Second, we design an attribute-driven residual branch for each heterogeneous attribute to mine the attribute-specific properties and thereby build a powerful residual representation for object modeling. Furthermore, we aggregate these representations at the channel and pixel levels with the proposed attribute ensemble network (AENet), which adaptively fits the attribute-agnostic tracking process. The AENet effectively perceives appearance changes while suppressing distractors. Finally, we conduct extensive experiments on three RGB-T tracking benchmarks to compare the proposed tracker with state-of-the-art methods. Experimental results show that our tracker achieves highly competitive results at a real-time tracking speed. Code will be available at https://github.com/zhang-pengyu/ADRNet.
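
As a rough illustration of the architecture the abstract describes, the sketch below shows how four attribute-specific residual branches could modulate a shared RGB-T feature, with an ensemble module that reweights the branch outputs at the channel and pixel levels. This is a minimal PyTorch sketch written from the abstract alone, not the authors' released code: the module and variable names, channel sizes, and the SE-style sigmoid/softmax gating are all assumptions (the name AENet is reused only as a label).

```python
import torch
import torch.nn as nn


class AttributeBranch(nn.Module):
    """One residual branch specialized for a single attribute,
    e.g. extreme illumination, occlusion, motion blur, or thermal crossover."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Produces an attribute-specific residual for the shared feature.
        return self.body(x)


class AENet(nn.Module):
    """Aggregates the attribute residuals with channel-wise and pixel-wise
    gates so inference stays attribute-agnostic (no attribute labels needed)."""

    def __init__(self, channels, num_attributes=4):
        super().__init__()
        self.num_attributes = num_attributes
        # Channel-level gate: SE-style weights per (branch, channel).
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels * num_attributes, channels * num_attributes,
                      kernel_size=1, groups=num_attributes),
            nn.Sigmoid(),
        )
        # Pixel-level gate: one spatial weighting map per branch.
        self.pixel_gate = nn.Conv2d(channels * num_attributes, num_attributes,
                                    kernel_size=1)

    def forward(self, residuals):
        stacked = torch.cat(residuals, dim=1)                  # (B, A*C, H, W)
        gated = stacked * self.channel_gate(stacked)           # channel reweighting
        maps = torch.softmax(self.pixel_gate(stacked), dim=1)  # (B, A, H, W)
        out = 0
        for i, r in enumerate(torch.chunk(gated, self.num_attributes, dim=1)):
            out = out + r * maps[:, i:i + 1]                   # pixel reweighting
        return out


class AttributeDrivenBlock(nn.Module):
    """Shared RGB-T feature plus the adaptively ensembled attribute residuals."""

    def __init__(self, channels=256, num_attributes=4):
        super().__init__()
        self.branches = nn.ModuleList(
            AttributeBranch(channels) for _ in range(num_attributes))
        self.ensemble = AENet(channels, num_attributes)

    def forward(self, feat):
        residuals = [branch(feat) for branch in self.branches]
        return feat + self.ensemble(residuals)


# Example: refine a fused 256-channel RGB-T feature map.
feat = torch.randn(1, 256, 25, 25)
out = AttributeDrivenBlock(256)(feat)
print(out.shape)  # torch.Size([1, 256, 25, 25])
```

The softmax over the per-branch spatial maps makes the pixel-level weights compete, which is one plausible way to emphasize the branch matching the current appearance change while suppressing the others; the actual ADRNet design may differ.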

Notes

  1. We exclude DAPNet (Zhu et al. 2019) from this comparison, as its original paper does not report tracking speed.

References

  1. Ak, K.E., Kassim, A.A., Lim, J.H., & Tham, J.Y. (2018). Learning attribute representations with localization for flexible fashion search. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7708–7717.

  2. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., & Torr, P.H.S. (2016). Fully-convolutional siamese networks for object tracking. In: European Conference on Computer Vision Workshop, pp. 850–865.

  3. Bhat, G., Danelljan, M., Gool, L.V., & Timofte, R. (2019). Learning discriminative model prediction for tracking. In: IEEE International Conference on Computer Vision, pp. 6182–6191.

  4. Bolme, D.S., Beveridge, J.R., Draper, B.A., & Lui, Y.M. (2010). Visual object tracking using adaptive correlation filters. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2544–2550.

  5. Camplani, M., Hannuna, S., Mirmehdi, M., Damen, D., Paiement, A., Tao, L., & Burghardt, T. (2015). Real-time RGB-D tracking with depth scaling kernelised correlation filters and occlusion handling. In: British Machine Vision Conference, pp. 1–11.

  6. Chen, Z., Zhong, B., Li, G., Zhang, S., & Ji, R. (2020). Siamese box adaptive network for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6668–6677.

  7. Danelljan, M., Bhat, G., Khan, F.S., & Felsberg, M. (2017). ECO: Efficient convolution operators for tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6638–6646.

  8. Danelljan, M., Bhat, G., Khan, F.S., & Felsberg, M. (2019). ATOM: Accurate tracking by overlap maximization. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4660–4669.

  9. Danelljan, M., Gool, L.V., & Timofte, R. (2020). Probabilistic regression for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7183–7192.

  10. Danelljan, M., Hager, G., Khan, F. S., & Felsberg, M. (2017). Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(8), 1561–1575.

  11. Ding, P., & Song, Y. (2015). Robust object tracking using color and depth images with a depth based occlusion handling and recovery. In: International Conference on Fuzzy Systems and Knowledge Discovery, pp. 930–935.

  12. Feng, Q., Ablavsky, V., Bai, Q., & Sclaroff, S. (2019). Robust visual object tracking with natural language region proposal network. CoRR abs/1912.02048.

  13. Feng, Q., Ablavsky, V., Bai, Q., Li, G., & Sclaroff, S. (2020). Real-time visual object tracking with natural language description. In: IEEE Winter Conference on Applications of Computer Vision, pp. 700–709.

  14. Gao, Y., Li, C., Zhu, Y., Tang, J., He, T., & Wang, F. (2019). Deep adaptive fusion network for high performance RGBT tracking. In: IEEE International Conference on Computer Vision Workshop, pp. 1–9.

  15. Hu, J., Shen, L., Albanie, S., Sun, G., & Wu, E. (2018). Squeeze-and-excitation networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141.

  16. Jiang, B., Luo, R., Mao, J., Xiao, T., & Jiang, Y. (2018). Acquisition of localization confidence for accurate object detection. In: European Conference on Computer Vision, pp. 784–799.

  17. Jung, I., Son, J., Baek, M., & Han, B. (2018). Real-time MDNet. In: European Conference on Computer Vision, pp. 83–98.

  18. Kart, U., Kamarainen, J.K., & Matas, J. (2018). How to make an RGBD tracker? In: European Conference on Computer Vision Workshop, pp. 1–15.

  19. Kart, U., Kamarainen, J.K., Matas, J., Fan, L., & Cricri, F. (2018). Depth masked discriminative correlation filter. In: International Conference on Pattern Recognition, pp. 2112–2117.

  20. Kart, U., Lukezic, A., Kristan, M., Kamarainen, J.K., & Matas, J. (2019). Object tracking by reconstruction with view-specific discriminative correlation filters. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1339–1348.

  21. Kim, D. Y., & Jeon, M. (2014). Data fusion of radar and image measurements for multi-object tracking via Kalman filtering. Information Sciences, 278, 641–652.

  22. Kristan, M., Matas, J., Leonardis, A., Felsberg, M., et al. (2019). The seventh visual object tracking VOT2019 challenge results. In: IEEE International Conference on Computer Vision Workshop, pp. 1–36.

  23. Lan, X., Ye, M., Zhang, S., & Yuen, P.C. (2018). Robust collaborative discriminative learning for RGB-infrared tracking. In: AAAI Conference on Artificial Intelligence, pp. 1–8.

  24. Lan, X., Ye, M., Zhang, S., Zhou, H., & Yuen, P.C. (2018). Modality-correlation-aware sparse representation for RGB-infrared object tracking. Pattern Recognition Letters.

  25. Lan, X., Ye, M., Shao, R., & Zhong, B. (2019). Online non-negative multi-modality feature template learning for RGB-assisted infrared tracking. IEEE Access, 7, 67761–67771.

  26. Lan, X., Ye, M., Shao, R., Zhong, B., Yuen, P. C., & Zhou, H. (2019). Learning modality-consistency feature templates: A robust RGB-Infrared tracking system. IEEE Transactions on Industrial Electronics, 66(12), 9887–9897.

  27. Li, Y., & Zhu, J. (2014). A scale adaptive kernel correlation filter tracker with feature integration. In: European Conference on Computer Vision, pp. 254–265.

  28. Li, C., Hu, S., Gao, S., & Tang, J. (2016). Real-time grayscale-thermal tracking via Laplacian sparse representation. In: International Conference on Multimedia Modeling, pp. 54–65.

  29. Li, C., Liang, X., Lu, Y., Zhao, N., & Tang, J. (2019). RGB-T object tracking: Benchmark and baseline. Pattern Recognition, 96, 106977.

  30. Li, C., Liu, L., Lu, A., Ji, Q., & Tang, J. (2020). Challenge-aware RGBT tracking. In: European Conference on Computer Vision, pp. 222–237.

  31. Li, C., Lu, A., Zheng, A., Tu, Z., & Tang, J. (2019). Multi-adapter RGBT tracking. In: IEEE International Conference on Computer Vision Workshop, pp. 2262–2270.

  32. Li, Z., Tao, R., Gavves, E., Snoek, C.G., & Smeulders, A.W. (2017). Tracking by natural language specification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6495–6503.

  33. Li, B., Yan, J., Wu, W., Zhu, Z., & Hu, X. (2018). High performance visual tracking with siamese region proposal network. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 8971–8980.

  34. Li, C., Zhao, N., Lu, Y., Zhu, C., & Tang, J. (2017). Weighted sparse representation regularized graph learning for RGB-T object tracking. In: ACM International Conference on Multimedia, pp. 1856–1864.

  35. Li, C., Zhu, C., Huang, Y., Tang, J., & Wang, L. (2018). Cross-modal ranking with soft consistency and noisy labels for robust RGB-T tracking. In: European Conference on Computer Vision, pp. 808–823.

  36. Li, C., Cheng, H., Hu, S., Liu, X., Tang, J., & Lin, L. (2016). Learning collaborative sparse representation for grayscale-thermal tracking. IEEE Transactions on Image Processing, 25(12), 5743–5756.

  37. Liu, H., & Sun, F. (2012). Fusion tracking in color and infrared images using joint sparse representation. Science China Information Sciences, 55(3), 590–599.

  38. Li, C., Wu, X., Zhao, N., Cao, X., & Tang, J. (2018). Fusing two-stream convolutional neural networks for RGB-T object tracking. Neurocomputing, 281, 78–85.

  39. Luo, C., Sun, B., Yang, K., Lu, T., & Yeh, W. C. (2019). Thermal infrared and visible sequences fusion tracking based on a hybrid tracking framework with adaptive weighting scheme. Infrared Physics & Technology, 99, 265–276.

  40. Lu, H., & Wang, D. (2019). Online Visual Tracking. Berlin: Springer.

  41. Megherbi, N., Ambellouis, S., Colot, O., & Cabestaing, F. (2005). Joint audio-video people tracking using belief theory. In: IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 135–140.

  42. Nam, H., & Han, B. (2016). Learning multi-domain convolutional neural networks for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4293–4302.

  43. Ning, J., Yang, J., Jiang, S., Zhang, L., & Yang, M.H. (2016). Object tracking via dual linear structured SVM and explicit feature map. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4266–4274.

  44. Qi, Y., Zhang, S., Zhang, W., Su, L., Huang, Q., & Yang, M.H. (2019). Learning attribute-specific representations for visual tracking. In: AAAI Conference on Artificial Intelligence, pp. 8835–8842.

  45. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241.

  46. Hong, S., You, T., Kwak, S., & Han, B. (2015). Online tracking by learning discriminative saliency map with convolutional neural network. In: International Conference on Machine Learning, pp. 597–606.

  47. Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations, pp. 1–14.

  48. Song, X., Zhao, H., Cui, J., Shao, X., Shibasaki, R., & Zha, H. (2013). An online system for multiple interacting targets tracking: Fusion of laser and vision, tracking and learning. ACM Transactions on Intelligent Systems and Technology, 4(1), 1–21.

  49. Voigtlaender, P., Luiten, J., Torr, P.H., & Leibe, B. (2020). Siam R-CNN: Visual tracking by re-detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6578–6588.

  50. Wang, N., & Yeung, D.Y. (2013). Learning a deep compact image representation for visual tracking. In: Advances in Neural Information Processing Systems, pp. 1–9.

  51. Wang, C., Xu, C., Cui, Z., Zhou, L., Zhang, T., Zhang, X., & Yang, J. (2020). Cross-modal pattern-propagation for RGB-T tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7064–7073.

  52. Wang, Z., Xu, J., Liu, L., Zhu, F., & Shao, L. (2019). RANet: Ranking attention network for fast video object segmentation. In: IEEE International Conference on Computer Vision, pp. 3978–3987.

  53. Wang, D., Lu, H., Xiao, Z., & Yang, M. H. (2015). Inverse sparse tracker with a locally weighted distance metric. IEEE Transactions on Image Processing, 24(9), 2446–2457.

  54. Wang, W., Yan, Y., Winkler, S., & Sebe, N. (2016). Category specific dictionary learning for attribute specific feature selection. IEEE Transactions on Image Processing, 25(3), 1465–1478.

  55. Woo, S., Park, J., Lee, J.Y., & Kweon, I.S. (2018). CBAM: Convolutional block attention module. In: European Conference on Computer Vision, pp. 3–19.

  56. Wu, Y., Blasch, E., Chen, G., Bai, L., & Ling, H. (2011). Multiple source data fusion via sparse representation for robust visual tracking. In: International Conference on Information Fusion, pp. 1–8.

  57. Xu, Y., Wang, Z., Li, Z., Yuan, Y., & Yu, G. (2020). SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines. In: AAAI Conference on Artificial Intelligence, pp. 12549–12556.

  58. Yang, Z., Kumar, T., Chen, T., Su, J., & Luo, J. (2020). Grounding-tracking-integration. IEEE Transactions on Circuits and Systems for Video Technology.

  59. Yang, R., Zhu, Y., Wang, X., Li, C., & Tang, J. (2019). Learning target-oriented dual attention for robust RGB-T tracking. In: IEEE International Conference on Image Processing, pp. 1–8.

  60. Yu, Y., Xiong, Y., Huang, W., & Scott, M.R. (2020). Deformable siamese attention networks for visual object tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6728–6737.

  61. Zhai, S., Shao, P., Liang, X., & Wang, X. (2019). Fast RGB-T tracking via cross-modal correlation filters. Neurocomputing, 334, 172–181.

  62. Zhang, T., Ghanem, B., Liu, S., & Ahuja, N. (2012). Low-rank sparse learning for robust visual tracking. In: European Conference on Computer Vision, pp. 470–484.

  63. Zhang, X., Zhang, X., Du, X., Zhou, X., & Yin, J. (2018). Learning multi-domain convolutional network for RGB-T visual tracking. In: International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, pp. 1–6.

  64. Zhang, H., Zhang, L., Zhuo, L., & Zhang, J. (2020). Object tracking in RGB-T videos using modal-aware attention network and competitive learning. Sensors, 20(2).

  65. Zhang, P., Zhao, J., Wang, D., Lu, H., & Yang, X. (2021). Jointly modeling motion and appearance cues for robust RGB-T tracking. IEEE Transactions on Image Processing, 30, 3335–3347.

  66. Zhang, Z., & Peng, H. (2019). Deeper and wider siamese networks for real-time visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4591–4600.

  67. Zhu, Y., Li, C., Lu, Y., Lin, L., Luo, B., & Tang, J. (2018). FANet: Quality-aware feature aggregation network for RGB-T tracking. CoRR abs/1811.09855.

  68. Zhu, Y., Li, C., Luo, B., Tang, J., & Wang, X. (2019). Dense feature aggregation and pruning for RGBT tracking. In: ACM International Conference on Multimedia, pp. 465–472.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62022021, 61806037, 61872056, and 61725202; in part by the Science and Technology Innovation Foundation of Dalian under Grant 2020JJ26GX036; and in part by the Fundamental Research Funds for the Central Universities under Grant DUT21LAB127.

Author information

Corresponding author

Correspondence to Dong Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Dong Xu.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 30142 KB)

About this article

Cite this article

Zhang, P., Wang, D., Lu, H. et al. Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking. Int J Comput Vis 129, 2714–2729 (2021). https://doi.org/10.1007/s11263-021-01495-3

Keywords

  • Object tracking
  • RGB-T tracking
  • Deep learning