Abstract
Visual object tracking remains an active research field in computer vision because of persistent challenges posed by various problem-specific factors in real-world scenes. Many existing tracking methods based on discriminative correlation filters (DCFs) employ feature extraction networks (FENs) to model the target appearance during the learning process. However, the use of deep feature maps extracted from FENs based on different residual neural networks (ResNets) has not previously been investigated. This paper evaluates the performance of 12 state-of-the-art ResNet-based FENs in a DCF-based framework to determine which is best suited to visual tracking. First, it ranks their best feature maps and examines whether the best ResNet-based FEN can be adopted more generally in another DCF-based method. Then, the proposed method extracts deep semantic information from a fully convolutional FEN and fuses it with the best ResNet-based feature maps to strengthen the target representation in the learning process of continuous convolution filters. Finally, it introduces a new and efficient semantic weighting method (using semantic segmentation feature maps on each video frame) to reduce the drift problem. Extensive experimental results on the well-known OTB-2013, OTB-2015, TC-128, UAV-123 and VOT-2018 visual tracking datasets demonstrate that the proposed method outperforms state-of-the-art methods in terms of tracking precision and robustness.
Acknowledgements
This work was partly supported by a grant (No. 96013046) from the Iran National Science Foundation (INSF).
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
About this article
Cite this article
Marvasti-Zadeh, S.M., Ghanei-Yakhdan, H., Kasaei, S. et al. Effective fusion of deep multitasking representations for robust visual tracking. Vis Comput 38, 4397–4417 (2022). https://doi.org/10.1007/s00371-021-02304-1
DOI: https://doi.org/10.1007/s00371-021-02304-1