Learning Regression and Verification Networks for Robust Long-term Tracking

Abstract

This paper proposes a new visual tracking algorithm that combines the merits of template matching approaches and classification models for long-term object detection and tracking. A regression network is learned offline to detect a set of target candidates through target template matching. To cope with target appearance variations in long-term scenarios, a target-aware feature fusion mechanism is also developed, which makes template matching more effective. Meanwhile, a verification network is trained online to better capture target appearance and to identify the target among the candidates. During online updates, a monitoring module filters out contaminated training samples, alleviating model degeneration caused by error accumulation. The regression and verification networks operate as a cascade, so that tracking proceeds in a coarse-to-fine manner with improved discriminative power. To further address target reappearance in long-term tracking, a learning-based switching scheme is proposed, which learns to switch the tracking mode between local and global search based on the tracking results. Extensive evaluations on long-term tracking in the wild have been conducted. We achieve state-of-the-art performance on the OxUvA long-term tracking dataset. Our submission based on the proposed method also won first place in the long-term tracking challenge of the VOT-2018 competition.
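
The cascade and mode switching described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `propose` and `verify` are hypothetical stand-ins for the regression and verification networks, and the fixed rule based on `verify_thresh` and `max_misses` is an assumed substitute for the learning-based switching scheme, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h); layout is an assumption


@dataclass
class TrackerState:
    box: Optional[Box]   # last verified target box, None if never verified
    mode: str = "local"  # "local" or "global" search
    misses: int = 0      # consecutive frames without a verified target


def track_frame(
    state: TrackerState,
    propose: Callable[[str, Optional[Box]], List[Tuple[Box, float]]],
    verify: Callable[[Box], float],
    verify_thresh: float = 0.5,  # hypothetical value
    max_misses: int = 3,         # hypothetical value
) -> TrackerState:
    """Process one frame: coarse candidate proposal, then fine verification."""
    # Coarse stage: stand-in for the regression network, which returns
    # (box, matching score) candidates for the current search mode.
    candidates = propose(state.mode, state.box)

    # Fine stage: stand-in for the verification network scores each candidate.
    scored = [(verify(box), box) for box, _ in candidates]
    best_score, best_box = max(scored, key=lambda s: s[0], default=(0.0, None))

    if best_box is not None and best_score >= verify_thresh:
        # Target verified: accept the box and search locally next frame.
        return TrackerState(box=best_box, mode="local", misses=0)

    # Target not verified: keep the last box and, after enough consecutive
    # misses, fall back to global search (fixed rule, not a learned one).
    misses = state.misses + 1
    mode = "global" if misses >= max_misses else state.mode
    return TrackerState(box=state.box, mode=mode, misses=misses)
```

Calling `track_frame` once per frame with the previous state keeps the tracker in local search while the verifier accepts a candidate, and switches it to global search after `max_misses` consecutive rejections.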

Author information

Corresponding author

Correspondence to Lijun Wang.

Additional information

Communicated by Mei Chen.

About this article

Cite this article

Zhang, Y., Wang, L., Wang, D. et al. Learning Regression and Verification Networks for Robust Long-term Tracking. Int J Comput Vis 129, 2536–2547 (2021). https://doi.org/10.1007/s11263-021-01487-3

Keywords

  • Long-term visual tracking
  • Regression network
  • Classification network