
Learning Feature Embeddings for Discriminant Model Based Tracking

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12360)

Abstract

After observing that the features used by most online discriminatively trained trackers are not optimal, in this paper we propose a novel and effective architecture to learn optimal feature embeddings for online discriminative tracking. Our method, called DCFST, integrates into convolutional neural networks the solver of a discriminant model that is differentiable and has a closed-form solution. The resulting network can then be trained end to end, yielding feature embeddings that are optimal for the discriminant model-based tracker. As an instance, we apply the popular ridge regression model in this work to demonstrate the power of DCFST. Extensive experiments on six public benchmarks, OTB2015, NFS, GOT10k, TrackingNet, VOT2018, and VOT2019, show that our approach is efficient and generalizes well to class-agnostic target objects in online tracking, and thus achieves state-of-the-art accuracy while running beyond real-time speed. Code will be made available.
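The core building block the abstract describes is a ridge regression solver whose closed-form solution is a differentiable function of the input features, so it can sit inside a network and pass gradients back to the feature embedding. A minimal sketch of such a solver, using numpy and hypothetical names (`ridge_solve`, `lam`) not taken from the paper:

```python
import numpy as np

def ridge_solve(X, y, lam=1.0):
    """Closed-form ridge solution w = (X^T X + lam*I)^{-1} X^T y.

    Because this is a smooth function of X, a framework with autograd
    (e.g., a torch.linalg.solve in place of np.linalg.solve) could
    backpropagate through it to the feature embedding that produced X.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)  # regularized normal equations
    return np.linalg.solve(A, X.T @ y)

# Toy usage: with noiseless linear targets and negligible regularization,
# the solver recovers the true weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true
w = ridge_solve(X, y, lam=1e-8)
```

In a tracker, `X` would hold backbone features of training samples and `y` their regression labels; the paper's contribution is training the backbone so that this solve produces a strong tracker, not the solver itself.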


Notes

1.

In this paper, offline training refers to training deep convolutional neural networks, that is, the process of learning feature embeddings, whereas discriminative training refers to training discriminant models, such as ridge regression and SVM. In our approach, each iteration of the offline training involves discriminative training.
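The two nested training levels in this footnote can be sketched as a single offline iteration: the inner discriminative training (a closed-form ridge solve) is fit on one set of samples, and the offline loss that drives the embedding update is measured on held-out samples. All names here (`embed`, `P`, `lam`) are hypothetical stand-ins, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def embed(images, P):
    # Stand-in for the CNN feature embedding; P plays the role of the
    # network weights being learned offline.
    return np.tanh(images @ P)

def ridge_solve(X, y, lam=0.1):
    # Inner discriminative training: closed-form ridge regression.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# One offline-training iteration on a toy "sequence":
P = rng.standard_normal((8, 4))
train_imgs = rng.standard_normal((30, 8))
test_imgs = rng.standard_normal((10, 8))
train_labels = rng.standard_normal(30)
test_labels = rng.standard_normal(10)

w = ridge_solve(embed(train_imgs, P), train_labels)  # discriminative training
pred = embed(test_imgs, P) @ w                       # apply the learned model
offline_loss = np.mean((pred - test_labels) ** 2)    # would drive updates of P
```

In an end-to-end setup the gradient of `offline_loss` with respect to `P` would flow through the ridge solve, which is exactly what makes the closed-form, differentiable solver useful.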

2.

We state the relationship between DCFST and ATOM in the supplementary material.



Acknowledgements

This work was supported by the Research and Development Projects in the Key Areas of Guangdong Province (No. 2020B010165001), and by the National Natural Science Foundation of China under Grants 61772527, 61976210, 61806200, 61702510, and 61876086.

Author information


Corresponding author

Correspondence to Linyu Zheng.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3452 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zheng, L., Tang, M., Chen, Y., Wang, J., Lu, H. (2020). Learning Feature Embeddings for Discriminant Model Based Tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12360. Springer, Cham. https://doi.org/10.1007/978-3-030-58555-6_45


  • DOI: https://doi.org/10.1007/978-3-030-58555-6_45


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58554-9

  • Online ISBN: 978-3-030-58555-6

  • eBook Packages: Computer Science, Computer Science (R0)
