
Learning Feature Embeddings for Discriminant Model Based Tracking

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12360)

Abstract

After observing that the features used by most online discriminatively trained trackers are not optimal, in this paper we propose a novel and effective architecture to learn optimal feature embeddings for online discriminative tracking. Our method, called DCFST, integrates into convolutional neural networks the solver of a discriminant model that is differentiable and has a closed-form solution. The resulting network can then be trained end to end, yielding feature embeddings that are optimal for the discriminant model-based tracker. As an instance, we apply the popular ridge regression model in this work to demonstrate the power of DCFST. Extensive experiments on six public benchmarks, OTB2015, NFS, GOT10k, TrackingNet, VOT2018, and VOT2019, show that our approach is efficient and generalizes well to class-agnostic target objects in online tracking, and thus achieves state-of-the-art accuracy while running beyond real-time speed. Code will be made available.
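The core building block the abstract describes is a ridge regression solver whose closed-form solution is a differentiable function of the input features, so it can sit inside a network and pass gradients back to the feature embedding. A minimal sketch of such a solver, using numpy and hypothetical names (`ridge_solve`, `lam`) not taken from the paper:

```python
import numpy as np

def ridge_solve(X, y, lam=1.0):
    """Closed-form ridge solution w = (X^T X + lam*I)^{-1} X^T y.

    Because this is a smooth function of X, a framework with autograd
    (e.g., a torch.linalg.solve in place of np.linalg.solve) could
    backpropagate through it to the feature embedding that produced X.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)  # regularized normal equations
    return np.linalg.solve(A, X.T @ y)

# Toy usage: with noiseless linear targets and negligible regularization,
# the solver recovers the true weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true
w = ridge_solve(X, y, lam=1e-8)
```

In a tracker, `X` would hold backbone features of training samples and `y` their regression labels; the paper's contribution is training the backbone so that this solve produces a strong tracker, not the solver itself.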


Notes

1.

In this paper, offline training refers to training deep convolutional neural networks, that is, the process of learning feature embeddings, whereas discriminative training refers to training discriminant models, such as ridge regression and SVM. In our approach, each iteration of the offline training involves discriminative training.
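The two nested training levels in this footnote can be sketched as a single offline iteration: the inner discriminative training (a closed-form ridge solve) is fit on one set of samples, and the offline loss that drives the embedding update is measured on held-out samples. All names here (`embed`, `P`, `lam`) are hypothetical stand-ins, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def embed(images, P):
    # Stand-in for the CNN feature embedding; P plays the role of the
    # network weights being learned offline.
    return np.tanh(images @ P)

def ridge_solve(X, y, lam=0.1):
    # Inner discriminative training: closed-form ridge regression.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# One offline-training iteration on a toy "sequence":
P = rng.standard_normal((8, 4))
train_imgs = rng.standard_normal((30, 8))
test_imgs = rng.standard_normal((10, 8))
train_labels = rng.standard_normal(30)
test_labels = rng.standard_normal(10)

w = ridge_solve(embed(train_imgs, P), train_labels)  # discriminative training
pred = embed(test_imgs, P) @ w                       # apply the learned model
offline_loss = np.mean((pred - test_labels) ** 2)    # would drive updates of P
```

In an end-to-end setup the gradient of `offline_loss` with respect to `P` would flow through the ridge solve, which is exactly what makes the closed-form, differentiable solver useful.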

2.

We state the relationship between DCFST and ATOM in the supplementary material.



Acknowledgements

This work was supported by the Research and Development Projects in the Key Areas of Guangdong Province (No. 2020B010165001), and by the National Natural Science Foundation of China under Grants 61772527, 61976210, 61806200, 61702510, and 61876086.

Author information


Corresponding author

Correspondence to Linyu Zheng.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3452 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zheng, L., Tang, M., Chen, Y., Wang, J., Lu, H. (2020). Learning Feature Embeddings for Discriminant Model Based Tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12360. Springer, Cham. https://doi.org/10.1007/978-3-030-58555-6_45


  • DOI: https://doi.org/10.1007/978-3-030-58555-6_45


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58554-9

  • Online ISBN: 978-3-030-58555-6

  • eBook Packages: Computer Science, Computer Science (R0)
