Abstract
Anchor-based Siamese trackers have achieved remarkable advances in accuracy, yet further improvement is restricted by their lagging tracking robustness. We find that the underlying reason is that the regression network in anchor-based methods is trained only on positive anchor boxes (i.e., \(IoU \ge 0.6\)). This mechanism makes it difficult to refine anchors whose overlap with the target object is small. In this paper, we propose a novel object-aware anchor-free network to address this issue. First, instead of refining reference anchor boxes, we directly predict the position and scale of the target object in an anchor-free fashion. Since every pixel inside the ground-truth box is well trained, the tracker is capable of rectifying inexact predictions of the target object during inference. Second, we introduce a feature alignment module to learn an object-aware feature from the predicted bounding boxes. The object-aware feature further contributes to the classification of the target object versus the background. Moreover, we present a novel tracking framework based on the anchor-free model. Experiments show that our anchor-free tracker achieves state-of-the-art performance on five benchmarks, including VOT-2018, VOT-2019, OTB-100, GOT-10k and LaSOT. The source code is available at https://github.com/researchmm/TracKit.
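The snippet below is a minimal PyTorch-style sketch of the anchor-free regression idea summarized above: every pixel inside the ground-truth box directly regresses the distances to the four box sides, so no anchor boxes (and no \(IoU \ge 0.6\) positive sampling) are involved. The class name AnchorFreeHead, the channel size, and the stride are illustrative assumptions, not the authors' exact Ocean architecture (see the released TracKit code for that).

```python
# Minimal sketch of anchor-free per-pixel box regression (assumed layout, not the paper's exact head).
import torch
import torch.nn as nn


class AnchorFreeHead(nn.Module):
    def __init__(self, in_channels=256, stride=8):
        super().__init__()
        self.stride = stride
        # classification branch: target-vs-background score per spatial location
        self.cls = nn.Conv2d(in_channels, 1, kernel_size=3, padding=1)
        # regression branch: distances (left, top, right, bottom) to the box sides per location
        self.reg = nn.Conv2d(in_channels, 4, kernel_size=3, padding=1)

    def forward(self, feat):
        cls_score = self.cls(feat)        # (B, 1, H, W)
        ltrb = torch.exp(self.reg(feat))  # (B, 4, H, W), exp keeps distances positive
        return cls_score, ltrb


def decode_boxes(ltrb, stride=8):
    """Convert per-pixel (l, t, r, b) distances into (x1, y1, x2, y2) boxes
    in search-image coordinates."""
    batch, _, h, w = ltrb.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cx = (xs.float() + 0.5) * stride  # centers of the feature cells on the search image
    cy = (ys.float() + 0.5) * stride
    l, t, r, b = ltrb.unbind(dim=1)
    return torch.stack([cx - l, cy - t, cx + r, cy + b], dim=1)  # (B, 4, H, W)
```

Because every location inside the ground-truth box receives a regression target during training, a location whose initial estimate is inexact can still produce a usable box at inference time, which is the robustness benefit the abstract points to.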
Z. Zhang: Work performed while Zhipeng Zhang was an intern at Microsoft Research.
Z. Zhang, B. Li, and W. Hu are with the Institute of Automation, Chinese Academy of Sciences (CASIA), the School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), and the CAS Center for Excellence in Brain Science and Intelligence Technology (CEBSIT).
Acknowledgement
This work is supported by the Natural Science Foundation of China (Grant Nos. U1636218, 61751212, 61721004, 61902401, 61972071), the Natural Science Foundation of Beijing (Grant No. L172051), the CAS Key Research Program of Frontier Sciences (Grant No. QYZDJ-SSW-JSC040), and the Natural Science Foundation of Guangdong (No. 2018B030311046).
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Zhang, Z., Peng, H., Fu, J., Li, B., Hu, W. (2020). Ocean: Object-Aware Anchor-Free Tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_46
DOI: https://doi.org/10.1007/978-3-030-58589-1_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58588-4
Online ISBN: 978-3-030-58589-1