Ocean: Object-Aware Anchor-Free Tracking

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12366)


Anchor-based Siamese trackers have achieved remarkable advances in accuracy, yet further improvement is limited by their lagging robustness. We find that the underlying reason is that the regression network in anchor-based methods is trained only on positive anchor boxes (i.e., \(IoU\ge 0.6\)). This mechanism makes it difficult to refine anchors whose overlap with the target object is small. In this paper, we propose a novel object-aware anchor-free network to address this issue. First, instead of refining reference anchor boxes, we directly predict the position and scale of the target object in an anchor-free fashion. Since every pixel inside the groundtruth box is used for training, the tracker is able to rectify inexact predictions of the target object during inference. Second, we introduce a feature alignment module that learns an object-aware feature from the predicted bounding boxes. The object-aware feature further improves the classification of target objects versus background. Moreover, we present a novel tracking framework based on the anchor-free model. Experiments show that our anchor-free tracker achieves state-of-the-art performance on five benchmarks: VOT-2018, VOT-2019, OTB-100, GOT-10k and LaSOT. The source code is available at
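To make the anchor-free idea concrete, the following is a minimal sketch of how per-pixel regression targets could be built when every location inside the groundtruth box is treated as a training sample. It is an illustration in the style of FCOS-like detectors, not the authors' implementation: the function names `anchor_free_targets` and `decode`, the grid `size`, the `stride`, and the choice of four side distances as the regression target are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]    # (x0, y0, x1, y1) in image pixels
Dists = Tuple[float, float, float, float]  # (left, top, right, bottom)


def anchor_free_targets(box: Box, size: int, stride: int) -> List[List[Optional[Dists]]]:
    """Build regression targets on a size x size grid (hypothetical helper).

    Grid cell (row i, column j) is mapped to the image point
    (j * stride + stride // 2, i * stride + stride // 2). A location that
    falls inside the box gets its distances to the four box sides as the
    target; a location outside the box gets None (no regression loss).
    """
    x0, y0, x1, y1 = box
    targets: List[List[Optional[Dists]]] = []
    for i in range(size):
        row: List[Optional[Dists]] = []
        for j in range(size):
            px = j * stride + stride // 2
            py = i * stride + stride // 2
            dists = (px - x0, py - y0, x1 - px, y1 - py)
            # All four distances are non-negative only inside the box.
            row.append(dists if min(dists) >= 0 else None)
        targets.append(row)
    return targets


def decode(px: float, py: float, dists: Dists) -> Box:
    """Recover a box from a location and its predicted side distances."""
    left, top, right, bottom = dists
    return (px - left, py - top, px + right, py + bottom)


if __name__ == "__main__":
    grid = anchor_free_targets(box=(10, 10, 30, 30), size=5, stride=8)
    positives = sum(cell is not None for row in grid for cell in row)
    print("locations with a regression target:", positives)
    print("box decoded from cell (1, 1):", decode(12, 12, grid[1][1]))
```

In this sketch, any location inside the box can decode the full box on its own, with no dependence on overlap with a predefined anchor. This is one way to read the abstract's claim that a tracker trained this way can correct predictions that start from a weakly overlapping position.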


Keywords: Visual tracking, Anchor-free, Object-aware



This work is supported by the Natural Science Foundation of China (Grant Nos. U1636218, 61751212, 61721004, 61902401, 61972071), the Natural Science Foundation of Beijing (Grant No. L172051), the CAS Key Research Program of Frontier Sciences (Grant No. QYZDJ-SSW-JSC040), and the Natural Science Foundation of Guangdong (No. 2018B030311046).




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. NLPR, CASIA and AI School, UCAS and CEBSIT, Beijing, China
  2. Microsoft Research, Beijing, China
