CLNet: A Compact Latent Network for Fast Adjusting Siamese Trackers

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12365)


In this paper, we provide a deep analysis for Siamese-based trackers and find that the one core reason for their failure on challenging cases can be attributed to the problem of decisive samples missing during offline training. Furthermore, we notice that the samples given in the first frame can be viewed as the decisive samples for the sequence since they contain rich sequence-specific information. To make full use of these sequence-specific samples, we propose a compact latent network to quickly adjust the tracking model to adapt to new scenes. A statistic-based compact latent feature is proposed to efficiently capture the sequence-specific information for the fast adjustment. In addition, we design a new training approach based on a diverse sample mining strategy to further improve the discrimination ability of our compact latent network. To evaluate the effectiveness of our method, we apply it to adjust a recent state-of-the-art tracker, SiamRPN++. Extensive experimental results on five recent benchmarks demonstrate that the adjusted tracker achieves promising improvement in terms of tracking accuracy, with almost the same speed. The code and models are available at


Siamese tracker Latent feature Sequence-specific Sample mining Fast adjustment 


  1. 1.
    Andrychowicz, M., et al.: Learning to learn by gradient descent by gradient descent. In: NeurIPS (2016)Google Scholar
  2. 2.
    Ba, J., Hinton, G.E., Mnih, V., Leibo, J.Z., Ionescu, C.: Using fast weights to attend to the recent past. In: NeurIPS (2016)Google Scholar
  3. 3.
    Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). Scholar
  4. 4.
    Choi, J., Kwon, J., Lee, K.M.: Deep meta learning for real-time target-aware visual tracking. In: ICCV (2019)Google Scholar
  5. 5.
    Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M., et al.: Eco: efficient convolution operators for tracking. In: CVPR (2017)Google Scholar
  6. 6.
    Danelljan, M., Robinson, A., Shahbaz Khan, F., Felsberg, M.: Beyond correlation filters: learning continuous convolution operators for visual tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 472–488. Springer, Cham (2016). Scholar
  7. 7.
    Dong, X., Shen, J., Yu, D., Wang, W., Liu, J., Huang, H.: Occlusion-aware real-time object tracking. IEEE TMM 19, 763–771 (2017)Google Scholar
  8. 8.
    Dong, X., Shen, J.: Triplet loss in Siamese network for object tracking. In: ECCV (2018)Google Scholar
  9. 9.
    Dong, X., Shen, J., Shao, L., Van Gool, L.: Sub-Markov random walk for image segmentation. IEEE TIP 25, 516–527 (2015)MathSciNetzbMATHGoogle Scholar
  10. 10.
    Dong, X., Shen, J., Wang, W., Liu, Y., Shao, L., Porikli, F.: Hyperparameter optimization for tracking with continuous deep q-learning. In: CVPR (2018)Google Scholar
  11. 11.
    Dong, X., Shen, J., Wang, W., Shao, L., Ling, H., Porikli, F.: Dynamical hyperparameter optimization via deep reinforcement learning in tracking. IEEE TPAMI (2019)Google Scholar
  12. 12.
    Dong, X., Shen, J., Wu, D., Guo, K., Jin, X., Porikli, F.: Quadruplet network with one-shot learning for fast visual object tracking. IEEE TIP 28, 3516–3527 (2019)MathSciNetzbMATHGoogle Scholar
  13. 13.
    Fan, H., et al.: Lasot: a high-quality benchmark for large-scale single object tracking. In: CVPR (2019)Google Scholar
  14. 14.
    Fan, H., Ling, H.: Siamese cascaded region proposal networks for real-time visual tracking. In: CVPR (2019)Google Scholar
  15. 15.
    Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML (2017)Google Scholar
  16. 16.
    Finn, C., Xu, K., Levine, S.: Probabilistic model-agnostic meta-learning. In: NeurIPS (2018)Google Scholar
  17. 17.
    Galoogahi, H.K., Fagg, A., Huang, C., Ramanan, D., Lucey, S.: Need for speed: A benchmark for higher frame rate object tracking. In: ICCV (2017)Google Scholar
  18. 18.
    Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., Wang, S.: Learning dynamic siamese network for visual object tracking. In: ICCV (2017)Google Scholar
  19. 19.
    He, A., Luo, C., Tian, X., Zeng, W.: A twofold siamese network for real-time object tracking. In: CVPR (2018)Google Scholar
  20. 20.
    Held, D., Thrun, S., Savarese, S.: Learning to track at 100 FPS with deep regression networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 749–765. Springer, Cham (2016). Scholar
  21. 21.
    Henriques, J.F., Rui, C., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE TPAMI 37, 583–596 (2015)CrossRefGoogle Scholar
  22. 22.
    Hinton, G.E., Plaut, D.C.: Using fast weights to deblur old memories. In: CCSS (1987)Google Scholar
  23. 23.
    Hochreiter, S., Younger, A.S., Conwell, P.R.: Learning to learn using gradient descent. In: Dorffner, G., Bischof, H., Hornik, K. (eds.) ICANN 2001. LNCS, vol. 2130, pp. 87–94. Springer, Heidelberg (2001). Scholar
  24. 24.
    Hong, S., You, T., Kwak, S., Han, B.: Online tracking by learning discriminative saliency map with convolutional neural network. In: ICML (2015)Google Scholar
  25. 25.
    Huang, C., Lucey, S., Ramanan, D.: Learning policies for adaptive tracking with deep feature cascades. In: ICCV (2017)Google Scholar
  26. 26.
    Khan, S., Hayat, M., Zamir, S.W., Shen, J., Shao, L.: Striking the right balance with uncertainty. In: CVPR (2019)Google Scholar
  27. 27.
    Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: ICML deep learning workshop (2015)Google Scholar
  28. 28.
    Kristan, M., et al.: The seventh visual object tracking vot2019 challenge results (2019)Google Scholar
  29. 29.
    Kristan, M., et al.: A novel performance evaluation methodology for single-target trackers. IEEE TPAMI 38, 2137–2155 (2016)CrossRefGoogle Scholar
  30. 30.
    Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: Siamrpn++: evolution of siamese visual tracking with very deep networks. In: CVPR (2019)Google Scholar
  31. 31.
    Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with siamese region proposal network. In: CVPR (2018)Google Scholar
  32. 32.
    Li, H., Dong, W., Mei, X., Ma, C., Huang, F., Hu, B.G.: Lgm-net: learning to generate matching networks for few-shot learning. In: ICML (2019)Google Scholar
  33. 33.
    Li, P., Chen, B., Ouyang, W., Wang, D., Yang, X., Lu, H.: Gradnet: gradient-guided network for visual object tracking. In: ICCV (2019)Google Scholar
  34. 34.
    Li, S., Yeung, D.Y.: Visual object tracking for unmanned aerial vehicles: a benchmark and new motion models. In: AAAI (2017)Google Scholar
  35. 35.
    Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). Scholar
  36. 36.
    Liu, Y., Dong, X., Lu, X., Khan, F.S., Shen, J., Hoi, S.: Teacher-Students Knowledge Distillation for Siamese Trackers. arXiv (2019)Google Scholar
  37. 37.
    Lu, X., Ma, C., Ni, B., Yang, X., Reid, I., Yang, M.H.: Deep regression tracking with shrinkage loss. In: ECCV (2018)Google Scholar
  38. 38.
    Lu, X., Wang, W., Shen, J., Tai, Y.W., Crandall, D.J., Hoi, S.C.: Learning video object segmentation from unlabeled videos. In: CVPR (2020)Google Scholar
  39. 39.
    Ma, B., Hu, H., Shen, J., Zhang, Y., Porikli, F.: Linearization to nonlinear learning for visual tracking. In: ICCV (2015)Google Scholar
  40. 40.
    Ma, B., Shen, J., Liu, Y., Hu, H., Shao, L., Li, X.: Visual tracking using strong classifier and structural local sparse descriptors. IEEE TMM 17, 1818–1828 (2015)Google Scholar
  41. 41.
    Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). Scholar
  42. 42.
    Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: CVPR (2016)Google Scholar
  43. 43.
    Park, E., Berg, A.C.: Meta-tracker: fast and robust online adaptation for visual object trackers. In: ECCV (2018)Google Scholar
  44. 44.
    Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: ICLR (2017)Google Scholar
  45. 45.
    Real, E., Shlens, J., Mazzocchi, S., Pan, X., Vanhoucke, V.: Youtube-boundingboxes: a large high-precision human-annotated data set for object detection in video. In: CVPR (2017)Google Scholar
  46. 46.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. In: NeurIPS (2015)Google Scholar
  47. 47.
    Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. IJCV 115, 211–252 (2015). Scholar
  48. 48.
    Rusu, A.A., et al.: Meta-learning with latent embedding optimization. In: ICLR (2019)Google Scholar
  49. 49.
    Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.: Meta-learning with memory-augmented neural networks. In: ICML (2016)Google Scholar
  50. 50.
    Schmidhuber, J.: Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook. Ph.D. thesis, Technische Universität München (1987)Google Scholar
  51. 51.
    Shen, J., Tang, X., Dong, X., Shao, L.: Visual object tracking by hierarchical attention siamese network. IEEE TCYB 50, 3068–3080 (2020)Google Scholar
  52. 52.
    Shen, J., Yu, D., Deng, L., Dong, X.: Fast online tracking with detection refinement. IEEE TITS 19, 162–173 (2017)Google Scholar
  53. 53.
    Shen, Z., Lai, W.S., Xu, T., Kautz, J., Yang, M.H.: Exploiting semantics for face image deblurring. IJCV 128, 1829–1846 (2020). Scholar
  54. 54.
    Shen, Z., et al.: Human-aware motion deblurring. In: ICCV (2019)Google Scholar
  55. 55.
    Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: NeurIPS (2017)Google Scholar
  56. 56.
    Song, Y., et al.: Vital: visual tracking via adversarial learning. In: CVPR (2018)Google Scholar
  57. 57.
    Thrun, S., Pratt, L.: Learning to learn: introduction and overview. In: Thrun, S., Pratt, L. (eds.) Learning to learn, pp. 3–17. Springer, Boston (1998). Scholar
  58. 58.
    Valmadre, J., Bertinetto, L., Henriques, J.F., Vedaldi, A., Torr, P.H.: End-to-end representation learning for correlation filter based tracking. In: CVPR (2017)Google Scholar
  59. 59.
    Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. In: NeurIPS (2016)Google Scholar
  60. 60.
    Wang, Q., Teng, Z., Xing, J., Gao, J., Hu, W., Maybank, S.: Learning attentions: residual attentional siamese network for high performance online visual tracking. In: CVPR (2018)Google Scholar
  61. 61.
    Wang, W., Shen, J., Dong, X., Borji, A.: Salient object detection driven by fixation prediction. In: CVPR (2018)Google Scholar
  62. 62.
    Wang, W., Shen, J., Dong, X., Borji, A., Yang, R.: Inferring salient objects from human fixations. IEEE TPAMI 42, 1913–1927 (2019)CrossRefGoogle Scholar
  63. 63.
    Wang, X., Li, C., Luo, B., Tang, J.: Sint++: robust visual tracking via adversarial positive instance generation. In: CVPR (2018)Google Scholar
  64. 64.
    Yang, T., Chan, A.B.: Learning dynamic memory networks for object tracking. In: ECCV (2018)Google Scholar
  65. 65.
    Yi, W., Jongwoo, L., Yang, M.H.: Object tracking benchmark. IEEE TPAMI (2015)Google Scholar
  66. 66.
    Yin, J., Wang, W., Meng, Q., Yang, R., Shen, J.: A unified object motion and affinity model for online multi-object tracking. In: CVPR (2020)Google Scholar
  67. 67.
    Zhang, Y., Wang, L., Qi, J., Wang, D., Feng, M., Lu, H.: Structured siamese network for real-time visual tracking. In: ECCV (2018)Google Scholar
  68. 68.
    Zhang, Z., Peng, H.: Deeper and wider siamese networks for real-time visual tracking. In: CVPR (2019)Google Scholar
  69. 69.
    Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware siamese networks for visual object tracking. In: ECCV (2018)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.Inception Institute of Artificial IntelligenceAbu DhabiUAE
  2. 2.Mohamed bin Zayed University of Artificial IntelligenceAbu DhabiUAE
  3. 3.Australian National UniversityCanberraAustralia

Personalised recommendations