
Visual Tracking with Attentional Convolutional Siamese Networks

  • Ke Tan
  • Zhenzhong Wei
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11901)

Abstract

Recently, Siamese trackers have drawn great attention due to their considerable accuracy and speed. To further improve the discriminative power of Siamese networks for visual tracking, deeper networks such as VGG and ResNet have been exploited as backbones. However, high-level semantic information weakens location discrimination. In this paper, we propose novel Attentional Convolutional Siamese Networks for visual tracking (ACST), which improve the classical AlexNet backbone by fusing spatial and channel attention during feature learning. Moreover, we propose a response-based weighted sampling strategy for training that strengthens the tracker's ability to distinguish between objects with similar attributes. Thanks to the efficiency of the cross-correlation operator, our tracker can be trained end-to-end and runs in real time at inference. We validate the tracker through extensive experiments on OTB2013 and OTB2015; the results show that the proposed tracker obtains clear improvements over other Siamese trackers.
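
To make the architecture concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: an AlexNet-style convolutional stem whose features pass through fused channel and spatial attention (in the style of CBAM), followed by cross-correlation between exemplar and search-region embeddings to produce a response map. The layer widths and the names ACSTSketch and xcorr are illustrative assumptions, not the authors' exact configuration, and the response-based weighted sampling strategy is not shown.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ChannelAttention(nn.Module):
        """Weight each channel from global average- and max-pooled statistics."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )

        def forward(self, x):
            avg = self.mlp(F.adaptive_avg_pool2d(x, 1).flatten(1))
            mx = self.mlp(F.adaptive_max_pool2d(x, 1).flatten(1))
            return x * torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)

    class SpatialAttention(nn.Module):
        """Weight each spatial location from channel-pooled statistics."""
        def __init__(self, kernel_size=7):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

        def forward(self, x):
            pooled = torch.cat(
                [x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True).values],
                dim=1)
            return x * torch.sigmoid(self.conv(pooled))

    def xcorr(z, x):
        """Cross-correlate exemplar features z with search features x.

        Treating the batch of exemplars as grouped-convolution kernels lets
        each exemplar slide only over its own search region in one conv2d call.
        """
        batch = z.size(0)
        out = F.conv2d(x.reshape(1, -1, x.size(2), x.size(3)), z, groups=batch)
        return out.permute(1, 0, 2, 3)  # (batch, 1, H_out, W_out) response maps

    class ACSTSketch(nn.Module):
        """Siamese embedding: truncated AlexNet-style stem + fused attentions."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 96, 11, stride=2),
                nn.BatchNorm2d(96), nn.ReLU(inplace=True),
                nn.MaxPool2d(3, stride=2),
                nn.Conv2d(96, 256, 5),
                nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            )
            self.channel_att = ChannelAttention(256)
            self.spatial_att = SpatialAttention()

        def embed(self, img):
            # Fuse both attentions on top of the convolutional features.
            return self.spatial_att(self.channel_att(self.features(img)))

        def forward(self, exemplar, search):
            return xcorr(self.embed(exemplar), self.embed(search))

    # Usage: a 127x127 exemplar crop against a 255x255 search region.
    net = ACSTSketch()
    response = net(torch.randn(2, 3, 127, 127), torch.randn(2, 3, 255, 255))
    print(response.shape)  # torch.Size([2, 1, 33, 33])

The grouped convolution in xcorr is the standard trick behind fully-convolutional Siamese trackers: the entire similarity computation reduces to a single convolution pass, which is what makes real-time inference feasible.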

Keywords

Visual tracking · Siamese networks · Visual attentions


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, China
  2. The Key Laboratory of Precision Opto-mechatronics Technology, Ministry of Education, Beijing, China
