Siamese network for real-time tracking with action-selection

Abstract

Most deep-learning-based trackers achieve accurate target localization at the expense of a long training phase. In this paper, we present a new, powerful tracker based on the Siamese network that can be implemented with limited computational resources. The proposed tracker locates targets accurately using a fine-tuned model that is convenient to train. During tracking, we apply a new sampling method, independent of training, called action-selection, which performs selective and flexible sampling step by step with a variable stride; this allows us to obtain bounding boxes with varying aspect ratios. Evaluation on online tracking benchmarks shows that our tracker achieves higher accuracy than most traditional trackers. In addition, it operates at frame rates beyond real time.
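To illustrate the idea behind action-selection sampling, the sketch below implements a greedy variant: at each step the tracker tries a set of candidate actions (translations and aspect-ratio changes) on the current bounding box, keeps the best-scoring candidate, and halves the stride whenever no action improves the score. The action set, the 1.05 aspect factor, the stride schedule, and the stand-in `score_fn` are illustrative assumptions, not the paper's exact algorithm.

```python
# Hypothetical action set for action-selection sampling (illustrative only).
ACTIONS = ["left", "right", "up", "down", "wider", "taller"]


def apply_action(box, action, stride):
    """Apply one action to a box given as (cx, cy, w, h)."""
    cx, cy, w, h = box
    if action == "left":
        cx -= stride
    elif action == "right":
        cx += stride
    elif action == "up":
        cy -= stride
    elif action == "down":
        cy += stride
    elif action == "wider":       # changes aspect ratio, keeps area
        w, h = w * 1.05, h / 1.05
    elif action == "taller":
        w, h = w / 1.05, h * 1.05
    return (cx, cy, w, h)


def track_step(score_fn, box, init_stride=8.0, min_stride=1.0):
    """Greedy action-selection: take the best-scoring action at each step;
    halve the stride when no action improves the score, and stop once the
    stride falls below min_stride."""
    stride = init_stride
    best = score_fn(box)
    while stride >= min_stride:
        candidates = [(score_fn(apply_action(box, a, stride)), a) for a in ACTIONS]
        score, action = max(candidates)
        if score > best:
            box, best = apply_action(box, action, stride), score
        else:
            stride /= 2.0  # refine the sampling with a smaller stride
    return box
```

In a real tracker, `score_fn` would be the Siamese network's similarity between the candidate crop and the target template; here any scalar scoring function works, which is what makes the sampling scheme independent of training.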




Funding

This work was supported by the Natural Science Foundation of Jiangsu Province under (Grant no. BK20151102), Natural Science Foundation of China (Grant no. 61673108), Ministry of Education Key Laboratory of Machine Perception, Peking University (Grant no. K-2016-03), Open Project Program of the Ministry of Education Key Laboratory of Underwater Acoustic Signal Processing, Southeast University (Grant no. UASP1502) and Natural Science Foundation of China (Grant no. 61802058).

Author information

Corresponding author

Correspondence to Zhuoyi Zhang.



About this article


Cite this article

Zhang, Z., Zhang, Y., Cheng, X. et al. Siamese network for real-time tracking with action-selection. J Real-Time Image Proc 17, 1647–1657 (2020). https://doi.org/10.1007/s11554-019-00922-6


Keywords

  • Computer vision
  • Object tracking
  • Siamese network