
Real-time object tracking in the wild with Siamese network

Published in Multimedia Tools and Applications

A Correction to this article was published on 28 March 2023

This article has been updated

Abstract

Single object tracking (SOT) is one of the most important tasks in computer vision. With the development of deep neural networks and the release of a series of large-scale datasets for single object tracking, Siamese networks have been proposed and outperform most traditional methods. However, recent Siamese networks trade speed for accuracy as they grow deeper, and most of them can only achieve real-time tracking in ideal environments. To strike a better balance between efficiency and accuracy, we propose a simpler Siamese network for single object tracking that runs fast on modest hardware while maintaining acceptable accuracy. The proposed method consists of three parts: sample generation, SE-Siamese and regression localization. In the sample generation stage, the template patch and the detection patch are cropped from the selected video frames in a new way. The SE-Siamese subnetwork combines a Siamese network with a Squeeze-and-Excitation (SE) network as the feature extractor, which effectively speeds up the training phase. The regression localization network computes the location of the tracked object more efficiently without losing much precision. To validate the effectiveness of the proposed approach, we conduct extensive experiments on several challenging tracking benchmarks, including VOT2015, VOT2016, VOT2017 and OTB-100. The results show that our approach achieves significant speed improvements over several strong baseline trackers (from 19.5 FPS to 44.4 FPS).
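To make the architecture concrete, the sketch below illustrates in PyTorch-style code how a Squeeze-and-Excitation block can reweight the channels of a shared Siamese backbone before the template and detection features are cross-correlated into a response map. This is a minimal sketch of the general SE-Siamese idea only: the backbone layers, patch sizes and reduction ratio are assumptions for illustration, not the authors' implementation, and the regression localization head is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels using globally pooled statistics."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):                                   # x: (B, C, H, W)
        s = x.mean(dim=(2, 3))                              # squeeze: global average pooling
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))    # excitation: per-channel weights
        return x * s.view(x.size(0), -1, 1, 1)              # rescale feature maps

class SESiamese(nn.Module):
    """Shared backbone + SE block applied to both patches (hypothetical layers)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(                      # illustrative layers only
            nn.Conv2d(3, 64, kernel_size=7, stride=2), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=3), nn.ReLU(),
        )
        self.se = SEBlock(256)

    def forward(self, template, detection):
        z = self.se(self.backbone(template))                # template features (B, C, h, w)
        x = self.se(self.backbone(detection))               # detection features (B, C, H, W)
        # Cross-correlate each template over its detection features -> score map.
        return torch.cat(
            [F.conv2d(x[i:i + 1], z[i:i + 1]) for i in range(x.size(0))], dim=0
        )

With SiamFC-style patch sizes (a 127x127 template and a 255x255 detection patch, both hypothetical here), this sketch yields a 17x17 single-channel response map whose peak indicates the tracked object's position:

net = SESiamese()
score = net(torch.randn(1, 3, 127, 127), torch.randn(1, 3, 255, 255))
print(score.shape)  # torch.Size([1, 1, 17, 17])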





Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 62276127.

Author information


Corresponding author

Correspondence to Furao Shen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: the affiliations of Jianmin Wu were incorrect in the original publication.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Han, F., Jiang, S., Wu, J. et al. Real-time object tracking in the wild with Siamese network. Multimed Tools Appl 82, 24327–24343 (2023). https://doi.org/10.1007/s11042-023-14519-6

