Abstract
Visual object tracking is one of the most active topics in vision system applications. It remains challenging because the appearance of the target object often changes drastically over time under attributes such as motion blur and fast motion. Over the past few years, Siamese architectures have been successfully employed in object tracking because of their accuracy and robustness. In this paper, a new matching operator is introduced for measuring the similarity between the template image and the search region in order to estimate the state of the object. The matching operator is a weighted sum of depth-wise cosine similarity and depth-wise cross-correlation. To compute the cosine similarity in each block, the features extracted by the backbone architecture are divided into column vectors in each channel, and the similarity is calculated between corresponding columns. A max-pooling operation is then applied to the similarity values, and the maximum value is assigned to the center of the block. Finally, the cosine similarity and the cross-correlation are combined in each channel. Results on OTB-100 show that the proposed tracker achieves promising performance compared with state-of-the-art tracking techniques on the motion blur, fast motion, and out-of-view attributes.
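The matching operator described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the weight `alpha`, the stabilizer `eps`, the block size (taken equal to the template size), and the stride of 1 are all assumptions made for illustration. The abstract states only that the operator is a weighted sum of the two depth-wise terms and that the cosine term is computed per column and max-pooled per block.

```python
import numpy as np


def depthwise_xcorr(z, x):
    """Depth-wise cross-correlation of template z (C, h, w) with search x (C, H, W)."""
    C, h, w = z.shape
    _, H, W = x.shape
    out = np.zeros((C, H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            block = x[:, i:i + h, j:j + w]
            # Each channel is correlated only with the same channel of z.
            out[:, i, j] = np.sum(block * z, axis=(1, 2))
    return out


def depthwise_column_cosine(z, x, eps=1e-8):
    """Column-wise cosine similarity per block, max-pooled over the columns.

    eps is a hypothetical stabilizer that avoids division by zero.
    """
    C, h, w = z.shape
    _, H, W = x.shape
    out = np.zeros((C, H - h + 1, W - w + 1))
    z_norm = np.linalg.norm(z, axis=1)  # (C, w): norm of each template column
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            block = x[:, i:i + h, j:j + w]
            dot = np.sum(block * z, axis=1)          # (C, w) column dot products
            b_norm = np.linalg.norm(block, axis=1)   # (C, w) block column norms
            cos = dot / (z_norm * b_norm + eps)
            # Max-pool over the columns; the value represents this block.
            out[:, i, j] = cos.max(axis=1)
    return out


def combined_match(z, x, alpha=0.5):
    """Weighted sum of the two depth-wise responses (alpha is an assumed weight)."""
    return alpha * depthwise_column_cosine(z, x) + (1.0 - alpha) * depthwise_xcorr(z, x)
```

Because each cosine value is normalized by the column norms, the cosine term is bounded in [-1, 1] and is insensitive to changes in feature magnitude, whereas the cross-correlation term is not. In this sketch `alpha` sets the balance between the two responses; the abstract does not give the weights actually used.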
Author information
Contributions
Soolmaz Abbasi and Mehdi Rezaeian conceived of the presented idea. Soolmaz Abbasi developed the theory and performed the computations and experiments. Mehdi Rezaeian encouraged Soolmaz Abbasi to investigate the superiority of cosine similarity against fast motion and motion blur attributes and supervised the findings of this work. All authors discussed the results and contributed to the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Abbasi, S., Rezaeian, M. A Novel Matching Operator for Visual Object Tracking. Neural Process Lett 55, 9065–9084 (2023). https://doi.org/10.1007/s11063-023-11192-6
DOI: https://doi.org/10.1007/s11063-023-11192-6