Abstract
Although Siamese-based trackers have achieved great success in recent years, researchers have focused more on the accuracy of trackers than on their complexity, which makes these trackers inapplicable in some scenarios and greatly limits their real-time speed. In this work, we propose a lightweight network method called SiamLight for object tracking. MobileNet-V3 is selected as the backbone network. The PG-corr module is added as the feature-fusion module; this strategy decomposes the template feature into spatial and channel kernels, reducing the matching regions and suppressing interference from similar objects. In addition, we add the CSM module, which applies channel and spatial attention simultaneously. The CSM module not only reduces the number of parameters but can also be integrated into existing network architectures as a plug-and-play module. Finally, multiple separable convolution blocks are added to the classification and regression branches to meet our lightweight requirements on parameters and FLOPs. Experiments on the LaSOT, VOT2018, VOT2019, OTB100, and UAV123 benchmarks show that the method has fewer FLOPs and parameters than state-of-the-art trackers.
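The parameter savings that motivate the separable convolution blocks in the classification and regression branches can be illustrated with simple counting. The sketch below (a generic illustration, not the authors' actual layer configuration; the 256-channel 3x3 example is an assumption) compares a standard convolution against a depthwise separable one, i.e. a depthwise k x k convolution followed by a 1 x 1 pointwise convolution:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameter count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution (c_in filters) + 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

# Hypothetical 3x3 block with 256 input and output channels
std = conv_params(256, 256, 3)       # 589,824 parameters
sep = separable_params(256, 256, 3)  # 67,840 parameters
print(std, sep, round(std / sep, 1))
```

For this configuration the separable block uses roughly 8.7x fewer parameters; the same factor applies to the multiply-accumulate count (FLOPs) per spatial position, which is why such blocks are a common choice in lightweight heads.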
Funding
This research was supported by: [1] the Research Foundation of the Institute of Environment-friendly Materials and Occupational Health (Wuhu), Anhui University of Science and Technology (No. ALW2021YF04), the National Natural Science Foundation of China (No. 62102003); [2] Anhui University of Science and Technology Graduate Innovation Fund (No. 2022CX2126): Research on object tracking based on Siamese method.
About this article
Cite this article
Lin, Ye., Li, M., Liang, X. et al. SiamLight: lightweight networks for object tracking via attention mechanisms and pixel-level cross-correlation. J Real-Time Image Proc 20, 31 (2023). https://doi.org/10.1007/s11554-023-01291-x