
SiamLight: lightweight networks for object tracking via attention mechanisms and pixel-level cross-correlation

Research article · Published in Journal of Real-Time Image Processing

Abstract

Although Siamese-based trackers have achieved great success in recent years, researchers have focused more on accuracy than on model complexity, which makes many trackers inapplicable in resource-constrained scenarios and can greatly limit real-time speed. In this work, we propose a lightweight network method called SiamLight for object tracking. MobileNet-V3 is selected as the backbone network. The PG-corr module is added as the feature-fusion module; this strategy decomposes the template feature into spatial and channel kernels, reducing the matching regions and suppressing interference from similar objects. In addition, we add the CSM module, which applies channel and spatial attention simultaneously. The CSM module not only introduces few parameters but can also be integrated into existing network architectures as a plug-and-play component. Finally, multiple separable convolution blocks are added to the classification and regression branches to meet our lightweight parameter and FLOPs requirements. Experiments on the LaSOT, VOT2018, VOT2019, OTB100, and UAV123 benchmarks show that the method has fewer FLOPs and parameters than state-of-the-art trackers.
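The lightweight head design mentioned above rests on a well-known property of depthwise separable convolutions: a k×k standard convolution is replaced by a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution, cutting both parameters and multiply-accumulates roughly by a factor of k². The sketch below works through that arithmetic; the channel counts and feature-map size are hypothetical placeholders, not the paper's actual configuration.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """Parameters and multiply-accumulates (MACs) of a standard k x k conv
    producing a c_out x h x w output (stride 1, same padding)."""
    params = c_in * c_out * k * k
    macs = params * h * w  # one k*k*c_in dot product per output pixel, per channel
    return params, macs

def separable_conv_cost(c_in, c_out, k, h, w):
    """Same output, but as a depthwise k x k conv (one filter per input
    channel) followed by a 1 x 1 pointwise conv mixing channels."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    params = depthwise + pointwise
    macs = params * h * w
    return params, macs

# Hypothetical head-branch sizes: 256 channels, 25 x 25 response map, 3 x 3 kernels.
std_params, std_macs = standard_conv_cost(256, 256, 3, 25, 25)
sep_params, sep_macs = separable_conv_cost(256, 256, 3, 25, 25)
print(f"standard : {std_params:>7} params, {std_macs:>11} MACs")
print(f"separable: {sep_params:>7} params, {sep_macs:>11} MACs")
print(f"reduction: {std_params / sep_params:.1f}x")  # close to k*k = 9 for large c_out
```

Stacking several such blocks in the classification and regression branches, as the abstract describes, therefore keeps both the parameter count and the FLOPs budget small relative to standard-convolution heads.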



Funding

This research was supported by the Research Foundation of the Institute of Environment-friendly Materials and Occupational Health (Wuhu), Anhui University of Science and Technology (No. ALW2021YF04); the National Natural Science Foundation of China (No. 62102003); and the Anhui University of Science and Technology Graduate Innovation Fund (No. 2022CX2126): Research on object tracking based on the Siamese method.

Author information

Correspondence to Xingzhu Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lin, Ye., Li, M., Liang, X. et al. SiamLight: lightweight networks for object tracking via attention mechanisms and pixel-level cross-correlation. J Real-Time Image Proc 20, 31 (2023). https://doi.org/10.1007/s11554-023-01291-x

