
DiffusionTracker: Targets Denoising Based on Diffusion Model for Visual Tracking

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14436)


Abstract

Background clutter (BC) arises from distractors in the background that resemble the target's appearance, reducing the precision of visual trackers. We treat these similar distractors as noise and formulate visual tracking as a denoising task. We propose a target denoising method based on a diffusion model for visual tracking, referred to as DiffusionTracker, which uses the diffusion model to distinguish targets from noise (distractors). Specifically, we introduce a reverse diffusion process to eliminate noisy distractors from the proposal candidates generated by a Siamese tracking backbone. Because distractors do not strictly follow a Gaussian distribution, we incorporate Spatial-Temporal Weighting (STW), which integrates spatial correlation with noise decay-time information to mitigate the impact of the noise distribution on denoising effectiveness. Experimental results demonstrate the effectiveness of the proposed method: DiffusionTracker achieves a precision of 64.0% and a success rate of 63.8% on BC sequences from the LaSOT test set, improvements of 11.7% and 10.2%, respectively, over state-of-the-art trackers. Furthermore, the proposed method can be integrated as a plug-and-play module with cutting-edge tracking algorithms, significantly improving the success rate in background clutter scenarios.
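
The abstract describes two mechanisms: a reverse-diffusion process that treats the backbone's proposal candidates as noisy samples and iteratively removes the distractor "noise", and a Spatial-Temporal Weighting (STW) term that modulates each denoising step by spatial proximity to the previous target state and by how far the reverse process has progressed. The minimal sketch below illustrates that idea only; it is not the authors' implementation, and the box parameterization, the eps_model interface, and the weighting formula are illustrative assumptions.

import torch

def spatial_temporal_weight(boxes, prev_box, t, T, sigma=0.5):
    # Hypothetical STW: weight each candidate by its spatial proximity to the
    # previous target box, scaled by a factor tied to the current reverse step.
    centers = boxes[:, :2]                      # (N, 2) candidate centers [cx, cy]
    prev_center = prev_box[:2].unsqueeze(0)     # (1, 2) previous target center
    spatial = torch.exp(-((centers - prev_center) ** 2).sum(dim=-1) / (2 * sigma ** 2))
    temporal = t / T                            # larger corrections early, smaller near the end
    return spatial * temporal                   # (N,) per-candidate weight

@torch.no_grad()
def denoise_candidates(candidates, prev_box, eps_model, T=10):
    # candidates: (N, 4) boxes [cx, cy, w, h] proposed by the Siamese backbone
    # eps_model:  a network that predicts the "distractor noise" for each box
    #             given the box and the current timestep (an assumed interface)
    boxes = candidates.clone()
    for t in reversed(range(1, T + 1)):
        t_embed = torch.full((boxes.size(0), 1), float(t))
        predicted_noise = eps_model(torch.cat([boxes, t_embed], dim=-1))  # (N, 4)
        w = spatial_temporal_weight(boxes, prev_box, t, T).unsqueeze(-1)  # (N, 1)
        boxes = boxes - w * predicted_noise     # weighted reverse-diffusion update
    return boxes                                # candidates pulled toward the target

In a complete tracker the refined candidates would typically still be re-scored before the final target is selected; that step, like the training of eps_model, is beyond the scope of this sketch.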



Acknowledgment

This work was fully supported by the Applied Basic Research Foundation of China Mobile (Nos. R23100TM and R23103H0).

Author information


Corresponding author

Correspondence to Yujian Du.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhang, R., Cai, D., Qian, L., Du, Y., Lu, H., Zhang, Y. (2024). DiffusionTracker: Targets Denoising Based on Diffusion Model for Visual Tracking. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14436. Springer, Singapore. https://doi.org/10.1007/978-981-99-8555-5_18

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-8555-5_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8554-8

  • Online ISBN: 978-981-99-8555-5

  • eBook Packages: Computer Science, Computer Science (R0)
