
DiffusionTracker: Targets Denoising Based on Diffusion Model for Visual Tracking

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14436)


Abstract

Background clutter (BC) arises from distractors in the background that resemble the target's appearance, reducing the precision of visual trackers. We treat these similar distractors as noise and formulate visual tracking as a denoising task. We propose a target denoising method based on a diffusion model for visual tracking, referred to as DiffusionTracker, which uses the diffusion model to distinguish targets from noise (distractors). Specifically, we introduce a reverse diffusion process to eliminate noisy distractors from the proposal candidates generated by a Siamese tracking backbone. Because distractors do not strictly follow a Gaussian distribution, we incorporate Spatial-Temporal Weighting (STW), which integrates spatial correlation with noise decay-time information to mitigate the impact of the noise distribution on denoising effectiveness. Experimental results demonstrate the effectiveness of the proposed method: DiffusionTracker achieves a precision of 64.0% and a success rate of 63.8% on BC sequences from the LaSOT test set, improvements of 11.7% and 10.2%, respectively, over state-of-the-art trackers. Furthermore, the proposed method can be integrated as a plug-and-play module with cutting-edge tracking algorithms, significantly improving the success rate in background clutter scenarios.
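
The abstract describes two mechanisms: a reverse-diffusion process that treats the backbone's proposal candidates as noisy samples and iteratively removes the distractor "noise", and a Spatial-Temporal Weighting (STW) term that modulates each denoising step by spatial proximity to the previous target state and by how far the reverse process has progressed. The minimal sketch below illustrates that idea only; it is not the authors' implementation, and the box parameterization, the eps_model interface, and the weighting formula are illustrative assumptions.

import torch

def spatial_temporal_weight(boxes, prev_box, t, T, sigma=0.5):
    # Hypothetical STW: weight each candidate by its spatial proximity to the
    # previous target box, scaled by a factor tied to the current reverse step.
    centers = boxes[:, :2]                      # (N, 2) candidate centers [cx, cy]
    prev_center = prev_box[:2].unsqueeze(0)     # (1, 2) previous target center
    spatial = torch.exp(-((centers - prev_center) ** 2).sum(dim=-1) / (2 * sigma ** 2))
    temporal = t / T                            # larger corrections early, smaller near the end
    return spatial * temporal                   # (N,) per-candidate weight

@torch.no_grad()
def denoise_candidates(candidates, prev_box, eps_model, T=10):
    # candidates: (N, 4) boxes [cx, cy, w, h] proposed by the Siamese backbone
    # eps_model:  a network that predicts the "distractor noise" for each box
    #             given the box and the current timestep (an assumed interface)
    boxes = candidates.clone()
    for t in reversed(range(1, T + 1)):
        t_embed = torch.full((boxes.size(0), 1), float(t))
        predicted_noise = eps_model(torch.cat([boxes, t_embed], dim=-1))  # (N, 4)
        w = spatial_temporal_weight(boxes, prev_box, t, T).unsqueeze(-1)  # (N, 1)
        boxes = boxes - w * predicted_noise     # weighted reverse-diffusion update
    return boxes                                # candidates pulled toward the target

In a complete tracker the refined candidates would typically still be re-scored before the final target is selected; that step, like the training of eps_model, is beyond the scope of this sketch.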



Acknowledgment

This work was fully supported by the Applied Basic Research Foundation of China Mobile (Nos. R23100TM and R23103H0).

Author information


Corresponding author

Correspondence to Yujian Du.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhang, R., Cai, D., Qian, L., Du, Y., Lu, H., Zhang, Y. (2024). DiffusionTracker: Targets Denoising Based on Diffusion Model for Visual Tracking. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14436. Springer, Singapore. https://doi.org/10.1007/978-981-99-8555-5_18

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-8555-5_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8554-8

  • Online ISBN: 978-981-99-8555-5

  • eBook Packages: Computer Science, Computer Science (R0)
