Visual Tracking via Spatially Aligned Correlation Filters Network

  • Mengdan Zhang
  • Qiang Wang
  • Junliang Xing
  • Jin Gao
  • Peixi Peng
  • Weiming Hu
  • Steve Maybank
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)

Abstract

Correlation filter based trackers rely on a periodic assumption of the search sample to efficiently distinguish the target from the background. This assumption, however, yields undesired boundary effects and restricts the aspect ratios of search samples. To handle these issues, an end-to-end deep architecture is proposed to incorporate geometric transformations into a correlation filter based network. This architecture introduces a novel spatial alignment module, which provides continuous feedback for transforming the target from the border to the center with a normalized aspect ratio. It enables correlation filters to work on well-aligned samples for better tracking. The whole architecture not only learns a generic relationship between object geometric transformations and object appearances, but also learns robust representations coupled to correlation filters under various geometric transformations. This lightweight architecture permits real-time speed. Experiments show that our tracker effectively handles boundary effects and aspect ratio variations, achieving state-of-the-art tracking results on recent benchmarks.
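The periodic assumption mentioned in the abstract can be illustrated with a generic single-channel correlation filter trained in the Fourier domain. The following is a minimal sketch, not the architecture proposed in the paper: the function names (`gaussian_label`, `train_filter`, `respond`, `locate_peak`), the regularization value `lam`, and the random 32×32 patch are all assumptions chosen for illustration. Solving the ridge regression with FFTs implicitly treats every cyclic shift of the patch as a training sample, which is what makes the filter cheap to learn and is also the usual source of boundary effects, since real target motion is not cyclic.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Desired response: a Gaussian peak at index (0, 0), wrapped periodically."""
    ys, xs = np.arange(h), np.arange(w)
    dy = np.minimum(ys, h - ys)[:, None]   # cyclic distance to row 0
    dx = np.minimum(xs, w - xs)[None, :]   # cyclic distance to column 0
    return np.exp(-(dy ** 2 + dx ** 2) / (2.0 * sigma ** 2))

def train_filter(x, y, lam=1e-2):
    """Closed-form ridge regression over all cyclic shifts of patch x."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def respond(H, z):
    """Response map of the learned filter H on a search patch z."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(z)))

def locate_peak(r):
    """(row, col) index of the maximum of the response map."""
    return tuple(int(i) for i in np.unravel_index(np.argmax(r), r.shape))

# Hypothetical demo data: a random patch stands in for target features.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
H = train_filter(x, gaussian_label(32, 32))

# A cyclic shift of the patch moves the response peak by the same
# amount, wrapping around the patch borders.
z = np.roll(x, shift=(5, -3), axis=(0, 1))
print(locate_peak(respond(H, x)), locate_peak(respond(H, z)))
```

In this sketch the displacement is recovered only because the search patch is an exact cyclic shift of the training patch. A target that drifts toward the border of a real search window violates that condition, which is the situation the abstract's spatial alignment module is meant to address by re-centering the target.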

Keywords

Visual tracking, Spatial transformer network, Deep learning, Correlation filters network

Notes

Acknowledgements

This work is supported by the Natural Science Foundation of China (Grant Nos. 61751212, 61472421, 61602478), the NSFC-general technology collaborative Fund for basic research (Grant No. U1636218), the Key Research Program of Frontier Sciences, CAS (Grant No. QYZDJ-SSW-JSC040), and the CAS External cooperation key project.

Supplementary material

474178_1_En_29_MOESM1_ESM.pdf (1.4 mb)
Supplementary material 1 (pdf 1429 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Mengdan Zhang (1)
  • Qiang Wang (1)
  • Junliang Xing (1)
  • Jin Gao (1)
  • Peixi Peng (1)
  • Weiming Hu (1)
  • Steve Maybank (2)

  1. CAS Center for Excellence in Brain Science and Intelligence Technology, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
  2. Birkbeck College, University of London, London, UK
