International Journal of Computer Vision, Volume 101, Issue 2, pp 367–383

Robust Visual Tracking via Structured Multi-Task Sparse Learning

Authors

  • Tianzhu Zhang
    • Advanced Digital Sciences Center (ADSC)
  • Bernard Ghanem
    • King Abdullah University of Science and Technology (KAUST)
    • Department of Electrical and Computer Engineering, National University of Singapore
  • Narendra Ahuja
    • Department of Electrical and Computer Engineering, Beckman Institute, and Coordinated Science Laboratory, University of Illinois at Urbana-Champaign

DOI: 10.1007/s11263-012-0582-z

Cite this article as:
Zhang, T., Ghanem, B., Liu, S. et al. Int J Comput Vis (2013) 101: 367. doi:10.1007/s11263-012-0582-z

Abstract

In this paper, we formulate object tracking in a particle filter framework as a structured multi-task sparse learning problem, which we denote as Structured Multi-Task Tracking (S-MTT). Since we model particles as linear combinations of dictionary templates that are updated dynamically, learning the representation of each particle is considered a single task in Multi-Task Tracking (MTT). By employing popular sparsity-inducing \(\ell _{p,q}\) mixed norms (specifically \(p\in \{2,\infty \}\) and \(q=1\)), we regularize the representation problem to enforce joint sparsity and learn the particle representations together. Compared to previous methods that handle particles independently, our results demonstrate that mining the interdependencies between particles improves tracking performance and reduces overall computational complexity. Interestingly, we show that the popular \(L_1\) tracker (Mei and Ling, IEEE Trans Pattern Anal Mach Intell 33(11):2259–2272, 2011) is a special case of our MTT formulation (denoted as the \(L_{11}\) tracker) when \(p=q=1\). Under the MTT framework, some of the tasks (particle representations) are often more closely related and more likely to share common relevant covariates than other tasks. Therefore, we extend the MTT framework to take into account pairwise structural correlations between particles (e.g. spatial smoothness of representation) and denote the novel framework as S-MTT. The problem of learning the regularized sparse representation in MTT and S-MTT can be solved efficiently using an Accelerated Proximal Gradient (APG) method that yields a sequence of closed-form updates. As such, S-MTT and MTT are computationally attractive. We test our proposed approach on challenging sequences involving heavy occlusion, drastic illumination changes, and large pose variations. Experimental results show that S-MTT substantially outperforms MTT, and that both methods consistently outperform state-of-the-art trackers.
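To make the closed-form APG updates mentioned in the abstract concrete, the following is a minimal sketch for the \(p=2, q=1\) case. The abstract does not state the objective, so several things here are assumptions rather than the authors' formulation: a least-squares data term \(\tfrac{1}{2}\Vert X-DC\Vert _F^2\), the convention that rows of \(C\) index dictionary templates and columns index particles (so that \(\Vert C\Vert _{2,1}=\sum _i \Vert C_{i,:}\Vert _2\)), and a step size taken from the Frobenius-norm bound on the Lipschitz constant. The function names (`prox_l21`, `prox_l11`, `apg_l21`) are hypothetical. The sketch omits the graph-structured term of S-MTT, the dictionary update, and the particle filter itself.

```python
import math

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def prox_l21(C, tau):
    """Closed-form prox of tau * sum_i ||C[i]||_2 (row-wise group shrinkage).

    Whole rows are scaled toward zero together, so a template is either
    used by all particles or dropped by all of them (joint sparsity).
    """
    out = []
    for row in C:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out

def prox_l11(C, tau):
    """Closed-form prox of tau * sum_ij |C[i][j]| (element-wise shrinkage).

    This is the p = q = 1 case, in which every particle is treated
    independently.
    """
    return [[math.copysign(max(abs(v) - tau, 0.0), v) for v in row] for row in C]

def apg_l21(X, D, lam, iters=200):
    """Accelerated proximal gradient for
        min_C 0.5 * ||X - D C||_F^2 + lam * ||C||_{2,1}
    X: d x n (one particle per column), D: d x m (one template per column).
    """
    m, n = len(D[0]), len(X[0])
    Dt = transpose(D)
    # Assumption: ||D||_F^2 upper-bounds the Lipschitz constant ||D||_2^2.
    L = sum(v * v for row in D for v in row)
    C = [[0.0] * n for _ in range(m)]
    Z = [row[:] for row in C]          # extrapolated point
    t = 1.0
    for _ in range(iters):
        # Gradient of the smooth term at Z: D^T (D Z - X)
        R = [[r - x for r, x in zip(rr, xr)] for rr, xr in zip(matmul(D, Z), X)]
        G = matmul(Dt, R)
        step = [[z - g / L for z, g in zip(zr, gr)] for zr, gr in zip(Z, G)]
        C_new = prox_l21(step, lam / L)   # closed-form update
        # Standard momentum sequence for the extrapolation step
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new
        Z = [[cn + beta * (cn - c) for cn, c in zip(cnr, cr)]
             for cnr, cr in zip(C_new, C)]
        C, t = C_new, t_new
    return C
```

With an identity dictionary the minimizer is simply `prox_l21(X, lam)`, which gives a quick sanity check of the iteration. Swapping `prox_l21` for `prox_l11` inside the loop would give the element-wise variant corresponding to \(p=q=1\).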

Keywords

Visual tracking, Particle filter, Graph, Structure, Sparse representation, Multi-task learning

Supplementary material

  • Supplementary material 1: 11263_2012_582_MOESM10_ESM.avi (AVI, 1635 KB)
  • Supplementary material 2: 11263_2012_582_MOESM11_ESM.avi (AVI, 1575 KB)
  • Supplementary material 3: 11263_2012_582_MOESM12_ESM.avi (AVI, 1890 KB)
  • Supplementary material 4: 11263_2012_582_MOESM13_ESM.avi (AVI, 1777 KB)
  • Supplementary material 5: 11263_2012_582_MOESM14_ESM.avi (AVI, 4539 KB)
  • Supplementary material 6: 11263_2012_582_MOESM15_ESM.avi (AVI, 2923 KB)
  • Supplementary material 7: 11263_2012_582_MOESM1_ESM.avi (AVI, 2204 KB)
  • Supplementary material 8: 11263_2012_582_MOESM2_ESM.avi (AVI, 3211 KB)
  • Supplementary material 9: 11263_2012_582_MOESM3_ESM.avi (AVI, 1682 KB)
  • Supplementary material 10: 11263_2012_582_MOESM4_ESM.avi (AVI, 2641 KB)
  • Supplementary material 11: 11263_2012_582_MOESM5_ESM.avi (AVI, 4459 KB)
  • Supplementary material 12: 11263_2012_582_MOESM6_ESM.avi (AVI, 5427 KB)
  • Supplementary material 13: 11263_2012_582_MOESM7_ESM.avi (AVI, 1603 KB)
  • Supplementary material 14: 11263_2012_582_MOESM8_ESM.avi (AVI, 2836 KB)
  • Supplementary material 15: 11263_2012_582_MOESM9_ESM.avi (AVI, 2762 KB)

Copyright information

© Springer Science+Business Media New York 2012