Object Tracking by Incremental Structural Learning of Deformable Parts

  • Published in Circuits, Systems, and Signal Processing

Abstract

Online object tracking in diverse scenarios remains challenging because it entails a trade-off between distinguishing the target from the background and learning the target's appearance in the scene. In this paper, we address this problem from three perspectives. To improve the descriptiveness of the appearance model, we introduce the structural model from the object detection field into tracking, following both the target and its deformable parts simultaneously. To improve the robustness of tracking, we propose a logistic regression-based voting method that excludes the influence of occluded parts from tracking results and model updates. Finally, we propose an online method that incrementally learns the structural appearance model from samples filtered by the occlusion-handling mechanism. Empirical results demonstrate that the proposed tracking framework outperforms other leading methods, especially on challenging object tracking tasks.
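
To make the voting step concrete, the sketch below shows one plausible form of logistic regression-based part voting in Python; the function vote_visible_parts, the scalar weight w, the bias b, and the visibility threshold are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of logistic regression-based part voting:
    # map each part's matching score to a visibility probability and
    # keep only the parts voted as visible.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def vote_visible_parts(part_scores, w, b, vis_threshold=0.5):
        # part_scores: (P,) raw matching scores of the P deformable parts
        # w, b: learned logistic regression weight and bias (assumed scalar)
        p_visible = sigmoid(w * part_scores + b)        # per-part visibility
        return np.where(p_visible >= vis_threshold)[0]  # indices of visible parts

    # Example: low-scoring parts are treated as occluded and excluded.
    scores = np.array([1.8, 0.2, -1.1, 2.4])
    print(vote_visible_parts(scores, w=2.0, b=-1.0))    # -> [0 3]

Under such a scheme, only the parts voted as visible contribute to the tracking result and to the incremental model update, which is what shields the appearance model from occluded samples.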

References

  1. A. Adam, E. Rivlin, I. Shimshoni, Robust fragments-based tracking using the integral histogram, in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 798–805 (2006). doi:10.1109/CVPR.2006.256

  2. B. Babenko, M.H. Yang, S. Belongie, Robust object tracking with online multiple instance learning. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1619–1632 (2011). doi:10.1109/TPAMI.2010.226

  3. L. Bottou, Large-scale machine learning with stochastic gradient descent, in Proceedings of COMPSTAT’2010 (Springer, 2010), pp. 177–186

  4. L. Bourdev, S. Maji, J. Malik, Describing people: A poselet-based approach to attribute classification, in 2011 IEEE International Conference on Computer Vision (ICCV), pp. 1543–1550 (2011). doi:10.1109/ICCV.2011.6126413

  5. G. Bradski, Real time face and object tracking as a component of a perceptual user interface, in IEEE Workshop on Applications of Computer Vision, pp. 214–219 (1998)

  6. X. Cheng, N. Li, S. Zhang, Z. Wu, Robust visual tracking with SIFT features and fragments based on particle swarm optimization. Circuits Syst. Signal Process. 33(5), 1507–1526 (2014). doi:10.1007/s00034-013-9713-1

  7. D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 142–149 (2000)

  8. N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893 (2005)

  9. M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman, The PASCAL Visual Object Classes Challenge 2009 (VOC2009) Results (2009)

  10. J. Fan, X. Shen, Y. Wu, What are we tracking: a unified approach of tracking and recognition. IEEE Trans. Image Process. 22(2), 549–560 (2013). doi:10.1109/TIP.2012.2218827

  11. P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010). doi:10.1109/TPAMI.2009.167

  12. P. Felzenszwalb, D. McAllester, D. Ramanan, A discriminatively trained, multiscale, deformable part model, in IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008, pp. 1–8 (2008). doi:10.1109/CVPR.2008.4587597

  13. S. Geman, D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6(6), 721–741 (1984)

  14. L. Gorelick, M. Blank, E. Shechtman, M. Irani, R. Basri, Actions as space–time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2247–2253 (2007)

  15. S. Hare, S. Golodetz, A. Saffari, V. Vineet, M.M. Cheng, S.L. Hicks, P.H. Torr, Struck: structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2096–2109 (2016)

  16. M. Isard, A. Blake, Condensation: conditional density propagation for visual tracking. Int. J. Comput. Vis. 29(1), 5–28 (1998)

  17. J. Kwon, K.M. Lee, Visual tracking decomposition, in 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1269–1276 (2010). doi:10.1109/CVPR.2010.5539821

  18. J. Lafferty, A. McCallum, F.C. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, in Proceedings of the 18th International Conference on Machine Learning (ICML), pp. 282–289 (2001)

  19. Y. Li, H. Ai, C. Huang, S. Lao, Robust head tracking based on a multi-state particle filter, in International Conference on Automatic Face and Gesture Recognition, pp. 335–340 (2006)

  20. Y. Li, Y. Shen, Z. Liu, P. He, Tracking a maneuvering target in clutter by a new smoothing particle filter, in Proceedings of the IEEE Instrumentation and Measurement Technology Conference, vol. 2, pp. 843–848 (2005)

  21. J. Liu, B. Kuipers, S. Savarese, Recognizing human actions by attributes, in 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3337–3344 (2011). doi:10.1109/CVPR.2011.5995353

  22. H. Nam, B. Han, Learning multi-domain convolutional neural networks for visual tracking. arXiv preprint arXiv:1510.07945 (2015)

  23. D.A. Ross, J. Lim, R.S. Lin, M.H. Yang, Incremental learning for robust visual tracking. Int. J. Comput. Vis. 77(1–3), 125–141 (2008)

  24. S. Shalev-Shwartz, Y. Singer, N. Srebro, A. Cotter, Pegasos: primal estimated sub-gradient solver for SVM. Math. Program. 127(1), 3–30 (2011)

  25. G. Sharma, F. Jurie, C. Schmid, Expanded parts model for human attribute and action recognition in still images, in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 652–659 (2013). doi:10.1109/CVPR.2013.90

  26. G. Shu, A. Dehghan, O. Oreifej, E. Hand, M. Shah, Part-based multiple-person tracking with partial occlusion handling, in 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1815–1821 (2012). doi:10.1109/CVPR.2012.6247879

  27. S. Liu, T. Zhang, X. Cao, C. Xu, Structural correlation filter for robust visual tracking, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4312–4320 (2016). doi:10.1109/CVPR.2016.467

  28. Y. Tian, R. Sukthankar, M. Shah, Spatiotemporal deformable part models for action detection, in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2642–2649 (2013). doi:10.1109/CVPR.2013.341

  29. D. Wang, H. Lu, M.H. Yang, Online object tracking with sparse prototypes. IEEE Trans. Image Process. 22(1), 314–325 (2013). doi:10.1109/TIP.2012.2202677

  30. Y. Wang, Q. Ji, A dynamic conditional random field model for object segmentation in image sequences, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 264–270 (2005)

  31. Y. Wang, K.F. Loe, J.K. Wu, A dynamic conditional random field model for foreground and shadow segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28(2), 279–289 (2006)

  32. Y. Wang, G. Mori, Max-margin hidden conditional random fields for human action recognition, in IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009, pp. 872–879 (2009)

  33. Y. Wu, J. Lim, M.H. Yang, Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015)

  34. Y. Xie, H. Chang, Z. Li, L. Liang, X. Chen, D. Zhao, A unified framework for locating and recognizing human actions, in 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 25–32 (2011). doi:10.1109/CVPR.2011.5995648

  35. R. Yao, Q. Shi, C. Shen, Y. Zhang, A. van den Hengel, Part-based visual tracking with online latent structural learning, in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2363–2370 (2013). doi:10.1109/CVPR.2013.306

  36. C.N.J. Yu, T. Joachims, Learning structural SVMs with latent variables, in Proceedings of the 26th Annual International Conference on Machine Learning (ACM, 2009), pp. 1169–1176

  37. S. Zhang, X. Cheng, H. Guo, L. Zhou, Z. Wu, Tracking deformable parts via dynamic conditional random fields, in 2014 IEEE International Conference on Image Processing (ICIP), pp. 476–480 (2014). doi:10.1109/ICIP.2014.7025095

  38. T. Zhang, A. Bibi, B. Ghanem, In defense of sparse tracking: circulant sparse tracker, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3880–3888 (2016)

  39. Y. Zhang, T. Hesketh, H. Wang, J. Liu, D. Xiao, Actuator fault compensation for nonlinear systems using adaptive tracking control. Circuits Syst. Signal Process. 29(3), 419–430 (2010). doi:10.1007/s00034-010-9152-1

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61571106), the Scientific Research Foundation of NJUPT (No. NY213102) and the Natural Science Foundation of the Universities of Jiangsu Province (No. 16KJB510032).

Author information

Correspondence to Suofei Zhang.

Appendix: Derivation of DCRF on DPMs

According to the Hammersley–Clifford theorem, the posterior probability of the random field \(s_t\) at time \(t\) in Eq. (5) can be given by a Gibbs distribution [18] as

$$\begin{aligned} p(s_t|o_{1:t})\propto \exp \Bigg \{\sum _{x\in {X}}\Bigg [V_x(s_t(x)|o_{1:t}(x)) +\sum _{y\in {N_x}}V_{x,y}(s_t(x),s_t(y))\Bigg ]\Bigg \}. \end{aligned}$$
(17)
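
Read operationally, Eq. (17) says that the log-posterior, up to an additive constant, is a sum of unary potentials over the vertices plus pairwise potentials over each vertex's spatial neighborhood. The minimal Python sketch below evaluates exactly that sum; the potential functions V_unary and V_pair are placeholders standing in for the learned potentials, not the paper's code.

    # Minimal sketch of the Gibbs log-posterior of Eq. (17).
    def gibbs_log_posterior(states, obs, neighbors, V_unary, V_pair):
        # states[x]:    state s_t(x) of vertex x
        # obs[x]:       observation o_{1:t}(x) at vertex x
        # neighbors[x]: spatial neighborhood N_x of vertex x
        energy = 0.0
        for x in states:
            energy += V_unary(states[x], obs[x])
            for y in neighbors[x]:
                energy += V_pair(states[x], states[y])
        return energy  # p(s_t | o_{1:t}) is proportional to exp(energy)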

In the context of DPMs, we account for part deformation only through the unidirectional pairwise potential \(V_{x,y}(s_t(x),s_t(y))\), so the prior knowledge from the previous frame can be factorized over each single vertex directly. With a conditional independence assumption similar to that in [31], the observation model \(p(o_{t+1}|s_{t+1})\) in Eq. (5) can also be evaluated by unary potentials on the vertices:

$$\begin{aligned} p(o_{t+1}|s_{t+1})=\prod _{x\in {X}}p(o_{t+1}(x)|s_{t+1}(x))\propto \exp \big (\sum _{x\in {X}}V_x(o_{t+1}(x)|s_{t+1}(x))\big ). \end{aligned}$$
(18)

The state transition probability \(p(s_{t+1}|s_t)\) in Eq. (5) consists of both temporal pairwise potentials and spatial pairwise potentials:

$$\begin{aligned} p(s_{t+1}|s_t)\propto \exp \big \{\sum _{x\in {X}}\big [V_x(s_{t+1}(x)|s_t(M_x)) +\sum _{y\in {N_x}}V_{x,y}(s_{t+1}(x),s_{t+1}(y))\big ]\big \} \end{aligned}$$
(19)

where the potential \(V_x(s_{t+1}(x)|s_t(M_x))\) denotes the mean of the pairwise potentials between \(x\) and its temporal neighbor vertices:

$$\begin{aligned} V_x(s_{t+1}(x)|s_t(M_x))=\frac{1}{|M_x|}\sum _{y\in {M_x}}V_x(s_{t+1}(x)|s_t(y)). \end{aligned}$$
(20)
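
In code, Eq. (20) is simply an average over the temporal neighbor set \(M_x\); the helper below is a sketch under the same placeholder-potential assumption as the earlier snippet.

    # Mean temporal potential of Eq. (20): average the pairwise potential
    # between the new state of x and each temporal neighbor y in M_x.
    def temporal_potential(s_next_x, prev_states, M_x, V_x):
        return sum(V_x(s_next_x, prev_states[y]) for y in M_x) / len(M_x)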

Combining Eqs. (17), (18) and (19) into Eq. (5), the probability \(p(s_{t+1}|o_{1:t+1})\) becomes

$$\begin{aligned} p(s_{t+1}|o_{1:t+1})\propto & {} \exp \big (\sum _{x\in {X}}V_x(o_{t+1}(x)|s_{t+1}(x))\big ) \cdot \sum _{s_t}\prod _{x\in {X}} \exp \big [V_x(s_{t+1}(x)|s_t(M_x))\\ &+\sum _{y\in {N_x}}V_{x,y}(s_{t+1}(x),s_{t+1}(y))\big ]p(s_t(x)|o_{1:t}(x))\\ = & {} \prod _{x\in {X}}\exp \big [V_x(o_{t+1}(x)|s_{t+1}(x))+\sum _{y\in {N_x}}V_{x,y} (s_{t+1}(x),s_{t+1}(y))\big ]\\ &\cdot \sum _{s_t(x)}\exp \big [V_x(s_{t+1}(x)|s_t(M_x))\big ]p(s_t(x)|o_{1:t}(x)). \end{aligned}$$
(21)

Using Jensen’s inequality, we can evaluate \(p(s_{t+1}|o_{1:t+1})\) by its lower bound:

$$\begin{aligned} p(s_{t+1}|o_{1:t+1})\approx & {} \prod _{x\in {X}}\exp \big [V_x(o_{t+1}(x)|s_{t+1}(x)) +\sum _{y\in {N_x}}V_{x,y}(s_{t+1}(x),s_{t+1}(y))\big ]\\ &\cdot \exp \Big \{\sum _{s_t(x)}p(s_t(x)|o_{1:t}(x))\big [\frac{1}{|M_x|} \sum _{y\in {M_x}}V_x(s_{t+1}(x)|s_t(y))\big ]\Big \}. \end{aligned}$$
(22)

Here we consider only the corresponding vertex in the previous frame as the temporal neighbor of the current vertex. Equation (22) can then be rewritten simply as Eq. (6).
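
For illustration, a per-vertex version of this update over a discrete set of candidate states might look as follows. It applies the Jensen's-inequality approximation of Eq. (22): the exact sum over \(s_t(x)\) is replaced by the exponential of the expected temporal potential under the previous posterior, and, as in the text, the only temporal neighbor is the corresponding vertex in the last frame. The array layout is an assumption made for the sketch.

    import numpy as np

    def update_belief(unary, spatial, prev_belief, temporal_pairwise):
        # unary[i]:       V_x(o_{t+1}(x) | s_{t+1}(x) = i)
        # spatial[i]:     sum over y in N_x of V_{x,y}(s_{t+1}(x) = i, s_{t+1}(y))
        # prev_belief[j]: p(s_t(x) = j | o_{1:t}(x)) from the last frame
        # temporal_pairwise[i, j]: V_x(s_{t+1}(x) = i | s_t(x) = j)
        expected_temporal = temporal_pairwise @ prev_belief  # Jensen lower bound
        log_belief = unary + spatial + expected_temporal
        belief = np.exp(log_belief - log_belief.max())       # stable exponentiation
        return belief / belief.sum()                         # normalized posterior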

Cite this article

Zhang, S., Xing, L., Zhou, L. et al. Object Tracking by Incremental Structural Learning of Deformable Parts. Circuits Syst Signal Process 37, 255–276 (2018). https://doi.org/10.1007/s00034-017-0546-1
