Object Tracking by Incremental Structural Learning of Deformable Parts

  • Published in Circuits, Systems, and Signal Processing

Abstract

Online object tracking in diverse scenarios remains challenging because it entails a trade-off between distinguishing the target from the background and learning the target's appearance in the scene. In this paper, we address this problem from three perspectives. To improve the descriptiveness of the appearance model, we introduce the structural model from the object detection field into tracking, following both the target and its deformable parts simultaneously. To improve the robustness of tracking, we propose a logistic regression-based voting method that excludes the influence of occluded parts from tracking results and model updates. Finally, we propose an online method that incrementally learns the structural appearance model from samples filtered by the occlusion-handling mechanism. Empirical results demonstrate that the proposed tracking framework outperforms other leading methods, especially on challenging object tracking tasks.
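
To make the voting step concrete, the sketch below shows one plausible form of logistic regression-based part voting in Python; the function vote_visible_parts, the scalar weight w, the bias b, and the visibility threshold are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of logistic regression-based part voting:
    # map each part's matching score to a visibility probability and
    # keep only the parts voted as visible.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def vote_visible_parts(part_scores, w, b, vis_threshold=0.5):
        # part_scores: (P,) raw matching scores of the P deformable parts
        # w, b: learned logistic regression weight and bias (assumed scalar)
        p_visible = sigmoid(w * part_scores + b)        # per-part visibility
        return np.where(p_visible >= vis_threshold)[0]  # indices of visible parts

    # Example: low-scoring parts are treated as occluded and excluded.
    scores = np.array([1.8, 0.2, -1.1, 2.4])
    print(vote_visible_parts(scores, w=2.0, b=-1.0))    # -> [0 3]

Under such a scheme, only the parts voted as visible contribute to the tracking result and to the incremental model update, which is what shields the appearance model from occluded samples.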

References

  1. A. Adam, E. Rivlin, I. Shimshoni, Robust fragments-based tracking using the integral histogram, in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 798–805 (2006). doi:10.1109/CVPR.2006.256

  2. B. Babenko, M.H. Yang, S. Belongie, Robust object tracking with online multiple instance learning. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1619–1632 (2011). doi:10.1109/TPAMI.2010.226

  3. L. Bottou, Large-scale machine learning with stochastic gradient descent, in Proceedings of COMPSTAT’2010 (Springer, 2010), pp. 177–186

  4. L. Bourdev, S. Maji, J. Malik, Describing people: A poselet-based approach to attribute classification, in 2011 IEEE International Conference on Computer Vision (ICCV), pp. 1543–1550 (2011). doi:10.1109/ICCV.2011.6126413

  5. G. Bradski, Real time face and object tracking as a component of a perceptual user interface, in IEEE Workshop on Applications of Computer Vision, pp. 214–219 (1998)

  6. X. Cheng, N. Li, S. Zhang, Z. Wu, Robust visual tracking with SIFT features and fragments based on particle swarm optimization. Circuits Syst. Signal Process. 33(5), 1507–1526 (2014). doi:10.1007/s00034-013-9713-1

  7. D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 142–149 (2000)

  8. N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893 (2005)

  9. M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman, The PASCAL Visual Object Classes Challenge 2009 (VOC2009) Results (2009)

  10. J. Fan, X. Shen, Y. Wu, What are we tracking: a unified approach of tracking and recognition. IEEE Trans. Image Process. 22(2), 549–560 (2013). doi:10.1109/TIP.2012.2218827

  11. P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010). doi:10.1109/TPAMI.2009.167

  12. P. Felzenszwalb, D. McAllester, D. Ramanan, A discriminatively trained, multiscale, deformable part model, in IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008, pp. 1–8 (2008). doi:10.1109/CVPR.2008.4587597

  13. S. Geman, D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6(6), 721–741 (1984)

  14. L. Gorelick, M. Blank, E. Shechtman, M. Irani, R. Basri, Actions as space–time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2247–2253 (2007)

  15. S. Hare, S. Golodetz, A. Saffari, V. Vineet, M.M. Cheng, S.L. Hicks, P.H. Torr, Struck: structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2096–2109 (2016)

  16. M. Isard, A. Blake, Condensation: conditional density propagation for visual tracking. Int. J. Comput. Vis. 29(1), 5–28 (1998)

  17. J. Kwon, K.M. Lee, Visual tracking decomposition, in 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1269–1276 (2010). doi:10.1109/CVPR.2010.5539821

  18. J. Lafferty, A. McCallum, F.C. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, in Proceedings of the 18th International Conference on Machine Learning (ICML), pp. 282–289 (2001)

  19. Y. Li, H. Ai, C. Huang, S. Lao, Robust head tracking based on a multi-state particle filter, in International Conference on Automatic Face and Gesture Recognition, pp. 335–340 (2006)

  20. Y. Li, Y. Shen, Z. Liu, P. He, Tracking a maneuvering target in clutter by a new smoothing particle filter, in Proceedings of the IEEE Instrumentation and Measurement Technology Conference, vol. 2, pp. 843–848 (2005)

  21. J. Liu, B. Kuipers, S. Savarese, Recognizing human actions by attributes, in 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3337–3344 (2011). doi:10.1109/CVPR.2011.5995353

  22. H. Nam, B. Han, Learning multi-domain convolutional neural networks for visual tracking. arXiv preprint arXiv:1510.07945 (2015)

  23. D.A. Ross, J. Lim, R.S. Lin, M.H. Yang, Incremental learning for robust visual tracking. Int. J. Comput. Vis. 77(1–3), 125–141 (2008)

  24. S. Shalev-Shwartz, Y. Singer, N. Srebro, A. Cotter, Pegasos: primal estimated sub-gradient solver for SVM. Math. Program. 127(1), 3–30 (2011)

  25. G. Sharma, F. Jurie, C. Schmid, Expanded parts model for human attribute and action recognition in still images, in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 652–659 (2013). doi:10.1109/CVPR.2013.90

  26. G. Shu, A. Dehghan, O. Oreifej, E. Hand, M. Shah, Part-based multiple-person tracking with partial occlusion handling, in 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1815–1821 (2012). doi:10.1109/CVPR.2012.6247879

  27. S. Liu, T. Zhang, X. Cao, C. Xu, Structural correlation filter for robust visual tracking, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4312–4320 (2016). doi:10.1109/CVPR.2016.467

  28. Y. Tian, R. Sukthankar, M. Shah, Spatiotemporal deformable part models for action detection, in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2642–2649 (2013). doi:10.1109/CVPR.2013.341

  29. D. Wang, H. Lu, M.H. Yang, Online object tracking with sparse prototypes. IEEE Trans. Image Process. 22(1), 314–325 (2013). doi:10.1109/TIP.2012.2202677

  30. Y. Wang, Q. Ji, A dynamic conditional random field model for object segmentation in image sequences, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 264–270 (2005)

  31. Y. Wang, K.F. Loe, J.K. Wu, A dynamic conditional random field model for foreground and shadow segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28(2), 279–289 (2006)

  32. Y. Wang, G. Mori, Max-margin hidden conditional random fields for human action recognition, in IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009, pp. 872–879 (2009)

  33. Y. Wu, J. Lim, M.H. Yang, Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015)

  34. Y. Xie, H. Chang, Z. Li, L. Liang, X. Chen, D. Zhao, A unified framework for locating and recognizing human actions, in 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 25–32 (2011). doi:10.1109/CVPR.2011.5995648

  35. R. Yao, Q. Shi, C. Shen, Y. Zhang, A. van den Hengel, Part-based visual tracking with online latent structural learning, in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2363–2370 (2013). doi:10.1109/CVPR.2013.306

  36. C.N.J. Yu, T. Joachims, Learning structural SVMs with latent variables, in Proceedings of the 26th Annual International Conference on Machine Learning (ACM, 2009), pp. 1169–1176

  37. S. Zhang, X. Cheng, H. Guo, L. Zhou, Z. Wu, Tracking deformable parts via dynamic conditional random fields, in 2014 IEEE International Conference on Image Processing (ICIP), pp. 476–480 (2014). doi:10.1109/ICIP.2014.7025095

  38. T. Zhang, A. Bibi, B. Ghanem, In defense of sparse tracking: circulant sparse tracker, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3880–3888 (2016)

  39. Y. Zhang, T. Hesketh, H. Wang, J. Liu, D. Xiao, Actuator fault compensation for nonlinear systems using adaptive tracking control. Circuits Syst. Signal Process. 29(3), 419–430 (2010). doi:10.1007/s00034-010-9152-1

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61571106), the Scientific Research Foundation of NJUPT (No. NY213102) and the Natural Science Foundation of the Universities of Jiangsu Province (No. 16KJB510032).

Author information

Correspondence to Suofei Zhang.

Appendix: Derivation of DCRF on DPMs

According to the Hammersley–Clifford theorem, the posterior probability of the random field \(s_t\) at time \(t\) in Eq. (5) can be given by a Gibbs distribution [18] as

$$\begin{aligned} p(s_t|o_{1:t})\propto \exp \Bigg \{\sum _{x\in {X}}\Bigg [V_x(s_t(x)|o_{1:t}(x)) +\sum _{y\in {N_x}}V_{x,y}(s_t(x),s_t(y))\Bigg ]\Bigg \}. \end{aligned}$$
(17)
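
Read operationally, Eq. (17) says that the log-posterior, up to an additive constant, is a sum of unary potentials over the vertices plus pairwise potentials over each vertex's spatial neighborhood. The minimal Python sketch below evaluates exactly that sum; the potential functions V_unary and V_pair are placeholders standing in for the learned potentials, not the paper's code.

    # Minimal sketch of the Gibbs log-posterior of Eq. (17).
    def gibbs_log_posterior(states, obs, neighbors, V_unary, V_pair):
        # states[x]:    state s_t(x) of vertex x
        # obs[x]:       observation o_{1:t}(x) at vertex x
        # neighbors[x]: spatial neighborhood N_x of vertex x
        energy = 0.0
        for x in states:
            energy += V_unary(states[x], obs[x])
            for y in neighbors[x]:
                energy += V_pair(states[x], states[y])
        return energy  # p(s_t | o_{1:t}) is proportional to exp(energy)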

In the context of DPMs, we account for part deformation only through the unidirectional pairwise potential \(V_{x,y}(s_t(x),s_t(y))\), so the prior knowledge from the previous frame can be factorized over each single vertex directly. With a conditional independence assumption similar to that in [31], the observation model \(p(o_{t+1}|s_{t+1})\) in Eq. (5) can also be evaluated by unary potentials on the vertices:

$$\begin{aligned} p(o_{t+1}|s_{t+1})=\prod _{x\in {X}}p(o_{t+1}(x)|s_{t+1}(x))\propto \exp \big (\sum _{x\in {X}}V_x(o_{t+1}(x)|s_{t+1}(x))\big ). \end{aligned}$$
(18)

The state transition probability \(p(s_{t+1}|s_t)\) in Eq. (5) consists of both temporal pairwise potentials and spatial pairwise potentials:

$$\begin{aligned} p(s_{t+1}|s_t)\propto \exp \big \{\sum _{x\in {X}}\big [V_x(s_{t+1}(x)|s_t(M_x)) +\sum _{y\in {N_x}}V_{x,y}(s_{t+1}(x),s_{t+1}(y))\big ]\big \} \end{aligned}$$
(19)

where the potential \(V_x(s_{t+1}(x)|s_t(M_x))\) denotes the mean of the pairwise potentials between \(x\) and its temporal neighbor vertices:

$$\begin{aligned} V_x(s_{t+1}(x)|s_t(M_x))=\frac{1}{|M_x|}\sum _{y\in {M_x}}V_x(s_{t+1}(x)|s_t(y)). \end{aligned}$$
(20)
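
In code, Eq. (20) is simply an average over the temporal neighbor set \(M_x\); the helper below is a sketch under the same placeholder-potential assumption as the earlier snippet.

    # Mean temporal potential of Eq. (20): average the pairwise potential
    # between the new state of x and each temporal neighbor y in M_x.
    def temporal_potential(s_next_x, prev_states, M_x, V_x):
        return sum(V_x(s_next_x, prev_states[y]) for y in M_x) / len(M_x)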

Combining Eqs. (17), (18) and (19) into Eq. (5), the probability \(p(s_{t+1}|o_{1:t+1})\) becomes

$$\begin{aligned} p(s_{t+1}|o_{1:t+1})\propto & {} \exp \big (\sum _{x\in {X}}V_x(o_{t+1}(x)|s_{t+1}(x))\big ) \cdot \sum _{s_t}\prod _{x\in {X}} \exp \big [V_x(s_{t+1}(x)|s_t(M_x))\\ &+\sum _{y\in {N_x}}V_{x,y}(s_{t+1}(x),s_{t+1}(y))\big ]p(s_t(x)|o_{1:t}(x))\\ = & {} \prod _{x\in {X}}\exp \big [V_x(o_{t+1}(x)|s_{t+1}(x))+\sum _{y\in {N_x}}V_{x,y} (s_{t+1}(x),s_{t+1}(y))\big ]\\ &\cdot \sum _{s_t(x)}\exp \big [V_x(s_{t+1}(x)|s_t(M_x))\big ]p(s_t(x)|o_{1:t}(x)). \end{aligned}$$
(21)

Using Jensen’s inequality, we can evaluate \(p(s_{t+1}|o_{1:t+1})\) by its lower bound:

$$\begin{aligned} p(s_{t+1}|o_{1:t+1})\approx & {} \prod _{x\in {X}}\exp \big [V_x(o_{t+1}(x)|s_{t+1}(x)) +\sum _{y\in {N_x}}V_{x,y}(s_{t+1}(x),s_{t+1}(y))\big ]\\ &\cdot \exp \Big \{\sum _{s_t(x)}p(s_t(x)|o_{1:t}(x))\big [\frac{1}{|M_x|} \sum _{y\in {M_x}}V_x(s_{t+1}(x)|s_t(y))\big ]\Big \}. \end{aligned}$$
(22)

Here we consider only the corresponding vertex in the previous frame as the temporal neighbor of the current vertex. Equation (22) can then be rewritten simply as Eq. (6).
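
For illustration, a per-vertex version of this update over a discrete set of candidate states might look as follows. It applies the Jensen's-inequality approximation of Eq. (22): the exact sum over \(s_t(x)\) is replaced by the exponential of the expected temporal potential under the previous posterior, and, as in the text, the only temporal neighbor is the corresponding vertex in the last frame. The array layout is an assumption made for the sketch.

    import numpy as np

    def update_belief(unary, spatial, prev_belief, temporal_pairwise):
        # unary[i]:       V_x(o_{t+1}(x) | s_{t+1}(x) = i)
        # spatial[i]:     sum over y in N_x of V_{x,y}(s_{t+1}(x) = i, s_{t+1}(y))
        # prev_belief[j]: p(s_t(x) = j | o_{1:t}(x)) from the last frame
        # temporal_pairwise[i, j]: V_x(s_{t+1}(x) = i | s_t(x) = j)
        expected_temporal = temporal_pairwise @ prev_belief  # Jensen lower bound
        log_belief = unary + spatial + expected_temporal
        belief = np.exp(log_belief - log_belief.max())       # stable exponentiation
        return belief / belief.sum()                         # normalized posterior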

Cite this article

Zhang, S., Xing, L., Zhou, L. et al. Object Tracking by Incremental Structural Learning of Deformable Parts. Circuits Syst Signal Process 37, 255–276 (2018). https://doi.org/10.1007/s00034-017-0546-1
