Abstract
Online object tracking in diverse scenarios remains a challenging problem, as it entails a trade-off between distinguishing the target from the background and learning the target's appearance in the scene. In this paper, we address this problem from three perspectives. To improve the descriptiveness of the appearance model, we introduce the structural model from the object detection field into tracking, tracking both the target and its deformable parts simultaneously. To improve the robustness of tracking, we propose a logistic regression-based voting method that excludes the influence of occluded parts from the tracking results and the model update. Finally, we propose an online method that incrementally learns the structural appearance model from the samples filtered by the occlusion handling mechanism. Empirical results demonstrate that the proposed tracking framework outperforms other leading methods, especially on challenging object tracking tasks.
References
A. Adam, E. Rivlin, I. Shimshoni, Robust fragments-based tracking using the integral histogram, in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 798–805 (2006). doi:10.1109/CVPR.2006.256
B. Babenko, M.H. Yang, S. Belongie, Robust object tracking with online multiple instance learning. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1619–1632 (2011). doi:10.1109/TPAMI.2010.226
L. Bottou, Large-scale machine learning with stochastic gradient descent, in Proceedings of COMPSTAT’2010 (Springer, 2010), pp. 177–186
L. Bourdev, S. Maji, J. Malik, Describing people: A poselet-based approach to attribute classification, in 2011 IEEE International Conference on Computer Vision (ICCV), pp. 1543–1550 (2011). doi:10.1109/ICCV.2011.6126413
G. Bradski, Real time face and object tracking as a component of a perceptual user interface, in IEEE Workshop on Applications of Computer Vision, pp. 214–219 (1998)
X. Cheng, N. Li, S. Zhang, Z. Wu, Robust visual tracking with SIFT features and fragments based on particle swarm optimization. Circuits Syst. Signal Process. 33(5), 1507–1526 (2014). doi:10.1007/s00034-013-9713-1
D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 142–149 (2000)
N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893 (2005)
M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman, The PASCAL Visual Object Classes Challenge 2009 (VOC2009) Results (2009)
J. Fan, X. Shen, Y. Wu, What are we tracking: a unified approach of tracking and recognition. IEEE Trans. Image Process. 22(2), 549–560 (2013). doi:10.1109/TIP.2012.2218827
P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010). doi:10.1109/TPAMI.2009.167
P. Felzenszwalb, D. McAllester, D. Ramanan, A discriminatively trained, multiscale, deformable part model, in IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008, pp. 1–8 (2008). doi:10.1109/CVPR.2008.4587597
S. Geman, D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6(6), 721–741 (1984)
L. Gorelick, M. Blank, E. Shechtman, M. Irani, R. Basri, Actions as space–time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2247–2253 (2007)
S. Hare, S. Golodetz, A. Saffari, V. Vineet, M.M. Cheng, S.L. Hicks, P.H. Torr, Struck: structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2096–2109 (2016)
M. Isard, A. Blake, Condensation: conditional density propagation for visual tracking. Int. J. Comput. Vis. 29(1), 5–28 (1998)
J. Kwon, K.M. Lee, Visual tracking decomposition, in 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1269–1276 (2010). doi:10.1109/CVPR.2010.5539821
J. Lafferty, A. McCallum, F.C. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, in Proceedings of the 18th International Conference on Machine Learning (ICML 2001)
Y. Li, H. Ai, C. Huang, S. Lao, Robust head tracking based on a multi-state particle filter, in International Conference on Automatic Face and Gesture Recognition, pp. 335–340 (2006)
Y. Li, Y. Shen, Z. Liu, P. He, Tracking a maneuvering target in clutter by a new smoothing particle filter, in Proceedings of the IEEE on Instrumentation and Measurement Technology Conference, vol. 2, pp. 843–848 (2005)
J. Liu, B. Kuipers, S. Savarese, Recognizing human actions by attributes, in 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3337–3344 (2011). doi:10.1109/CVPR.2011.5995353
H. Nam, B. Han, Learning multi-domain convolutional neural networks for visual tracking. arXiv preprint arXiv:1510.07945 (2015)
D.A. Ross, J. Lim, R.S. Lin, M.H. Yang, Incremental learning for robust visual tracking. Int. J. Comput. Vis. 77(1–3), 125–141 (2008)
S. Shalev-Shwartz, Y. Singer, N. Srebro, A. Cotter, Pegasos: primal estimated sub-gradient solver for SVM. Math. Program. 127(1), 3–30 (2011)
G. Sharma, F. Jurie, C. Schmid, Expanded parts model for human attribute and action recognition in still images, in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 652–659 (2013). doi:10.1109/CVPR.2013.90
G. Shu, A. Dehghan, O. Oreifej, E. Hand, M. Shah, Part-based multiple-person tracking with partial occlusion handling, in 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1815–1821 (2012). doi:10.1109/CVPR.2012.6247879
S. Liu, T. Zhang, X. Cao, C. Xu, Structural correlation filter for robust visual tracking, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4312–4320 (2016). doi:10.1109/CVPR.2016.467
Y. Tian, R. Sukthankar, M. Shah, Spatiotemporal deformable part models for action detection, in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2642–2649 (2013). doi:10.1109/CVPR.2013.341
D. Wang, H. Lu, M.H. Yang, Online object tracking with sparse prototypes. IEEE Trans. Image Process. 22(1), 314–325 (2013). doi:10.1109/TIP.2012.2202677
Y. Wang, Q. Ji, A dynamic conditional random field model for object segmentation in image sequences, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 264–270 (2005)
Y. Wang, K.F. Loe, J.K. Wu, A dynamic conditional random field model for foreground and shadow segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28(2), 279–289 (2006)
Y. Wang, G. Mori, Max-margin hidden conditional random fields for human action recognition, in IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009, pp. 872–879 (2009)
Y. Wu, J. Lim, M.H. Yang, Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015)
Y. Xie, H. Chang, Z. Li, L. Liang, X. Chen, D. Zhao, A unified framework for locating and recognizing human actions, in 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 25–32 (2011). doi:10.1109/CVPR.2011.5995648
R. Yao, Q. Shi, C. Shen, Y. Zhang, A. van den Hengel, Part-based visual tracking with online latent structural learning, in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2363–2370 (2013). doi:10.1109/CVPR.2013.306
C.N.J. Yu, T. Joachims, Learning structural SVMs with latent variables, in Proceedings of the 26th Annual International Conference on Machine Learning (ACM), pp. 1169–1176 (2009)
S. Zhang, X. Cheng, H. Guo, L. Zhou, Z. Wu, Tracking deformable parts via dynamic conditional random fields, in 2014 IEEE International Conference on Image Processing (ICIP), pp. 476–480 (2014). doi:10.1109/ICIP.2014.7025095
T. Zhang, A. Bibi, B. Ghanem, In defense of sparse tracking: circulant sparse tracker, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3880–3888 (2016)
Y. Zhang, T. Hesketh, H. Wang, J. Liu, D. Xiao, Actuator fault compensation for nonlinear systems using adaptive tracking control. Circuits Syst. Signal Process. 29(3), 419–430 (2010). doi:10.1007/s00034-010-9152-1
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61571106), the Scientific Research Foundation of NJUPT (No. NY213102), and the Natural Science Foundation of Universities in Jiangsu Province (16KJB510032).
Appendix: Derivation of DCRF on DPMs
According to the Hammersley–Clifford theorem, the posterior probability of the random field \(s_t\) at time \(t\) in Eq. (5) can be given by a Gibbs distribution [18] as
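As a sketch under standard notation (with \(Z_t\) denoting the partition function and \(\mathcal{C}\) the set of cliques of the field, both assumed here rather than taken from the paper), such a Gibbs factorization takes the form
\[
p(s_t \mid o_{1:t}) \;=\; \frac{1}{Z_t}\exp\Big(-\sum_{c \in \mathcal{C}} V_c\big(s_t(c)\big)\Big).
\]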
Since, in the context of DPMs, we account for part deformation only through the unidirectional pairwise potential \(V_{x,y}(s_t(x),s_t(y))\), the prior knowledge from the previous frame can be factorized directly onto each single vertex. With a conditional independence assumption similar to that in [31], the observation model \(p(o_{t+1}|s_{t+1})\) in Eq. (5) can also be evaluated by unary potentials on the vertices:
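Under this assumption, a sketch of the per-vertex factorization (with \(\mathcal{V}\) denoting the vertex set of the part graph, assumed notation) is
\[
p(o_{t+1}\mid s_{t+1}) \;\propto\; \exp\Big(-\sum_{x\in\mathcal{V}} V_x\big(o_{t+1}\mid s_{t+1}(x)\big)\Big),
\]
where \(V_x\) is the unary appearance potential of vertex \(x\).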
The state transition probability \(p(s_{t+1}|s_t)\) in Eq. (5) consists of both temporal pairwise potentials and spatial pairwise potentials:
where the potential \(V_x(s_{t+1}(x)|s_t(M_x))\) denotes the mean of the pairwise potentials between \(x\) and its temporal neighboring vertices:
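A sketch of these two quantities, under the assumed notation \(\mathcal{V}\) for the vertex set, \(\mathcal{E}\) for the spatial (deformation) edges, and \(M_x\) for the temporal neighbors of \(x\), is
\[
p(s_{t+1}\mid s_t) \;\propto\; \exp\Big(-\sum_{x\in\mathcal{V}} V_x\big(s_{t+1}(x)\mid s_t(M_x)\big) \;-\; \sum_{(x,y)\in\mathcal{E}} V_{x,y}\big(s_{t+1}(x),s_{t+1}(y)\big)\Big),
\]
\[
V_x\big(s_{t+1}(x)\mid s_t(M_x)\big) \;=\; \frac{1}{|M_x|}\sum_{z\in M_x} V_{x,z}\big(s_{t+1}(x),s_t(z)\big).
\]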
Combining Eqs. (17), (18) and (19) into Eq. (5), the probability \(p(s_{t+1}|o_{1:t+1})\) becomes
Using Jensen’s inequality, we can evaluate \(p(s_{t+1}|o_{1:t+1})\) by its lower bound:
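As a sketch of this step, writing the filtering recursion as a sum over the previous state and applying Jensen's inequality to the concave logarithm (with \(Z_{t+1}\) an assumed normalizing constant of the exact recursion) gives
\[
\log \sum_{s_t} p(s_{t+1}\mid s_t)\, p(s_t\mid o_{1:t}) \;\ge\; \sum_{s_t} p(s_t\mid o_{1:t})\, \log p(s_{t+1}\mid s_t),
\]
so that
\[
p(s_{t+1}\mid o_{1:t+1}) \;\ge\; \frac{1}{Z_{t+1}}\, p(o_{t+1}\mid s_{t+1}) \exp\Big(\sum_{s_t} p(s_t\mid o_{1:t})\, \log p(s_{t+1}\mid s_t)\Big).
\]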
Here we consider only the corresponding vertex in the previous frame as the temporal neighbor of the current vertex, so Eq. (22) can simply be rewritten as Eq. (6).