Robust tracking based on H-CNN with low-resource sampling and scaling by frame-wise motion localization

Abstract

In the big data age, deep models have shown outstanding effectiveness in a variety of vision tasks. Unfortunately, their demand for enormous training samples and computational resources still limits their practicality in low-resource media computing applications such as online object tracking. More recently, CNN-based feature extraction has helped tracking-by-learning strategies make significant progress, although the coarse-resolution outputs of the last layer still substantially limit further improvement of tracking performance. By exploiting the hierarchy of convolutional layers as an image pyramid representation, the earlier layers of a hierarchical CNN improve spatial localization but are less invariant to target appearance changes, which inevitably leads to an inaccurate sampling region when non-rigid objects exhibit intrinsic motion. To guarantee qualified sampling for tracking-by-learning with a hierarchical CNN, this paper incorporates inter-frame motion guidance with intra-frame appearance correlations by formulating energy optimization processes in both the spatial and temporal domains. With an optional mechanism for combining the extracted regions, the proposed algorithm achieves more precise target localization for qualified sampling. Experiments on a challenging non-rigid tracking benchmark demonstrate superior performance of the proposed tracker in comparison with other state-of-the-art trackers.
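The localization pipeline sketched in the abstract can be pictured with a short example. The following is a minimal NumPy sketch, not the authors' implementation: placeholder arrays stand in for H-CNN layer outputs, a coarse-to-fine weighted fusion combines per-layer correlation responses, and the response peak is blended with an inter-frame motion prediction as a crude stand-in for the spatio-temporal energy optimization described above. All function names, layer weights, and the blending factor alpha are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of hierarchical response fusion
# plus motion-guided localization for sampling. Feature maps are random
# placeholders; weights and alpha are illustrative assumptions.
import numpy as np

def correlation_response(feat, template):
    """Cross-correlate a single-channel feature map with a target template."""
    th, tw = template.shape
    resp = np.zeros((feat.shape[0] - th + 1, feat.shape[1] - tw + 1))
    for y in range(resp.shape[0]):
        for x in range(resp.shape[1]):
            resp[y, x] = np.sum(feat[y:y + th, x:x + tw] * template)
    return resp

def fuse_hierarchical_responses(responses, weights):
    """Resize every layer's response to the finest one and take a weighted sum
    (deeper layers weighted more for semantics, shallower layers refine location)."""
    target_shape = responses[0].shape  # finest (earliest-layer) response
    fused = np.zeros(target_shape)
    for resp, w in zip(responses, weights):
        # nearest-neighbour resize via index mapping (keeps the sketch NumPy-only)
        ys = np.linspace(0, resp.shape[0] - 1, target_shape[0]).astype(int)
        xs = np.linspace(0, resp.shape[1] - 1, target_shape[1]).astype(int)
        fused += w * resp[np.ix_(ys, xs)]
    return fused

def motion_guided_location(fused, prev_loc, motion, alpha=0.3):
    """Blend the fused-response peak with the motion-predicted position,
    a crude stand-in for the spatio-temporal energy minimization."""
    peak = np.array(np.unravel_index(np.argmax(fused), fused.shape), dtype=float)
    predicted = np.asarray(prev_loc, dtype=float) + np.asarray(motion, dtype=float)
    return (1 - alpha) * peak + alpha * predicted

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder "conv" feature maps at three resolutions and their templates.
    feats = [rng.random((64, 64)), rng.random((32, 32)), rng.random((16, 16))]
    templates = [rng.random((8, 8)), rng.random((4, 4)), rng.random((2, 2))]
    responses = [correlation_response(f, t) for f, t in zip(feats, templates)]
    fused = fuse_hierarchical_responses(responses, weights=[0.25, 0.5, 1.0])
    loc = motion_guided_location(fused, prev_loc=(28, 30), motion=(1.5, -0.5))
    print("estimated target centre for sampling:", loc)
```

In practice the estimated centre would define the region from which positive and negative samples are drawn for the next learning update, which is the "qualified sampling" role the abstract assigns to motion-guided localization.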

Acknowledgments

This research is supported by the National Natural Science Foundation of China (grants 61571362 and 61601505) and by the National Research Foundation, Prime Minister's Office, Singapore, under its International Research Centre in Singapore Funding Initiative.

Author information

Correspondence to Peng Zhang or Tao Zhuo.

About this article

Cite this article

Zhang, P., Zhuo, T., Huang, H. et al. Robust tracking based on H-CNN with low-resource sampling and scaling by frame-wise motion localization. Multimed Tools Appl 77, 18781–18800 (2018). https://doi.org/10.1007/s11042-017-4493-4
