Temporal Frame Sub-Sampling for Video Object Tracking

Abstract

Temporal frame sub-sampling (TFS) is investigated as a way to reduce the computing time of video object tracking (VOT). With a sampling ratio N, a TFS VOT algorithm processes a shorter video formed by sampling 1 out of every N frames of the given video. The object trajectory in the remaining frames is interpolated linearly from the trajectory in the sampled frames. TFS can thus significantly reduce processing time, at the cost of some tracking accuracy. More importantly, it can be applied to accelerate any VOT algorithm. However, it is observed that when the object trajectory is smooth, the tracking accuracy of a TFS VOT algorithm may actually improve over the non-TFS results. In this work, we provide an empirical analysis of this unexpected outcome of the TFS scheme. We also suggest rules of thumb for exploiting this property so that TFS enhances both the efficiency and the accuracy of VOT.
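The sampling and interpolation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `tfs_track`, `sampled_indices`, `lerp_box`, and the stateless `track_frame` callable are hypothetical, the (x, y, width, height) box layout is assumed, and always tracking the final frame (so that every skipped frame lies between two tracked frames) is a choice made here for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed box layout: (x, y, width, height). Hypothetical, not from the paper.
Box = Tuple[float, float, float, float]


def sampled_indices(num_frames: int, n: int) -> List[int]:
    """Return frame indices 0, n, 2n, ... plus the last frame.

    Adding the last frame is an assumption of this sketch, so that every
    skipped frame sits between two tracked frames.
    """
    if num_frames <= 0 or n < 1:
        raise ValueError("need at least one frame and a sampling ratio >= 1")
    idx = list(range(0, num_frames, n))
    if idx[-1] != num_frames - 1:
        idx.append(num_frames - 1)
    return idx


def lerp_box(a: Box, b: Box, t: float) -> Box:
    """Linearly interpolate each box component; t=0 gives a, t=1 gives b."""
    return tuple((1.0 - t) * p + t * q for p, q in zip(a, b))  # type: ignore


def tfs_track(
    frames: Sequence[object],
    track_frame: Callable[[object], Box],
    n: int,
) -> List[Box]:
    """Run the tracker on sampled frames only and interpolate the rest.

    `track_frame` stands in for any VOT algorithm; real trackers usually
    carry state between frames, which this sketch ignores.
    """
    idx = sampled_indices(len(frames), n)
    tracked = {i: track_frame(frames[i]) for i in idx}

    boxes: List[Box] = []
    for lo, hi in zip(idx, idx[1:]):
        for i in range(lo, hi):
            t = (i - lo) / (hi - lo)
            boxes.append(lerp_box(tracked[lo], tracked[hi], t))
    boxes.append(tracked[idx[-1]])  # the final sampled frame itself
    return boxes
```

With 7 frames and N = 3, the tracker in this sketch is called only on frames 0, 3, and 6, and the boxes for frames 1, 2, 4, and 5 come from interpolation. When the true trajectory is linear between sampled frames, the interpolated boxes match it exactly, which is the smooth-trajectory case the abstract singles out.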

Author information

Corresponding author

Correspondence to Xuan Wang.

Additional information

This paper was submitted on April 18, 2019. This material is based upon work supported by the US Department of Transportation, Federal Highway Administration, under contract number DTFH6114C00011.

About this article

Cite this article

Wang, X., Hu, Y.H., Radwin, R.G. et al. Temporal Frame Sub-Sampling for Video Object Tracking. J Sign Process Syst 92, 569–581 (2020). https://doi.org/10.1007/s11265-019-01488-z

  • DOI: https://doi.org/10.1007/s11265-019-01488-z

Keywords

  • Computing time
  • Sub-sampling
  • Video object tracking