Directional coherence-based spatiotemporal descriptor for object detection in static and dynamic scenes

  • Original Paper
  • Machine Vision and Applications

Abstract

This paper presents a simple yet powerful local descriptor called the histograms of space–time dominant orientations (HiSTDO). HiSTDO is built from two components: the space–time dominant orientation of a local region and its coherence, which measures how strongly the gradients in that region are concentrated along the dominant orientation. The descriptor is formed by accumulating both components into a histogram. In contrast to previous methods, which are vulnerable to background clutter and camera noise, HiSTDO effectively encodes the space–time shape of underlying structures even under such challenging conditions, and it can thus be applied efficiently to various applications (e.g., object and action detection). Experimental results on diverse datasets demonstrate that the proposed descriptor is effective for both human action detection and object detection.
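
The abstract does not state the formulas, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes that the dominant orientation of a space–time block can be taken as the first right singular vector of the block's gradient matrix, that coherence can be measured as the fraction of gradient energy along that vector, and that each block votes into an azimuth/elevation histogram with its coherence as the weight. The function names, block size, and bin counts are hypothetical choices made for illustration.

```python
import numpy as np


def dominant_orientation_and_coherence(block):
    """Return (unit direction in (x, y, t) order, coherence) for one
    space-time block of shape (T, H, W). Assumed definitions, see lead-in."""
    # Finite-difference gradients; np.gradient returns them in axis order (t, y, x).
    gt, gy, gx = np.gradient(block.astype(np.float64))
    # One row per voxel, columns ordered (x, y, t).
    G = np.stack([gx.ravel(), gy.ravel(), gt.ravel()], axis=1)
    # The first right singular vector captures the most gradient energy.
    _, s, vt = np.linalg.svd(G, full_matrices=False)
    energy = float(np.sum(s ** 2))
    # Assumed coherence: share of gradient energy along the dominant direction.
    coherence = float(s[0] ** 2 / energy) if energy > 0 else 0.0
    return vt[0], coherence


def histdo_like_histogram(video, cell=(4, 8, 8), n_azimuth=8, n_elevation=4):
    """Coherence-weighted histogram of dominant orientations over
    non-overlapping cells of a (T, H, W) video volume."""
    hist = np.zeros((n_elevation, n_azimuth))
    ct, cy, cx = cell
    T, H, W = video.shape
    for t in range(0, T - ct + 1, ct):
        for y in range(0, H - cy + 1, cy):
            for x in range(0, W - cx + 1, cx):
                d, c = dominant_orientation_and_coherence(
                    video[t:t + ct, y:y + cy, x:x + cx])
                if c == 0.0:
                    continue  # flat block: no gradient, no vote
                if d[2] < 0:
                    d = -d  # an orientation has no sign; fold to t >= 0
                azimuth = np.arctan2(d[1], d[0]) % (2 * np.pi)
                elevation = np.arcsin(np.clip(d[2], 0.0, 1.0))
                a = min(int(azimuth / (2 * np.pi) * n_azimuth), n_azimuth - 1)
                e = min(int(elevation / (np.pi / 2) * n_elevation), n_elevation - 1)
                hist[e, a] += c  # vote weighted by coherence
    total = hist.sum()
    return hist / total if total > 0 else hist


if __name__ == "__main__":
    # Synthetic volume whose intensity increases only along x.
    ramp = np.tile(np.arange(16, dtype=float), (8, 16, 1))
    print(histdo_like_histogram(ramp).round(2))
```

Weighting votes by coherence means that blocks whose gradients point in many directions, as clutter and noise tend to produce, contribute less than blocks with one clear space–time orientation, which is the intuition the abstract describes.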



Author information

Corresponding author

Correspondence to Wonjun Kim.


About this article


Cite this article

Kim, W., Han, JJ. Directional coherence-based spatiotemporal descriptor for object detection in static and dynamic scenes. Machine Vision and Applications 28, 49–59 (2017). https://doi.org/10.1007/s00138-016-0801-7


  • DOI: https://doi.org/10.1007/s00138-016-0801-7
