Abstract
This paper presents a simple yet powerful local descriptor called the histograms of space–time dominant orientations (HiSTDO). HiSTDO is built from two components: the space–time dominant orientation of a local region and its coherence, which measures how strongly the gradients in that region are concentrated along the dominant orientation. The descriptor is formed by accumulating both components into a histogram. Unlike previous methods, which are vulnerable to background clutter and camera noise, HiSTDO reliably encodes the space–time shape of underlying structures even under such challenging conditions, and it can therefore be applied efficiently to tasks such as object and action detection. Experimental results on diverse datasets demonstrate that the proposed descriptor is effective for both human action detection and object detection.
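The abstract does not give formulas for the two components, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that the dominant orientation of a local space–time region is the leading right singular vector of the stacked gradient matrix, and that coherence is the normalized gap between the two largest squared singular values. The function names, the coherence formula, and the azimuth/elevation binning are all hypothetical choices made for illustration.

```python
import numpy as np

def dominant_orientation_and_coherence(patch):
    """Hypothetical helper. patch is a 3-D array indexed (t, y, x)."""
    # Finite-difference gradients along t, y and x.
    gt, gy, gx = np.gradient(patch.astype(float))
    # One row per voxel: (gx, gy, gt).
    G = np.stack([gx.ravel(), gy.ravel(), gt.ravel()], axis=1)
    # Singular values come back in descending order.
    _, s, vt = np.linalg.svd(G, full_matrices=False)
    direction = vt[0]  # unit vector; its sign is arbitrary
    energy = np.sum(s ** 2)
    # Assumed coherence measure in [0, 1]: 1 when all gradients are parallel.
    coherence = 0.0 if energy == 0 else (s[0] ** 2 - s[1] ** 2) / energy
    return direction, coherence

def orientation_histogram(patches, n_az=8, n_el=2):
    """Hypothetical helper: coherence-weighted histogram over orientation bins."""
    hist = np.zeros((n_az, n_el))
    for p in patches:
        d, c = dominant_orientation_and_coherence(p)
        if d[2] < 0:
            d = -d  # fold the sign ambiguity onto the gt >= 0 hemisphere
        az = np.arctan2(d[1], d[0]) % (2 * np.pi)   # azimuth in the x-y plane
        el = np.arcsin(np.clip(d[2], 0.0, 1.0))     # elevation in [0, pi/2]
        i = min(int(az / (2 * np.pi) * n_az), n_az - 1)
        j = min(int(el / (np.pi / 2) * n_el), n_el - 1)
        hist[i, j] += c  # each region votes with its coherence
    return hist.ravel()
```

Under these assumptions, a region whose intensity varies along a single direction has coherence close to 1 and casts a near-full vote, while a noisy or flat region has coherence near 0 and contributes little to the histogram.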
Cite this article
Kim, W., Han, JJ. Directional coherence-based spatiotemporal descriptor for object detection in static and dynamic scenes. Machine Vision and Applications 28, 49–59 (2017). https://doi.org/10.1007/s00138-016-0801-7
DOI: https://doi.org/10.1007/s00138-016-0801-7