Abstract
Action recognition in videos plays an important role in computer vision and multimedia, and it remains challenging because of the complexity of spatial and temporal information. Trajectory-based approaches have recently been shown to be efficient, and this paper presents a new framework and algorithm, trajectory space information based multiple kernel learning (TSI-MKL). First, dense trajectories are extracted as raw features, and three saliency maps corresponding to color, space, and optical flow are computed on the frames at the same time. Second, a new method that combines these saliency maps is proposed to filter the extracted trajectories, yielding a set of salient trajectories that contain only foreground motion regions. Next, a novel two-layer clustering is developed to group the salient trajectories into several semantic groups, and the final video representation is generated by encoding each group. Finally, the representations of the different semantic groups are fed into the proposed kernel function of a multiple kernel classifier. Experiments on three popular video action datasets show that the proposed approach performs competitively with the state of the art.
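The abstract does not say how the color, space, and optical-flow saliency maps are combined or how a trajectory is judged salient. The following is a minimal Python sketch of the filtering idea under two loudly hypothetical choices: the maps are fused by a pixel-wise weighted average (`weights`), and a trajectory is kept when its mean fused saliency exceeds a fixed `threshold`. The names `fuse_saliency` and `filter_trajectories` are illustrative only and are not the authors' method.

```python
from typing import List, Sequence, Tuple

SaliencyMap = List[List[float]]     # indexed as map[y][x]; values assumed in [0, 1]
Point = Tuple[int, int, int]        # (frame index t, x, y)
Trajectory = List[Point]


def fuse_saliency(color: SaliencyMap, space: SaliencyMap, flow: SaliencyMap,
                  weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> SaliencyMap:
    """Hypothetical fusion rule: pixel-wise weighted average of the three maps."""
    wc, ws, wf = weights
    return [
        [wc * c + ws * s + wf * f for c, s, f in zip(row_c, row_s, row_f)]
        for row_c, row_s, row_f in zip(color, space, flow)
    ]


def filter_trajectories(trajectories: List[Trajectory],
                        fused_maps: List[SaliencyMap],
                        threshold: float = 0.5) -> List[Trajectory]:
    """Keep trajectories whose mean fused saliency along their points exceeds threshold."""
    salient = []
    for traj in trajectories:
        if not traj:
            continue  # skip empty trajectories to avoid dividing by zero
        # Look up the fused saliency of the frame each point belongs to.
        scores = [fused_maps[t][y][x] for t, x, y in traj]
        if sum(scores) / len(scores) > threshold:
            salient.append(traj)
    return salient


if __name__ == "__main__":
    # Two 2x2 frames in which only the right column is salient in every cue.
    cue = [[0.0, 1.0], [0.0, 1.0]]
    fused = [fuse_saliency(cue, cue, cue) for _ in range(2)]
    background = [(0, 0, 0), (1, 0, 1)]   # stays in the left column
    foreground = [(0, 1, 0), (1, 1, 1)]   # stays in the right column
    print(filter_trajectories([background, foreground], fused))
    # prints [[(0, 1, 0), (1, 1, 1)]]
```

Averaging along the whole trajectory, rather than testing a single point, is one simple way to make the decision robust to a few noisy saliency values; the actual combination rule in TSI-MKL may differ.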
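The abstract states only that the final representation is built "by encoding each group" of trajectories. As a hedged illustration, the sketch below swaps in plain bag-of-words encoding: every trajectory descriptor in a semantic group is hard-assigned to its nearest codeword, and the counts are L1-normalized into one histogram per group. The shared `codebook` and the helpers `encode_group` and `encode_video` are hypothetical; the paper's encoder may be different (for example, Fisher vectors).

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_codeword(desc: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])))


def encode_group(descs: List[Vector], codebook: List[Vector]) -> List[float]:
    """Hard-assignment bag-of-words histogram, L1-normalized (all zeros if the group is empty)."""
    hist = [0.0] * len(codebook)
    for d in descs:
        hist[nearest_codeword(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def encode_video(groups: Dict[int, List[Vector]],
                 codebook: List[Vector]) -> Dict[int, List[float]]:
    """One histogram per semantic group, keyed by a hypothetical group id."""
    return {g: encode_group(descs, codebook) for g, descs in groups.items()}


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 1.0)]
    groups = {0: [(0.1, 0.0), (0.9, 1.0), (0.0, 0.2)], 1: [(1.0, 0.9)]}
    print(encode_video(groups, codebook))
```

Keeping one histogram per group, instead of pooling all trajectories into a single vector, is what lets a downstream multiple kernel classifier weight the groups separately.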
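In multiple kernel learning the classifier works with a combined kernel $K(x, y) = \sum_m \beta_m k_m(x_m, y_m)$, where $x_m$ is the representation of semantic group $m$, each $k_m$ is a base kernel, and $\beta_m \ge 0$. The sketch below builds such a Gram matrix in Python under stated assumptions: the base kernel is an exponentiated chi-square kernel (a common choice for histograms, not necessarily the "proposed kernel function" of the paper), and the weights `betas` are fixed by hand, whereas a real MKL solver learns them jointly with the SVM. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Histogram = Sequence[float]
Video = List[Histogram]   # one histogram per semantic group


def chi2_distance(h1: Histogram, h2: Histogram, eps: float = 1e-10) -> float:
    """Chi-square distance between two histograms; eps avoids division by zero."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def chi2_kernel(h1: Histogram, h2: Histogram, gamma: float = 1.0) -> float:
    """Exponentiated chi-square kernel, used here as an assumed base kernel."""
    return math.exp(-gamma * chi2_distance(h1, h2))


def combined_kernel(v1: Video, v2: Video, betas: Sequence[float]) -> float:
    """Weighted sum of per-group base kernels: K = sum_m beta_m * k_m."""
    if len(v1) != len(betas) or len(v2) != len(betas):
        raise ValueError("need one weight per semantic group")
    return sum(b * chi2_kernel(g1, g2) for b, g1, g2 in zip(betas, v1, v2))


def gram_matrix(videos: List[Video], betas: Sequence[float]) -> List[List[float]]:
    """Kernel matrix over all pairs of videos, ready for a precomputed-kernel SVM."""
    return [[combined_kernel(a, b, betas) for b in videos] for a in videos]


if __name__ == "__main__":
    videos = [
        [[0.5, 0.5], [1.0, 0.0]],   # video A: two semantic groups
        [[0.2, 0.8], [0.0, 1.0]],   # video B
    ]
    for row in gram_matrix(videos, betas=[0.6, 0.4]):
        print([round(v, 3) for v in row])
```

Because each base kernel equals 1 on identical histograms, the diagonal of the Gram matrix equals the sum of the weights; with non-negative weights summing to 1 the combination stays a valid (positive semidefinite) kernel.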
Acknowledgements
This work was partly supported by the National Natural Science Foundation of China (Nos. 61672546 and 61573385) and the Guangzhou Science and Technology Project (No. 201707010127).
About this article
Cite this article
Yi, Y., Hu, P. & Deng, X. Human action recognition with salient trajectories and multiple kernel learning. Multimed Tools Appl 77, 17709–17730 (2018). https://doi.org/10.1007/s11042-017-5209-5
DOI: https://doi.org/10.1007/s11042-017-5209-5