Human action recognition with salient trajectories and multiple kernel learning

Yi, Yang; Hu, Pan; Deng, Xiaokang

doi:10.1007/s11042-017-5209-5

Published: 20 September 2017

Volume 77, pages 17709–17730, (2018)

Multimedia Tools and Applications

Abstract

Action recognition in videos plays an important role in computer vision and multimedia, and the complexity of spatial and temporal information makes it challenging. Trajectory-based approaches have recently been shown to be effective, and this paper presents a new framework and algorithm, trajectory space information based multiple kernel learning (TSI-MKL). First, dense trajectories are extracted as raw features, and three saliency maps, corresponding to color, space, and optical flow, are computed on the frames at the same time. Second, a new method that combines these saliency maps is proposed to filter the extracted trajectories, yielding a set of salient trajectories that contains only foreground motion regions. Next, a novel two-layer clustering is developed to cluster the salient trajectories into several semantic groups, and the final video representation is generated by encoding each group. Finally, the representations of the different semantic groups are fed into the proposed kernel function of a multiple kernel classifier. Experiments on three popular video action datasets demonstrate that the presented approach performs competitively with the state of the art.
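The filtering and kernel-combination steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, since the abstract does not specify the details: the weighted-mean fusion of the three saliency maps, the mean-saliency threshold used to keep a trajectory, the RBF base kernel, and all function and parameter names (`combine_saliency`, `filter_trajectories`, `combined_kernel`, `threshold`, `betas`, `gamma`) are hypothetical and are not the authors' TSI-MKL formulation.

```python
import numpy as np


def combine_saliency(maps, weights=None):
    """Fuse saliency maps (each H x W, values in [0, 1]) into one map.

    Assumption: a normalized weighted mean; the paper's fusion rule
    is not given in the abstract.
    """
    stack = np.stack(maps, axis=0).astype(float)
    if weights is None:
        weights = np.ones(len(maps))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Contract the weight vector with the first axis of the stack -> (H, W).
    return np.tensordot(weights, stack, axes=1)


def filter_trajectories(trajectories, saliency_per_frame, threshold=0.5):
    """Keep trajectories whose mean saliency along their points exceeds threshold.

    Each trajectory is a list of (frame_index, x, y) points.
    Assumption: mean saliency with a fixed threshold as the selection rule.
    """
    kept = []
    for traj in trajectories:
        scores = [
            saliency_per_frame[t][int(round(y)), int(round(x))]
            for t, x, y in traj
        ]
        if np.mean(scores) > threshold:
            kept.append(traj)
    return kept


def combined_kernel(groups_a, groups_b, betas, gamma=1.0):
    """Weighted sum of RBF kernels, one per semantic group.

    groups_a / groups_b hold one feature vector per group for two videos.
    Assumption: RBF base kernels with fixed, non-negative weights betas;
    in multiple kernel learning the weights would be learned.
    """
    total = 0.0
    for fa, fb, beta in zip(groups_a, groups_b, betas):
        diff = np.asarray(fa, dtype=float) - np.asarray(fb, dtype=float)
        total += beta * np.exp(-gamma * np.dot(diff, diff))
    return total


if __name__ == "__main__":
    # Toy frame: the left half is salient foreground in all three maps.
    fg = np.zeros((4, 4))
    fg[:, :2] = 1.0
    fused = combine_saliency([fg, fg, fg])
    frames = [fused, fused]

    on_foreground = [(0, 0.0, 1.0), (1, 1.0, 1.0)]
    on_background = [(0, 3.0, 2.0), (1, 3.0, 3.0)]
    salient = filter_trajectories([on_foreground, on_background], frames)
    print(len(salient), "trajectory kept")

    # Two videos described by two semantic groups each.
    k = combined_kernel([[0.0, 1.0], [1.0, 0.0]],
                        [[0.0, 1.0], [0.0, 0.0]],
                        betas=[0.5, 0.5])
    print("combined kernel value:", k)
```

A kernel matrix built this way over all pairs of videos could, for example, be passed to a support vector machine that accepts precomputed kernels. The two-layer clustering and the group encoding are left out of the sketch because the abstract gives no detail about them.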

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (Nos. 61672546 and 61573385) and the Guangzhou Science and Technology Project (No. 201707010127).

Author information

Authors and Affiliations

School of Data & Computer Science, Sun Yat-sen University, Guangzhou, 510006, China
Yang Yi, Pan Hu & Xiaokang Deng
Xinhua College of Sun Yat-sen University, Guangzhou, 510520, China
Yang Yi
Guangdong Province Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
Yang Yi

Corresponding author

Correspondence to Xiaokang Deng.

About this article

Cite this article

Yi, Y., Hu, P. & Deng, X. Human action recognition with salient trajectories and multiple kernel learning. Multimed Tools Appl 77, 17709–17730 (2018). https://doi.org/10.1007/s11042-017-5209-5

Received: 07 November 2016
Revised: 05 July 2017
Accepted: 05 September 2017
Published: 20 September 2017
Issue Date: July 2018
DOI: https://doi.org/10.1007/s11042-017-5209-5
