Human action recognition with salient trajectories and multiple kernel learning

Multimedia Tools and Applications

Abstract

Action recognition in videos plays an important role in computer vision and multimedia, and it remains challenging because of the complexity of spatial and temporal information. Trajectory-based approaches have recently been shown to be efficient, and this paper presents a new framework and algorithm, trajectory space information based multiple kernel learning (TSI-MKL). First, dense trajectories are extracted as raw features while three saliency maps, corresponding to color, space, and optical flow, are computed on the frames. Second, a new method that combines these saliency maps is proposed to filter the extracted trajectories, yielding a set of salient trajectories that contains only foreground motion regions. Next, a novel two-layer clustering is developed to cluster the salient trajectories into several semantic groups, and the final video representation is generated by encoding each group. Finally, the representations of the different semantic groups are fed into the proposed kernel function of a multiple kernel classifier. Experiments on three popular video action datasets demonstrate that the presented approach performs competitively with state-of-the-art methods.
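
The abstract gives no formulas for the saliency fusion or for the proposed kernel, so the following is only a minimal sketch of the general idea: fuse three saliency maps, keep the trajectories that lie on salient regions, and combine one kernel per semantic group. Everything here is an assumption made for illustration, not the authors' method: the function names, the averaging fusion rule, the mean-saliency threshold, and the weighted sum of linear kernels (a generic multiple kernel form standing in for the paper's kernel function).

```python
import numpy as np


def combine_saliency(color, space, flow):
    """Fuse three saliency volumes of shape (frames, height, width).

    Assumed fusion rule: min-max normalize each volume, then average.
    """
    def normalize(m):
        m = m.astype(float)
        span = m.max() - m.min()
        return (m - m.min()) / span if span > 0 else np.zeros_like(m)

    return (normalize(color) + normalize(space) + normalize(flow)) / 3.0


def filter_trajectories(trajectories, saliency, thresh=0.5):
    """Keep trajectories whose mean saliency along the track is >= thresh.

    Each trajectory is an integer array of shape (T, 3) holding
    (frame, row, column) for every tracked point. The threshold value
    is hypothetical.
    """
    kept = []
    for traj in trajectories:
        values = saliency[traj[:, 0], traj[:, 1], traj[:, 2]]
        if values.mean() >= thresh:
            kept.append(traj)
    return kept


def combined_kernel(groups_a, groups_b, weights):
    """Weighted sum of linear kernels, one kernel per semantic group.

    groups_a[g] has shape (n_a, d_g) and groups_b[g] has shape (n_b, d_g).
    Returns an (n_a, n_b) kernel matrix.
    """
    return sum(w * (a @ b.T) for w, a, b in zip(weights, groups_a, groups_b))
```

A kernel matrix built this way could be passed to any classifier that accepts precomputed kernels; in a real multiple kernel learning setup the group weights would be learned rather than fixed by hand.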

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (Nos. 61672546 and 61573385) and the Guangzhou Science and Technology Project (No. 201707010127).

Author information

Corresponding author

Correspondence to Xiaokang Deng.

About this article

Cite this article

Yi, Y., Hu, P. & Deng, X. Human action recognition with salient trajectories and multiple kernel learning. Multimed Tools Appl 77, 17709–17730 (2018). https://doi.org/10.1007/s11042-017-5209-5

  • DOI: https://doi.org/10.1007/s11042-017-5209-5
