A discriminative structural model for joint segmentation and recognition of human actions

Liu, Cuiwei; Hou, Jingyi; Wu, Xinxiao; Jia, Yunde

doi:10.1007/s11042-018-6189-9

Published: 09 June 2018

Volume 77, pages 31627–31645, (2018)

Multimedia Tools and Applications

Abstract

Achieving joint segmentation and recognition of continuous actions in a long-term video is a challenging task due to the varying durations of actions and the complex transitions of multiple actions. In this paper, a novel discriminative structural model is proposed for splitting a long-term video into segments and annotating the action label of each segment. A set of state variables is introduced into the model to explore discriminative semantic concepts shared among different actions. To exploit the statistical dependences among segments, temporal context is captured at both the action level and the semantic concept level. The state variables are treated as latent information in the discriminative structural model and inferred during both training and testing. Experiments on multi-view IXMAS and realistic Hollywood datasets demonstrate the effectiveness of the proposed method.
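
The joint task described in the abstract, splitting a video into segments and labelling each one, can be illustrated with a generic semi-Markov dynamic program. This is a minimal sketch under stated assumptions, not the authors' model: it omits the latent state variables and any learned parameters, and the names `segment_and_label`, `frame_scores`, `transition` and `max_len` are hypothetical. It assumes a per-frame score for every action label, an additive score for each transition between consecutive segment labels, and an upper bound on segment length.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, int]  # (start frame, end frame exclusive, label)


def segment_and_label(
    frame_scores: List[List[float]],  # frame_scores[t][c]: score of label c at frame t
    transition: List[List[float]],    # transition[a][b]: score of label a followed by b
    max_len: int,                     # assumed upper bound on segment length
) -> List[Segment]:
    """Return the highest-scoring segmentation with one label per segment."""
    n_frames = len(frame_scores)
    n_labels = len(transition)
    if n_frames == 0:
        return []

    # prefix[t][c] holds the sum of frame_scores[0..t-1][c], so that a
    # segment score is a difference of two prefix sums.
    prefix = [[0.0] * n_labels]
    for row in frame_scores:
        prefix.append([p + s for p, s in zip(prefix[-1], row)])

    neg_inf = float("-inf")
    # best[t][c]: best score of a segmentation of frames [0, t) whose
    # last segment carries label c.
    best = [[neg_inf] * n_labels for _ in range(n_frames + 1)]
    back: List[List[Optional[Tuple[int, Optional[int]]]]] = [
        [None] * n_labels for _ in range(n_frames + 1)
    ]

    for end in range(1, n_frames + 1):
        for label in range(n_labels):
            for length in range(1, min(max_len, end) + 1):
                start = end - length
                seg = prefix[end][label] - prefix[start][label]
                if start == 0:
                    # First segment of the video: no transition term.
                    if seg > best[end][label]:
                        best[end][label] = seg
                        back[end][label] = (0, None)
                    continue
                for prev in range(n_labels):
                    # Adjacent segments must carry different labels.
                    if prev == label or best[start][prev] == neg_inf:
                        continue
                    cand = best[start][prev] + transition[prev][label] + seg
                    if cand > best[end][label]:
                        best[end][label] = cand
                        back[end][label] = (start, prev)

    label = max(range(n_labels), key=lambda c: best[n_frames][c])
    if best[n_frames][label] == neg_inf:
        raise ValueError("no valid segmentation for the given max_len")

    # Follow the back-pointers from the last frame to the first.
    segments: List[Segment] = []
    end = n_frames
    while end > 0:
        start, prev = back[end][label]
        segments.append((start, end, label))
        end, label = start, prev
    segments.reverse()
    return segments
```

In this sketch the label set is a plain list of action indices. One possible way to bring in segment-level latent variables would be to run the same recursion over (action, state) pairs, but whether that matches the formulation in the article cannot be confirmed from the abstract alone.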

Acknowledgements

This work was supported in part by the Natural Science Foundation of China (NSFC) under Grants No. 61602320 and No. 61673062, the Liaoning Doctoral Startup Project under Grant No. 201601172, and a project of the Liaoning Provincial Education Department under Grant No. L201607.

Author information

Authors and Affiliations

School of Computer Science, Shenyang Aerospace University, Shenyang, Liaoning, 110136, People’s Republic of China
Cuiwei Liu
Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, 100081, People’s Republic of China
Jingyi Hou, Xinxiao Wu & Yunde Jia

Corresponding author

Correspondence to Cuiwei Liu.

About this article

Cite this article

Liu, C., Hou, J., Wu, X. et al. A discriminative structural model for joint segmentation and recognition of human actions. Multimed Tools Appl 77, 31627–31645 (2018). https://doi.org/10.1007/s11042-018-6189-9

Received: 14 April 2017
Revised: 25 March 2018
Accepted: 23 May 2018
Published: 09 June 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s11042-018-6189-9
