
Evaluation of regularized multi-task learning algorithms for single/multi-view human action recognition

Multimedia Tools and Applications

Abstract

Regularized multi-task learning (MTL) algorithms, which can fully exploit the relationships among related tasks, have gradually been adopted in pattern recognition and computer vision, and many effective approaches based on regularized MTL have been proposed. Promising results in human action recognition have been achieved over the past decades. However, most existing action recognition algorithms focus on action descriptors and on single/multi-view and multi-modality action recognition. Few works involve MTL, and a systematic evaluation of existing MTL algorithms for human action recognition is lacking in particular. Therefore, in this paper, seven popular regularized MTL algorithms, in which different actions are treated as different tasks, are systematically evaluated on two public multi-view action datasets. Specifically, dense trajectory features are first extracted for each view. A codebook shared by all views is then constructed with k-means, and each video is encoded with this shared codebook. Depending on the regularized MTL algorithm, either all actions or only a subset of actions are considered related, and these actions are then assigned to different tasks in MTL. The effect of the number of training samples from different action views on MTL is also evaluated. Large-scale experimental results show the following: 1) regularized MTL is very useful for action recognition, since it can uncover the latent relationships among different actions; 2) not all human actions are related, and grouping unrelated actions together in MTL degrades its performance; 3) as the number of training samples from different views increases, the relationships among different actions can be exploited more fully, which improves action recognition accuracy.
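The pipeline summarized above can be illustrated with a small, dependency-free Python sketch. It has three stages: a codebook shared by all views, bag-of-words encoding of each video, and regularized MTL with one task per action. This is not the authors' implementation, and several choices are assumptions made for illustration:

- 2-D Gaussian points stand in for dense trajectory descriptors.
- k-means uses plain Lloyd iterations with farthest-point initialization.
- Each action is a one-vs-rest least-squares task.
- The regularizer is a simplified version of the mean-regularized MTL of Evgeniou and Pontil (2004). It is applied only within user-specified groups of related actions and omits the separate penalty on the mean.

All function names (`kmeans`, `encode`, `train_grouped_mtl`, `predict`, `toy_video`, `demo`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the codeword closest to descriptor p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization.
    The resulting centroids form one codebook shared by all views."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def encode(descriptors, codebook):
    """Bag-of-words coding: L1-normalized histogram of nearest codewords."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def train_grouped_mtl(X, labels, groups, lam=0.1, lr=0.2, epochs=300):
    """One task per action (one-vs-rest least squares), trained by gradient
    descent. Each task's weights are pulled toward the mean of its group.
    Groups must partition the actions. A singleton group adds no coupling,
    so an action judged unrelated shares nothing with the others."""
    n, d = len(X), len(X[0])
    W = {t: [0.0] * d for g in groups for t in g}
    for _ in range(epochs):
        grads = {}
        for g in groups:
            mean = [sum(W[t][j] for t in g) / len(g) for j in range(d)]
            for t in g:
                # gradient of lam * sum_s ||w_s - mean||^2 with respect to w_t
                grad = [2 * lam * (W[t][j] - mean[j]) for j in range(d)]
                for x, y in zip(X, labels):
                    target = 1.0 if y == t else -1.0
                    err = sum(w * xi for w, xi in zip(W[t], x)) - target
                    for j in range(d):
                        grad[j] += 2 * err * x[j] / n  # mean squared loss
                grads[t] = grad
        for t, grad in grads.items():
            W[t] = [w - lr * gj for w, gj in zip(W[t], grad)]
    return W


def predict(W, x):
    """Assign the action whose task model gives the highest score."""
    return max(W, key=lambda t: sum(w * xi for w, xi in zip(W[t], x)))


def toy_video(rng, center, n=30):
    """Hypothetical stand-in for one video's local descriptors."""
    return [[center[0] + rng.gauss(0, 0.3), center[1] + rng.gauss(0, 0.3)]
            for _ in range(n)]


def demo(seed=1):
    rng = random.Random(seed)
    centers = {"walk": (0.0, 0.0), "run": (0.0, 4.0), "wave": (5.0, 5.0)}
    videos, labels = [], []
    for shift in (0.0, 0.2):  # two toy "views" differing by a small offset
        for action, (cx, cy) in centers.items():
            for _ in range(5):
                videos.append(toy_video(rng, (cx + shift, cy)))
                labels.append(action)
    codebook = kmeans([d for v in videos for d in v], k=6)  # shared by both views
    X = [encode(v, codebook) + [1.0] for v in videos]  # trailing 1.0 = bias feature
    W = train_grouped_mtl(X, labels, groups=[["walk", "run"], ["wave"]])
    acc = sum(predict(W, x) == y for x, y in zip(X, labels)) / len(X)
    return codebook, X, acc


if __name__ == "__main__":
    _, _, acc = demo()
    print(f"training accuracy on toy data: {acc:.2f}")
```

In this sketch, putting `wave` in its own group mirrors the abstract's second finding at toy scale: an action judged unrelated is not pulled toward the other tasks. One way to probe the effect of forcing unrelated actions together is to move it into the `walk`/`run` group (for example, `groups=[["walk", "run", "wave"]]`). With data this easily separable, however, the difference may not show.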


Figures: Fig. 1–Fig. 11


Notes

  1. http://shivvitaladevuni.com/action_rec/ixmas_example.htm

  2. http://users.eecs.northwestern.edu/~jwa368/my_data.html

  3. http://www.public.asu.edu/~jye02/Software/MALSAR/


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 61572357, No. 61202168), the Tianjin Research Program of Application Foundation and Advanced Technology (No. 14JCZDJC31700 and No. 13JCQNJC0040), the Tianjin Municipal Natural Science Foundation (No. 13JCQNJC0040), and the China Scholarship Council (No. 201608120021).

Author information


Correspondence to Z. Gao or H. Zhang.


About this article


Cite this article

Gao, Z., Li, S.H., Zhang, G.T. et al. Evaluation of regularized multi-task leaning algorithms for single/multi-view human action recognition. Multimed Tools Appl 76, 20125–20148 (2017). https://doi.org/10.1007/s11042-017-4384-8


  • DOI: https://doi.org/10.1007/s11042-017-4384-8
