Signal, Image and Video Processing, Volume 13, Issue 8, pp 1619–1627

Weakly supervised pairwise Frank–Wolfe algorithm to recognize a sequence of human actions in RGB-D videos

  • Zohreh Ghaderi
  • Hassan Khotanlou
Original Paper

Abstract

Human activity recognition is an active topic in machine vision with applications in intelligent living environments. Daily human activities consist of several actions whose boundaries differ among individuals. Kinect cameras, which record joint data together with high-resolution RGB and depth images, have improved human action recognition. In this research, actions are recognized by exploiting the order of the actions in weakly supervised and semi-supervised learning models applied to features extracted from the RGB-D data. The learning models are the Frank–Wolfe algorithm, a constrained convex optimization algorithm, and the pairwise Frank–Wolfe algorithm, an extension of the Frank–Wolfe algorithm. The proposed method was evaluated on the Watch-n-Patch dataset, where it shows good performance.
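To illustrate the difference between the two optimizers named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a toy objective, 0.5 * ||x - b||^2 minimized over the probability simplex, chosen only because its gradient and exact line search are simple; the function names `frank_wolfe` and `pairwise_frank_wolfe` and the parameters `steps` and `tol` are hypothetical. The classic step moves the whole iterate toward the best vertex, while the pairwise step shifts weight from the worst active vertex to the best one.

```python
def frank_wolfe(b, steps=200, tol=1e-10):
    """Classic Frank-Wolfe on the toy problem min 0.5*||x - b||^2 over the simplex."""
    n = len(b)
    x = [1.0] + [0.0] * (n - 1)                      # start at a vertex
    for _ in range(steps):
        grad = [xi - bi for xi, bi in zip(x, b)]     # gradient of the toy objective
        i_fw = min(range(n), key=grad.__getitem__)   # linear minimization: best vertex
        d = [-xi for xi in x]
        d[i_fw] += 1.0                               # direction d = e_fw - x
        gap = -sum(g * di for g, di in zip(grad, d)) # Frank-Wolfe duality gap
        if gap < tol:
            break
        # Exact line search for a quadratic, clipped to the feasible step [0, 1].
        gamma = min(1.0, gap / sum(di * di for di in d))
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return x


def pairwise_frank_wolfe(b, steps=200, tol=1e-10):
    """Pairwise Frank-Wolfe on the same toy problem (assumed for illustration)."""
    n = len(b)
    x = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        grad = [xi - bi for xi, bi in zip(x, b)]
        i_fw = min(range(n), key=grad.__getitem__)   # vertex to move weight toward
        # On the simplex the active vertices are the coordinates with positive weight.
        active = [i for i in range(n) if x[i] > 0.0]
        i_away = max(active, key=grad.__getitem__)   # worst active vertex
        gap = grad[i_away] - grad[i_fw]              # equals -grad . (e_fw - e_away)
        if gap < tol:
            break
        # d = e_fw - e_away has squared norm 2; the step cannot exceed the
        # weight currently held by the away vertex.
        gamma = min(x[i_away], gap / 2.0)
        x[i_fw] += gamma
        x[i_away] -= gamma
    return x


if __name__ == "__main__":
    target = [0.2, 0.5, 0.3]                         # already on the simplex
    print(frank_wolfe(target))
    print(pairwise_frank_wolfe(target))
```

Because the toy target already lies on the simplex, both routines should approach it; the sketch says nothing about the features, constraints, or objective used in the paper itself.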

Keywords

  • Human action recognition
  • Pairwise Frank–Wolfe
  • RGB-D video
  • Sequence of human action

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran
