Machine Vision and Applications, Volume 28, Issue 5–6, pp 463–473

Power difference template for action recognition

  • Liangliang Wang
  • Ruifeng Li
  • Yajun Fang
Original Paper

Abstract

This paper proposes the power difference template, a new spatio-temporal representation for action recognition. Spatial power features are first extracted by applying Gaussian convolution to image gradients, transforming between the logarithmic and exponential domains. Using a forward–backward frame power difference method, we then present the normalized projection histogram (NPH), which characterizes the spatial features of a segmented action by normalizing the histogram of its 2D horizontal–vertical projections. Furthermore, from the perspective of energy conservation, motion kinetic velocity (MKV) is introduced as a complementary representation of the temporal relationships among power features, under the assumption that variations in power are produced by motion in the form of kinetic energy. The power difference template, fusing NPH and MKV, is then integrated into a bag-of-words model for training and testing within a support vector machine framework. Experiments on the KTH, UCF Sports, UCF101 and HMDB datasets demonstrate the effectiveness of the proposed algorithm.
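To make the representation concrete, here is a minimal Python sketch of one plausible reading of the forward–backward power difference and the NPH step. The specific log/exp transform, the smoothing scale, the bin count, and all function names are illustrative assumptions; the abstract does not specify the authors' implementation.

```python
# Hypothetical sketch of spatial power features, the forward-backward
# frame power difference, and the normalized projection histogram (NPH).
# All parameter choices here are assumptions, not the paper's settings.
import numpy as np
from scipy.ndimage import gaussian_filter

def power_feature(frame, sigma=1.0):
    """Assumed power feature: Gaussian-smoothed gradient magnitude,
    computed in the logarithmic domain and mapped back exponentially."""
    log_frame = np.log1p(frame.astype(np.float64))
    gy, gx = np.gradient(log_frame)
    return np.expm1(gaussian_filter(np.hypot(gx, gy), sigma))

def forward_backward_difference(frames):
    """Power difference of each frame against its forward and backward
    neighbors, summed; drops the first and last frame of the clip."""
    p = [power_feature(f) for f in frames]
    return [np.abs(p[t + 1] - p[t]) + np.abs(p[t] - p[t - 1])
            for t in range(1, len(p) - 1)]

def nph(diff, bins=32):
    """Normalized projection histogram: histogram the horizontal and
    vertical projections of a power difference map, then L1-normalize."""
    h_proj = diff.sum(axis=0)  # projection onto the horizontal axis
    v_proj = diff.sum(axis=1)  # projection onto the vertical axis
    hist = np.concatenate([np.histogram(h_proj, bins=bins)[0],
                           np.histogram(v_proj, bins=bins)[0]]).astype(float)
    return hist / max(hist.sum(), 1e-12)

# Usage with random frames standing in for a grayscale video clip:
frames = [np.random.rand(120, 160) for _ in range(5)]
features = [nph(d) for d in forward_backward_difference(frames)]
```

In the full pipeline described above, such per-segment descriptors would be quantized into a bag-of-words vocabulary and classified with an SVM; MKV would be computed analogously over the temporal dimension.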

Keywords

Action recognition · Power difference template · Normalized projection histogram · Motion kinetic velocity

Notes

Acknowledgements

This research is supported by the National Natural Science Foundation of China (Grant No. 661273339). The authors would also like to thank Berthold K. P. Horn for his valuable ideas during the first author's visiting study at MIT CSAIL.

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
  2. Massachusetts Institute of Technology, Cambridge, USA