Spatial Multi-scale Motion History Histograms and Its Applications

  • Asim Jan
  • Zunduo Zhao
  • Tong Chen
  • Hongying Meng
  • Tao Lei
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1075)


Precisely describing the action in a video is challenging because the video contains various objects, each with its own local motion at a different speed across the frames. This paper proposes a new video feature based on the spatial information of the objects in a frame, together with the motion information between one frame and multiple consecutive frames. The motion information between pixels at the same position throughout the whole video is combined into a new dynamic descriptor, the Spatial Multi-Scale Motion History Histogram (SMMHH). The detailed SMMHH algorithm is given, and it is tested on both human action recognition and touch gesture recognition applications using public video datasets. Experimental results demonstrate excellent performance compared with other traditional methods.
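
The abstract does not give the SMMHH algorithm itself, so the following is only a rough sketch of the temporal idea it describes, not the authors' method. It assumes that motion at a pixel is detected by thresholding the absolute difference between a frame and the frame `s` steps later, for several offsets `s`, and that the per-pixel binary motion sequence is summarised as a histogram of run lengths. The function name, the `scales`, `threshold`, and `max_run` parameters, and the choice to count runs touching the end of the clip are all assumptions made for illustration. The spatial part of the descriptor is not covered.

```python
from typing import List, Sequence

Frame = List[List[float]]  # grayscale frame: rows of pixel intensities


def motion_run_histogram(
    frames: Sequence[Frame],
    scales: Sequence[int] = (1, 2),  # assumed temporal offsets (hypothetical)
    threshold: float = 10.0,         # assumed motion threshold (hypothetical)
    max_run: int = 3,                # runs longer than this share the last bin
) -> List[List[List[List[int]]]]:
    """Return hist[scale_index][y][x][r - 1]: the number of maximal runs of
    r consecutive "motion" steps at pixel (x, y) for the given scale."""
    height, width = len(frames[0]), len(frames[0][0])
    hist = [
        [[[0] * max_run for _ in range(width)] for _ in range(height)]
        for _ in scales
    ]
    for s_idx, s in enumerate(scales):
        for y in range(height):
            for x in range(width):
                run = 0
                for t in range(len(frames) - s):
                    # Compare frame t with the frame s steps later.
                    diff = abs(frames[t + s][y][x] - frames[t][y][x])
                    if diff > threshold:
                        run += 1
                    elif run:
                        # A run of motion just ended: record its length.
                        hist[s_idx][y][x][min(run, max_run) - 1] += 1
                        run = 0
                if run:
                    # Assumption: a run reaching the end of the clip counts.
                    hist[s_idx][y][x][min(run, max_run) - 1] += 1
    return hist


# Usage: a six-frame clip with a single pixel.
clip = [[[0]], [[0]], [[50]], [[100]], [[100]], [[0]]]
hist = motion_run_histogram(clip, scales=(1, 2), threshold=10, max_run=3)
print(hist[0][0][0])  # run-length histogram at offset 1
print(hist[1][0][0])  # run-length histogram at offset 2
```

Keeping one histogram per offset lets slow and fast motion at the same pixel fall into different bins, which is one plausible reading of "multi-scale" in the abstract.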


Motion, Video dynamic feature, Human action recognition, Hand gesture recognition



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Asim Jan (1)
  • Zunduo Zhao (2)
  • Tong Chen (2)
  • Hongying Meng (1, 2)
  • Tao Lei (3)

  1. Brunel University London, London, UK
  2. Southwest University, Chongqing, China
  3. Shaanxi University of Science and Technology, Xi’an, China
