Multimedia Tools and Applications

Volume 74, Issue 21, pp 9323–9338

Compressed domain human action recognition in H.264/AVC video streams

  • Manu Tom
  • R. Venkatesh Babu
  • R Gnana Praveen


This paper presents a novel high-speed approach to human action recognition in the H.264/AVC compressed domain. The proposed algorithm extracts features from the quantization parameters and motion vectors of the compressed video sequence and classifies them using Support Vector Machines (SVM). The goal is an algorithm that is much faster than its pixel-domain counterparts, with comparable accuracy, while using only the sparse information available in compressed video. Partial decoding avoids the complexity of full decoding and reduces computational load and memory usage, which can lower hardware utilization and speed up recognition. The proposed approach can handle illumination changes and variations in scale and appearance, and is robust in both outdoor and indoor testing scenarios. We evaluated the method on two benchmark action datasets and achieved more than 85% accuracy. The proposed algorithm classifies actions at more than 2,000 fps, approximately 100 times faster than existing state-of-the-art pixel-domain algorithms.
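To make the idea of compressed-domain features concrete, the following is a minimal sketch of how per-macroblock motion vectors and quantization-parameter values might be turned into a fixed-length vector suitable for an SVM. It is not the descriptor used in the paper, which the abstract does not specify. The function names, the 8-bin orientation histogram, and the use of per-macroblock QP deltas are all assumptions made for illustration, and the inputs are assumed to have already been obtained by a partial decoder.

```python
import math


def motion_orientation_histogram(motion_vectors, n_bins=8):
    """Magnitude-weighted histogram of motion-vector directions (L1-normalised).

    motion_vectors: iterable of (dx, dy) pairs, one per macroblock (assumed input).
    """
    hist = [0.0] * n_bins
    for dx, dy in motion_vectors:
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue  # static blocks carry no direction information
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
        bin_idx = min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)
        hist[bin_idx] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def qp_delta_stats(qp_deltas):
    """Mean and mean absolute value of per-macroblock QP deltas (assumed input)."""
    n = len(qp_deltas)
    if n == 0:
        return [0.0, 0.0]
    return [sum(qp_deltas) / n, sum(abs(q) for q in qp_deltas) / n]


def frame_feature(motion_vectors, qp_deltas, n_bins=8):
    """Concatenate both cues into one vector of length n_bins + 2."""
    return motion_orientation_histogram(motion_vectors, n_bins) + qp_delta_stats(qp_deltas)


if __name__ == "__main__":
    # Hypothetical values standing in for data read from a partially decoded frame.
    mvs = [(3, 4), (0, 0), (-2, 1)]
    qps = [2, -2, 0]
    print(frame_feature(mvs, qps))
```

In a pipeline of the kind the abstract describes, vectors like these, aggregated over the frames of a clip, would be passed to an SVM for training and classification. Because only motion vectors and quantization data are touched, no pixel reconstruction is needed at any point.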


H.264/AVC; Human action recognition; Compressed domain video analysis; Motion vectors; Quantization parameters



This work was supported by the CARS (CARS-25) project from the Centre for Artificial Intelligence and Robotics, Defence Research and Development Organization (DRDO), Govt. of India.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Manu Tom (1)
  • R. Venkatesh Babu (1)
  • R Gnana Praveen (1)
  1. Video Analytics Lab, SERC, Indian Institute of Science, Bangalore, India
