Action Recognition in Sports Videos Using Stacked Auto Encoder and HOG3D Features

  • Earnest Paul Ijjina
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1090)


Sports analytics is an emerging area of research with applications in personalized training and entertainment. In this work, an approach for sports action recognition using a stacked autoencoder and HOG3D features is presented. We demonstrate that actions in sports videos can be recognized by giving a 2D interpretation of the HOG3D features, extracted from the player's bounding box, as input to a deep learning model. The ability of a stacked autoencoder to learn the underlying global patterns associated with each action is used to recognize human actions. We demonstrate the efficacy of the proposed classification system for action recognition on the ACASVA dataset.
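The abstract describes the pipeline only at a high level. Below is a minimal sketch of the general idea in Python with NumPy. It should not be read as the author's implementation, and it rests on several assumptions:

* HOG3D descriptors are assumed to be precomputed. The hypothetical `make_synthetic` helper generates stand-ins for them.
* The "2D interpretation" is assumed to be a cells × orientation-bins matrix, built by the hypothetical `to_2d` helper.
* The layer sizes, learning rates and the softmax read-out on centred codes are illustrative choices, not settings taken from the paper.

Each autoencoder layer is trained greedily on the codes produced by the previous layer, which is the usual way a stacked autoencoder is built.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow warnings in np.exp for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def make_synthetic(n_per_class, n_cells=8, n_bins=20, seed=1):
    """Hypothetical stand-ins for HOG3D descriptors: each class concentrates
    gradient energy in a different band of orientation bins."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, band in enumerate([slice(0, 5), slice(10, 15)]):
        for _ in range(n_per_class):
            d = rng.uniform(0.0, 0.2, (n_cells, n_bins))
            d[:, band] += 0.8
            X.append(d.ravel())
            y.append(label)
    return np.array(X), np.array(y)

def to_2d(descriptor, n_bins):
    """Assumed 2D interpretation: one row per spatio-temporal cell, one column
    per orientation bin, scaled to [0, 1] for the sigmoid autoencoder."""
    grid = descriptor.reshape(-1, n_bins)
    return grid / grid.max()

def train_autoencoder(X, n_hidden, epochs=300, lr=0.1, seed=0):
    """One sigmoid autoencoder layer, trained by full-batch gradient descent on
    squared reconstruction error averaged over samples; returns the encoder."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1, b1 = rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, n_in)), np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                # encode
        R = sigmoid(H @ W2 + b2)                # decode (reconstruction)
        dR = (R - X) * R * (1 - R) / len(X)     # backprop through output sigmoid
        dH = (dR @ W2.T) * H * (1 - H)          # backprop into the hidden layer
        W2 -= lr * H.T @ dR
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * X.T @ dH
        b1 -= lr * dH.sum(axis=0)
    return W1, b1

def encode(X, layers):
    # Pass inputs through each trained encoder in turn.
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X

def train_softmax(H, y, n_classes, epochs=500, lr=0.5):
    """Multinomial logistic regression (softmax) on the stacked codes."""
    W, b = np.zeros((H.shape[1], n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                    # one-hot targets
    for _ in range(epochs):
        Z = H @ W + b
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / len(H)                    # cross-entropy gradient w.r.t. Z
        W -= lr * H.T @ G
        b -= lr * G.sum(axis=0)
    return W, b

# Pipeline: 2D interpretation -> greedy layer-wise SAE -> softmax read-out.
X_raw, y = make_synthetic(n_per_class=50)
X = np.array([to_2d(d, n_bins=20).ravel() for d in X_raw])

layers, H = [], X
for n_hidden in (64, 32):                       # hypothetical layer sizes
    W, b = train_autoencoder(H, n_hidden)
    layers.append((W, b))
    H = encode(H, [(W, b)])                     # codes feed the next layer

codes = encode(X, layers)
center = codes.mean(axis=0)                     # centre codes before the classifier
Wc, bc = train_softmax(codes - center, y, n_classes=2)
pred = np.argmax((codes - center) @ Wc + bc, axis=1)
accuracy = (pred == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

To move beyond this toy, a first step would be to replace `make_synthetic` with real HOG3D descriptors extracted from player bounding boxes. A full stacked-autoencoder system would also typically fine-tune the whole stack with backpropagation after layer-wise pretraining, which this sketch omits for brevity.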


Keywords: Stacked autoencoder (SAE); sports action recognition; HOG3D features



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Department of Computer Science and Engineering, National Institute of Technology Warangal, Warangal, India
