Action Recognition in Realistic Sports Videos

Soomro, Khurram; Zamir, Amir R.

doi:10.1007/978-3-319-09396-3_9

Chapter
First Online: 01 January 2015

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

The ability to analyze the actions that occur in a video is essential for the automatic understanding of sports. Action localization and action recognition in videos are the two main research topics in this context. In this chapter, we provide a detailed study of the prominent methods devised for these two tasks that yield superior results on sports videos. We adopt UCF Sports, a dataset of realistic sports videos collected from broadcast television channels, as our evaluation benchmark. First, we present an overview of UCF Sports, along with comprehensive statistics on the techniques tested on this dataset and the evolution of their performance over time. To provide further detail on existing action recognition methods in this area, we decompose the action recognition framework into three main steps: feature extraction, dictionary learning to represent a video, and classification. We review several successful techniques for each of these steps. We also review the problem of spatio-temporal localization of actions and argue that, in general, it is a more challenging problem than action recognition. We study several recent methods for action localization that have shown promising results on sports videos. Finally, we discuss a number of forward-looking insights drawn from this review of action recognition and localization methods. In particular, we argue that performing recognition on temporally untrimmed videos and attempting to describe an action, instead of conducting a forced-choice classification, are essential for analyzing human actions in a realistic environment.
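
The three-step decomposition named in the abstract (feature extraction, dictionary learning, classification) can be illustrated with a minimal, self-contained sketch. It is a toy under stated assumptions, not the chapter's method: the descriptors are synthetic 2-D points standing in for real local video features, the dictionary is learned with plain k-means, and the classifier is a nearest-neighbor rule over codeword histograms. All function names and the two class labels are hypothetical and chosen only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def toy_descriptors(center, n, rng):
    """Step 1 placeholder: synthetic 2-D 'descriptors' around a class center.
    A real system would extract local features from video frames instead."""
    return [[rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)]
            for _ in range(n)]


def learn_codebook(points, k, iters=10, seed=0):
    """Step 2: learn a dictionary of k codewords with plain k-means."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centers)].append(p)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old center if no point was assigned to it
                centers[i] = [sum(c) / len(bucket) for c in zip(*bucket)]
    return centers


def encode(descriptors, codebook):
    """Represent one video as a normalized histogram of codeword counts."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist) or 1.0  # avoid dividing by zero for an empty video
    return [h / total for h in hist]


def classify(hist, train_hists, train_labels):
    """Step 3: label of the nearest training histogram (1-NN, an assumption)."""
    return train_labels[nearest(hist, train_hists)]


def run_toy_experiment(seed=0):
    """Train on three toy 'videos' per class, then classify one new video each."""
    rng = random.Random(seed)
    class_centers = {"diving": (0.0, 0.0), "golf": (10.0, 10.0)}  # made up
    train = [(label, toy_descriptors(c, 30, rng))
             for label, c in class_centers.items() for _ in range(3)]
    codebook = learn_codebook([d for _, ds in train for d in ds], k=2)
    train_hists = [encode(ds, codebook) for _, ds in train]
    train_labels = [label for label, _ in train]
    return {label: classify(encode(toy_descriptors(c, 30, rng), codebook),
                            train_hists, train_labels)
            for label, c in class_centers.items()}


if __name__ == "__main__":
    print(run_toy_experiment())
```

The stages are kept as separate functions so that any one of them (a different descriptor, a different dictionary learner, a different classifier) can be swapped without touching the others, which mirrors how the abstract treats them as independent steps.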

Notes

1. Download UCF Sports dataset: http://crcv.ucf.edu/data/UCF_Sports_Action.php
2. UCF Sports experimental setup for Action Localization: http://www.sfu.ca/~tla58/other/train_test_split

Author information

Authors and Affiliations

Center for Research in Computer Vision, University of Central Florida, Orlando, FL, 32826, USA
Khurram Soomro
Gates Computer Science, #130, Stanford University, 353 Serra Mall, Stanford, CA, 94305, USA
Amir R. Zamir

Corresponding author

Correspondence to Khurram Soomro.

Editor information

Editors and Affiliations

Aalborg University, Aalborg, Denmark
Thomas B. Moeslund
BBC R&D, London, United Kingdom
Graham Thomas
University of Surrey, Guildford, Surrey, United Kingdom
Adrian Hilton

About this chapter

Cite this chapter

Soomro, K., Zamir, A.R. (2014). Action Recognition in Realistic Sports Videos. In: Moeslund, T., Thomas, G., Hilton, A. (eds) Computer Vision in Sports. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-09396-3_9

DOI: https://doi.org/10.1007/978-3-319-09396-3_9
Published: 20 January 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09395-6
Online ISBN: 978-3-319-09396-3
eBook Packages: Computer Science, Computer Science (R0)
