Event detection in soccer videos using unsupervised learning of Spatio-temporal features based on pooled spatial pyramid model

Fakhar, Babak; Rashidy Kanan, Hamidreza; Behrad, Alireza

doi:10.1007/s11042-018-7083-1

Published: 03 January 2019

Volume 78, pages 16995–17025, (2019)

Multimedia Tools and Applications

Abstract

Most existing research on the semantic analysis of soccer videos relies on specialized hierarchical approaches to bridge the semantic gap between low-level features and high-level events. In this paper, we propose a novel data-driven model for automatically recognizing important events in soccer broadcast videos by analyzing the spatio-temporal local features of video frames. The algorithm explores the local visual content of video frames using spatial and temporal features learned in a low-dimensional, transformed sparse space. Without using mid-level features, it dynamically extracts the most informative semantic concepts and features, which improves the generality of the system. Dictionary learning plays an important role in sparse coding and in sparse representation-based event classification. We present a novel dictionary learning method that computes several category-specific dictionaries by training on the detected shots of the various view categories. To evaluate the feasibility and effectiveness of the proposed algorithm, we conduct an extensive experimental investigation of the analysis, detection, and classification of soccer events on a large collection of video data. The experimental results indicate that our approach outperforms state-of-the-art methods.
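
The abstract does not give the details of the classifier, so the following is only a minimal sketch of the general idea of sparse representation-based classification with category-specific dictionaries. It is not the authors' implementation. It assumes one dictionary per event category, sparse coding by a simple greedy orthogonal matching pursuit, and a decision rule that picks the category whose dictionary reconstructs the feature vector with the smallest residual. The function names (`omp`, `classify`), the labels "goal" and "corner", and the toy orthonormal dictionaries are all hypothetical; a real system would learn the dictionaries from shot features.

```python
import numpy as np


def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit (illustrative helper).

    Approximates x using at most n_nonzero columns (atoms) of D and
    returns the coefficient vector, which is zero outside the chosen atoms.
    """
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef


def classify(x, dictionaries, n_nonzero=3):
    """Return the label whose dictionary gives the smallest reconstruction error."""
    errors = {}
    for label, D in dictionaries.items():
        a = omp(D, x, n_nonzero)
        errors[label] = float(np.linalg.norm(x - D @ a))
    return min(errors, key=errors.get), errors


# Toy data (assumption): two dictionaries spanning mutually orthogonal
# subspaces, built from the columns of a random orthogonal matrix.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
dictionaries = {"goal": Q[:, :8], "corner": Q[:, 8:16]}

# A feature vector that is a sparse combination of two "goal" atoms.
x = 1.0 * dictionaries["goal"][:, 0] - 0.5 * dictionaries["goal"][:, 2]

label, errors = classify(x, dictionaries)
print(label, errors)
```

In this toy setting the "goal" dictionary reconstructs `x` almost exactly, while the "corner" dictionary cannot represent it at all, so the residual rule selects "goal". With learned, overlapping dictionaries the residuals would be closer together and the sparsity level `n_nonzero` would need tuning.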

Author information

Authors and Affiliations

Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
Babak Fakhar
Department of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran
Hamidreza Rashidy Kanan
Department of Electrical Engineering, Shahed University, Tehran, Iran
Alireza Behrad

Corresponding author

Correspondence to Hamidreza Rashidy Kanan.

About this article

Cite this article

Fakhar, B., Rashidy Kanan, H. & Behrad, A. Event detection in soccer videos using unsupervised learning of Spatio-temporal features based on pooled spatial pyramid model. Multimed Tools Appl 78, 16995–17025 (2019). https://doi.org/10.1007/s11042-018-7083-1

Received: 05 March 2018
Revised: 06 November 2018
Accepted: 18 December 2018
Published: 03 January 2019
Issue Date: 30 June 2019
DOI: https://doi.org/10.1007/s11042-018-7083-1
