Abstract
Most existing research on semantic analysis of soccer videos relies on specialized approaches that use a hierarchical structure to bridge the semantic gap between low-level features and high-level events. In this paper, we propose a novel data-driven model for automatic recognition of important events in soccer broadcast videos, based on the analysis of spatio-temporal local features of video frames. The proposed algorithm explores the local visual content of video frames by focusing on learned spatial and temporal features in a low-dimensional, transformed sparse space. Without using mid-level features, it dynamically extracts the most informative semantic concepts/features, which improves the generality of the system. Because dictionary learning plays an important role in sparse coding and sparse representation-based event classification, we also present a novel dictionary learning method that computes several category-specific dictionaries by training on the detected shots of various view categories. To evaluate the feasibility and effectiveness of the proposed algorithm, we conduct an extensive experimental investigation of the analysis, detection, and classification of soccer events on a large collection of video data. The experimental results indicate that our approach outperforms state-of-the-art methods.
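The general idea of classifying with category-specific dictionaries can be illustrated with a minimal sketch. It is not the method of this paper: it assumes that each dictionary is simply a set of unit-normalised training samples rather than a learned dictionary, it uses synthetic random vectors in place of spatio-temporal video features, and the names `omp`, `make_dictionary`, `classify`, `category_a`, and `category_b` are hypothetical. A query is sparse-coded against each dictionary with orthogonal matching pursuit and assigned to the category with the smallest reconstruction residual.

```python
import numpy as np


def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit: code x with at most n_nonzero atoms of D."""
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom helps; stop early
        support.append(k)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef


def make_dictionary(samples):
    """Assumption: use unit-normalised training samples (columns) as atoms."""
    return samples / np.linalg.norm(samples, axis=0, keepdims=True)


def classify(dictionaries, x, n_nonzero=3):
    """Return the label whose dictionary reconstructs x with the smallest residual."""
    errors = {}
    for label, D in dictionaries.items():
        coef = omp(D, x, n_nonzero)
        errors[label] = float(np.linalg.norm(x - D @ coef))
    return min(errors, key=errors.get), errors


# Synthetic stand-in data: each category lives in its own 3-D subspace of R^20.
rng = np.random.default_rng(0)
dim, n_atoms = 20, 8
basis_a = rng.standard_normal((dim, 3))
basis_b = rng.standard_normal((dim, 3))
dictionaries = {
    "category_a": make_dictionary(basis_a @ rng.standard_normal((3, n_atoms))),
    "category_b": make_dictionary(basis_b @ rng.standard_normal((3, n_atoms))),
}

# A query drawn from category A's subspace should be reconstructed best by A.
query = basis_a @ rng.standard_normal(3)
label, errors = classify(dictionaries, query)
print(label, errors)
```

In a setting closer to the abstract, the training samples would be replaced by dictionaries learned per view category, and the query by a feature vector extracted from a detected shot; the residual-based decision rule stays the same.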
Cite this article
Fakhar, B., Rashidy Kanan, H. & Behrad, A. Event detection in soccer videos using unsupervised learning of Spatio-temporal features based on pooled spatial pyramid model. Multimed Tools Appl 78, 16995–17025 (2019). https://doi.org/10.1007/s11042-018-7083-1
DOI: https://doi.org/10.1007/s11042-018-7083-1