Skip to main content
Log in

Event detection in soccer videos using unsupervised learning of Spatio-temporal features based on pooled spatial pyramid model

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

Most existing researches for semantic analysis of soccer videos benefit from special approaches to bridge the semantic gap between low-level features and high-level events using a hierarchical structure. In this paper, we propose a novel data-driven model for automatic recognition of important events in soccer broadcast videos based on the analysis of spatio-temporal local features of video frames. Our presented algorithm explores the local visual content of video frames by focusing on spatial and temporal learned features in a low-dimensional transformed sparse space. The proposed algorithm, without using mid-level futures, dynamically extracts the most informative semantic concepts/features and improves the generality of the system. The dictionary learning process plays an important role in sparse coding and sparse representation-based event classification. In this paper, we present a novel dictionary learning method, which calculates several category-specific dictionaries by training the detected shots of various view categories. In order to evaluate the feasibility and effectiveness of the proposed algorithm, an extensive experimental investigation is conducted for the analysis, detection, and classification of soccer events on a large collection of video data. Experimental results indicate that our approach outperforms the state-of-the-art methods and demonstrate the effectiveness of the proposed approach.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16

Similar content being viewed by others

References

  1. Aharon M, Elad M, Bruckstein A (2006) $ rm k K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54:4311–4322

    Article  MATH  Google Scholar 

  2. Akrivas G, Stamou GB, Kollias S (2004) Semantic association of multimedia document descriptions through fuzzy relational algebra and fuzzy reasoning. IEEE Trans Syst Man Cybernet-Part A: Syst Humans 34:190–196

    Article  Google Scholar 

  3. Bengio Y, Frasconi P (1994) Credit assignment through time: alternatives to backpropagation. Adv Neural Inform Process Syst: 75–82

  4. Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5:157–166

    Article  Google Scholar 

  5. Cong Y, Yuan J, Luo J (2012) Towards scalable summarization of consumer videos via sparse dictionary selection. IEEE Transactions on Multimedia 14:66–75

    Article  Google Scholar 

  6. Cong Y, Yuan J, Liu JJPR (2013) Abnormal event detection in crowded scenes using sparse representation 46: 1851–1864

  7. Cong Y, Yuan J, Liu J (2013) Abnormal event detection in crowded scenes using sparse representation. Pattern Recogn 46:1851–1864

    Article  Google Scholar 

  8. D’Orazio T, Leo M, Spagnolo P, Nitti M, Mosca N, Distante A (2009) A visual system for real time detection of goal events during soccer matches. Comput Vis Image Underst 113:622–632

    Article  Google Scholar 

  9. Dai W, Shen Y, Tang X, Zou J, Xiong H, Chen CW (2016) Sparse representation with Spatio-temporal online dictionary learning for promising video coding. IEEE Trans Image Process 25:4580–4595

    Article  MathSciNet  MATH  Google Scholar 

  10. D'Orazio T, Leo M, Spagnolo P, Mazzeo PL, Mosca N, Nitti M et al (2009) An investigation into the feasibility of real-time soccer offside detection from a multiple camera system. IEEE Trans Circ Syst Video Technol 19:1804–1818

    Article  Google Scholar 

  11. Ekin A, Tekalp AM, Mehrotra R (2003) Automatic soccer video analysis and summarization. IEEE Trans Image Process 12:796–807

    Article  Google Scholar 

  12. Elad M, Aharon M (2006) Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans Image Process 15:3736–3745

    Article  MathSciNet  Google Scholar 

  13. F. J. I. T o p a Perronnin and m intelligence (2008) Universal and adapted vocabularies for generic visual categorization 30: 1243–1256

  14. Fani M, Yazdi M, Clausi DA, Wong A (2017) Soccer video structure analysis by parallel feature fusion network and hidden-to-observable transferring Markov model. IEEE Access 5:27322–27336

    Article  Google Scholar 

  15. Guan G, Wang Z, Yu K, Mei S, He M, Feng D (2012) Video summarization with global and local features. Multimed Expo Workshops (ICMEW), 2012 IEEE Int Conf: 570–575

  16. Guan G, Wang Z, Lu S, Da Deng J, Feng DD (2013) Keypoint-based keyframe selection. IEEE Trans Circ Syst Video Technol 23:729–734

    Article  Google Scholar 

  17. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780

    Article  Google Scholar 

  18. Hosseini M-S, Eftekhari-Moghadam A-M (2013) Fuzzy rule-based reasoning approach for event detection and annotation of broadcast soccer video. Appl Soft Comput 13:846–866

    Article  Google Scholar 

  19. Huang C-L, Shih H-C, Chao C-Y (2006) Semantic analysis of soccer video using dynamic Bayesian network. IEEE Trans Multimed 8:749–760

    Article  Google Scholar 

  20. Inoue N, Shinoda K (2012) A fast and accurate video semantic-indexing system using fast MAP adaptation and GMM supervectors. IEEE Trans Multimed 14:1196–1205

    Article  Google Scholar 

  21. Jai-Andaloussi S, El Mourabit I, Madrane N, Chaouni SB, Sekkaki A (2015) Soccer events summarization by using sentiment analysis. Comput Sci Comput Intell (CSCI), 2015 Int Conf: 398–403

  22. Ji Won Lee D-WN, Moon S-W, Lee J, Yoo W-Y (2017) Soccer event recognition technique based on pattern matching. Comput Sci Inform Syst (FedCSIS), 2017 Fed Conf: 4, 3–6

  23. Jiang Z, Lin Z, Davis LS (2013) Label consistent K-SVD: learning a discriminative dictionary for recognition. IEEE Trans Pattern Anal Mach Intell 35:2651–2664

    Article  Google Scholar 

  24. Jiang H, Lu Y, Xue J (2016) Automatic soccer video event detection based on a deep neural network combined CNN and RNN. Tools Artif Intell (ICTAI), 2016 IEEE 28th Int Conf: 490–494

  25. Kolekar MH, Sengupta S (2015) Bayesian network-based customized highlight generation for broadcast soccer videos. IEEE Trans Broadcast 61:195–209

    Article  Google Scholar 

  26. Kolekar MH, Sengupta SJITOB (2015) Bayesian network-based customized highlight generation for broadcast soccer videos 61: 195–209

  27. Li N, Wu X, Xu D, Guo H, Feng W (2015) Spatio-temporal context analysis within video volumes for anomalous-event detection and localization. Neurocomputing 155:309–319

    Article  Google Scholar 

  28. Liu Y, Nie L, Han L, Zhang L, Rosenblum DS (2015) Action2Activity: recognizing complex activities from sensor data. IJCAI: 1617–1623

  29. Liu Y, Nie L, Liu L, Rosenblum DSJN (2016) From action to activity: sensor-based activity. Recognition 181:108–115

    Google Scholar 

  30. Liu Y, Zheng Y, Liang Y, Liu S, Rosenblum DS (2016) Urban water quality prediction based on multi-task multi-view learning

  31. Liu T, Lu Y, Lei X, Zhang L, Wang H, Huang W et al. (2017) Soccer video event detection using 3D convolutional networks and shot boundary detection via deep feature distance. Int Conf Neural Inform Process: 440–449

  32. Lu S, Wang Z, Mei T, Guan G, Feng DD (2014) A bag-of-importance model with locality-constrained coding based feature learning for video summarization. IEEE Trans Multimed 16:1497–1509

    Article  Google Scholar 

  33. Mairal J, Leordeanu M, Bach F, Hebert M, Ponce J (2008) Discriminative sparse image models for class-specific edge detection and image interpretation. European conference on computer vision: 43–56

  34. Mei S, Guan G, Wang Z, Wan S, He M, Feng DDJPR (2015) Video summarization via minimum sparse reconstruction 48: 522–533

  35. Nagasaka A, Tanaka Y (1992) Automatic video indexing and full-video search for object appearances

  36. Ouyang J-q, Liu R (2013) Ontology reasoning scheme for constructing meaningful sports video summarisation. IET Image Process 7:324–334

    Article  Google Scholar 

  37. Pandya MAZDS (2017) Frame based approach for automatic event boundary detection of soccer video using optical flow. Conference: Conference: 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA): 5

  38. Park J-H, Cho K (2016) Extraction of visual information in basketball broadcasting video for event segmentation system. Inform Commun Technol convergence (ICTC), 2016 Int Conf: 1098–1100

  39. Perin C, Vuillemot R, Fekete J-D (2013) SoccerStories: a kick-off for visual soccer analysis. IEEE Trans Vis Comput Graph 19:2506–2515

    Article  Google Scholar 

  40. Poultney C, Chopra S, Cun YL (2007) Efficient learning of sparse representations with an energy-based model. Adv Neural Inform Process Syst: 1137–1144

  41. Qian X, Wang H, Liu G, Hou X (2012) HMM based soccer video event detection using enhanced mid-level semantic. Multimed Tools Appl 60:233–255

    Article  Google Scholar 

  42. Ramirez I, Sprechmann P, Sapiro G (2010) Classification and clustering via dictionary learning with structured incoherence and shared features

  43. Raventos A, Quijada R, Torres L, Tarrés F (2015) Automatic summarization of soccer highlights using audio-visual descriptors. SpringerPlus 4:301

    Article  Google Scholar 

  44. Roy D, Srinivas M, Mohan CK (2016) Sparsity-inducing dictionaries for effective action classification. Pattern Recogn 59:55–62

    Article  Google Scholar 

  45. Sadlier DA, O'Connor NE (2005) Event detection in field sports video using audio-visual features and a support vector machine. IEEE Trans Circ Syst Video Technol 15:1225–1233

    Article  Google Scholar 

  46. Saraogi H, Sharma RA, Kumar V (2016) Event recognition in broadcast soccer videos Proc Tenth Indian Conf Comput Vision Graph Image Process: 14

  47. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. Comput Vision Pattern Recogn 2006 IEEE Comput Soc Conf: 2169–2178

  48. Sigari M-H, Soltanian-Zadeh H, Pourreza H-R (2016) A framework for dynamic restructuring of semantic video analysis systems based on learning attention control. Image Vis Comput 53:20–34

    Article  Google Scholar 

  49. Sivalingam R, Boley D, Morellas V, Papanikolopoulos N (2011) Positive definite dictionary learning for region covariances. Comput Vision (ICCV), 2011 IEEE Int Conf: 1013–1019

  50. Song W, Hagras H (2017) A type-2 fuzzy logic system for event detection in soccer videos. Fuzzy Syst (FUZZ-IEEE), 2017 IEEE Int Conf: 1–6

  51. Tavassolipour M, Karimian M, Kasaei S (2014) Event detection and summarization in soccer videos using bayesian network and copula. IEEE Trans Circ Syst Video Technol 24:291–304

    Article  Google Scholar 

  52. Tjondronegoro DW, Chen Y-PP (2010) Knowledge-discounted event detection in sports video. IEEE Trans Syst Man Cybernet-Part A: Syst Humans 40:1009–1024

    Article  Google Scholar 

  53. Tropp JA, Gilbert AC (2007) Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans Inf Theory 53:4655–4666

    Article  MathSciNet  MATH  Google Scholar 

  54. Wang J, Yang J, Yu K, Lv F, Huang T, Gong Y (2010) Locality-constrained linear coding for image classification. Comput Vision Pattern Recogn (CVPR), 2010 IEEE Conf: 3360–3367

  55. Wang C, Yang H, Meinel C (2015) Deep semantic mapping for cross-modal retrieval. Tools Artif Intell (ICTAI), 2015 IEEE 27th Int Conf: 234–241

  56. Wang C, Yang H, Meinel C (2016) Exploring multimodal video representation for action recognition. Neural Networks (IJCNN), 2016 International Joint Conf: 1924–1931

  57. Wang C, Yang H, Bartz C, Meinel C (2016) Image captioning with deep bidirectional LSTMs. Proc 2016 ACM Multimed Conf: 988–997

  58. Wang C, Yang H, C J M T Meinel, and Applications (2016) A deep semantic framework for multimodal representation learning 75: 9255–9276

  59. Wang Z, Yu J, He YJITOC, S. F. V Technology (2017) Soccer video event annotation by synchronization of attack–defense clips and match reports with coarse-grained time information 27: 1104–1117,

  60. Wang X, Gao L, Song J, Shen H (2017) Beyond frame-level CNN: saliency-aware 3-D CNN with LSTM for video action recognition. IEEE Sign Process Lett 24:510–514

    Article  Google Scholar 

  61. Wang C, Yang H, Meinel CJATOMC (2018) Communications,, and applications. Image Cap Deep Bidirect LSTMs Multi-Task Learn 14:40

    Google Scholar 

  62. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31:210–227

    Article  Google Scholar 

  63. Xiang T, Gong S (2008) Video behavior profiling for anomaly detection. IEEE Trans Pattern Anal Mach Intell 30:893–908

    Article  Google Scholar 

  64. Xie W, Tong M (2011) A novel framework for soccer goal detection based on semantic rule. J Electron (China) 28:670–674

    Article  Google Scholar 

  65. Yang M, Zhang L, Yang J, Zhang D (2010) Metaface learning for sparse representation based face recognition. Image Process (ICIP), 2010 17th IEEE Int Conf: 1601–1604

  66. Yang M, Zhang L, Feng X, Zhang DJIJOCV (2014) Sparse representation based fisher discrimination dictionary learning for image classification 109: 209–232

  67. Zawbaa HM, El-Bendary N, Hassanien AE, Abraham A (2011) SVM-based soccer video summarization system. Nature Biol Inspired Comput (NaBIC), 2011 Third World Congress: 7–11

  68. Zhang Z, Xu Y, Yang J, Li X, Zhang D (2015) A survey of sparse representation: algorithms and applications. IEEE access 3:490–530

    Article  Google Scholar 

  69. Zhao W, Lu Y, Jiang H, Huang W (2015) Event detection in soccer videos using shot focus identification. Pattern Recogn (ACPR), 2015 3rd IAPR Asian Conf: 341–345

  70. Zhao Z, Song Y, Su F (2016) Specific video identification via joint learning of latent semantic concept, scene and temporal structure. Neurocomputing 208:378–386

    Article  Google Scholar 

  71. Zhou N, Shen Y, Peng J, Fan J (2012) Learning inter-related visual dictionary for object recognition. Computer vision and pattern recognition (CVPR), 2012 IEEE conference: 3490–3497

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hamidreza Rashidy Kanan.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Fakhar, B., Rashidy Kanan, H. & Behrad, A. Event detection in soccer videos using unsupervised learning of Spatio-temporal features based on pooled spatial pyramid model. Multimed Tools Appl 78, 16995–17025 (2019). https://doi.org/10.1007/s11042-018-7083-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-018-7083-1

Keywords

Navigation