Multimedia Tools and Applications, Volume 60, Issue 1, pp 233–255

HMM based soccer video event detection using enhanced mid-level semantic



Highlight detection is a fundamental step in semantics-based video retrieval and personalized sports video browsing. This paper proposes an effective hidden Markov model (HMM) based soccer video event detection method built on a hierarchical video analysis framework. Soccer video shots are first classified into four coarse mid-level semantics: global, median, close-up and audience. Global and local motion information is then used to refine these coarse mid-level semantics. The sequential soccer video is segmented into event clips. Both the temporal transitions of the mid-level semantics and the overall features of an event clip are fused using HMMs to determine the type of event. The highlight detection performance of dynamic Bayesian networks (DBN), conditional random fields (CRF) and the proposed HMM-based approach is compared. The average F-score of our highlight detection approach (covering goal, shoot, foul and placed kick) is 82.92%, which outperforms DBN and CRF by 9.85% and 11.12%, respectively. The effects of the number of hidden states, the overall features, and the refinement of mid-level semantics on event detection performance are also discussed.
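The core decision step described above, scoring an event clip's sequence of mid-level semantics against one HMM per event type and picking the most likely one, can be illustrated with a minimal sketch. It is not the authors' implementation: the two-state toy models, all probability values, and the names `log_likelihood`, `classify_clip` and `TOY_MODELS` are assumptions made for illustration. It covers only the temporal transitions of shot labels, not the overall clip features or the motion-based refinement, and in practice the parameters would be learned from labeled clips (for example with Baum-Welch).

```python
import math

# The four coarse mid-level semantics from the abstract, mapped to symbol indices.
SHOT_LABELS = {"global": 0, "median": 1, "close-up": 2, "audience": 3}


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    # Initialisation: start probability times emission of the first symbol.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: propagate through transitions, then emit obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def classify_clip(shots, models):
    """Return the event whose HMM gives the clip the highest log-likelihood."""
    obs = [SHOT_LABELS[s] for s in shots]
    scores = {
        event: log_likelihood(obs, start, trans, emit)
        for event, (start, trans, emit) in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical two-state models (start, transition, emission); values are invented.
TOY_MODELS = {
    "goal": (
        [0.9, 0.1],
        [[0.5, 0.5], [0.1, 0.9]],
        [[0.7, 0.2, 0.05, 0.05], [0.05, 0.05, 0.5, 0.4]],
    ),
    "foul": (
        [0.9, 0.1],
        [[0.5, 0.5], [0.3, 0.7]],
        [[0.6, 0.3, 0.05, 0.05], [0.1, 0.4, 0.45, 0.05]],
    ),
}

if __name__ == "__main__":
    clip = ["global", "close-up", "audience", "close-up"]
    event, scores = classify_clip(clip, TOY_MODELS)
    print(event, scores)
```

With these toy parameters, a clip that cuts to the audience scores higher under the "goal" model because only that model assigns audience shots a substantial emission probability. The same maximum-likelihood comparison extends to any number of event models.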


Hidden Markov model · Highlight · Event detection · Shot classification · Soccer video



This work is supported by the National Natural Science Foundation of China No. 60903121, Chinese Center University Foundation XJTU-HRT-002, and Microsoft Research Foundation FY11-RES-THEME-052. The authors give special thanks to Wenjun Zeng of the Computer Science Department at the University of Missouri for proofreading the paper and for helpful discussions.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, China
