Multimedia Tools and Applications, Volume 48, Issue 2, pp 313–337

Semantic annotation of soccer videos by visual instance clustering and spatial/temporal reasoning in ontologies

  • Lamberto Ballan
  • Marco Bertini (email author)
  • Alberto Del Bimbo
  • Giuseppe Serra


In this paper we present a framework for semantic annotation of soccer videos that exploits an ontology model referred to as Dynamic Pictorially Enriched Ontology, in which the ontology, defined using OWL, includes both schema and data. Visual instances are used as matching references for the visual descriptors of the entities to be annotated. Three mechanisms support effective annotation: visual instance clustering, which groups instances of similar patterns; prototype selection, which selects one or more visual representatives of each cluster; and dynamic cluster updating, which updates clusters and prototypes whenever new knowledge is presented to the ontology. Experimental results show that the framework can annotate entities that exhibit a variety of complex changes in visual appearance, as well as events that show complex motion patterns in the same shot. SWRL rules are used to perform rule-based reasoning over both concepts and concept instances, improving the quality of the annotation.
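The three mechanisms named in the abstract can be illustrated with a small sketch. The abstract does not state which clustering algorithm, distance measure, or prototype criterion the framework uses, so everything below is an assumption made for illustration: Euclidean distance between descriptor vectors, a fixed distance threshold for cluster membership, and the medoid as the single prototype. The names `Cluster`, `update_clusters`, and `annotate` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Cluster:
    """A group of similar visual instances plus its current prototype."""
    members: list = field(default_factory=list)
    prototype: tuple = ()

    def add(self, descriptor):
        self.members.append(descriptor)
        # Prototype selection (assumed criterion): choose the medoid, the
        # member with the smallest total distance to all cluster members.
        self.prototype = min(
            self.members,
            key=lambda m: sum(dist(m, other) for other in self.members),
        )


def update_clusters(clusters, descriptor, threshold):
    """Dynamic cluster updating: absorb a new instance or open a new cluster."""
    if clusters:
        nearest = min(clusters, key=lambda c: dist(c.prototype, descriptor))
        if dist(nearest.prototype, descriptor) <= threshold:
            nearest.add(descriptor)  # also re-selects the prototype
            return nearest
    fresh = Cluster()
    fresh.add(descriptor)
    clusters.append(fresh)
    return fresh


def annotate(clusters, descriptor):
    """Return the index of the cluster whose prototype is closest."""
    return min(
        range(len(clusters)),
        key=lambda i: dist(clusters[i].prototype, descriptor),
    )


# Toy 2-D descriptors standing in for real visual descriptors.
clusters = []
for d in [(0, 0), (2, 0), (1, 0), (50, 50)]:
    update_clusters(clusters, d, threshold=5)
```

In this toy run the first three descriptors fall into one cluster whose medoid is `(1, 0)`, and the fourth opens a second cluster. A new descriptor is then matched against the prototypes only, which is the role the abstract assigns to visual instances as matching references.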


Keywords: Semantic video annotation; Dynamic pictorial ontology; Content descriptor matching; Ontology reasoning; Sports video analysis



The authors are grateful to Sport System Europe s.r.l., Bologna, Italy, for providing the large set of sports videos used for the experimental validation of this research. This work was partially supported by the Information Society Technologies (IST) Program of the European Commission DELOS Network of Excellence on Digital Libraries (Contract G038-507618) and by the VidiVideo Project (Contract FP6-045547).



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Lamberto Ballan (1)
  • Marco Bertini (1), email author
  • Alberto Del Bimbo (1)
  • Giuseppe Serra (1)
  1. Media Integration and Communication Center, University of Florence, Florence, Italy
