
Survey on modeling and indexing events in multimedia

Multimedia Tools and Applications


Events have attracted increasing interest in multimedia research in recent years. Many approaches have been published on how to extract events from multimedia, how to represent them using appropriate models, and how to use them in end-user applications. In this paper, we conduct an extensive analysis of existing event models along commonly identified aspects of events. In addition, we analyze how the different aspects of events relate to each other and how they can be applied together. Subsequently, we look into different approaches for indexing multimedia data. Finally, we elaborate on how to link multimedia data with events in order to provide the basis for future event-based multimedia applications.



Author information


Corresponding author

Correspondence to Ansgar Scherp.

About this article

Cite this article

Scherp, A., Mezaris, V. Survey on modeling and indexing events in multimedia. Multimed Tools Appl 70, 7–23 (2014).
