Multimedia Tools and Applications, Volume 70, Issue 1, pp 7–23

Survey on modeling and indexing events in multimedia

Abstract

Events have gained increasing interest in the area of multimedia in recent years. Many approaches have been published, and much research conducted, on how to extract events from multimedia, how to represent them using appropriate models, and how to use them in end-user applications. In this paper, we conduct an extensive analysis of existing event models along commonly identified aspects of events. In addition, we analyze how the different aspects of events relate to each other and how they can be applied together. Subsequently, we look into different approaches to indexing multimedia data. Finally, we elaborate on how to link multimedia data with events in order to provide the basis for future event-based multimedia applications.

Keywords

Event models, Event aspects, Event-based indexing

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Institute of Computer Science and Business Informatics, University of Mannheim, Mannheim, Germany
  2. Information Technologies Institute (ITI), Centre for Research and Technology Hellas (CERTH), Thermi-Thessaloniki, Greece
