Multimedia Tools and Applications, Volume 76, Issue 12, pp 14437–14460

RDF-powered semantic video annotation tools with concept mapping to Linked Data for next-generation video indexing: a comprehensive review

  • Leslie F. Sikos


Video annotation tools are often compared in the literature; however, most reviews mix unstructured, semi-structured, and the very few structured annotation tools. This paper is a comprehensive review of video annotation tools that generate structured data output for video clips, regions of interest, frames, and media fragments, with a focus on Linked Data support. The tools are compared in terms of supported input and output data formats, expressivity, annotation specificity, spatial and temporal fragmentation, the concept mapping sources used for Linked Open Data (LOD) interlinking, provenance data support, and standards alignment. Practicality and usability aspects of the user interface of each tool are highlighted. Moreover, this review distinguishes extensively researched yet discontinued semantic video annotation software from promising state-of-the-art tools that point to new directions in this increasingly important field.


Video annotation · Multimedia semantics · Spatiotemporal fragmentation · Video scene interpretation · Multimedia ontologies · Hypervideo application



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Centre for Knowledge and Interaction Technologies, School of Computer Science, Engineering and Mathematics, Flinders University, Adelaide, Australia
