Multimedia Tools and Applications, Volume 39, Issue 3, pp 293–327

Semantic representation of multimedia content: Knowledge representation and semantic indexing

  • Phivos Mylonas
  • Thanos Athanasiadis
  • Manolis Wallace
  • Yannis Avrithis
  • Stefanos Kollias


In this paper we present a framework for unified, personalized access to heterogeneous multimedia content in distributed repositories. By focusing on the semantic analysis of multimedia documents, metadata, user queries and user profiles, it helps bridge the gap between the semantic nature of user queries and raw multimedia documents. The proposed approach takes visual content analysis results as input and also analyzes and exploits the associated textual annotation, in order to extract the underlying semantics, construct a semantic index and classify documents into topics, based on a unified knowledge and semantics representation model. It can then accept user queries and, after semantic interpretation and expansion, retrieve documents from the index and rank them according to user preferences, much as in text retrieval. All processes are based on a novel semantic processing methodology that employs fuzzy algebra and principles of taxonomic knowledge representation. The first part of this work, presented in this paper, deals with data and knowledge models, the manipulation of multimedia content annotations and semantic indexing; the second part will continue with the use of the extracted semantic information for personalized retrieval.


Keywords: Multimedia content semantics extraction; Semantic indexing; Semantic classification





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Phivos Mylonas (1, corresponding author)
  • Thanos Athanasiadis (1)
  • Manolis Wallace (2)
  • Yannis Avrithis (1)
  • Stefanos Kollias (1)

  1. School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
  2. Department of Computer Science, University of Indianapolis, Athens Campus, Athens, Greece
