Retrieval of Multimedia Objects by Combining Semantic Information from Visual and Textual Descriptors

  • Mats Sjöberg
  • Jorma Laaksonen
  • Matti Pöllä
  • Timo Honkela
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4132)


We propose a method of content-based multimedia retrieval of objects with visual, aural and textual properties. In our method, training examples of objects belonging to a specific semantic class are associated with their low-level visual descriptors (such as MPEG-7) and textual features such as frequencies of significant keywords. A fuzzy mapping of a semantic class in the training set to a class of similar objects in the test set is created by using Self-Organizing Maps (SOMs) trained from automatically extracted low-level descriptors. We have performed several experiments with different textual features to evaluate the potential of our approach in bridging the gap from visual features to semantic concepts by the use of textual presentations. Our initial results show a promising increase in retrieval performance.
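The SOM-based class mapping described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`train_som`, `relevance_map`, `demo`), the 4x4 map size, the learning-rate and neighbourhood schedules, the use of negative examples, and the synthetic 2-D "descriptors" are all assumptions made for illustration. The idea shown is only the general one: train a SOM on feature vectors, mark the map units hit by training examples of a class, blur those marks over the map grid to get a fuzzy class region, and score unseen objects by the value at their best-matching unit.

```python
import math
import random


def bmu(weights, x):
    """Index of the best-matching unit (closest weight vector) for x."""
    return min(range(len(weights)),
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def grid_dist2(u, v, cols):
    """Squared distance between units u and v on the rectangular map grid."""
    ur, uc = divmod(u, cols)
    vr, vc = divmod(v, cols)
    return (ur - vr) ** 2 + (uc - vc) ** 2


def train_som(data, rows, cols, epochs=20, seed=0):
    """Online SOM training with a shrinking Gaussian neighbourhood (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    n_units = rows * cols
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = 0.5 * (1.0 - frac)
            sigma = max(0.5, 0.5 * max(rows, cols) * (1.0 - frac))
            winner = bmu(weights, x)
            for u in range(n_units):
                h = lr * math.exp(-grid_dist2(u, winner, cols) / (2.0 * sigma ** 2))
                weights[u] = [w + h * (v - w) for w, v in zip(weights[u], x)]
            step += 1
    return weights


def relevance_map(weights, cols, positives, negatives, sigma=1.0):
    """Mark units hit by class examples (+) and non-examples (-), then blur on the grid."""
    n_units = len(weights)
    raw = [0.0] * n_units
    for x in positives:
        raw[bmu(weights, x)] += 1.0 / len(positives)
    for x in negatives:
        raw[bmu(weights, x)] -= 1.0 / len(negatives)
    return [sum(r * math.exp(-grid_dist2(u, v, cols) / (2.0 * sigma ** 2))
                for v, r in enumerate(raw))
            for u in range(n_units)]


def demo():
    """Score two unseen objects, one near each synthetic cluster."""
    rng = random.Random(1)

    def cloud(center):
        # Hypothetical 2-D feature vectors scattered around a cluster center.
        return [[rng.gauss(center, 0.03), rng.gauss(center, 0.03)] for _ in range(20)]

    positives, negatives = cloud(0.2), cloud(0.8)
    weights = train_som(positives + negatives, rows=4, cols=4)
    relevance = relevance_map(weights, 4, positives, negatives)
    pos_score = relevance[bmu(weights, [0.2, 0.2])]
    neg_score = relevance[bmu(weights, [0.8, 0.8])]
    return pos_score, neg_score


if __name__ == "__main__":
    print(demo())
```

In a multi-feature setting such as the one the abstract describes (visual descriptors plus keyword frequencies), one plausible arrangement would be to build one such map per feature and add the per-map scores for each object; whether and how the paper combines the maps is not stated in the abstract, so that step is left out of the sketch.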


Keywords: Textual Feature; Video Clip; Machine Translation; Retrieval Performance; Semantic Concept
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mats Sjöberg (1)
  • Jorma Laaksonen (1)
  • Matti Pöllä (1)
  • Timo Honkela (1)
  1. Laboratory of Computer and Information Science, Helsinki University of Technology, HUT, Finland
