Chapter 8: Multimedia and Multimodal Information Retrieval

  • Alessandro Bozzon
  • Piero Fraternali
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5950)


The Web is progressively becoming a multimedia content delivery platform. This trend poses severe challenges to information retrieval theories, techniques, and tools. This chapter defines the problem of multimedia information retrieval together with its challenges and application areas, surveys its major technical issues, proposes a reference architecture that unifies content processing and querying, presents an example of a next-generation platform for multimedia search, and concludes by showing the close ties between the multi-domain search investigated in Search Computing and multimodal/multimedia search.


Keywords: multimedia information retrieval; digital signal processing; video search engines; multi-modal query interfaces





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Alessandro Bozzon (1)
  • Piero Fraternali (1)
  1. Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
