
A three-level architecture for bridging the image semantic gap

  • Original Research
  • Published:
Multimedia Systems

Abstract

Image retrieval systems must cope with the different ways in which image content can be apprehended, and in particular with the difficulty of characterizing visual semantics. To address this issue, we examine the use of three abstract levels of representation, namely Signal, Object and Semantic. At the Signal Level, we propose a framework that maps extracted low-level features to symbolic signal descriptors. The Object Level features a statistical model of the joint distribution of object concepts (such as mountains, sky…) and the symbolic signal descriptors. At the Semantic Level, signal and object characterizations are coupled within a logic-based framework. The latter is instantiated by a knowledge representation formalism that allows us to define an expressive query language comprising several Boolean and quantification operators. Our architecture therefore makes it possible to process topic-based queries. Experimentally, we evaluate our theoretical proposal on a corpus of real-world photographs and on the TRECVid corpus.
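As a purely illustrative reading of the abstract, the following Python sketch shows how the three levels could interlock. It is not the authors' implementation: every name and number in it (hue_to_color_symbol, the toy conditional table OBJECT_GIVEN_COLOR, the has_object/has_color/q_and operators, the thresholds) is a hypothetical stand-in. The sketch quantizes a low-level feature into a symbolic signal descriptor, attaches object-concept scores through an assumed conditional distribution, and composes Boolean operators into a topic-based query such as "blue sky".

```python
from dataclasses import dataclass, field
from typing import Callable

# --- Signal Level: low-level features mapped to symbolic descriptors ---
# Hypothetical mapping: a mean-hue value is quantized to a color name,
# standing in for the paper's symbolic signal descriptors.
def hue_to_color_symbol(mean_hue: float) -> str:
    if mean_hue < 30 or mean_hue >= 330:
        return "red"
    if mean_hue < 90:
        return "yellow"
    if mean_hue < 150:
        return "green"
    if mean_hue < 270:
        return "blue"
    return "purple"

# --- Object Level: joint behavior of object concepts and signal symbols ---
# Toy conditional scores P(object concept | color symbol); in the paper these
# would come from a statistical model trained on data, not be hard-coded.
OBJECT_GIVEN_COLOR = {
    "blue":  {"sky": 0.6, "water": 0.3, "mountains": 0.1},
    "green": {"vegetation": 0.7, "mountains": 0.2, "water": 0.1},
}

@dataclass
class ImageIndex:
    """Symbolic index of one image region across the levels."""
    color_symbol: str
    object_scores: dict = field(default_factory=dict)

def index_region(mean_hue: float) -> ImageIndex:
    color = hue_to_color_symbol(mean_hue)
    return ImageIndex(color, dict(OBJECT_GIVEN_COLOR.get(color, {})))

# --- Semantic Level: Boolean query operators over the symbolic index ---
Query = Callable[[ImageIndex], bool]

def has_object(concept: str, threshold: float = 0.5) -> Query:
    return lambda idx: idx.object_scores.get(concept, 0.0) >= threshold

def has_color(symbol: str) -> Query:
    return lambda idx: idx.color_symbol == symbol

def q_and(a: Query, b: Query) -> Query:
    return lambda idx: a(idx) and b(idx)

if __name__ == "__main__":
    region = index_region(mean_hue=210.0)                 # a blue-ish region
    query = q_and(has_object("sky"), has_color("blue"))   # topic query "blue sky"
    print(query(region))                                  # True
```

The paper's actual Semantic Level rests on a knowledge representation formalism with quantification operators as well as Boolean ones; this toy keeps only conjunction to show where such operators would sit in the pipeline.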

Author information

Corresponding author

Correspondence to Mohammed Belkhatir.

Additional information

Communicated by Wei-Ying Ma.

About this article

Cite this article

Belkhatir, M. A three-level architecture for bridging the image semantic gap. Multimedia Systems 17, 135–148 (2011). https://doi.org/10.1007/s00530-010-0207-8
