International Journal of Computer Vision, Volume 72, Issue 2, pp 133–157

Semantic Modeling of Natural Scenes for Content-Based Image Retrieval

  • Julia Vogel
  • Bernt Schiele


In this paper, we present a novel image representation that makes it possible to access natural scenes by local semantic description. Our work is motivated by the continuing effort in content-based image retrieval to extract and model the semantic content of images. The basic idea of the semantic modeling is to classify local image regions into semantic concept classes such as water, rocks, or foliage. Images are then represented by the frequency of occurrence of these local concepts. Through extensive experiments, we demonstrate that this image representation is well suited for modeling the semantic content of heterogeneous scene categories, and thus for categorization and retrieval.
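To make the idea of "frequency of occurrence of local concepts" concrete, here is a minimal Python sketch. It assumes that an upstream region classifier (not shown) has already assigned one concept label to each local region of an image, arranged as a grid. The concept vocabulary below is an illustrative assumption: only water, rocks, and foliage are named above, and the other entries are hypothetical placeholders. The function `concept_occurrence_vector` is likewise a hypothetical name, not part of any published code.

```python
from collections import Counter
from typing import Sequence

# Assumed concept vocabulary for illustration only; the abstract names
# water, rocks and foliage, the remaining classes are placeholders.
CONCEPTS = ["sky", "water", "grass", "trunks", "foliage",
            "field", "rocks", "flowers", "sand"]


def concept_occurrence_vector(region_labels: Sequence[Sequence[str]]) -> list[float]:
    """Return the relative frequency of each concept in CONCEPTS.

    region_labels is assumed to be a grid (list of rows) holding one
    concept name per local image region, as produced by some region
    classifier that is not part of this sketch.
    """
    # Flatten the grid into a single list of region labels.
    flat = [label for row in region_labels for label in row]
    if not flat:
        raise ValueError("no regions given")
    unknown = set(flat) - set(CONCEPTS)
    if unknown:
        raise ValueError(f"unknown concepts: {sorted(unknown)}")
    counts = Counter(flat)  # missing concepts count as 0
    n = len(flat)
    # One entry per concept, in vocabulary order, summing to 1.
    return [counts[c] / n for c in CONCEPTS]


if __name__ == "__main__":
    grid = [["sky", "sky"],
            ["water", "rocks"]]
    print(concept_occurrence_vector(grid))
```

For the 2×2 grid above, half of the regions are sky and a quarter each are water and rocks, so the vector is 0.5, 0.25 and 0.25 at those positions and 0 elsewhere. Normalizing by the number of regions keeps vectors comparable across images whose region counts differ.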

The image representation also allows us to rank natural scenes according to their semantic similarity to given scene categories. Based on human ranking data, we learn a perceptually plausible distance measure that yields a high correlation between the human and the automatically obtained typicality rankings. This result is especially valuable for content-based image retrieval, where the goal is to present retrieval results in order of decreasing semantic similarity to the query.
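The typicality ranking and its comparison with human rankings can be illustrated with the following rough sketch, under several loudly stated assumptions. Each image is taken to be a vector of concept frequencies. Each category is summarized by a prototype, taken here to be the mean vector of its members. Images are ranked by a weighted Euclidean distance to that prototype, and the weights `w` are placeholders for a learned, perceptually plausible measure; this sketch does not learn anything. Agreement with a human ranking is then measured with Spearman's rank correlation, one common choice that the abstract does not confirm. All function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def weighted_distance(x: Vector, y: Vector, w: Vector) -> float:
    """Weighted Euclidean distance; w stands in for learned per-concept weights."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def prototype(vectors: Sequence[Vector]) -> list[float]:
    """Assumed category prototype: the component-wise mean of member vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def typicality_ranking(images: Sequence[Vector], proto: Vector, w: Vector) -> list[int]:
    """Image indices ordered from most to least typical (closest to proto first)."""
    return sorted(range(len(images)),
                  key=lambda i: weighted_distance(images[i], proto, w))


def spearman_rho(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman rank correlation of two tie-free orderings of the same items.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference
    in rank position of each item between the two orderings.
    """
    n = len(rank_a)
    pos_a = {item: r for r, item in enumerate(rank_a)}
    pos_b = {item: r for r, item in enumerate(rank_b)}
    d2 = sum((pos_a[i] - pos_b[i]) ** 2 for i in pos_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Three images over a reduced, hypothetical three-concept vocabulary.
    images = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
    proto = prototype([[0.6, 0.3, 0.1], [0.6, 0.4, 0.0]])
    machine = typicality_ranking(images, proto, w=[1.0, 1.0, 1.0])
    human = [0, 2, 1]  # made-up human ordering, for illustration only
    print(machine, spearman_rho(machine, human))
```

A rank correlation compares only the orderings, not the raw distances. This suits the retrieval goal of presenting results in order of decreasing similarity, because only the position of each result matters to the user.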


Keywords: semantic scene understanding, content-based image retrieval, scene classification, human scene perception, perceptually based techniques, computer vision





Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. Department of Computer Science, University of British Columbia, Vancouver, Canada
  2. Computer Science Department, Darmstadt University of Technology, Germany
