Semantic Modeling of Natural Scenes for Content-Based Image Retrieval

Published in: International Journal of Computer Vision

Abstract

In this paper, we present a novel image representation that makes it possible to access natural scenes by local semantic description. Our work is motivated by the continuing effort in content-based image retrieval to extract and model the semantic content of images. The basic idea of the semantic modeling is to classify local image regions into semantic concept classes such as water, rocks, or foliage. Images are then represented by the frequency of occurrence of these local concepts. Through extensive experiments, we demonstrate that the image representation is well suited for modeling the semantic content of heterogeneous scene categories, and thus for categorization and retrieval.
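As a rough illustration of the representation described above, the following minimal sketch turns the concept labels of an image's local regions into a vector of relative frequencies. The vocabulary is limited to the three concepts named in the abstract, and the function name, the region labels, and the grid size are all hypothetical. The region classifier itself is not shown.

```python
from collections import Counter

# Assumed vocabulary: only the three concepts named in the abstract.
CONCEPTS = ["water", "rocks", "foliage"]


def concept_occurrence_vector(region_labels, concepts=CONCEPTS):
    """Return the relative frequency of each concept among an image's regions.

    Labels outside `concepts` still count toward the total number of regions
    but get no entry of their own in the returned vector.
    """
    if not region_labels:
        raise ValueError("an image needs at least one labeled region")
    counts = Counter(region_labels)
    total = len(region_labels)
    return [counts[c] / total for c in concepts]


# Made-up labels that a local region classifier might assign to a 2x4 grid.
labels = ["water", "water", "rocks", "foliage",
          "water", "water", "rocks", "rocks"]
vector = concept_occurrence_vector(labels)
print(dict(zip(CONCEPTS, vector)))
```

Two images can then be compared through their vectors alone, without reference to the underlying pixels.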

The image representation also allows us to rank natural scenes according to their semantic similarity relative to certain scene categories. Based on human ranking data, we learn a perceptually plausible distance measure that leads to a high correlation between the human and the automatically obtained typicality ranking. This result is especially valuable for content-based image retrieval, where the goal is to present retrieval results in order of descending semantic similarity to the query.
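The ranking idea can be sketched as follows, under loudly stated assumptions: a plain Euclidean distance to a hypothetical category prototype stands in for the learned distance measure, which the abstract does not specify, and the vectors and the human ordering are made-up data. Agreement between the two rankings is measured here with Spearman's rank correlation, which is one common choice and not necessarily the one used in the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance; a stand-in for the learned perceptual distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_typicality(vectors, prototype):
    """Return image indices ordered from most to least similar to the prototype."""
    return sorted(range(len(vectors)),
                  key=lambda i: euclidean(vectors[i], prototype))


def spearman(order_a, order_b):
    """Spearman rank correlation of two orderings of the same items (no ties)."""
    n = len(order_a)
    if n < 2 or sorted(order_a) != sorted(order_b):
        raise ValueError("orderings must contain the same items, at least two")
    position_b = {item: rank for rank, item in enumerate(order_b)}
    d2 = sum((rank - position_b[item]) ** 2 for rank, item in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical concept-occurrence vectors (water, rocks, foliage) for four images.
vectors = [
    [0.5, 0.375, 0.125],
    [0.1, 0.2, 0.7],
    [0.6, 0.2, 0.2],
    [0.3, 0.6, 0.1],
]
prototype = [0.6, 0.3, 0.1]    # assumed prototype for one scene category
human_order = [2, 0, 3, 1]     # made-up human typicality ranking

auto_order = rank_by_typicality(vectors, prototype)
print(auto_order, spearman(auto_order, human_order))
```

A correlation close to 1 would indicate that the automatic ordering closely follows the human one.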

Author information

Corresponding author

Correspondence to Julia Vogel.

About this article

Cite this article

Vogel, J., Schiele, B. Semantic Modeling of Natural Scenes for Content-Based Image Retrieval. Int J Comput Vision 72, 133–157 (2007). https://doi.org/10.1007/s11263-006-8614-1

  • DOI: https://doi.org/10.1007/s11263-006-8614-1
