Semantic Modeling of Natural Scenes for Content-Based Image Retrieval
In this paper, we present a novel image representation that makes it possible to access natural scenes by local semantic description. Our work is motivated by the continuing effort in content-based image retrieval to extract and model the semantic content of images. The basic idea of the semantic modeling is to classify local image regions into semantic concept classes such as water, rocks, or foliage. Images are then represented by the frequency of occurrence of these local concepts. Through extensive experiments, we demonstrate that this image representation is well suited for modeling the semantic content of heterogeneous scene categories, and thus for categorization and retrieval.
The image representation also allows us to rank natural scenes according to their semantic similarity relative to certain scene categories. Based on human ranking data, we learn a perceptually plausible distance measure that leads to a high correlation between the human and the automatically obtained typicality ranking. This result is especially valuable for content-based image retrieval, where the goal is to present retrieval results in order of descending semantic similarity to the query.
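The representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-concept vocabulary is limited to the examples named in the abstract, the region labels and the "coast" prototype are invented for illustration, and a plain L1 distance stands in for the learned, perceptually plausible distance measure, whose form the abstract does not give.

```python
from collections import Counter

# Hypothetical vocabulary: the abstract names only these three concepts as examples.
CONCEPTS = ["water", "rocks", "foliage"]


def concept_occurrence_vector(region_labels):
    """Return the relative frequency of each concept among an image's local regions."""
    if not region_labels:
        raise ValueError("an image needs at least one labeled region")
    counts = Counter(region_labels)
    total = len(region_labels)
    return [counts[concept] / total for concept in CONCEPTS]


def l1_distance(u, v):
    """Sum of absolute differences; an assumed stand-in, not the learned measure."""
    return sum(abs(a - b) for a, b in zip(u, v))


def rank_by_typicality(images, prototype):
    """Sort image names by increasing distance to a category prototype."""
    vectors = {name: concept_occurrence_vector(labels) for name, labels in images.items()}
    return sorted(vectors, key=lambda name: l1_distance(vectors[name], prototype))


# Invented region labels, as if produced by a local concept classifier.
images = {
    "lake": ["water", "water", "water", "foliage"],
    "cliff": ["rocks", "rocks", "rocks", "water"],
    "forest": ["foliage", "foliage", "foliage", "foliage"],
}

# Hypothetical prototype for a "coast" category: half water, half rocks.
coast_prototype = [0.5, 0.5, 0.0]

ranking = rank_by_typicality(images, coast_prototype)
print(ranking)
```

In this toy setting, an image whose regions are mostly rocks and water ranks as more typical of the assumed coast prototype than one consisting only of foliage. A distance measure fitted to human ranking data would replace `l1_distance` here.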
International Journal of Computer Vision
Volume 72, Issue 2, pp 133-157
- Kluwer Academic Publishers
- semantic scene understanding
- content-based image retrieval
- scene classification
- human scene perception
- perceptually based techniques
- computer vision