Semantic Modeling of Natural Scenes for Content-Based Image Retrieval
Abstract
In this paper, we present a novel image representation that makes it possible to access natural scenes through local semantic descriptions. Our work is motivated by the continuing effort in content-based image retrieval to extract and model the semantic content of images. The basic idea of the semantic modeling is to classify local image regions into semantic concept classes such as water, rocks, or foliage. Images are then represented by the frequency of occurrence of these local concepts. Through extensive experiments, we demonstrate that this image representation is well suited for modeling the semantic content of heterogeneous scene categories, and thus for categorization and retrieval.
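The representation described above can be sketched as a normalized histogram over concept labels. This is a minimal sketch, not the authors' implementation. It assumes that some local region classifier (not shown, hypothetical) has already assigned one concept label to each cell of a regular grid over the image. The concept vocabulary is also an assumption: only water, rocks, and foliage are named in the abstract, and the other classes are illustrative.

```python
from collections import Counter

# Illustrative concept vocabulary. "water", "rocks" and "foliage" are named in
# the abstract; "sky", "grass" and "sand" are assumptions for this sketch.
CONCEPTS = ["sky", "water", "grass", "foliage", "rocks", "sand"]


def concept_occurrence_vector(region_labels, concepts=CONCEPTS):
    """Return the relative frequency of each concept among the local regions.

    region_labels: a 2-D grid (list of rows) holding the concept label that a
    hypothetical local region classifier assigned to each image region.
    """
    flat = [label for row in region_labels for label in row]
    if not flat:
        raise ValueError("image has no labeled regions")
    counts = Counter(flat)
    unknown = set(counts) - set(concepts)
    if unknown:
        raise ValueError(f"unknown concept labels: {sorted(unknown)}")
    total = len(flat)
    # One entry per concept, in vocabulary order; entries sum to 1.
    return [counts[c] / total for c in concepts]


# A toy 2x4 grid of region labels for a coastal scene.
grid = [
    ["sky", "sky", "sky", "rocks"],
    ["water", "water", "sand", "rocks"],
]
vector = concept_occurrence_vector(grid)
print(dict(zip(CONCEPTS, vector)))
# {'sky': 0.375, 'water': 0.25, 'grass': 0.0, 'foliage': 0.0, 'rocks': 0.25, 'sand': 0.125}
```

Normalizing by the number of regions makes the vector independent of the grid size, so images analyzed at different grid resolutions remain comparable.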
The image representation also allows us to rank natural scenes according to their semantic similarity to certain scene categories. Based on human ranking data, we learn a perceptually plausible distance measure that yields a high correlation between the human and the automatically obtained typicality rankings. This result is especially valuable for content-based image retrieval, where the goal is to present retrieval results in order of decreasing semantic similarity to the query.
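One plausible reading of the typicality ranking is sketched below. It is an illustration under stated assumptions, not necessarily the authors' method. Each category is summarized by a prototype, here taken to be the mean concept-occurrence vector of its members. Images are then ranked by a weighted Euclidean distance to that prototype, where the per-concept weights stand in for the learned, perceptually plausible distance measure; the weight values are hypothetical. Agreement with a human ranking is measured with Spearman's rank correlation, a common choice assumed here.

```python
import math


def prototype(vectors):
    """Mean concept-occurrence vector of a category's member images."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance; the weights stand in for a learned measure."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def typicality_ranking(images, proto, weights):
    """Image ids ordered from most to least typical (closest to prototype first)."""
    return sorted(images, key=lambda k: weighted_distance(images[k], proto, weights))


def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation of two tie-free rankings of the same items."""
    n = len(ranking_a)
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(ranking_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Toy vectors over three assumed concepts: [sky, water, rocks].
coast_members = [[0.4, 0.4, 0.2], [0.5, 0.3, 0.2]]
proto = prototype(coast_members)
weights = [1.0, 2.0, 0.5]  # hypothetical per-concept weights

images = {
    "a": [0.45, 0.35, 0.2],  # essentially the prototype
    "b": [0.3, 0.5, 0.2],    # somewhat more water
    "c": [0.1, 0.1, 0.8],    # mostly rocks
}
ranking = typicality_ranking(images, proto, weights)
human = ["a", "c", "b"]  # made-up human ranking for illustration
rho = spearman_rho(ranking, human)
print(ranking, rho)  # ['a', 'b', 'c'] 0.5
```

In a retrieval setting, the same distance can order the results for a category query so that the most typical scenes appear first.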
Keywords
semantic scene understanding, content-based image retrieval, scene classification, human scene perception, perceptually based techniques, computer vision