Abstract
We consider object recognition as the process of attaching meaningful labels to specific regions of an image, and propose a model that learns spatial relationships between objects. Given a set of images and their associated text (e.g. keywords, captions, descriptions), the objective is to segment each image, in either a crude or sophisticated fashion, and then to find the proper associations between words and regions. Previous models are limited by the scope of their representation; in particular, they fail to exploit spatial context in the images and words. We develop a more expressive model that takes this context into account. We formulate a spatially consistent probabilistic mapping between continuous image feature vectors and the supplied word tokens. By learning both word-to-region associations and object relations, the proposed model improves scene segmentations through the smoothing implicit in spatial consistency. Context introduces cycles to the undirected graph, so we cannot rely on a straightforward implementation of the EM algorithm for estimating the model parameters and the densities of the unknown alignment variables. Instead, we develop an approximate EM algorithm that uses loopy belief propagation in the inference step and iterative scaling on the pseudo-likelihood approximation in the parameter update step. Our experiments indicate that this approximate inference and learning algorithm converges to good local solutions. Experiments on a diverse array of images show that spatial context considerably improves the accuracy of object recognition. Most significantly, spatial context combined with a nonlinear discrete object representation allows our models to cope well with over-segmented scenes.
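The inference step described above relies on loopy belief propagation. As a rough illustration only, and not the authors' implementation, the following sketch runs sum-product message passing on a toy pairwise Markov random field over four image regions whose adjacency forms a cycle. The function names, the two-label setup, and all potential values are assumptions made for this example; in the paper's setting the unary terms would come from the learned word-to-region model and the pairwise terms from the learned object relations.

```python
from itertools import product


def loopy_bp(unary, edges, pairwise, n_iters=50):
    """Sum-product loopy belief propagation on a pairwise MRF.

    unary:    per-node potentials, unary[i][k] > 0 (hypothetical values)
    edges:    undirected edges (i, j) between neighbouring regions
    pairwise: K x K symmetric compatibility table shared by all edges
    Returns approximate marginals, one list of K probabilities per node.
    """
    n, K = len(unary), len(unary[0])
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # msgs[(i, j)][k] is the message from node i to node j about label k of j.
    msgs = {(i, j): [1.0 / K] * K for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for kj in range(K):
                total = 0.0
                for ki in range(K):
                    term = unary[i][ki] * pairwise[ki][kj]
                    # Combine messages into i from every neighbour except j.
                    for h in neighbors[i]:
                        if h != j:
                            term *= msgs[(h, i)][ki]
                    total += term
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]  # normalise for stability
        msgs = new  # synchronous update of all messages
    beliefs = []
    for i in range(n):
        b = list(unary[i])
        for h in neighbors[i]:
            b = [b[k] * msgs[(h, i)][k] for k in range(K)]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs


def exact_marginals(unary, edges, pairwise):
    """Brute-force marginals by enumeration; only feasible for tiny graphs."""
    n, K = len(unary), len(unary[0])
    marg = [[0.0] * K for _ in range(n)]
    for labels in product(range(K), repeat=n):
        w = 1.0
        for i, k in enumerate(labels):
            w *= unary[i][k]
        for i, j in edges:
            w *= pairwise[labels[i]][labels[j]]
        for i, k in enumerate(labels):
            marg[i][k] += w
    z = sum(marg[0])
    return [[v / z for v in row] for row in marg]


# Toy scene: four regions, two labels. Region 2 has an ambiguous appearance.
unary = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5], [0.9, 0.1]]
pairwise = [[2.0, 1.0], [1.0, 2.0]]       # neighbouring regions prefer equal labels
edges = [(0, 1), (1, 3), (3, 2), (2, 0)]  # adjacency forms a cycle

if __name__ == "__main__":
    for i, (b, e) in enumerate(zip(loopy_bp(unary, edges, pairwise),
                                   exact_marginals(unary, edges, pairwise))):
        print(i, [round(v, 3) for v in b], [round(v, 3) for v in e])
```

On a graph without cycles the messages settle to the exact marginals; on the cyclic example they are only an approximation, which is why the sketch prints the brute-force marginals alongside for comparison. The ambiguous region is pulled toward the label favoured by its neighbours, a small-scale version of the smoothing effect that spatial consistency is said to provide.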
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carbonetto, P., de Freitas, N., Barnard, K. (2004). A Statistical Model for General Contextual Object Recognition. In: Pajdla, T., Matas, J. (eds) Computer Vision - ECCV 2004. ECCV 2004. Lecture Notes in Computer Science, vol 3021. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24670-1_27
DOI: https://doi.org/10.1007/978-3-540-24670-1_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21984-2
Online ISBN: 978-3-540-24670-1
eBook Packages: Springer Book Archive