A Statistical Model for General Contextual Object Recognition

Carbonetto, Peter; de Freitas, Nando; Barnard, Kobus

doi:10.1007/978-3-540-24670-1_27

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3021)

Included in the following conference series: European Conference on Computer Vision

Abstract

We consider object recognition as the process of attaching meaningful labels to specific regions of an image, and propose a model that learns spatial relationships between objects. Given a set of images and their associated text (e.g. keywords, captions, descriptions), the objective is to segment an image, in either a crude or sophisticated fashion, then to find the proper associations between words and regions. Previous models are limited by the scope of the representation. In particular, they fail to exploit spatial context in the images and words. We develop a more expressive model that takes this into account. We formulate a spatially consistent probabilistic mapping between continuous image feature vectors and the supplied word tokens. By learning both word-to-region associations and object relations, the proposed model augments scene segmentations due to smoothing implicit in spatial consistency. Context introduces cycles to the undirected graph, so we cannot rely on a straightforward implementation of the EM algorithm for estimating the model parameters and densities of the unknown alignment variables. Instead, we develop an approximate EM algorithm that uses loopy belief propagation in the inference step and iterative scaling on the pseudo-likelihood approximation in the parameter update step. The experiments indicate that our approximate inference and learning algorithm converges to good local solutions. Experiments on a diverse array of images show that spatial context considerably improves the accuracy of object recognition. Most significantly, spatial context combined with a nonlinear discrete object representation allows our models to cope well with over-segmented scenes.
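
The inference step named in the abstract, loopy belief propagation on an undirected graph with cycles, can be illustrated with a minimal sketch. This is not the authors' model or code: it is a generic sum-product routine on a toy pairwise Markov random field, and the function name `loopy_bp`, the three-region graph, the two labels, and all potential values are assumptions chosen only for illustration.

```python
from math import prod


def loopy_bp(unary, edges, pairwise, iters=50):
    """Sum-product loopy belief propagation on a pairwise MRF.

    unary[i][a]    : local evidence that region i carries label a (hypothetical values)
    edges          : undirected pairs (i, j) of neighbouring regions
    pairwise[a][b] : compatibility of labels a and b on neighbouring regions,
                     shared by all edges (an assumption made for brevity)
    """
    n, k = len(unary), len(pairwise)
    neighbours = {i: set() for i in range(n)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    # msgs[(i, j)][b] is the message region i sends to region j about label b.
    msgs = {(i, j): [1.0 / k] * k for i in neighbours for j in neighbours[i]}

    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for b in range(k):
                total = 0.0
                for a in range(k):
                    # Product of messages into i from every neighbour except j.
                    incoming = prod(msgs[(h, i)][a] for h in neighbours[i] if h != j)
                    total += unary[i][a] * pairwise[a][b] * incoming
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]  # normalise for numerical stability
        msgs = new

    # Approximate marginals: local evidence times all incoming messages.
    beliefs = []
    for i in range(n):
        b = [unary[i][a] * prod(msgs[(h, i)][a] for h in neighbours[i]) for a in range(k)]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs


# Toy scene: three mutually adjacent regions (a cycle) and two labels.
unary = [[0.9, 0.1], [0.9, 0.1], [0.45, 0.55]]   # region 2 is ambiguous on its own
edges = [(0, 1), (1, 2), (0, 2)]
pairwise = [[2.0, 1.0], [1.0, 2.0]]              # neighbours tend to share a label
beliefs = loopy_bp(unary, edges, pairwise)
```

In this toy setting, region 2 slightly prefers label 1 when considered alone, but its belief after message passing favours label 0, the label its neighbours strongly support. This is the kind of smoothing through spatial consistency that the abstract describes, shown here only on made-up numbers.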

Author information

Authors and Affiliations

Dept. of Computer Science, University of British Columbia, Vancouver, Canada
Peter Carbonetto & Nando de Freitas
Dept. of Computer Science, University of Arizona, Tucson, Arizona
Kobus Barnard

Editor information

Editors and Affiliations

Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Prague 6, Czech Republic
Tomáš Pajdla
Center for Machine Perception, Dept. of Cybernetics, Faculty of Elec. Eng., Czech Technical University in Prague, Karlovo nám. 13, 121 35, Prague, Czech Rep
Jiří Matas

Cite this paper

Carbonetto, P., de Freitas, N., Barnard, K. (2004). A Statistical Model for General Contextual Object Recognition. In: Pajdla, T., Matas, J. (eds) Computer Vision - ECCV 2004. ECCV 2004. Lecture Notes in Computer Science, vol 3021. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24670-1_27

DOI: https://doi.org/10.1007/978-3-540-24670-1_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21984-2
Online ISBN: 978-3-540-24670-1
eBook Packages: Springer Book Archive
