This paper addresses the problem of accurately segmenting instances of object classes in images without any human interaction. Our model combines a bag-of-words recognition component with spatial regularization based on a random field and a Dirichlet process mixture. Bag-of-words models successfully predict the presence of an object within an image; however, they cannot accurately locate object boundaries. Random fields take into account the spatial layout of images and provide local spatial regularization. Yet, because they couple only neighboring image labels, they fail to capture the larger-scale structures needed for object recognition. These two components are combined with a Dirichlet process mixture, which models an image as a composition of regions, each representing a single object instance. Gibbs sampling is used for parameter estimation and object segmentation.
Our model successfully segments object category instances, despite cluttered backgrounds and large variations in appearance and viewpoint. The strengths and limitations of our model are shown through extensive experimental evaluations. First, we compare two methods for building visual vocabularies. Second, we show how to combine strong labeling (segmented images) with weak labeling (images annotated with bounding boxes), in order to limit the labeling effort needed to learn the model. Third, we study the effect of different initializations. We present results on four image databases, including the challenging PASCAL VOC 2007 data set, on which we obtain state-of-the-art results.
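To make the interplay between a local appearance score and random-field regularization concrete, the following is a minimal sketch of Gibbs sampling on a Potts-style random field over a grid of image patches. It is an illustration under stated assumptions and not the model of the paper: the Dirichlet process mixture is omitted, the `unary` scores are a hypothetical stand-in for per-patch bag-of-words evidence, and the function name `gibbs_potts` and the coupling strength `beta` are introduced here purely for illustration.

```python
import math
import random


def gibbs_potts(unary, beta=1.0, sweeps=20, seed=0):
    """Gibbs-sample a label grid under a Potts prior (illustrative sketch).

    unary[r][c][k] is a hypothetical log-score for label k at patch (r, c),
    standing in for bag-of-words evidence. beta rewards each 4-connected
    neighbour that carries the same label.
    """
    rng = random.Random(seed)
    rows, cols, n_labels = len(unary), len(unary[0]), len(unary[0][0])
    # Initialise every patch with its locally best label.
    labels = [
        [max(range(n_labels), key=lambda k: unary[r][c][k]) for c in range(cols)]
        for r in range(rows)
    ]
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                # Conditional log-probabilities given the current neighbours.
                logp = list(unary[r][c])
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        logp[labels[rr][cc]] += beta
                # Exponentiate stably and draw the new label for this patch.
                m = max(logp)
                weights = [math.exp(v - m) for v in logp]
                labels[r][c] = rng.choices(range(n_labels), weights=weights)[0]
    return labels
```

With a sufficiently large `beta`, a patch whose own evidence is weak or ambiguous tends to adopt the label of its neighbours, which is the local smoothing effect attributed to random fields above. The coupling only reaches adjacent patches, which is why such a field alone does not capture object-scale structure.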
About this article
Cite this article
Larlus, D., Verbeek, J. & Jurie, F. Category Level Object Segmentation by Combining Bag-of-Words Models with Dirichlet Processes and Random Fields. Int J Comput Vis 88, 238–253 (2010). https://doi.org/10.1007/s11263-009-0245-x
- Object recognition
- Random fields