International Journal of Computer Vision, Volume 88, Issue 2, pp 238–253

Category Level Object Segmentation by Combining Bag-of-Words Models with Dirichlet Processes and Random Fields

  • Diane Larlus
  • Jakob Verbeek
  • Frédéric Jurie
Article

Abstract

This paper addresses the problem of accurately segmenting instances of object classes in images without any human interaction. Our model combines a bag-of-words recognition component with spatial regularization based on a random field and a Dirichlet process mixture. Bag-of-words models successfully predict the presence of an object within an image; however, they cannot accurately locate object boundaries. Random fields take into account the spatial layout of images and provide local spatial regularization. Yet, because they rely on local coupling between image labels, they fail to capture the larger-scale structures needed for object recognition. The two components are therefore combined with a Dirichlet process mixture, which models an image as a composition of regions, each representing a single object instance. Gibbs sampling is used for parameter estimation and object segmentation.
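To make the interplay between per-site recognition scores and local spatial regularization concrete, here is a minimal sketch of Gibbs sampling on a grid of labels with a Potts-style coupling. It is not the model of the paper: the Dirichlet process mixture is left out entirely, and the function name `gibbs_potts`, the `unary` log-score array (standing in for a bag-of-words classifier output), and the coupling strength `beta` are assumptions made for illustration only.

```python
import math
import random


def gibbs_potts(unary, beta=1.0, sweeps=20, seed=0):
    """Sample a label grid by Gibbs sampling (illustrative sketch).

    unary[y][x][k] is a hypothetical log-score of label k at site (y, x),
    e.g. from an appearance classifier. beta rewards agreement with the
    4-connected neighbours (Potts-style coupling).
    """
    rng = random.Random(seed)
    h, w = len(unary), len(unary[0])
    n_labels = len(unary[0][0])

    # Initialise every site with its best label under the unary scores alone.
    labels = [
        [max(range(n_labels), key=lambda k: unary[y][x][k]) for x in range(w)]
        for y in range(h)
    ]

    for _ in range(sweeps):
        for y in range(h):
            for x in range(w):
                # Current labels of the neighbours that lie inside the grid.
                neighbours = [
                    labels[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w
                ]
                # Unnormalised log-posterior of each label at this site.
                logp = [
                    unary[y][x][k] + beta * sum(1 for n in neighbours if n == k)
                    for k in range(n_labels)
                ]
                # Subtract the maximum before exponentiating for stability.
                m = max(logp)
                weights = [math.exp(v - m) for v in logp]
                # Resample the site from its conditional distribution.
                labels[y][x] = rng.choices(range(n_labels), weights=weights)[0]
    return labels
```

With `beta` set to 0 each site is sampled independently from its own scores; raising `beta` smooths the labeling locally, but, as the abstract notes, such local coupling alone does not capture object-scale structure.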

Our model successfully segments object category instances, despite cluttered backgrounds and large variations in appearance and viewpoint. The strengths and limitations of our model are shown through extensive experimental evaluations. First, we evaluate two methods for building visual vocabularies. Second, we show how to combine strong labeling (segmented images) with weak labeling (images annotated with bounding boxes), in order to limit the labeling effort needed to learn the model. Third, we study the effect of different initializations. We present results on four image databases, including the challenging PASCAL VOC 2007 data set, on which we obtain state-of-the-art results.

Keywords

Object recognition · Segmentation · Random fields

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. INP Grenoble, Darmstadt University of Technology, Darmstadt, Germany
  2. INRIA Rhône-Alpes, Saint Ismier cedex, France
  3. UFR Sciences–GREYC, University of Caen, Caen cedex, France
