International Journal of Computer Vision

Volume 97, Issue 2, pp 191–209

Accurate Object Recognition with Shape Masks

  • Marcin Marszałek
  • Cordelia Schmid


In this paper we propose an object recognition approach based on shape masks, which generalize segmentation masks. Because shape masks carry information about the extent (outline) of objects, they provide a convenient tool for exploiting object geometry. We apply our ideas to two common object class recognition tasks: classification and localization. For classification, we extend the orderless bag-of-features image representation. In the proposed setup, shape masks can be seen as weak geometrical constraints over bag-of-features. These constraints can be used to reduce background clutter and help recognition. For localization, we propose a new recognition scheme based on high-dimensional hypothesis clustering. Shape masks allow us to go beyond bounding boxes and determine the outline (approximate segmentation) of the object during localization. Furthermore, the method easily learns and detects possible object viewpoints and articulations, which are often well characterized by the object outline. Our experiments reveal that shape masks can improve the recognition accuracy of state-of-the-art methods while returning richer recognition answers at the same time. We evaluate the proposed approach on the challenging natural-scene Graz-02 object classes dataset.
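As a rough illustration of how a shape mask could act as a weak geometrical constraint over bag-of-features, the sketch below weights each local feature's vote by the mask value at its location, so features falling on the background contribute little to the histogram. This is only a minimal sketch under stated assumptions, not the method of the paper: the function name `masked_bof_histogram`, the input layout, and the simple multiplicative weighting are all hypothetical choices made for illustration.

```python
import numpy as np

def masked_bof_histogram(positions, words, mask, vocab_size):
    """Bag-of-features histogram with mask-weighted votes (illustrative only).

    positions  : (N, 2) integer (row, col) feature locations -- assumed input layout
    words      : (N,) visual-word indices in [0, vocab_size)
    mask       : (H, W) float array in [0, 1]; 1 = object, 0 = background
    vocab_size : number of visual words in the (hypothetical) vocabulary
    """
    positions = np.asarray(positions, dtype=int)
    words = np.asarray(words, dtype=int)
    # Look up the mask value under each feature; this is the feature's weight.
    weights = mask[positions[:, 0], positions[:, 1]]
    # Accumulate weighted votes per visual word.
    hist = np.bincount(words, weights=weights, minlength=vocab_size)
    total = hist.sum()
    # L1-normalize unless every feature fell on the background.
    return hist / total if total > 0 else hist

# Toy example: a 4x4 image whose central 2x2 block is the object.
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
positions = [(0, 0), (1, 1), (2, 2), (3, 3)]
words = [0, 1, 1, 2]

hist = masked_bof_histogram(positions, words, mask, vocab_size=3)
uniform = masked_bof_histogram(positions, words, np.ones((4, 4)), vocab_size=3)
```

With an all-ones mask the function reduces to the ordinary orderless histogram, which is one way to see the mask as an extension of plain bag-of-features rather than a replacement for it.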


Shape masks, Object recognition, Object segmentation, Local features, Bag-of-features, Graz-02





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. INRIA Grenoble, LEAR - LJK, Montbonnot, France
