Beyond Bounding-Boxes: Learning Object Shape by Model-Driven Grouping

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7574)


Visual recognition requires learning object models from training data. Commonly, training samples are annotated by marking only the bounding box of each object, since this appears to be the best trade-off between labeling information and effectiveness. However, objects are typically not box-shaped. The usual parametrization of object hypotheses by only their location, scale, and aspect ratio therefore seems inappropriate, since the box contains a significant amount of background clutter. Most importantly, object shape only becomes explicit once objects are segregated from the background. Because segmentation is an ill-posed problem, we propose an approach that learns object models for detection while simultaneously learning to segregate objects from clutter and to extract their overall shape. For this purpose, we use only bounding-box annotated training data. The approach groups fragmented object regions using the Multiple Instance Learning (MIL) framework to obtain a meaningful representation of object shape and, at the same time, crops away distracting background clutter to improve the appearance representation.
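The abstract does not spell out the MIL formulation, so the following is only a toy sketch of the general idea, not the paper's algorithm. It assumes each bounding box is a "bag" whose instances are hypothetical feature vectors for candidate region groupings, that a positive bag contains at least one object-like instance, and that a plain linear model with hinge-loss updates stands in for the actual learner. All names, features, and hyperparameters are illustrative assumptions.

```python
# Toy MIL sketch (assumption: linear model, hinge loss, made-up features).
# A bag stands for one bounding box; an instance stands for one candidate
# grouping of regions inside that box.

def score(w, x):
    """Linear score of one instance."""
    return sum(wi * xi for wi, xi in zip(w, x))

def best_instance(w, bag):
    """Index of the highest-scoring instance in a bag (the latent choice)."""
    return max(range(len(bag)), key=lambda i: score(w, bag[i]))

def train_mil(pos_bags, neg_bags, dim, epochs=50, lr=0.1, reg=0.01):
    """Alternate between picking one instance per positive bag and
    hinge-loss subgradient updates of the linear model."""
    w = [0.0] * dim
    for _ in range(epochs):
        # A positive bag contributes only its current best instance;
        # every instance of a negative bag is treated as a negative.
        data = [(bag[best_instance(w, bag)], 1) for bag in pos_bags]
        data += [(x, -1) for bag in neg_bags for x in bag]
        for x, y in data:
            margin = y * score(w, x)  # computed before w is modified
            for j in range(dim):
                grad = reg * w[j] - (y * x[j] if margin < 1 else 0.0)
                w[j] -= lr * grad
    return w

# Hypothetical features: [object-likeness, clutter-likeness, bias term].
pos_bags = [
    [[0.1, 0.9, 1.0], [0.9, 0.1, 1.0], [0.2, 0.8, 1.0]],
    [[0.8, 0.2, 1.0], [0.1, 1.0, 1.0]],
]
neg_bags = [
    [[0.0, 1.0, 1.0], [0.2, 0.9, 1.0]],
    [[0.1, 0.8, 1.0]],
]

w = train_mil(pos_bags, neg_bags, dim=3)
selected = [best_instance(w, bag) for bag in pos_bags]
print(selected)
```

In this toy setting, the instance selected in each positive bag plays the role of the grouping that best separates the object from clutter; the only supervision used is the bag label, which mirrors the bounding-box-only annotation described above.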


Keywords: Object Detection; Average Precision; Object Shape; Multiple Instance Learning; Mercer Kernel



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Interdisciplinary Center for Scientific Computing, University of Heidelberg, Germany
