Abstract
Deformable part-based models [1, 2] achieve state-of-the-art performance for object detection, but rely on heuristic initialization during training due to the optimization of non-convex cost function. This paper investigates limitations of such an initialization and extends earlier methods using additional supervision. We explore strong supervision in terms of annotated object parts and use it to (i) improve model initialization, (ii) optimize model structure, and (iii) handle partial occlusions. Our method is able to deal with sub-optimal and incomplete annotations of object parts and is shown to benefit from semi-supervised learning setups where part-level annotation is provided for a fraction of positive examples only. Experimental results are reported for the detection of six animal classes in PASCAL VOC 2007 and 2010 datasets. We demonstrate significant improvements in detection performance compared to the LSVM [1] and the Poselet [3] object detectors.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. PAMI 32, 1627–1645 (2010)
Zhu, L., Chen, Y., Yuille, A., Freeman, W.: Latent hierarchical structural learning for object detection. In: CVPR, pp. 1062–1069 (2010)
Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3D human pose annotations. In: ICCV (2009)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: CVPR (2009)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A. (The PASCAL Visual Object Classes Challenge 2010 VOC 2010, Results) (2010)
Fischler, M., Elschlager, R.: The representation and matching of pictorial structures. TC 22, 67–92 (1973)
Felzenszwalb, P., Huttenlocher, D.: Pictorial structures for object recognition. IJCV 61, 55–79 (2005)
Ramanan, D.: Learning to parse images of articulated bodies. In: NIPS (2006)
Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Pose search: Retrieving people using their pose. In: CVPR (2009)
Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: CVPR, pp. 1385–1392 (2011)
Yang, W., Wang, Y., Mori, G.: Recognizing human actions from still images with latent poses. In: CVPR, pp. 2030–2037 (2010)
Cristinacce, D., Cootes, T.: Feature detection and tracking with constrained local models. In: BMVC (2006)
Naderi Parizi, S., Oberlin, J., Felzenszwalb, P.: Reconfigurable models for scene recognition. In: CVPR (2012)
Ott, P., Everingham, M.: Shared parts for deformable part-based models. In: CVPR (2011)
Wang, Y., Tran, D., Liao, Z.: Learning hierarchical poselets for human parsing. In: CVPR, pp. 1705–1712 (2011)
Branson, S., Belongie, S., Perona, P.: Strong supervision from weak annotation: Interactive training of deformable part models. In: ICCV (2011)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, pp. I:886–I:893 (2005)
Sivic, J., Zisserman, A.: Video google: A text retrieval approach to object matching in videos. In: ICCV, pp. 1470–1477 (2003)
Zhang, J., Marszalek, M., Lazebnik, S., Schmid, C.: Local features and kernels for classification of texture and object categories: a comprehensive study. IJCV 73, 213–238 (2007)
Chen, Y., Zhu, L(L.), Yuille, A.: Active Mask Hierarchies for Object Detection. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 43–56. Springer, Heidelberg (2010)
Parkhi, O., Vedaldi, A., Jawahar, C.V., Zisserman, A.: The truth about cats and dogs. In: ICCV (2011)
Bourdev, L., Maji, S., Brox, T., Malik, J.: Detecting People Using Mutually Consistent Poselet Activations. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part VI. LNCS, vol. 6316, pp. 168–181. Springer, Heidelberg (2010)
Johnson, S., Everingham, M.: Learning effective human pose estimation from inaccurate annotation. In: CVPR (2011)
Sun, M., Savarese, S.: Articulated part-based model for joint object detection and pose estimation. In: ICCV (2011)
Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: CVPR (2012)
Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT Press (2009)
Harris, C., Stephens, C.: A combined corner and edge detector. In: Alvey Vision Conference (1998)
Girshick, A., Felzenszwalb, P., McAllester, D.: LSVM Release 4 Notes, http://www.cs.brown.edu/people/pff/latent/
Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: CVPR (2008)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Azizpour, H., Laptev, I. (2012). Object Detection Using Strongly-Supervised Deformable Part Models. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33718-5_60
Download citation
DOI: https://doi.org/10.1007/978-3-642-33718-5_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33717-8
Online ISBN: 978-3-642-33718-5
eBook Packages: Computer ScienceComputer Science (R0)