Learning Dictionary of Discriminative Part Detectors for Image Categorization and Cosegmentation

Sun, Jian; Ponce, Jean

doi:10.1007/s11263-016-0899-0

Published: 21 March 2016

Volume 120, pages 111–133 (2016)

International Journal of Computer Vision

Abstract

This paper proposes a novel approach to learning mid-level image models for image categorization and cosegmentation. We represent each image class by a dictionary of part detectors that best discriminate that class from the background. We learn category-specific part detectors in a weakly supervised setting in which the training images are only annotated with category labels without part/object location information. We use a latent SVM model regularized using the \(\ell _{2,1}\) group sparsity norm to learn the part detectors. Starting from a large set of initial parts, the group sparsity regularizer forces the model to jointly select and optimize a set of discriminative part detectors in a max-margin framework. We propose a stochastic version of a proximal algorithm to solve the corresponding optimization problem. We apply the learned part detectors to image classification and cosegmentation, and present extensive comparative experiments with standard benchmarks.
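
The group sparsity mechanism mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each row of a weight matrix W holds one part detector, uses the standard group soft-thresholding operator as the proximal map of the \(\ell _{2,1}\) norm, and replaces the latent SVM loss with a plain hinge loss on a single sample (latent part locations are omitted). The names prox_group_l21 and stochastic_proximal_step, and the parameters step and lam, are hypothetical.

```python
import numpy as np


def prox_group_l21(W, threshold):
    """Proximal operator of threshold * sum_k ||W[k]||_2 (group soft-thresholding).

    Each row of W is one group (assumed here to be one part detector).
    Rows whose Euclidean norm is at most `threshold` are set exactly to zero,
    which is how the regularizer discards whole detectors.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already all-zero.
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return W * scale


def stochastic_proximal_step(W, x, y, step, lam):
    """One stochastic proximal subgradient step on a single sample.

    Assumption: the score is sum_k <W[k], x[k]>, with x of the same shape as W
    (one feature vector per part) and label y in {-1, +1}.
    """
    score = float(np.sum(W * x))
    # Subgradient of the hinge loss max(0, 1 - y * score) with respect to W.
    grad = -y * x if y * score < 1.0 else np.zeros_like(W)
    # Gradient step on the loss, then proximal step on the regularizer.
    return prox_group_l21(W - step * grad, step * lam)


W = np.array([[3.0, 4.0], [0.3, 0.4]])
print(prox_group_l21(W, 1.0))
# First row (norm 5) is shrunk to [2.4, 3.2]; second row (norm 0.5) becomes zero.
```

Zeroing entire rows rather than individual coefficients is what lets such a regularizer select a subset of detectors from a large initial pool while the remaining ones continue to be optimized.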

Notes

https://github.com/exploreman/discriminative_parts.
http://www.csie.ntu.edu.tw/~cjlin/liblinear/.
In our approach, image correspondence cues can be disabled by setting \(\alpha_m = 0\).
http://research.microsoft.com/en-us/projects/objectclassrecognition/.

Acknowledgments

Jian Sun was supported by NSFC (No. 61472313, 11131006), the 973 program (2013CB329404), NCET-12-0442, and NSFC (No. 61303121). Jean Ponce’s work was supported in part by the European Research Council (VideoWorld project) and the Institut Universitaire de France.

Author information

Authors and Affiliations

Xi’an Jiaotong University, No.28, Xianning West Road, Xi’an, 710049, Shaanxi, People’s Republic of China
Jian Sun
École Normale Supérieure / PSL Research University, 45, Rue d’Ulm, 75005, Paris, France
Jean Ponce

Corresponding author

Correspondence to Jian Sun.

Additional information

Communicated by Derek Hoiem.

About this article

Cite this article

Sun, J., Ponce, J. Learning Dictionary of Discriminative Part Detectors for Image Categorization and Cosegmentation. Int J Comput Vis 120, 111–133 (2016). https://doi.org/10.1007/s11263-016-0899-0

Received: 02 July 2015
Accepted: 02 March 2016
Published: 21 March 2016
Issue Date: November 2016
DOI: https://doi.org/10.1007/s11263-016-0899-0
