Joint Dictionary and Classifier Learning for Categorization of Images Using a Max-margin Framework

Lobel, Hans; Vidal, René; Mery, Domingo; Soto, Alvaro

doi:10.1007/978-3-642-53842-1_8


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8333)

Included in the following conference series:

Pacific-Rim Symposium on Image and Video Technology

Abstract

The Bag-of-Visual-Words (BoVW) model is a popular approach for visual recognition. It has been used successfully in many different tasks, and its simplicity and good performance are the main reasons for its popularity. The central element of this model, the visual dictionary, is used to build mid-level representations from low-level image descriptors. Classifiers are then trained on these mid-level representations to perform categorization. While most works based on BoVW models have focused on learning a suitable dictionary or on proposing a suitable pooling strategy, little effort has been devoted to exploring and improving the coupling between the dictionary and the top-level classifiers in order to generate more discriminative models. This problem can be highly complex due to the large dictionary size these methods usually need. Also, most BoVW-based systems perform multiclass categorization using a one-vs-all strategy, ignoring relevant correlations among classes. To tackle these issues, we propose a novel approach that jointly learns the dictionary words and a proper top-level multiclass classifier. We use a max-margin learning framework to minimize a regularized energy formulation, which allows us to propagate label information to guide the commonly unsupervised dictionary learning process. As a result, we produce a dictionary that is more compact and discriminative. We test our method on several popular datasets, where we demonstrate that our joint optimization strategy induces a word-sharing behavior among the target classes and achieves state-of-the-art performance using far fewer visual words than previous approaches.
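To make the two components that the abstract couples more concrete, here is a minimal Python/NumPy sketch of a generic BoVW pipeline. Local descriptors are hard-assigned to their nearest visual word and sum-pooled into a normalized histogram. A single multiclass max-margin classifier (a Crammer–Singer-style hinge loss) is then trained on those histograms, as opposed to separate one-vs-all classifiers. This is an illustrative assumption, not the authors' formulation. The dictionary here is fixed (for example, obtained from k-means) and is not updated jointly with the classifier. The function names (`bovw_histogram`, `multiclass_hinge`, `train_multiclass_svm`) and the hyperparameters are hypothetical.

```python
import numpy as np

# Illustrative sketch only: a generic BoVW pipeline with a FIXED dictionary
# and a Crammer-Singer-style multiclass SVM. It is not the chapter's joint
# dictionary/classifier learning method.


def bovw_histogram(descriptors, dictionary):
    """Hard-assign each descriptor (one per row) to its nearest visual word
    and sum-pool the assignments into an L1-normalized histogram."""
    # Squared Euclidean distance from every descriptor to every word,
    # shape (n_descriptors, n_words).
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=dictionary.shape[0]).astype(float)
    return hist / max(hist.sum(), 1.0)


def multiclass_hinge(W, x, y):
    """Multiclass hinge loss for one example:
    max_k (delta(y, k) + w_k . x) - w_y . x, where delta = 0 if k == y else 1."""
    scores = W @ x
    delta = np.ones_like(scores)
    delta[y] = 0.0
    return float(np.max(scores + delta) - scores[y])


def train_multiclass_svm(X, y, n_classes, lam=1e-2, lr=0.1, epochs=200):
    """Full-batch subgradient descent on
    lam/2 * ||W||^2 + mean_i multiclass_hinge(W, x_i, y_i).
    All class weight vectors are learned together (no one-vs-all split)."""
    n, dim = X.shape
    W = np.zeros((n_classes, dim))
    for _ in range(epochs):
        grad = lam * W  # gradient of the L2 regularizer
        for xi, yi in zip(X, y):
            scores = W @ xi
            delta = np.ones(n_classes)
            delta[yi] = 0.0
            k = int(np.argmax(scores + delta))  # most violating class
            if k != yi:
                # Subgradient: raise the true class score, lower the competitor's.
                grad[k] += xi / n
                grad[yi] -= xi / n
        W -= lr * grad
    return W
```

Predictions for a matrix of histograms `X` are `np.argmax(X @ W.T, axis=1)`. Because all class weight vectors share one loss, each update trades the true class off against its highest-scoring competitor. A one-vs-all scheme ignores this kind of cross-class coupling because it trains each class independently. Updating the dictionary jointly, as the chapter proposes, would additionally require an assignment step that can be optimized together with `W`, which this sketch does not attempt.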

Similar content being viewed by others

Constructing Hierarchical Visual Tree for Discriminative Image Representation and Classification

Bag-of-Words Image Representation: Key Ideas and Further Insight

A Novel Method for Scene Categorization Using an Improved Visual Vocabulary Approach

Author information

Authors and Affiliations

Department of Computer Science, Pontificia Universidad Católica de Chile, Chile
Hans Lobel, Domingo Mery & Alvaro Soto
Center for Imaging Science, Johns Hopkins University, USA
René Vidal

Editor information

Editors and Affiliations

The University of Auckland, 1142, Auckland, New Zealand
Reinhard Klette
Centro de Investigación en Matemáticas A.C., 36000, Guanajuato, Mexico
Mariano Rivera
National Institute of Informatics, 101-8430, Tokyo, Japan
Shin’ichi Satoh

About this paper

Cite this paper

Lobel, H., Vidal, R., Mery, D., Soto, A. (2014). Joint Dictionary and Classifier Learning for Categorization of Images Using a Max-margin Framework. In: Klette, R., Rivera, M., Satoh, S. (eds) Image and Video Technology. PSIVT 2013. Lecture Notes in Computer Science, vol 8333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53842-1_8

DOI: https://doi.org/10.1007/978-3-642-53842-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53841-4
Online ISBN: 978-3-642-53842-1
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition
