Joint Dictionary and Classifier Learning for Categorization of Images Using a Max-margin Framework

  • Hans Lobel
  • René Vidal
  • Domingo Mery
  • Alvaro Soto
Conference paper

DOI: 10.1007/978-3-642-53842-1_8

Volume 8333 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Lobel H., Vidal R., Mery D., Soto A. (2014) Joint Dictionary and Classifier Learning for Categorization of Images Using a Max-margin Framework. In: Klette R., Rivera M., Satoh S. (eds) Image and Video Technology. PSIVT 2013. Lecture Notes in Computer Science, vol 8333. Springer, Berlin, Heidelberg

Abstract

The Bag-of-Visual-Words (BoVW) model is a popular approach for visual recognition. It has been used successfully in many different tasks, and its simplicity and good performance are the main reasons for its popularity. The central component of this model, the visual dictionary, is used to build mid-level representations from low-level image descriptors. Classifiers are then trained on these mid-level representations to perform categorization. While most works based on BoVW models have focused on learning a suitable dictionary or on proposing a suitable pooling strategy, little effort has been devoted to exploring and improving the coupling between the dictionary and the top-level classifiers in order to generate more discriminative models. This problem can be highly complex because of the large dictionary sizes these methods usually need. In addition, most BoVW-based systems perform multiclass categorization using a one-vs-all strategy, ignoring relevant correlations among classes. To tackle these issues, we propose a novel approach that jointly learns dictionary words and a proper top-level multiclass classifier. We use a max-margin learning framework to minimize a regularized energy formulation, which allows us to propagate label information to guide the usually unsupervised dictionary learning process. As a result, we produce a dictionary that is more compact and discriminative. We test our method on several popular datasets, where we demonstrate that our joint optimization strategy induces a word-sharing behavior among the target classes and achieves state-of-the-art performance using far fewer visual words than previous approaches.
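The abstract contrasts the standard BoVW pipeline (descriptors encoded against a visual dictionary, then pooled) with a multiclass max-margin classifier instead of one-vs-all. The following is a minimal pure-Python sketch of those two textbook ingredients, not the authors' joint learning algorithm. It assumes hard assignment to the nearest word with average pooling for the encoding, and uses the Crammer and Singer multiclass hinge loss as one example of a single max-margin objective over all classes. The function names `nearest_word`, `bovw_histogram` and `multiclass_hinge` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, dictionary: Sequence[Vector]) -> int:
    """Index of the visual word closest to `descriptor` (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for k, word in enumerate(dictionary):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index


def bovw_histogram(descriptors: Sequence[Vector], dictionary: Sequence[Vector]) -> List[float]:
    """Mid-level representation: hard-assign each local descriptor to a word,
    then average-pool the assignments into a normalized histogram."""
    counts = [0.0] * len(dictionary)
    for desc in descriptors:
        counts[nearest_word(desc, dictionary)] += 1.0
    n = max(len(descriptors), 1)  # avoid division by zero for an empty image
    return [c / n for c in counts]


def multiclass_hinge(weights: Sequence[Vector], x: Vector, label: int) -> float:
    """Crammer and Singer multiclass hinge loss (assumed here for illustration):
    max(0, 1 + max_{y != label} <w_y, x> - <w_label, x>).
    All classes compete in one margin constraint, unlike one-vs-all,
    which trains an independent binary classifier per class."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]
    rival = max(s for y, s in enumerate(scores) if y != label)
    return max(0.0, 1.0 + rival - scores[label])


# Usage with toy data (hypothetical 2-D descriptors and a 3-word dictionary).
dictionary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
h = bovw_histogram(descriptors, dictionary)  # [0.25, 0.5, 0.25]

weights = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]  # one row per class
print(h, multiclass_hinge(weights, h, label=1))  # loss 0.25: margin of 0.75 < 1
```

In a joint formulation such as the one the abstract describes, the dictionary words would also be adjusted to reduce a loss of this kind; hard assignment is not differentiable, so this sketch only shows the fixed-dictionary baseline.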

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Hans Lobel (1)
  • René Vidal (2)
  • Domingo Mery (1)
  • Alvaro Soto (1)
  1. Department of Computer Science, Pontificia Universidad Católica de Chile, Chile
  2. Center for Imaging Science, Johns Hopkins University, USA