International Journal of Computer Vision, Volume 99, Issue 3, pp 281–301

On Taxonomies for Multi-class Image Categorization

  • Alexander Binder
  • Klaus-Robert Müller
  • Motoaki Kawanabe
Open Access
Article

Abstract

We study the problem of classifying images into a given, predetermined taxonomy. This task can be elegantly cast in the structured learning framework. However, despite its power, structured learning scales poorly because of its high memory requirements and slow training. We propose approximating the structured learning approach by an ensemble of local support vector machines (SVMs), which can be trained efficiently with standard techniques. A first theoretical discussion and experiments on toy data shed light on why taxonomy-based classification can outperform taxonomy-free approaches and why an appropriately combined ensemble of local SVMs might be of high practical use. Further empirical results on subsets of the Caltech256 and VOC2006 data sets show that our local SVM formulation can effectively exploit the taxonomy structure: it outperforms standard multi-class classification algorithms and achieves results on par with taxonomy-based structured algorithms at a significantly reduced computing time.
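
To make the idea of combining local classifiers over a taxonomy concrete, the following is a minimal sketch under stated assumptions, not the authors' method. It assumes a toy taxonomy (the node names are hypothetical), that each non-root node n has a local binary SVM whose real-valued output f_n(x) for one image is already given, and that a leaf class is scored by summing the local outputs along its path to the root, with the highest-scoring leaf predicted. It also shows a taxonomy loss defined as the number of tree edges between two leaves, a common choice in taxonomy-based structured learning; the exact combination rule and loss used in the paper may differ.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy, stored as child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "animal": "root",
    "vehicle": "root",
    "cat": "animal",
    "dog": "animal",
    "car": "vehicle",
    "bus": "vehicle",
}


def path_to_root(node: str) -> List[str]:
    """Nodes from `node` up to, but excluding, the root."""
    path = []
    while PARENT[node] is not None:
        path.append(node)
        node = PARENT[node]
    return path


def leaves() -> List[str]:
    """Leaf classes: nodes that are never a parent."""
    internal = {p for p in PARENT.values() if p is not None}
    return [n for n in PARENT if n not in internal]


def leaf_scores(local_scores: Dict[str, float]) -> Dict[str, float]:
    """Assumed combination rule: sum local SVM outputs along each leaf's root path."""
    return {leaf: sum(local_scores[n] for n in path_to_root(leaf)) for leaf in leaves()}


def predict(local_scores: Dict[str, float]) -> str:
    """Predict the leaf class with the largest combined score."""
    scores = leaf_scores(local_scores)
    return max(scores, key=scores.get)


def taxonomy_loss(a: str, b: str) -> int:
    """Number of edges on the tree path between leaves a and b."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pa) & set(pb)
    return sum(n not in common for n in pa) + sum(n not in common for n in pb)


# Hypothetical local SVM outputs for a single image.
scores = {"animal": 0.8, "vehicle": -0.5, "cat": 0.3, "dog": -0.2, "car": 0.6, "bus": -0.9}
print(predict(scores), taxonomy_loss("cat", "dog"), taxonomy_loss("cat", "car"))
```

In this example `car` has the largest leaf-level output, but the negative `vehicle` score lowers its path total (about 0.1) below that of `cat` (about 1.1), so the taxonomy changes the decision. Under the edge-count loss, confusing `cat` with its sibling `dog` costs 2, while confusing it with `car` costs 4.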

Keywords

Multi-class object categorization · Taxonomies · Support vector machine · Structure learning

Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Alexander Binder (1, 2)
  • Klaus-Robert Müller (1)
  • Motoaki Kawanabe (1, 2)

  1. Dep. Computer Science, Machine Learning Group, Berlin Institute of Technology, Berlin, Germany
  2. Dep. Intelligent Data Analysis, Fraunhofer FIRST, Berlin, Germany
