International Journal of Computer Vision, Volume 119, Issue 1, pp 60–75

Sparse Output Coding for Scalable Visual Recognition

Abstract

Many vision tasks require a multi-class classifier that discriminates among many categories, on the order of hundreds or thousands. In this paper, we propose sparse output coding, a principled approach to large-scale multi-class classification that turns high-cardinality multi-class categorization into a bit-by-bit decoding problem. Specifically, sparse output coding consists of two steps: efficient coding matrix learning that scales to thousands of classes, and probabilistic decoding. Empirical results on object recognition and scene classification demonstrate the effectiveness of the proposed approach.
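To make "bit-by-bit decoding" concrete, here is a minimal sketch of generic soft decoding for an output code. It is an illustration under stated assumptions, not the authors' coding-matrix learning or their probabilistic decoding algorithm. It assumes a ternary coding matrix (entries in {-1, 0, +1}, where 0 means a class is ignored by that bit), that each of the L binary classifiers outputs a probability that its bit is +1, and that bits are independent. The function name `decode`, the matrix `M`, and the probabilities are all hypothetical.

```python
import numpy as np


def decode(bit_probs, code_matrix, eps=1e-12):
    """Soft-decode one example under an ASSUMED independent-bit model.

    bit_probs   : shape (L,), estimated P(bit l = +1 | x) from L binary classifiers.
    code_matrix : shape (C, L), entries in {-1, 0, +1}; a 0 means that class is
                  not involved in bit l, so that bit is skipped for it.
    Returns the index of the class with the highest average log-likelihood.
    """
    p = np.clip(np.asarray(bit_probs, dtype=float), eps, 1.0 - eps)
    M = np.asarray(code_matrix)
    pos = (M == 1).astype(float)   # bits where the codeword expects +1
    neg = (M == -1).astype(float)  # bits where the codeword expects -1
    # Sum of log P(bit) over the bits each codeword specifies.
    loglik = pos @ np.log(p) + neg @ np.log1p(-p)
    # Average over the bits each class actually uses, so that sparser
    # codewords are not favoured merely for having fewer terms.
    used = np.maximum(pos.sum(axis=1) + neg.sum(axis=1), 1.0)
    return int(np.argmax(loglik / used))


# Hypothetical 4-class, 3-bit ternary coding matrix (made up for illustration).
M = np.array([
    [ 1,  1,  0],
    [ 1, -1,  1],
    [-1,  0,  1],
    [-1, -1, -1],
])
probs = np.array([0.9, 0.2, 0.8])  # hypothetical classifier outputs
print(decode(probs, M))  # prints 1
```

Dividing by the number of non-zero entries is one simple way to compare codewords of different sparsity; other normalizations or loss-based decodings are equally reasonable choices for a sketch like this.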

Keywords

Scalable classification · Output coding · Probabilistic decoding · Object recognition · Scene recognition

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA

Personalised recommendations