Learning Compact Class Codes for Fast Inference in Large Multi Class Classification

  • M. Cissé
  • T. Artières
  • Patrick Gallinari
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7523)


We describe a new approach for classification with a very large number of classes, where we assume that some class similarity information is available, e.g. through a hierarchical organization. The proposed method learns a compact binary code from this existing similarity information defined on classes. Binary classifiers are then trained using this code, and decoding is performed using a simple nearest neighbor rule. This strategy, related to Error Correcting Output Codes methods, is shown to perform as well as or better than the standard and efficient one-vs-all approach, with much lower inference complexity.
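The inference step described above (one binary classifier per code bit, then nearest neighbor decoding over class codewords) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the class code matrix has already been learned, and it uses hypothetical linear binary classifiers (`W`, `b`) and Hamming distance for the nearest neighbor rule. The helper names `predict_bits` and `nearest_codeword` are illustrative only.

```python
import numpy as np


def predict_bits(X, W, b):
    """Evaluate one (hypothetical) linear binary classifier per code bit.

    X: (n, d) inputs; W: (L, d) weights; b: (L,) biases.
    Returns an (n, L) array of predicted bits in {0, 1}.
    """
    return (X @ W.T + b > 0).astype(np.int8)


def nearest_codeword(bits, code_matrix):
    """Decode predicted bits to the class with the closest codeword.

    bits: (n, L) in {0, 1}; code_matrix: (K, L) in {0, 1}, one row per class.
    Uses Hamming distance; ties go to the lowest class index (np.argmin).
    """
    # Broadcast to (n, K, L) and count mismatching bits per (sample, class) pair.
    dist = np.count_nonzero(bits[:, None, :] != code_matrix[None, :, :], axis=2)
    return np.argmin(dist, axis=1)


if __name__ == "__main__":
    # Toy example: K = 4 classes encoded with L = 3 bits (assumed, not learned here).
    C = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])
    W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    b = np.zeros(3)
    X = np.array([[1.0, -2.0]])
    print(nearest_codeword(predict_bits(X, W, b), C))
```

The design point this illustrates is where the savings come from: one-vs-all evaluates K classifiers per input, whereas here only L classifiers are evaluated, and decoding reduces to comparing short bit strings against the K codewords. The gain is therefore largest when L is much smaller than K.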





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • M. Cissé (1)
  • T. Artières (1)
  • Patrick Gallinari (1)
  1. Laboratoire d’Informatique de Paris 6 (LIP6), Université Pierre et Marie Curie, Paris, France
