
Binary Stochastic Representations for Large Multi-class Classification

  • Conference paper
  • Neural Information Processing (ICONIP 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10634)

Abstract

Classification with a large number of classes is a key problem in machine learning, arising in many real-world applications such as tagging images or textual documents in social networks. While one-vs-all methods usually reach top performance in this context, they suffer from a high inference complexity that is linear in the number of categories. Models based on binary codes have been proposed to overcome this limitation and achieve sublinear inference complexity, but they must decide a priori, using more or less complex heuristics, which binary code to associate with which category before learning. We propose a new end-to-end model that simultaneously learns to associate binary codes with categories and to map inputs to binary codes. This approach, called Deep Stochastic Neural Codes (DSNC), keeps the sublinear inference complexity but does not need any a priori tuning. Experimental results on different datasets show the effectiveness of the approach compared with baseline methods.


Notes

  1. In practice, a code of size \(k \log K\) is needed, with k ranging between 10 and 20 (see the sketch after these notes).

  2. The code size denotes, in this case, the number of hidden units.

  3. The one-versus-all algorithm has been tested, with results similar to the best MLP score.

  4. All the models share the same encoding complexity (forwarding the input to the hidden layer, discrete or not).
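To make the first note concrete, here is a quick back-of-the-envelope computation. The note does not state the base of the logarithm; the snippet below assumes base 2, so the figures are indicative only.

```python
import math

# Footnote 1: codes of size k * log K, with k between 10 and 20.
# Assuming the logarithm is base 2:
for K in (1_000, 10_000, 100_000):
    bits = math.ceil(math.log2(K))   # bits needed to distinguish K classes
    print(f"K={K:>7}: log2(K) ~ {bits} -> code length "
          f"between {10 * bits} and {20 * bits} bits")
```

For example, with K = 10,000 classes this gives codes of roughly 140 to 280 bits, still far shorter than the K outputs of a one-vs-all classifier.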


Acknowledgments

This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-2015-CRG4-2639.

Author information

Correspondence to Thomas Gerald.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Gerald, T., Baskiotis, N., Denoyer, L. (2017). Binary Stochastic Representations for Large Multi-class Classification. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10634. Springer, Cham. https://doi.org/10.1007/978-3-319-70087-8_17


  • DOI: https://doi.org/10.1007/978-3-319-70087-8_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70086-1

  • Online ISBN: 978-3-319-70087-8

  • eBook Packages: Computer Science (R0)
