Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2015: Machine Learning and Knowledge Discovery in Databases, pp. 102-118

Predicting Unseen Labels Using Label Hierarchies in Large-Scale Multi-label Learning

  • Jinseok Nam
  • Eneldo Loza Mencía
  • Hyunwoo J. Kim
  • Johannes Fürnkranz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9284)


An important problem in multi-label classification is to capture label patterns and the underlying structures that give rise to them. One way of learning such structures is to project both instances and labels into the same space, where an instance and its relevant labels tend to have similar representations. In this paper, we present a novel method to learn a joint space of instances and labels by leveraging a hierarchy of labels. We also present an efficient method for pretraining vector representations of labels, namely label embeddings, from large amounts of label co-occurrence patterns and hierarchical structures of labels. This approach also allows us to make predictions on labels that have not been seen during training. We empirically show that the use of pretrained label embeddings allows us to obtain higher accuracies on unseen labels even when the number of labels is quite large. Our experimental results also demonstrate qualitatively that the proposed method is able to learn regularities among labels by exploiting a label hierarchy as well as label co-occurrences.
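The joint-space idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model or training procedure: the linear map `W`, the toy label vectors, the label names, and the dot-product scoring are all assumptions chosen for illustration. The sketch only shows why a label that never occurred in the training data can still be ranked, provided it has an embedding in the same space (here, a hypothetical vector standing in for one pretrained from the label hierarchy).

```python
from typing import Dict, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Map an instance feature vector x into the joint space (assumed linear map W)."""
    return [dot(row, x) for row in W]


def rank_labels(
    W: Sequence[Sequence[float]],
    x: Sequence[float],
    label_embeddings: Dict[str, Vector],
) -> List[str]:
    """Rank all labels by similarity between the projected instance and each label vector."""
    z = project(W, x)
    scores = {label: dot(z, emb) for label, emb in label_embeddings.items()}
    return sorted(scores, key=lambda label: scores[label], reverse=True)


# Toy values, all hypothetical: 3 input features, 2-dimensional joint space.
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
label_embeddings = {
    "sports": [1.0, 0.0],
    "politics": [0.0, 1.0],
    # Assumed to be unseen during training; its vector stands in for an
    # embedding pretrained from hierarchy and co-occurrence information.
    "tennis": [0.9, 0.1],
}
x = [2.0, 0.5, 0.0]

ranking = rank_labels(W, x, label_embeddings)
print(ranking)
```

Because scoring depends only on the label vector, adding a new label requires an embedding for it but no retraining of the instance projection in this simplified setting.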







Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Jinseok Nam (1, 2, 3)
  • Eneldo Loza Mencía (2, 3)
  • Hyunwoo J. Kim (4)
  • Johannes Fürnkranz (2, 3)

  1. Knowledge Discovery in Scientific Literature, TU Darmstadt, Darmstadt, Germany
  2. Research Training Group AIPHES, TU Darmstadt, Darmstadt, Germany
  3. Knowledge Engineering Group, TU Darmstadt, Darmstadt, Germany
  4. Department of Computer Sciences, University of Wisconsin-Madison, Madison, USA
