Machine Learning, Volume 106, Issue 9–10, pp 1725–1746

Cost-sensitive label embedding for multi-label classification

  • Kuan-Hao Huang
  • Hsuan-Tien Lin
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2017 Journal Track


Abstract

Label embedding (LE) is an important family of multi-label classification algorithms that digest the label information jointly for better performance. Different real-world applications evaluate performance by different cost functions of interest. Current LE algorithms often aim to optimize one specific cost function, but they can suffer from bad performance with respect to other cost functions. In this paper, we resolve the performance issue by proposing a novel cost-sensitive LE algorithm that takes the cost function of interest into account. The proposed algorithm, cost-sensitive label embedding with multidimensional scaling (CLEMS), approximates the cost information with the distances of the embedded vectors by using the classic multidimensional scaling approach for manifold learning. CLEMS is able to deal with both symmetric and asymmetric cost functions, and effectively makes cost-sensitive decisions by nearest-neighbor decoding within the embedded vectors. We derive theoretical results that justify how CLEMS achieves the desired cost-sensitivity. Furthermore, extensive experimental results demonstrate that CLEMS is significantly better than a wide spectrum of existing LE algorithms and state-of-the-art cost-sensitive algorithms across different cost functions.


Keywords: Multi-label classification · Cost-sensitive · Label embedding
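The embedding-and-decoding idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it substitutes classical MDS (eigendecomposition of the double-centered squared-cost matrix) for the weighted SMACOF-style MDS used in CLEMS, uses a symmetric Hamming cost rather than a general asymmetric one, a plain linear regressor as the feature-to-embedding learner, and toy data; all names and numbers here are hypothetical.

```python
import numpy as np

def hamming_cost(y_a, y_b):
    # symmetric example cost between two binary label vectors
    return np.mean(y_a != y_b)

def classical_mds(D, dim):
    # classical MDS: double-center the squared dissimilarities, then
    # take the top eigenvectors scaled by sqrt of their eigenvalues
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# candidate label vectors (e.g., the distinct label sets seen in training)
Y = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]])

# embed so that Euclidean distances approximate pairwise costs
D = np.array([[hamming_cost(a, b) for b in Y] for a in Y])
Z = classical_mds(D, dim=2)  # one embedded vector per candidate label vector

# learn a map from features to the embedded space (stand-in regressor)
X = np.array([[0.1, 0.2], [0.2, 0.9], [0.9, 0.1], [0.8, 0.8]])
W, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], Z, rcond=None)

def predict(x):
    z = np.r_[x, 1.0] @ W  # regress the input into the embedded space
    # nearest-neighbor decoding: pick the label vector whose embedding
    # is closest, i.e., the one with (approximately) the smallest cost
    nn = np.argmin(np.linalg.norm(Z - z, axis=1))
    return Y[nn]

print(predict(np.array([0.15, 0.85])))
```

Because the embedding is built from the cost matrix itself, decoding to the nearest embedded vector is what makes the prediction cost-sensitive: swapping in a different cost function changes `D`, hence the geometry, hence the decisions, with no change to the learner.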



Acknowledgements

We thank the anonymous reviewers for their valuable suggestions. This material is based upon work supported by the Air Force Office of Scientific Research, Asian Office of Aerospace Research and Development (AOARD), under award number FA2386-15-1-4012, and by the Ministry of Science and Technology of Taiwan under grant number MOST 103-2221-E-002-149-MY3.



Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. CSIE Department, National Taiwan University, Taipei, Taiwan
