Progress in Artificial Intelligence, Volume 3, Issue 2, pp 107–118

Optimizing different loss functions in multilabel classifications

  • Jorge Díez
  • Oscar Luaces
  • Juan José del Coz
  • Antonio Bahamonde
Regular Paper


Multilabel classification (ML) aims to assign a set of labels to an instance. This generalization of multiclass classification requires loss functions to be redefined and makes the learning tasks harder. The objective of this paper is to gain insight into the relations between optimization aims and some of the most popular performance measures: the subset (or 0/1) loss, the Hamming loss, and the example-based F-measure. To make a fair comparison, we implemented three ML learners in a common framework, each one explicitly optimizing one of these measures. This can be done by treating a subset of labels as a structured output; we then use structured output support vector machines tailored to optimize a given loss function. The paper includes an exhaustive experimental comparison. The conclusion is that, in most cases, optimizing the Hamming loss produces the best or competitive scores. This is a practical result, since the Hamming loss can be minimized using a set of binary classifiers, one for each label separately; it is therefore a scalable and fast method for learning ML tasks. Additionally, we observe that in noise-free learning tasks optimizing the subset loss is the best option, but the differences are very small. We have also noticed that the largest room for improvement is found when the goal is to optimize an F-measure in noisy learning tasks.
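As an illustration of the three measures named above, the following is a minimal Python sketch, not the structured-output SVM learners used in the paper. The function names, the toy label matrices, and the convention that two empty label sets give F1 = 1 are all assumptions made for this example. The last helper shows why the Hamming loss decomposes over labels: averaged over a dataset, it equals the mean of the per-label error rates, which is what allows one binary classifier per label.

```python
from typing import Callable, List, Sequence

Labels = Sequence[int]  # one 0/1 relevance entry per label


def subset_loss(y_true: Labels, y_pred: Labels) -> float:
    """Subset (0/1) loss: 1 unless the whole label set is predicted exactly."""
    return 0.0 if list(y_true) == list(y_pred) else 1.0


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Hamming loss: fraction of labels whose relevance is mispredicted."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_loss(y_true: Labels, y_pred: Labels) -> float:
    """1 - F1 computed on a single example (example-based F-measure)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    denom = sum(y_true) + sum(y_pred)
    # Assumed convention: two empty label sets count as a perfect match.
    f1 = 1.0 if denom == 0 else 2.0 * tp / denom
    return 1.0 - f1


def mean_loss(loss: Callable[[Labels, Labels], float],
              Y_true: Sequence[Labels], Y_pred: Sequence[Labels]) -> float:
    """Average an example-wise loss over a dataset."""
    return sum(loss(t, p) for t, p in zip(Y_true, Y_pred)) / len(Y_true)


def per_label_error_rates(Y_true: Sequence[Labels],
                          Y_pred: Sequence[Labels]) -> List[float]:
    """Binary error rate of each label taken on its own."""
    n = len(Y_true)
    return [sum(t[j] != p[j] for t, p in zip(Y_true, Y_pred)) / n
            for j in range(len(Y_true[0]))]


# Toy data (made up for illustration): 3 examples, 4 labels.
Y_TRUE = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]]
Y_PRED = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1]]

if __name__ == "__main__":
    print("subset :", mean_loss(subset_loss, Y_TRUE, Y_PRED))
    print("Hamming:", mean_loss(hamming_loss, Y_TRUE, Y_PRED))
    print("1 - F1 :", mean_loss(f1_loss, Y_TRUE, Y_PRED))
    rates = per_label_error_rates(Y_TRUE, Y_PRED)
    print("mean per-label error:", sum(rates) / len(rates))
```

On this toy data the subset loss penalizes the first and third examples equally, although the first has a single wrong label and the third has two; the Hamming loss and the F-measure distinguish between them.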


Multilabel classification, Structured outputs, Optimization, Tensor product



The research reported here is supported in part under Grant TIN2011-23558 from the MINECO (Ministerio de Economía y Competitividad, Spain), partially supported with FEDER funds. We would also like to acknowledge all those people who generously shared the datasets and software used in this paper.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Jorge Díez (1)
  • Oscar Luaces (1)
  • Juan José del Coz (1)
  • Antonio Bahamonde (1)
  1. Artificial Intelligence Center, University of Oviedo, Gijón, Spain
