Machine Learning, Volume 88, Issue 1, pp 5-45

On label dependence and loss minimization in multi-label classification

  • Krzysztof Dembczyński, Institute of Computing Science, Poznań University of Technology
  • Willem Waegeman, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University
  • Weiwei Cheng, Department of Mathematics and Computer Science, Marburg University
  • Eyke Hüllermeier, Department of Mathematics and Computer Science, Marburg University


Most of the multi-label classification (MLC) methods proposed in recent years are intended to exploit, in one way or another, dependencies between the class labels. When such a method is compared to simple binary relevance learning as a baseline, any gain in performance is normally explained by the fact that the baseline ignores these dependencies. Without questioning the correctness of such studies, one has to admit that a blanket explanation of that kind hides many subtle details, and indeed, the underlying mechanisms and true reasons for the improvements reported in experimental studies are rarely laid bare. Rather than proposing yet another MLC algorithm, the aim of this paper is to elaborate more closely on the idea of exploiting label dependence, thereby contributing to a better understanding of MLC. Adopting a statistical perspective, we claim that two types of label dependence should be distinguished, namely conditional and marginal dependence. Subsequently, we present three scenarios in which the exploitation of one of these types of dependence may boost the predictive performance of a classifier. In this regard, a close connection with loss minimization is established, showing that the benefit of exploiting label dependence also depends on the type of loss to be minimized. Concrete theoretical results are presented for two representative loss functions, namely the Hamming loss and the subset 0/1 loss. In addition, we give an overview of state-of-the-art decomposition algorithms for MLC and try to reveal the reasons for their effectiveness. Our conclusions are supported by carefully designed experiments on synthetic and benchmark data.
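To make the connection between label dependence and the choice of loss more concrete, the following minimal Python sketch computes the Hamming loss and the subset 0/1 loss and searches, by brute force, for the prediction that minimizes the expected loss under a given conditional joint distribution over label vectors. The distribution `joint`, the helper names `expected_loss` and `risk_minimizer`, and the three-label setup are illustrative assumptions, not taken from the paper. Under these assumptions, the example illustrates that the two losses can favor different predictions for the same instance.

```python
from itertools import product


def hamming_loss(y_true, y_pred):
    """Fraction of labels that are predicted incorrectly."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def subset_zero_one_loss(y_true, y_pred):
    """1 if the predicted label vector differs in any position, else 0."""
    return int(tuple(y_true) != tuple(y_pred))


def expected_loss(loss, y_pred, joint):
    """Expected loss of y_pred under a joint distribution over label vectors."""
    return sum(prob * loss(y, y_pred) for y, prob in joint.items())


def risk_minimizer(loss, joint, n_labels):
    """Brute-force search over all 2^n_labels binary label vectors."""
    candidates = product((0, 1), repeat=n_labels)
    return min(candidates, key=lambda y_pred: expected_loss(loss, y_pred, joint))


# Hypothetical conditional distribution P(y | x) for one fixed instance x.
# The labels are dependent: the joint is not the product of its marginals.
joint = {
    (0, 0, 0): 0.4,
    (1, 1, 0): 0.3,
    (1, 0, 1): 0.3,
}

best_hamming = risk_minimizer(hamming_loss, joint, n_labels=3)
best_subset = risk_minimizer(subset_zero_one_loss, joint, n_labels=3)

print("Hamming loss minimizer:   ", best_hamming)
print("Subset 0/1 loss minimizer:", best_subset)
```

In this toy distribution the marginal probabilities of the three labels being relevant are 0.6, 0.3 and 0.3, so the Hamming loss minimizer is (1, 0, 0), obtained by thresholding each marginal separately, even though that vector has zero joint probability. The subset 0/1 loss minimizer is the most probable label vector, (0, 0, 0). The brute-force search is only practical for a handful of labels and is meant as an illustration rather than a learning algorithm.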


Keywords: Multi-label classification; Label dependence; Loss functions