Exploiting label dependencies for improved sample complexity
Multi-label classification exhibits several challenges not present in the binary case. The labels may be interdependent, so that the presence of a certain label affects the probability of other labels' presence. Exploiting dependencies among the labels could therefore benefit the classifier's predictive performance. Surprisingly, only a few existing algorithms address this issue directly by explicitly identifying dependent labels from the dataset. In this paper we propose new approaches for identifying and modeling dependencies between labels. One principal contribution of this work is a theoretical confirmation of the reduction in sample complexity that is gained from unconditional dependence. Additionally, we develop methods for identifying conditionally and unconditionally dependent label pairs; clustering them into several mutually exclusive subsets; and finally, performing multi-label classification that incorporates the discovered dependencies. We compare these two notions of label dependence (conditional and unconditional) and evaluate their performance on various benchmark and artificial datasets. We also compare and analyze the labels identified as dependent by each of the methods. Moreover, we define an ensemble framework for the new methods and compare it to existing ensemble methods. An empirical comparison of the new approaches to existing baseline and state-of-the-art methods on 12 benchmark datasets demonstrates that in many cases the proposed single-classifier and ensemble methods outperform many multi-label classification algorithms. Perhaps surprisingly, we discover that the weaker notion of unconditional dependence plays the decisive role.
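As an illustration of the unconditional notion of dependence discussed above, one common way to test whether two labels are marginally (unconditionally) dependent is a chi-square independence test on their 2×2 co-occurrence table. The sketch below is not the authors' exact procedure; the function name, the significance level, and the use of a chi-square test are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def unconditionally_dependent_pairs(Y, critical=6.635):
    """Return label pairs (i, j) whose 2x2 co-occurrence table rejects
    independence under a chi-square test.

    Y        : (n_samples, n_labels) binary label matrix.
    critical : chi-square critical value, 1 degree of freedom;
               6.635 corresponds to significance level alpha = 0.01.
    """
    n, q = Y.shape
    pairs = []
    for i, j in combinations(range(q), 2):
        # Observed 2x2 contingency table of labels i and j.
        a = int(np.sum((Y[:, i] == 1) & (Y[:, j] == 1)))
        b = int(np.sum((Y[:, i] == 1) & (Y[:, j] == 0)))
        c = int(np.sum((Y[:, i] == 0) & (Y[:, j] == 1)))
        d = int(np.sum((Y[:, i] == 0) & (Y[:, j] == 0)))
        row = np.array([a + b, c + d], dtype=float)
        col = np.array([a + c, b + d], dtype=float)
        if row.min() == 0 or col.min() == 0:
            continue  # one label is constant; the test is undefined
        # Expected counts under the independence hypothesis.
        expected = np.outer(row, col) / n
        observed = np.array([[a, b], [c, d]], dtype=float)
        stat = np.sum((observed - expected) ** 2 / expected)
        if stat > critical:
            pairs.append((i, j))
    return pairs
```

Pairs flagged this way could then be grouped into mutually exclusive subsets (e.g., by a clustering step over the pairwise dependence scores), mirroring the pipeline the abstract describes.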
Volume 91, Issue 1, pp. 1–42
- Springer US
- Multi-label classification
- Conditional and unconditional label dependence
- Generalization bounds
- Multi-label evaluation measures
- Ensemble learning algorithms
- Ensemble models diversity
- Empirical experiment
- Artificial datasets
- Author Affiliations
- 1. Department of Information Systems Engineering and Telekom Innovation Laboratories, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
- 2. IBM Research, Haifa, Israel
- 3. Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel