Abstract
Multi-label classification is a branch of machine learning that effectively reflects many real-world problems. Among multi-label classification methods, stacked binary relevance (2BR) is a classic approach, and a series of optimized algorithms have been derived from it. Although these algorithms adopt complex optimization processes and show remarkable performance, their core ideas are rather similar, mainly involving restructuring the feature spaces of the meta-classifiers. Existing research has rarely discussed the negative impact that inappropriate two-level predictions have on 2BR structures. In this study, we propose a 2BR-based label correction method named LabCor, which focuses on identifying and correcting unreliable two-level predictions. We first analyze the mechanism by which 2BR-based algorithms obtain their two-level outputs and identify a marker that reflects the reliability of sample predictions. Based on this mechanism, we then introduce a graph-based output determination method that uses the training samples to generate dimensional decision patterns. The global label count distribution is also used to reflect the goal of the classification problem. In the prediction phase, LabCor uses the decision patterns and label count constraints to identify and correct misclassified labels. According to the evaluation results, the proposed method effectively reduces the impact of problematic two-level predictions and yields superior or competitive performance versus well-established 2BR-based algorithms.
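For readers unfamiliar with the stacked binary relevance (2BR) structure that LabCor builds on, the following is a minimal sketch: one binary classifier per label at the first level, then one meta-classifier per label trained on the original features augmented with the first-level outputs. The `Centroid` base learner and all helper names here are illustrative stand-ins, not part of LabCor itself.

```python
class Centroid:
    """Trivial binary classifier: predicts by the nearer class centroid.

    A stand-in for any real base learner (SVM, decision tree, etc.).
    """
    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.c1 = [sum(col) / len(pos) for col in zip(*pos)] if pos else None
        self.c0 = [sum(col) / len(neg) for col in zip(*neg)] if neg else None
        return self

    def predict_one(self, x):
        if self.c1 is None:
            return 0
        if self.c0 is None:
            return 1
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c1))
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c0))
        return 1 if d1 <= d0 else 0


def train_2br(X, Y):
    """X: list of feature vectors; Y: list of 0/1 label vectors."""
    n_labels = len(Y[0])
    # Level 1: one binary classifier per label on the original features.
    level1 = [Centroid().fit(X, [row[j] for row in Y]) for j in range(n_labels)]
    # Level 2: meta-features are the original features plus all
    # first-level label predictions, letting each meta-classifier
    # exploit correlations between labels.
    meta = [x + [clf.predict_one(x) for clf in level1] for x in X]
    level2 = [Centroid().fit(meta, [row[j] for row in Y]) for j in range(n_labels)]
    return level1, level2


def predict_2br(model, x):
    level1, level2 = model
    z = x + [clf.predict_one(x) for clf in level1]
    return [clf.predict_one(z) for clf in level2]
```

LabCor's contribution, per the abstract, is a further step applied on top of this structure: identifying which of the second-level outputs are unreliable and correcting them, rather than accepting them all as final.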
Code Availability
The source code used in this work is available on GitHub (https://github.com/KingsWoo/LabCor).
Acknowledgements
This work was supported by the National Key Research and Development Program of China (No. 2018YFC0116901), the National Natural Science Foundation of China (No. 81771936), the Major Scientific Project of Zhejiang Lab, China (No. 2020ND8AD01), the Fundamental Research Funds for the Central Universities (No. 2021FZZX002-18) and the Youth Innovation Team Project of the College of Biomedical Engineering & Instrument Science, Zhejiang University.
Funding
This work was supported by the National Key Research and Development Program of China (No. 2018YFC0116901) and the National Natural Science Foundation of China (No. 81771936), which provided financial support for the design of the study, the analysis of data, and the writing of the manuscript; and by the Major Scientific Project of Zhejiang Lab (No. 2018DG0ZX01), the Fundamental Research Funds for the Central Universities (No. 2021FZZX002-18), and the Youth Innovation Team Project of the College of Biomedical Engineering & Instrument Science, Zhejiang University, which provided financial support for the analysis.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Availability of data and material
Datasets used in this work are available on the Mulan (http://mulan.sourceforge.net/datasets-mlc.html) and Meka (http://waikato.github.io/meka/datasets/) websites.
The authors Chengkai Wu and Tianshu Zhou contributed equally to this article.
Cite this article
Wu, C., Zhou, T., Wu, J. et al. LabCor: Multi-label classification using a label correction strategy. Appl Intell 52, 5414–5434 (2022). https://doi.org/10.1007/s10489-021-02674-y