Biclustering-based multi-label classification

Schmitke, Luiz Rafael; Paraiso, Emerson Cabrera; Nievola, Julio Cesar

doi:10.1007/s10115-024-02109-3


Regular Paper
Published: 23 April 2024


Knowledge and Information Systems



Abstract

In multi-label classification, an instance can have multiple labels simultaneously. The two approaches to this problem are transforming the multi-label data or adapting single-label algorithms to multi-label data. Although problem transformation is effective, some algorithms use fixed parameters to determine the number of subproblems, and label relationships are maintained without using correlation or co-occurrence measures. In this work, the approach that converts a multi-label problem into multiple binary subproblems was chosen because it offers a low execution time, which enables the use of complex single-label algorithms during classification. However, it performs poorly on multi-label metrics. Thus, the BicbPT algorithm is introduced, which combines biclustering with the multi-label to binary problem transformation to improve performance on multi-label metrics without increasing the running time of this transformation. For the evaluation, BicbPT was compared with the algorithms BR, CC, ECC, RAkEL and LP. The single-label algorithms SVM, C4.5 and Naive Bayes were applied to classify the binary subproblems across 12 datasets. The experiments demonstrate that BicbPT performed better on the multi-label metrics than the other multi-label to binary algorithms, with only ECC achieving similar results. However, the running time of ECC is up to 10 times higher, which makes BicbPT the better choice. BicbPT also keeps its running time similar to that of the other algorithms in the multi-label to binary category. Finally, the experiments showed that the way the labels influence each other can be used to improve multi-label classification, rather than only maintaining label relationships as other approaches do.
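To make the multi-label to binary transformation mentioned in the abstract concrete, the following is a minimal sketch of the plain one-subproblem-per-label split, in the spirit of the BR baseline. It is not the BicbPT algorithm, whose biclustering step is not described in this abstract. The function name `to_binary_subproblems` and the toy data are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple


def to_binary_subproblems(
    features: Sequence[Sequence[float]],
    labelsets: Sequence[Sequence[int]],
) -> List[Tuple[List[List[float]], List[int]]]:
    """Split a multi-label dataset into one binary dataset per label.

    Hypothetical helper: each subproblem keeps all features and uses a
    single column of the label matrix as its 0/1 target.
    """
    if len(features) != len(labelsets):
        raise ValueError("features and labelsets must have the same length")
    n_labels = len(labelsets[0]) if labelsets else 0
    subproblems = []
    for j in range(n_labels):
        # Column j of the label matrix becomes the binary target.
        target = [int(row[j]) for row in labelsets]
        subproblems.append(([list(x) for x in features], target))
    return subproblems


# Toy data, invented for illustration: 4 instances, 2 features, 3 labels.
X = [[0.1, 1.0], [0.4, 0.2], [0.9, 0.7], [0.3, 0.3]]
Y = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]

subproblems = to_binary_subproblems(X, Y)
targets = [target for _, target in subproblems]
```

Each resulting pair can be passed to any single-label binary classifier. Because the subproblems are built independently, this plain split ignores how labels relate to one another, which is the limitation the abstract attributes to this category of transformation.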



Author information

E. C. Paraiso, J. C. Nievola: These authors contributed equally to this work.

Authors and Affiliations

Graduate Program in Informatics, Pontifícia Universidade Católica do Paraná, Rua Imaculada Conceição, 1155, Curitiba, Paraná, 80215-901, Brazil
Luiz Rafael Schmitke, Emerson Cabrera Paraiso & Julio Cesar Nievola


Contributions

All authors made substantial contributions to the conception of the work. S.L.R. wrote the initial manuscript. P.E.C. and N.J.C. contributed to the revisions. All authors approved the final version submitted for publication.

Corresponding author

Correspondence to Luiz Rafael Schmitke.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Schmitke, L.R., Paraiso, E.C. & Nievola, J.C. Biclustering-based multi-label classification. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02109-3


Received: 01 September 2023
Revised: 06 March 2024
Accepted: 21 March 2024
Published: 23 April 2024
DOI: https://doi.org/10.1007/s10115-024-02109-3
