Assessing the Multi-labelness of Multi-label Data

Park, Laurence A. F.; Guo, Yi; Read, Jesse

doi:10.1007/978-3-030-46147-8_10

Laurence A. F. Park¹⁴,
Yi Guo¹⁴ &
Jesse Read¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11907))

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

1369 Accesses
1 Citations

Abstract

Before constructing a classifier, we should examine the data to gain an understanding of the relationships between the variables, to assist with the design of the classifier. Using multi-label data requires us to examine the association between labels: its multi-labelness. We cannot directly measure association between two labels, since the labels’ relationships are confounded with the set of observation variables. A better approach is to fit an analytical model to a label with respect to the observations and remaining labels, but this might present false relationships due to the problem of multicollinearity between the observations and labels. In this article, we examine the utility of regularised logistic regression and a new form of split logistic regression for assessing the multi-labelness of data. We find that a split analytical model using regularisation is able to provide fewer label relationships when no relationships exist, or if the labels can be partitioned. We also find that if label relationships do exist, logistic regression with \(l_1\) regularisation provides the better measurement of multi-labelness.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Section 6 of https://cran.r-project.org/web/packages/penalized/vignettes/ penalized.pdf.
2.
All available from http://mulan.sourceforge.net/datasets-mlc.html, https://sourceforge.net/projects/meka/files/Datasets/ (Slashdot), and http://cecas.clemson.edu/~ahoover/stare/ (Stare).

References

Guo, Y., Gu, S.: Multi-label classification using conditional dependency networks. In: IJCAI Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, p. 1300 (2011)
Google Scholar
Osojnik, A., Panov, P., Džeroski, S.: Multi-label classification via multi-target regression on data streams. Mach. Learn. 106(6), 745–770 (2016). https://doi.org/10.1007/s10994-016-5613-5
Article MathSciNet MATH Google Scholar
Park, L.A.F., Read, J.: A blended metric for multi-label optimisation and evaluation. In: Berlingerio, M., Bonchi, F., Gärtner, T., Hurley, N., Ifrim, G. (eds.) ECML PKDD 2018. LNCS (LNAI), vol. 11051, pp. 719–734. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-10925-7_44
Chapter Google Scholar
Park, L.A.F., Simoff, S.: Using entropy as a measure of acceptance for multi-label classification. In: Fromont, E., De Bie, T., van Leeuwen, M. (eds.) IDA 2015. LNCS, vol. 9385, pp. 217–228. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24465-5_19
Chapter Google Scholar
Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Mach. Learn. 85(3), 333 (2011). https://doi.org/10.1007/s10994-011-5256-5
Article MathSciNet Google Scholar
Sucar, L.E., Bielza, C., Morales, E.F., Hernandez-Leal, P., Zaragoza, J.H., Larrañaga, P.: Multi-label classification with Bayesian network-based chain classifiers. Pattern Recogn. Lett. 41, 14–22 (2014)
Article Google Scholar
Zhang, M.L., Zhou, Z.H.: A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng. 26(8), 1819–1837 (2014)
Article Google Scholar
Zhang, M.-L., Zhang, K.: Multi-label learning by exploiting label dependency. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 999–1008. ACM (2010)
Google Scholar
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 67(2), 301–320 (2005)
Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

Centre for Research in Mathematics, School of Computing, Engineering and Mathematics, Western Sydney University, Sydney, Australia
Laurence A. F. Park & Yi Guo
DaSciM team, LIX Laboratory, École Polytechnique, 91120, Palaiseau, France
Jesse Read

Authors

Laurence A. F. Park
View author publications
You can also search for this author in PubMed Google Scholar
Yi Guo
View author publications
You can also search for this author in PubMed Google Scholar
Jesse Read
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Laurence A. F. Park .

Editor information

Editors and Affiliations

Leuphana University, Lüneburg, Germany
Ulf Brefeld
IRISA/Inria, Rennes, France
Elisa Fromont
University of Würzburg, Würzburg, Germany
Andreas Hotho
Leiden University, Leiden, The Netherlands
Arno Knobbe
ETH Zurich, Zurich, Switzerland
Marloes Maathuis
Institut National des Sciences Appliquées, Villeurbanne, France
Céline Robardet

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Park, L.A.F., Guo, Y., Read, J. (2020). Assessing the Multi-labelness of Multi-label Data. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science(), vol 11907. Springer, Cham. https://doi.org/10.1007/978-3-030-46147-8_10

Download citation

DOI: https://doi.org/10.1007/978-3-030-46147-8_10
Published: 30 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46146-1
Online ISBN: 978-3-030-46147-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

the ECML PKDD community (opens in a new tab)