Case Studies and Metrics

Herrera, Francisco; Charte, Francisco; Rivera, Antonio J.; del Jesus, María J.

doi:10.1007/978-3-319-41111-8_3

Abstract

Multilabel classification (MLC) techniques have been applied in many real-world situations over the last two decades. Each application represents a different case study for MLC, using one or more multilabel datasets (MLDs). After the general overview provided in Sect. 3.1, this chapter briefly describes in Sect. 3.2 the most common case studies found in the literature. This yields a full list of available MLDs, and the usual characterization metrics are explained and applied to them in Sect. 3.3. Then, a practical use case is detailed in Sect. 3.4, running a simple MLC algorithm over a few MLDs. Lastly, the usual performance evaluation metrics for MLC are introduced in Sect. 3.5 and used to analyze the results obtained from this experiment.

Notes

1. All datasets are available at RUMDR (R Ultimate Multilabel Dataset Repository) [10], from which they can be downloaded and exported to several file formats.
2. The differences among the main file formats, all of them derived from the ARFF format used by WEKA, and how to use each of them, will be detailed in Chap. 9.
3. http://www.bibsonomy.org.
4. https://delicious.com/.
5. Additional information about how these MLDs were produced, including the software to do so, can be found at http://www.ke.tu-darmstadt.de/resources/eurlex.
6. http://imdb.org.
7. http://languagelog.ldc.upenn.edu/nll/.
8. http://www.cdc.gov/nchs/icd/icd9cm.htm.
9. https://www.nlm.nih.gov/mesh/indman/chapter_23.html.
10. http://slashdot.org.
11. http://stackexchange.com/.
12. http://web.eecs.utk.edu/events/tmw07/.
13. http://web.archive.org/web/19970517033654/http://www9.yahoo.com/.
14. https://archive.ics.uci.edu/ml/datasets/Flags.
15. http://www-nlpir.nist.gov/projects/trecvid/.
16. https://www.flickr.com/.
17. http://prosite.expasy.org/prosite.html.
18. http://www.ncbi.nlm.nih.gov/pubmed/15608217.
19. http://sites.labic.icmc.usp.br/mldatagen/.
20. http://cse.seu.edu.cn/people/zhangml/Resources.htm#codes.
21. The values of metrics such as HammingLoss, OneError, and RankingLoss have been complemented, i.e., replaced by their difference with respect to 1, to preserve the principle of assigning a larger area to better values.
22. It must be taken into account that ML-kNN does not generate a real ranking of labels as its prediction, but a binary partition. The ranking is generated from the posterior probabilities calculated for each label. With so few labels in emotions, many ties among these probabilities are possible, so some positions in the ranking may be determined at random.
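
Notes 21 and 22 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the label names, and the probability values are hypothetical, and the only assumptions are that loss-type metrics lie in [0, 1] and that a ranking is obtained by sorting labels by posterior probability.

```python
from collections import Counter


def complement(value):
    """Return 1 - value so that a larger number means a better result.

    Intended for loss-type metrics bounded in [0, 1], such as
    HammingLoss, OneError, or RankingLoss.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("expected a metric value in [0, 1]")
    return 1.0 - value


def rank_labels(posteriors):
    """Rank labels by descending posterior probability.

    Returns (ranking, tied). `tied` lists the groups of labels that share
    exactly the same probability; their relative order in `ranking` is not
    meaningful. Here the sort is stable, so tied labels keep their input
    order, whereas another implementation might shuffle them.
    """
    ranking = sorted(posteriors, key=posteriors.get, reverse=True)
    counts = Counter(posteriors.values())
    tied = [
        sorted(label for label, p in posteriors.items() if p == value)
        for value, count in counts.items()
        if count > 1
    ]
    return ranking, tied


# Hypothetical posterior probabilities for one instance with four labels.
example = {"label_a": 0.8, "label_b": 0.5, "label_c": 0.5, "label_d": 0.1}
ranking, tied = rank_labels(example)
print(ranking)           # label_b and label_c are interchangeable
print(tied)              # groups of labels with identical probability
print(complement(0.25))  # a loss of 0.25 becomes a score of 0.75
```

With few labels, groups such as the one reported in `tied` become more likely, which is why ranking-based metrics for such a classifier should be read with some caution.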

References

Aha, D.W. (ed.): Lazy Learning. Springer (1997)
Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6(1), 37–66 (1991)
Alcala-Fdez, J., Fernández, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F.: KEEL multi-label dataset repository. http://sci2s.ugr.es/keel/multilabel.php
Atkinson, A.B.: On the measurement of inequality. J. Econ. Theory 2(3), 244–263 (1970)
Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D.M., Jordan, M.I.: Matching words and pictures. J. Mach. Learn. Res. 3, 1107–1135 (2003)
Boutell, M., Luo, J., Shen, X., Brown, C.: Learning multi-label scene classification. Pattern Recogn. 37(9), 1757–1771 (2004)
Briggs, F., Lakshminarayanan, B., Neal, L., Fern, X.Z., Raich, R., Hadley, S.J.K., Hadley, A.S., Betts, M.G.: Acoustic classification of multiple simultaneous bird species: a multi-instance multi-label approach. J. Acoust. Soc. Am. 131(6), 4640–4650 (2012)
Cesa-Bianchi, N., Gentile, C., Zaniboni, L.: Incremental algorithms for hierarchical classification. J. Mach. Learn. Res. 7, 31–54 (2006)
Chang, C.C., Lin, C.J.: LIBSVM data: multi-label classification repository. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html
Charte, F., Charte, D., Rivera, A.J., del Jesus, M.J., Herrera, F.: R Ultimate multilabel dataset repository. In: Proceedings of 11th International Conference on Hybrid Artificial Intelligent Systems, HAIS’16, vol. 9648, pp. 487–499. Springer (2016)
Charte, F., Rivera, A., del Jesus, M.J., Herrera, F.: LI-MLC: a label inference methodology for addressing high dimensionality in the label space for multilabel classification. IEEE Trans. Neural Netw. Learn. Syst. 25(10), 1842–1854 (2014)
Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: Multilabel classification. Problem analysis, metrics and techniques book repository. https://github.com/fcharte/SM-MLC
Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: Concurrence among imbalanced labels and its influence on multilabel resampling algorithms. In: Proceedings of 9th International Conference on Hybrid Artificial Intelligent Systems, HAIS’14, vol. 8480. Springer (2014)
Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing 163, 3–16 (2015)
Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: QUINTA: a question tagging assistant to improve the answering ratio in electronic forums. In: Proceedings of IEEE International Conference on Computer as a Tool, EUROCON’15, pp. 1–6. IEEE (2015)
Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: On the impact of dataset complexity and sampling strategy in multilabel classifiers performance. In: Proceedings of 11th International Conference on Hybrid Artificial Intelligent Systems, HAIS’16, vol. 9648, pp. 500–511. Springer (2016)
Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y.: NUS-WIDE: a real-world web image database from National University of Singapore. In: Proceedings of 8th ACM international Conference on Image and Video Retrieval, CIVR’09, pp. 48:1–48:9. ACM (2009)
Crammer, K., Dredze, M., Ganchev, K., Talukdar, P.P., Carroll, S.: Automatic code assignment to medical text. In: Proceedings of Workshop on Biological, Translational, and Clinical Language Processing, BioNLP’07, pp. 129–136. Association for Computational Linguistics (2007)
Diplaris, S., Tsoumakas, G., Mitkas, P., Vlahavas, I.: Protein classification with multiple algorithms. In: Proceedings of 10th Panhellenic Conference on Informatics, PCI’05, vol. 3746, pp. 448–456. Springer (2005)
Duygulu, P., Barnard, K., de Freitas, J., Forsyth, D.: Object recognition as machine translation: learning a Lexicon for a fixed image vocabulary. In: Proceedings of 7th European Conference on Computer Vision, ECCV’02, vol. 2353, pp. 97–112. Springer (2002)
Elisseeff, A., Weston, J.: A kernel method for multi-labelled classification. In: Advances in Neural Information Processing Systems, vol. 14, pp. 681–687. MIT Press (2001)
Ghamrawi, N., McCallum, A.: Collective multi-label classification. In: Proceedings of 14th ACM International Conference on Information and Knowledge Management, CIKM’05, pp. 195–200. ACM (2005)
Godbole, S., Sarawagi, S.: Discriminative methods for multi-labeled classification. Adv. Knowl. Discov. Data Min. 3056, 22–30 (2004)
Gonçalves, E.C., Plastino, A., Freitas, A.A.: A genetic algorithm for optimizing the label ordering in multi-label classifier chains. In: Proceedings of 25th IEEE International Conference on Tools with Artificial Intelligence, ICTAI’13, pp. 469–476. IEEE (2013)
Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Proceedings of 10th European Conference on Machine Learning, ECML’98, pp. 137–142. Springer (1998)
Katakis, I., Tsoumakas, G., Vlahavas, I.: Multilabel text classification for automated tag suggestion. In: Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD’08, pp. 75–83 (2008)
Klimt, B., Yang, Y.: The Enron corpus: a new dataset for email classification research. In: Proceedings of 15th European Conference on Machine Learning, ECML’04, pp. 217–226. Springer (2004)
Lang, K.: Newsweeder: learning to filter netnews. In: Proceedings of 12th International Conference on Machine Learning, ML’95, pp. 331–339 (1995)
Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)
Mencia, E.L., Fürnkranz, J.: Efficient pairwise multilabel classification for large-scale problems in the legal domain. In: Proceedings of 11th European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD’08, pp. 50–65. Springer (2008)
Read, J.: Scalable multi-label classification. Ph.D. thesis, University of Waikato (2010)
Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Mach. Learn. 85, 333–359 (2011)
Read, J., Reutemann, P.: MEKA multi-label dataset repository. http://sourceforge.net/projects/meka/files/Datasets/
Schapire, R.E., Singer, Y.: BoosTexter: a boosting-based system for text categorization. Mach. Learn. 39(2–3), 135–168 (2000)
Snoek, C.G.M., Worring, M., van Gemert, J.C., Geusebroek, J.M., Smeulders, A.W.M.: The challenge problem for automated detection of 101 semantic concepts in multimedia. In: Proceedings of 14th ACM International Conference on Multimedia, MULTIMEDIA’06, pp. 421–430 (2006)
Spyromitros-Xioufis, E., Papadopoulos, S., Kompatsiaris, I.Y., Tsoumakas, G., Vlahavas, I.: A comprehensive study over VLAD and product quantization in large-scale image retrieval. IEEE Trans. Multimedia 16(6), 1713–1728 (2014)
Srivastava, A.N., Zane-Ulman, B.: Discovering recurring anomalies in text reports regarding complex space systems. In: Aerospace Conference, pp. 3853–3862. IEEE (2005)
Tomás, J.T., Spolaôr, N., Cherman, E.A., Monard, M.C.: A framework to generate synthetic multi-label datasets. Electron. Notes Theoret. Comput. Sci. 302, 155–176 (2014)
Tsoumakas, G., Katakis, I.: Multi-label classification: An overview. Int. J. Data Warehouse. Min. 3(3), 1–13 (2007)
Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective and efficient multilabel classification in domains with large number of labels. In: Proceedings of ECML/PKDD Workshop on Mining Multidimensional Data, MMD’08, pp. 30–44 (2008)
Tsoumakas, G., Katakis, I., Vlahavas, I.: Mining multi-label data. In: Data Mining and Knowledge Discovery Handbook, pp. 667–685. Springer (2010)
Tsoumakas, G., Vlahavas, I.: Random k-Labelsets: an ensemble method for multilabel classification. In: Proceedings of 18th European Conference on Machine Learning, ECML’07, vol. 4701, pp. 406–417. Springer (2007)
Tsoumakas, G., Xioufis, E.S., Vilcek, J., Vlahavas, I.: MULAN multi-label dataset repository. http://mulan.sourceforge.net/datasets.html
Turnbull, D., Barrington, L., Torres, D., Lanckriet, G.: Semantic annotation and retrieval of music and sound effects. IEEE Trans. Audio Speech Lang. Process. 16(2), 467–476 (2008)
Turner, M.D., Chakrabarti, C., Jones, T.B., Xu, J.F., Fox, P.T., Luger, G.F., Laird, A.R., Turner, J.A.: Automated annotation of functional imaging experiments via multi-label classification. Front. Neurosci. 7 (2013)
Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 10(5), 293–302 (2002)
Ueda, N., Saito, K.: Parametric mixture models for multi-labeled text. In: Proceedings of 15th Annual Conference on Neural Information Processing Systems, NIPS’02, pp. 721–728 (2002)
Wieczorkowska, A., Synak, P., Raś, Z.: Multi-label classification of emotions in music. In: Intelligent Information Processing and Web Mining, AISC, vol. 35, chap. 30, pp. 307–315 (2006)
Zhang, M., Zhou, Z.: ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn. 40(7), 2038–2048 (2007)

Author information

Authors and Affiliations

University of Granada, Granada, Spain
Francisco Herrera & Francisco Charte
University of Jaén, Jaén, Spain
Antonio J. Rivera & María J. del Jesus

Corresponding author

Correspondence to Francisco Herrera.

About this chapter

Cite this chapter

Herrera, F., Charte, F., Rivera, A.J., del Jesus, M.J. (2016). Case Studies and Metrics. In: Multilabel Classification. Springer, Cham. https://doi.org/10.1007/978-3-319-41111-8_3

DOI: https://doi.org/10.1007/978-3-319-41111-8_3
Published: 10 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41110-1
Online ISBN: 978-3-319-41111-8
eBook Packages: Computer Science, Computer Science (R0)
