Case Studies and Metrics

Multilabel Classification

Abstract

Multilabel classification (MLC) techniques have been applied to many real-world problems over the last two decades. Each application is a distinct case study for MLC, relying on one or more multilabel datasets (MLDs). After the general overview in Sect. 3.1, Sect. 3.2 briefly describes the most common case studies found in the literature, from which a full list of available MLDs is obtained. The usual characterization metrics are explained and applied to these MLDs in Sect. 3.3. A practical use case is then detailed in Sect. 3.4, running a simple MLC algorithm over a few MLDs. Lastly, Sect. 3.5 introduces the usual performance evaluation metrics for MLC and uses them to analyze the results of this experiment.

Notes

  1. All datasets are available at RUMDR (R Ultimate Multilabel Dataset Repository) [10], from which they can be downloaded and exported to several file formats.

  2. The differences among the main file formats, all of them derived from the ARFF format used by WEKA, and how to use each of them, will be detailed in Chap. 9.

  3. http://www.bibsonomy.org.

  4. https://delicious.com/.

  5. Additional information about how these MLDs were produced, including the software to do so, can be found at http://www.ke.tu-darmstadt.de/resources/eurlex.

  6. http://imdb.org.

  7. http://languagelog.ldc.upenn.edu/nll/.

  8. http://www.cdc.gov/nchs/icd/icd9cm.htm.

  9. https://www.nlm.nih.gov/mesh/indman/chapter_23.html.

  10. http://slashdot.org.

  11. http://stackexchange.com/.

  12. http://web.eecs.utk.edu/events/tmw07/.

  13. http://web.archive.org/web/19970517033654/http://www9.yahoo.com/.

  14. https://archive.ics.uci.edu/ml/datasets/Flags.

  15. http://www-nlpir.nist.gov/projects/trecvid/.

  16. https://www.flickr.com/.

  17. http://prosite.expasy.org/prosite.html.

  18. http://www.ncbi.nlm.nih.gov/pubmed/15608217.

  19. http://sites.labic.icmc.usp.br/mldatagen/.

  20. http://cse.seu.edu.cn/people/zhangml/Resources.htm#codes.

  21. The values of loss metrics such as HammingLoss, OneError, and RankingLoss are shown as their complement (1 minus the value), so that a larger area always corresponds to a better value.

  22. Note that ML-kNN does not produce a true ranking of labels as its prediction, but a binary partition. The ranking is derived from the posterior probabilities computed for each label. With so few labels in emotions, many of these probabilities may be tied, so some positions in the ranking may be determined at random.
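
The last two notes can be illustrated with a minimal Python sketch. It is not the chapter's own code: the function names (`hamming_loss`, `complement`, `rank_labels`) and the toy values are hypothetical, chosen only for illustration. The sketch assumes the standard definition of Hamming loss (fraction of mispredicted instance-label pairs), complements a loss so that larger means better, and derives a label ranking from per-label posterior probabilities, where tied labels may end up in a random order.

```python
import random


def hamming_loss(y_true, y_pred):
    """Fraction of (instance, label) pairs that are predicted incorrectly."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def complement(loss):
    """Map a loss in [0, 1] (lower is better) to 1 - loss (higher is better)."""
    return 1.0 - loss


def rank_labels(posteriors, rng=None):
    """Return label indices ordered by descending posterior probability.

    Ties are broken by label index when rng is None, and randomly when a
    random.Random instance is supplied (an assumption made for this sketch).
    """
    indices = list(range(len(posteriors)))
    if rng is None:
        tiebreak = indices
    else:
        tiebreak = [rng.random() for _ in indices]
    return sorted(indices, key=lambda i: (-posteriors[i], tiebreak[i]))


# Toy data (hypothetical): 2 instances, 3 labels, 2 of 6 pairs mispredicted.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
loss = hamming_loss(y_true, y_pred)
score = complement(loss)

# Labels 0 and 2 share the same posterior, so their relative order is arbitrary.
posteriors = [0.7, 0.2, 0.7]
deterministic = rank_labels(posteriors)
randomized = rank_labels(posteriors, rng=random.Random(0))
```

With very few labels, as in emotions, tied posteriors are more likely, so ranking-based metrics computed from such an ordering can vary from run to run when ties are broken at random.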

References

  1. Aha, D.W. (ed.): Lazy Learning. Springer (1997)

  2. Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6(1), 37–66 (1991)

  3. Alcala-Fdez, J., Fernández, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F.: KEEL multi-label dataset repository. http://sci2s.ugr.es/keel/multilabel.php

  4. Atkinson, A.B.: On the measurement of inequality. J. Econ. Theory 2(3), 244–263 (1970)

  5. Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D.M., Jordan, M.I.: Matching words and pictures. J. Mach. Learn. Res. 3, 1107–1135 (2003)

  6. Boutell, M., Luo, J., Shen, X., Brown, C.: Learning multi-label scene classification. Pattern Recogn. 37(9), 1757–1771 (2004)

  7. Briggs, F., Lakshminarayanan, B., Neal, L., Fern, X.Z., Raich, R., Hadley, S.J.K., Hadley, A.S., Betts, M.G.: Acoustic classification of multiple simultaneous bird species: a multi-instance multi-label approach. J. Acoust. Soc. Am. 131(6), 4640–4650 (2012)

  8. Cesa-Bianchi, N., Gentile, C., Zaniboni, L.: Incremental algorithms for hierarchical classification. J. Mach. Learn. Res. 7, 31–54 (2006)

  9. Chang, C.C., Lin, C.J.: LIBSVM data: multi-label classification repository. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html

  10. Charte, F., Charte, D., Rivera, A.J., del Jesus, M.J., Herrera, F.: R Ultimate multilabel dataset repository. In: Proceedings of 11th International Conference on Hybrid Artificial Intelligent Systems, HAIS’16, vol. 9648, pp. 487–499. Springer (2016)

  11. Charte, F., Rivera, A., del Jesus, M.J., Herrera, F.: LI-MLC: a label inference methodology for addressing high dimensionality in the label space for multilabel classification. IEEE Trans. Neural Netw. Learn. Syst. 25(10), 1842–1854 (2014)

  12. Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: Multilabel classification. Problem analysis, metrics and techniques book repository. https://github.com/fcharte/SM-MLC

  13. Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: Concurrence among Imbalanced labels and its influence on multilabel resampling algorithms. In: Proceedings of 9th International Conference on Hybrid Artificial Intelligent Systems, HAIS’14, vol. 8480. Springer (2014)

  14. Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing 163, 3–16 (2015)

  15. Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: QUINTA: a question tagging assistant to improve the answering ratio in electronic forums. In: Proceedings of IEEE International Conference on Computer as a Tool, EUROCON’15, pp. 1–6. IEEE (2015)

  16. Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: On the impact of dataset complexity and sampling strategy in multilabel classifiers performance. In: Proceedings of 11th International Conference on Hybrid Artificial Intelligent Systems, HAIS’16, vol. 9648, pp. 500–511. Springer (2016)

  17. Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y.: NUS-WIDE: a real-world web image database from National University of Singapore. In: Proceedings of 8th ACM international Conference on Image and Video Retrieval, CIVR’09, pp. 48:1–48:9. ACM (2009)

  18. Crammer, K., Dredze, M., Ganchev, K., Talukdar, P.P., Carroll, S.: Automatic code assignment to medical text. In: Proceedings of Workshop on Biological, Translational, and Clinical Language Processing, BioNLP’07, pp. 129–136. Association for Computational Linguistics (2007)

  19. Diplaris, S., Tsoumakas, G., Mitkas, P., Vlahavas, I.: Protein classification with multiple algorithms. In: Proceedings of 10th Panhellenic Conference on Informatics, PCI’05, vol. 3746, pp. 448–456. Springer (2005)

  20. Duygulu, P., Barnard, K., de Freitas, J., Forsyth, D.: Object recognition as machine translation: learning a Lexicon for a fixed image vocabulary. In: Proceedings of 7th European Conference on Computer Vision, ECCV’02, vol. 2353, pp. 97–112. Springer (2002)

  21. Elisseeff, A., Weston, J.: A kernel method for multi-labelled classification. In: Advances in Neural Information Processing Systems, vol. 14, pp. 681–687. MIT Press (2001)

  22. Ghamrawi, N., McCallum, A.: Collective multi-label classification. In: Proceedings of 14th ACM International Conference on Information and Knowledge Management, CIKM’05, pp. 195–200. ACM (2005)

  23. Godbole, S., Sarawagi, S.: Discriminative methods for multi-labeled classification. Adv. Knowl. Discov. Data Min. 3056, 22–30 (2004)

  24. Gonçalves, E.C., Plastino, A., Freitas, A.A.: A genetic algorithm for optimizing the label ordering in multi-label classifier chains. In: Proceedings of 25th IEEE International Conference on Tools with Artificial Intelligence, ICTAI’13, pp. 469–476. IEEE (2013)

  25. Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Proceedings of 10th European Conference on Machine Learning, ECML’98, pp. 137–142. Springer (1998)

  26. Katakis, I., Tsoumakas, G., Vlahavas, I.: Multilabel text classification for automated tag suggestion. In: Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD’08, pp. 75–83 (2008)

  27. Klimt, B., Yang, Y.: The enron corpus: a new dataset for email classification research. In: Proceedings of 15th European Conference on Machine Learning, ECML’04, pp. 217–226. Springer (2004)

  28. Lang, K.: Newsweeder: learning to filter netnews. In: Proceedings of 12th International Conference on Machine Learning, ML’95, pp. 331–339 (1995)

  29. Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)

  30. Mencia, E.L., Fürnkranz, J.: Efficient pairwise multilabel classification for large-scale problems in the legal domain. In: Proceedings of 11th European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD’08, pp. 50–65. Springer (2008)

  31. Read, J.: Scalable multi-label classification. Ph.D. thesis, University of Waikato (2010)

  32. Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Mach. Learn. 85, 333–359 (2011)

  33. Read, J., Reutemann, P.: MEKA multi-label dataset repository. http://sourceforge.net/projects/meka/files/Datasets/

  34. Schapire, R.E., Singer, Y.: Boostexter: a boosting-based system for text categorization. Mach. Learn. 39(2–3), 135–168 (2000)

  35. Snoek, C.G.M., Worring, M., van Gemert, J.C., Geusebroek, J.M., Smeulders, A.W.M.: The challenge problem for automated detection of 101 semantic concepts in multimedia. In: Proceedings of 14th ACM International Conference on Multimedia, MULTIMEDIA’06, pp. 421–430 (2006)

  36. Spyromitros-Xioufis, E., Papadopoulos, S., Kompatsiaris, I.Y., Tsoumakas, G., Vlahavas, I.: A comprehensive study over vlad and product quantization in large-scale image retrieval. IEEE Trans. Multimedia 16(6), 1713–1728 (2014)

  37. Srivastava, A.N., Zane-Ulman, B.: Discovering recurring anomalies in text reports regarding complex space systems. In: Aerospace Conference, pp. 3853–3862. IEEE (2005)

  38. Tomás, J.T., Spolaôr, N., Cherman, E.A., Monard, M.C.: A framework to generate synthetic multi-label datasets. Electron. Notes Theoret. Comput. Sci. 302, 155–176 (2014)

  39. Tsoumakas, G., Katakis, I.: Multi-label classification: An overview. Int. J. Data Warehouse. Min. 3(3), 1–13 (2007)

  40. Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective and efficient multilabel classification in domains with large number of labels. In: Proceedings of ECML/PKDD Workshop on Mining Multidimensional Data, MMD’08, pp. 30–44 (2008)

  41. Tsoumakas, G., Katakis, I., Vlahavas, I.: Mining multi-label data. In: Data Mining and Knowledge Discovery Handbook, pp. 667–685. Springer (2010)

  42. Tsoumakas, G., Vlahavas, I.: Random k-Labelsets: an ensemble method for multilabel classification. In: Proceedings of 18th European Conference on Machine Learning, ECML’07, vol. 4701, pp. 406–417. Springer (2007)

  43. Tsoumakas, G., Xioufis, E.S., Vilcek, J., Vlahavas, I.: MULAN multi-label dataset repository. http://mulan.sourceforge.net/datasets.html

  44. Turnbull, D., Barrington, L., Torres, D., Lanckriet, G.: Semantic annotation and retrieval of music and sound effects. IEEE Trans. Audio Speech Lang. Process. 16(2), 467–476 (2008)

  45. Turner, M.D., Chakrabarti, C., Jones, T.B., Xu, J.F., Fox, P.T., Luger, G.F., Laird, A.R., Turner, J.A.: Automated annotation of functional imaging experiments via multi-label classification. Front. Neurosci. 7 (2013)

  46. Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 10(5), 293–302 (2002)

  47. Ueda, N., Saito, K.: Parametric mixture models for multi-labeled text. In: Proceedings of 15th Annual Conference on Neural Information Processing Systems, NIPS’02, pp. 721–728 (2002)

  48. Wieczorkowska, A., Synak, P., Raś, Z.: Multi-label classification of emotions in music. In: Intelligent Information Processing and Web Mining, AISC, vol. 35, chap. 30, pp. 307–315 (2006)

  49. Zhang, M., Zhou, Z.: ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn. 40(7), 2038–2048 (2007)

Author information

Corresponding author

Correspondence to Francisco Herrera.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Herrera, F., Charte, F., Rivera, A.J., del Jesus, M.J. (2016). Case Studies and Metrics. In: Multilabel Classification. Springer, Cham. https://doi.org/10.1007/978-3-319-41111-8_3

  • DOI: https://doi.org/10.1007/978-3-319-41111-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41110-1

  • Online ISBN: 978-3-319-41111-8

  • eBook Packages: Computer Science, Computer Science (R0)
