WISE 2014 Challenge: Multi-label Classification of Print Media Articles to Topics

  • Grigorios Tsoumakas
  • Apostolos Papadopoulos
  • Weining Qian
  • Stavros Vologiannidis
  • Alexander D’yakonov
  • Antti Puurula
  • Jesse Read
  • Jan Švec
  • Stanislav Semenov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8787)

Abstract

The WISE 2014 challenge was concerned with the task of multi-label classification of articles from Greek print media. Raw data comes from the scanning of print media, article segmentation, and optical character segmentation, and is therefore quite noisy. Each article is examined by a human annotator and assigned to one or more of the topics being monitored. Topics range from specific persons, products, and companies, which can be easily categorized based on keywords, to more general semantic concepts, such as environment or economy. Building multi-label classifiers for the automated annotation of articles with topics can support the work of human annotators by suggesting a list of all topics in order of relevance, or even automate the annotation process for media and/or categories that are easier to predict. This saves valuable time and allows a media monitoring company to expand the portfolio of media being monitored. This paper summarizes the approaches of the top 4 among the 121 teams that participated in the competition.
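
The two uses of a multi-label classifier mentioned in the abstract (suggesting topics in order of relevance, and automating annotation for easy categories) can be illustrated with a minimal sketch. It assumes some classifier has already produced a per-topic relevance score for one article; the topic names, the scores, the function names, and the 0.9 threshold are all hypothetical and are not taken from the challenge or from any participant's solution.

```python
from typing import Dict, List, Tuple


def rank_topics(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (topic, score) pairs sorted from most to least relevant."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


def auto_assign(scores: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Return topics whose score is high enough to assign without review.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return [topic for topic, score in rank_topics(scores) if score >= threshold]


# Hypothetical scores for a single article (illustrative values only).
scores = {"economy": 0.62, "environment": 0.15, "AcmeCorp": 0.97}

ranked = rank_topics(scores)                     # suggestion list for an annotator
assigned = auto_assign(scores, threshold=0.9)    # topics assigned automatically

for topic, score in ranked:
    print(f"{topic}: {score:.2f}")
```

In this sketch the full ranked list would be shown to a human annotator, while only the topics above the threshold would be assigned without review.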

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Grigorios Tsoumakas (1)
  • Apostolos Papadopoulos (1)
  • Weining Qian (2)
  • Stavros Vologiannidis (3)
  • Alexander D’yakonov (4)
  • Antti Puurula (5)
  • Jesse Read (6)
  • Jan Švec (7)
  • Stanislav Semenov (8)

  1. Aristotle University of Thessaloniki, Thessaloniki, Greece
  2. East China Normal University, China
  3. DataScouting, Greece
  4. Lomonosov Moscow State University, Russia
  5. The University of Waikato, New Zealand
  6. Aalto University, Finland
  7. University of West Bohemia, Czech Republic
  8. Higher School of Economics and the Yandex School of Data Analysis, Russia
