Abstract
The WISE 2014 challenge was concerned with the task of multi-label classification of articles coming from Greek print media. Raw data comes from the scanning of print media, article segmentation, and optical character segmentation, and therefore is quite noisy. Each article is examined by a human annotator and categorized to one or more of the topics being monitored. Topics range from specific persons, products, and companies that can be easily categorized based on keywords, to more general semantic concepts, such as environment or economy. Building multi-label classifiers for the automated annotation of articles into topics can support the work of human annotators by suggesting a list of all topics by order of relevance, or even automate the annotation process for media and/or categories that are easier to predict. This saves valuable time and allows a media monitoring company to expand the portfolio of media being monitored. This paper summarizes the approaches of the top 4 among the 121 teams that participated in the competition.
Keywords
- Latent Dirichlet Allocation
- Ridge Regression
- Vote Weight
- Stochastic Gradient Descent
- Binary Relevance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
This is a preview of subscription content, access via your institution.
Buying options
Preview
Unable to display preview. Download preview PDF.
References
Tsoumakas, G., Katakis, I., Vlahavas, I.: Mining multi-label data. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, 2nd edn., pp. 667–685. Springer, Heidelberg (2010)
Wolpert, D.H.: Stacked generalization. Neural Networks 5, 241–259 (1992)
Lesk, M.E.: Word-word associations in document retrieval systems. American Documentation 20(1), 27–38 (1969)
Sill, J., Takács, G., Mackey, L., Lin, D.: Feature-weighted linear stacking. CoRR abs/0911.0460 (2009)
Puurula, A., Bifet, A.: Ensembles of sparse multinomial classifiers for scalable text classification. In: ECML/PKDD - PASCAL Workshop on Large-Scale Hierarchical Classification (2012)
Zhang, T.: Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004, p. 116. ACM, New York (2004)
Zadrozny, B., Elkan, C.: Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, pp. 694–699 (2002)
Puurula, A., Read, J., Bifet, A.: Kaggle LSHTC4 winning solution. CoRR abs/1405.0546 (2014)
Elisseeff, A., Weston, J.: A kernel method for multi-labelled classification. In: Advances in Neural Information Processing Systems 14 (2002)
Nam, J., Kim, J., Gurevych, I., Fürnkranz, J.: Large-scale multi-label text classification - revisiting neural networks. CoRR abs/1312.5419 (2013)
Domingos, P.: The role of occam’s razor in knowledge discovery. Data Min. Knowl. Discov. 3(4), 409–425 (1999)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tsoumakas, G. et al. (2014). WISE 2014 Challenge: Multi-label Classification of Print Media Articles to Topics. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8787. Springer, Cham. https://doi.org/10.1007/978-3-319-11746-1_40
Download citation
DOI: https://doi.org/10.1007/978-3-319-11746-1_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11745-4
Online ISBN: 978-3-319-11746-1
eBook Packages: Computer ScienceComputer Science (R0)