Weakly-Supervised Neural Categorization of Wikipedia Articles

Chen, Xingyu; Iwaihara, Mizuho

doi:10.1007/978-3-030-34058-2_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11853)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

Deep neural models are increasingly popular for many NLP tasks because of their strong expressive power and reduced need for feature engineering. However, neural models often require a large number of labeled training documents, and an individual Wikipedia category may not contain enough articles for training. Weakly-supervised neural document classification can handle situations in which only a small set of labeled documents is given. However, existing RNN-based approaches often fail on long documents such as Wikipedia articles, because they struggle to retain information about the important parts of a long document. To overcome these challenges, we propose a text summarization method called WS-Rank, which extracts the key sentences of a document by weighting sentences according to class-related keywords and their positions in the document. After WS-Rank is applied to summarize training and test documents into key sentences, weakly-supervised neural classification shows a remarkable improvement in classification results.
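The abstract describes WS-Rank only at a high level: sentences are weighted by class-related keywords and by their position in the document, and the top-ranked sentences replace the full article as classifier input. One way to picture such a scheme is a TextRank-style sentence graph whose random-jump distribution is biased toward keyword-rich, early sentences. This is a hypothetical sketch, not the authors' formulation: the word-overlap similarity, the prior (1 + keyword hits + a 1/(1 + position) bonus), the damping factor 0.85, and all function names (`tokenize`, `overlap_similarity`, `prior_weights`, `ws_rank_sketch`) are assumptions for illustration.

```python
import math
import re


def tokenize(sentence):
    # Lowercase word tokens; a real pipeline would also drop stopwords.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def overlap_similarity(a, b):
    # TextRank-style similarity: shared words normalized by log sentence lengths.
    if len(a) < 2 or len(b) < 2:
        return 0.0
    shared = len(set(a) & set(b))
    return shared / (math.log(len(a)) + math.log(len(b)))


def prior_weights(tokens, class_keywords, position_decay=1.0):
    # Hypothetical prior (assumption): keyword hits plus a bonus that
    # decays with the sentence's position in the document.
    raw = []
    for i, toks in enumerate(tokens):
        hits = sum(1 for t in toks if t in class_keywords)
        raw.append(1.0 + hits + position_decay / (1 + i))
    total = sum(raw)
    return [w / total for w in raw]


def ws_rank_sketch(sentences, class_keywords, damping=0.85, iters=50, top_k=2):
    # Hypothetical WS-Rank-like extractor: biased PageRank over a sentence graph.
    if not sentences:
        return []
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    sim = [[overlap_similarity(tokens[i], tokens[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in sim]
    prior = prior_weights(tokens, class_keywords)
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Sentences with no similar neighbours redistribute their score via the prior.
        dangling = sum(scores[j] for j in range(n) if out_sum[j] == 0)
        new = []
        for i in range(n):
            # Score flowing into sentence i from similar sentences.
            incoming = sum(sim[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            # Random jumps follow the keyword/position prior instead of a uniform one.
            new.append((1 - damping) * prior[i]
                       + damping * (incoming + dangling * prior[i]))
        scores = new
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(ranked)]


if __name__ == "__main__":
    article = [
        "Tokyo is the capital city of Japan.",
        "The city hosts many museums and parks.",
        "Japan has a large railway network connecting the capital to other cities.",
        "Sushi is a popular dish.",
    ]
    print(ws_rank_sketch(article, {"japan", "capital", "city"}, top_k=2))
```

Returning the selected sentences in document order, rather than score order, keeps the summary readable as a short text, which matters if it is then fed to a sequence model in place of the full article.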

Author information

Authors and Affiliations

Graduate School of Information, Production and Systems, Waseda University, Kitakyushu, Japan
Xingyu Chen & Mizuho Iwaihara

Corresponding author

Correspondence to Mizuho Iwaihara.

Editor information

Editors and Affiliations

Kyoto University, Kyoto, Japan
Adam Jatowt
Ritsumeikan University, Kusatsu, Japan
Akira Maeda
The Catholic University of America, Washington, DC, USA
Sue Yeon Syn

About this paper

Cite this paper

Chen, X., Iwaihara, M. (2019). Weakly-Supervised Neural Categorization of Wikipedia Articles. In: Jatowt, A., Maeda, A., Syn, S. (eds) Digital Libraries at the Crossroads of Digital Information for the Future. ICADL 2019. Lecture Notes in Computer Science, vol 11853. Springer, Cham. https://doi.org/10.1007/978-3-030-34058-2_2

DOI: https://doi.org/10.1007/978-3-030-34058-2_2
Published: 29 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34057-5
Online ISBN: 978-3-030-34058-2
eBook Packages: Computer Science, Computer Science (R0)
