Abstract
Deep neural models are increasingly popular for many NLP tasks because of their strong expressive power and reduced need for feature engineering. However, neural models often need a large amount of labeled training documents, and a single Wikipedia category does not contain enough articles for training. Weakly-supervised neural document classification can handle situations where only a small set of labeled documents is available. However, RNN-based approaches of this kind often fail on long documents such as Wikipedia articles, because it is hard for them to retain memory of the important parts of a long document. To overcome these challenges, we propose a text summarization method called WS-Rank, which extracts key sentences from documents using weights based on class-related keywords and sentence positions. After applying WS-Rank to the training and test documents to summarize them into key sentences, weakly-supervised neural classification shows a remarkable improvement in classification results.
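The abstract does not give the WS-Rank formulation, so the following is only an illustrative sketch of the general idea it names: ranking sentences with weights drawn from class-related keywords and sentence positions. It assumes a TextRank-style random walk over a word-overlap similarity graph, biased by a prior that rewards keyword hits and early positions. The function names, the prior formula `1 + hits + 1/(i+1)`, the similarity measure, and the damping value are all assumptions made for illustration and are not taken from the paper.

```python
import re


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())


def overlap_similarity(a, b):
    # Share of distinct words the two sentences have in common (assumed measure).
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / (len(set_a) + len(set_b))


def rank_sentences(text, keywords, top_k=2, damping=0.85, iterations=50):
    """Return the top_k sentences, in document order (hypothetical helper)."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    # Prior weight per sentence: keyword hits plus a bonus for early position.
    # This particular formula is an assumption, not the paper's weighting.
    kw = {k.lower() for k in keywords}
    prior = []
    for i, toks in enumerate(tokens):
        hits = sum(1 for t in toks if t in kw)
        prior.append(1.0 + hits + 1.0 / (i + 1))
    total = sum(prior)
    prior = [p / total for p in prior]

    # Sentence similarity graph (no self-loops).
    sim = [
        [0.0 if i == j else overlap_similarity(tokens[i], tokens[j]) for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in sim]

    # Biased random walk: teleport to sentences in proportion to their prior.
    scores = prior[:]
    for _ in range(iterations):
        scores = [
            (1 - damping) * prior[i]
            + damping * sum(
                scores[j] * sim[j][i] / out_sum[j] for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]
```

Calling `rank_sentences(article_text, ["japan", "city"], top_k=5)` would return up to five sentences in their original order, which could then stand in for the full article as classifier input. How the class-related keywords are obtained is left open here.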
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, X., Iwaihara, M. (2019). Weakly-Supervised Neural Categorization of Wikipedia Articles. In: Jatowt, A., Maeda, A., Syn, S. (eds) Digital Libraries at the Crossroads of Digital Information for the Future. ICADL 2019. Lecture Notes in Computer Science, vol 11853. Springer, Cham. https://doi.org/10.1007/978-3-030-34058-2_2
DOI: https://doi.org/10.1007/978-3-030-34058-2_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34057-5
Online ISBN: 978-3-030-34058-2
eBook Packages: Computer Science (R0)