
Weakly-Supervised Neural Categorization of Wikipedia Articles

  • Conference paper
Digital Libraries at the Crossroads of Digital Information for the Future (ICADL 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11853)


Abstract

Deep neural models are gaining popularity for many NLP tasks, owing to their strong expressive power and reduced need for feature engineering. However, neural models usually require a large number of labeled training documents, and an individual Wikipedia category often does not contain enough articles for training. Weakly-supervised neural document classification can handle situations where only a small set of labeled documents is given. However, these RNN-based approaches often fail on long documents such as Wikipedia articles, because it is hard for them to retain memory of the important parts of a long document. To overcome these challenges, we propose a text summarization method called WS-Rank, which extracts the key sentences of a document using weights based on class-related keywords and on sentence positions within the document. After WS-Rank is applied to the training and test documents to summarize them into key sentences, weakly-supervised neural classification shows a remarkable improvement in classification results.
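To make the idea of keyword- and position-weighted sentence extraction concrete, the following is a minimal Python sketch. It is not the authors' WS-Rank: the abstract gives no formulas, so the TextRank-style sentence graph, the use of personalized PageRank to combine the two weights, the function name `ws_rank_sketch`, and the parameters `alpha` (keyword weight), `beta` (position decay), `d` (damping) and `top_k` are all hypothetical choices made for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase alphanumeric tokens (no stopword removal, for brevity)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_similarity(a, b):
    """TextRank-style similarity: shared word types over log sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoid a zero denominator for one-word sentences
    shared = len(set(a) & set(b))
    return shared / (math.log(len(a)) + math.log(len(b)))


def ws_rank_sketch(sentences, class_keywords, top_k=3, d=0.85,
                   alpha=1.0, beta=0.1, iterations=50):
    """Hypothetical keyword- and position-biased TextRank, NOT the paper's WS-Rank."""
    n = len(sentences)
    if n == 0:
        return []
    toks = [tokenize(s) for s in sentences]
    keywords = {k.lower() for k in class_keywords}

    # Prior per sentence: class-keyword density times a decay on sentence position.
    prior = []
    for i, t in enumerate(toks):
        hits = sum(1 for w in t if w in keywords)
        keyword_weight = 1.0 + alpha * hits / max(len(t), 1)
        position_weight = 1.0 / (1.0 + beta * i)
        prior.append(keyword_weight * position_weight)
    total = sum(prior)
    prior = [p / total for p in prior]

    # Undirected weighted sentence graph and each node's total edge weight.
    w = [[overlap_similarity(toks[i], toks[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]

    # Personalized PageRank: teleport according to the prior instead of uniformly.
    scores = prior[:]
    for _ in range(iterations):
        # Score held by isolated sentences is redistributed along the prior.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0)
        scores = [
            (1 - d) * prior[i]
            + d * (sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
                   + dangling * prior[i])
            for i in range(n)
        ]

    # Keep the top_k sentences, emitted in their original document order.
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]


# Usage on a toy article with hypothetical class keywords for a "football club" category.
article = [
    "Arsenal is a professional football club based in London.",
    "The club plays in the Premier League, the top football league in England.",
    "The city of London has many museums and parks.",
    "Its stadium hosts football matches for the club every season.",
    "The weather in spring is often mild.",
]
summary = ws_rank_sketch(article, ["football", "club", "league"], top_k=2)
print(summary)
```

Following the abstract, such a summarizer would be run over both the training and the test documents, and only the extracted key sentences would be passed to the weakly-supervised classifier; the classifier itself is not sketched here.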

Author information

Corresponding author: Mizuho Iwaihara.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, X., Iwaihara, M. (2019). Weakly-Supervised Neural Categorization of Wikipedia Articles. In: Jatowt, A., Maeda, A., Syn, S. (eds) Digital Libraries at the Crossroads of Digital Information for the Future. ICADL 2019. Lecture Notes in Computer Science, vol 11853. Springer, Cham. https://doi.org/10.1007/978-3-030-34058-2_2

  • DOI: https://doi.org/10.1007/978-3-030-34058-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34057-5

  • Online ISBN: 978-3-030-34058-2

  • eBook Packages: Computer Science, Computer Science (R0)