In this paper we investigate how to adapt the TextRank method to work in a supervised way. TextRank is a graph-based method that applies the ideas behind PageRank, the ranking algorithm used by Google, to Natural Language Processing (NLP) tasks. This approach has given very good results in many NLP tasks, such as text summarization, keyword extraction, and word sense disambiguation. In all these tasks TextRank operates in an unsupervised way, without using any training corpus. Our main contribution is a method that allows TextRank to be applied to a graph that includes information generated from a tagged training corpus. We have tested our method on the Part of Speech (POS) tagging task, comparing its results with those obtained by tools specialized in this task. The performance of our system is close to that of these tools, and it improves on the results of two of them when the corpus tagset is large and the tagging task is therefore more complicated.
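For readers unfamiliar with the ranking step that TextRank borrows from PageRank, the following is a minimal sketch of the standard unsupervised score update on a weighted undirected graph. It is not the supervised method proposed in this paper. The function name `textrank_scores`, the damping value 0.85, the fixed iteration count, and the toy word graph are all assumptions made for illustration.

```python
from collections import defaultdict


def textrank_scores(edges, damping=0.85, iterations=50):
    """Score vertices of a weighted undirected graph with a PageRank-style update.

    edges: iterable of (u, v, weight) tuples (hypothetical input format).
    """
    # Build a symmetric adjacency map: neighbors[u][v] == neighbors[v][u] == weight.
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        neighbors[u][v] = w
        neighbors[v][u] = w

    # Total edge weight at each vertex, used to normalise the "vote" it casts.
    strength = {v: sum(ws.values()) for v, ws in neighbors.items()}

    # Every vertex starts with the same score.
    scores = {v: 1.0 for v in neighbors}

    for _ in range(iterations):
        # Each vertex receives a share of every neighbour's score,
        # proportional to the weight of the connecting edge.
        scores = {
            v: (1 - damping)
            + damping * sum(w / strength[u] * scores[u] for u, w in neighbors[v].items())
            for v in neighbors
        }
    return scores


# Hypothetical toy graph: vertices are words, edges are co-occurrences.
edges = [
    ("graph", "ranking", 1.0),
    ("graph", "text", 1.0),
    ("graph", "corpus", 1.0),
    ("ranking", "text", 1.0),
]
scores = textrank_scores(edges)
best = max(scores, key=scores.get)
```

In this toy graph the best-connected vertex ends up with the highest score. A supervised variant would need the graph itself to encode information drawn from a tagged corpus, which this sketch does not attempt.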
Keywords: Natural Language Processing; Ranking Algorithm; Training Corpus; Word Sense Disambiguation; Unknown Word