Keyword Extraction from a Single Document Using Centrality Measures
Keywords characterize the topics discussed in a document. Extracting a small set of keywords from a single document is an important problem in text mining. We propose a hybrid structural and statistical approach to keyword extraction. We represent the given document as an undirected graph whose vertices are the words in the document and whose edges are labeled with a dissimilarity measure between the two words, derived from the frequency of their co-occurrence in the document. We propose that central vertices in this graph are candidate keywords, and we model the importance of a word in terms of its centrality in the graph. Using graph-theoretic notions of vertex centrality, we suggest several algorithms to extract keywords from the given document. We demonstrate the effectiveness of the proposed algorithms on real-life documents.
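As an illustration only, the pipeline described above (a word graph, co-occurrence-based dissimilarity on the edges, ranking by centrality) might be sketched as follows. The abstract fixes neither the dissimilarity formula nor the centrality measure, so the choices here are assumptions: co-occurrence is counted per sentence, dissimilarity is taken as the reciprocal of the co-occurrence count, and closeness centrality is used for ranking. The function names `build_graph`, `closeness`, and `extract_keywords` are hypothetical, and stop-word removal is omitted for brevity.

```python
import heapq
import re
from collections import Counter
from itertools import combinations


def build_graph(text):
    """Build an undirected word graph from sentence-level co-occurrence.

    Assumption: the edge label is 1 / (co-occurrence count), so words that
    co-occur often are "close". The paper's own measure may differ.
    """
    cooc = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = sorted(set(re.findall(r"[a-z]+", sentence)))
        for u, v in combinations(words, 2):
            cooc[(u, v)] += 1
    graph = {}
    for (u, v), n in cooc.items():
        graph.setdefault(u, {})[v] = 1.0 / n
        graph.setdefault(v, {})[u] = 1.0 / n
    return graph


def shortest_distances(graph, source):
    """Dijkstra's algorithm over the dissimilarity-weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closeness(graph, vertex):
    """Closeness centrality over the vertices reachable from `vertex`."""
    dist = shortest_distances(graph, vertex)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def extract_keywords(text, k=3):
    """Return the k most central words; ties are broken alphabetically."""
    graph = build_graph(text)
    ranked = sorted(graph, key=lambda w: (-closeness(graph, w), w))
    return ranked[:k]
```

For example, `extract_keywords("hub alpha. hub beta. hub gamma.", k=1)` selects `hub`, the only word that co-occurs with every other word. Other centrality notions (degree, eccentricity, betweenness) could be substituted for `closeness` without changing the rest of the sketch.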
Keywords: Centrality Measure, News Story, Dissimilarity Measure, Index Term, News Item