Keyword Extraction from a Single Document Using Centrality Measures

  • Girish Keshav Palshikar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4815)

Abstract

Keywords characterize the topics discussed in a document. Extracting a small set of keywords from a single document is an important problem in text mining. We propose a hybrid structural and statistical approach to keyword extraction. We represent the given document as an undirected graph whose vertices are the words in the document and whose edges are labeled with a dissimilarity measure between two words, derived from the frequency of their co-occurrence in the document. We propose that the central vertices of this graph are good keyword candidates, and we model the importance of a word by its centrality in the graph. Using graph-theoretical notions of vertex centrality, we suggest several algorithms to extract keywords from the given document. We demonstrate the effectiveness of the proposed algorithms on real-life documents.
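
The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's algorithm: co-occurrence is counted per sentence, the edge dissimilarity is assumed to be 1/(co-occurrence count) (the abstract says only that it is derived from co-occurrence frequency), the stopword list is hypothetical, and the centrality used is closeness with the Wasserman-Faust correction for disconnected graphs, which is just one of several vertex-centrality notions.

```python
import heapq
import itertools
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in",
             "is", "it", "of", "on", "that", "the", "this", "to", "we", "with"}


def sentence_words(text):
    """Split text into sentences, each returned as a set of lowercase non-stopwords."""
    result = []
    for sentence in re.split(r"[.!?]+", text):
        words = {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}
        if words:
            result.append(words)
    return result


def build_graph(text):
    """Undirected word graph as an adjacency dict {u: {v: dissimilarity}}.

    Assumption: two words co-occur if they appear in the same sentence, and the
    edge label is 1 / (number of such sentences), so words that co-occur often
    are "close" in the graph.
    """
    counts = Counter()
    graph = {}
    for words in sentence_words(text):
        for w in words:
            graph.setdefault(w, {})
        for u, v in itertools.combinations(sorted(words), 2):
            counts[(u, v)] += 1
    for (u, v), c in counts.items():
        graph[u][v] = graph[v][u] = 1.0 / c
    return graph


def shortest_distances(graph, source):
    """Dijkstra's algorithm: distances from source to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closeness(graph):
    """Closeness centrality with the Wasserman-Faust correction for disconnected graphs."""
    n = len(graph)
    scores = {}
    for u in graph:
        dist = shortest_distances(graph, u)
        total = sum(dist.values())
        reach = len(dist) - 1  # vertices reachable from u, excluding u itself
        if total > 0 and n > 1:
            scores[u] = (reach / total) * (reach / (n - 1))
        else:
            scores[u] = 0.0
    return scores


def top_keywords(text, k=5):
    """Return the k most central words (ties broken alphabetically)."""
    scores = closeness(build_graph(text))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    print(top_keywords("cat dog. cat bird. cat fish. dog bird.", k=2))  # prints ['cat', 'bird']
```

On the toy text above, `cat` co-occurs with every other word, so its total distance to the rest of the graph is smallest and it ranks first; `fish`, reachable only through `cat`, ranks last. Swapping `closeness` for another centrality (eccentricity, betweenness) changes only the scoring step.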

Keywords

Centrality Measure, News Story, Dissimilarity Measure, Index Term, News Item
These keywords were assigned automatically, not by the author.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Girish Keshav Palshikar, Tata Research Development and Design Centre (TRDDC), 54B, Hadapsar Industrial Estate, Pune 411013, India
