Clustering Scientific Literature Using Sparse Citation Graph Analysis

  • Levent Bolelli
  • Seyda Ertekin
  • C. Lee Giles
Conference paper

DOI: 10.1007/11871637_8

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)
Cite this paper as:
Bolelli L., Ertekin S., Giles C.L. (2006) Clustering Scientific Literature Using Sparse Citation Graph Analysis. In: Fürnkranz J., Scheffer T., Spiliopoulou M. (eds) Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg

Abstract

It is well known that connectivity analysis of linked documents provides significant information about the structure of the document space for unsupervised learning tasks. However, the ability to identify distinct clusters of documents through link graph analysis is proportional to the density of the graph and depends on whether the linking and/or linked documents are available in the collection. In this paper, we present an information-theoretic approach to measuring the significance of individual words based on the underlying link structure of the document collection. This lets us generate a non-uniform weight distribution over the feature space, which is then used to augment the original corpus-based document similarities. Experimental results on a collection of scientific literature show that our method achieves better separation of distinct groups of documents, yielding improved clustering solutions.
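The abstract describes the approach only at a high level: score each word by how its occurrences line up with the citation structure, then use those scores to reweight document vectors before computing similarity. The sketch below is one possible reading under our own assumptions, not the authors' formulation. Link-based groups are taken to be connected components of the undirected citation graph. A word's significance is one minus the normalized Shannon entropy of its distribution over those components. The final similarity linearly mixes plain term-frequency cosine with the reweighted cosine through a hypothetical parameter `alpha`. All function names are illustrative.

```python
# Illustrative sketch only: the weighting scheme below is an assumption, not the paper's method.
import math
from collections import Counter, defaultdict


def connected_components(n_docs, citation_edges):
    """Label each document with the root of its citation-graph component (union-find)."""
    parent = list(range(n_docs))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in citation_edges:  # citations treated as undirected links
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return [find(i) for i in range(n_docs)]


def link_entropy_weights(docs, citation_edges):
    """Hypothetical weighting: 1 - normalized entropy of a word's spread over components."""
    comp = connected_components(len(docs), citation_edges)
    n_comp = len(set(comp))
    max_h = math.log(n_comp) if n_comp > 1 else 1.0  # avoid dividing by log(1) = 0
    counts = defaultdict(Counter)  # word -> Counter(component label -> occurrences)
    for i, doc in enumerate(docs):
        for w in doc:
            counts[w][comp[i]] += 1
    weights = {}
    for w, c in counts.items():
        total = sum(c.values())
        h = -sum((k / total) * math.log(k / total) for k in c.values())
        weights[w] = 1.0 - h / max_h  # 1 = confined to one component, 0 = spread evenly
    return weights


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as word -> value mappings."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def augmented_similarity(doc_a, doc_b, weights, alpha=0.5):
    """Mix plain term-frequency cosine with link-weighted cosine (alpha is an assumed knob)."""
    ta, tb = Counter(doc_a), Counter(doc_b)
    wa = {w: n * weights.get(w, 0.0) for w, n in ta.items()}
    wb = {w: n * weights.get(w, 0.0) for w, n in tb.items()}
    return (1 - alpha) * cosine(ta, tb) + alpha * cosine(wa, wb)


# Toy corpus: two citation-linked pairs that share only the word "cluster".
docs = [
    ["graph", "citation", "cluster"],
    ["graph", "citation", "link"],
    ["protein", "sequence", "cluster"],
    ["protein", "sequence", "fold"],
]
edges = [(0, 1), (2, 3)]
weights = link_entropy_weights(docs, edges)

print(weights["graph"])
print(round(augmented_similarity(docs[0], docs[1], weights), 3))
print(round(augmented_similarity(docs[0], docs[2], weights), 3))
```

On this toy corpus the shared but evenly spread word "cluster" receives a weight of essentially zero. The within-pair similarity therefore rises to about 0.74 (plain cosine: 0.67), while the cross-pair similarity falls to about 0.17 (plain cosine: 0.33). Connected components are a deliberately crude stand-in for link-based groups: in a sparse citation graph many documents would end up as singletons, which illustrates the density issue the abstract raises.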


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Levent Bolelli (1)
  • Seyda Ertekin (1)
  • C. Lee Giles (1, 2)

  1. Department of Computer Science and Engineering, The Pennsylvania State University, University Park, USA
  2. College of Information Sciences and Technology, The Pennsylvania State University, University Park, USA
