Natural Document Clustering by Clique Percolation in Random Graphs

  • Wei Gao
  • Kam-Fai Wong
Conference paper

DOI: 10.1007/11880592_10

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4182)
Cite this paper as:
Gao W., Wong KF. (2006) Natural Document Clustering by Clique Percolation in Random Graphs. In: Ng H.T., Leong MK., Kan MY., Ji D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg

Abstract

Document clustering techniques mostly depend on models that impose explicit and/or implicit a priori assumptions about the number, size, and disjointness of clusters, and/or the probability distribution of the clustered data. As a result, the clusters they produce tend to be unnatural and stray, to a greater or lesser degree, from the intrinsic grouping of the documents in a corpus. We propose a novel graph-theoretic technique called Clique Percolation Clustering (CPC). It models clustering as a process of enumerating adjacent maximal cliques in a random graph that unveils the inherent structure of the underlying data, in which we relax the commonly practiced constraints in order to discover natural overlapping clusters. Experiments show that CPC can outperform some typical algorithms on benchmark data sets and sheds light on natural document clustering.
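
The idea of grouping adjacent maximal cliques can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define adjacency or how the document graph is built, so the sketch assumes the generic clique percolation rule (two cliques of size at least k are adjacent when they share at least k - 1 nodes) and an already-built undirected graph. The names build_graph, maximal_cliques, clique_percolation and the parameter k are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map (node -> set of neighbours)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the node with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_percolation(adj, k=3):
    """Merge maximal cliques of size >= k that share >= k - 1 nodes.

    Assumption: this is the generic clique percolation adjacency rule,
    not necessarily the exact criterion used in the paper.
    """
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Toy graph: documents are nodes, edges link sufficiently similar documents.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
clusters = clique_percolation(build_graph(edges), k=3)
print(clusters)
```

In this toy graph, document d falls into both clusters, which is the kind of overlap the abstract refers to; no cluster count or size is fixed in advance, although k and the edge construction remain choices the sketch leaves to the user.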

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wei Gao (1)
  • Kam-Fai Wong (1)
  1. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong