Abstract
Document clustering techniques mostly depend on models that impose explicit and/or implicit a priori assumptions about the number, size, and disjointness of clusters, and/or the probability distribution of the clustered data. As a result, the clustering results tend to be unnatural and stray, to a greater or lesser degree, from the intrinsic grouping of the documents in a corpus. We propose a novel graph-theoretic technique called Clique Percolation Clustering (CPC). It models clustering as a process of enumerating adjacent maximal cliques in a random graph that unveils the inherent structure of the underlying data, in which we relax the commonly imposed constraints in order to discover natural overlapping clusters. Experiments show that CPC can outperform some typical algorithms on benchmark data sets, and shed light on natural document clustering.
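The idea of grouping adjacent maximal cliques into overlapping clusters can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the document graph is already given as an edge list (how edges are derived from document similarity is not stated in the abstract), and it assumes the common clique-percolation convention that two cliques of size at least k are adjacent when they share at least k-1 nodes. The names `build_graph`, `maximal_cliques`, `clique_percolation`, and the parameter `k` are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch (pivoting variant)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_percolation(adj, k=3):
    """Merge adjacent maximal cliques into (possibly overlapping) clusters.

    Assumption: cliques of size >= k are adjacent if they share >= k - 1 nodes.
    """
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    clusters = {}
    for i, c in enumerate(cliques):
        clusters.setdefault(find(i), set()).update(c)
    return sorted(sorted(members) for members in clusters.values())


# Toy graph: nodes stand for documents, edges for "sufficiently similar" pairs.
demo_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
              (3, 4), (3, 5), (4, 5), (5, 6)]
print(clique_percolation(build_graph(demo_edges), k=3))
# → [[0, 1, 2, 3], [3, 4, 5]]
```

In this toy graph, document 3 belongs to both clusters, and document 6 belongs to none because it takes part in no clique of size 3. Neither the number of clusters nor their disjointness is fixed in advance, which is the kind of behaviour the abstract describes.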
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, W., Wong, KF. (2006). Natural Document Clustering by Clique Percolation in Random Graphs. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_10
DOI: https://doi.org/10.1007/11880592_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science (R0)