Natural Document Clustering by Clique Percolation in Random Graphs

Gao, Wei; Wong, Kam-Fai

doi:10.1007/11880592_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4182)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Document clustering techniques mostly depend on models that impose explicit or implicit a priori assumptions about the number, size, and disjointness of clusters, or about the probability distribution of the clustered data. As a result, the resulting clusters tend to be unnatural and stray, to a greater or lesser degree, from the intrinsic grouping of the documents in a corpus. We propose a novel graph-theoretic technique called Clique Percolation Clustering (CPC). It models clustering as a process of enumerating adjacent maximal cliques in a random graph that unveils the inherent structure of the underlying data, and it relaxes the commonly imposed constraints in order to discover natural overlapping clusters. Experiments show that CPC can outperform some typical algorithms on benchmark data sets and sheds light on natural document clustering.
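
The idea of grouping adjacent maximal cliques into overlapping clusters can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the document graph is built or how clique adjacency is defined, so the sketch assumes an unweighted graph given as an edge list and borrows the common clique percolation convention that two cliques are adjacent when they share at least k-1 nodes. The names `build_adjacency`, `maximal_cliques`, `clique_percolation`, and the parameter `k` are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_percolation(adj, k=3):
    """Union maximal cliques of size >= k that share at least k-1 nodes.

    Assumption: this adjacency rule is the usual clique percolation
    convention, not necessarily the one used in the paper.
    """
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Toy graph: nodes stand for documents, edges for "similar enough" pairs.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (3, 4), (3, 5), (4, 5), (5, 6)]
clusters = clique_percolation(build_adjacency(edges), k=3)
print(clusters)  # [[0, 1, 2, 3], [3, 4, 5]]
```

In this toy graph, document 3 belongs to both clusters and document 6 belongs to none, which shows how such a scheme fixes neither the number of clusters nor their disjointness in advance.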

Author information

Authors and Affiliations

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Wei Gao & Kam-Fai Wong

Editor information

Editors and Affiliations

Department of Computer Science, National University of Singapore, 3 Science Drive 2, 117543, Singapore
Hwee Tou Ng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Mun-Kew Leong
Department of Computer Science, School of Computing, National University of Singapore, 117543, Singapore
Min-Yen Kan
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, P.O. Box, 119613, Singapore
Donghong Ji

About this paper

Cite this paper

Gao, W., Wong, KF. (2006). Natural Document Clustering by Clique Percolation in Random Graphs. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_10

DOI: https://doi.org/10.1007/11880592_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science, Computer Science (R0)