Clique Percolation Method for Finding Naturally Cohesive and Overlapping Document Clusters

Gao, Wei; Wong, Kam-Fai; Xia, Yunqing; Xu, Ruifeng

doi:10.1007/11940098_10

Wei Gao²²,
Kam-Fai Wong²²,
Yunqing Xia²² &
…
Ruifeng Xu²²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4285))

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages

1132 Accesses
4 Citations

Abstract

Techniques for find document clusters mostly depend on models that impose strong explicit and/or implicit priori assumptions. As a consequence, the clustering effects tend to be unnatural and stray away from the intrinsic grouping natures of a document collection. We apply a novel graph-theoretic technique called Clique Percolation Method (CPM) for document clustering. In this method, a process of enumerating highly cohesive maximal document cliques is performed in a random graph, where those strongly adjacent cliques are mingled to form naturally overlapping clusters. Our clustering results can unveil the inherent structural connections of the underlying data. Experiments show that CPM can outperform some typical algorithms on benchmark data sets, and shed light on its advantages on natural document clustering.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 99.00; Price excludes VAT (USA)

Softcover Book: USD 129.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Baker, L., McCallum, A.: Distributional clustering of words for text classification. In: Proc. of ACM SIGIR, pp. 96–103 (1998)
Google Scholar
Bezdek, J.C.: Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York
Google Scholar
Bron, C., Kerbosch, J.: Finding all cliques of an undirected graph. Communications of the ACM 16, 575–577 (1971)
Article Google Scholar
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to algorithms, 2nd edn. McGraw-Hill, New York
Google Scholar
Cutting, D., Karger, D., Pedersen, J., Tukey, J.W.: Scatter/Gather: A cluster-based approach to browsing large document collections. In: Proc. of the 15th ACM SIGIR Conference, pp. 318–329 (1992)
Google Scholar
Derenyi, I., Palla, G., Vicsek, T.: Clique percolation in random networks. Physics Review Letters 95, 160–202 (2005)
Google Scholar
Dhillon, I.S.: Co-clustering documents and words using bipartite spectral graph partitioning. In: Proc. of the 7th ACM-KDD, pp. 269–274 (2001)
Google Scholar
Ding, C.H.Q., He, X.F., Zha, H.Y., Gu, M., Simon, H.D.: A min-max cut algorithm for graph partitioning and data clustering. In: Proc. of IEEE ICDM, pp. 107–114 (2001)
Google Scholar
Dorogovtsev, S.N., Mendes, J.F.F.: Evolution of networks. Oxford Press, New York
Google Scholar
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Computing Surveys 31, 264–323 (1999)
Article Google Scholar
King, B.: Step-wise clustering procedures. Journal of the American Statistical Association 69, 86–101 (1967)
Article Google Scholar
Krishnapuram, R., Joshi, A., Nasraoui, O., Yi, L.Y.: Low-complexity fuzzy relational clustering algorithms for web mining. IEEE Transactions on Fuzzy Systems 9, 595–607 (2001)
Article Google Scholar
Liu, X., Gong, Y.: Document clustering with clustering refinement and model selection capabilitities. In: Proc. of ACM SIGIR, pp. 191–198 (2002)
Google Scholar
Palla, G., Derenyi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 814–818 (2005)
Article Google Scholar
Raghavan, V.V., Yu, C.T.: A comparison of the stability characteristics of some graph theoretic clustering methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 3, 393–402 (1981)
Article MATH Google Scholar
Sneath, P.H.A., Sokal, R.R.: Numerical taxonomy: the principles and practice of numerical classification. Freeman, London
Google Scholar
Steinbach, M., Karypis, G., Kumar, V.: A comparison of doucment clustering techniques. In: Proc. of KDD 2000 Workshop on Text Mining (2000)
Google Scholar
Tsukiyama, S., Ide, M., Ariyoshi, H., Shirakawa, I.: A new algorithm for generating all the maximal independent sets. SIAM Journal on Computing 6, 505–517 (1977)
Article MATH MathSciNet Google Scholar
Zhao, Y., Karypis, G.: Criterion functions for document clustering. Technical Report #01-40, Department of Computer Science, University of Minnesota
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Wei Gao, Kam-Fai Wong, Yunqing Xia & Ruifeng Xu

Authors

Wei Gao
View author publications
You can also search for this author in PubMed Google Scholar
Kam-Fai Wong
View author publications
You can also search for this author in PubMed Google Scholar
Yunqing Xia
View author publications
You can also search for this author in PubMed Google Scholar
Ruifeng Xu
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 630-0192, Takayama, Ikoma, Nara, Japan
Yuji Matsumoto
Dept of ECE, University of Illinois at Urbana Champaign, IL 61801, Urbana, USA
Richard W. Sproat
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
State Key Lab of Intelligent Tech. & Sys., Tsinghua University,
Min Zhang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gao, W., Wong, KF., Xia, Y., Xu, R. (2006). Clique Percolation Method for Finding Naturally Cohesive and Overlapping Document Clusters. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science(), vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_10

Download citation

DOI: https://doi.org/10.1007/11940098_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics