A New Fuzzy Co-clustering Algorithm for Categorization of Datasets with Overlapping Clusters

  • William-Chandra Tjhi
  • Lihui Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4093)


Fuzzy co-clustering is a method that performs simultaneous fuzzy clustering of objects and features. In this paper, we introduce a new fuzzy co-clustering algorithm for high-dimensional datasets called Cosine-Distance-based & Dual-partitioning Fuzzy Co-clustering (CODIALING FCC). Unlike many existing fuzzy co-clustering algorithms, CODIALING FCC is a dual-partitioning algorithm: it clusters the features in the same manner as it clusters the objects, that is, by partitioning them according to their natural groupings. It is also a cosine-distance-based algorithm because it uses the cosine distance to capture the belongingness of objects and features in the co-clusters. Our main purpose in introducing this new algorithm is to improve on the performance of some prominent existing fuzzy co-clustering algorithms when dealing with datasets with high overlaps. We consider this crucial because most real-world datasets involve a significant amount of overlap in their inherent clustering structures. We discuss how this improvement can be made through the dual-partitioning formulation adopted. Experimental results on a toy problem and five large benchmark document datasets demonstrate the effectiveness of CODIALING FCC in handling overlaps.
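The two ideas named in the abstract, cosine distance and partitioning of both objects and features, can be illustrated with a minimal sketch. This is not the CODIALING FCC update rule, which the abstract does not give. The function names and the affinity scores are hypothetical, and the "ranking" normalization is shown only as the commonly described alternative in which feature memberships sum to 1 within each cluster, included here as an assumption for contrast.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v): 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen for this sketch when a vector is all zeros
    return 1.0 - dot / (norm_u * norm_v)


def partition_memberships(scores):
    """Partitioning constraint: each item's memberships sum to 1 across clusters."""
    return [[s / sum(row) for s in row] for row in scores]


def ranking_memberships(scores):
    """Ranking constraint (assumed contrast): memberships in each cluster
    sum to 1 across items, so items compete within a cluster."""
    n_clusters = len(scores[0])
    totals = [sum(row[c] for row in scores) for c in range(n_clusters)]
    return [[row[c] / totals[c] for c in range(n_clusters)] for row in scores]


# Hypothetical non-negative affinities of 3 features to 2 clusters.
feature_scores = [[4.0, 4.0], [1.0, 3.0], [3.0, 1.0]]
partitioned = partition_memberships(feature_scores)
ranked = ranking_memberships(feature_scores)
```

Under the partitioning constraint, the first feature keeps a membership of 0.5 in both clusters, which is one way an overlap between clusters can stay visible in the feature memberships. Under the ranking constraint, each cluster's column sums to 1 instead, so a feature's values across clusters no longer form a distribution.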


Keywords (added by machine, not by the authors): Feature Cluster; Feature Ranking; Object Cluster; Cosine Distance; High Membership





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • William-Chandra Tjhi (1)
  • Lihui Chen (1)
  1. Nanyang Technological University, Republic of Singapore
