
A Bi-clustering Framework for Categorical Data

  • Ruggero G. Pensa
  • Céline Robardet
  • Jean-François Boulicaut
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3721)

Abstract

Bi-clustering is a promising conceptual clustering approach. On categorical data, it provides a collection of (possibly overlapping) bi-clusters, i.e., linked clusters of objects and of attribute-value pairs. We propose a generic bi-clustering framework that computes a bi-partition from collections of local patterns, which capture locally strong associations between objects and properties. To validate this framework, we have studied in detail one instance, CDK-Means. It is a K-Means-like clustering of collections of formal concepts, i.e., connected closed sets on both dimensions. It builds bi-partitions while giving the user control over the overlap between bi-clusters. We provide an experimental validation on many benchmark datasets and discuss the interestingness of the computed bi-partitions.
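To make the notion of a formal concept concrete, the following is a minimal sketch that enumerates the formal concepts of a tiny binary context by brute force. It is not the authors' extraction method and not CDK-Means itself; the function name `formal_concepts` and the toy data are assumptions introduced purely for illustration. For categorical data, each property would be an attribute-value pair.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate formal concepts (extent, intent) of a small binary context.

    `context` maps each object to the set of properties it has.
    Brute force over all object subsets: suitable for toy data only.
    """
    objects = sorted(context)
    all_props = set().union(*context.values())
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            # Intent: properties shared by every object in the subset.
            intent = set(all_props)
            for o in subset:
                intent &= context[o]
            # Extent: every object that has all properties of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


# Hypothetical toy context: three objects, three properties.
toy = {
    "o1": {"a", "b"},
    "o2": {"a", "b"},
    "o3": {"b", "c"},
}

concepts = formal_concepts(toy)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Each resulting pair is closed on both dimensions: no object can be added to the extent without shrinking the intent, and no property can be added to the intent without shrinking the extent. A K-Means-like procedure such as the one the abstract describes would then operate on a collection of such pairs.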

Keywords

  • Local Pattern
  • Formal Concept
  • Benchmark Dataset
  • Jaccard Index
  • Scalability Issue

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Ruggero G. Pensa (1)
  • Céline Robardet (2)
  • Jean-François Boulicaut (1)
  1. INSA Lyon, LIRIS CNRS UMR 5205, Villeurbanne, France
  2. INSA Lyon, PRISMa EA INSA-UCBL 2058, Villeurbanne, France
