From Local Pattern Mining to Relevant Bi-cluster Characterization

  • Ruggero G. Pensa
  • Jean-François Boulicaut
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3646)


Clustering or bi-clustering techniques have proved quite useful in many application domains. A weakness of these techniques remains their poor support for characterizing the resulting groups. We consider possibly large Boolean data sets which record properties of objects, and we assume that a bi-partition is available. We introduce a generic cluster characterization technique based on collections of bi-sets (i.e., sets of objects associated with sets of properties) which satisfy some user-defined constraints, and a measure of the accuracy of a given bi-set as a bi-cluster characterization pattern. The method is illustrated on both formal concepts (i.e., “maximal rectangles of true values”) and the new type of δ-bi-sets (i.e., “rectangles of true values with a bounded number of exceptions per column”). The added value is illustrated on benchmark data and on two real data sets which are intrinsically noisy: medical data about meningitis and Plasmodium falciparum gene expression data.
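The two kinds of bi-sets named in the abstract can be illustrated on a toy Boolean data set. The following Python sketch is not the authors' algorithm: the data, the function names, and the `delta` parameter are hypothetical. The exception check follows only the informal description quoted above ("a bounded number of exceptions per column"), which may differ from the formal definition of δ-bi-sets given in the paper.

```python
# Toy Boolean data (hypothetical): each object maps to the set of
# properties that are true for it.
DATA = {
    "o1": {"p1", "p2"},
    "o2": {"p1", "p2"},
    "o3": {"p1", "p2", "p3"},
    "o4": {"p3"},
}
ALL_PROPERTIES = {"p1", "p2", "p3"}


def common_properties(objects, data, all_props):
    """Properties that are true for every object in `objects`."""
    props = set(all_props)
    for obj in objects:
        props &= data[obj]
    return props


def supporting_objects(properties, data):
    """Objects for which every property in `properties` is true."""
    return {obj for obj, props in data.items() if set(properties) <= props}


def is_formal_concept(objects, properties, data, all_props):
    """A bi-set is a formal concept when it is a maximal rectangle of
    true values: neither side can be extended without adding a false."""
    return (
        common_properties(objects, data, all_props) == set(properties)
        and supporting_objects(properties, data) == set(objects)
    )


def exceptions_per_column(objects, properties, data):
    """Number of false values in each property column of the rectangle."""
    return {
        prop: sum(1 for obj in objects if prop not in data[obj])
        for prop in properties
    }


def within_delta(objects, properties, data, delta):
    """Assumed reading of the abstract: at most `delta` false values
    per property column inside the rectangle."""
    counts = exceptions_per_column(objects, properties, data)
    return all(count <= delta for count in counts.values())
```

In this toy data, `({"o1", "o2", "o3"}, {"p1", "p2"})` is a formal concept, whereas `({"o1", "o2"}, {"p1", "p2"})` is not, because `o3` also has both properties and the rectangle is therefore not maximal. Extending the first bi-set with `p3` introduces two false values in that column, so it passes `within_delta` only for `delta` of at least 2.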


Association Rule; Formal Concept; Association Rule Mining; Minimal Frequency; Subgroup Discovery





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Ruggero G. Pensa (1)
  • Jean-François Boulicaut (1)
  1. INSA Lyon, LIRIS CNRS, UMR 5205, Villeurbanne cedex, France
