Biclustering meets triadic concept analysis

  • Mehdi Kaytoue
  • Sergei O. Kuznetsov
  • Juraj Macko
  • Amedeo Napoli

Abstract

Biclustering numerical data became a popular data-mining task in the early 2000s, especially for gene expression data analysis and recommender systems. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data table. So-called biclusters of similar values can be thought of as maximal sub-tables with close values. Only a few methods address a complete, correct and non-redundant enumeration of such patterns, a well-known intractable problem, and no formal framework exists. We introduce important links between biclustering and Formal Concept Analysis (FCA). Indeed, FCA is known to be, among other things, a methodology for biclustering binary data. Handling numerical data is not straightforward, and we argue that Triadic Concept Analysis (TCA), the extension of FCA to ternary relations, provides a powerful mathematical and algorithmic framework for biclustering numerical data. We hence discuss both theoretical and computational aspects of biclustering numerical data with triadic concept analysis. These results also scale to n-dimensional numerical datasets.
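
To make the notion of a "maximal sub-table with close values" concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that "close" means every pair of values in the sub-table differs by at most a tolerance theta, and that "maximal" means no further row or column can be added without breaking that condition. The table TABLE, the parameter theta and the function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy table: 4 objects (rows) x 5 attributes (columns).
TABLE = [
    [1, 2, 2, 1, 6],
    [2, 1, 1, 0, 6],
    [2, 2, 1, 7, 6],
    [8, 9, 2, 6, 7],
]


def is_similar_bicluster(table, rows, cols, theta):
    """True if all values of the sub-table differ pairwise by at most theta.

    Pairwise closeness |a - b| <= theta for every pair is equivalent to
    max - min <= theta, which is what is checked here.
    """
    values = [table[r][c] for r, c in product(rows, cols)]
    return bool(values) and max(values) - min(values) <= theta


def is_maximal(table, rows, cols, theta):
    """True if (rows, cols) is similar and no single row or column can be added."""
    rows, cols = set(rows), set(cols)
    if not is_similar_bicluster(table, rows, cols, theta):
        return False
    # Try to extend by one row.
    for r in range(len(table)):
        if r not in rows and is_similar_bicluster(table, rows | {r}, cols, theta):
            return False
    # Try to extend by one column.
    for c in range(len(table[0])):
        if c not in cols and is_similar_bicluster(table, rows, cols | {c}, theta):
            return False
    return True


if __name__ == "__main__":
    print(is_maximal(TABLE, {0, 1, 2}, {0, 1, 2}, theta=1))
    print(is_maximal(TABLE, {0, 1}, {0, 1, 2}, theta=1))
```

This sketch only verifies a given candidate; enumerating all maximal biclusters completely and without redundancy is the hard problem the abstract refers to, and it is not attempted here.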

Keywords

  • Numerical biclustering
  • Similarity relation
  • Formal concept analysis
  • Triadic concept analysis
  • N-ary relations

Mathematics Subject Classifications (2010)

06 68 

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Mehdi Kaytoue (1)
  • Sergei O. Kuznetsov (2)
  • Juraj Macko (3)
  • Amedeo Napoli (4)

  1. Université de Lyon, CNRS, INSA-Lyon, LIRIS, Villeurbanne Cedex, France
  2. Higher School of Economics (HSE), Moscow, Russia
  3. Palacky University, Olomouc, Czech Republic
  4. Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Vandœuvre-lès-Nancy, France
