Abstract
Biclustering numerical data became a popular data-mining task at the beginning of 2000’s, especially for gene expression data analysis and recommender systems. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data-table. So-called biclusters of similar values can be thought as maximal sub-tables with close values. Only few methods address a complete, correct and non-redundant enumeration of such patterns, a well-known intractable problem, while no formal framework exists. We introduce important links between biclustering and Formal Concept Analysis (FCA). Indeed, FCA is known to be, among others, a methodology for biclustering binary data. Handling numerical data is not direct, and we argue that Triadic Concept Analysis (TCA), the extension of FCA to ternary relations, provides a powerful mathematical and algorithmic framework for biclustering numerical data. We discuss hence both theoretical and computational aspects on biclustering numerical data with triadic concept analysis. These results also scale to n-dimensional numerical datasets.
Similar content being viewed by others
References
Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749 (2005)
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. Data Min. Knowl. Discov. 11(1), 5–33 (2005)
Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: Buneman, P., Jajodia, S. (eds.) Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pp. 207–216. ACM Press (1993)
Alqadah, F., Bhatnagar, R.: Similarity measures in formal concept analysis. Ann. Math. Artif. Intell. 61(3), 245–256 (2011)
Besson, J., Robardet, C., Boulicaut, J.-F.: Mining a new fault-tolerant pattern type as an alternative to formal concept discovery. In: Schärfe, H., Hitzler, P., Øhrstrøm, P. (eds.) Conceptual Structures: Inspiration and Application, 14th International Conference on Conceptual Structures (ICCS). Lecture Notes in Computer Science, vol. 4068, pp. 144–157. Springer (2006)
Besson, J., Robardet, C., Raedt, L.D., Boulicaut, J.-F.: Mining bi-sets in numerical data. In: Dzeroski, S., Struyf, J. (eds.) KDID. Lecture Notes in Computer Science, vol. 4747, pp. 11–23. Springer (2007)
Blachon, S., Pensa, R., Besson, J., Robardet, C., Boulicaut, J.-F., Gandrillon, O.: Clustering formal concepts to discover biologically relevant knowledge from gene expression data. Silico Biology 7(4–5), 467–483 (2007)
Braga Araújo, R., Trielli Ferreira, G., Orair, G., Meira, J., Wagner, R., Ferreira, C., Olavo Guedes Neto, D., Zaki, M.: The partricluster algorithm for gene expression analysis. Int. J. Parallel Prog. 36, 226–249 (2008)
Califano, A., Stolovitzky, G., Tu, Y.: Analysis of gene expression microarrays for phenotype classification. In: Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), pp. 75–85. AAAI (2000)
Cerf, L., Besson, J., Robardet, C., Boulicaut, J.-F.: Closed patterns meet n-ary relations. TKDD 3(1), 3:1–3:36 (2009)
Cheng, Y., Church, G.: Biclustering of expression data. In: Proc. 8th International Conference on Intelligent Systems for Molecular Biology (ISBM), pp. 93–103 (2000)
Ganter, B., Wille, R.: Formal Concept Analysis. Springer (1999)
Hartigan, J.A.: Direct clustering of a data matrix. J. Am. Stat. Assoc. 67(337), 123–129 (1972)
Ignatov, D.I., Kuznetsov, S.O., Poelmans, J.: Concept-based biclustering for internet advertisement. In: Vreeken, J., Ling, C., Zaki, M.J., Siebes, A., Yu, J.X., Goethals, B., Webb, G.I., Wu, X. (eds.) ICDM Workshops, pp. 123–130. IEEE Computer Society (2012)
Jäschke, R., Hotho, A., Schmitz, C., Ganter, B., Stumme, G.: Trias—an algorithm for mining iceberg tri-lattices. In: ICDM, pp. 907–911 (2006)
Ji, L., Tan, K.-L., Tung, A.K.H.: Mining frequent closed cubes in 3d datasets. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), pp. 811–822. ACM (2006)
Kaytoue, M., Assaghir, Z., Napoli, A., Kuznetsov, S.O.: Embedding tolerance relations in formal concept analysis: an application in information fusion. In: CIKM, pp. 1689–1692. ACM (2010)
Kaytoue, M., Kuznetsov, S.O., Macko, J., Meira,W., Napoli, A.: Mining biclusters of similar values with Triadic Concept Analysis. In: Napoli, A., Vychodil, V. (eds.) The Eighth International Conference on Concept Lattices and their Applications—CLA 2011. INRIA Nancy Grand Est - LORIA, Nancy, France (2011)
Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Biclustering numerical data in formal concept analysis. In: Valtchev, P., Jäschke, R. (eds.) ICFCA. LNCS, vol. 6628, pp. 135–150. Springer (2011)
Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181(10), 1989–2001 (2011)
Kaytoue-Uberall, M., Duplessis, S., Kuznetsov, S.O., Napoli, A.: Two fca-based methods for mining gene expression data. In: Ferré, S., Rudolph, S. (eds.) Proceedings of the 7th International Conference on Formal Concept Analysis (ICFCA). Lecture Notes in Computer Science, vol. 5548, pp. 251–266. Springer (2009)
Krajca, P., Vychodil, V.: Distributed algorithm for computing formal concepts using map-reduce framework. In: IDA, pp. 333–344. Springer (2009)
Kuznetsov, S.O.: A fast algorithm for computing all intersections of objects in a finite semi-lattice. Autom. Doc. Math. Linguist. 27(5), 11–21 (1993)
Kuznetsov, S.O., Obiedkov, S.A.: Comparing performance of algorithms for generating concept lattices. J. Exp. Theor. Artif. Intell. 14(2–3), 189–216 (2002)
Lehmann, F., Wille, R.: A triadic approach to formal concept analysis. In: ICCS. LNCS, vol. 954, pp. 32–43. Springer (1995)
Madeira, S., Oliveira, A.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinforma. 1(1), 24–45 (2004)
Mirkin, B.: Mathematical Classification and Clustering. Kluwer Academic Publisher, Boston (1996)
Mirkin, B.: Clustering for Data Mining: A Data Recovery Approach. Chapman & Hall/Crc Computer Science (2005)
Mirkin, B., Kramarenko, A.V.: Approximate bicluster and tricluster boxes in the analysis of binary data. In: Kuznetsov, S.O., Slezak, D., Hepting, D.H., Mirkin, B. (eds.) Proceedings of the 13th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC 2011). Lecture Notes in Computer Science, vol. 6743, pp. 248–256. Springer (2011)
Motameny, S., Versmold, B., Schmutzler, R.: Formal concept analysis for the identification of combinatorial biomarkers in breast cancer. In: Medina, R., Obiedkov, S.A. (eds.) Formal Concept Analysis, 6th International Conference (ICFCA). Lecture Notes in Computer Science, vol. 4933, pp. 229–240. Springer (2008)
Pensa, R.G., Leschi, C., Besson, J., Boulicaut, J.-F.: Assessment of discretization techniques for relevant pattern discovery from gene expression data. In: Zaki, M.J., Morishita, S., Rigoutsos, I. (eds.) Proceedings of the 4th ACM SIGKDD Workshop on Data Mining in Bioinformatics (BIOKDD 2004), pp. 24–30 (2004)
Prelic, A., Bleuler, S., Zimmermann, P., Wille, A., Buhlmann, P., Gruissem, W., Hennig, L., Thiele, L., Zitzler, E.: A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22(9), 1122–1129 (2006)
Raïssi, C., Pei, J., Kister, T.: Computing closed skycubes. PVLDB 3(1), 838–847 (2010)
Soulet, A., Raïssi, C., Plantevit, M., Crémilleux, B.: Mining dominant patterns in the sky. In: Cook, D.J., Pei, J., Wang, W., Zaïane, O.R., Wu, X. (eds.) In: 11th IEEE International Conference on Data Mining (ICDM), pp. 655–664. IEEE (2011)
Tchagang, A.B., Phan, S., Famili, F., Shearer, H., Fobert, P.R., Huang, Y., Zou, J., Huang, D., Cutler, A., Liu, Z., Pan, Y.: Mining biological information from 3d short time-series gene expression data: the optricluster algorithm. BMC Bioinforma. 13, 54 (2012)
Valtchev, P., Missaoui, R., Godin, R.: Formal concept analysis for knowledge discovery and data mining: the new challenges. In: Eklund, P.W. (ed.) ICFCA. LNCS, vol. 2961, pp. 352–371. Springer (2004)
Voutsadakis, G.: Polyadic concept analysis. Order 19(3), 295–304 (2002)
Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (eds.) Ordered Sets, pp. 445–470. Reidel (1982)
Wille, R.: Why can concept lattices support knowledge discovery in databases? J. Exp. Theor. Artif. Intell. 14(2–3), 81–92 (2002)
Zhao, L., Zaki, M.J.: Tricluster: an effective algorithm for mining coherent clusters in 3d microarray data. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD ’05, pp. 694–705. ACM, New York, USA (2005)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Kaytoue, M., Kuznetsov, S.O., Macko, J. et al. Biclustering meets triadic concept analysis. Ann Math Artif Intell 70, 55–79 (2014). https://doi.org/10.1007/s10472-013-9379-1
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10472-013-9379-1
Keywords
- Numerical biclustering
- Similarity relation
- Formal concept analysis
- Triadic concept analysis
- N-ary relations