Abstract
This paper presents an approach for characterizing groups of data represented by Boolean vectors. The purpose is to find a minimal set of attributes that makes it possible to distinguish data from different groups. In this work, we precisely define the multiple characterization problem and the algorithms that can be used to solve its different variants. Our data characterization approach is related to Logical Analysis of Data, and we therefore propose a comparison between these two methodologies. This paper also studies the properties of the computed solutions with regard to the topological properties of the instances. Experiments are conducted on real biological data.
Notes
Note that, for simplicity, we present the full array that contains the data matrix (which thus corresponds only to the Boolean part of the array).
Recall that Θ is a set of pairs of observations, defined in Section 4.1, used to index the rows of the constraint matrix C.
Cite this article
Chambon, A., Boureau, T., Lardeux, F. et al. Logical characterization of groups of data: a comparative study. Appl Intell 48, 2284–2303 (2018). https://doi.org/10.1007/s10489-017-1080-3
DOI: https://doi.org/10.1007/s10489-017-1080-3