Knowledge discovery with clustering based on rules. Interpreting results
It is clear that the analysis of complex systems is nowadays an important challenge in Statistics, Artificial Intelligence, Information Systems, Data Visualization, and other fields.
Describing the structure of complex systems, or obtaining knowledge from them, is known to be a difficult task. Combining Data Analysis techniques (including clustering), Inductive Learning (knowledge-based systems), Database Management and Multidimensional Graphical Representation should bring benefits to this field.
Clustering based on rules (CBR) is a methodology developed to find the structure of complex domains; it performs better than traditional clustering algorithms or knowledge-based systems approaches. In our proposal, a combination of clustering and inductive learning is focused on the problem of finding and interpreting special patterns (or concepts) in large databases, in order to extract useful knowledge to represent real-world domains. This methodology and its behaviour as a Knowledge Discovery tool have, in fact, been presented in previous papers (,
The aim of this paper is to emphasize the reporting phase. Some tools oriented to the interpretation of the clusters are presented; automatic rule generation is presented and applied to a real research case. Indeed, in a KD system, data preparation and interpretation of the results are as important as the analysis itself. In this paper, missing data treatment is analysed, and a statistical test, based on nonparametric techniques, for comparing several classifications is presented. A method for finding characteristic values of the classes, based on the prototype of each class, is also presented. Finally, these characterizations allow automatic generation of decision rules, as a predictive tool for future items.
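The abstract does not spell out how characteristic values are computed or how rules are derived from them. As a rough illustration only, the sketch below assumes that a value is characteristic of a class when it occurs in that class and in no other, and that missing values are simply skipped. The function names, this definition, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def characteristic_values(items, labels):
    """Map (variable, value) -> class for values observed in exactly one class.

    Assumption: "characteristic" means exclusive to a single class.
    """
    seen = defaultdict(set)  # (variable, value) -> classes where it occurs
    for item, label in zip(items, labels):
        for variable, value in item.items():
            if value is not None:  # assumption: missing values are ignored
                seen[(variable, value)].add(label)
    return {key: next(iter(classes))
            for key, classes in seen.items() if len(classes) == 1}

def generate_rules(char_values):
    """Render each characteristic value as a readable decision rule."""
    return [f"IF {var} = {val} THEN class {cls}"
            for (var, val), cls in sorted(char_values.items())]

def predict(item, char_values):
    """Return the class supported by the most matching rules, or None."""
    votes = defaultdict(int)
    for variable, value in item.items():
        cls = char_values.get((variable, value))
        if cls is not None:
            votes[cls] += 1
    return max(votes, key=votes.get) if votes else None

# Hypothetical toy data: four items, two variables, two classes.
items = [
    {"v1": "high", "v2": "yes"},
    {"v1": "high", "v2": "no"},
    {"v1": "low", "v2": "no"},
    {"v1": "low", "v2": None},  # v2 is missing for this item
]
labels = ["A", "A", "B", "B"]

char_values = characteristic_values(items, labels)
rules = generate_rules(char_values)
```

Here `v2 = no` appears in both classes, so it yields no rule, while `v1 = low` alone is enough to assign a new item to class `B`. A real system would likely use a softer criterion than strict exclusivity; this choice only keeps the sketch short.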
Keywords: Combining many methods in one system; statistical tests in KDD applications; medicine: diagnosis and prognosis; from concept learning to concept discovery; prior domain knowledge and use of discovered knowledge
- 1. Fayyad, U., et al. From Data Mining to Knowledge Discovery: An overview. In: Fayyad, U., et al. (Eds.), Advances in KD and DM. AAAI/MIT, 1996.
- 2. Gibert, K., Cortés, U. (98) Clustering based on rules and knowledge discovery in ill-structured domains, Computación y Sistemas, México, 1998 (in press).
- 3. Gibert, K., Cortés, U. Weighing quantitative and qualitative variables in clustering methods, MATH-WARE 10(4), January 1997.
- 4. -Combining a knowledge based system with a clustering method for an inductive construction of models. In: P. Cheeseman et al. (Eds.), Selecting Models from Data: AI and Statistics IV, LNS no 89 (Springer-Verlag, New York, 1994) 351–360.
- 5. Gibert, K., Sonicki, Z. (97) Classification based on rules and medical research. Proc. Applied Stochastic Models and Data Analysis. Ed. Lauro et al., Napoli. pp. 181–186.
- 6. Gower, J. C., A general coefficient for similarity, Biometrics, (27) 857–872.
- 7. Lebart, L. et al. Traitement statistique des données. Dunod, Paris.
- 8. Nakhaeizadeh, G. Classification as a subtask of Data Mining: experiences from some industrial projects. In IFCS’96. Kobe, Japan (in press). pp. 17–20.
- 9. Sonicki, Z. et al. (93) The use of induction in routine laboratory diagnostics of thyroid, LIJECNICKI VJESNIK 115, pp. 306–309 (in Croatian).