ISMIS 1999: Foundations of Intelligent Systems, pp. 1-15
Applications and research problems of subgroup mining
Abstract
Knowledge Discovery in Databases (KDD) is a data analysis process which, in contrast to conventional data analysis, automatically generates and evaluates a very large number of hypotheses; deals with complex data, i.e. large, high-dimensional, multi-relational, dynamic, or heterogeneous data; and produces understandable results for those who “own the data”. With these objectives, subgroup mining searches for hypotheses that can be supported or confirmed by the given data and that are represented as a specialization of one of three general hypothesis types: deviating subgroups, associations between two subgroups, and partially ordered sets of subgroups, where the partial ordering usually relates to time. This paper gives a short introduction to the methods of subgroup mining. In particular, the main preprocessing, data mining, and postprocessing steps are discussed in more detail for two applications. We conclude with some problems in the current state of the art of subgroup mining.
Keywords
Data Mining, Domain Knowledge, Description Language, Hill Climbing, Inductive Logic Programming