Abstract
We introduce the problem of cluster-grouping and show that it integrates several important data mining tasks, namely subgroup discovery, correlated pattern mining, and aspects of clustering. Cluster-grouping can be regarded as a new type of inductive optimization query that asks for the k best patterns according to a convex criterion. We present the algorithm CG for solving cluster-grouping problems and discuss its underlying mechanisms. The approach is evaluated experimentally on a number of real-life data sets. The results indicate that CG improves upon the subgroup discovery algorithm CN2-WRAcc and is competitive with the clustering algorithm CobWeb.
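To make the idea of a "k best patterns" query concrete, here is a minimal sketch. It is not the CG algorithm: it enumerates candidate patterns by brute force instead of pruning the search, and it assumes weighted relative accuracy (the measure named in CN2-WRAcc) as the quality criterion, in its standard form p(cond) · (p(target | cond) − p(target)). The function names `wracc` and `top_k_patterns`, the dictionary-based row format, and the `max_len` parameter are all hypothetical choices made for illustration.

```python
from itertools import combinations
import heapq


def wracc(rows, target, pattern):
    """Weighted relative accuracy of `pattern` with respect to a binary target.

    Computes p(cond) * (p(target | cond) - p(target)), where `cond` holds for
    a row when every attribute-value pair in `pattern` appears in that row.
    """
    n = len(rows)
    # Target labels of the rows covered by the pattern.
    covered = [t for r, t in zip(rows, target) if pattern.items() <= r.items()]
    if not covered:
        return 0.0
    p_cond = len(covered) / n
    p_target = sum(target) / n
    p_target_given_cond = sum(covered) / len(covered)
    return p_cond * (p_target_given_cond - p_target)


def top_k_patterns(rows, target, k, max_len=2):
    """Return the k highest-scoring conjunctive patterns (brute force).

    Assumption: patterns are conjunctions of at most `max_len`
    attribute-value pairs, with each attribute used at most once.
    """
    items = sorted({(a, v) for r in rows for a, v in r.items()})
    scored = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            # Skip conjunctions that test the same attribute twice.
            if len({a for a, _ in combo}) < size:
                continue
            scored.append((wracc(rows, target, dict(combo)), combo))
    return heapq.nlargest(k, scored, key=lambda s: s[0])
```

Called as `top_k_patterns(rows, target, k=3)` on a list of attribute-value dictionaries and a parallel list of 0/1 labels, it returns `(score, pattern)` pairs in descending order of score. Exhaustive enumeration grows exponentially with `max_len`, so the sketch only illustrates what such a query asks for, not how to answer it efficiently.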
A 3-page abstract of this paper appeared as Albrecht Zimmermann, Luc De Raedt: Cluster-Grouping: From Subgroup Discovery to Clustering. ECML 2004: 575–577.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zimmermann, A., De Raedt, L. (2006). Inductive Querying for Discovering Subgroups and Clusters. In: Boulicaut, J.-F., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_18
DOI: https://doi.org/10.1007/11615576_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31331-1
Online ISBN: 978-3-540-31351-9
eBook Packages: Computer Science