Inductive Querying for Discovering Subgroups and Clusters

Zimmermann, Albrecht; De Raedt, Luc

doi:10.1007/11615576_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3848)

Abstract

We introduce the problem of cluster-grouping and show that it integrates several important data mining tasks, namely subgroup discovery, mining correlated patterns, and aspects of clustering. The problem of cluster-grouping can be regarded as a new type of inductive optimization query that asks for the k best patterns according to a convex criterion. The algorithm CG for solving cluster-grouping problems is presented and the underlying mechanisms are discussed. The approach is experimentally evaluated on a number of real-life data sets. The results indicate that the algorithm improves upon the subgroup discovery algorithm CN2-WRAcc and is competitive with the clustering algorithm CobWeb.
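To illustrate what a "k best patterns according to a convex criterion" query can look like, the following is a minimal sketch and not the authors' CG algorithm. It assumes a single Boolean target, uses weighted relative accuracy (WRAcc) as the quality measure, and prunes with the bound that follows from convexity: no specialization of a pattern can score higher than one that keeps all covered positives and drops all covered negatives. The function names (`wracc`, `upper_bound`, `top_k_patterns`) and the toy data are hypothetical.

```python
from heapq import heappush, heappushpop
from itertools import count


def wracc(n_cov, n_pos_cov, n_total, n_pos):
    """Weighted relative accuracy: p(cond) * (p(target | cond) - p(target))."""
    if n_cov == 0:
        return 0.0
    return (n_cov / n_total) * (n_pos_cov / n_cov - n_pos / n_total)


def upper_bound(n_pos_cov, n_total, n_pos):
    """Best WRAcc reachable by any specialization: it covers exactly the
    positives the current pattern covers and none of the negatives."""
    return (n_pos_cov / n_total) * (1 - n_pos / n_total)


def top_k_patterns(rows, labels, k):
    """rows: list of dicts (attribute -> value); labels: list of bool.
    Returns up to k (score, pattern) pairs, best first."""
    n_total, n_pos = len(rows), sum(labels)
    items = sorted({(a, v) for row in rows for a, v in row.items()})
    best = []            # min-heap of (score, tiebreak, pattern)
    tiebreak = count()   # unique counter so patterns are never compared

    def threshold():
        # Score a new pattern must beat once k patterns have been found.
        return best[0][0] if len(best) == k else float("-inf")

    def expand(pattern, covered, start):
        for i in range(start, len(items)):
            attr, val = items[i]
            if any(a == attr for a, _ in pattern):
                continue  # at most one condition per attribute
            new_cov = [j for j in covered if rows[j].get(attr) == val]
            if not new_cov:
                continue
            pos_cov = sum(labels[j] for j in new_cov)
            new_pattern = pattern + ((attr, val),)
            score = wracc(len(new_cov), pos_cov, n_total, n_pos)
            entry = (score, next(tiebreak), new_pattern)
            if len(best) < k:
                heappush(best, entry)
            elif score > threshold():
                heappushpop(best, entry)
            # Descend only if some specialization could still enter the top k.
            if upper_bound(pos_cov, n_total, n_pos) > threshold():
                expand(new_pattern, new_cov, i + 1)

    expand((), list(range(n_total)), 0)
    return sorted(((s, p) for s, _, p in best), key=lambda e: e[0], reverse=True)


if __name__ == "__main__":
    rows = [
        {"color": "red", "size": "big"},
        {"color": "red", "size": "small"},
        {"color": "blue", "size": "big"},
        {"color": "blue", "size": "small"},
    ]
    labels = [True, True, False, False]
    print(top_k_patterns(rows, labels, k=1))
```

The threshold rises as better patterns are found, so more of the search space is pruned as the search proceeds. A criterion defined over several target attributes would need its own bound.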

A 3-page abstract of this paper appeared as Albrecht Zimmermann, Luc De Raedt: Cluster-Grouping: From Subgroup Discovery to Clustering. ECML 2004: 575–577.

Author information

Authors and Affiliations

Chair of Machine Learning, Institute of Computer Science, Albert-Ludwigs-University, Freiburg, Georges-Köhler-Allee 079, 79110, Freiburg, Germany
Albrecht Zimmermann & Luc De Raedt

Editor information

Editors and Affiliations

INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut
Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Heverlee, Belgium
Luc De Raedt
HIIT, Helsinki University of Technology and University of Helsinki, Finland
Heikki Mannila

About this paper

Cite this paper

Zimmermann, A., De Raedt, L. (2006). Inductive Querying for Discovering Subgroups and Clusters. In: Boulicaut, J.-F., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_18

DOI: https://doi.org/10.1007/11615576_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31331-1
Online ISBN: 978-3-540-31351-9
eBook Packages: Computer Science, Computer Science (R0)
