Inductive Querying for Discovering Subgroups and Clusters

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3848)

Abstract

We introduce the problem of cluster-grouping and show that it integrates several important data mining tasks, namely subgroup discovery, mining correlated patterns, and aspects of clustering. The problem of cluster-grouping can be regarded as a new type of inductive optimization query that asks for the k best patterns according to a convex criterion. The algorithm CG for solving cluster-grouping problems is presented and the underlying mechanisms are discussed. The approach is experimentally evaluated on a number of real-life data sets. The results indicate that the algorithm improves upon the subgroup discovery algorithm CN2-WRAcc and is competitive with the clustering algorithm CobWeb.

A 3-page abstract of this paper appeared as Albrecht Zimmermann, Luc De Raedt: Cluster-Grouping: From Subgroup Discovery to Clustering. ECML 2004: 575–577.
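
To make the idea of a query for the k best patterns under a convex criterion more concrete, here is a minimal sketch. It is not the CG algorithm from the paper. It assumes weighted relative accuracy (WRAcc) as the criterion, a single boolean target, and transactions given as sets of items; the names `wracc`, `optimistic_wracc`, and `top_k_patterns` are hypothetical. Because WRAcc is linear (hence convex) in the numbers of covered positives and negatives, the best score any specialization of a pattern can reach is bounded by keeping all covered positives and dropping all covered negatives, which allows branch-and-bound pruning.

```python
from heapq import heappush, heappushpop
from itertools import count


def wracc(p, n, P, N):
    """WRAcc of a pattern covering p positives and n negatives,
    in a data set with P positives among N examples."""
    covered = p + n
    if covered == 0:
        return 0.0
    return (covered / N) * (p / covered - P / N)


def optimistic_wracc(p, n, P, N):
    """Upper bound on WRAcc over all specializations of a pattern:
    a specialization covers at most p positives and at least 0 negatives."""
    return wracc(p, 0, P, N)


def top_k_patterns(rows, labels, k):
    """Return the k best item sets by WRAcc as (score, pattern) pairs.

    rows   -- list of sets of items (assumed input format)
    labels -- list of booleans, True for the target class
    """
    N, P = len(rows), sum(labels)
    items = sorted({item for row in rows for item in row})
    best = []      # min-heap of (score, tiebreak, pattern)
    tie = count()  # avoids comparing patterns when scores are equal

    def search(pattern, start, covered):
        for idx in range(start, len(items)):
            new_cov = [j for j in covered if items[idx] in rows[j]]
            p = sum(labels[j] for j in new_cov)
            n = len(new_cov) - p
            cand = pattern + (items[idx],)
            score = wracc(p, n, P, N)
            if len(best) < k:
                heappush(best, (score, next(tie), cand))
            elif score > best[0][0]:
                heappushpop(best, (score, next(tie), cand))
            # Expand only if some specialization could still beat the k-th best.
            threshold = best[0][0] if len(best) == k else float("-inf")
            if optimistic_wracc(p, n, P, N) > threshold:
                search(cand, idx + 1, new_cov)

    search((), 0, list(range(N)))
    return sorted(((s, pat) for s, _, pat in best), reverse=True)
```

Calling `top_k_patterns(rows, labels, k)` on a small list of item sets returns the highest-scoring patterns first. The pruning threshold rises as better patterns are found, so larger parts of the search space are skipped as the search proceeds.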

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zimmermann, A., De Raedt, L. (2006). Inductive Querying for Discovering Subgroups and Clusters. In: Boulicaut, J.-F., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_18

  • DOI: https://doi.org/10.1007/11615576_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31331-1

  • Online ISBN: 978-3-540-31351-9

  • eBook Packages: Computer Science (R0)
