Difference-Based Estimates for Generalization-Aware Subgroup Discovery

  • Florian Lemmerich
  • Martin Becker
  • Frank Puppe
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8190)


For the task of subgroup discovery, generalization-aware interestingness measures, which are based not only on the statistics of the patterns themselves but also on the statistics of their generalizations, have recently been shown to be essential. A key technique for improving the runtime performance of subgroup discovery algorithms is the application of optimistic estimates to limit the size of the search space. These are upper bounds for the interestingness that any specialization of the currently evaluated pattern may have. Until now, these estimates have been based on the anti-monotonicity of the instances covered by the current pattern. This neglects important properties of generalizations. Therefore, we present in this paper a new scheme for deriving optimistic estimates for generalization-aware subgroup discovery, which is based on the instances by which patterns differ from their generalizations. We show how this technique can be applied to the most popular interestingness measures for binary as well as numeric target concepts. The novel bounds are incorporated in an efficient algorithm, which outperforms previous methods by up to an order of magnitude.
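
As a rough illustration of what an optimistic estimate is, the following sketch shows the conventional coverage-based kind of bound that the abstract contrasts against, not the difference-based bounds proposed in the paper. It assumes the common binary quality function n^a · (p − p0) with 0 ≤ a ≤ 1; the function names, the toy labels, the target share `p0`, and the threshold `min_quality` are all hypothetical.

```python
from itertools import combinations

def quality(n, pos, p0, a=0.5):
    """Binary quality n^a * (p - p0); defined as 0 for an empty subgroup."""
    if n == 0:
        return 0.0
    return n ** a * (pos / n - p0)

def optimistic_estimate(pos, p0, a=0.5):
    """Coverage-based bound for 0 <= a <= 1: the best conceivable
    specialization keeps all positives and drops all negatives."""
    return pos ** a * (1.0 - p0)

# Hypothetical toy data: target labels of the instances covered by a pattern P.
covered = [1, 1, 0, 1, 0, 0]
p0 = 0.3  # assumed share of positives in the whole dataset

bound = optimistic_estimate(sum(covered), p0)

# Any specialization covers a subset of P's instances, so brute-force
# over all subsets to confirm that none exceeds the bound.
best = max(
    quality(len(s), sum(s), p0)
    for k in range(len(covered) + 1)
    for s in combinations(covered, k)
)

# Pruning rule: if even the bound is below the required quality,
# no specialization of P needs to be explored.
min_quality = 1.5  # hypothetical threshold, e.g. the k-th best quality so far
can_prune = bound < min_quality
```

Because a generalization-aware measure subtracts the maximum target share over all generalizations, which is at least `p0`, this bound remains valid for such measures but can be loose, which is the gap the paper's difference-based estimates aim to close.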


Keywords: Association Rule · Quality Function · Optimistic Estimate · Target Concept · Positive Instance



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Florian Lemmerich (1)
  • Martin Becker (1)
  • Frank Puppe (1)
  1. Artificial Intelligence and Applied Computer Science Group, University of Würzburg, Würzburg, Germany
