Discovering Skylines of Subgroup Sets

  • Matthijs van Leeuwen
  • Antti Ukkonen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8190)

Abstract

Many tasks in exploratory data mining aim to discover the top-k results with respect to a certain interestingness measure. Unfortunately, in practice top-k solution sets are hardly satisfactory, if only because redundancy in such results is a severe problem. To address this, a recent trend is to find diverse sets of high-quality patterns. However, a ‘perfect’ diverse top-k cannot possibly exist, since there is an inherent trade-off between quality and diversity.

We argue that the best way to deal with the quality-diversity trade-off is to explicitly consider the Pareto front, or skyline, of non-dominated solutions, i.e. those solutions for which neither quality nor diversity can be improved without degrading the other quantity. In particular, we focus on k-pattern set mining in the context of Subgroup Discovery [6]. For this setting, we present two algorithms for the discovery of skylines: an exact algorithm and a levelwise heuristic.
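To make the notion of a skyline concrete, here is a minimal sketch of a non-dominated filter over two objectives. It is not the exact or levelwise algorithm of the paper; it is a generic sort-and-sweep filter. It assumes each candidate subgroup set has already been scored with a hypothetical (quality, diversity) pair, and that both objectives are to be maximised. The function name `skyline` and the example scores are illustrative assumptions only.

```python
from typing import List, Tuple

# Hypothetical representation: (quality, diversity), both to be maximised.
Point = Tuple[float, float]


def skyline(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, ordered by decreasing quality."""
    # Sort by quality descending; break ties by diversity descending.
    # Exact duplicates are collapsed first via set().
    ordered = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    front: List[Point] = []
    best_diversity = float("-inf")
    for quality, diversity in ordered:
        # Every point seen so far has quality >= this one, so this point
        # is non-dominated only if it strictly improves on diversity.
        if diversity > best_diversity:
            front.append((quality, diversity))
            best_diversity = diversity
    return front


# Made-up scores for six candidate subgroup sets.
candidates = [(0.9, 0.1), (0.8, 0.4), (0.7, 0.3),
              (0.5, 0.9), (0.5, 0.6), (0.2, 0.9)]
print(skyline(candidates))
```

In this toy input, (0.7, 0.3) is dropped because (0.8, 0.4) is better on both objectives, while the three surviving points each represent a different quality-diversity trade-off.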

We evaluate the performance of the two proposed skyline algorithms, and the accuracy of the levelwise method. Furthermore, we show that the skylines can be used for the objective evaluation of subgroup set heuristics. Finally, we show characteristics of the obtained skylines, which reveal that different quality-diversity trade-offs result in clearly different subgroup sets. Hence, the discovery of skylines is an important step towards a better understanding of ‘diverse top-k’s’.

References

  1. Bringmann, B., Nijssen, S., Tatti, N., Vreeken, J., Zimmermann, A.: Mining sets of patterns: Next generation pattern mining. In: Tutorial at ICDM 2011 (2011)
  2. Bringmann, B., Zimmermann, A.: The chosen few: On identifying valuable patterns. In: Proceedings of the ICDM 2007, pp. 63–72 (2007)
  3. Duivesteijn, W., Knobbe, A.: Exploiting false discoveries – statistical validation of patterns and quality measures in subgroup discovery. In: Proceedings of the ICDM 2011, pp. 151–160 (2011)
  4. Ehrgott, M., Gandibleux, X.: A survey and annotated bibliography of multiobjective combinatorial optimization. OR Spektrum (2000)
  5. Ehrgott, M., Gandibleux, X.: Approximative solution methods for multiobjective combinatorial optimization. TOP: An Official Journal of the Spanish Society of Statistics and Operations Research 12(1), 1–63 (2004)
  6. Klösgen, W.: Explora: A Multipattern and Multistrategy Discovery Assistant. In: Advances in Knowledge Discovery and Data Mining, pp. 249–271 (1996)
  7. Knobbe, A., Ho, E.K.Y.: Maximally informative k-itemsets and their efficient discovery. In: Proceedings of the KDD 2006, pp. 237–244 (2006)
  8. Knobbe, A.J., Ho, E.K.Y.: Pattern teams. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 577–584. Springer, Heidelberg (2006)
  9. Kung, H.T., Luccio, F., Preparata, F.P.: On finding the maxima of a set of vectors. J. ACM 22(4), 469–476 (1975)
  10. van Leeuwen, M., Knobbe, A.: Diverse subgroup set discovery. Data Mining and Knowledge Discovery 25, 208–242 (2012)
  11. Markowitz, H.: Portfolio selection. The Journal of Finance 7(1), 77–91 (1952)
  12. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1226–1238 (2005)
  13. Soulet, A., Raïssi, C., Plantevit, M., Crémilleux, B.: Mining dominant patterns in the sky. In: Proceedings of the ICDM 2011, pp. 655–664 (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Matthijs van Leeuwen (1)
  • Antti Ukkonen (2)

  1. Department of Computer Science, KU Leuven, Belgium
  2. Helsinki Institute for Information Technology HIIT, Aalto University, Finland
