Non-redundant Subgroup Discovery in Large and Complex Data

  • Matthijs van Leeuwen
  • Arno Knobbe
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)

Abstract

Large and complex data is challenging for most existing discovery algorithms, for several reasons. First of all, such data leads to enormous hypothesis spaces, making exhaustive search infeasible. Second, many variants of essentially the same pattern exist, due to (numeric) attributes of high cardinality, correlated attributes, and so on. This causes top-k mining algorithms to return highly redundant result sets, while ignoring many potentially interesting results.

These problems are particularly apparent with Subgroup Discovery and its generalisation, Exceptional Model Mining. To address this, we introduce subgroup set mining: one should not consider individual subgroups, but sets of subgroups. We consider three degrees of redundancy, and propose corresponding heuristic selection strategies in order to eliminate redundancy. By incorporating these strategies in a beam search, the balance between exploration and exploitation is improved.
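To illustrate the idea of selecting a set of subgroups rather than the individual top-k, the following is a minimal sketch of one possible overlap-penalising selection step for a beam search. It is not the authors' implementation. The function name `select_diverse`, the down-weighting factor `alpha`, and the candidate representation (description, quality, set of covered row ids) are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical candidate representation: (description, quality, covered row ids).
Candidate = Tuple[str, float, Set[int]]


def select_diverse(candidates: List[Candidate],
                   beam_width: int,
                   alpha: float = 0.5) -> List[Candidate]:
    """Greedily pick up to beam_width candidates, penalising overlap in cover.

    Each row that is already covered k times by selected subgroups contributes
    a weight of alpha**k, so near-duplicates of chosen subgroups score lower.
    The value of alpha is an illustrative assumption.
    """
    cover_counts: Dict[int, int] = {}
    remaining = list(candidates)
    selected: List[Candidate] = []

    def score(candidate: Candidate) -> float:
        _, quality, cover = candidate
        if not cover:
            return 0.0
        # Average per-row weight, based on how often each row is already covered.
        weight = sum(alpha ** cover_counts.get(row, 0) for row in cover) / len(cover)
        return quality * weight

    while remaining and len(selected) < beam_width:
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
        # Update the cover counts so later picks are pushed towards new rows.
        for row in best[2]:
            cover_counts[row] = cover_counts.get(row, 0) + 1

    return selected
```

With the candidates `("x>5", 0.9, {1, 2, 3})`, `("x>4", 0.85, {1, 2, 3, 4})` and `("y=a", 0.6, {7, 8, 9})` and a beam width of 2, plain top-k by quality would keep the two near-identical conditions on `x`, whereas this sketch keeps `x>5` and `y=a`.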

Experiments clearly show that the proposed methods result in much more diverse subgroup sets than traditional Subgroup Discovery methods.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Matthijs van Leeuwen (1)
  • Arno Knobbe (2)
  1. Dept. of Information & Computing Sciences, Universiteit Utrecht, The Netherlands
  2. LIACS, Universiteit Leiden, The Netherlands
