Recent Developments in Pattern Mining
Pattern mining is one of the most researched topics in the data mining community. Literally hundreds of algorithms for efficiently enumerating all frequent itemsets have been proposed. These exhaustive algorithms, however, all suffer from the pattern explosion problem: depending on the minimal support threshold, millions of patterns may be generated even for moderately sized databases. Although this problem is by now well recognized in the pattern mining community, it has not yet been solved satisfactorily. In my talk I will give an overview of the different approaches that have been proposed to alleviate this problem. As a first step, constraint-based mining and condensed representations such as the closed itemsets and the non-derivable itemsets were introduced. These methods, however, still produce too many results, many of them redundant. More recently, promising methods based upon the minimum description length principle, information theory, and statistical models have been introduced. I will show the respective advantages and disadvantages of these approaches and their connections, and illustrate their usefulness on real-life data. After this overview I will move from itemsets to more complex patterns, such as sequences and graphs. Even though these extensions seem trivial at first, they turn out to be quite challenging. I will end my talk with an overview of what I consider to be important open questions in this fascinating research area.