- Jilles Vreeken, Max-Planck Institute for Informatics and Saarland University
- Nikolaj Tatti, HIIT, Department of Information and Computer Science, Aalto University
Pattern mining is one of the most important aspects of data mining. By far the most popular and well-known approach is frequent pattern mining, that is, discovering patterns that occur in many transactions. This approach has many virtues, including monotonicity (every subset of a frequent pattern is itself frequent), which allows efficient discovery of all frequent patterns. Nevertheless, in practice frequent pattern mining rarely gives good results: the number of discovered patterns is typically gargantuan and they are heavily redundant.
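As an illustration of how monotonicity enables efficient discovery, the following is a minimal level-wise (Apriori-style) sketch. The function name `frequent_itemsets`, the `min_support` parameter (an absolute count), and the toy data `TOY` are assumptions made for this example and do not come from the chapter.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = sorted({i for t in transactions for i in t})
    level = {}
    for i in items:
        s = support(frozenset([i]))
        if s >= min_support:
            level[frozenset([i])] = s

    result = {}
    while level:
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == len(a) + 1:
                # Monotonicity: a candidate can only be frequent if all of
                # its k-subsets are frequent, so prune it otherwise.
                if all(union - {x} in level for x in union):
                    candidates.add(union)
        # Count support only for the surviving candidates.
        next_level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                next_level[c] = s
        level = next_level
    return result


# Hypothetical toy data: five transactions over items a, b, c.
TOY = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
FREQUENT = frequent_itemsets(TOY, min_support=3)
```

On this toy data, all single items and all pairs reach the threshold, while the full set {a, b, c} occurs in only two transactions and is rejected.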
Consequently, a lot of research effort has been invested toward improving the quality of the discovered patterns. In this chapter we will give an overview of the interestingness measures and other redundancy reduction techniques that have been proposed to this end.
In particular, we first present classic techniques, such as closed and non-derivable itemsets, that are used to prune unnecessary itemsets. We then discuss techniques for ranking patterns by how expected their score is under a null hypothesis, considering patterns that deviate from this expectation to be interesting. These models can be either static or dynamic; in the dynamic case, we iteratively update the model as we discover new patterns. More generally, we also give a brief overview of pattern set mining techniques, where we measure quality over a set of patterns instead of individually. This setup gives us the freedom to explicitly punish redundancy, which leads to more to-the-point results.
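Two of these ideas can be sketched on toy data: keeping only closed itemsets, and scoring an itemset against a static independence model using lift. Lift is chosen here only as one simple example of a null-hypothesis score; the helper names `supports`, `closed_itemsets`, and `lift`, and the data `TOY`, are assumptions for this sketch and are not taken from the chapter.

```python
from itertools import combinations


def supports(transactions, min_support):
    """Brute-force support counts for all frequent itemsets (toy scale only)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    out = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            s = sum(1 for t in transactions if itemset <= t)
            if s >= min_support:
                out[itemset] = s
    return out


def closed_itemsets(freq):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        x: s
        for x, s in freq.items()
        if not any(x < y and s == sy for y, sy in freq.items())
    }


def lift(itemset, freq, n):
    """Observed frequency divided by the frequency expected if items were independent."""
    expected = 1.0
    for i in itemset:
        expected *= freq[frozenset([i])] / n
    return (freq[itemset] / n) / expected


# Hypothetical toy data: a and b always co-occur, c mostly appears alone.
TOY = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
FREQ = supports(TOY, min_support=1)
CLOSED = closed_itemsets(FREQ)
LIFT_AB = lift(frozenset("ab"), FREQ, len(TOY))
LIFT_AC = lift(frozenset("ac"), FREQ, len(TOY))
```

Here seven itemsets are frequent but only three are closed ({c}, {a, b}, and {a, b, c}), since, for instance, {a} always occurs together with b. The lift of {a, b} is above 1 (it occurs more often than independence predicts), while the lift of {a, c} is below 1.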
Keywords: Pattern mining, Interestingness measures, Statistics, Ranking, Pattern set mining
- Chapter Title: Interesting Patterns
- Book Title: Frequent Pattern Mining
- Pages: 105-134
- Publisher: Springer International Publishing
- Copyright Holder: Springer International Publishing Switzerland
- Editor Affiliations
- 1. IBM
- 2. University of Illinois at Urbana-Champaign
- Author Affiliations
- 3. Max-Planck Institute for Informatics and Saarland University, Saarbrücken, Germany
- 4. HIIT, Department of Information and Computer Science, Aalto University, Helsinki, Finland