Chapter

Frequent Pattern Mining

pp 165-198

Date:

Mining and Using Sets of Patterns through Compression

  • Matthijs van LeeuwenAffiliated withKU Leuven Email author 
  • , Jilles VreekenAffiliated withMax-Planck Institute for Informatics and Saarland University

* Final gross prices may vary according to local VAT.

Get Access

Abstract

In this chapter we describe how to successfully apply the MDL principle to pattern mining. In particular, we discuss how pattern-based models can be designed and induced by means of compression, resulting in succinct and characteristic descriptions of the data.

As motivation, we argue that traditional pattern mining asks the wrong question: instead of asking for all patterns satisfying some interestingness measure, one should ask for a small, non-redundant, and interesting set of patterns—which allows us to avoid the pattern explosion. Firmly rooted in algorithmic information theory, the approach we discuss in this chapter states that the best set of patterns is that set that compresses the data best. We formalize this problem using the Minimum Description Length (MDL) principle, describe useful model classes, and briefly discuss algorithmic approaches to inducing good models from data. Last but not least, we describe how the obtained models—in addition to showing the key patterns of the data—can be used for a wide range of data mining tasks; hence showing that MDL selects useful patterns.

Keywords

Compression MDL Pattern set mining Data summarization