Data Mining and Knowledge Discovery, Volume 23, Issue 1, pp 169–214

Krimp: mining itemsets that compress

  • Jilles Vreeken
  • Matthijs van Leeuwen
  • Arno Siebes
Open Access
Article

DOI: 10.1007/s10618-010-0202-x

Cite this article as:
Vreeken, J., van Leeuwen, M. & Siebes, A. Krimp: mining itemsets that compress. Data Min Knowl Disc (2011) 23: 169. doi:10.1007/s10618-010-0202-x

Abstract

One of the major problems in pattern mining is the explosion of the number of results. Tight constraints reveal only common knowledge, while loose constraints lead to an explosion in the number of returned patterns. This is caused by large groups of patterns essentially describing the same set of transactions. In this paper we approach this problem using the MDL principle: the best set of patterns is the set that compresses the database best. For this task we introduce the Krimp algorithm. Experimental evaluation shows that typically only hundreds of itemsets are returned: a dramatic reduction, of up to seven orders of magnitude, in the number of frequent itemsets. These selections, called code tables, are of high quality. This is shown with compression ratios, swap-randomisation, and the accuracies of the code table-based Krimp classifier, all obtained on a wide range of datasets. Further, we extensively evaluate the heuristic choices made in the design of the algorithm.
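
To make the MDL idea in the abstract concrete, the following is a minimal sketch, not the paper's exact encoding or the Krimp algorithm itself. It scores a candidate code table by the number of bits needed to describe the table plus the database encoded with it. The function names, the toy database, and the simplifying assumptions (the code table is a list of itemsets already in priority order, it contains every singleton, and code lengths are Shannon-optimal for the observed usages) are all illustrative choices.

```python
from collections import Counter
from math import log2


def cover(transaction, code_table):
    """Greedily cover a transaction with disjoint itemsets, in code-table order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:  # earlier itemsets take priority (assumed order)
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    assert not remaining, "code table must contain every singleton"
    return used


def usages(database, code_table):
    """Count how often each itemset is used when covering the whole database."""
    counts = Counter()
    for transaction in database:
        counts.update(cover(transaction, code_table))
    return counts


def total_size(database, code_table):
    """Bits for the code table plus bits for the database encoded with it."""
    # Baseline code lengths from the singleton-only table, used to spell out
    # the itemsets stored in the code table.
    items = sorted({i for t in database for i in t})
    singletons = [frozenset([i]) for i in items]
    base_usage = usages(database, singletons)
    base_total = sum(base_usage.values())
    base_len = {s: -log2(base_usage[s] / base_total) for s in singletons}

    usage = usages(database, code_table)  # only itemsets with usage > 0
    total = sum(usage.values())
    code_len = {x: -log2(u / total) for x, u in usage.items()}

    data_bits = sum(u * code_len[x] for x, u in usage.items())
    model_bits = sum(
        sum(base_len[frozenset([i])] for i in x) + code_len[x] for x in usage
    )
    return data_bits + model_bits


# Toy database (hypothetical): items a, b, c.
db = [frozenset("abc")] * 4 + [frozenset("ab")] * 2 + [frozenset("c")] * 2
st = [frozenset("a"), frozenset("b"), frozenset("c")]  # singletons only
ct = [frozenset("ab")] + st                            # adds the pattern {a, b}

print(total_size(db, st), total_size(db, ct))
```

On this toy database the table that includes {a, b} yields the smaller total, so under the MDL criterion it is the better description of the data.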

Keywords

MDL, Pattern mining, Pattern selection, Itemsets

Copyright information

© The Author(s) 2010

Authors and Affiliations

  • Jilles Vreeken (1, 2)
  • Matthijs van Leeuwen (1)
  • Arno Siebes (1)

  1. Algorithmic Data Analysis, Department of Information and Computing Sciences, Faculty of Science, Universiteit Utrecht, Utrecht, The Netherlands
  2. ADReM, Department of Mathematics and Computer Science, Faculty of Science, University of Antwerp, Antwerp, Belgium
