RealKrimp — Finding Hyperintervals that Compress with MDL for Real-Valued Data

  • Jouke Witteveen
  • Wouter Duivesteijn
  • Arno Knobbe
  • Peter Grünwald
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8819)

Abstract

The MDL Principle (induction by compression) is applied with meticulous effort in the Krimp algorithm for the problem of itemset mining, where one seeks exceptionally frequent patterns in a binary dataset. As is the case with many algorithms in data mining, Krimp is not designed to cope with real-valued data, and it is not able to handle such data natively. Inspired by Krimp’s success at using the MDL Principle in itemset mining, we develop RealKrimp: an MDL-based Krimp-inspired mining scheme that seeks exceptionally high-density patterns in a real-valued dataset. We review how to extend the underlying Kraft inequality, which relates probabilities to codelengths, to real-valued data. Based on this extension we introduce the RealKrimp algorithm: an efficient method to find hyperintervals that compress the real-valued dataset, without the need for pre-algorithm data discretization.
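To make the link between probabilities and codelengths concrete, the following is a minimal sketch, not the RealKrimp algorithm itself. It assumes the standard textbook correspondence L(x) = -log2 P(x) for discrete outcomes and approximates a real-valued point by a cell of side `precision`, whose probability is roughly the density times the cell volume. All function names, the example densities, and the chosen precision are hypothetical and used for illustration only.

```python
import math


def codelength_bits(probability):
    """Codelength in bits assigned to an outcome of the given probability."""
    return -math.log2(probability)


def kraft_sum(codelengths):
    """Left-hand side of the Kraft inequality for a binary code."""
    return sum(2.0 ** -length for length in codelengths)


def real_valued_codelength_bits(density, precision, dimensions=1):
    """Approximate bits needed to encode one point up to `precision`.

    Assumption: the density is roughly constant within a cell, so the cell
    has probability density * precision ** dimensions.
    """
    return -math.log2(density * precision ** dimensions)


# Discrete case: codelengths derived from probabilities satisfy Kraft.
probabilities = [0.5, 0.25, 0.125, 0.125]
lengths = [codelength_bits(p) for p in probabilities]
total = kraft_sum(lengths)

# Real-valued case (illustrative numbers): a uniform density of 1/8
# versus a hypothetical high-density region with density 2.
precision = 2 ** -4
baseline_bits = real_valued_codelength_bits(1 / 8, precision)
dense_bits = real_valued_codelength_bits(2.0, precision)
saving_bits = baseline_bits - dense_bits
```

Under these assumptions the saving per point equals log2 of the density ratio, because the precision term cancels when two codelengths are subtracted. This is one way to see why a high-density region can shorten the description of the points it covers regardless of the precision chosen.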

Keywords

Minimum Description Length; Information Theory; Real-Valued Data; RealKrimp

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jouke Witteveen (1)
  • Wouter Duivesteijn (2)
  • Arno Knobbe (3)
  • Peter Grünwald (4)

  1. ILLC, University of Amsterdam, The Netherlands
  2. Fakultät für Informatik, LS VIII, TU Dortmund, Germany
  3. LIACS, Leiden University, The Netherlands
  4. CWI and Leiden University, The Netherlands
