RealKrimp — Finding Hyperintervals that Compress with MDL for Real-Valued Data
The MDL Principle (induction by compression) is applied meticulously in the Krimp algorithm for the problem of itemset mining, where one seeks exceptionally frequent patterns in a binary dataset. As is the case with many data mining algorithms, Krimp is not designed for real-valued data and cannot handle such data natively. Inspired by Krimp’s success at using the MDL Principle in itemset mining, we develop RealKrimp: an MDL-based, Krimp-inspired mining scheme that seeks exceptionally high-density patterns in a real-valued dataset. We review how to extend the underlying Kraft inequality, which relates probabilities to codelengths, to real-valued data. Based on this extension, we introduce the RealKrimp algorithm: an efficient method to find hyperintervals that compress the real-valued dataset, without having to discretize the data beforehand.
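To make the link between probabilities, codelengths and compression concrete, here is a minimal, generic MDL sketch. It is not RealKrimp's actual encoding. It assumes points in the unit hypercube [0,1]^d, each coordinate discretized to a hypothetical `precision` δ, so a point whose density is f(x) falls in a cell of probability about f(x)·δ^d and costs about -log2 f(x) - d·log2 δ bits. The names `Box`, `box_codelength` and `uniform_codelength` and the two-region density (q/v inside a hyperinterval of volume v holding a fraction q of the points, (1-q)/(1-v) outside) are illustrative assumptions. The cost of describing the hyperinterval itself is ignored.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a hyperinterval is one (low, high) pair per dimension.
Box = List[Tuple[float, float]]


def kraft_sum(lengths: Sequence[float]) -> float:
    """Kraft sum: a prefix code with these codelengths exists iff the sum is <= 1."""
    return sum(2.0 ** -l for l in lengths)


def shannon_lengths(probs: Sequence[float]) -> List[int]:
    """Integer Shannon codelengths ceil(-log2 P(x)); these always satisfy Kraft."""
    return [math.ceil(-math.log2(p)) for p in probs]


def in_box(x: Sequence[float], box: Box) -> bool:
    """True if point x lies inside the closed hyperinterval."""
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box))


def box_volume(box: Box) -> float:
    vol = 1.0
    for lo, hi in box:
        vol *= hi - lo
    return vol


def uniform_codelength(points: Sequence[Sequence[float]], d: int, precision: float) -> float:
    """Bits for points in [0,1]^d under the uniform density f(x) = 1.

    Each cell has probability about precision**d, i.e. -d*log2(precision) bits per point.
    """
    return len(points) * (-d * math.log2(precision))


def box_codelength(points: Sequence[Sequence[float]], box: Box, precision: float) -> float:
    """Bits under a two-region density: q/v inside the box, (1-q)/(1-v) outside.

    q is the fraction of points inside the box and v its volume (assumed 0 < v < 1).
    The model cost of describing the box itself is NOT included (simplifying assumption).
    """
    d = len(box)
    q = sum(1 for p in points if in_box(p, box)) / len(points)
    v = box_volume(box)
    bits = 0.0
    for p in points:
        f = q / v if in_box(p, box) else (1 - q) / (1 - v)
        bits += -math.log2(f) - d * math.log2(precision)
    return bits


if __name__ == "__main__":
    # Discrete case: Shannon lengths for (1/2, 1/4, 1/4) are (1, 2, 2); Kraft sum is 1.
    print(kraft_sum(shannon_lengths([0.5, 0.25, 0.25])))  # 1.0

    # Real-valued case: three of four points fall in a box covering a quarter of [0,1]^2.
    points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (0.9, 0.8)]
    box = [(0.0, 0.5), (0.0, 0.5)]
    delta = 2 ** -10
    gain = uniform_codelength(points, 2, delta) - box_codelength(points, box, delta)
    print(round(gain, 2))  # 3.17 bits saved (= 2 * log2(3))
```

The saving equals n times the Kullback-Leibler divergence between Bernoulli(q) and Bernoulli(v), so it is never negative once the box has been fitted to the data. This is why a full MDL score must also charge for describing the hyperinterval: only then does a box that is not genuinely dense fail to pay for itself. Note also that the precision term -d·log2 δ is the same in both encodings and cancels in the comparison.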
Keywords: Minimum Description Length, Information Theory, Real-Valued Data, RealKrimp