RealKrimp - Finding Hyperintervals that Compress with MDL for Real-Valued Data
| Authors | |
|---|---|
| Publication date | 2014 |
| Host editors | |
| Book title | Advances in Intelligent Data Analysis XIII |
| Book subtitle | 13th International Symposium, IDA 2014, Leuven, Belgium, October 30-November 1, 2014 : proceedings |
| ISBN | |
| ISBN (electronic) | |
| Series | Lecture Notes in Computer Science |
| Event | IDA 2014 |
| Pages (from-to) | 368-379 |
| Publisher | Cham: Springer |
| Organisations | |
| Abstract | The MDL Principle (induction by compression) is applied with meticulous effort in the Krimp algorithm for the problem of itemset mining, where one seeks exceptionally frequent patterns in a binary dataset. As is the case with many algorithms in data mining, Krimp is not designed to cope with real-valued data and cannot handle such data natively. Inspired by Krimp's success at using the MDL Principle in itemset mining, we develop RealKrimp: an MDL-based, Krimp-inspired mining scheme that seeks exceptionally high-density patterns in a real-valued dataset. We review how to extend the underlying Kraft inequality, which relates probabilities to codelengths, to real-valued data. Based on this extension we introduce the RealKrimp algorithm: an efficient method to find hyperintervals that compress the real-valued dataset, without the need for discretizing the data beforehand. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1007/978-3-319-12571-8_32 |
| Permalink to this page | |
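
The abstract's link between probabilities and codelengths can be illustrated with a minimal sketch. It is not the RealKrimp encoding from the paper; it only shows the standard relation L(x) = -log2 P(x) and its usual extension to real values stored at a fixed precision. The function names, the precision of 2^-8 per dimension, and the uniform densities are all assumptions made for illustration.

```python
import math


def codelength_bits(probability: float) -> float:
    """Shannon codelength in bits of an outcome with the given probability."""
    return -math.log2(probability)


def kraft_sum(codelengths) -> float:
    """Left-hand side of the Kraft inequality; at most 1 for a prefix code."""
    return sum(2.0 ** -length for length in codelengths)


def real_valued_codelength_bits(density: float, precision: float, dims: int) -> float:
    """Approximate codelength of one point recorded to `precision` per dimension.

    Assumption: the probability of the small cell around the point is
    approximated by density * precision**dims.
    """
    return -math.log2(density * precision ** dims)


# Discrete case: probabilities 1/2, 1/4, 1/4 give codelengths 1, 2, 2 bits.
lengths = [codelength_bits(p) for p in (0.5, 0.25, 0.25)]
print(kraft_sum(lengths))

# Real-valued case (hypothetical numbers): a point in the unit square under a
# uniform density, versus the same point under a uniform density concentrated
# on a hyperinterval of volume 1/16 (density 16 inside it).
precision = 2.0 ** -8
baseline = real_valued_codelength_bits(1.0, precision, dims=2)
in_box = real_valued_codelength_bits(16.0, precision, dims=2)
print(baseline, in_box, baseline - in_box)
```

Under these assumptions a point costs 16 bits under the uniform baseline and 12 bits inside the dense hyperinterval, so each covered point saves 4 bits. A full MDL score would also have to pay for describing the hyperinterval itself, which this sketch leaves out.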