RealKrimp - Finding Hyperintervals that Compress with MDL for Real-Valued Data

Authors
Publication date 2014
Host editors
  • H. Blockeel
  • M. Leeuwen
  • V. Vinciotti
Book title Advances in Intelligent Data Analysis XIII
Book subtitle 13th International Symposium, IDA 2014, Leuven, Belgium, October 30-November 1, 2014 : proceedings
ISBN
  • 9783319125701
ISBN (electronic)
  • 9783319125718
Series Lecture Notes in Computer Science
Event IDA 2014
Pages (from-to) 368-379
Publisher Cham: Springer
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
The MDL Principle (induction by compression) is applied with meticulous effort in the Krimp algorithm for the problem of itemset mining, where one seeks exceptionally frequent patterns in a binary dataset. As is the case with many algorithms in data mining, Krimp is not designed to cope with real-valued data, and it is not able to handle such data natively. Inspired by Krimp’s success at using the MDL Principle in itemset mining, we develop RealKrimp: an MDL-based Krimp-inspired mining scheme that seeks exceptionally high-density patterns in a real-valued dataset. We review how to extend the underlying Kraft inequality, which relates probabilities to codelengths, to real-valued data. Based on this extension we introduce the RealKrimp algorithm: an efficient method to find hyperintervals that compress the real-valued dataset, without the need for pre-algorithm data discretization.
Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-3-319-12571-8_32