Normalized compression distance of multisets with applications

Authors
Publication date 08-2015
Journal IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume | Issue number 37 | 8
Pages (from-to) 1602-1614
Number of pages 13
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract

Pairwise normalized compression distance (NCD) is a parameter-free, feature-free, alignment-free similarity metric based on compression. We propose an NCD of multisets that is also a metric. Previous attempts to obtain such an NCD failed. For classification purposes it is superior to the pairwise NCD in accuracy and implementation complexity. We cover the entire trajectory from theoretical underpinning to feasible practice. The multiset NCD is applied to biological (stem cell, organelle transport) and OCR classification questions that were earlier treated with the pairwise NCD. With the new method we achieved significantly better results. The theoretical foundation is Kolmogorov complexity.
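
As a rough illustration of the quantities the abstract refers to, the following Python sketch computes a pairwise NCD and a single-level multiset analogue. It is a sketch under stated assumptions, not the authors' implementation: zlib stands in for the compressor, concatenation stands in for grouping a multiset, the function names are hypothetical, and the multiset formula (G(X) minus the smallest G(x), divided by the largest G(X minus {x})) is an assumed form that is not stated in this record.

```python
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes; zlib is an assumed stand-in compressor."""
    return len(zlib.compress(data, 9))


def ncd_pair(x: bytes, y: bytes) -> float:
    """Pairwise NCD: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)


def ncd1_multiset(items: list[bytes]) -> float:
    """Assumed multiset analogue: (G(X) - min G(x)) / max G(X minus {x}).

    G is approximated by compressing the concatenation of the items,
    which is an illustrative choice and depends on item order.
    """
    if len(items) < 2:
        raise ValueError("need at least two items")
    whole = c(b"".join(items))
    smallest = min(c(x) for x in items)
    # Compressed size of the multiset with each single item left out in turn.
    largest_rest = max(
        c(b"".join(items[:i] + items[i + 1:])) for i in range(len(items))
    )
    return (whole - smallest) / largest_rest
```

For a multiset of exactly two items this sketch reduces to the pairwise formula, which is a quick sanity check on the assumed definition. Real compressors only approximate Kolmogorov complexity, so the values are approximations and can fall slightly outside the range from 0 to 1.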

Document type Article
Language English
Published at https://doi.org/10.1109/TPAMI.2014.2375175
Other links https://www.scopus.com/pages/publications/84947747978