The similarity metric
| Authors |
|
|---|---|
| Publication date | 2003 |
| Book title | Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms |
| Book subtitle | SODA '03 : Baltimore, MD, January 12-13, 2003 |
| ISBN |
|
| Event | 14th ACM-SIAM symposium on discrete algorithms |
| Pages (from-to) | 863-872 |
| Publisher | New York: Association for Computing Machinery |
| Organisations |
|
| Abstract |
A new class of metrics appropriate for measuring effective similarity
relations between sequences, say one type of similarity per metric, is
studied. We propose a new "normalized information distance", based on
the noncomputable notion of Kolmogorov complexity, and show that it
minorizes every metric in the class (that is, it is universal in that it
discovers all effective similarities). We demonstrate that it too is a
metric and takes values in [0, 1]; hence it may be called the similarity metric.
This is a theory foundation for a new general practical tool. We give
two distinctive applications in widely divergent areas (the experiments
by necessity use just computable approximations to the target notions).
First, we computationally compare whole mitochondrial genomes and infer
their evolutionary history. This results in a first completely automatic
computed whole mitochondrial phylogeny tree. Secondly, we give fully
automatically computed language tree of 52 different language based on
translated versions of the "Universal Declaration of Human Rights".
|
| Document type | Conference contribution |
| Language | English |
| Published at | https://dl.acm.org/doi/10.5555/644108.644253 |
| Permalink to this page | |