Power-Law Distributions for Paraphrases Extracted from Bilingual Corpora
| Authors | |
|---|---|
| Publication date | 2012 |
| Book title | EACL 2012: 13th Conference of the European Chapter of the Association for Computational Linguistics |
| Book subtitle | Proceedings of the conference: April 23-27, 2012, Avignon, France |
| Event | EACL 2012: 13th Conference of the European Chapter of the Association for Computational Linguistics |
| Pages (from-to) | 2-11 |
| Publisher | Stroudsburg, PA: Association for Computational Linguistics |
| Abstract | We describe a novel method that extracts paraphrases from a bitext, for both the source and target languages. In order to reduce the search space, we decompose the phrase-table into sub-phrase-tables and construct separate clusters for source and target phrases. We convert the clusters into graphs, add smoothing/syntactic-information-carrier vertices, and compute the similarity between phrases with a random walk-based measure, the commute time. The resulting phrase-paraphrase probabilities are built upon the conversion of the commute times into artificial co-occurrence counts with a novel technique. The co-occurrence count distribution belongs to the power-law family. |
| Document type | Conference contribution |
| Language | English |
| Published at | http://www.aclweb.org/anthology/E/E12/E12-1002.pdf http://dl.acm.org/citation.cfm?id=2380820 |
| Permalink to this page | |
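
The abstract names the commute time as the random walk-based similarity measure computed on the phrase graphs. As a minimal sketch of that quantity only, the following computes pairwise commute times on a small undirected weighted graph using the standard identity based on the pseudoinverse of the graph Laplacian. The function name `commute_times` and the toy adjacency matrix are assumptions made for illustration; the sketch does not reproduce the paper's sub-phrase-table clustering, its smoothing/syntactic-information-carrier vertices, or its conversion of commute times into co-occurrence counts.

```python
import numpy as np


def commute_times(adjacency):
    """Pairwise commute times for a connected, undirected, weighted graph.

    Uses the standard identity
        C(i, j) = vol(G) * (L+[i, i] + L+[j, j] - 2 * L+[i, j]),
    where L+ is the Moore-Penrose pseudoinverse of the graph Laplacian
    and vol(G) is the sum of all vertex degrees.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    laplacian = np.diag(degrees) - A
    l_plus = np.linalg.pinv(laplacian)
    volume = degrees.sum()
    diag = np.diag(l_plus)
    return volume * (diag[:, None] + diag[None, :] - 2.0 * l_plus)


# Hypothetical toy graph: a three-vertex path 0 - 1 - 2 with unit weights.
toy_adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
toy_commute = commute_times(toy_adjacency)
```

A smaller commute time means a random walker travels between two vertices and back more quickly, so it can be read as a higher similarity between the corresponding phrases. The identity holds only for connected graphs; a disconnected graph would need each component handled separately.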