Significant Words Representations of Entities
| Authors | |
|---|---|
| Publication date | 2016 |
| Book title | SIGIR'16 |
| Book subtitle | the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval: Pisa, Italy, July 17-21, 2016 |
| ISBN (electronic) | |
| Event | SIGIR 2016: 39th international ACM SIGIR conference on Research and development in information retrieval |
| Pages (from-to) | 1183 |
| Number of pages | 1 |
| Publisher | New York, NY: Association for Computing Machinery |
| Organisations | |
| Abstract | Transforming data into a suitable representation is the first key step in data analysis, and the performance of any data-oriented method depends heavily on it. We study how best to learn representations for textual entities that are: 1) precise, 2) robust against noisy terms, 3) transferable over time, and 4) interpretable by human inspection. Inspired by the early work of Luhn [1], we propose significant words language models of a set of documents that capture all, and only, the significant terms the documents share. We adjust the weights of common terms that are already well explained by the document collection, as well as the weights of incidental rare terms that are explained only by specific documents, so that eventually only the significant terms remain in the model. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/2911451.2911474 |
| Downloads | p1183-dehghani (Final published version) |
| Permalink to this page | |
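
The abstract describes the weighting idea only at a high level. As a rough illustration, and not the authors' implementation, the sketch below assumes a three-component mixture (a general model from the collection, a specific model for terms confined to single documents, and the significant words model) and fits only the significant words component with EM. The function names, the fixed mixture weights, the way the specific model is built, and the toy data are all assumptions made for this sketch.

```python
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram model of one token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def significant_words(docs, collection, w_sig=0.2, w_gen=0.6, w_spec=0.2, iters=50):
    """Illustrative EM estimate of a 'significant words' model (assumed setup)."""
    vocab = {t for d in docs for t in d}
    doc_models = [unigram(d) for d in docs]

    # General model: collection statistics, with a tiny floor for unseen terms.
    coll = unigram(collection)
    general = {t: coll.get(t, 1e-9) for t in vocab}

    # Specific model (assumption): mass a term has in one document,
    # discounted by its probability in every other document of the set.
    spec = {}
    for t in vocab:
        score = 0.0
        for i, model in enumerate(doc_models):
            p = model.get(t, 0.0)
            for j, other in enumerate(doc_models):
                if j != i:
                    p *= 1.0 - other.get(t, 0.0)
            score += p
        spec[t] = score
    z = sum(spec.values()) or 1.0
    specific = {t: v / z for t, v in spec.items()}

    # EM over the significant words component only; weights stay fixed.
    tf = Counter(t for d in docs for t in d)
    sig = {t: 1.0 / len(vocab) for t in vocab}
    for _ in range(iters):
        expected = {}
        for t in vocab:
            num = w_sig * sig[t]
            den = num + w_gen * general[t] + w_spec * specific[t]
            expected[t] = tf[t] * num / den  # E-step: counts credited to sig
        norm = sum(expected.values())
        sig = {t: e / norm for t, e in expected.items()}  # M-step
    return sig


# Toy data (hypothetical): "the" is common, "relevance"/"feedback" are shared,
# and each document carries a few incidental terms of its own.
docs = [
    "the relevance feedback the model pisa".split(),
    "the relevance feedback of the query luhn".split(),
    "the relevance feedback and the entity".split(),
]
collection = (
    ["the"] * 12 + ["of"] * 3 + ["and"] * 2 + ["a"] * 2 + ["in"] * 2
    + ["model", "query", "entity", "pisa", "luhn", "relevance", "feedback"]
)

sig = significant_words(docs, collection)
for term, prob in sorted(sig.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{term}\t{prob:.3f}")
```

On this toy input the shared terms "relevance" and "feedback" should receive most of the probability mass, while "the" (well explained by the collection) and the one-off terms are pushed down. The weights `w_sig`, `w_gen`, and `w_spec` are free parameters of the sketch and would need tuning on real data.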