Boosting Entity Linking Performance by Leveraging Unlabeled Documents
| Authors | |
|---|---|
| Publication date | 2019 |
| Host editors | |
| Book title | The 57th Annual Meeting of the Association for Computational Linguistics |
| Book subtitle | ACL 2019: proceedings of the conference: July 28-August 2, 2019, Florence, Italy |
| ISBN (electronic) | |
| Event | The 57th Annual Meeting of the Association for Computational Linguistics - ACL 2019 |
| Pages (from-to) | 1935-1945 |
| Publisher | Stroudsburg, PA: The Association for Computational Linguistics |
| Organisations | |
| Abstract | Modern entity linking systems rely on large collections of documents specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose an approach which exploits only naturally occurring information: unlabeled documents and Wikipedia. Our approach consists of two stages. First, we construct a high-recall list of candidate entities for each mention in an unlabeled document. Second, we use the candidate lists as weak supervision to constrain our document-level entity linking model. The model treats entities as latent variables and, when estimated on a collection of unlabeled texts, learns to choose entities relying both on the local context of each mention and on coherence with other entities in the document. The resulting approach rivals fully-supervised state-of-the-art systems on standard test sets. It also approaches their performance in a very challenging setting: when tested on a test set sampled from the data used to estimate the supervised systems. By comparing to Wikipedia-only training of our model, we demonstrate that modeling unlabeled documents is beneficial. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/P19-1187 |
| Other links | https://vimeo.com/384532543 |
| Downloads | P19-1187 (final published version) |
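The two-stage approach described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy knowledge base, the `score_local` and `score_coherence` functions, and the exhaustive search over assignments are all assumptions for the sketch (the paper's model instead performs learned inference over latent entity variables).

```python
from itertools import product

def candidate_lists(mentions, kb):
    """Stage 1: build a high-recall candidate entity list per mention.

    `kb` is a toy stand-in for candidates mined from Wikipedia; the
    lists then act as weak supervision constraining the linker.
    """
    return [kb.get(m, []) for m in mentions]

def link(mentions, kb, score_local, score_coherence):
    """Stage 2: choose one entity per mention, scoring each assignment
    by local context fit plus pairwise coherence across the document.

    Exhaustive enumeration is used here only to keep the sketch short.
    """
    cands = candidate_lists(mentions, kb)
    best, best_score = None, float("-inf")
    for assignment in product(*cands):
        # Local score: how well each entity fits its own mention's context.
        s = sum(score_local(m, e) for m, e in zip(mentions, assignment))
        # Coherence score: agreement among entities chosen in the document.
        s += sum(score_coherence(e1, e2)
                 for i, e1 in enumerate(assignment)
                 for e2 in assignment[i + 1:])
        if s > best_score:
            best, best_score = assignment, s
    return best
```

With a toy knowledge base, coherence with "France" pulls the ambiguous mention "Paris" toward `Paris_France` rather than `Paris_Texas`, which is the document-level effect the abstract describes.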