Boosting Entity Linking Performance by Leveraging Unlabeled Documents
| Authors | |
|---|---|
| Publication date | 2019 |
| Host editors | |
| Book title | The 57th Annual Meeting of the Association for Computational Linguistics |
| Book subtitle | ACL 2019: proceedings of the conference: July 28-August 2, 2019, Florence, Italy |
| ISBN (electronic) | |
| Event | The 57th Annual Meeting of the Association for Computational Linguistics - ACL 2019 |
| Pages (from-to) | 1935-1945 |
| Publisher | Stroudsburg, PA: The Association for Computational Linguistics |
| Organisations | |
| Abstract | Modern entity linking systems rely on large collections of documents specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose an approach which exploits only naturally occurring information: unlabeled documents and Wikipedia. Our approach consists of two stages. First, we construct a high-recall list of candidate entities for each mention in an unlabeled document. Second, we use the candidate lists as weak supervision to constrain our document-level entity linking model. The model treats entities as latent variables and, when estimated on a collection of unlabeled texts, learns to choose entities relying both on the local context of each mention and on coherence with other entities in the document. The resulting approach rivals fully-supervised state-of-the-art systems on standard test sets. It also approaches their performance in a very challenging setting: when tested on a test set sampled from the data used to estimate the supervised systems. By comparing to Wikipedia-only training of our model, we demonstrate that modeling unlabeled documents is beneficial. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/P19-1187 |
| Other links | https://vimeo.com/384532543 |
| Downloads | P19-1187 (final published version) |
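The two-stage approach described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy knowledge base, the `score_local` and `score_coherence` functions, and the exhaustive search over assignments are all assumptions for the sketch (the paper's model instead performs learned inference over latent entity variables).

```python
from itertools import product

def candidate_lists(mentions, kb):
    """Stage 1: build a high-recall candidate entity list per mention.

    `kb` is a toy stand-in for candidates mined from Wikipedia; the
    lists then act as weak supervision constraining the linker.
    """
    return [kb.get(m, []) for m in mentions]

def link(mentions, kb, score_local, score_coherence):
    """Stage 2: choose one entity per mention, scoring each assignment
    by local context fit plus pairwise coherence across the document.

    Exhaustive enumeration is used here only to keep the sketch short.
    """
    cands = candidate_lists(mentions, kb)
    best, best_score = None, float("-inf")
    for assignment in product(*cands):
        # Local score: how well each entity fits its own mention's context.
        s = sum(score_local(m, e) for m, e in zip(mentions, assignment))
        # Coherence score: agreement among entities chosen in the document.
        s += sum(score_coherence(e1, e2)
                 for i, e1 in enumerate(assignment)
                 for e2 in assignment[i + 1:])
        if s > best_score:
            best, best_score = assignment, s
    return best
```

With a toy knowledge base, coherence with "France" pulls the ambiguous mention "Paris" toward `Paris_France` rather than `Paris_Texas`, which is the document-level effect the abstract describes.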