Robustness Evaluation of Entity Disambiguation Using Prior Probes: the Case of Entity Overshadowing

Open Access
Authors
Publication date 2021
Host editors
  • M.-C. Moens
  • X. Huang
  • L. Specia
  • S.W. Yih
Book title 2021 Conference on Empirical Methods in Natural Language Processing
Book subtitle EMNLP 2021 : proceedings of the conference : November 7-11, 2021
ISBN (electronic)
  • 9781955917094
Event 2021 Conference on Empirical Methods in Natural Language Processing
Pages (from-to) 10501-10510
Number of pages 10
Publisher Stroudsburg, PA: The Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Entity disambiguation (ED) is the last step of entity linking (EL), when candidate entities are reranked according to the context they appear in. All datasets for training and evaluating models for EL consist of convenience samples, such as news articles and tweets, that propagate the prior probability bias of the entity distribution towards more frequently occurring entities. It was shown that the performance of the EL systems on such datasets is overestimated since it is possible to obtain higher accuracy scores by merely learning the prior. To provide a more adequate evaluation benchmark, we introduce the ShadowLink dataset, which includes 16K short text snippets annotated with entity mentions. We evaluate and report the performance of popular EL systems on the ShadowLink benchmark. The results show a considerable difference in accuracy between more and less common entities for all of the EL systems under evaluation, demonstrating the effect of prior probability bias and entity overshadowing.
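The abstract's claim that accuracy can be inflated "by merely learning the prior" can be illustrated with a small sketch. The code below is not the authors' method or the ShadowLink evaluation. It is a hypothetical context-free baseline that always predicts the entity most often linked to a mention string. The function names (`learn_prior`, `predict_by_prior`, `accuracy`) and the toy annotations are invented for illustration.

```python
from collections import Counter, defaultdict


def learn_prior(annotations):
    """Count how often each mention string is linked to each entity."""
    counts = defaultdict(Counter)
    for mention, entity in annotations:
        counts[mention][entity] += 1
    return counts


def predict_by_prior(counts, mention):
    """Return the most frequent entity for a mention, ignoring all context."""
    if mention not in counts:
        return None  # unseen mention: the prior has nothing to offer
    return counts[mention].most_common(1)[0][0]


def accuracy(counts, examples):
    """Fraction of (mention, gold entity) pairs the prior-only baseline gets right."""
    correct = sum(predict_by_prior(counts, m) == e for m, e in examples)
    return correct / len(examples)


# Toy, invented annotations: the mention "Java" is linked far more often
# to the programming language than to the island.
train = (
    [("Java", "Java_(programming_language)")] * 8
    + [("Java", "Java_(island)")] * 2
)
prior = learn_prior(train)

# Hypothetical test splits: one with the common entity, one with the
# less common entity that shares the same mention string.
common_test = [("Java", "Java_(programming_language)")] * 4
overshadowed_test = [("Java", "Java_(island)")] * 4

print("common entity accuracy:      ", accuracy(prior, common_test))
print("overshadowed entity accuracy:", accuracy(prior, overshadowed_test))
```

On a sample skewed towards the common entity, this baseline looks accurate, yet it is always wrong on the less common entity sharing the mention. Separating the two groups, as the abstract describes, is what makes that gap visible.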
Document type Conference contribution
Note With supplementary video
Language English
Published at https://doi.org/10.18653/v1/2021.emnlp-main.820
Downloads
2021.emnlp-main.820 (Final published version)