FANG-COVID: A New Large-Scale Benchmark Dataset for Fake News Detection in German
| Authors | |
|---|---|
| Publication date | 2021 |
| Host editors | |
| Book title | FEVER : Fact Extraction and VERification |
| Book subtitle | Proceedings of the Fourth Workshop : EMNLP 2021 : November 10, 2021 |
| ISBN (electronic) | |
| Event | 4th workshop Fact Extraction and VERification |
| Pages (from-to) | 78-91 |
| Publisher | Stroudsburg, PA: The Association for Computational Linguistics |
| Organisations | |
| Abstract | As the world continues to fight the COVID-19 pandemic, it is simultaneously fighting an ‘infodemic’ – a flood of disinformation and spread of conspiracy theories leading to health threats and the division of society. To combat this infodemic, there is an urgent need for benchmark datasets that can help researchers develop and evaluate models geared towards automatic detection of disinformation. While there are increasing efforts to create adequate, open-source benchmark datasets for English, comparable resources are virtually unavailable for German, leaving research for the German language lagging significantly behind. In this paper, we introduce the new benchmark dataset FANG-COVID consisting of 28,056 real and 13,186 fake German news articles related to the COVID-19 pandemic as well as data on their propagation on Twitter. Furthermore, we propose an explainable textual- and social context-based model for fake news detection, compare its performance to “black-box” models and perform feature ablation to assess the relative importance of human-interpretable features in distinguishing fake news from authentic news. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/2021.fever-1.9 |
| Downloads | 2021.fever-1.9 (Final published version) |