Advances in information verification using natural language processing

Open Access
Award date 03-04-2024
Number of pages 143
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The accelerating proliferation of digital textual information demands effective information verification systems. False claims can spread misinformation, with detrimental effects on individuals and society. The issue extends beyond general information: the growing influx of scientific publications likewise necessitates careful validation of new findings. In this thesis, we address information verification from several natural language processing perspectives to meet these critical demands.
First, we address claim verification as a text classification task. We construct a pipeline that effectively retrieves relevant information, thereby providing supportive evidence for verifying claims. Evidence-based claim verification improves verification accuracy and interpretability. For evidence retrieval, we examine pairwise and pointwise approaches. Additionally, we propose a hard negative mining approach to address extremely imbalanced datasets.
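The hard negative mining step mentioned above can be illustrated with a minimal sketch. The function names are hypothetical and a toy token-overlap scorer stands in for the thesis's actual retrieval model; the point is only that, on a heavily imbalanced candidate pool, the highest-scoring *non-relevant* candidates make the most informative negatives for training.

```python
def overlap_score(claim, sentence):
    # Toy relevance scorer: Jaccard overlap of lowercased tokens.
    # A real pipeline would use a learned pointwise or pairwise ranker.
    a, b = set(claim.lower().split()), set(sentence.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def mine_hard_negatives(claim, candidates, gold_ids, k=2):
    # Rank all non-gold candidates by the scorer; the top-scoring ones
    # are "hard" negatives: irrelevant evidence that looks similar to
    # the claim, which teaches the ranker a sharper decision boundary.
    scored = [(i, overlap_score(claim, s))
              for i, s in enumerate(candidates) if i not in gold_ids]
    scored.sort(key=lambda t: t[1], reverse=True)
    return [i for i, _ in scored[:k]]
```

In practice mining is iterative: the ranker trained on one round of hard negatives is used to mine harder ones for the next.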
Second, we approach information verification through a Question Answering (QA) system, facilitating comprehensive analysis of documents and information. We focus on non-factoid QA, a category encompassing many real-life questions relevant to information verification. Non-factoid questions require complex responses, such as descriptions or opinions, often spanning multiple sentences. Traditional QA evaluation metrics, which measure word overlap between references and answers, are ill-suited to these longer answers. We propose a new evaluation metric designed to verify the accuracy of such answers.
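The weakness of word-overlap metrics is easy to demonstrate with a SQuAD-style token-level F1 (a generic illustration, not the thesis's proposed metric): a long answer that correctly paraphrases the reference can still score near zero, because almost no tokens are shared.

```python
from collections import Counter

def token_f1(prediction, reference):
    # SQuAD-style token F1: harmonic mean of precision and recall
    # over bag-of-words overlap between prediction and reference.
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = Counter(pred) & Counter(ref)   # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(pred), overlap / len(ref)
    return 2 * p * r / (p + r)
```

For example, the paraphrase "working out often is good for your heart and lowers anxiety" shares only the token "and" with the reference "regular exercise improves cardiovascular health and reduces stress", so its F1 is about 0.1 despite being a correct answer.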
Third, we investigate document summarization to improve how users handle large volumes of information and long documents. We focus on aspect-based summarization, which summarizes a document around specific points of interest. This enables readers to navigate articles quickly and supports document assistance systems for information verification. We propose a self-supervised pre-training method that improves few-shot and zero-shot performance using unlabelled data and generalizes to unseen aspects or domains.
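One common way to derive such self-supervision from unlabelled documents is sketched below, under the assumption that section headings can serve as pseudo-aspects and lead sentences as pseudo-summaries; the thesis's actual pre-training objective may differ, and all names here are illustrative.

```python
def pseudo_aspect_examples(sections):
    # Hypothetical self-supervision: treat each section heading as a
    # pseudo-aspect and its first sentence as the pseudo-summary target,
    # yielding (document, aspect, summary) triples with no labels needed.
    doc = " ".join(body for _, body in sections)
    examples = []
    for heading, body in sections:
        first_sentence = body.split(". ")[0].rstrip(".") + "."
        examples.append({"document": doc,
                         "aspect": heading,
                         "summary": first_sentence})
    return examples
```

A model pre-trained on millions of such synthetic triples can then be fine-tuned with only a handful of human-labelled aspect-summary pairs, or applied zero-shot to unseen aspects.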
Finally, we propose an automatic factuality evaluation metric for document summarization. Current summarization models often generate non-factual summaries that are inconsistent with their source documents. We propose a model that generates non-factual summary data; these synthetic negative summaries are used to train a classifier that evaluates factuality. Our metric achieves high performance even without human-annotated reference summaries.
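A minimal sketch of the synthetic-negative idea, with hypothetical helper names and a simple entity-swap perturbation standing in for the thesis's generation model: corrupt a faithful summary into a non-factual one, then pair positives and negatives to train a factuality classifier, so no human-annotated references are required.

```python
import random

def make_nonfactual(summary, entities):
    # Hypothetical perturbation: swap one entity in the summary for a
    # different entity from the source document, breaking factuality
    # while keeping the text fluent.
    tokens = summary.split()
    present = [i for i, t in enumerate(tokens) if t in entities]
    if not present:
        return summary  # nothing to perturb
    i = random.choice(present)
    swaps = [e for e in entities if e != tokens[i]]
    tokens[i] = random.choice(swaps)
    return " ".join(tokens)

def build_training_pairs(doc, summary, entities, n_neg=3):
    # Positive = the faithful summary (label 1); negatives = synthetic
    # corruptions (label 0). These pairs train the factuality classifier.
    pairs = [(doc, summary, 1)]
    for _ in range(n_neg):
        pairs.append((doc, make_nonfactual(summary, entities), 0))
    return pairs
```

At evaluation time the trained classifier scores a (document, generated summary) pair directly, which is why the metric does not depend on reference summaries.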
Document type PhD thesis
Language English